175 pages, English, 2009
Published: January 1, 2009
Language: English
File size: 10 MB
Dissertation
TOPIC MODELS FOR IMAGE RETRIEVAL ON
LARGE-SCALE DATABASES
Eva Hörster
Department of Computer Science
University of Augsburg
Adviser: Prof. Dr. Rainer Lienhart
Readers: Prof. Dr. Rainer Lienhart
Prof. Dr. Bernhard Möller
Prof. Dr. Wolfgang Effelsberg
Thesis Defense: July 14, 2009
Abstract
With the explosion in the number of images in personal and online collections, efficient techniques for navigating, indexing, labeling, and searching images are becoming increasingly important. In this work, we rely on image content as the main source of information for retrieving images. We study various aspects of representing images with topic models and extend the existing models. Starting from a bag-of-visual-words image description based on local image features, image representations are learned in an unsupervised fashion, and each image is modeled as a mixture of the topics/object parts depicted in it. Topic models thus allow us to automatically extract high-level descriptions of image content, which in turn can be used to find similar images. Furthermore, the typically low-dimensional topic-model-based representation enables fast and efficient search, especially in very large databases.
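To make the idea of "each image as a mixture of topics" concrete, here is a minimal sketch, assuming the pLSA formulation (one of the topic models covered in Chapter 2), P(w|d) = Σ_z P(z|d) P(w|z), fitted with plain EM on a small image-by-visual-word count matrix. The function names, random initialization, and fixed iteration count are illustrative choices, not the thesis implementation.

```python
import math
import random


def _normalize(row):
    """Scale non-negative numbers so they sum to 1 (uniform if all are zero)."""
    total = sum(row)
    if total <= 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa(counts, num_topics, iterations=50, seed=0):
    """Fit pLSA, P(w|d) = sum_z P(z|d) P(w|z), by EM on a count matrix.

    counts[d][w] is how often visual word w occurs in image d.
    Returns (p_z_d, p_w_z) with p_z_d[d][z] = P(z|d) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])
    # Random initialization of both conditional distributions.
    p_z_d = [_normalize([rng.random() for _ in range(num_topics)]) for _ in range(num_docs)]
    p_w_z = [_normalize([rng.random() for _ in range(num_words)]) for _ in range(num_topics)]

    for _ in range(iterations):
        acc_w_z = [[0.0] * num_words for _ in range(num_topics)]
        acc_z_d = [[0.0] * num_topics for _ in range(num_docs)]
        for d in range(num_docs):
            for w in range(num_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(z|d) P(w|z).
                post = _normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(num_topics)])
                # Accumulate expected counts n(d,w) * P(z|d,w).
                for z in range(num_topics):
                    acc_w_z[z][w] += n * post[z]
                    acc_z_d[d][z] += n * post[z]
        # M-step: renormalize the expected counts into distributions.
        p_w_z = [_normalize(row) for row in acc_w_z]
        p_z_d = [_normalize(row) for row in acc_z_d]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over (d, w) of n(d,w) * log P(w|d); EM does not decrease this value."""
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                total += n * math.log(sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z))))
    return total
```

For example, `plsa([[5, 5, 0, 0], [0, 0, 5, 5], [4, 6, 0, 0]], num_topics=2)` returns one two-dimensional vector P(z|d) per image. Vectors like these are the compact, low-dimensional representation the abstract refers to.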
In this thesis, we present a complete image retrieval system based on topic models and evaluate how suitable different types of topic models are for large-scale retrieval on real-world databases. We also evaluate different similarity measures in a retrieval-by-example task.
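Retrieval by example with topic vectors amounts to ranking database images by how similar their topic vectors are to the query image's topic vector. The sketch below assumes each image is already represented by a topic distribution and uses three common measures (L1 distance, cosine similarity, and Jensen-Shannon divergence) purely for illustration; they are not necessarily the measures evaluated in the thesis, and `retrieve` is a hypothetical helper.

```python
import math


def l1_distance(p, q):
    """Sum of absolute differences between two topic vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def cosine_similarity(p, q):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i = 0 contribute nothing; y_i > 0 whenever x_i > 0 here.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def retrieve(query, database, measure="l1", k=3):
    """Return indices of the k database vectors most similar to the query."""
    if measure == "cosine":
        key = lambda i: -cosine_similarity(query, database[i])  # larger is better
    elif measure == "js":
        key = lambda i: js_divergence(query, database[i])  # smaller is better
    else:
        key = lambda i: l1_distance(query, database[i])  # smaller is better
    return sorted(range(len(database)), key=key)[:k]
```

Because topic vectors are low-dimensional, each comparison is cheap, so even a linear scan over a large database stays affordable.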
Next, we focus on incorporating different types of local image features into the topic models. To this end, we first evaluate which types of feature detectors and descriptors are appropriate for modeling the images, and then we propose and explore models that fuse multiple types of local features. All basic topic models require quantizing the otherwise high-dimensional, continuous local feature vectors into a finite, discrete vocabulary; this quantization yields the bag-of-words image representation on which the topic models are built. Since it is not clear how to quantize these high-dimensional features optimally, we introduce several extensions to a basic topic model that model the visual vocabulary continuously, making the quantization step unnecessary.
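The quantization step that the continuous-vocabulary models avoid can be sketched as follows. One common way to build a discrete visual vocabulary, assumed here and not necessarily the quantizer used in the thesis, is to cluster local descriptors with k-means, treat each cluster center as a visual word, and describe an image by the histogram of its descriptors' nearest words. The helper names are hypothetical, and the toy 2-D vectors stand in for real, much higher-dimensional descriptors.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, v):
    """Index of the center (visual word) closest to vector v."""
    return min(range(len(centers)), key=lambda j: squared_distance(centers[j], v))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers used as visual words."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(centers, v)].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster becomes empty
                dim = len(members[0])
                centers[j] = [sum(m[i] for m in members) / len(members) for i in range(dim)]
    return centers


def bag_of_words(descriptors, vocabulary):
    """Histogram of nearest-visual-word assignments for one image."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest(vocabulary, d)] += 1
    return hist
```

The hard assignment inside `bag_of_words` is exactly where information is lost: two descriptors on opposite sides of a cluster boundary become unrelated words, which motivates modeling the vocabulary continuously instead.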
Online image repositories of the Web 2.0 often store additional information about the images besides their pixel values. This so-called metadata includes, for example, associated tags, date of creation, ownership, and camera parameters. In this work, we also investigate how to include such cues in our retrieval system. We present work in progress on (hierarchical) models that fuse features from multiple modalities.
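One simple fusion scheme, listed in the table of contents as metadata fusion via concatenating topic vectors, joins the per-modality topic vectors of an image into a single vector. The sketch below assumes each image has one topic distribution learned from visual words and one learned from tags; the `visual_weight` parameter is a hypothetical addition that scales each part so the fused vector still sums to 1 and lets one modality count more in distance computations.

```python
def fuse_topic_vectors(visual_topics, tag_topics, visual_weight=0.5):
    """Concatenate a visual and a tag topic distribution into one vector.

    visual_weight (hypothetical) scales the visual part and (1 - visual_weight)
    scales the tag part, so two inputs that each sum to 1 yield a fused
    vector that also sums to 1.
    """
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    return ([visual_weight * p for p in visual_topics]
            + [(1.0 - visual_weight) * p for p in tag_topics])
```

Fused vectors can be compared with the same similarity measures as single-modality topic vectors; setting `visual_weight=1.0` reduces the scheme to purely visual retrieval.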
Finally, we present an approach for finding the most relevant images, i.e., the most representative images, for a given query term in a large, web-scale collection. Our unsupervised approach ranks highest the images whose visual content and associated metadata have the highest probability under the model that we automatically build for this tag.
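The ranking idea can be illustrated with density estimation (the table of contents lists a density-estimation step under Image Ranking): images whose features lie in dense regions of the feature space spanned by all images carrying the query tag are treated as most representative. The sketch below is a deliberate simplification under stated assumptions: it uses a single isotropic Gaussian kernel density estimate with a hand-picked bandwidth on generic feature vectors, scores each image by its leave-one-out density, and omits the kernel's normalizing constant, which does not change the ranking. Both function names are hypothetical.

```python
import math


def gaussian_kde_score(x, samples, bandwidth):
    """Average isotropic Gaussian kernel value of x over samples.

    The normalizing constant is omitted because it is the same for every
    image and therefore does not affect the ranking.
    """
    total = 0.0
    for s in samples:
        sq = sum((a - b) ** 2 for a, b in zip(x, s))
        total += math.exp(-sq / (2.0 * bandwidth ** 2))
    return total / len(samples)


def rank_by_density(features, bandwidth=1.0):
    """Return image indices sorted by leave-one-out density, highest first."""
    if len(features) < 2:
        raise ValueError("need at least two images to estimate a density")
    scores = []
    for i, x in enumerate(features):
        others = features[:i] + features[i + 1:]  # exclude the image itself
        scores.append(gaussian_kde_score(x, others, bandwidth))
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
```

Leaving each image out of its own estimate keeps an isolated image from scoring highly just because it is close to itself.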
Throughout this thesis, the suitability of all proposed models and approaches is demonstrated by user studies on a real-world, large-scale database in the context of image retrieval tasks. We use databases of more than 240,000 images downloaded from the public Flickr repository.
Contents
1. Introduction
1.1. Motivation
1.2. Related Work
1.2.1. Image Features
1.2.2. Probabilistic Models
1.2.3. Databases
1.3. Contributions
1.4. Thesis Overview
2. Topic Models
2.1. Latent Semantic Analysis (LSA)
2.2. probabilistic Latent Semantic Analysis (pLSA)
2.3. Latent Dirichlet Allocation (LDA)
2.4. Correlated Topic Model (CTM)
2.5. Summary
3. Topic-Model-Based Image Retrieval
3.1. Retrieval System
3.2. Similarity Measures
3.3. Experimental Evaluation
3.3.1. Database
3.3.2. Local Feature Descriptors
3.3.3. Parameter Settings
3.3.4. Different Similarity Measures
3.3.5. Different Types of Probabilistic Topic Models
3.3.6. Results
3.4. SVM-based Active Learning
3.4.1. Experimental Results
3.5. Summary
4. Visual Features and their Fusion
4.1. Feature Comparison
4.1.1. Local Region Detectors
4.1.2. Local Feature Descriptors
4.1.3. Experimental Evaluation
4.2. Fusion Models
4.2.1. Models
4.2.2. Image Similarity Measure
4.2.3. Experimental Evaluation
4.3. Summary
5. Continuous Vocabulary Models
5.1. Models
5.1.1. pLSA with Shared Gaussian Words (SGW-pLSA)
5.1.2. pLSA with Fixed Shared Gaussian Words (FSGW-pLSA)
5.1.3. pLSA with Gaussian Mixtures (GM-pLSA)
5.2. Parameter Estimation
5.2.1. SGW-pLSA
5.2.2. FSGW-pLSA
5.2.3. GM-pLSA
5.3. Experimental Evaluation
5.3.1. Scene Recognition
5.3.2. Image Retrieval
5.4. Summary
6. Deep-Network-Based Image Retrieval
6.1. Deep Networks
6.2. Image Retrieval
6.3. Experimental Evaluation
6.3.1. Experimental Setup
6.3.2. Results
6.4. Summary
7. Models for Metadata Fusion
7.1. Metadata Fusion via Concatenating Topic Vectors
7.2. Metadata Fusion via Multilayer Multimodal pLSA (mm-pLSA)
7.2.1. Training and Inference
7.2.2. Fast Initialization
7.3. Metadata Fusion via Deep Networks
7.4. Experimental Evaluation
7.4.1. Basic Features
7.4.2. Experimental Setup
7.4.3. Results
7.5. Summary
8. Image Ranking
8.1. Model
8.1.1. Visual Features
8.1.2. Tag Features
8.1.3. Density Estimation
8.2. Implementation
8.2.1. Visual Feature Implementation