8
pages
English
Documents
Le téléchargement nécessite un accès à la bibliothèque YouScribe Tout savoir sur nos offres
8
pages
English
Documents
Le téléchargement nécessite un accès à la bibliothèque YouScribe Tout savoir sur nos offres
Publié par
Langue
English
A Benchmark for the Comparison of 3-D Motion Segmentation Algorithms
Roberto Tron ReneV´ idal
Center for Imaging Science, Johns Hopkins University
308B Clark Hall, 3400 N. Charles St., Baltimore MD 21218, USA
http://www.vision.jhu.edu
Abstract have been developed (see§2 for a brief review). How-
ever, all existing techniques have been typically evalu-
Over the past few years, several methods for segment- ated on a handful of sequences, with limited compari-
ing a scene containing multiple rigidly moving objects have son against other methods. This motivates a study on
been proposed. However, most existing methods have been the real performances of these methods.
tested on a handful of sequences only, and each method has
2. Perspective methods assume a perspective projectionbeen often tested on a different set of sequences. Therefore,
model. In this case, point trajectories associated withthe comparison of different methods has been fairly limited.
each moving object lie in a multilinear variety (bilinearIn this paper, we compare four 3-D motion segmentation al-
for two views, trilinear for three views, etc.) There-gorithms for affine cameras on a benchmark of 155 motion
fore, motion segmentation is equivalent to clusteringsequences of checkerboard, traffic, and articulated scenes.
these multilinear varieties. Because this problem is
nontrivial, most prior work has been limited to alge-
1. Introduction braic methods for factorizing bilinear and trilinear va-
rieties (see e.g. [18, 7]) and statistical methods for two
Motion segmentation is a very important pre-processing
[15] and multiple [13] views. At present, the evalua-
step for several applications in computer vision, such as
tion of perspective methods is still far behind that of
surveillance, tracking, action recognition, etc. During the
affine methods. It is arguable that perspective methods
nineties, these applications motivated the development of
still need to be significantly improved, before a mean-
several 2-D motion segmentation techniques. Such tech-
ingful evaluation and comparison can be made.
niques aimed to separate each frame of a video sequence
into different regions of coherent 2-D motion (optical flow). In this paper, we present a benchmark and a compari-
For example, a video of a rigid scene seen by a moving cam- son of 3-D motion segmentation algorithms. We choose to
era could be segmented into multiple 2-D motions, because compare only affine methods, not only because the affine
of depth discontinuities, occlusions, perspective effects, etc. case is better understood, but also because affine meth-
However, in several applications the scene may contain
ods are at present better developed than their perspective
several moving objects, and one may need to identify each
counterparts. We compare four state-of-the-art algorithms,
object as a coherent entity. In such cases, the segmentation
GPCA [16], Local Subspace Affinity (LSA) [21], Multi-
task must be performed based on the assumption of several
StageLearning(MSL)[14]andRANSAC[4], onadatabase
motions in 3-D space, not simply in 2-D. This has motivated
of 155 motion sequences. The database includes 104 in-
several works on 3-D motion segmentation during the last
door checkerboard sequences, 38 outdoor traffic sequences,
decade, which can be roughly separated into two categories:
and 13 articulated/non-rigid sequences, all with two or three
1. Affine methods assume an affine projection model, motions. Our experiments show that LSA is the most accu-
which generalizes orthographic, weak-perspective and rate method, with average classification errors of 3.45% for
paraperspective projection. Under the affine model, twomotions and 9.73% for threemotions. However, for two
point trajectories associated with each moving object motions, GPCA and RANSAC are faster and have a limited
across multiple frames lie in a linear subspace of di- 1%-2% drop in accuracy. More importantly, the results vary
mension at most 4. Therefore, 3-D motion segmen- depending on the type of sequences: LSA is more accurate
tation can be achieved by clustering point trajectories for checkerboard sequences, while GPCA is more
into different motion subspaces. At present, several al- for traffic and articulated scenes. The MSL algorithm is of-
gebraic and statistical methods for performing this task ten very accurate, but significantly slower.
2F×P
i2. Multibody Motion Segmentation Problem where the columns of W ∈ R are the P trajecto-i i
n
ries associated with the ith moving object, P = P , andIn this section, we review the geometry of the 3-D i
i=1
motion segmentation problem from multiple affine views
P×P
Γ ∈R is an unknown matrix permuting the P trajec-
and show that it is equivalent to clustering multiple low-
tories according to the n motions. SinceW can be factorizedi
dimensional linear subspaces of a high-dimensional space.
2F×d P ×dˆ i ˆ i iinto matricesM ∈R andS ∈R asi i
2.1. Motion Subspace of a Rigid-Body Motion ˆ ˆ
W = M S i=1,...,n, (4)i i i
f=1,...,F
2Let {x ∈ R } be the projections of P 3-Dfp p=1,...,P the matrix associated with all the objects can be factorized
P P
3 P n n
2F× d P× dpoints{X ∈P } lying on a rigidly moving object onto i ip i=1 i=1p=1 into matricesM∈R andS∈R as
F frames of a rigidly moving camera. Under the affine pro-
2F×P
W = W ,W ,··· ,W Γ∈R
1 2 njection model, which generalizes orthographic, weak per-
⎡ ⎤
spective, and paraperspective projection, the images satisfy ˆ
S
1
the equation ⎢ ⎥
ˆ
⎢ S ⎥
2
x = A X , (1) ˆ ˆ ˆ ⎢ ⎥fp f p = M ,M ,··· ,M Γ (5)
1 2 n
⎢ . ⎥.
⎡ ⎤
⎣ . ⎦
1000
R tf f
2×4 ˆ
S
⎣ ⎦where A = K 0100 ∈ R is the nf f
0 1
0001 = MS Γ.
affine camera matrix at frame f, which depends on the cam-
2×3 It follows that one possible way of solving the motionera calibration parameters K ∈ R and the object posef
segmentation problem is to find a permutation matrix Γ,relative to the camera (R ,t )∈ SE(3).f f
2F×P such that the matrix WΓ can be decomposed into a mo-LetW ∈R be the matrix whose P columns are the
1
P tion matrix M and a block diagonal structure matrix S.Thisimage point trajectories{x } . It follows from (1) thatfp p=1
idea has been the basis for most existing motion segmenta-
2F×4
W can be decomposed into a motion matrix M ∈ R
1 1
tionalgorithms[1,3,5,8,10,11,19].However,asshownP×4and a structure matrixS ∈R as
1
in [10], in order forW to factor according to (5), the motion
2F nsubspaces {W ⊂ R } must be independent, that is,
W = M S i
1 1 1 i=1
⎡ ⎤ ⎡ ⎤ for all i = j =1,...,n,wemusthavedim(W ∩W)=0,i j
x ···x A
11 1P 1
i
(2) so that rank(W)= d , where d =dim(W ).
⎢ ⎥ ⎢ ⎥ i i i. . . i=1. . = . X ···X ,
⎣ ⎦ ⎣ ⎦
1 P. . .
4×P Unfortunately, most practical motion sequences exhibit
x ···x AF1 FP F partially dependent motions, i.e. there are i,j∈{1,...,n}
2F×P 2F×4
such that 0 < dim(W ∩W) < min{d ,d}. For exam-i j i j
hence rank(W ) ≤ 4. Note also that the rows of each A
1 f ple, when two objects have the same rotational but different
involve linear combinations of the first two rows of the ro- translational motion relative to the camera [14], or for artic-
tation matrix R , hence rank(W ) ≥ rank(A )=2. There-f 1 f ulated motions [20]. This has motivated the development of
fore, under the affine projection model, the 2-D trajecto- several algorithms for dealing with partially dependent mo-
ries of a set of 3-D points seen by a rigidly moving camera tions, including statistical methods [6, 14], spectral meth-
2F(the columns ofW ) live in a subspace ofR of dimension
1 ods [21, 22] and algebraic methods [16]. We review some
d = rank(W )=2, 3 or 4.
1 1 of these methods in the next section.
2.2. Segmentation of Multiple Rigid-Body Motions
3. Multibody Motion Segmentation Algorithms
PAssume now that the P trajectories {x } corre-fp p=1 3.1. Generalized PCA (GPCA) [17, 16]
spond to n objects undergoing n rigid-body motions relative
to a moving camera. The 3-D motion segmentation problem Principal Component Analysis (GPCA) is
is the task of clustering these P trajectories according to an algebraic method for clustering data lying in multiple
the n moving objects. Since the trajectories associated with subspaces proposed by Vidal et al. [17]. The main idea be-
2Feach object span a d -dimensional linear subspace ofR , hind GPCA is that one can fit a union of n subspaces with ai
the 3-D motion segmentation problem is equivalent to clus- set of polynomials of degree n, whose derivatives at a point
2Ftering a set of points into n subspaces ofR of unknown give a vector normal to the subspace containing that point.
dimensions d ∈{2,3,4} for i=1,...,n.i The segmentation of the data is then obtained by grouping
Notice that the data matrix can be written as these normal vectors, which can be done using several tech-
niques. In the context of motion segmentation, GPCA op-
2F×P
W = W ,W ,··· ,W Γ∈R , (3)
1 2 n erates as follows [16]:1. Projection: Project the trajectories onto a subspace of 3.2. Local Subspace Affinity (LSA) [21]
2F
R of dimension 5 t