### Understanding Structure-from-Motion (SfM) - VCC

Dr. Neil Smith, VCC 2014
1. Feature Detection
2. Feature Matching
3. Bundle Adjustment (Camera Pose and Intrinsics)
4. Multi-View Stereo Dense Reconstruction (Common Final Output)
Following slides adapted from Computer Vision courses taught at Cornell (Snavely) and University of Washington (Szeliski).
Richard Szeliski, Computer Vision: Algorithms and Applications, 2010
Feature Detection
What features might we choose to find correspondences?
From Szeliski 2010: fig. 4.2
Feature Detection
Which of the three sample windows would have the most distinguishable features for matching?
From Szeliski 2010: fig. 4.2
Feature Detection
Patch A: Texture-less, nothing to match
Patch B: Gradient change (border between snow and sky). Only the curves can be matched, so there is ambiguity over which feature is which. This is called the Aperture Problem.
From Szeliski 2010: fig. 4.3
Feature detection
Local measure of feature uniqueness
• How does the window change when you shift it?
• Shifting the window in any direction causes a big change
• “flat” region: no change in all directions
• “edge”: no change along the edge direction
• “corner”: significant change in all directions
Slide adapted from Darya Frolova, Denis Simakov, Weizmann Institute.
Feature detection: the math
Consider shifting the window W by (u,v)
• How do the pixels in W change?
• Compare each pixel before and after by summing up the squared differences (SSD)
• Compute an auto-correlation surface
• This defines an SSD “error” E(u,v):
E(u,v) = Σ_(x,y)∈W [ I(x+u, y+v) − I(x,y) ]²
where I is the image, W the window, (x,y) the window position, and (u,v) the displacement vector.
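As a sketch, the SSD error surface above can be evaluated by brute force over a small range of shifts. The function name, window half-width, and shift range below are illustrative assumptions, not from the slides.

```python
# Brute-force SSD auto-correlation surface E(u, v) around one window.
import numpy as np

def ssd_error_surface(I, x0, y0, half=3, max_shift=2):
    """E(u, v) = sum over window W of [I(x+u, y+v) - I(x, y)]^2."""
    E = np.zeros((2 * max_shift + 1, 2 * max_shift + 1))
    ref = I[y0 - half:y0 + half + 1, x0 - half:x0 + half + 1]
    for u in range(-max_shift, max_shift + 1):
        for v in range(-max_shift, max_shift + 1):
            shifted = I[y0 + v - half:y0 + v + half + 1,
                        x0 + u - half:x0 + u + half + 1]
            E[v + max_shift, u + max_shift] = np.sum((shifted - ref) ** 2)
    return E

# A textureless ("flat") region gives E = 0 for every shift,
# which is exactly why it offers nothing to match.
flat = np.ones((20, 20))
E = ssd_error_surface(flat, 10, 10)
print(E[2, 2])  # E(0, 0) is always 0
```

A corner-like patch would instead make E grow for every shift direction, which is the property the detector looks for.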
The second moment matrix
The surface E(u,v) is locally approximated by a quadratic form:
E(u,v) ≈ [u v] H [u v]^T
where H is the second moment matrix built from the image gradients:
H = Σ_(x,y)∈W [ I_x²    I_x·I_y
                I_x·I_y  I_y²  ]
Corner detection: the math
Eigenvalues and eigenvectors of H
• Define shift directions with the smallest and largest change in error
• xmax = direction of largest increase in E
• λmax = amount of increase in direction xmax
• xmin = direction of smallest increase in E
• λmin = amount of increase in direction xmin
Feature detection summary
Here’s what you do:
• Compute the gradient at each point in the image
• Create the H matrix from the entries in the gradient
• Compute the eigenvalues
• Find points with large response (λ⁻ > threshold)
• Choose those points where λ⁻ is a local maximum as features
The Harris operator
λ⁻ is a variant of the “Harris operator” for feature detection:
f = det(H) / trace(H) = (λ1 · λ2) / (λ1 + λ2)
• The trace is the sum of the diagonals, i.e., trace(H) = h11 + h22
• Very similar to λ⁻ but less expensive (no square root)
• Called the “Harris Corner Detector” or “Harris Operator”
• Emphasizes the large gradients in all directions
• Remains orientation invariant
• Lots of other detectors; this is one of the most popular
• Major problem: very sensitive to changes in image scale
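The detection pipeline (gradients, second moment matrix, det/trace response) can be sketched as follows. The function name, the crude 3x3 box window, and the synthetic test image are assumptions for illustration; real detectors use Gaussian windowing and non-maximum suppression.

```python
# Minimal Harris-style response: f = det(H) / trace(H) per pixel.
import numpy as np

def harris_response(I):
    # Image gradients via central differences
    Iy, Ix = np.gradient(I.astype(float))

    def box3(A):
        # Crude 3x3 box sum: add 9 shifted copies of the padded array
        P = np.pad(A, 1)
        return sum(P[r:r + A.shape[0], c:c + A.shape[1]]
                   for r in range(3) for c in range(3))

    # Entries of the second moment matrix H, summed over the window
    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det / (trace + 1e-12)  # f = det(H) / trace(H)

# A white square on black: the response is zero on flat regions and
# edges, and positive only near the square's corners.
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
f = harris_response(img)
print(np.unravel_index(np.argmax(f), f.shape))
```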
Difference of Gaussians (DoG)
• An approximation of the Laplacian of Gaussian (LoG), which has strong positive responses for dark blobs (extrema/features)
• LoG/DoG blob detector: a point is regarded as a bright (dark) blob if its value is greater (smaller) than the values of all its 26 neighbours. Identifies extrema.
• The subtraction of one blurred version of an original image from another, less blurred version of the original
• Convolves the original grayscale image with Gaussian kernels having differing standard deviations
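A DoG image is just the difference of two Gaussian blurs, as a rough sketch shows. The naive separable blur below is written out for illustration (real pipelines use scipy or OpenCV), and the sigma pair 1.0/1.6 follows common SIFT practice rather than anything stated on the slide.

```python
# Difference of Gaussians: subtract a more-blurred image from a less-blurred one.
import numpy as np

def gaussian_kernel(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()  # normalized 1D Gaussian

def gaussian_blur(I, sigma):
    r = int(3 * sigma)
    k = gaussian_kernel(sigma, r)
    P = np.pad(I, r, mode="edge")
    # Separable convolution: filter rows, then columns
    tmp = np.apply_along_axis(lambda m: np.convolve(m, k, mode="valid"), 1, P)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="valid"), 0, tmp)

def dog(I, sigma1=1.0, sigma2=1.6):
    return gaussian_blur(I, sigma1) - gaussian_blur(I, sigma2)

# A single bright dot: the DoG responds positively at the dot,
# mimicking the LoG's blob response.
img = np.zeros((21, 21))
img[10, 10] = 1.0
D = dog(img)
print(D[10, 10] > 0)  # True
```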
Invariance
Suppose we are comparing two images I1 and I2
• I2 may be a transformed version of I1
• What kinds of transformations are we likely to encounter in practice?
We’d like to find the same features regardless of the transformation
• This is called transformational invariance
• Most feature methods are designed to be invariant to
– Translation, 2D rotation, scale
• They can usually also handle
– Limited 3D rotations (SIFT works up to about 60 degrees)
– Limited affine transformations (some are fully affine invariant)
– Limited illumination/contrast changes
How to achieve invariance
Need both of the following:
1. Make sure your detector is invariant
• Harris is invariant to translation and rotation
• Scale is trickier
– Common approach: detect features at many scales using a Gaussian pyramid (e.g., MOPS)
– More sophisticated methods find “the best scale” to represent each feature (e.g., SIFT)
2. Design an invariant feature descriptor
• A descriptor captures the information in a region around the detected feature point
• The simplest descriptor: a square window of pixels
– What’s this invariant to?
• Let’s look at some better approaches…
Scale Invariant Feature Transform
Search for DoG features (Extrema) that remain stable across scales
Downsample and compare
From Lowe 2004
Scale Invariant Feature Transform
Orientation assignment:
• Take a 16x16 square window around the detected feature
• Compute the edge orientation (angle of the gradient - 90°) for each pixel
• Throw out weak edges (threshold gradient magnitude)
• Create a histogram of the surviving edge orientations
[Figure: angle histogram over 0 to 2π]
Adapted from slide by David Lowe
SIFT descriptor
Full version
• Divide the 16x16 window into a 4x4 grid of cells (Lowe’s figure shows the 2x2 case)
• Compute an orientation histogram for each cell
• 16 cells * 8 orientations = 128-dimensional descriptor
Adapted from slide by David Lowe
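The 16 cells * 8 orientations = 128 dimensions bookkeeping can be sketched directly. This toy version omits Lowe's Gaussian weighting, trilinear interpolation, rotation to the dominant orientation, and descriptor normalization; the function name is an assumption for illustration.

```python
# SIFT-style descriptor skeleton: 4x4 grid of cells over a 16x16 patch,
# one 8-bin gradient-orientation histogram per cell -> 128 numbers.
import numpy as np

def sift_like_descriptor(patch):
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.arctan2(gy, gx) % (2 * np.pi)          # orientation in [0, 2pi)
    bins = (ang / (2 * np.pi) * 8).astype(int) % 8  # 8 orientation bins
    desc = []
    for i in range(4):
        for j in range(4):
            cell = slice(4 * i, 4 * i + 4), slice(4 * j, 4 * j + 4)
            # Magnitude-weighted orientation histogram for this 4x4 cell
            hist = np.bincount(bins[cell].ravel(),
                               weights=mag[cell].ravel(), minlength=8)
            desc.append(hist)
    return np.concatenate(desc)  # 16 cells * 8 orientations = 128 dims

d = sift_like_descriptor(np.random.rand(16, 16))
print(d.shape)  # (128,)
```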
Properties of SIFT
Extraordinarily robust matching technique
• Can handle changes in viewpoint
– Up to about 60 degree out of plane rotation
• Can handle significant changes in illumination
– Sometimes even day vs. night (below)
• Fast and efficient; can run in real time
• Lots of code available
Feature descriptors
We know how to detect good points
Next question: How to match them?
Lots of possibilities (this is a popular research area)
• Simple option: match square windows around the point
• State of the art approach: SIFT
– David Lowe, UBC http://www.cs.ubc.ca/~lowe/keypoints/
Image Matching
Feature matching
Given a feature in I1, how do we find the best match in I2?
Need a matching strategy:
1. Use feature description vectors to measure the distance between descriptors
2. Set a threshold (maximum distance) to eliminate bad matches, but not so low that you miss true matches
Feature distance
How to define the difference between two features f1, f2?
• Simple approach: L2 distance, ||f1 - f2||
• Can give good scores to ambiguous (incorrect) matches
[Figure: patch f1 in image I1 and patch f2 in image I2]
Feature distance
How to define the difference between two features f1, f2?
• Better approach: ratio distance = ||f1 - f2|| / ||f1 - f2’||
– f2 is the best SSD match to f1 in I2
– f2’ is the 2nd-best SSD match to f1 in I2
– gives large values for ambiguous matches
[Figure: f1 in I1 with candidate matches f2 and f2’ in I2]
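The ratio test can be sketched in a few lines. The 0.8 acceptance threshold follows common practice (Lowe suggested values in this range); the function name and toy descriptors are illustrative assumptions.

```python
# Nearest-neighbor matching with the ratio test: keep a match only if the
# best distance is clearly smaller than the second-best distance.
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    matches = []
    for i, f1 in enumerate(desc1):
        d = np.linalg.norm(desc2 - f1, axis=1)  # L2 distance to every f2
        j1, j2 = np.argsort(d)[:2]              # best and second-best match
        if d[j1] < ratio * d[j2]:               # unambiguous -> keep
            matches.append((i, int(j1)))
    return matches

# Distinct descriptors match; a descriptor whose two best candidates are
# nearly equidistant would be rejected as ambiguous.
desc1 = np.array([[0.0, 0.0], [5.0, 5.0]])
desc2 = np.array([[0.1, 0.0], [5.0, 5.1], [9.0, 9.0]])
print(ratio_test_matches(desc1, desc2))  # [(0, 0), (1, 1)]
```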
Evaluating the results
How can we measure the performance of a feature matcher?
[Figure: candidate matches at feature distances 50, 75, and 200]
True/false positives
[Figure: the matches at distances 50 and 75 are true matches; the match at 200 is a false match]
Problem: High threshold = false positives (incorrect matches);
low threshold = false negatives (missed matches)
Solution: Model the performance of matching at varying thresholds using a receiver operating characteristic (ROC) curve
Evaluating the results
How can we measure the performance of a feature matcher?
[Figure: ROC curve; the example curve reaches a true positive rate of 0.7 at a false positive rate of 0.1]
• true positive rate (“recall”) = # true positives / # correctly matched features (positives)
• false positive rate (1 - “precision”) = # false positives / # incorrectly matched features (negatives)
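Sweeping the match-distance threshold traces out the ROC curve: each threshold gives one (false positive rate, true positive rate) point. The function name, thresholds, and the made-up distances/labels below are illustration data, not from the slides.

```python
# One ROC point per threshold: accept matches with distance below t,
# then count true and false positives.
import numpy as np

def roc_points(distances, is_true_match, thresholds):
    distances = np.asarray(distances)
    labels = np.asarray(is_true_match)
    pts = []
    for t in thresholds:
        accepted = distances < t
        tpr = np.sum(accepted & labels) / np.sum(labels)    # "recall"
        fpr = np.sum(accepted & ~labels) / np.sum(~labels)
        pts.append((fpr, tpr))
    return pts

# Toy data: three true matches (50, 75, 90) and two false ones (200, 300).
dist = [50, 75, 200, 90, 300]
true = np.array([True, True, False, True, False])
pts = roc_points(dist, true, thresholds=[60, 100, 400])
for fpr, tpr in pts:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

A loose threshold (400) accepts everything, giving TPR = FPR = 1; the curve's useful region is where TPR is high while FPR stays low.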
Feature distance
State-of-the-art techniques:
1. Match nearest-neighbor features with the
2. Nearest neighbor distance ratio
– compare the NN distance to the second NN
3. Verify matches using random sampling, i.e. RANSAC
Feature distance
From Szeliski 2010: fig. 4.25
[Figure: Structure from Motion setup: 3D points X1–X7 observed as projections p1,1, p1,2, p1,3 in Cameras 1–3 with poses (R1,t1), (R2,t2), (R3,t3); minimize g(R, T, X) via non-linear least squares]
• What makes this non-linear minimization hard?
– many more parameters: potentially slow
– poorer conditioning (high correlation)
– potentially lots of outliers
– gauge (coordinate) freedom
CSE 576, Spring 2008
Structure from Motion
• Minimize the sum of squared reprojection errors:
g(R, T, X) = Σ_i Σ_j w_ij · || P(x_i, R_j, t_j) − p_ij ||²
where P(x_i, R_j, t_j) is the predicted image location, p_ij the observed image location, and w_ij an indicator variable: is point i visible in image j?
• Minimizing this function is called bundle adjustment
– Optimized using non-linear least squares, e.g. Levenberg-Marquardt
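The bundle-adjustment objective can be sketched with a toy pinhole projection standing in for the real camera model; the function names and the simplified projection (no intrinsics beyond a focal length) are assumptions for illustration, and a real solver would feed these residuals to Levenberg-Marquardt rather than just summing them.

```python
# Sum of squared reprojection errors over points i and images j,
# weighted by a visibility indicator w_ij.
import numpy as np

def project(X, R, t, f=1.0):
    """Toy pinhole camera: rotate, translate, then perspective-divide."""
    Xc = R @ X + t
    return f * Xc[:2] / Xc[2]

def reprojection_error(points, cameras, observations, visible):
    """g(R, T, X) = sum_ij w_ij * || P(x_i, R_j, t_j) - p_ij ||^2"""
    E = 0.0
    for i, X in enumerate(points):
        for j, (R, t) in enumerate(cameras):
            if visible[i][j]:  # w_ij: is point i visible in image j?
                E += np.sum((project(X, R, t) - observations[i][j]) ** 2)
    return E

# One point in front of one camera at the origin: the point (0, 0, 2)
# projects to (0, 0), so observing exactly (0, 0) gives zero error.
points = [np.array([0.0, 0.0, 2.0])]
cameras = [(np.eye(3), np.zeros(3))]
obs = [[np.array([0.0, 0.0])]]
print(reprojection_error(points, cameras, obs, visible=[[True]]))  # 0.0
```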
Multi-view Stereo
Input: calibrated images and sparse points from SfM
Output: Dense Point Cloud or Mesh
Figures by Carlos Hernandez
Stereo: basic idea
[Figure: matching error plotted as a function of depth]
Multi-view Stereo
Different approaches:
Figures from Furukawa and Ponce 2007
A. Patch-based: PMVS
B. Merge stereo depth maps (SURE)
– Advantages: can work with the full-resolution image; not dependent on the original sparse point cloud
– Disadvantage: fails where good alignment cannot be achieved
Merging Depth Maps: Temple Model
[Figure: input image, ground truth model, and reconstructions from 317 images (hemisphere), 47 images, and 16 images (ring)]
Goesele, Curless, Seitz, 2006
Multi-view Stereo
Figures from Furukawa and Ponce 2007
Patch-based method: expand from the sparse points, taking a patch of pixels (a 3D rectangle with a normal), then filter.
From a reference image R, compute a score of photometric discrepancy… we want only small g(p).
The SfM Project
1. Capture a part of KAUST campus using provided cameras
2. Reconstruct sparse and dense point clouds using VisualSfM and PMVS
3. Report on:
– Timings
– Images used in reconstruction of the model/models
– Things you could do to improve results or timings
– What was reconstructed well, and what was not
4. Extra credit:
– Compare results with a Kinect scan
– Calibrate your camera using the Matlab toolbox and calibration chart, then rerun SfM and compare results
– Try a wide-angle 8mm lens instead of a standard lens/GoPro/video
– Try a different MVS method (SURE or CMPMVS)
– Try creating a 360° panorama
SfM in Practice
Proper acquisition plays a major role in the success of SfM in practice:
1. Sharp, in-focus images
2. Images that are not too dark or shadowed
3. Need features in the image (white walls are bad)
4. Fixed focal length: use motion rather than zoom
5. 60–80% overlap between images
6. A wide field of view helps
7. A good initial pair
8. No sharp changes in angle
9. Increase capture density in difficult areas
10. Close the loop
VisualSfM
What is VisualSfM?
– An open-source SfM GUI developed by Changchang Wu to run:
1. SiftGPU – a GPU-accelerated SIFT detector and matcher
– >10X faster
– runs inside OpenGL/CUDA shaders (GPU memory)
– processes Gaussian pyramids and DoG in parallel
2. Multicore Bundler – a GPU-accelerated bundler
– 30X faster, one thread per camera
3. Many advanced and experimental features
Installing/Using VisualSfM
1. Transfer Package to computer
2. Run from Command Line
3. Use sample pictures for running reconstruction