6.899 Learning and Inference in Vision


 
 

MIT 6.899   
Learning and Inference in Vision 

  • Prof. Bill Freeman, wtf@mit.edu
  • MW 2:30 – 4:00
  • Room: 34-301
  • Course web page: http://www.ai.mit.edu/courses/6.899/
 
 

Reading class 

  • We’ll cover about 1 paper each class.
  • Seminal or topical research papers in the intersection of machine learning and vision.
  • One student will present each paper. Then we’ll discuss the paper as a class.
  • One student will write a computer example illustrating the paper’s main idea.
 
 

Learning and Inference 

  • “Learning”:  learn the parameter values or structure of a probabilistic model.
    • E.g., look at many examples of people walking, and build up a probabilistic model relating video images to 3-d motions.
  • “Inference”:  infer hidden variables, given observations.
    • E.g., given a particular video of someone walking, infer their motions in 3-d.
 
 

Statistical dependencies between variables 

Learning and Inference 

Observed variables:  y1, y2 

Unobserved variables:  x1, x2 

“Learning”:  learn this model, and the form of the statistical dependencies. 

“Inference”:  given this model, and the observations, y1 & y2, infer x1 & x2, or their conditional distribution.
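
To make the inference step concrete, here is a minimal sketch in Python. It assumes a hypothetical toy model that is not from the slides: binary hidden variables x1 and x2 that tend to agree, each observed through a noisy binary copy (y1, y2). Every probability table below is made up for illustration. The posterior over (x1, x2) is computed by brute-force enumeration, which only works for tiny models like this one.

```python
from itertools import product

# Hypothetical toy model; every number here is an illustrative assumption.
p_x1 = [0.5, 0.5]                           # p(x1)
p_x2_given_x1 = [[0.9, 0.1], [0.1, 0.9]]    # p(x2 | x1): neighbours tend to agree
p_y_given_x = [[0.8, 0.2], [0.2, 0.8]]      # p(y | x): noisy copy of the hidden state


def joint(x1, x2, y1, y2):
    """p(x1, x2, y1, y2) under the assumed factorization."""
    return (p_x1[x1] * p_x2_given_x1[x1][x2]
            * p_y_given_x[x1][y1] * p_y_given_x[x2][y2])


def posterior(y1, y2):
    """Return p(x1, x2 | y1, y2) as a dict, by enumerating all hidden states."""
    unnorm = {(x1, x2): joint(x1, x2, y1, y2)
              for x1, x2 in product((0, 1), repeat=2)}
    z = sum(unnorm.values())                # normalizer, equal to p(y1, y2)
    return {state: value / z for state, value in unnorm.items()}


post = posterior(1, 1)                      # conditional distribution of x1, x2
map_estimate = max(post, key=post.get)      # single most probable hidden state
```

"Learning" in this setting would mean estimating the three tables from example data rather than writing them down by hand.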


 
 

Cartoon history of speech recognition research 

  • 1960’s, 1970’s, 1980’s:  lots of different approaches;  “hey, let’s try this”.
  • 1980’s:  Hidden Markov Models (HMM);  the statistical approach took off.
  • 1990’s and beyond:  HMM’s now the dominant approach.  “The person with the best training set wins”.
 
 

Same story for document understanding 

  • The person with the best training set wins.
 
 

Computer vision is ready to make that transition 

  • Machine learning approaches are becoming dominant.
  • We get to make and watch the transition to a principled, statistical approach happen.
  • It’s not trivial:  issues of representation, robustness, generalization, speed, …
 
 

Categories of the papers 

  1. Learning image representations
  2. Learning manifolds
  3. Linear and bilinear models
  4. Learning low-level vision
  5. Graphical models, belief propagation
  6. Particle filters and tracking
  7. Face and object recognition
  8. Learning models of object appearance
 

 


 
 

1 Learning image representations 

Example training image 

From http://www.amsci.org/amsci/articles/00articles/olshausencap1.html


 
 

1 Learning image representations 

From: http://www.cns.nyu.edu/pub/eero/simoncelli01-reprint.pdf


 
 

2 Learning manifolds 

From:  http://www.sciencemag.org/cgi/content/full/290/5500/2319 

Joshua B. Tenenbaum, Vin de Silva, John C. Langford

 


 
 

3 Linear and bilinear models 

From: http://www-psych.stanford.edu/~jbt/NC120601.pdf


 
 

4 Learning low-level vision 

From Y. Weiss, http://www.cs.berkeley.edu/~yweiss/iccv01.ps.gz 

Images, under different lighting 

reflectance 

illumination


 
 

5 Graphical models, belief propagation 

From: http://www.cs.berkeley.edu/~yweiss/nips96.pdf


 
 

6 Particle filters and tracking 

From: http://www.robots.ox.ac.uk/~ab/abstracts/eccv96.isard.html


 
 

7 Face and object recognition 

From Viola and Jones, http://www.ai.mit.edu/people/viola/research/publications/ICCV01-Viola-Jones.ps.gz


 
 

7 Face and object recognition 

From: Pinar Duygulu, Kobus Barnard, Nando deFreitas, and David Forsyth,

 


 
 

8 Learning models of object appearance 
  

Weber, Welling, and Perona, http://www.gatsby.ucl.ac.uk/~welling/papers/ECCV00_fin.ps.gz 

Images containing the object 

Images not containing the object


 
 

8 Learning models of object appearance 
  

Test images 

Weber, Welling, and Perona, http://www.gatsby.ucl.ac.uk/~welling/papers/ECCV00_fin.ps.gz 

Contains the object? 

Contains the object?


 
 

Guest lecturers/discussants 

  • Andrew Blake (Condensation, Oxford/Microsoft)
  • Baback Moghaddam (Bayesian face recognition, MERL)
  • Paul Viola (Fast face recognition, MERL)
 
 

Class requirements 

  1. Read each paper.  Think about it.  Discuss it in class.
  2. Present one paper to the class.
  3. Present one computer example to the class.
  4. Final project:  write a conference paper related to vision and learning.
 
 

1. Read the papers, discuss them 

  • Write down 3 insights about the paper that you might want to share with the class in discussion.
  • Turn them in on a sheet of paper.
 
 

2. Presentations about a paper 

  • About 15 minutes long.  Set the stage for discussions.
  • Review the paper.  Summarize its contributions.   Give relevant background.  Discuss how it relates to other papers we’ve read.
  • Meet with me two days before to go over your presentation about the paper.
 
 

3. Programming example 

  • Present a computer implementation of a toy example that illustrates the main idea of the paper.
  • Show trade-offs in parameter settings, or in training sets.
  • Goal:  help us build up intuition about these techniques.
  • Ok to use on-line code.  Then focus on creating informative toy training sets.

 


 
 

Toy problems 

  • Simple summaries of the main idea.
  • Identify an informative idea from the paper.
  • Make a simple example using it.
  • Play with it.
 
 

Toy problem 

“If you can make a system to solve this, I’ll give you a PhD” 

by Ted Adelson


 
 

Particle filter for inferring human motion in 3-d 

From:  Hedvig Sidenbladh’s thesis, http://www.nada.kth.se/~hedvig/publications/thesis.pdf


 
 

Particle filter toy example 

From:  Hedvig Sidenbladh’s thesis, http://www.nada.kth.se/~hedvig/publications/thesis.pdf
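
As a rough idea of what a particle filter toy example could look like in code, here is a minimal sketch of a 1-d bootstrap particle filter in Python. It is not taken from the thesis cited above. The random-walk state model, the Gaussian noise levels, the particle count, and the name `particle_filter_1d` are all assumptions chosen for illustration.

```python
import math
import random


def particle_filter_1d(observations, n_particles=500,
                       process_std=0.1, obs_std=0.5, rng=None):
    """Bootstrap particle filter for an assumed 1-d random-walk state."""
    rng = rng or random.Random(0)
    # Initial belief: particles drawn from a broad prior around zero.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Predict: push every particle through the random-walk dynamics.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Weight: Gaussian likelihood of the observation given each particle.
        weights = [math.exp(-0.5 * ((y - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        # Estimate: weighted posterior mean of the state.
        estimates.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Resample: draw particles with probability proportional to weight.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Simulate a hidden random walk and noisy observations of it.
rng = random.Random(42)
true_states, observations = [], []
x = 0.0
for _ in range(50):
    x += rng.gauss(0.0, 0.1)
    true_states.append(x)
    observations.append(x + rng.gauss(0.0, 0.5))

estimates = particle_filter_1d(observations, rng=rng)
mean_abs_error = sum(abs(e - t) for e, t in zip(estimates, true_states)) / len(true_states)
```

Varying `n_particles` or mismatching `obs_std` against the simulated noise is one way to explore the parameter trade-offs.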


 
 

What we’ll have at the end of the class 

Code examples:

  • Non-negative matrix factorization example
  • 1-d particle filtering example
  • Boosting for face recognition
  • Example of belief propagation for scene understanding
  • Manifold learning comparisons
  • …


 
 

4. Final project:  write a conference paper 

  • Submitting papers to conferences, you get just one shot, so it’s important to learn how to make good submissions.
  • We’ll discuss many papers, and what’s good and bad about them, during the class.
  • I’ll give a lecture on “how to write a good conference paper”.
  • Subject of the paper can be:
    • A project from your own research.
    • A project you undertake for the class.
      • Your idea
      • One I suggest to you
 
 

Feedback options 

  • At the end of the course:  “it would have been better if we had done this…”
    • Somewhat helpful
 
  • During the course:  “I find this useful;  I don’t find that useful…”
    • Very helpful
 
 

What background do you need? 
 
 

  • Be able to read and understand the papers
    • Linear algebra
    • Familiarity with estimation theory
    • Image filtering
  • Background in machine learning and computer vision.
 
 

Auditing versus credit 

  • If you’re a student and want to take the class, sign up for credit.
    • You’ll stay more engaged.
    • Makes it more probable that I can offer the class again.
  • But if you do audit: 
    • Please don’t come to class if you haven’t read the paper.
    • I may ask you to present to the class, anyway.
 
 

First paper 

  • Monday, Feb. 11.
  • Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Olshausen BA, Field DJ (1996) Nature, 381: 607-609
  • Presenter:  Bill Freeman
  • Computational demonstration:  need volunteer (software is available:  http://redwood.ucdavis.edu/bruno/sparsenet.html)

 


 
 

Second paper 

  • Wednesday, Feb. 13.
  • Learning the parts of objects by non-negative matrix factorization, D. D. Lee and H. S. Seung, Nature 401, 788-791 (1999), and commentary by Mel.
  • Presenter:  need volunteer
  • Computational demonstration:  need volunteer
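
For anyone considering the computational demonstration, here is a minimal sketch of non-negative matrix factorization in Python. It uses the squared-error multiplicative update rule commonly associated with Lee and Seung. The objective used in the Nature paper itself may differ, so treat this as one possible starting point rather than the paper's method. The tiny matrix `v`, the rank, and the helper names are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frob_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
        errors.append(frob_error(v, w, h))
    return w, h, errors


# Toy data: each column mixes two "parts" (rows 0-1 and rows 2-3).
v = [[1, 0, 2, 1],
     [1, 0, 2, 1],
     [0, 1, 1, 2],
     [0, 1, 1, 2]]
w, h, errors = nmf(v, rank=2)
```

Because the updates only multiply by non-negative ratios, `w` and `h` stay non-negative. Inspecting the columns of `w` shows which parts were learned.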