Statistical Learning

Spring semester, 2007

This is the home page of the "Statistical Learning" course, part of the
PhD program in Electrical and Computer Engineering of the
Department of Electrical and Computer Engineering,
and also part (under the name "Teoria da Aprendizagem") of the
PhD program in Information Systems and Computer Engineering of the
Department of Information Systems and Computer Engineering,

 Instituto Superior Técnico

Professor: Mário Figueiredo



2007 Schedule: Tuesday and Friday, 10:30-12:00. Room: LT-1


 

Summaries:

Week 1
Review of probability theory.

Week 2
Introduction to statistical decision theory; the frequentist approach and
the Bayesian approach. Admissibility and minimax rules. Optimal Bayes
decisions.
Bayesian decision rules for classification and estimation
(namely, maximum a posteriori, posterior mean, and posterior median).

Handout: lecture notes on Bayesian decision theory (namely estimation
and classification) can be downloaded from here.
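
As a small illustration of the three Bayesian estimators named in the Week 2 summary, the sketch below computes the maximum a posteriori estimate, the posterior mean, and the posterior median for a discrete posterior. The parameter values and probabilities are made up for the example and are not taken from the course notes; the pairing of each estimator with a loss function (0/1, squared error, absolute error) is the standard one.

```python
# Hypothetical discrete posterior over a scalar parameter theta.
# The values and probabilities are invented for illustration only.
thetas = [0.0, 1.0, 2.0, 3.0, 4.0]
posterior = [0.10, 0.35, 0.25, 0.20, 0.10]


def map_estimate(values, probs):
    """Posterior mode: the Bayes rule under 0/1 loss."""
    return max(zip(values, probs), key=lambda vp: vp[1])[0]


def posterior_mean(values, probs):
    """Posterior mean: the Bayes rule under squared-error loss."""
    return sum(v * p for v, p in zip(values, probs))


def posterior_median(values, probs):
    """Posterior median: the Bayes rule under absolute-error loss.

    Returns the smallest value whose cumulative posterior
    probability reaches 1/2.
    """
    cumulative = 0.0
    for v, p in sorted(zip(values, probs)):
        cumulative += p
        if cumulative >= 0.5:
            return v
    return max(values)


theta_map = map_estimate(thetas, posterior)
theta_mean = posterior_mean(thetas, posterior)
theta_median = posterior_median(thetas, posterior)
print(theta_map, theta_mean, theta_median)
```

For this skewed posterior the three rules give three different answers, which is the point of treating the loss function as part of the problem statement.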

Week 3
Topics in Bayesian inference: conjugate priors, non-informative priors,
mixtures of conjugate priors.
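
A minimal sketch of conjugacy, assuming the textbook Beta prior with a Bernoulli likelihood (the course may well use other examples): the posterior stays in the Beta family, and the update only adds the observed counts to the prior parameters. The function names and the data are hypothetical.

```python
def beta_bernoulli_update(alpha, beta, observations):
    """Return the Beta posterior parameters after observing 0/1 outcomes.

    With a Beta(alpha, beta) prior and a Bernoulli likelihood, the
    posterior is Beta(alpha + #successes, beta + #failures).
    """
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_mode(alpha, beta):
    """Mode of a Beta(alpha, beta) distribution (requires alpha, beta > 1)."""
    return (alpha - 1) / (alpha + beta - 2)


# Made-up data: 6 successes and 2 failures, with a Beta(2, 2) prior.
data = [1, 0, 1, 1, 0, 1, 1, 1]
post_alpha, post_beta = beta_bernoulli_update(2, 2, data)
print(post_alpha, post_beta)              # posterior parameters
print(beta_mean(post_alpha, post_beta))   # posterior mean estimate
print(beta_mode(post_alpha, post_beta))   # MAP estimate
```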

Week 4
More topics in Bayesian inference: compound decision rules (marginal
and joint rules). Sufficient statistics and the sufficiency principle.
Exponential families.

Week 5
Linear regression. The least squares and maximum likelihood criteria.
Properties of least squares linear regression (best linear unbiased
estimator: the Gauss-Markov theorem). Ridge regression and its
spectral view.
Dual variables in linear regression.

Handout: lecture notes on linear regression can be downloaded from here.
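
To make the role of dual variables in the Week 5 summary concrete, the sketch below solves a tiny ridge regression problem twice: in the primal form, w = (X'X + lambda I)^(-1) X'y, and in the dual form, w = X' alpha with alpha = (XX' + lambda I)^(-1) y. The data, the value of lambda, and the helper names are assumptions made for the example; the pure-Python linear solver only keeps the example free of dependencies and is not meant as a recommended implementation.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def add_ridge(A, lam):
    """Return A + lam * I for a square matrix A."""
    return [[a + (lam if i == j else 0.0) for j, a in enumerate(row)]
            for i, row in enumerate(A)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(a) for a in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ridge_primal(X, y, lam):
    """Primal solution: a d-by-d system, d = number of features."""
    Xt = transpose(X)
    return solve(add_ridge(matmul(Xt, X), lam), matvec(Xt, y))


def ridge_dual(X, y, lam):
    """Dual solution: an n-by-n system, n = number of samples."""
    Xt = transpose(X)
    alpha = solve(add_ridge(matmul(X, Xt), lam), y)  # dual variables
    return matvec(Xt, alpha), alpha


# Made-up data: four samples, an intercept column and one feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 2.0, 4.0]
w_primal = ridge_primal(X, y, lam=1.0)
w_dual, alpha = ridge_dual(X, y, lam=1.0)
print(w_primal, w_dual)
```

The dual form touches the data only through the matrix of inner products XX', which is what later allows those inner products to be replaced by a kernel.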

 



Program:

 

1. Review of probability theory and statistics.

 

2. Introduction to Bayes Decision Theory.

   Likelihood function and a priori probability;

   loss functions, expected risks, optimal decisions;

   conjugate priors;

   sufficient statistics;

   exponential families;

   non-informative priors (Jeffreys);

   hierarchical modelling;

   inference with missing data (EM algorithm).

 

4. Linear Regression.

   Criteria (minimum mean squared error, maximum likelihood);

   characterization (Gauss-Markov theorem);

   ridge and LASSO regression (criteria and algorithms);

   degrees of freedom and variable selection.

 

5. Linear Classification.

   Logistic regression (generative interpretation and algorithms);

   Fisher discriminants;

   support vector machines;

   large margin methods.

 

6. Non-Linear Regression and Classification.

   Basis expansions (splines, polynomials, RBF);

   kernels and RKHS;

   classification and regression trees;

   additive models and boosting.

 

7. Unsupervised Learning.

   Clustering algorithms;

   finite and infinite mixtures;

   other problems (density estimation, PCA, MDS, ICA).

 

8. Introduction to Learning Theory and Model Selection.

   Expected and empirical risks;

   cross-validation;

   empirical/structural risk minimization;

   generalization bounds;

   Hoeffding's inequality;

   uniform convergence and consistency;

   Vapnik-Chervonenkis theory;

   capacity measures (VC dimension, covering numbers, Rademacher complexity).
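
As a worked illustration of Hoeffding's inequality from the list above, the sketch below evaluates the two-sided bound P(|empirical mean - true mean| >= eps) <= 2 exp(-2 n eps^2), which holds for n independent random variables taking values in [0, 1], and inverts it to obtain a sufficient sample size for a given accuracy eps and failure probability delta. The function names and the numerical values are chosen only for the example.

```python
import math


def hoeffding_bound(n, eps):
    """Upper bound on P(|empirical mean - true mean| >= eps).

    Valid for n independent random variables with values in [0, 1].
    """
    return 2.0 * math.exp(-2.0 * n * eps ** 2)


def hoeffding_sample_size(eps, delta):
    """Smallest n for which the Hoeffding bound is at most delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


# Example: accuracy eps = 0.1 with failure probability at most 5%.
n_needed = hoeffding_sample_size(eps=0.1, delta=0.05)
print(n_needed, hoeffding_bound(n_needed, 0.1))
```

Applied to the 0/1 loss of a single fixed classifier, this bounds the gap between empirical and expected risk; handling a whole class of classifiers at once is where uniform convergence and capacity measures come in.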



 

Bibliography 

The Elements of Statistical Learning
T. Hastie, R. Tibshirani, and J. Friedman
Springer-Verlag, 2001.

All of Statistics
Larry Wasserman
Springer, 2004.

Learning with Kernels
B. Schölkopf and A. Smola
MIT Press, 2002.

Pattern Recognition and Machine Learning
Christopher Bishop
Springer, 2006.

Kernel Methods for Pattern Analysis
John Shawe-Taylor and Nello Cristianini
Cambridge University Press, 2004.

Several handouts will be made available on this web page during the semester or distributed in class.