Instructor
Prof. David Banks
David Banks is the director of the Statistical and Applied Mathematical Sciences Institute and a fellow of the ASA and IMS. He is a past editor of the Journal of the American Statistical Association and founding editor of Statistics and Public Policy. His research areas include agent-based models, adversarial risk analysis, dynamic networks, text data, and human rights statistics.
Course description
This course covers nonparametric regression, including the additive model and its generalizations, as well as the LASSO and LARS. It then proceeds to classification (SVMs, random forests, boosting). The Curse of Dimensionality is described in both contexts. Bagging and stacking are covered, along with some cluster analysis and text analytics. The emphasis is on the strengths and weaknesses of the tools, with guidance on when a particular method should be used.
Target audience
This course is aimed at those who are familiar with multiple linear regression. I expect that someone who earned an MS in statistics ten years ago would be a good fit. People with stronger backgrounds will still benefit and will probably acquire deeper insights.
Syllabus
- Machine learning overview
- Key ideas in nonparametric regression
- Methods for model assessment and quantified prediction
- Nonparametric regression methods
- Advances in variable selection methodology
- Classification techniques: SVMs and random forests
- The power of ensembles
- Cluster analysis
- Text data
Required software
None, but students should practice what they learn with R.