Math3346 -- 2009: Course Schedule

Tools and Issues for Data Miners --

Classification, Visualization, and Generalization

Lectures are Thurs 14:00 - 16:00 in Copland G30, and Fri 9:00 - 10:00 in JD G30

Laboratories are Tuesdays 15:00 - 16:00 in JD LG104

Codes: f=Felix; j=John; g=Graham; m=Mayukh; a=Alan; s=Stephen; tba=to be announced

Week 01 20-24 July 2009

  Introduction

: Lect 01f - Course Overview

: Lect 02f - Introduction to R

: Lect 03f - Introduction to R

Laboratory 01j - R Basics (Lab exercises 1)

Week 02 27–31 July 2009

  Statistical Basics for Data Mining

     : Lect 03f - Introduction to R

     : Lect 04f - Introduction to R   

     : Lect 05f - Distributions and Sampling Distributions

Laboratory 02f - R Basics (Lab exercises 1)

     Assignment 1j 20 marks

Week 03 03-07 August 2009

  Classification and other models - Models and Model Accuracy assessment

     : Lect 06j: Aug6 - Population & sample; source & target population, etc.

     : Lect 07j: Aug6 - Linear and Other Models; model formulae;

     : Lect 08j: Aug7 - Classification models - multi-way tables;

Laboratory 03j: Aug4 - Practice with R (Lab exercises 2)

Week 04 10–14 August 2009

  Statistics and Data Mining

     : Lect 09j: Aug13 - Training/test, cross-validation, bootstrap I

     : Lect 10j: Aug13 - Training/test, cross-validation, bootstrap II

     : Lect 11j: Aug14 - Generalizing from models; measurement of accuracy

Laboratory 04j: Aug11 - Informal & Formal Data Exploration (Lab exercises 3)

Week 05 17-21 August 2009

: Lect 12j: Aug20 - Source/target differences; reject inference, etc.

: Lect 13j: Aug20 - Linear versus non-linear models

: Lect 14j: Aug21 - Variable selection effects

Laboratory 05j: Aug18 – Populations & Samples (Lab exercises 6, cf also 7 & 8)

Week 06 24-28 August 2009

: Lect 15m: Aug27 - Use and Interpretation of regression coefficients

      : Lect 16m: Aug27 – Errors in variables

: Lect 17m: Aug28 – Discriminant Methods & Associated Ordinations

Laboratory 06j: Aug25 - Linear Discriminant Analysis vs Random Forests (Lab exs 12)

      Assignment 2j 20 marks

 

Week 07 31 August - 04 September 2009

      : Lect 18j: Sept3 – Ordination methods – non-parametric

      : Lect 19j: Sept3 – Review of Lectures to date

Data Mining Techniques

      : Lect 20g: Sept4 - Data mining issues + tools

Laboratory 07j: Sept1 - Discriminant Methods & Associated Ordinations (Lab exs 13)

Week 08 07-11 September 2009

: Lect 21g: Sept10 - Clustering

: Lect 22g: Sept10 - Association Rules

: Lect 23g: Sept11 – Decision Trees + Deployment

Laboratory 08j: Sept8 – Trees, SVM & random forest discriminants (Lab exs 14)

Week 09 14-18 September 2009

: Lect 24g: Sept17 – Boosting and Random Forests

: Lect 25g: Sept17 - Neural Nets and Support Vector Machines

Special Topics

      : Lect 26j: Sept18 – Worked example

Laboratory 09g: Sept15 – Rattle

      Assignment 3g 15 marks

Week 10 21-25 September 2009

Practical data analysis

: Lect 27tba: Sept24 - Commentary on Hastie, Tibshirani & Friedman's

: Lect 28tba: Sept24 - "Elements of Statistical Learning"

: Lect 29tba: Sept25 – Support vector machines

Laboratory 10j: Sept22 - Data summary - traps for the unwary (Lab exercises 5)

TERM BREAK

Week 11 12-16 October 2009

Practical data analysis

: Lect 27j: Oct15 – Overview

: Lect 28j: Oct15 - Worked example?

: Lect 29j: Oct16 - Worked example?

Laboratory 11j: Oct13 - Data analysis - a 'large' data set (Lab exs 16)

Week 12 19-23 October 2009

      : Lect 30j: Oct22 – Course review

      : Lect 31j: Oct22 – Wrap up and Survey and Feedback

: Lect 32j: Oct23 – To Be Announced

Week 13 26-30 October 2009

  Student Presentations:       Oct29 & 30 - 30 marks

               Commentary on Presentations: 10 marks