Data Mining -- Introductory Exercises

Basic Ideas and Tools for Data Mining

Data mining -- ideas and tools

These notes offer a perspective on the nature of data mining and on the tools it uses. They are in a relatively unpolished draft form.

Laboratory Notes and R Scripts

Laboratory Exercises     R Scripts

Background Reading

Ian Ayres 2007, Super Crunchers. Why Thinking-By-Numbers is the New Way to be Smart. Bantam. [This places data mining in a wider context of data-based decision-making in business, government and consumer affairs. While popular in style and short on analysis detail, it offers a useful overview of ways in which applications of data mining and related analytical techniques are developing and changing, in part because of the new opportunities and challenges of the internet.]

Thomas H. Davenport and Jeanne G. Harris 2007, Competing on Analytics: The New Science of Winning. Harvard Business School Press. [Analytics is a buzzword for the application of data-mining-type approaches in commerce. Davenport and Harris give a useful overview of issues for the deployment of analytical techniques within organizations: benefits and traps, choice of amenable tasks, the role of management, skill-base issues, etc.]

John Maindonald and John Braun 2007, Data Analysis and Graphics Using R: An Example-Based Approach, 2nd edn. Cambridge University Press. [Of greatest relevance to the course are Chapter 2 on Styles of Data Analysis, Chapters 5 & 6 (through to 6.3) on Linear Models, Chapter 8 (through to 8.3) on Logistic Regression, Chapter 11 on Tree-based Methods, and Chapter 12 (through to 12.2) on Multivariate Data Exploration & Discrimination.]

Links

Course materials for ANU Data Mining course in 2005-2008    Updated and more complete set of lab exercises
Suggestions for getting started on R     New York Times article on R
Information Management gives a plug for Math3346    See here also
Graham Williams' data mining web page (note in particular rattle, a graphical user interface to a data mining toolkit)
John Maindonald's data mining talks and papers