Data mining -- ideas and tools
These notes offer a perspective on the nature of data mining
and on the tools it uses. They are in relatively unpolished
draft form.
Ian Ayres 2007, Super Crunchers. Why Thinking-By-Numbers is the New
Way to be Smart. Bantam.
[This places data mining in a wider context of data-based
decision-making in business, government and consumer affairs. While
popular in style and short on analysis detail, it offers a useful
overview of ways in which applications of data mining and related
analytical techniques are developing and changing, in part because of
the new opportunities and challenges of the internet.]
Berk, Richard A 2008. Statistical Learning from a Regression
Perspective. Springer. ["... none of the techniques has ever
lived up to their most optimistic billing. Widespread misuse has
further increased the gap between promised ... and actual
performance. ... therefore the tone will be cautious, some might
even say dark". More positively, Berk argues that there are new ideas
and insights, as well as fresh perspectives on more traditional
methods.]
Thomas H. Davenport and Jeanne G. Harris 2007, Competing on
Analytics: The New Science of Winning. Harvard Business School
Press.
[Analytics is a buzzword for the application of data-mining-type
approaches in commerce. Davenport and Harris give a useful overview
of issues in deploying analytical techniques within
organizations: benefits and traps, choice of amenable tasks, the role
of management, skill base issues, etc.]
John Maindonald and John Braun 2010, Data Analysis and Graphics Using
R: An Example-Based Approach, 3rd edn. Cambridge University Press.
[Of greatest relevance to the course are Chapter 2
on Styles of Data Analysis, Chapters 5 & 6 (through to 6.3) on Linear Models,
Chapter 8 (through to 8.3) on logistic regression, Chapter 11 on Tree-based
Methods, and Chapter 12 (through to 12.2) on Multivariate Data Exploration &
Discrimination.]