Workshop on Modern Regression and Classification Using R - Preparation

Participants will be expected to bring their own laptops (PC, MacOS X or Linux), with a recent version of R (preferably R-2.13.2 or later; the current version is R-2.14.0) already installed. A number of R packages should also be installed; for details, see "R Packages that should be installed" below.

Intending participants with limited previous experience with R should do a modest amount of preparation.

In preparation for the Course - Getting Familiar with R

Download the R binary, install it on your machine, start up R, and start typing!
Windows users: obtain R from the Windows section of CRAN.

Users of other systems (MacOS X, some flavours of Linux): look on CRAN for a binary for your system.

What should I type?

> 1+1
This may suggest some other possibilities!
> nn <- 1:5
Create in the workspace an integer vector nn that holds the values 1, 2, 3, 4, 5.
NB: <- is the assignment symbol.
> nn
Display (print) the contents of nn.
> ls()
Show the contents of the workspace. You should see "nn" listed.
> q()
End (quit) the session. When asked whether you want to save the workspace, make a habit of clicking on "Yes". This saves everything in the workspace into a file (called .RData, for those who really must know) in the working directory.
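Before quitting, a few further things to try with nn, as a sketch of what "other possibilities" might mean. Arithmetic and summary functions work on a whole vector at once. The commented-out save.image() line is an alternative way of saving the workspace to .RData without quitting.

```r
nn <- 1:5        # integer vector holding 1, 2, 3, 4, 5
class(nn)        # "integer"
nn * 2           # arithmetic is applied element by element: 2 4 6 8 10
sum(nn)          # 15
mean(nn)         # 3
nn[2:3]          # extract elements 2 and 3: 2 3
ls()             # "nn" should appear in the workspace listing
## save.image()  # save the workspace to .RData in the working directory
```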

At some point you will need to know the path to the working directory. Start R again (the workspace, if saved on the previous exit, gets reloaded), and type:
> ls()
Show the contents of the workspace.
> getwd()
Get the path to the working directory.
Unless it has been changed from the default, Windows systems are likely to use "C:/Documents and Settings/Owner/My Documents" as the working directory. Other uses for working directories (there can be as many as you want) will become apparent as the course proceeds.
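The working directory can also be changed from within R, using setwd(). The sketch below uses a hypothetical sub-directory "course" of R's temporary directory, so nothing permanent is touched; in practice you would give the path to your own course directory. setwd() returns the previous directory, which makes it easy to switch back.

```r
old <- getwd()                               # remember the current working directory
newdir <- file.path(tempdir(), "course")     # hypothetical directory for course work
dir.create(newdir, showWarnings = FALSE)     # create it if it does not already exist
setwd(newdir)                                # make it the working directory
getwd()                                      # confirm the change
list.files()                                 # files in the new working directory
setwd(old)                                   # return to the original directory
```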

There are a number of demonstrations to try.
> demo()
Give a list of demos that can be tried.
> demo(graphics)
Show off the graphics. Press the ENTER key to display the first graph, and again to display each successive graph.
A good follow-up is to run the code included in the document "Datasets, and familiarisation exercises".
Familiarity with these datasets will help in following the tutorials and doing course exercises. R code is given that can be used to get summary information and to plot graphs that will help reveal important features of the data.
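The kind of code used for such familiarisation looks roughly like the following. This is a sketch only: the built-in cars dataset (from R's datasets package) stands in for the course datasets, which are not reproduced here.

```r
## 'cars' ships with R; it stands in here for the course datasets
str(cars)                  # structure: 50 observations of speed and dist
summary(cars)              # minimum, quartiles, median and mean for each column
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
lines(lowess(cars$speed, cars$dist))   # add a smooth curve to reveal the trend
```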

Tutorial Material for R

Work through chapters 1 and 2, and preferably also chapter 3, of the document
http://www.maths.anu.edu.au/~johnm/courses/r/notes/rnotes1-36.pdf
Chapter 4 is also available, for those who want it.

Scripts are available for all 15 chapters of the introduction to R.

Other Introductory Documents from the Web

Go to http://mirror.aarnet.edu.au/pub/CRAN and click on Documentation to see some of the possibilities.

Try, perhaps, R for Beginners (Emmanuel Paradis).

R Packages that should be installed

As far as possible, laptops should be set up, ready for use with R and R packages, before the course. Note that administrator privileges are not required to install R. In their absence, R will be installed into a user directory.

After installing R, also install the packages animation, DAAG, gamclass, e1071, latticist, latticeExtra, playwith, Rcmdr, randomForest, rattle, scales, slam, Ecdat, nws, oz, survey, mlbench, fgui and ggplot2. Several of these packages have a number of dependencies, so other packages will be installed along with them. Mac users who install from the Mac GUI should be sure to tick the box "Install dependencies".
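From the command line, the packages listed above can be installed in one go. The sketch below first works out which of them are not yet installed, then installs only those, with their dependencies. The variable names pkgs and missingPkgs are illustrative only. The install step is wrapped in interactive() so that it runs only when typed at the R console, and it needs a live internet connection and a chosen repository.

```r
## Packages listed for the course
pkgs <- c("animation", "DAAG", "gamclass", "e1071", "latticist",
          "latticeExtra", "playwith", "Rcmdr", "randomForest", "rattle",
          "scales", "slam", "Ecdat", "nws", "oz", "survey", "mlbench",
          "fgui", "ggplot2")

## Which of these are not yet installed?
missingPkgs <- setdiff(pkgs, rownames(installed.packages()))
missingPkgs

## Install only the missing packages, together with their dependencies
## (run only in an interactive session at the console)
if (interactive() && length(missingPkgs) > 0) {
    install.packages(missingPkgs, dependencies = TRUE)
}
```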

Other packages to which there may be reference include dichromat, odfWeave, rpanel, fortunes, scatterplot3d, schoolmath, sp, adabag and ape.

For playwith, and preferably also for rattle, Gtk2 should be installed. NB: Gtk2 is not part of R. It is required for some or all of the functionality of certain R packages.

For R-2.12 or later under Windows, download and install http://downloads.sourceforge.net/gtk-win/gtk2-runtime-2.22.0-2010-10-21-ash.exe

For use with R under MacOS X, download and install http://r.research.att.com/libs/GTK_2.18.5-X11.pkg

(For R-2.11 under Windows, download and install http://downloads.sourceforge.net/gladewin32/gtk-2.12.9-win32-2.exe)

If R has access to a live internet connection, packages can be installed from the menu. You will need to select a repository. In Australia, choose an Australian repository. Alternatively, packages can be installed from the command line. For Rcmdr, a suitable command is:
  install.packages("Rcmdr", dependencies=TRUE)
The R Commander has many dependencies, indirect as well as direct. Unless the internet connection is fast, installation may take some time.
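The repository can also be chosen from the command line rather than the menu. As a sketch, the code below sets it to the AARNet mirror address given under "Other Introductory Documents from the Web"; chooseCRANmirror() instead offers a menu of mirrors. As before, the install step runs only in an interactive session.

```r
## Menu-based alternative: chooseCRANmirror()

## Set the repository directly (here, the Australian AARNet mirror)
options(repos = c(CRAN = "http://mirror.aarnet.edu.au/pub/CRAN"))
getOption("repos")          # check which repository will be used

## Install the R Commander with all its dependencies (interactive use only)
if (interactive()) install.packages("Rcmdr", dependencies = TRUE)
```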

Installation of the RStudio Integrated Development Environment

This free and open source development environment (editor, and much more) is strongly recommended for use with R. Download it from the RStudio website (Mac: about 40MB; Windows: about 24MB; Linux: about 24MB).

Installation of Java JDK

For text mining using the tm package, a Java JDK must be installed. Go to http://www.oracle.com/technetwork/java/javase/downloads/index.html and then, under Java Platform, Standard Edition, click on Download JDK (described as Java SE 6 Update 23). Mac users should already have the JDK installed as part of the Macintosh system.
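One quick check, from within R, is whether a java command can be found on the search path. This is a sketch only: finding java does not by itself confirm that a full JDK (as opposed to a runtime only) is installed, but the version details printed give a clue.

```r
## Where (if anywhere) is the java executable on the search path?
javaPath <- Sys.which("java")
javaPath                         # an empty string means java was not found

## If found, print the version details to the console
if (nzchar(javaPath)) system2(javaPath, "-version")
```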

Checking the Installation

To check, e.g., that latticeExtra (and its dependencies) are properly installed, start R and type on the command line:
  library(latticeExtra)

As rattle will be important for the course, please check that you are able to run it. Start up R, and type:
  library(rattle)
  rattle()
Mac users may get warning messages. These can almost certainly be ignored.
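Several packages can be checked at once. The sketch below uses requireNamespace(), which returns TRUE or FALSE according to whether each package can be loaded, without attaching it. The particular packages named are just a selection from the list above, plus lattice, which comes with R.

```r
pkgs <- c("lattice", "latticeExtra", "randomForest", "rattle")

## TRUE for each package that loads, FALSE for each one that does not
ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
ok

## Names of any packages that still need attention
names(ok)[!ok]
```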

Further Notes on the Installation of R

See the document "Installation of R, of R packages, and editor environments".

Installation of Packages (or even running R) from a DVD

DVDs and memory sticks will be available at the course, from which it will be possible to install any packages that are lacking, for R-2.14.0 or R-2.14.1 (if available by the time of the course). These DVDs will also include an R executable with the relevant packages already installed. With the DVD in a computer's DVD drive, R can be run directly from the DVD.

Additional Exercises

These exercises are additional to those in the course notes.

Scripts for these exercises are available.

Weaving the Exercises (R's Sweave function; for Techos Only!)

Here is a brief introduction to combining LaTeX source and suitably annotated R code in a document that can then be processed through R's Sweave function to give the final document: "R talks to LaTeX".
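To give the flavour, here is a sketch of a minimal Sweave file (the file name example.Rnw is hypothetical), written out and processed from R. Text outside the <<>>= ... @ chunk is ordinary LaTeX; the chunk is run by R and its code and output are inserted, and \Sexpr{} puts a computed value into the text. Sweave() writes example.tex to the working directory, which is then processed with LaTeX in the usual way.

```r
## A minimal .Rnw document: LaTeX text with one R code chunk
rnw <- c("\\documentclass{article}",
         "\\begin{document}",
         "The mean of the values 1 to 5 is computed below.",
         "<<meanchunk, echo=TRUE>>=",
         "nn <- 1:5",
         "mean(nn)",
         "@",
         "Inline result: \\Sexpr{mean(1:5)}.",
         "\\end{document}")
writeLines(rnw, file.path(tempdir(), "example.Rnw"))

## Run Sweave in the temporary directory; it writes example.tex there
owd <- setwd(tempdir())
Sweave("example.Rnw")
setwd(owd)
texFile <- file.path(tempdir(), "example.tex")
```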

Sweave versions of the exercise scripts

Do you have data that you are happy to expose to wider view?

Contact the presenter with the details. Data that have been used for a published paper may be especially suitable.

Links

Further exercises; and Weaving with R (strictly for those who want some greater challenge!)

Web site for R (CRAN = Comprehensive R Archive Network)

Further interesting R links are also available.

John Maindonald's web site

email: john.maindonald AT anu.edu.au
Last updated: November 3 2011.