Workshop on the R System and Packages

AMSI/SSAI ASC2008 Satellite Workshop: Computing With R

Dates: Saturday 28th June 2008 and Sunday 29th June 2008
Place: Lecture Theatre 2,
111 Barry Street,
c/- The University of Melbourne,
Victoria Australia 3010
This will be a satellite workshop for the Australian Statistical Conference (ASC2008) that will take place in Melbourne from Monday 30th June to Thursday 3rd July 2008.

Note also the tutorial introduction to R that has been scheduled for Friday June 27th. Workshop organizers are John Maindonald and Neville Bartlett, with help from Ross Darnell and Andrew Robinson.

What is R?
Provisional Schedule
Purposes and Style
Registration fees
Abstracts
Report on Workshop

What is R?

The R system is a free software environment for scientific and statistical computing and graphics that runs on all commonly used computing platforms. An active and highly skilled developer community continues to extend and improve it. It has become an environment of choice for the implementation of new methodology. It is at the same time attracting wide attention from statistical application area specialists.

Schedule

(Friday June 27, 9.30am - 5pm: Tutorial Introduction to R)
Saturday June 28
 8.30 - 10.00  Graphics - lattice, ggplot2 and rgl (John Maindonald)
10.00 - 11.00  Generalized Linear Models (Peter Dunn)
11.30 -  1.00  Rattle (Graham Williams)
 1.00 -  2.00  LUNCH
 2.00 -  3.30  Multi-level models (Andrew Robinson)
 4.00 -  5.30  Applications in medical statistics - meta-analysis, nonparametric testing, and power calculations (Malcolm Hudson)

Sunday June 29
 8.30 -  9.30  Mixing R and LaTeX/Office (Peter Dunn)
 9.30 - 11.00  BRugs for Bayesian Analysis (Matt Wand)
11.30 -  1.00  Spatial statistics (Adrian Baddeley)
 1.00 -  2.00  LUNCH
 2.00 -  3.30  Time series forecasting (Rob Hyndman)
 4.00 -  5.30  Package Construction (Rob Hyndman)

Purposes and Style

A purpose of this workshop is to give a sense of the wide scope and power of the R system, for a broad audience ranging from researchers at the forefront of statistical research through to practitioners who are primarily interested in quickly and easily analysing data. Workshop sessions will NOT aim to be tutorials, though the various sessions may have tutorial components. Rather, the aim is to provide pointers that participants can follow up at their leisure. Those who wish may be able to follow along, to some limited extent, using their own laptops. Code that can be used to reproduce the demonstrations for many of the sessions will be posted in due course. Where reasonably possible, presenters will use graphics to convey the gist of their analyses.

Tutorial R workshop: Friday June 27 (9.00am - 4.30pm)

This will be aimed at users who have little or no previous familiarity with the R system. The aim is to provide enough background on the use of R to be helpful in following the talks on the subsequent two days. The tutorial will be limited to 20 participants.

There will be limited use of the R Commander (Rcmdr) graphical user interface, mainly for data input and examples. Participants should however be comfortable working at the command line.

Participants will be expected to bring their own laptops. R version 2.6.0 or later (preferably R 2.7.0 or later) should be installed. Note also the preparatory reading and simple familiarization exercises that are described on the course preparation web page.

Topics that will be covered, primarily illustrated via examples, include:
Data input using the R Commander GUI
Data frames (the rectangular data objects used by R)
Working directory and workspace
Packages and the search list
Simple graphics.
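
As a rough illustration of the topics listed above (not taken from the tutorial materials), the following minimal sketch uses base R only. The data frame trees_df and its columns are hypothetical names invented for this example, and the values are made up.

```r
# A data frame is a rectangular object: one row per observation,
# one column per variable. Names and values here are hypothetical.
trees_df <- data.frame(
  stand    = c("A", "A", "B", "B"),
  height_m = c(12.1, 14.3, 9.8, 11.5),
  dbh_cm   = c(18.2, 22.5, 15.1, 17.9)
)

str(trees_df)      # compact description of the columns and their types
summary(trees_df)  # per-column summaries

# Working directory and workspace
getwd()            # directory that R reads from and writes to by default
ls()               # objects currently held in the workspace

# Packages and the search list
search()           # attached packages and environments, in search order

# Simple graphics: a scatterplot using the formula interface
plot(height_m ~ dbh_cm, data = trees_df,
     xlab = "Diameter at breast height (cm)",
     ylab = "Height (m)",
     main = "Hypothetical tree measurements")
```

Typing the name of an object, such as trees_df, at the command line prints it.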

Registration fees

             Member   Non-member   Student
Three days   $400     $625         $120
Two days     $300     $450         $80
One day      $175     $250         $40

Intending participants who join the Statistical Society of Australia will be able to take immediate advantage of the reduced member rate. The member rate is available also to members of the New Zealand Statistical Association.

Students and early career statisticians will be able to apply separately for a contribution towards travel costs. This support is in principle open to all students, irrespective of area of speciality.

A registration form will be posted in the next few days (date of last change to this note: April 10).

Abstracts

Graphics: lattice, ggplot2 and rgl (John Maindonald)
The R system has several different flavours of graphics. These include: base or traditional graphics; lattice's highly stylized graphics; grid graphics, on which lattice is built; and the ggplot2 implementation of Wilkinson's Grammar of Graphics. For three-dimensional rotational graphics, note rgl and the dynamic graphics abilities of the rggobi interface to GGobi. This talk will focus mainly on lattice, drawing attention to some of its newer abilities and giving brief summary details of the customisation of its plots. Additionally, simple basic examples will demonstrate the use of ggplot2 and rgl.

The CRAN task view on R Graphics may be consulted for summary information on the rich variety of R graphics packages. Notes (28pp.) that supplement and summarize this lecture, and code for the examples, are available online.
Generalized Linear Models (Peter Dunn)
Generalized Linear Models extend linear models in ways that have proved especially useful in the analysis of count data. This talk will describe and demonstrate R's glm() function for use with count data, proportions, and other non-Normal data types.
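
As a hedged illustration of the kind of analysis this abstract describes (not an example from the talk itself), the sketch below fits a Poisson regression to simulated count data with glm(). The variable names, sample size and true coefficients are assumptions chosen for the example.

```r
# Simulate count data whose log-mean is linear in x.
# The seed, sample size and coefficients (0.5, 0.8) are arbitrary choices.
set.seed(1)
n <- 200
x <- runif(n, min = 0, max = 2)
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))
count_df <- data.frame(x = x, y = y)

# Fit a generalized linear model with Poisson errors and a log link.
fit <- glm(y ~ x, family = poisson(link = "log"), data = count_df)

summary(fit)  # coefficient table, deviance and related output
coef(fit)     # estimates on the log scale

# Predicted mean counts at new values of x, on the response scale
predict(fit, newdata = data.frame(x = c(0.5, 1.5)), type = "response")
```

For proportions, the same function can be used with family = binomial; the choice of family and link is what distinguishes one generalized linear model from another.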
Rattle (Graham Williams)
Rattle (the R Analytical Tool To Learn Easily) is a data mining toolkit used to analyse very large collections of data. Rattle presents statistical and visual summaries of data, transforms data into forms that can be readily modelled, builds both unsupervised and supervised models from the data, presents the performance of models graphically, and scores new datasets. Through a simple and logical graphical user interface based on Gnome, Rattle can be used by itself to deliver data mining projects. Rattle also provides an entry into sophisticated data mining using the open source and free statistical language R.
Multi-level models (Andrew Robinson)
Many real data sets have a hierarchical multi-level structure of variation; for example multiple measurements within trees within stands within forests. The modeling approaches that have been developed for such data provide a rich source of insights and challenges. We will showcase the nlme and lme4 packages, both of which provide extensive infrastructure for the analysis of multi-level data. The lme4 package is under very active continuing development, with new features and improvements appearing at regular intervals.
Applications in medical statistics - meta-analysis, nonparametric testing, and power calculations (Malcolm Hudson)
[provisional abstract] Medical Statistics analyses that will be discussed include:
i. graphic presentation of meta analysis results with,
ii. coin package for use with non-parametric testing and power computations, with comparisons with bootstrap procedures.
Mixing R and LaTeX/Office (Peter Dunn)
Sweave provides a flexible framework for mixing text and S code [R implements the S language] for automatic report generation (for example, to enable reproducible research). The basic idea is to place R code into the LaTeX/Office document, and ask R to replace the code with its output, such that the final document only contains the text and the output of the statistical analysis. Currently, there is provision for incorporating S code, with markup, into either LaTeX or Open Office documents. The S code gets replaced by its output (text, tables and/or graphs) in the final markup file. This makes it possible to re-generate a report if the input data changes. It documents code that can reproduce the analysis in the same file that also produces the report. Where published papers report statistical analyses and/or summaries, it is too often hard to be sure just what analysis was done. Reference to an Sweave version (typically on a web page) documents the analysis to a standard and with a completeness that is not otherwise possible.
BRugs for Bayesian Analysis (Matt Wand)
The BRugs package facilitates Bayesian statistical analyses through the use of scripts; i.e. without the need for menus and mouse-clicks. Scripting in both R and the BUGS (Bayesian inference Using Gibbs Sampling) languages is required. Other than time, there is no firm limit on the complexity of Bayesian models that can be handled with BRugs. Because R is used at the front-end and back-end of the analysis one can take advantage of R's functionality for data input and pre-processing, as well as summary and graphical display. This component of the short course will provide illustrations at both introductory and advanced levels.
Spatial statistics (Adrian Baddeley, CSIRO/UWA)
This session will demonstrate basic techniques for analysing spatial data using R packages. There are three main kinds of spatial data: geostatistical data, where the response variable is recorded at a point location (e.g. daily temperature records at a set of weather stations); regional data, where the response variable is obtained from a spatial region (e.g. number of HIV notifications in each health authority area); and spatial point patterns, where the response is the location of an event (e.g. locations of petty crimes in Chicago). The R packages 'geoR', 'spdep' and 'spatstat' (respectively) provide functionality for these types of data.
Time series forecasting (Rob Hyndman)
The forecasting bundle of R packages provides new forecasting methods, and graphical tools for displaying and analysing forecasts.
Package Construction (Rob Hyndman)
Much of the power and flexibility of R derives from the large variety of powerful packages that are available to add on to the base system. Putting code into R packages is surprisingly straightforward, for a user who is careful to follow the rules.

Report on AMSI/SSAI ASC2008 Satellite R Workshop, June 27-29 2008.

The idea for this workshop arose from discussions with Ross Darnell, Andrew Robinson and Neville Bartlett. The workshop was jointly organized by SSAI and AMSI (Australian Mathematical Sciences Institute), and held at the AMSI premises in Carlton.

It proved surprisingly easy to pull together a first-rate team of presenters. Four of the presenters - Graham Williams, Matt Wand, Adrian Baddeley and Rob Hyndman - have created their own very substantial R analysis packages. The workshop proved highly popular. There were 45 registrations for the Friday, 65 for the Saturday, and 71 for the Sunday. In all, there were 85 registrations, with 36 registered for all 3 days. In spite of the long days (8.30am to 5.15pm on the Saturday and Sunday), most participants attended all sessions on the days for which they had registered.

In order to meet the demand for places at Friday's tutorial introduction to R, there were parallel sessions. Andrew Robinson fronted one of these and John Maindonald the other. The audience divided itself nicely into two roughly equal groups -- one wanting a step by step introduction to R, and the other wanting to build on existing R skills. Members of this second group (some of them at least) seemed most unwilling to leave at the end of the day, staying well beyond the allotted 5pm closing time. We had excellent support from student tutors -- Frank Liu, Howard Chuang, and David Lazaridis. Additionally, Simon Blomberg (who had somehow been enrolled as a participant!) turned up keen to help, and did so very expertly.

The Saturday sessions described particular R features. Graham Williams described difficulties, and strategy, in getting management to sanction use of open source software. He demonstrated his Rattle GUI for "data mining" using R -- regression, classification, and multivariate data exploration. Malcolm Hudson commented on experiences in moving from S-PLUS to R, then discussed medical applications. Other talks were more tutorial in character -- lattice and other graphics in R (John Maindonald), generalized linear models (Peter Dunn), and multi-level models (Andrew Robinson).

The Sunday sessions kicked off with Peter Dunn's talk on the incorporation of R code and output into Open Office and LaTeX documents. Matt Wand used a tutorial on the BRugs interface to OpenBUGS to discuss Bayesian analysis using R. Adrian Baddeley surveyed spatial analysis in R, while Rob Hyndman gave a survey of the extensive range of time series abilities.

Neville Bartlett handled much of the local organization, with Andrew Robinson providing useful backup. Simi Henderson of AMSI gave sterling help with physical arrangements for the use of rooms, copying of notes, and so on. Thanks are due to all who helped make the workshop a success.

Financial support was provided to allow four students to attend: Frank Liu, Howard Chuang (both year 3, Finance and Applied Statistics, ANU), Andrea Walters (PhD, U of Tasmania) and Joanne Wang (PhD, U of Sydney).

A highly informal survey suggested that at most 20% or 25% of participants were willing to identify themselves as statisticians. There are a huge number of application area people out there who are using R and related tools for statistical analysis, for graphics, and for related purposes. Feedback was very favourable, with several comments along the lines of the following:

"Overall I found the workshop very enjoyable and worthwhile. The venue and facilities were excellent, lunch and morning and afternoon teas good, and the presenters and tutors were extremely knowledgeable, enthusiastic and approachable. All in all I found this to be a very useful and well organised day and would recommend it to others."

Notes and overheads from the workshop are available online.

John Maindonald