Data Analysis and Graphics Using R – an Example-based Approach

The following is from the preface for a 4th edition draft, on which work started in late 2015. We are awaiting confirmation from CUP on how they wish to proceed. Provisionally, two texts are planned:

New text (“4th edition”) – changes and additions

With this new edition the focus has moved from including R tutorial material in the text to pointing users to the extensive R help resources now available on the web, and in books and other printed material. These include extensive supplements that are available from the web pages for this text. The third edition content of Chapters 1–5 has been amalgamated and condensed somewhat into Chapters 1 and 2 in this new edition.

Concerns about reproducibility, especially in wet laboratory biology and in psychology, have attracted extensive attention in the pages of Nature, Science, the Economist, psychology journals, and elsewhere. One consequence has been renewed attention, both in the wider scientific community and in the statistical community, to the interplay between scientific methodology and statistical design and analysis. The uses and limitations of $p$-values have been an important part of the discussion, in part because a small $p$-value has commonly been seen (wrongly) as enough on its own, making replication unnecessary. Chapter 2 now has a much extended discussion of the use and role of $p$-values, leading on to the wider discussion of reproducibility issues. These issues, as they affect data analysis, become more important than ever as the R system, and other such systems, increasingly make sophisticated statistical abilities widely available.

The treatment of $p$-values extends to noting the new possibilities that arise when there are, potentially, hundreds, or thousands, or more, $p$-values. The false discovery rate estimates that are then available are, we argue, more informative than $p$-values, and more directly relevant to the questions that are commonly of experimental interest. A new Chapter 9 section takes up these ideas as they apply to the analysis of RNA-Seq gene expression data.
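
As a rough illustration of why a false discovery rate suits this setting, consider one widely used approach, the Benjamini-Hochberg procedure. The preface does not say which method the book adopts, so this choice, and the symbols $m$, $q$ and $k$, are assumptions made here for illustration only. With $m$ tests and ordered $p$-values $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$, the procedure finds

$$
k = \max\left\{\, i : p_{(i)} \le \frac{i}{m}\, q \,\right\}
$$

and treats the hypotheses corresponding to $p_{(1)}, \dots, p_{(k)}$ as discoveries (none, if no such $i$ exists). When the test statistics are independent, this keeps the expected proportion of false discoveries at or below $q$. By contrast, testing each of $m = 1000$ true null hypotheses at $p < 0.05$ would give, on average, $1000 \times 0.05 = 50$ false positives.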

Other changes include increased attention to transformations that may yield linear relationships (Chapter 3), and to automated smoothing methods (Chapter 4). Chapter 4 has a new section on quantile regression. Chapter 5 has new content on models with a negative binomial error. Chapter 7 demonstrates the use of a model with beta binomial error. It has a new section that illustrates the power of simulation for handling problems that are analytically intractable (not yet incorporated). Changes and advances in the relevant R functions and packages have required a reworking of the R code in Chapter 7, with consequent changes in the text.

In preparing the drafts for this edition, code and text were first combined in a single document, in a form that could then be processed to give the LaTeX file and files for figures. The code has, as a result, been extensively streamlined. For this work, Yihui Xie’s knitr package has been a huge boon, making the revising or updating of text and code easier and less error-prone. We are now able to offer R Markdown scripts, one for each chapter, that can be processed to reproduce all computer output, including tables and graphs.

John Maindonald.


Changes in 4th edition draft

John Maindonald and John Braun

22 August 2018
