\documentclass[a4paper,9pt]{book}
% \SweaveOpts{engine=R,echo=FALSE}
% \VignetteIndexEntry{chapter 10: Solutions to Exercises}
\usepackage{a4wide}
\usepackage{fancyvrb}
\usepackage{float}
\usepackage{exercises}
\SweaveOpts{prefix.string=Art/ch10}

\begin{document}

\markright{\bf Chapter 10 Exercises}
\centerline{Data Analysis \& Graphics Using R, 3$^{rd}$ edn -- Solutions to Exercises (\today)}

<<>>=
options(show.signif.stars=FALSE)
options(SweaveHooks=list(fig=function(){
  par(mar=c(4.1,4.6,2.6,2.1), mgp=c(2.5,0.5,0), tck=-0.02)}))
@

\begin{fmpage}{32.5pc}
\noindent \textit{Preliminaries}
<<>>=
library(lme4)
library(DAAG)
@
\end{fmpage}

\noindent The final two sentences of Exercise 1 are challenging!
Exercises 1 \& 2 should be asterisked.

\begin{fmpage}{32.5pc}
\exhead{1}
Repeat the calculations of Subsection 2.3.5, but omitting results from
two vines at random.  Here is code that will handle the calculation:
\vspace*{-3pt}
\fvset{xleftmargin=0pt}
\begin{verbcode}
n.omit <- 2
take <- rep(TRUE, 48)
take[sample(1:48, 2)] <- FALSE
kiwishade.lmer <- lmer(yield ~ shade + (1|block) + (1|block:plot),
                       data=kiwishade, subset=take)
vcov <- show(VarCorr(kiwishade.lmer))
gps <- vcov[, "Groups"]
print(vcov[gps=="block:plot", "Variance"])
print(vcov[gps=="Residual", "Variance"])
\end{verbcode}
\vspace*{-3pt}%
Repeat this calculation five times, for each of \texttt{n.omit} = 2, 4,
6, 8, 10, 12 and 14.  Plot (i) the plot component of variance and (ii)
the vine component of variance, against the number of points omitted.
Based on these results, for what value of \verb!n.omit! does the loss of
vines begin to compromise results?  Which of the two variance component
estimates is more damaged by the loss of observations?  Comment on why
this is to be expected.
\end{fmpage}

For convenience, we place the central part of the calculation in a
function.  On slow machines, the code may take a minute or two to run.
<<>>=
trashvine <- function(n.omit=2){
  take <- rep(TRUE, 48)
  take[sample(1:48, n.omit)] <- FALSE
  kiwishade$take <- take
  kiwishade.lmer <- lmer(yield ~ shade + (1|block) + (1|block:plot),
                         data=kiwishade, subset=take)
  ## Within plot (between vine) variance estimate
  varv <- as.numeric(attr(VarCorr(kiwishade.lmer), "sc")^2)
  ## Between plot component; backticks are needed because the
  ## name block:plot is not syntactic
  varp <- as.numeric(VarCorr(kiwishade.lmer)$`block:plot`)
  c(varp, varv)
}
varp <- numeric(35)
varv <- numeric(35)
n <- numeric(35)
k <- 0
for(n.omit in c(2, 4, 6, 8, 10, 12, 14))
  for(i in 1:5){
    k <- k+1
    vec2 <- trashvine(n.omit=n.omit)
    n[k] <- n.omit
    varp[k] <- vec2[1]
    varv[k] <- vec2[2]
  }
@

We plot the results:
\setkeys{Gin}{width=0.8\textwidth}
\begin{figure}[H]
\begin{center}
<<fig=TRUE>>=
par(mfrow=c(1,2))
plot(n, varv, xlab="Number of vines omitted",
     ylab="Within plot variance estimate")
plot(n, varp, xlab="Number of vines omitted",
     ylab="Between plot variance estimate")
par(mfrow=c(1,1))
@
\caption{Within plot and between plot variance estimates, as functions
of the number of vines that were omitted at random.}
\end{center}
\end{figure}

As the number of vines that are omitted increases, the variance
estimates can be expected to show greater variability.  The effect
should be most evident for the between plot variance.  Inaccuracy in the
estimate of the between plot variance arises both from inaccuracy in the
within plot sums of squares and from loss of information at the between
plot level.  At best it is possible only to give an approximate d.f.\
for the between plot estimate of variance (some plots lose more vines
than others), which complicates any evaluation that relies on degrees of
freedom considerations.

\begin{fmpage}{32.5pc}
\exhead{2}
Repeat the previous exercise, but now omitting 1, 2, 3, 4 complete plots
at random.
\end{fmpage}

<<>>=
trashplot <- function(n.omit=2){
  plotlev <- levels(kiwishade$plot)
  use.lev <- sample(plotlev, length(plotlev)-n.omit)
  kiwishade$take <- kiwishade$plot %in% use.lev
  kiwishade.lmer <- lmer(yield ~ shade + (1|block) + (1|block:plot),
                         data=kiwishade, subset=take)
  ## Within plot (between vine) variance estimate
  varv <- as.numeric(attr(VarCorr(kiwishade.lmer), "sc")^2)
  ## Between plot component; note the backticks around block:plot
  varp <- as.numeric(VarCorr(kiwishade.lmer)$`block:plot`)
  c(varp, varv)
}
varp <- numeric(20)
varv <- numeric(20)
n <- numeric(20)
k <- 0
for(n.omit in 1:4)
  for(i in 1:5){
    k <- k+1
    vec2 <- trashplot(n.omit=n.omit)
    n[k] <- n.omit
    varp[k] <- vec2[1]
    varv[k] <- vec2[2]
  }
@

Again, we plot the results:
\setkeys{Gin}{width=0.8\textwidth}
\begin{figure}[H]
\begin{center}
<<fig=TRUE>>=
par(mfrow=c(1,2))
plot(n, varv, xlab="Number of plots omitted",
     ylab="Within plot variance estimate")
plot(n, varp, xlab="Number of plots omitted",
     ylab="Between plot variance estimate")
par(mfrow=c(1,1))
@
\caption{Within plot and between plot variance estimates, as functions
of the number of whole plots (each consisting of four vines) that were
omitted at random.}
\end{center}
\end{figure}

Omission of a whole plot loses 3 d.f.\ out of 36 for the estimation of
within plot effects, and 1 d.f.\ out of 11 for the estimation of between
plot effects, i.e., a slightly greater relative loss.  The effect on
precision will be most obvious where the d.f.\ are already smallest,
i.e., for the between plot variance.  The loss of information on
complete plots is inherently more serious, for the estimation of the
between plot variance, than the loss of partial information (albeit on a
greater number of plots) that will often occur in Exercise 1.

\begin{fmpage}{32.5pc}
\exhead{3}
The data set \verb!Gun! (\textit{MEMSS} package) reports the number of
rounds fired per minute, by each of nine teams of gunners, each tested
twice using each of two methods.
In the nine teams, three were made up of men of slight build, three of
average build, and three of heavy build.  Is there a detectable
difference, in the number of rounds fired, between build types or
between firing methods?  For improving the precision of results, which
would be better -- to double the number of teams, or to double the
number of occasions (from 2 to 4) on which each team tests each method?
\end{fmpage}

It probably does not make much sense to look for overall differences
between levels of \texttt{Method}; these depend on \texttt{Physique}.
We therefore nest \texttt{Method} within \texttt{Physique}.
<<>>=
library(MEMSS)
Gun.lmer <- lmer(rounds ~ Physique/Method + (1|Team), data=Gun)
summary(Gun.lmer)
@

A good way to proceed is to determine the fitted values, and present
these in an interaction plot:
<<fig=TRUE>>=
Gun.hat <- fitted(Gun.lmer)
interaction.plot(Gun$Physique, Gun$Method, Gun.hat)
@

Differences between methods, for each of the three physiques, are
strongly attested.  These can be estimated within teams, allowing 24
degrees of freedom for each of these comparisons.  Clear patterns of
change with \texttt{Physique} seem apparent in the plot.  There are
however too few degrees of freedom for this effect to appear
statistically significant.  Note also that the parameters that are given
are for the lowest level of \texttt{Method}, i.e., for \texttt{M1}.
Making \texttt{M2} the baseline shows the effect as closer to the
conventional 5\% significance level.

The component of variance at the between teams level is of the same
order of magnitude as the within teams component.  Its contribution to
the variance of team means (1.044$^2$) is much greater than the
contribution of the within team component (1.476$^2$/4; there are 4
results per team).  If comparison between physiques is the concern, it
will be much more effective to double the number of teams; compare
(1.044$^2$+1.476$^2$/4)/2 (=0.82) with 1.044$^2$+1.476$^2$/8 (=1.36).
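The arithmetic in that final comparison can be checked directly.  The
following sketch takes as given the between team and within team
standard deviations (1.044 and 1.476) quoted above from the
\texttt{lmer} output, and computes the variance of a physique comparison
under each of the two designs:

```r
## Variance components as quoted in the text above (assumed, not refitted here)
sd.team <- 1.044    # between team standard deviation
sd.resid <- 1.476   # within team (residual) standard deviation

## Doubling the number of teams halves the whole variance of a team mean
double.teams <- (sd.team^2 + sd.resid^2/4)/2

## Doubling the occasions (2 to 4 per method) halves only the residual
## contribution; the between team part is untouched
double.occasions <- sd.team^2 + sd.resid^2/8

round(c(double.teams, double.occasions), 2)  # 0.82 versus 1.36
```

Doubling the number of teams reduces both components of variance, which
is why it is the more effective strategy.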
\begin{fmpage}{32.5pc}
\exhead{4*}
The data set \verb!ergoStool! (\textit{MEMSS} package) has data on the
amount of effort needed to get up from a stool, for each of nine
individuals who each tried four different types of stool.  Analyse the
data both using \verb!aov()! and using \verb!lme()!, and reconcile the
two sets of output.  Was there any clear winner among the types of
stool, if the aim is to keep effort to a minimum?
\end{fmpage}

For the analysis of variance, specify
<<>>=
aov(effort ~ Type + Error(Subject), data=ergoStool)
@

To test the \texttt{Type} effect for statistical significance, refer
(81.19/3)/(29.06/24) (=22.35) to the $F_{3,24}$ distribution.  The
effect is highly significant.  This is about as far as it is possible to
go with analysis of variance calculations.  When \texttt{Error()} is
specified in the \texttt{aov()} model, R has no mechanism for extracting
estimates.  (There are mildly tortuous ways to extract the information,
which will not be further discussed here.)

For use of \texttt{lmer()}, specify
<<>>=
summary(lmer(effort ~ Type + (1|Subject), data=ergoStool))
@

Observe that 1.100295$^2$ (the squared residual standard deviation) is
very nearly equal to 29.06/24 obtained from the analysis of variance
calculation.  Also, one quarter of the Stratum 1 mean square from the
analysis of variance output, i.e., (66.5/8)/4 = 2.078, is very nearly
equal to 1.3325$^2$ + 1.100295$^2$/4 (= 2.078) from the \texttt{lmer}
output.  (Each subject mean averages 4 results, so its variance is
$\sigma_{\rm Subject}^2 + \sigma^2/4$.)

\begin{fmpage}{32.5pc}
\exhead{5*}
In the data set \verb!MathAchieve! (\textit{MEMSS} package), the factors
\verb!Minority! (levels \verb!yes! and \verb!no!) and \verb!Sex!, and
the variable \verb!SES! (socio-economic status) are clearly fixed
effects.  Discuss how the decision whether to treat \verb!School! as a
fixed or as a random effect might depend on the purpose of the study.
Carry out an analysis that treats \verb!School! as a random effect.  Are
differences between schools greater than can be explained by within
school variation?
\end{fmpage}

\texttt{School} should be treated as a random effect if the intention is
to generalize results to other comparable schools.  If the intention is
to apply them to other pupils or classes within those same schools, it
should be treated as a fixed effect.

For the analysis of these data, both \texttt{SES} and \texttt{MEANSES}
should be included in the model.  The coefficient of \texttt{MEANSES}
then measures between school effects, while the coefficient of
\texttt{SES} measures within school effects.
<<>>=
library(MEMSS)
MathAch.lmer <- lmer(MathAch ~ Minority*Sex*(MEANSES+SES) + (1|School),
                     data=MathAchieve)
@
% {\footnotesize
<<>>=
options(width=90)
MathAch.lmer
options(width=68)
@
% }

The between school component of variance (1.585$^2$) is 2.51, compared
with a within school component that equals 35.79.  To get confidence
intervals (strictly, Bayesian credible intervals) for these variance
estimates, specify:
<<>>=
MathAch.mcmc <- mcmcsamp(MathAch.lmer, n=10000)
HPDinterval(VarCorr(MathAch.mcmc, type="varcov"))
@

The 95\% confidence interval for the between school component of
variance extended, in my calculation, from 1.64 to 3.0.  The confidence
interval excludes 0.  The number of results per school varies between 14
and 67.  Thus the between school component contributes 2.51 to the
variance of a school mean, while the contribution of the within school
component is at most 5.982429$^2$/14 = 2.56.

\end{document}