\documentclass[a4paper,9pt]{book} % \SweaveOpts{engine=R,echo=FALSE} % \VignetteIndexEntry{chapter 3: Solutions to Exercises} \usepackage{a4wide} \usepackage{fancyvrb} \usepackage{float} \DefineVerbatimEnvironment% {verbcode}{Verbatim}{xleftmargin=2.5mm} \DefineVerbatimEnvironment% {verbout}{Verbatim}{xleftmargin=2.5mm} \DefineVerbatimEnvironment% {verbrun}{Verbatim}{xleftmargin=2.5mm} \renewcommand\floatpagefraction{.95} \renewcommand\topfraction{.9} \renewcommand\bottomfraction{.95} \renewcommand\textfraction{.05} \setcounter{totalnumber}{50} \setcounter{topnumber}{50} \setcounter{bottomnumber}{50} % % DIMENSION OF TEXT: % \setlength\oddsidemargin{2.5pc} \setlength\evensidemargin{2pc} \setlength\marginparwidth{3pc} \setlength\marginparsep{0.5pc} \setlength\textheight{52\baselineskip} \addtolength\textheight{\topskip} \setlength\textwidth{32.1pc} \setlength\columnsep{1pc} \newcommand{\exhead}[1]{\noindent \textit{Exercise #1}\newline} \newsavebox{\fmbox} \newenvironment{fmpage}[1] {\begin{lrbox}{\fmbox}\begin{minipage}{#1}} {\end{minipage}\end{lrbox} \vspace{6pt}\begin{center}\fbox{\usebox{\fmbox}}\end{center}} \begin{document} \markright{\bf Chapter 3 Exercises} \centerline{Data Analysis \& Graphics Using R -- Solutions to Exercises (\today)} <>= options(show.signif.stars=FALSE) options(SweaveHooks=list(fig=function(){par( mar=c(5.1,4.1,2.6,2.1), mgp=c(2.25,0.5,0), tck=-0.02)})) @ \begin{fmpage}{32.5pc} \exhead{1} An experimenter intends to arrange experimental plots in four blocks. In each block there are seven plots, one for each of seven treatments. Use the function \verb!sample()! to find four random permutations of the numbers 1 to 7 that will be used, one set in each block, to make the assignments of treatments to plots. \end{fmpage} <>= for(i in 1:4)print(sample(1:7)) ## Store results in the columns of a matrix ## The following is mildly cryptic sapply(1:4, function(x)sample(1:7)) @ % \begin{fmpage}{32.5pc} \exhead{2} Use \texttt{y <- rnorm(100)} to generate a random sample of 100 numbers from a normal distribution. Calculate the mean and standard deviation of \verb!y!. Now put the calculation in a loop and repeat 25 times. Store the 25 means in a vector named \verb!av!. Calculate the standard deviation of the values in \verb!av!. \end{fmpage} <>= av <- numeric(25) sdev <- numeric(25) for(i in 1:25){ y <- rnorm(100) av[i] <- mean(y) sdev[i] <- sd(y) } sd(av) @ % \begin{fmpage}{32.5pc} \exhead{3} Create a function that does the calculations of exercise 2. \end{fmpage} <>= avfun <- function(m=50, n=25){ for(i in 1:25){ y <- rnorm(50) av[i] <- mean(y) } sd(av) } @ It is insightful to run the function several times, and see how the value that is returned varies. \begin{fmpage}{32.5pc} \exhead{7} The function \verb!pexp(x, rate=r)! can be used to compute the probability that an exponential variable is less than \verb!x!. Suppose the time between accidents at an intersection can be modeled by an exponential distribution with a rate of .05 per day. Find the probability that the next accident will occur during the next 3 weeks. \end{fmpage} We require the probability that the time to the next accident is less than or equal to 21 days. <>= pexp(21, .05) @ % Note that the rate is both the waiting time from an arbitrary time to the next accident, and the interarrival'' time between accidents. The expected time to the next accident is unaffected by whether or not an accident has just occurred. \begin{fmpage}{32.5pc} \exhead{8} Use the function \verb!rexp()! to simulate 100 exponential random numbers with rate .2. Obtain a density plot for the observations. Find the sample mean of the observations. Compare with the the population mean. (The mean for an exponential population is 1/rate.) \end{fmpage} <>= z <- rexp(100, .2) plot(density(z, from=0), main="") @ % \begin{figure}[H] \begin{center} \begin{minipage}[c]{0.425\linewidth} \setkeys{Gin}{width=\textwidth} <>= <> @ \end{minipage}\hspace{.05\linewidth} \begin{minipage}[c]{0.5\linewidth} \caption{Density plot, for 100 random values from an exponential distribution with \texttt{rate} = 0.2} <>= ## Code <> @ % Notice the use of the argument \texttt{from=0}, to prevent \texttt{density()} from giving a positive density estimate to negative values. \end{minipage} \end{center} \end{figure} Compare \texttt{mean(z)} = \Sexpr{round(mean(z),2)} with 1/0.2 = 5. \begin{fmpage}{32.5pc} \exhead{10} The following data represent the total number of aberrant crypt foci (abnormal growths in the colon) observed in 7 rats that had been administered a single dose of the carcinogen azoxymethane and sacrificed after six weeks: \begin{verbout} 87 53 72 90 78 85 83 \end{verbout} Enter these data and compute their sample mean and variance. Is the Poisson model appropriate for these data. To investigate how the sample variance and sample mean differ under the Poisson assumption, repeat the following simulation experiment several times: \begin{verbcode} x <- rpois(7, 78.3) mean(x); var(x) \end{verbcode} \end{fmpage} <>= y <- c(87, 53, 72, 90, 78, 85, 83) c(mean=mean(y), variance=var(y)) @ Then try <>= x <- rpois(7, 78.3) c(mean=mean(x), variance=var(x)) @ % It is unusual to get as big a difference between the mean and the variance as that observed for these data, making it doubtful that these data are from a Poisson distribution. \begin{fmpage}{32pc} \exhead{11*} A Markov chain is a data sequence which has a special kind of dependence. For example, a fair coin is tossed repetitively by a player who begins with \$2. If heads' appear, the player receives one dollar; otherwise, she pays one dollar. The game stops when the player has either \$0 or \$5. The amount of money that the player has before any coin flip can be recorded -- this is a Markov chain. A possible sequence of plays is as follows: \begin{center} \begin{tabular}{lrrrrrrrrrrrrr} Player's fortune: & 2 & 1 & 2 & 3 & 4 & 3 & 2 & 3 & 2 & 3 & 2 & 1 & 0 \\ Coin Toss result: & T & H & H & H & T & T & H & T & H & T & T & T & \\ \end{tabular} \end{center} Note that all we need to know in order to determine the player's fortune at any time is the fortune at the previous time as well as the coin flip result at the current time. The probability of an increase in the fortune is .5 and the probability of a decrease in the fortune is .5. Such transition probabilities are usually summarized in a transition matrix: $P = \left[ \begin{array}{c c c c c c} 1 & 0 & 0 & 0 & 0 & 0 \\ .5 & 0 & .5 & 0 & 0 & 0 \\ 0 & .5 & 0 & .5 & 0 & 0 \\ 0 & 0 & .5 & 0 & .5 & 0 \\ 0 & 0 & 0 & .5 & 0 & .5 \\ 0 & 0 & 0 & 0 & 0 & 1\\ \end{array} \right]$ The$(i,j)$entry of this matrix refers to the probability of making a change from the value$i$to the value$j$. Here, the possible values of$i$and$j$are$0, 1, 2, \ldots, 5$. According to the matrix, there is a probability of 0 of making a transition from \$2 to \$4 in one play, since the (2,4) element is 0; the probability of moving from \$2 to \$1 in one transition is 0.5, since the (2,1) element is 0.5. The following function can be used to simulate$N$values of a Markov chain sequence, with transition matrix$P$: \begin{verbcode} Markov <- function (N=100, initial.value=1, P) { X <- numeric(N) X[1] <- initial.value + 1 n <- nrow(P) for (i in 2:N){ X[i] <- sample(1:n, size=1, prob=P[X[i-1],])} X - 1 } \end{verbcode} \begin{itemize} \item[(a)] Simulate 15 values of the coin flip game, starting with an initial value of \$2. \item Simulate 100 values of the Markov chain which has the following transition matrix. Save the result to a vector and use \verb!ts.plot()! to plot the sequence. $P = \left[ \begin{array}{rrrrrr} 0.10 & 0.90 & 0.00 & 0.00 & 0.00 & 0.00 \\ 0.50 & 0.00 & 0.50 & 0.00 & 0.00 & 0.00 \\ 0.00 & 0.50 & 0.00 & 0.50 & 0.00 & 0.00 \\ 0.00 & 0.00 & 0.50 & 0.00 & 0.50 & 0.00 \\ 0.00 & 0.00 & 0.00 & 0.50 & 0.00 & 0.50 \\ 0.00 & 0.00 & 0.00 & 0.00 & 1.00 & 0.00 \\ \end{array} \right]$ \item[(b)] Now simulate 1000 values from the above Markov chain, and calculate the proportion of times the chain visits each of the states. It can be shown, using linear algebra, that in the long run, this Markov chain will visit the states according to the following \textit{stationary} distribution \begin{verbatim} 0 1 2 3 4 5 0.1098901 0.1978022 0.1978022 0.1978022 0.1978022 0.0989011 \end{verbatim} There is a result called the \textit{ergodic} theorem which allows us to estimate this distribution by simulating the Markov chain for a long enough time. Compare your calculated proportions with above theoretical proportions. Repeat the experiment using 10000 simulated values; the calculated proportions should be even closer to the theoretically predicted proportions in that case. \item[(c)] Simulate 100 values of the Markov chain which has the following transition matrix. Plot the sequence. Compare the results when the initial value is 1 with when the initial value is 3, 4, or 5. [When the initial value is 0 or 1, this Markov chain wanders a bit before settling down to its stationary distribution which is concentrated more on the values 4 and 5. This wandering period is sometimes called burn-in'.] $P = \left[ \begin{array}{llllll} 0.50 & 0.50 & 0 & 0 & 0 & 0 \\ 0.50 & 0.45 & 0.05 & 0 & 0 & 0 \\ 0 & 0.01 & 0 & 0.90 & 0.09 & 0 \\ 0 & 0 & 0.01 & 0.40 & 0.59 & 0 \\ 0 & 0 & 0 & 0.50 & 0 & 0.50 \\ 0 & 0 & 0 & 0 & 0.50 & 0.50 \\ \end{array} \right]$ \label{ex3:markov} \end{itemize} \end{fmpage} Here is code that may be used for these calculations. \begin{itemize} \item[(a)] <>= P <- matrix(c(1, rep(0,5), rep(c(.5,0,.5, rep(0,4)),4), 0,1), byrow=TRUE,nrow=6) Markov(15, 2, P) @ % \item[(b)] <>= Pb <- matrix(c(0.10,0.90,0.00,0.00,0.00,0.00, 0.50,0.00,0.50,0.00,0.00,0.00, 0.00,0.50,0.00,0.50,0.00,0.00, 0.00,0.00,0.50,0.00,0.50,0.00, 0.00,0.00,0.00,0.50,0.00,0.50, 0.00,0.00,0.00,0.00,1.00,0.00), byrow=TRUE, nrow=6) xb <- Markov(100, 1, Pb) xb ts.plot(xb) @% \item[(c)] <>= xc <- Markov(1000, 1, Pb) table(xb)/1000 # one of several ways to calculate the proportions xc xc2 <- Markov(10000, 1, Pb) table(xc2)/10000 @% \item[(d)] <>= Pd <- matrix(c(0.50, 0.50, 0, 0, 0, 0, 0.50, 0.45, 0.05, 0, 0, 0, 0, 0.01, 0, 0.90, 0.09, 0, 0, 0, 0.01, 0.40, 0.59, 0, 0, 0, 0, 0.50, 0, 0.50, 0, 0, 0, 0, 0.50, 0.50), nrow=6, byrow=TRUE) @% The following function may be helpful, in examining results. <>= plotmarkov <- function(n=10000, start=1, window=100, transition=Pd){ xc2 <- Markov(n, start, transition) z4 <- as.integer(xc2==4) z5 <- as.integer(xc2==5) mav4 <- rollmean(z4,window) mav5 <- rollmean(z5,window) df <- data.frame(av4=mav4, av5=mav5, x=rep(1:1000, length=length(mav4)), gp=(0:(length(mav4)-1))%/%1000) print(xyplot(av4+av5 ~ x | gp, data=df, layout=c(1,10), type="l", par.strip.text=list(cex=0.65))) } ## Use thus library(zoo) # Use rollmean() [moving average] from zoo library(lattice) plotmarkov(start=1) plotmarkov(start=4) @ % \end{itemize} \end{document}