<>=
options(SweaveHooks=list(fig=function(){par(cex.main=1.1,
mar=c(4.1,4.1,2.6,2.1),
mgp=c(2.25,0.5,0), tck=-0.02)}))
@%
\part{Data Summary -- Traps for the Unwary}
\section{Multi-way Tables}
% latex table generated in R 2.2.0 by xtable 1.2-5 package
% Wed Oct 12 09:45:16 2005
% latex table generated in R 2.2.0 by xtable 1.2-5 package
% Tue Nov 29 11:43:40 2005
\begin{table}[ht]
\begin{tabular}{rrrrrrrrrr}
& \multicolumn{2}{c}{Small ($<$2cm)} & & \multicolumn{2}{c}{Large ($>=$2cm)}
& & \multicolumn{2}{c}{Total}\\
&\multicolumn{2}{c}{\hrulefill} && \multicolumn{2}{c}{\hrulefill}
&& \multicolumn{2}{c}{\hrulefill} \\
& open & ultrasound & & open & ultrasound & & open & ultrasound\\
\hline
yes & 81 & 234 & yes &192 & 55 & yes & 273 & 289\\
no & 6 & 36 & no & 71 & 25 & no & 77 & 61 \\
\hline
\hline
Success rate & 93\% & 87\% & & 73\% & 69\% && 78\% & 83\% \\
\end{tabular}
\caption{Outcomes for two different types of surgery for kidney stones.
The overall success rates (78\% for open surgery as opposed to 83\%
for ultrasound) favor ultrasound. Comparison of the
success rates for each size of stone separately favors, in each case,
open surgery.\label{tab:stones}}
\end{table}
\begin{fmpage}{36pc}
\exhead{1}
Table \ref{tab:stones} illustrates the potential hazards of
adding a multiway table over one of its margins. Data are from a
study\footnote{Charig, C.~R., 1986. \newblock Comparison of treatment
of renal calculi by operative surgery,
percutaneous nephrolithotomy, and extracorporeal shock wave lithotripsy.
\newblock {\em British Medical Journal\/}, 292:879--882} that compared
outcomes for two different types of
surgery for kidney stones; A: \texttt{open}, which used open surgery,
and B: \texttt{ultrasound}, which used a small incision, with the
stone destroyed by ultrasound. The data can be entered into R, thus:
<>=
stones <- array(c(81,6,234,36,192,71,55,25), dim=c(2,2,2),
dimnames=list(Sucess=c("yes","no"),
Method=c("open","ultrasound"),
Size=c("<2cm", ">=2cm")))
@ %
\begin{enumerate}
\item Determine the success rate that is obtained from combining the
data for the two different sizes of stone. Also determine the
success rates for the two different stone sizes separately.
\item Use the following code to give a visual representation of
the information in the three-way table:
\begin{verbcode}
mosaicplot(stones, sort=3:1)
# Re-ordering the margins gives a more interpretable plot.
\end{verbcode}
Annotate the graph to show the success rates?
\item Observe that the overall rate is, for open surgery, biased
toward the open surgery outcome for large stones, while for
ultrasound it is biased toward the outcome for small stones.
What are the implications for the interpretation of these data?
\end{enumerate}
[Without additional information, the results are impossible to
interpret. Different surgeons will have preferred different surgery
types, and the prior condition of patients will have affected the
choice of surgery type. The consequences of unsuccessful surgery may
have been less serious than for ultrasound than for open surgery.]
\end{fmpage}
The relative success rates for the two different types of surgery,
for the two stone sizes separately, can be calculated thus:
<>=
stones[1,,]/(stones[1,,]+stones[2,,])
@ %
To perform the same calculation after adding over the two stone sizes
(the third dimension of the table), do
<>=
stones2 <- stones[, , 1] + stones[, , 2]
stones2[1, ]/(stones2[1, ]+stones2[2, ])
@ %
\subsection{Which multi-way table? It can be important!}
Each year the National Highway Traffic Safety Administration (NHTSA)
in the USA collects, using a random sampling method, data from all
police-reported crashes in which there is a harmful event (people or
property), and from which at least one vehicle is towed. The data
frame \texttt{nassCDS} (\textit{DAAGxtras}) is derived from NHTSA
data.\footnote{They hold a subset of the columns from a corrected
version of the data analyzed in the Meyer (2005) paper that is
referenced on the help page for \texttt{nassCDS}.
More complete data are available from one of the web pages\\
\url{http://www.stat.uga.edu/~mmeyer/airbags.htm} (SAS transport file)\\
or \url{http://www.maths.anu.edu.au/~johnm/datasets/airbags/} (R
image file).}
The data are a sample. The use of a complex sampling scheme has the
consequence that the sampling fraction differs between observations.
Each point has to be multiplied by the relevant sampling fraction, in
order to get a proper estimate of its contribution to the total number
of accidents. The column \texttt{weight} (\texttt{national} =
\textit{national inflation factor} in the SAS dataset) gives the
relevant multiplier.
Other variables than those included in \verb!nassCDS! might be
investigated -- those extracted into \verb!nassCDS! are enough for
present purposes.
The following uses \texttt{xtabs()} to estimate numbers of front seat
passengers alive and dead, classified by airbag use:
\begin{verbrun}
library(DAAGxtras)
> abtab <- xtabs(weight ~ dead + airbag, data=nassCDS)
> abtab
airbag
dead none airbag
alive 5445245.90 6622690.98
dead 39676.02 25919.11
\end{verbrun}
The function \texttt{prop.table()} can then be used to obtain the
proportions in margin 1, i.e., the proportions dead, according to
airbag use:
\begin{verbrun}
> round(prop.table(abtab, margin=2)["dead", ], 4)
none airbag
0.0072 0.0039
## Alternatively, the following gives proportions alive & dead
## round(prop.table(abtab, margin=2), 4)
\end{verbrun}
The above might suggest that the deployment of an airbag substantially
reduces the risk of mortality. Consider however:
\begin{verbrun}
> abSBtab <- xtabs(weight ~ dead + seatbelt + airbag, data=nassCDS)
> ## Take proportions, retain margins 2 & 3, i.e. airbag & seatbelt
> round(prop.table(abSBtab, margin=2:3)["dead", , ], 4)
seatbelt
airbag none belted
none 0.0176 0.0038
airbag 0.0155 0.0021
\end{verbrun}
The results are now much less favorable to airbags. The clue comes
from examination of:
\begin{verbrun}
> margin.table(AStab, margin=2:3) # Add over margin 1
airbag
seatbelt none airbag
none 1366088.6 885635.3
belted 4118833.4 5762974.8
\end{verbrun}
In the overall table, the results without airbags are mildly skewed
($~$4.12:1.37) to the results for \texttt{belted}, while with airbags
they are highly skewed ($~$57.6:8.86) to the results for \texttt{belted}.
\begin{fmpage}{36pc}
\exhead{2} Do an analysis that accounts, additionally, for estimated
force of impact (\texttt{dvcat}):
\begin{verbcode}
ASdvtab <- xtabs(weight ~ dead + seatbelt + airbag + dvcat,
data=nassCDS)
round(prop.table(ASdvtab, margin=2:4)["dead", , , ], 6)
## Alternative: compact, flattened version of the table
round(ftable(prop.table(ASdvtab, margin=2:4)["dead", , , ]), 6)
\end{verbcode}
It will be apparent that differences between \texttt{none}
and \texttt{airbag} are now below any reasonable threshold of
statistical detectability.
\end{fmpage}
\begin{fmpage}{36pc}
\exhead{3}
The package \textit{DAAGxtras} includes the function \texttt{excessRisk()}.
Run it with the default arguments, i.e.\ type
<>=
excessRisk()
@ %
Compare the output with that obtained in Exercise 2 when the classification
was a/c seatbelt (and airbag), and check that the output agrees.
Now do the following calculations, in turn:
\begin{enumerate}
\item Classify according to \texttt{dvcat} as well as \texttt{seatbelt}.
All you need do is add \texttt{dvcat} to the first argument to
\texttt{excessRisk()}. What is now the total number of excess deaths?\newline
[The categories are 0-9 kph, 10-24 kph, 25-39 kph, 40-54 kph, and
55+ kph]
\item Classify according to \texttt{dvcat}, \texttt{seatbelt}
and \texttt{frontal}, and repeat the calculations.
What is now the total number of excess deaths?
\end{enumerate}
Explain the dependence of the estimates of numbers of excess deaths
on the choice of factors for the classification.
\end{fmpage}
\textbf{Note:} \Citet{Farmer} argues that these data, tabulated as
above, have too many uncertainties and potential sources of bias to
give reliable results. He presents a different analysis, based on the
use of front seat passenger mortality as a standard against which to
compare driver mortality, and limited to cars without
passenger airbags. In the absence of any effect from airbags, the
ratio of driver mortality to passenger mortality should be the same,
irrespective of whether or not there was a driver airbag. In fact the
ratio of driver fatalities to passenger fatalities was 11\% lower in
the cars with driver airbags.
\section{Weighting Effects -- Example with a Continous Outcome}
% latex table generated in R 2.2.0 by xtable 1.2-5 package
% Tue Oct 4 11:58:57 2005
% latex table generated in R 2.2.1 by xtable 1.2-5 package
% Wed Feb 22 12:32:40 2006
\begin{table}[ht]
\begin{minipage}[c]{0.5\linewidth}
\begin{center}
\begin{tabular}{rrrrrr}
\hline
& min & mbac & mpl & fbac & fpl \\
\hline
2 & 10 & 1.76 & 1.76 & 2.18 & 2.55 \\
3 & 30 & 1.31 & 1.65 & 3.48 & 4.15 \\
4 & 50 & 0.05 & 0.67 & 3.13 & 3.66 \\
5 & 70 & $-$0.57 & $-$0.25 & 3.03 & 2.05 \\
6 & 90 & $-$1.26 & $-$0.50 & 2.08 & 0.61 \\
7 & 110 & $-$2.15 & $-$2.22 & 1.60 & 0.34 \\
8 & 130 & $-$1.65 & $-$2.18 & 1.38 & 0.67 \\
9 & 150 & $-$1.68 & $-$2.86 & 1.76 & 0.76 \\
10 & 170 & $-$1.68 & $-$3.23 & 1.06 & 0.39 \\
\hline
\end{tabular}
\end{center}
\end{minipage}
\hspace*{0.05\linewidth}
\begin{minipage}[c]{0.425\linewidth}
\caption{Data (average VAS pain scores) are from a trial that
investigated the effect of pentazocine on post-operative pain,
with (mbac and fbac) and without (mpl and fpl) preoperatively
administered baclofen. Data are in the data frame \texttt{gaba}
(\textit{DAAGxtras} package). Numbers of males and females on the
two treatments were:}\label{VAS}
\begin{center}
\begin{tabular}{rrr}
\hline
& baclofen & placebo \\
\hline
females & 15 & 7 \\
males & 3 & 16 \\
\hline
\end{tabular}
\end{center}
\end{minipage}
\end{table}
\begin{fmpage}{36pc}
\exhead{4} Table \ref{VAS}, shows data from the data frame
\texttt{gaba} (\textit{DAAGxtras}). For background, see the Gordon (1995)
paper that is referenced on the help page for \texttt{gaba}.
[Image files that hold the functions
\texttt{plotGaba()} and \texttt{compareGaba()} are in the
subdirectory \url{http://www.maths.anu.edu.au/~johnm/r/functions/}
]\\[3pt]
\end{fmpage}
\begin{fmpage}{36pc}
\exhead{4, continued}
\begin{enumerate}
\item What do you notice about the relative numbers on the two treatments?
\item For each treatment, obtain overall weighted averages at each time point,
using the numbers in Table \ref{VAS} as
weights. (These are the numbers you would get if you divided
the total over all patients on that treatment by the total number of
patients.) This will give columns \texttt{avbac} and \texttt{avplac}
that can be added the data frame.
\item Plot \texttt{avbac} and \texttt{avplac} against time, on the same
graph. On separate graphs, repeat the comparisons (a) for females alone
and (b) for males alone. Which of these graphs make a correct and
relevant comparison between baclofen and placebo (albeit both in the
presence of pentazocine)?
\end{enumerate}
\end{fmpage}
\vspace*{9pt}
\section{Extraction of \texttt{nassCDS}}
Here are details of the code used to extract these data.
\begin{verbatim}
nassCDS <- nass9702cor[,c("dvcat", "national", "dead", "airbag", "seatbelt",
"frontal", "male", "age.of.o", "yearacc")]
nassCDS$dead <- 2-nass_cds$dead # Ensures 0 = survive; 1 = dead
## Now make dead a factor
nassCDS$dead <- factor(c("alive", "dead")[nassCDS$dead+1])
names(nassCDS)[8] <- "ageOFocc"
names(nassCDS)[2] <- "weight"
table(nassCDS$seatbelt) # Check the values of seatbelt, & their order
## Now make seatbelt a factor. The first value, here 0, becomes "none"
## The second value, here 1, becomes "belted"
nassCDS$seatbelt <- factor(nassCDS$seatbelt, labels=c("none","belted"))
# NB labels (unlike levels) matches only the order, not the values
nassCDS$airbag <- factor(nassCDS$airbag, labels=c("none","airbag"))
\end{verbatim}