**Stephen Senn**

*Consultant Statistician*

*Edinburgh*

**Introduction**

In a **previous post** I considered Lord’s paradox from the perspective of the ‘Rothamsted School’ and its approach to the analysis of experiments. I now illustrate this approach in some detail with an example.

**What I shall do**

I have simulated data from an experiment in which two diets have been compared in 20 student halls of residence, each diet having been applied to 10 halls. I shall assume that the halls have been randomly allocated the diet and that in each hall 10 students have been randomly chosen to have their weights recorded at the beginning of the academic year and again at the end.
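The simulated dataset itself is supplied in the workbook linked in the post. As a rough sketch, the design can be mimicked in a few lines; every numerical parameter here (means, variances, the hall-level effect, the diet effect) is an illustrative assumption, not the actual generating model:

```python
import random

random.seed(1)  # for reproducibility

def simulate_experiment():
    """Mimic the design: 20 halls, diets A/B randomised to 10 halls each,
    10 students weighed per hall at the start and end of the year."""
    halls = list(range(1, 21))
    random.shuffle(halls)                      # random allocation of diets to halls
    diet_of = {h: ("A" if i < 10 else "B") for i, h in enumerate(halls)}
    rows = []                                  # (student_id, hall, diet, base, weight)
    student_id = 0
    for hall in range(1, 21):
        hall_effect = random.gauss(0, 2)       # assumed variation at hall level
        for _ in range(10):
            student_id += 1
            base = random.gauss(75, 8)         # assumed baseline weight (kg)
            effect = 2.7 if diet_of[hall] == "B" else 0.0  # assumed diet effect
            weight = (20 + 0.57 * base + hall_effect + effect
                      + random.gauss(0, 2))    # assumed within-hall noise
            rows.append((student_id, hall, diet_of[hall], base, weight))
    return rows

data = simulate_experiment()
```

Random allocation of diets to whole halls, and a shared hall effect for every student in a hall, are the two features of the design that matter for what follows.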

I shall then compare two approaches to analysing these data and invite the reader to consider which is correct.

I shall then discuss (briefly) what happens to these approaches to analysis when the problem is changed so that we now have not 10 halls with 10 students each per diet but one hall per diet with 100 students each. This is the Lord’s paradox problem (Lord, F. M., 1967) in the form proposed by Wainer and Brown (Wainer, H. & Brown, L. M., 2004). We shall see that one of the philosophies of analysis indicates that this more difficult case cannot be analysed. The other produces an analysis that has previously been proposed as the right one for Lord’s paradox. I shall then consider what changes are necessary (if any) if we have an observational rather than an experimental set-up.

The data are saved in an Excel workbook **here**. The first sheet (Experiment_1) gives weights at the beginning and the end of the academic year for each student, as well as the hall they were in (numbered 1 to 20), the diet they were given (A or B) and a unique student identification number (1 to 200). The second sheet (Summary_1) consists of mean weights per hall, averaged over the students enrolled in that hall and included in the study.

**The approach to analysis**

I shall use Genstat’s approach to analysing designed experiments. This is based on John Nelder’s theory of 1965 (Nelder, J. A., 1965a, 1965b) and declares block structure and treatment structure separately. The analyses will only differ as regards the block structure declared, although in one case I can produce an identical analysis using the so-called summary measures approach.

**The first analysis**

The code looks like this:

```
BLOCKSTRUCTURE Hall/Student
TREATMENTSTRUCTURE Diet
COVARIATE Base
ANOVA [PRINT=aovtable,effects,covariates; FACT=1; FPROB=yes] Weight
```

The first statement defines the block structure: students are ‘nested’ within halls, which is written Hall/Student. The second states that the (putative) causal factor, the treatment, is Diet. The third declares a covariate, Base, the baseline weight, to be taken into account, and the fourth says that the outcome variable is Weight, that is to say the weight at the end of the experiment.

The analysis that is produced is now given in Figure 1.

*Figure 1 Analysis of the diet data respecting the block structure*

I have highlighted the hall stratum and the diet term. Note that there are two residual terms. The first appears in the hall stratum and the second in the students within-halls stratum. Since the diet given is varied between halls but not within, only the former is relevant for judging the effect of diet.

The term v.r. stands for variance ratio. It is the ratio of the mean square (m.s.) for diet (367.3) to the variation term that matters (29.9), the residual for the hall stratum. The ratio is 12.3, so the analysis tells us that the variation between diets is about 12 times what would be expected from the variation between halls given the same diet.
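The arithmetic is simple to check, taking the two mean squares quoted above:

```python
# Variance ratio = diet mean square / hall-stratum residual mean square
diet_ms = 367.3           # m.s. for diet (from Figure 1)
hall_residual_ms = 29.9   # residual m.s. in the hall stratum
variance_ratio = diet_ms / hall_residual_ms
print(round(variance_ratio, 1))  # 12.3
```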

Note also that there are two covariate terms: one between halls and one between students-within-halls. The latter, like the within-halls residual, is irrelevant to any analysis of the effect of diet.

Now consider a second, equivalent analysis. This uses only the average baseline and outcome per hall; in other words, it is based on 20 pairs of values (baseline and outcome), not 200. This analysis produces the table in Figure 2. Note that the result is exactly as before, showing the irrelevance of the variances and covariances within halls. That is to say, although the mean squares change, because they are now based on averages of 10 students per hall, the ratio of the term for treatment to its residual is the same and so are all inferences. The equivalence of summary-measures approaches to more complex models for certain balanced cases is well known (Senn, S. J. et al., 2000).

Note that for both of these equivalent analyses the residual degrees of freedom are 17: there are 20 halls and one degree of freedom has been used for each of grand mean, covariate and treatment, leaving 17. The variation between diets is judged by the variation between halls.

*Figure 2 Summary measures analysis respecting block structure*
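The summary-measures estimate itself is easy to compute directly. A minimal sketch, using invented hall means purely to exercise the function (the real 20 hall means live in the Summary_1 sheet): the covariate-adjusted diet difference is the difference in outcome means minus the pooled within-group slope times the difference in baseline means.

```python
def ancova_effect(base, outcome, diet):
    """Adjusted B - A difference from a one-covariate analysis of covariance:
    (ybar_B - ybar_A) - beta * (xbar_B - xbar_A), with beta the pooled
    within-group slope of outcome on baseline."""
    def group_stats(g):
        xs = [x for x, d in zip(base, diet) if d == g]
        ys = [y for y, d in zip(outcome, diet) if d == g]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        return mx, my, sxx, sxy
    mxa, mya, sxxa, sxya = group_stats("A")
    mxb, myb, sxxb, sxyb = group_stats("B")
    beta = (sxya + sxyb) / (sxxa + sxxb)   # pooled within-group slope
    return (myb - mya) - beta * (mxb - mxa)

# Made-up hall means, four per diet, purely for illustration:
base    = [70.0, 72.0, 74.0, 76.0, 71.0, 73.0, 75.0, 77.0]
outcome = [71.0, 73.0, 75.0, 77.0, 74.7, 76.7, 78.7, 80.7]
diet    = ["A"] * 4 + ["B"] * 4
effect = ancova_effect(base, outcome, diet)
```

With these invented numbers the pooled slope is 1 and the adjusted difference comes out at 2.7.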

**The second analysis**

This uses a different block structure. We now ignore the fact that the students are in different halls. The code becomes

```
BLOCKSTRUCTURE Student
TREATMENTSTRUCTURE Diet
COVARIATE Base
ANOVA [PRINT=aovtable,effects,covariates; FACT=1; FPROB=yes] Weight
```

and the output is as given in Figure 3.

*Figure 3 Analysis ignoring the block structure*

Note that the residual used to judge the effect of diet is now based on 197 degrees of freedom and is less than a quarter of what it was before (6.3 as opposed to 29.9). The numerator of the variance ratio is somewhat similar to what it was before (a different covariate term has been used for adjustment, so there is some difference), but the variance ratio is now five times what it was. The result looks much more impressive.
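Why the second analysis is too optimistic can be sketched numerically. In the toy simulation below (all parameters assumed, and a plain difference of means rather than the covariate-adjusted analysis, though the same logic applies), outcomes share hall-level variation, and a standard error computed from the 200 individual students comes out noticeably smaller than one computed from the 20 hall means, even though only the latter respects the design:

```python
import random, statistics

random.seed(42)

halls, students = [], []
for h in range(20):
    diet = "A" if h < 10 else "B"            # 10 halls per diet
    hall_effect = random.gauss(0, 2)         # assumed hall-level variation
    ys = [random.gauss(70 + hall_effect, 3) for _ in range(10)]
    halls.append((diet, statistics.mean(ys)))
    students.extend((diet, y) for y in ys)

def se_of_difference(pairs):
    """Standard error of a B - A mean difference using a pooled variance."""
    a = [y for g, y in pairs if g == "A"]
    b = [y for g, y in pairs if g == "B"]
    pooled = (((len(a) - 1) * statistics.variance(a)
               + (len(b) - 1) * statistics.variance(b))
              / (len(a) + len(b) - 2))
    return (pooled * (1 / len(a) + 1 / len(b))) ** 0.5

naive_se = se_of_difference(students)   # 200 students, halls ignored
cluster_se = se_of_difference(halls)    # 20 hall means, design respected
```

The naive standard error treats the 200 students as independent and so understates the uncertainty whenever there is genuine variation at the hall level.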

**Which analysis is right?**

A long tradition says that the first analysis is right and the second is wrong. In a clinical context, the experiment has a cluster randomised form. The regulators, the EMA and the FDA, will not let drug sponsors analyse cluster randomised trials as if they were parallel group trials but this is what the second analysis will do.

“No causality in, no causality out” is a common slogan but the actual intervention here did not take place independently at the level of students but at the level of halls. It is this variation (between halls) that should be used to judge treatment. Speaking practically, the halls may be situated at different distances from lecture theatres on campus so that exercise effects may be different. Some may be closer to food shops and so forth. One can imagine many effects independent of the diet offered that would vary at the level of hall but not at the level of students within halls.

**But isn’t this a red herring?**

I have considered a randomised experiment involving many halls. It differs from the situation of the paradox in two respects. First, there were only two halls and second the diet was not randomised. We can summarise these differences as ‘two not many’ and ‘observational not experimental’. I consider these in turn.

**Two not many**

There are only two halls in the Lord’s paradox case. This means that the first analysis is impossible and only the second is possible, an analysis that has previously been proposed as being right for Lord’s paradox. You cannot estimate the relevant variances and covariances for the first approach if you only have two halls. (See my original blog on the subject.) I have no objection to analysts defending the second approach on the grounds that it is all that can be done if an analysis is to be done at all. In fact, I have even given this analysis some (lukewarm) support in the past (Senn, S. J., 2006). However, two points are important. First, it should be recognised that a third choice is being overlooked: that of saying that the data are simply too ambiguous to offer any analysis. Second, it should be made explicit that the analysis is valid only on the assumption that there are no between-hall variance and covariance components above those seen within halls. It should be made clear that this is a strong but untestable assumption.

**Observational not experimental**

I can think of no valid reason why the second analysis should become valid for an observational set-up if it was not valid for the experimental one. I can imagine the reverse being the case, but to claim that an invalid analysis of an experiment would suddenly become valid if only it had not been randomised, despite the fact that no different or further data of any kind were available, strikes me as a very unpromising line of defence. Thus, I consider this the real red herring.

**Does it make a difference?**

In this case the between-halls regression is very similar to the within-halls regression: the slope in the first case is 0.57 and in the second 0.55. Furthermore, the means at baseline for the two diets are also very similar: 75.1kg and 76.1kg. This means that the estimates of the diet effect (B compared with A) are nearly identical: 2.7kg versus 2.8kg. The situation is illustrated in Figure 4, the estimated treatment effect being the difference between the corresponding pair of parallel lines.

*Figure 4 Two analyses of covariance. Red = Diet A, Black = Diet B. Open circles and dashed sloping lines, students. Closed circles and solid sloping lines, halls. Vertical dashed lines indicate mean weights per diet at baseline.*

It might be concluded that the distinction is irrelevant. Such a conclusion would be false. Even in this case, where the estimates scarcely differ, the standard errors of the estimates are radically different. For the between-halls analysis the standard error is 0.78kg; for the within-halls analysis it is 0.36kg. The relative evidential weight of the two, say for updating a prior distribution to a posterior for a Bayesian, or for combining with other evidence in a meta-analysis, is the ratio of the reciprocals of the squared standard errors, that is to say 4.7. The analysis based on students rather than halls thus overstates the evidence considerably.
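The 4.7 quoted is just the ratio of precisions (reciprocal squared standard errors):

```python
se_between = 0.78  # kg, between-halls analysis
se_within = 0.36   # kg, student-level analysis
precision_ratio = (1 / se_within ** 2) / (1 / se_between ** 2)
print(round(precision_ratio, 1))  # 4.7
```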

**In summary**

I maintain that thinking carefully about block structure and treatment structure, as John Nelder taught us to do, is the right way to think about experiments. I also think, *mutatis mutandis*, that it can help in thinking about some causal questions in an observational set-up. Variation can occur at many levels and this is as true of observational studies as it is of experimental ones. In making this claim it is not my intention to detract from other powerful approaches. It can be helpful to have many tools to attack such problems.

**References**

Lord, F. M. (1967). A paradox in the interpretation of group comparisons. *Psychological Bulletin*, **68**, 304-305

Nelder, J. A. (1965a). The analysis of randomised experiments with orthogonal block structure I. Block structure and the null analysis of variance. *Proceedings of the Royal Society of London. Series A*, **283**, 147-162

Nelder, J. A. (1965b). The analysis of randomised experiments with orthogonal block structure II. Treatment structure and the general analysis of variance. *Proceedings of the Royal Society of London. Series A*, **283**, 163-178

Senn, S. J. (2006). Change from baseline and analysis of covariance revisited. *Statistics in Medicine*, **25**(24), 4334-4344

Senn, S. J., Stevens, L., & Chaturvedi, N. (2000). Repeated measures in clinical trials: simple strategies for analysis using summary measures. *Statistics in Medicine*, **19**(6), 861-877

Wainer, H., & Brown, L. M. (2004). Two statistical paradoxes in the interpretation of group differences: Illustrated with medical school admission and licensing data. *The American Statistician*, **58**(2), 117-123