“However noble the goal, research findings should be reported accurately. Distortion of results often occurs not in the data presented but . . . in the abstract, discussion, secondary literature and press releases. Such distortion can lead to unsupported beliefs about what works for obesity treatment and prevention. Such unsupported beliefs may in turn adversely affect future research efforts and the decisions of lawmakers, clinicians and public health leaders.”

January 7, 2018

David Allison points us to this article by Bryan McComb, Alexis Frazier-Wood, John Dawson, and himself, “Drawing conclusions from within-group comparisons and selected subsets of data leads to unsubstantiated conclusions.” It’s a letter to the editor for the Australian and New Zealand Journal of Public Health, and it begins:

[In the paper, “School-based systems change for obesity prevention in adolescents: Outcomes of the Australian Capital Territory ‘It’s Your Move!’”] Malakellis et al. conducted an ambitious quasi-experimental evaluation of “multiple initiatives at [the] individual, community, and school policy level to support healthier nutrition and physical activity” among children.1 In the Abstract they concluded, “There was some evidence of effectiveness of the systems approach to preventing obesity among adolescents” and cited implications for public health as follows: “These findings demonstrate that the use of systems methods can be effective on a small scale.” Given the importance of reducing childhood obesity, news of an effective program is welcome. Unfortunately, the data and analyses do not support the conclusions.

And it continues with the following sections:

Why within-group testing is misleading

Malakellis et al. reported a “significant decrease in the prevalence of overweight/obesity within the pooled intervention group (p<0.05) but not the pooled comparison group (NS) (Figure 2)”. This kind of analysis, known as differences in nominal significance (DINS) analysis, is “invalid, producing conclusions which are, potentially, highly misleading”. . . .

Why drawing conclusions from subsets of data selected on the basis of observed results is misleading

Ideally, all analyses would be clearly described as having been specified a priori or not, so that readers can best interpret the data. Despite reporting no significance for the overall association, Malakellis et al. highlighted the results of the subgroup analyses as a general effect overall. Further complicating matters, the total number of subgroup analyses was unclear. It is also uncertain whether the analyses were planned a priori or after the data were collected and viewed. . . . Other problems arise when subgroup analyses are unrestricted, which is a multiple comparisons issue. . . .

Spin can distort the scientific record and mislead the public

Although Malakellis et al. may have presented their data accurately, by including statements of effectiveness based on a within-group test instead of relying on the proper between-group test, the article did not represent the findings accurately. The goal of reducing childhood obesity is a noble one. . . . However noble the goal, research findings should be reported accurately. Distortion of results often occurs not in the data presented but, as in the current article, in the abstract, discussion, secondary literature and press releases. Such distortion can lead to unsupported beliefs about what works for obesity treatment and prevention. Such unsupported beliefs may in turn adversely affect future research efforts and the decisions of lawmakers, clinicians and public health leaders.

They conclude:

Considering the importance of providing both the scientific community and the public with accurate information to support policy decisions and future research, erroneous conclusions reported in the literature should be corrected. The stated conclusions of the article in question were not substantiated by the data and should be corrected.

Well put. The problems identified by McComb et al. should be familiar to regular readers of this blog, as they include the principle that the difference between “significant” and “not significant” is not itself statistically significant, the garden of forking paths, and story time.
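To see why the “significant in one group but not the other” pattern proves so little, here is a minimal simulation sketch in Python (the numbers are hypothetical and chosen for illustration, not taken from the study): two groups with identical true changes, each tested within-group against zero, repeated many times.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, true_change, sd, n_sims = 50, -0.4, 2.0, 10_000

dins_pattern = 0
for _ in range(n_sims):
    a = rng.normal(true_change, sd, n)  # "intervention" change scores
    b = rng.normal(true_change, sd, n)  # "comparison" change scores, same true mean
    p_a = stats.ttest_1samp(a, 0).pvalue
    p_b = stats.ttest_1samp(b, 0).pvalue
    # the DINS pattern: one group crosses p < 0.05, the other does not
    dins_pattern += (p_a < 0.05) != (p_b < 0.05)

print(f"DINS pattern in {dins_pattern / n_sims:.0%} of simulations")

With these settings the pattern shows up in a large share of the simulated datasets even though the two groups are identical by construction, which is why only the direct between-group comparison (or interaction test) answers the question being asked.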

I particularly like this bit: “However noble the goal, research findings should be reported accurately.” That was one of the things that got tangled in discussions we’ve had of various low-quality psychology research. The research has noble goals. But I don’t think those goals are served by over-claiming and then minimizing the problems with those claims. You really have to go back to first principles. If the published research is wrong, it’s good to know that. And if the published research is weak, it’s good to know that too: it’s the nature of claims supported by weak evidence that they often don’t replicate.

Allison also pointed me to the authors’ response to their letter. The authors of the original paper are Mary Malakellis, Erin Hoare, Andrew Sanigorski, Nicholas Crooks, Steven Allender, Melanie Nichols, Boyd Swinburn, Cal Chikwendu, Paul Kelly, Solveig Petersen, and Lynne Millar, and they write:

The paper describes one of the first attempts to evaluate an obesity prevention intervention that was informed by systems thinking and deliberately addressed the complexity within each school setting. A quasi-experimental design was adopted, and the intervention design included the facility for each school to choose and adopt interventions that were specific to their school context and priorities. This, in turn, meant the expectation of differential behavioural effects was part of the initial design and therefore a comparison of outcomes by intervention school was warranted. . . . Because of the unique and adaptive nature of intervention within each school, and the different intervention priority in each school, there was an a priori expectation of differential results and we therefore investigated and reported within-school changes.

This is fine. Interactions are important. You just have to recognize that estimates of interactions will be more variable than estimates of main effects, thus you can pretty much forget about establishing “statistical significance” or near-certainty about particular interactions.
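To see how much noisier, here is some back-of-the-envelope standard-error algebra in Python (purely illustrative assumptions: two equal-sized subgroups with independent treatment-effect estimates, not a reanalysis of the study):

import numpy as np

s = 1.0  # assumed standard error of the treatment effect within each half of the sample

# Main effect: average the two half-sample estimates; the SE shrinks by sqrt(2).
se_main = np.sqrt(s**2 + s**2) / 2        # = s / sqrt(2)

# Interaction: difference of the two half-sample estimates; the SEs add in quadrature.
se_interaction = np.sqrt(s**2 + s**2)     # = s * sqrt(2)

print(se_interaction / se_main)  # 2.0

The interaction estimate is twice as noisy as the main effect, so you would need roughly four times the sample size to estimate it with the same precision, which is one more reason to be wary of school-by-school claims from a handful of schools.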

Malakellis et al. continue:

Our conclusion used qualifying statements that there was “some evidence” of within-school changes but no interaction effect, and that the findings were “limited”.

Fair enough—if that’s what they really did.

Let’s check, going back to the original article. Here’s the abstract, in its entirety:

OBJECTIVE: The Australian Capital Territory ‘It’s Your Move!’ (ACT-IYM) was a three-year (2012-2014) systems intervention to prevent obesity among adolescents.

METHODS: The ACT-IYM project involved three intervention schools and three comparison schools and targeted secondary students aged 12-16 years. The intervention consisted of multiple initiatives at individual, community, and school policy level to support healthier nutrition and physical activity. Intervention school-specific objectives related to increasing active transport, increasing time spent physically active at school, and supporting mental wellbeing. Data were collected in 2012 and 2014 from 656 students. Anthropometric data were objectively measured and behavioural data self-reported.

RESULTS: Proportions of overweight or obesity were similar over time within the intervention (24.5% baseline and 22.8% follow-up) and comparison groups (31.8% baseline and 30.6% follow-up). Within schools, two of the three intervention schools showed a significant decrease in the prevalence of overweight and obesity (p<0.05).

CONCLUSIONS: There was some evidence of effectiveness of the systems approach to preventing obesity among adolescents. Implications for public health: The incorporation of systems thinking has been touted as the next stage in obesity prevention and public health more broadly. These findings demonstrate that the use of systems methods can be effective on a small scale.

After reading this, I’ll have to say, No, they did not sufficiently qualify their claims. Yes, their Results section clearly indicates that the treatment and comparison groups were not comparable and that there was no apparent main effect. But it’s inappropriate to pick out some subset of comparisons and label them as “p<0.05.” Multiple comparisons is real. My concern here is not “Type 1 errors” or “Type 2 errors” or “false rejections” or “retaining the null hypothesis.” My concern here is that from noisy data you’ll be able to see patterns, and there’s no reason to believe that these noisy patterns tell us anything beyond the people and measurements in this particular dataset.
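To put a rough number on the multiple-comparisons point, here is one more small sketch (again with hypothetical settings, not the study’s data): several within-school tests with no true effect anywhere, repeated many times.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_schools, n_per_school, n_sims = 6, 100, 10_000

at_least_one = 0
for _ in range(n_sims):
    # within-school change scores with true mean zero in every school
    pvals = [stats.ttest_1samp(rng.normal(0.0, 1.0, n_per_school), 0).pvalue
             for _ in range(n_schools)]
    at_least_one += min(pvals) < 0.05

print(f"At least one 'significant' school in {at_least_one / n_sims:.0%} of simulations")

With six independent tests at the 0.05 level and no true effects at all, “at least one significant school” turns up in roughly a quarter of the simulations (1 − 0.95^6 ≈ 0.26). Noisy data will hand you patterns like this for free.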

And then in the conclusions, yes, they say “some evidence.” But then consider the final sentence of the abstract, which for convenience I’ll repeat here:

These findings demonstrate that the use of systems methods can be effective on a small scale.

No no NO NO NOOOOOOO!

I mean, sure, they got excited when they were writing their article and this sentence slipped in. Too bad, but such things happen. But then they were lucky enough to receive thoughtful comments from McComb et al., and this was their chance to re-evaluate, to take stock of the situation and correct their errors, if for no other reason than to help future researchers not be led astray. And did they do so? No, they didn’t. Instead they muddied the waters and concluded their response with, “While we grapple with intervention and evaluation of systems approaches to prevention, we are forced to use the methods available to us which are mainly based on very linear models.” Which completely misses the point that they overstated their results and made a claim not supported by their data. As McComb et al. put it, “The stated conclusions of the article in question were not substantiated by the data and should be corrected.” And the authors of the original paper, given the opportunity to make this correction, did not do so. This behavior does not surprise me, but it still makes me unhappy.

Who cares?

What’s the point here? A suboptimal statistical analysis and misleading summary appeared in an obscure journal published halfway around the world? (OK, not so obscure; I published there once.) That seems to fall into the “Someone is wrong on the internet” category.

No, my point is not to pick on some hapless authors of a paper in the Australian and New Zealand Journal of Public Health. I needed to check the original paper to make sure McComb et al. got it right, that’s all.

My point in sharing this story is to foreground this quote from McComb et al.:

However noble the goal, research findings should be reported accurately. Distortion of results often occurs not in the data presented but, as in the current article, in the abstract, discussion, secondary literature and press releases. Such distortion can lead to unsupported beliefs about what works for obesity treatment and prevention. Such unsupported beliefs may in turn adversely affect future research efforts and the decisions of lawmakers, clinicians and public health leaders.

This is a message of general importance. It seems to be pretty hopeless to get researchers to correct the errors they’ve made in published papers, but maybe this message will get out there to students and new researchers who can do better in the future.

???

Really, what’s up with people? Everyone was a student, once. And as a student you make mistakes: mistakes in class, mistakes on your homework, etc. What makes people think that, suddenly, once they graduate and have a job, they can’t make serious mistakes in their work? What makes people think that, just because a paper has their name on it and happens to be published somewhere, it can’t have a serious mistake? The whole thing frankly baffles me. I make mistakes, I put my work out there, and people point out errors that I’ve made. Why do so many researchers have problems doing the same? It’s baffling. I mean, sure, I guess I understand from a psychological perspective: people have their self-image, they can feel they have a lot to lose by admitting error, etc. But from a logical perspective, it makes no sense at all.
