Tips for reading Big Data results correctly

A prominent MIT healthcare economist explains how Big Data can be misleading, and discusses methods for properly studying such information.

By Bill Siwicki

September 27, 2016

01:32 PM

MIT healthcare economist Joseph Doyle spends his time measuring how healthcare spending translates into outcomes. His goals are to identify value and waste in the $3 trillion U.S. healthcare system and to help create a higher-quality, more cost-effective approach to healthcare. Data drives Doyle’s empirical, evidence-based search for answers.

One area Doyle has been studying is whether hospitals that spend more money on healthcare achieve better outcomes. The answer is not easy to uncover, said Doyle, Erwin H. Schell professor of management and professor of applied economics at the MIT Sloan School of Management.

“There have been a lot of studies that look at the correlation between spending and outcomes, and the literature says hospitals that spend more do not have better outcomes – but I am discovering different things,” Doyle said. “This is because of a concern with data usage. There is a lot of interest in Big Data methods that get you robust correlations. But I argue in my work that these correlations can be misleading. In this case, where studies show that higher spending hospitals do not achieve greater outcomes, you cannot interpret these results as money wasted.”

In his research, which digs deeper into Big Data, Doyle has found that higher-spending hospitals do indeed achieve better health outcomes. He cautions healthcare executives and caregivers who use Big Data methods simply to find correlations not to “over-interpret” the results.

“One example is the relationship between spending on newborns and health outcomes,” he explained. “If you just look at the correlations, if you spend a lot on newborns, they have much higher mortality rates. And that is because this kind of spending is on sicker newborns. So it’s really difficult to look at correlations in spending as just good or bad, because we choose how much to spend based on how sick people are.”
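Doyle's point about sicker newborns can be illustrated with a small, made-up cohort. The Python sketch below is not drawn from his data: the group sizes, the baseline risks, and the assumption that high spending halves the risk of death are all hypothetical, chosen only to show how spending can help every patient while the raw comparison still makes it look harmful.

```python
# Hypothetical cohort of 1,000 newborns: (is_sick, high_spend, count).
# Every number here is invented for illustration; none comes from Doyle's research.
COHORT = [
    (True, True, 90),     # sick newborns mostly receive high spending
    (True, False, 10),
    (False, True, 45),    # healthy newborns mostly receive low spending
    (False, False, 855),
]


def death_risk(is_sick, high_spend):
    """Assumed risk of death: sickness raises it, high spending halves it."""
    base = 0.20 if is_sick else 0.01
    return base * 0.5 if high_spend else base


def mortality_rate(rows):
    """Expected deaths divided by the number of newborns in the given rows."""
    total = sum(count for _, _, count in rows)
    deaths = sum(count * death_risk(sick, spend) for sick, spend, count in rows)
    return deaths / total


def naive_comparison():
    """Mortality for high vs. low spenders, ignoring how sick the newborns are."""
    high = mortality_rate([r for r in COHORT if r[1]])
    low = mortality_rate([r for r in COHORT if not r[1]])
    return high, low


def stratified_comparison(is_sick):
    """The same comparison restricted to newborns of similar health."""
    rows = [r for r in COHORT if r[0] == is_sick]
    high = mortality_rate([r for r in rows if r[1]])
    low = mortality_rate([r for r in rows if not r[1]])
    return high, low
```

With these invented numbers, the naive comparison shows roughly 6.8 percent mortality among high-spending cases against about 1.2 percent among low-spending ones. Within each health group, though, the high-spending newborns fare better: 10 percent against 20 percent for the sick, and 0.5 percent against 1 percent for the healthy.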

Healthcare executives who use Big Data must apply strict research designs, Doyle added.

“Take newborns: If a newborn is just below or above a 1,500 gram weight threshold, that can determine how much treatment is delivered, more if the weight is below 1,500 grams,” he explained. “So if you zoom in on the data and compare newborns weighing 1,499 grams with newborns weighing 1,501 grams, you find a 10 percent treatment difference, and the newborn actually just below the 1,500 gram threshold has a better survival rate. And when you compare the change in treatment to the change in survival, you can attribute the change in survival to the change in treatment.”
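The comparison Doyle describes can be sketched in a few lines of Python. This is a crude version of what economists commonly call a regression discontinuity design, not a reproduction of his analysis: it simply compares average treatment and average survival for newborns within a narrow window on either side of the 1,500 gram threshold. The function name, the record layout, the bandwidth parameter and the sample records are all assumptions made for illustration.

```python
from statistics import mean

THRESHOLD_GRAMS = 1500  # the weight threshold mentioned in the article


def threshold_comparison(records, bandwidth_grams):
    """Compare newborns just below and just above the weight threshold.

    records: iterable of (weight_grams, treatment_units, survived) tuples,
             where survived is 1 or 0. This layout is an assumption.
    Returns (treatment_gap, survival_gap), each computed as the mean for
    the below-threshold group minus the mean for the above-threshold group.
    """
    lower = THRESHOLD_GRAMS - bandwidth_grams
    upper = THRESHOLD_GRAMS + bandwidth_grams
    below = [r for r in records if lower <= r[0] < THRESHOLD_GRAMS]
    above = [r for r in records if THRESHOLD_GRAMS <= r[0] <= upper]
    if not below or not above:
        raise ValueError("no newborns on one side of the threshold")
    treatment_gap = mean(r[1] for r in below) - mean(r[1] for r in above)
    survival_gap = mean(r[2] for r in below) - mean(r[2] for r in above)
    return treatment_gap, survival_gap


# Invented records for illustration only; they are not real patient data.
SAMPLE_RECORDS = [
    (1496, 11.0, 0), (1497, 11.0, 1), (1498, 11.0, 1), (1499, 11.0, 1),
    (1501, 10.0, 1), (1502, 10.0, 0), (1503, 10.0, 1), (1504, 10.0, 0),
    (1200, 20.0, 0), (1800, 5.0, 1),  # far from the threshold, excluded below
]

treatment_gap, survival_gap = threshold_comparison(SAMPLE_RECORDS, bandwidth_grams=5)
```

The narrow bandwidth is the design choice that matters. Newborns weighing 1,200 and 1,800 grams differ in many ways besides treatment, so they are left out, and only near-identical patients on either side of the threshold are compared.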

Doyle said he is always concerned when healthcare executives compare treatment differences, because the bottom line is that patients differ, and that can confound the analysis.

“But in this instance, with the newborns so close in weight, the patients are similar,” he said. “With Big Data, it’s important to be able to zoom right into the data, and in this instance you can.”