Turning an economist’s eye to having a baby.
When it comes to having a baby, everyone has an opinion. Economists certainly love telling people what they think, and pregnant economists might be the worst of all. American economist Emily Oster did her best to set the economic cat among the medical pigeons in 2013 with the publication of Expecting Better: Why the Conventional Pregnancy Wisdom is Wrong – and What You Really Need to Know.
Confused by the advice she received about healthy living during her pregnancy, Oster did what economists often do and set about summarising existing studies about a wide range of pregnancy issues to give the ‘bottom line’, as an economist might see it, from the available data. Oster covered issues of interest to many pregnant women such as intake of caffeine, alcohol, tobacco, the risk of infection with listeria and toxoplasmosis, as well as prenatal screening and testing, drug safety, bed rest, induction of labour and pain relief for birth. Her book also usefully raised issues central to economics around ‘trade-offs’ and the ‘continuum of risk’. Every pregnant woman thinks differently – each might make different decisions, and each will have different tolerances to caffeine, alcohol and so on.
Oster’s book attracted considerable criticism. An economist writing about pregnancy? People often presume economics deals exclusively with money and investments. In reality, other than having an understanding of the advantages of investment in index funds, most economists know very little about the share market. Economics is really about the choices people make, given that we cannot have everything we want. It is about trade-offs and knowing what you are giving up in order to get something else. Everything has a cost and that can also mean forgone opportunities. Economists usually believe ‘opportunity cost’ guides rational decision-making in every sphere of life and they are trained to get the data and then use those data in conjunction with risks, benefits and decision theory to make choices. Economists are data driven. So, in examining how economists think about data, causation, correlation, what makes a good study, trade-offs and the risk continuum, Oster’s book made a useful contribution to both economics and medicine.
Not everyone will make the same choices faced with the same data. But, as Oster acknowledges, it can be difficult even to obtain data about issues in pregnancy. This is not only because it is difficult to tease out cause and effect, but also because there are almost insurmountable legal and ethical issues that make further research into some areas of pregnancy incredibly challenging. There are some areas where it is unlikely clinical trials will ever be run.
The economy, like the body, is a complex system in which it is very difficult to measure cause and effect. There are great amounts of data and many statistical techniques are used to try to isolate the impact of one variable on another. In economics, there are not normally real experimental data with a ‘control’ group. There are some experimental data in microeconomics (the study of individual units and business decisions), but there is no control group in macroeconomics (the ‘business cycle’). There are natural experiments – things that happen in the world are assumed to be exogenous – and we can compare before and after. There are also empirical studies that use statistical evidence to try to tease out the individual effects of different variables by holding other things constant. However, in the wake of the global financial crisis (GFC), there has been robust debate in economic circles around the statistical techniques employed.
Economists are examining these data issues and are making suggestions for the future. Perhaps there are empirical lessons different disciplines can learn from each other. When we are dealing with complex systems, such as the body and the economy, it may be time to reweight the hierarchy of evidence and consider other forms of evidence.
Data, causation and correlation
The hierarchy of medical evidence is the major construct of evidence-based medicine (EBM). Randomised controlled trials (RCTs) and meta-analyses (MAs) sit at the pinnacle and are often seen as trumping other forms of evidence. Observational studies, case reports, personal experience and physiological considerations usually sit well below. Higher levels of evidence trump lower levels.
With most pregnancy research there is a very clear problem with causality. Caffeine provides a good example of this and Oster includes an interesting discussion of caffeine consumption in pregnancy in her book. The perceived concern with having too much caffeine in pregnancy is that it might cause miscarriage in early pregnancy. Oster found that all of the studies suggested that up to 200 mg of caffeine per day is safe and that there is no increased risk of miscarriage up to that level. When she examined studies of women who reported much higher levels of caffeine consumption, she found the evidence suggested there was an increased risk of miscarriage. The evidence for the ‘middle range’ of caffeine consumption was mixed.
Oster points out that in pregnancy it is difficult to establish causality, and she suggests women who drink a lot of coffee are probably different from those who do not. The data suggest women who drink coffee tend to be older and that cannot be controlled for. She also points out a problem with nausea. Nausea in early pregnancy tends to be a good sign: women who are nauseous are less likely to miscarry. However, women who are nauseous are also more likely to avoid coffee. So, women who drink a lot of coffee are also women who, on average, are less nauseous. When it is then observed that they miscarry at higher rates, it may well just be that not being nauseous is a sign of miscarriage, not that the coffee caused the miscarriage. Trying to understand the impact of caffeine consumption on pregnancy can lead to ‘spurious correlations’ being drawn. It might be that the thing you cannot measure or have not measured is actually the underlying causal variable.
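The nausea argument above can be illustrated with a small simulation. This is a minimal sketch under invented assumptions, not a model of any real data set: the probabilities below are hypothetical, and coffee is deliberately given no causal effect on miscarriage at all. Nausea alone lowers both coffee drinking and miscarriage risk.

```python
import random

def simulate(n=200_000, seed=0):
    """Simulate pregnancies in which coffee has NO causal effect on miscarriage.

    All probabilities are hypothetical, chosen only to illustrate confounding.
    Returns counts[(nauseous, heavy_coffee)] = [pregnancies, miscarriages].
    """
    rng = random.Random(seed)
    counts = {(a, b): [0, 0] for a in (True, False) for b in (True, False)}
    for _ in range(n):
        nauseous = rng.random() < 0.6
        # Nauseous women are assumed to be less likely to drink a lot of coffee.
        heavy_coffee = rng.random() < (0.1 if nauseous else 0.5)
        # Miscarriage risk depends on nausea only, never on coffee.
        miscarriage = rng.random() < (0.10 if nauseous else 0.25)
        cell = counts[(nauseous, heavy_coffee)]
        cell[0] += 1
        cell[1] += miscarriage
    return counts

def rate(counts, keys):
    """Miscarriage rate pooled over the given (nauseous, heavy_coffee) cells."""
    total = sum(counts[k][0] for k in keys)
    events = sum(counts[k][1] for k in keys)
    return events / total

if __name__ == "__main__":
    c = simulate()
    print("heavy coffee, all women:", rate(c, [(True, True), (False, True)]))
    print("little coffee, all women:", rate(c, [(True, False), (False, False)]))
    print("nauseous only, heavy vs little:",
          rate(c, [(True, True)]), rate(c, [(True, False)]))
    print("not nauseous, heavy vs little:",
          rate(c, [(False, True)]), rate(c, [(False, False)]))
```

In the pooled comparison, heavy coffee drinkers miscarry noticeably more often, yet within each nausea group the two rates are essentially the same. The apparent coffee effect comes entirely from the unmeasured variable.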
In her book, Oster usefully considers the types of evidence that are used for pregnancy research. Observational studies could be used: a group of people could be asked retrospective questions about their behaviour during pregnancy. This, however, is not ideal. For example, if they have learned that certain behaviours during pregnancy are not socially acceptable, they might lie, or unintentionally ‘misremember’ their past behaviour. Oster is clearly of the EBM school and places randomised data on a pedestal in order to try to tease out causation. There might be a number of areas of pregnancy where there are randomised data. However, such studies are also not without their downsides. They are expensive. They are typically run in one population, not in all populations. It is not always clear whether a finding is significant in terms of size. The studies may also be outdated and could usefully be revisited. For example, there might be different technology available now. However, it might not be possible to run another trial because it might be more difficult to be granted ethical approval now than when the study was initially undertaken. It may not be deemed acceptable practice to ask one group of pregnant women to drink a large amount of coffee and another group not to, simply to obtain new data.
Researchers are often quite passionate advocates for a certain pet hypothesis and not as interested in refuting it. A common assumption among researchers is that if you did not get the answer you expected you did the experiment incorrectly. It becomes very hard for new data to convince proponents otherwise. In economics, as in epidemiology, beliefs are often masked in the language of science, precisely what proponents of EBM are seeking to overcome. Sometimes that debate is not about data, but about dogma.
Let me provide an example from economics. If you are a Keynesian and believe government spending creates prosperity in the face of a recession, something many economists have been advocating post-GFC, what evidence would dissuade you from that belief? Certainly predicting eight per cent unemployment after the US stimulus package and getting ten per cent has not really dissuaded any Keynesian economists. The response has been broadly ‘the GFC was worse than we thought!’, or ‘we didn’t spend enough’. Often the more certain and confident the researcher, the more dismissive they are of their ideological opponents, and the bigger their platform. There is an incentive to reconfirm what the researcher already believes. As well, the more exotic and dramatic the result, the more likely the research will be featured in the media. There is a bias towards incredible claims and journals can be just as guilty of this bias and generally only publish positive results. Surely the evidence base should also include negative results?
In epidemiological terms, examining the relationship between coffee and miscarriage is very similar to economic analysis of the relationship between a stimulus package and employment. Papers will be accompanied by a table or a chart that supposedly shows that the relationship between the two variables is of a certain magnitude and not owing to chance. However, readers do not see all the different regressions that were done before the chart was finished. The chart is presented as objective science. If you have not been shown all the steps, it is not clear whether the findings are robust. There is a focus on finding something in the data set. However, there might not be anything in the data set, the set might be too small or the assumptions may not be credible. As one wit once quipped, ‘statistics are like swimsuits – what they reveal is interesting, but what they conceal is vital!’
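The point about unseen regressions can be made concrete with a rough sketch. Everything here is hypothetical: the outcome and every candidate predictor are pure random noise, so any ‘significant’ relationship is a chance finding. The cut-off of 1.96 divided by the square root of the sample size is only a normal approximation to the usual five per cent test for a correlation.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def count_chance_findings(n_obs=100, n_specs=200, seed=1):
    """Try n_specs unrelated predictors against one noise outcome.

    Returns how many clear an approximate 5% significance cut-off,
    even though no real relationship exists in the data.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    threshold = 1.96 / math.sqrt(n_obs)  # approximate two-sided 5% cut-off
    hits = 0
    for _ in range(n_specs):
        predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson_r(predictor, outcome)) > threshold:
            hits += 1
    return hits

if __name__ == "__main__":
    print(count_chance_findings(), "of 200 noise predictors look 'significant'")
```

With a five per cent cut-off, roughly one specification in twenty is expected to clear the bar by chance alone. A reader shown only the one chart that ‘worked’ has no way of knowing how many others were tried.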
There are also well-documented issues associated with MAs. MAs are essentially observational studies in which the studies themselves are the subjects. The major difference between conclusions drawn in different MAs is the choice made of which studies to include and which ones to exclude. Yet authors maintain their criteria for exclusion are valid even though they are sometimes quite different and result in quite different conclusions. The justifications for the inclusion or exclusion of studies from the evidence often rest on competing claims of methodological authority. In many ways, these are no different to the traditional claims of medical authority that proponents of EBM have criticised. Again, the various statistical techniques used to analyse the data also require consideration. Arguments between statisticians on statistical methods are often impenetrable to all but the ultra-specialist. Doctors reading and relying on MAs are usually not aware of the thick methodological-statistical layer that underpins them.
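How much the inclusion criteria matter can be sketched with a standard fixed-effect, inverse-variance pooling of study estimates. The four studies below are invented for illustration (the names, effect sizes, standard errors and design labels are all hypothetical), and a real MA would involve many more choices than this one.

```python
import math

# Hypothetical studies: (name, log relative risk, standard error, design).
STUDIES = [
    ("A", 0.40, 0.20, "retrospective"),
    ("B", 0.30, 0.10, "retrospective"),
    ("C", 0.00, 0.10, "prospective"),
    ("D", 0.10, 0.20, "prospective"),
]

def pooled_estimate(studies):
    """Fixed-effect inverse-variance pooling.

    Each study is weighted by 1 / se^2; returns (pooled effect, pooled se).
    """
    weights = [1.0 / se ** 2 for _, _, se, _ in studies]
    effects = [effect for _, effect, _, _ in studies]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    return pooled, 1.0 / math.sqrt(total_weight)

def include(studies, design):
    """Apply an inclusion criterion: keep only studies of one design."""
    return [s for s in studies if s[3] == design]

if __name__ == "__main__":
    print("all studies:", pooled_estimate(STUDIES))
    print("prospective only:", pooled_estimate(include(STUDIES, "prospective")))
```

Pooling all four invented studies gives a log relative risk of about 0.17, while admitting only the prospective ones gives about 0.02. The arithmetic is identical in both cases. The conclusion changes only because of the judgment about which studies count as evidence.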
It may be that a revolution is occurring in the way economics is done. Arguments have been made for the use of more sophisticated statistical techniques to analyse complex systems, with the power of data and analysis ascendant over more traditional decision-making methods based on judgment and intuition. Yet there is no such thing as a completely exogenous variable in either epidemiology or economics, and drawing conclusions from natural experiments takes the same kind of work it takes to draw conclusions from non-experimental or observational data. In macroeconomics that jump is taken all the time.
If you want to know whether government spending has a multiplier effect, then you have to have a ‘treated group’ and a ‘control group’, just as doctors are used to. Even if, in the past, a government spend of a billion dollars had a particular impact on the economy, the complex nature of economies means that the outcome might be very different in the next recession. In the case of macroeconomics, it is very difficult to conceive a study that would allow conclusions to be drawn about the impact of stimulus programs. Defence spending comes to mind: comparing the end of war with the start of war. While it might seem logical to draw the conclusion that defence spending has historically stimulated the economy, the scientific nature of such conclusions is somewhat problematic. The same could be said for epidemiology.
Where to from here
Some economists have dismissed all empirical work as flawed. That is wrong – no single path will ever provide the complete solution. While randomised trials are appropriate for addressing simple questions, they are practical for only a small number of issues. Even when evidence is available from high-quality RCTs, evidence from other study types can often be very relevant.
Proponents of EBM make a conceptual error in relegating clinical experience to the lowest rung. It is judgment that determines what evidence is admissible, and how strongly to weigh different forms of evidence. Judgment is integral to, and cannot be excised from, the process of evidence synthesis. The EBM evidence hierarchy otherwise becomes a means to avoid judgment. Patients seek specialist advice because specialists have considerable experience. The hope is that doctors put all the other evidence in context.
And what of Emily Oster herself? Emily’s daughter, Penelope, seems to be a healthy toddler. Whatever she did in her own pregnancy, it definitely seemed to work. Let’s leave her the last word. ‘Is it okay to use dish soap during pregnancy? Is it okay to eat a lot of potatoes? What about using those whiteboard markers that smell so bad? At the end of the day, we may have to just admit we are accepting some baseline level of risk by just living, and, well, live with it.’