Introduction
Childhood stunting is associated with myriad proximal and distal factors. Historically, investigations into the aetiology of stunting focused primarily on dietary intake, with inadequate consumption of nutritious food and low dietary diversity identified as the primary risk factors.1–3 However, improving nutrition through intervention studies had inconsistent effects on child growth.4 5 Attention then shifted to the impact of infectious diseases, such as diarrhoea, on child nutritional status. Research studies into improving hygiene through provision of clean water, toilets and handwashing with soap also showed variable results, offering conflicting evidence of the role of water, sanitation and hygiene in improving childhood growth.6 7 It now appears that inadequate diets and infectious diseases can interact to contribute to stunting. Exposure to enteropathogens among people living in settings with poor sanitation and hygiene can modify gut structure and function and lead to reduced digestion and absorption of nutrients.8 These effects may be amplified in some individuals due to the expression of epigenetic traits in response to environmental triggers.9 In addition, a multitude of other factors underlie the aetiology of stunting, including socioeconomic status, childcare practices, parental education, malaria and air pollution.10–15
Knowledge around the aetiology of child stunting is largely derived from observational studies through correlation of factors with measures of child growth (such as height-for-age z-scores). Data alone cannot show causality. Drawing causal inferences requires both statistical inference and sound knowledge of the domain/subject area, otherwise the statistical outcomes can be biased16 or even paradoxical.17 A common problem in correlation studies is that variables are included haphazardly in the model on the basis of statistical associations, without considering their causal influence on other factors. This is illustrated by an example from Hernán et al on the impact of folic acid supplementation on neural tube defects, where the authors suggest lowering the OR of the intervention by roughly 20%, from 0.80 to 0.65, taking into consideration a priori causal knowledge, as opposed to relying solely on statistical associations.18 Hernán et al show that statistical strategies for variable selection and confounding recommend adjusting for a third variable representing whether the pregnancy ends in stillbirth or therapeutic abortion. However, a priori causal knowledge indicates that adjusting for this variable is likely to bias the estimate of the causal effect of folic acid supplementation on neural tube defects.
In addition, estimating causal effects requires the inclusion of confounding variables, which may be limited by the availability of data, especially when using secondary data. Thus, it is not uncommon for potentially important confounding variables to be left out of correlation models because no data are available, despite knowledge of the impact of the variable on the study's outcome. This practice also leads to biased estimates due to unobserved confounders. Such data limitations are particularly challenging for research into child stunting, where very few or no datasets include the multitude of determinants, and where some purported determinants are based purely on hypotheses and observation with only limited data.
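The bias from an omitted confounder can be illustrated with a small simulation. The sketch below is not drawn from the study: the variables (a confounder `c`, an exposure `x` and an outcome `y`) and all probabilities are hypothetical, chosen so that the exposure has no causal effect on the outcome. Under these assumptions the crude risk difference is clearly non-zero, whereas standardising over the confounder brings the estimate close to zero.

```python
import random


def simulate(n=100_000, seed=0):
    """Generate (c, x, y) rows in which c drives both x and y, and x has no effect on y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.random() < 0.5                    # hypothetical confounder
        x = rng.random() < (0.8 if c else 0.2)    # exposure depends on c
        y = rng.random() < (0.5 if c else 0.2)    # outcome depends on c only
        rows.append((c, x, y))
    return rows


def risk(rows, x, c=None):
    """Proportion with the outcome among rows with exposure x (optionally within stratum c)."""
    selected = [yi for (ci, xi, yi) in rows if xi == x and (c is None or ci == c)]
    return sum(selected) / len(selected)


def crude_rd(rows):
    """Risk difference that ignores the confounder."""
    return risk(rows, True) - risk(rows, False)


def adjusted_rd(rows):
    """Risk difference standardised over the distribution of the confounder."""
    total = 0.0
    for c in (True, False):
        weight = sum(1 for row in rows if row[0] == c) / len(rows)
        total += weight * (risk(rows, True, c) - risk(rows, False, c))
    return total


rows = simulate()
print(f"crude risk difference:    {crude_rd(rows):.3f}")
print(f"adjusted risk difference: {adjusted_rd(rows):.3f}")
```

When the confounder is not recorded in the dataset, only the crude estimate can be computed, which is the situation described above for secondary data.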
An underlying causal model enables the determination of the causal relations that can be estimated and provides guidance in collecting additional information where data are lacking. Recent studies of child stunting conducted in Bangladesh and India have adopted machine learning methods (random forests, gradient boosting and quantile regression) to learn from the data and discover further evidence of associations between stunting and various determinants.19–21 However, these approaches also risk spurious predictions, especially when making similar predictions outside the study setting, because they do not exploit the causal understanding of their domain. Thus, there is a need to apply modelling frameworks that integrate causal knowledge and multiple types of data.22
Bayesian networks (BNs), by comparison, offer an opportunity to build holistic causal models, based on the current understanding of the complex factors determining child stunting and their inter-relationships. Encoding a holistic model of causal relations is critical in assessing diverse intervention opportunities, to inform programmes and policies, and to identify evidence gaps in domain knowledge. The causal structure of these models can provide a basis for causal assumptions in future studies.
The aims of this study are to: (1) create a directed acyclic graph (DAG) summarising probable causal factors of child stunting at the whole child level, (2) parameterise BNs based on the DAG by using secondary data, and (3) assess the drivers of stunting in Indonesia, India and Senegal, and examine evidence gaps using BNs.
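As a rough illustration of aims (1) and (2), the sketch below encodes a toy DAG, checks that it is acyclic, and estimates conditional probability tables from binary records by counting. It is a minimal sketch under stated assumptions, not the model developed in this study: the four nodes, the edges between them, the eight records and the smoothing constant `alpha` are all hypothetical and serve only to show the mechanics.

```python
from collections import Counter
from graphlib import CycleError, TopologicalSorter
from itertools import product

# Hypothetical toy DAG, written as node -> list of parents.
PARENTS = {
    "poor_sanitation": [],
    "inadequate_diet": [],
    "enteric_infection": ["poor_sanitation"],
    "stunting": ["inadequate_diet", "enteric_infection"],
}

# Made-up binary records (1 = present, 0 = absent), for illustration only.
COLUMNS = ("poor_sanitation", "inadequate_diet", "enteric_infection", "stunting")
ROWS = [
    (1, 1, 1, 1), (1, 0, 1, 1), (1, 0, 1, 0), (0, 0, 0, 0),
    (0, 1, 0, 1), (0, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 0),
]
RECORDS = [dict(zip(COLUMNS, row)) for row in ROWS]


def is_acyclic(parents):
    """Return True if the parent map describes a directed acyclic graph."""
    try:
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


def fit_cpts(parents, records, alpha=1.0):
    """Estimate P(node = 1 | parent configuration) by counting, with additive smoothing."""
    cpts = {}
    for node, node_parents in parents.items():
        counts = Counter(
            (tuple(record[p] for p in node_parents), record[node]) for record in records
        )
        table = {}
        for config in product((0, 1), repeat=len(node_parents)):
            present = counts[(config, 1)] + alpha
            absent = counts[(config, 0)] + alpha
            table[config] = present / (present + absent)
        cpts[node] = table
    return cpts


assert is_acyclic(PARENTS)
cpts = fit_cpts(PARENTS, RECORDS)
for node, table in cpts.items():
    print(node, table)
```

Smoothing keeps the estimates defined for parent configurations that never occur in the data, which is one simple way of coping with sparse secondary datasets; other choices, such as expert-elicited priors, are equally possible.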