Original research
Uncovering early predictors of cerebral palsy through the application of machine learning: a case–control study
  1. Sara Rapuc1,
  2. Blaž Stres2,3,
  3. Ivan Verdenik4,
  4. Miha Lučovnik4,5,
  5. Damjan Osredkar1,5
  1. 1 Department of Pediatric Neurology, University Children's Hospital, University Medical Centre Ljubljana, Ljubljana, Slovenia
  2. 2 Department of Catalysis and Chemical Reaction Engineering, National Institute of Chemistry, Ljubljana, Slovenia
  3. 3 Institute of Sanitary Engineering, Faculty of Civil and Geodetic Engineering, University of Ljubljana, Ljubljana, Slovenia
  4. 4 Department of Perinatology, Division of Obstetrics and Gynecology, University Medical Centre Ljubljana, Ljubljana, Slovenia
  5. 5 Medical Faculty, University of Ljubljana, Ljubljana, Slovenia
  1. Correspondence to Dr Damjan Osredkar; damjan.osredkar{at}kclj.si

Abstract

Objective Cerebral palsy (CP) is a group of neurological disorders with profound implications for children’s development. The identification of perinatal risk factors for CP may lead to improved preventive and therapeutic strategies. This study aimed to identify the early predictors of CP using machine learning (ML).

Design This is a retrospective case–control study, using data from two population-based databases, the Slovenian National Perinatal Information System and the Slovenian Registry of Cerebral Palsy. Multiple ML algorithms were evaluated to identify the best model for predicting CP.

Setting This is a population-based study of CP and control subjects born into one of Slovenia’s 14 maternity wards.

Participants A total of 382 CP cases, born between 2002 and 2017, were identified. Controls were selected at a control-to-case ratio of 3:1, with matched gestational age and birth multiplicity. CP cases with congenital anomalies (n=44) were excluded from the analysis. A total of 338 CP cases and 1014 controls were included in the study.

Exposure 135 variables relating to perinatal and maternal factors.

Main outcome measures Receiver operating characteristic (ROC), sensitivity and specificity.

Results The stochastic gradient boosting ML model (271 cases and 812 controls) demonstrated the highest mean ROC value of 0.81 (mean sensitivity=0.46 and mean specificity=0.95). Using this model with the validation dataset (67 cases and 202 controls) resulted in an area under the ROC curve of 0.77 (mean sensitivity=0.27 and mean specificity=0.94).

Conclusions Our final ML model using early perinatal factors could not reliably predict CP in our cohort. Future studies should evaluate models with additional factors, such as genetic and neuroimaging data.

  • Neurology
  • Infant
  • Statistics
  • Data Collection
  • Cerebral Palsy

Data availability statement

Data are available on reasonable request. The authors will share the data generated by the study on a reasonable request to the corresponding authors.

http://creativecommons.org/licenses/by-nc/4.0/

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Neonatal MRI and general movement assessment have the highest sensitivity in predicting cerebral palsy before the age of 5 months.

WHAT THIS STUDY ADDS

  • The variables commonly collected during prenatal and perinatal care cannot be used to reliably predict which child will develop cerebral palsy.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • Additional factors extending beyond the variables collected in the process of perinatal care should be considered in future studies for accurate early prediction of cerebral palsy.

Introduction

Cerebral palsy (CP) is a heterogeneous group of neurological disorders characterised by their predominant impact on the development of movement, muscle tone and posture.1 Typically, CP is diagnosed at approximately 2 years of age.2 Early identification of infants who may develop CP is of paramount importance for the initiation of timely interventions and therapies aimed at mitigating the impact of the condition and improving long-term outcomes. Understanding risk factors that arise during the perinatal period (ie, between 22 completed weeks of gestation and 7 completed days after birth3) could lead to more effective perinatal preventive strategies.

In recent years, there has been a growing trend toward using multivariate approaches and machine learning (ML) to study CP.4 Studies using multivariate approaches found several risk factors to be associated with CP, including prematurity, periventricular leukomalacia, neonatal sepsis, birth asphyxia and structural MRI abnormalities.5 6 Notably, neonatal MRI and general movement assessment (GMA) have demonstrated the highest sensitivity in predicting CP before the age of 5 months, particularly in high-risk infants.7 Furthermore, ML was used to analyse DNA methylation patterns and predict the likelihood of spastic CP.8 Although the study yielded results indicating high sensitivity and specificity, the findings were derived from a rather limited sample size of 16 participants with CP.

Overall, there is a need for further research using ML approaches to predict CP, incorporating larger samples and simultaneously evaluating multiple ML algorithms. Therefore, the primary objective of this study was to assess the predictive value of perinatal and maternal factors for CP by applying various ML algorithms to population-based datasets. Emphasising the importance of cost-effectiveness and feasibility in clinical settings, our study focused on using existing Slovenian governmental nationwide clinical data, rather than collecting new data via neuroimaging techniques or GMA that require specialised equipment and expertise.

Methods

Patient and public involvement

We collected data from two national registries, covering both prenatal data and CP-related data. By combining these databases, we created a comprehensive dataset to address the research questions posed by the study. Patients and the public were not directly involved in the study’s design or recruitment process. The results of the study have been, and will continue to be, disseminated to the relevant audiences.

Data acquisition

The perinatal and CP data were acquired by merging two population-based databases: the Slovenian National Perinatal Information System (NPIS) and the Slovenian Registry of Cerebral Palsy (SRCP). The governmental databases were merged based on the date of birth, birth weight, sex of the child and birth multiplicity.
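The linkage step can be illustrated with a minimal Python sketch. It is not the authors' procedure: the field names (`dob`, `birth_weight_g`, `sex`, `multiplicity`) and the toy records are hypothetical, and the article does not say how ambiguous or unmatched records were handled, so the sketch simply reports them separately.

```python
from collections import defaultdict

# Hypothetical field names; the registries' real schemas are not given in the article.
KEY_FIELDS = ("dob", "birth_weight_g", "sex", "multiplicity")

def link_records(npis_rows, srcp_rows):
    """Link each CP registry row to the perinatal row sharing the same composite key."""
    index = defaultdict(list)
    for row in npis_rows:
        index[tuple(row[f] for f in KEY_FIELDS)].append(row)
    linked, ambiguous, unmatched = [], [], []
    for cp in srcp_rows:
        candidates = index.get(tuple(cp[f] for f in KEY_FIELDS), [])
        if len(candidates) == 1:
            linked.append({**candidates[0], **cp})  # exactly one perinatal match
        elif candidates:
            ambiguous.append(cp)                    # several births share the key
        else:
            unmatched.append(cp)                    # no perinatal record found
    return linked, ambiguous, unmatched

# Toy records, invented for illustration only.
npis = [
    {"dob": "2005-03-01", "birth_weight_g": 1200, "sex": "F", "multiplicity": 1, "ga_weeks": 29},
    {"dob": "2005-03-01", "birth_weight_g": 3400, "sex": "M", "multiplicity": 1, "ga_weeks": 40},
]
srcp = [
    {"dob": "2005-03-01", "birth_weight_g": 1200, "sex": "F", "multiplicity": 1, "cp_type": "spastic"},
    {"dob": "2010-07-09", "birth_weight_g": 2900, "sex": "M", "multiplicity": 1, "cp_type": "ataxic"},
]
linked, ambiguous, unmatched = link_records(npis, srcp)
```

A composite key of this kind can collide when two children share all four values, which is why the sketch keeps an explicit `ambiguous` bucket rather than silently picking one candidate.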

The NPIS registers all deliveries in Slovenia at ≥22 weeks of pregnancy or when the birth weight is ≥500 g.9 Registration is mandatory by law in all of Slovenia’s 14 maternity units. More than 140 variables were collected and included information about patient demographics, family, medical, gynaecological, and obstetric history, current pregnancy, labour and delivery, postpartum period, and neonatal data.9

The SRCP is a national registry in which all children with CP in Slovenia are enrolled. The SRCP includes all children who have been diagnosed with CP at a minimum age of 5 years by a developmental paediatrician trained in child neurology in 1 of the 23 developmental outpatient clinics in Slovenia or by a child neurologist at the Department of child, adolescent and developmental neurology, University Children’s Hospital, University Medical Center Ljubljana.

Participants

The resulting merged database contained 382 unique cases of CP born between 2002 and 2017. Patients with congenital anomalies were excluded from the study (n=44). Liveborn controls surviving the neonatal period until hospital discharge were selected based on matching gestational age and birth multiplicity, at a control-to-case ratio of 3:1. The final database comprised 338 CP cases and 1014 controls.
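The 3:1 matching can be sketched as follows (338 cases × 3 = 1014 controls). This is an illustrative Python sketch under stated assumptions, not the authors' selection code: the field names `id`, `ga_weeks` and `multiplicity` are hypothetical, matching is assumed to be exact on both variables, and controls are assumed to be drawn at random without replacement.

```python
import random
from collections import defaultdict

def select_matched_controls(cases, pool, ratio=3, seed=0):
    """Draw `ratio` controls per case from pool rows with equal GA and multiplicity."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in pool:
        strata[(row["ga_weeks"], row["multiplicity"])].append(row)
    for rows in strata.values():
        rng.shuffle(rows)  # random order within each matching stratum
    matched = {}
    for case in cases:
        available = strata[(case["ga_weeks"], case["multiplicity"])]
        if len(available) < ratio:
            raise ValueError(f"not enough controls for case {case['id']}")
        # pop() removes the control so it cannot be reused for another case
        matched[case["id"]] = [available.pop() for _ in range(ratio)]
    return matched

# Toy data, invented for illustration only.
cases = [
    {"id": "c1", "ga_weeks": 30, "multiplicity": 1},
    {"id": "c2", "ga_weeks": 30, "multiplicity": 1},
]
pool = [{"id": f"p{i}", "ga_weeks": 30, "multiplicity": 1} for i in range(7)]
pool.append({"id": "p7", "ga_weeks": 40, "multiplicity": 1})
matched = select_matched_controls(cases, pool)
```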

Data preparation

Data consolidation: At the initial stage, the merged database and its variables were visually screened for inclusion/exclusion in the final database by two experienced clinicians (DO and ML). Variables were excluded if they were repeated/redundant or contained personally identifying information. The proportion of missing values was calculated for each included variable. Variables with >10% missing values (NA) in each group (case and control) were excluded from the database (n=2). The numbers and percentages of missing values for the remaining variables are reported in online supplemental table S1. Additionally, variables with zero or near-zero variance were excluded from the database due to potential issues with cross-validation (n=79). The remaining missing data were imputed using median and k-nearest neighbour (k-NN) imputation. Subsequently, both imputed datasets were subjected to the described ML analyses, and the obtained results were compared.
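The three screening steps can be sketched in plain Python. The sketch is illustrative only: the analysis itself was run in R, the near-zero variance thresholds below (frequency ratio above 95/5 and fewer than 10% unique values) are assumptions rather than the authors' reported settings, and "in each group" is read here as "in both the case and the control group".

```python
from statistics import median

def missing_fraction(values):
    """Share of entries recorded as None (NA)."""
    return sum(v is None for v in values) / len(values)

def exceeds_missing_cut(case_values, control_values, cut=0.10):
    """True when both groups exceed the cut (one reading of 'in each group')."""
    return missing_fraction(case_values) > cut and missing_fraction(control_values) > cut

def is_near_zero_variance(values, freq_ratio_cut=19.0, unique_pct_cut=10.0):
    """Flag a column dominated by one value; thresholds are illustrative assumptions."""
    observed = [v for v in values if v is not None]
    counts = {}
    for v in observed:
        counts[v] = counts.get(v, 0) + 1
    if len(counts) <= 1:
        return True  # zero variance (constant or empty column)
    top, second = sorted(counts.values(), reverse=True)[:2]
    unique_pct = 100.0 * len(counts) / len(observed)
    return top / second > freq_ratio_cut and unique_pct < unique_pct_cut

def impute_median(values):
    """Replace each missing entry with the median of the observed entries."""
    med = median(v for v in values if v is not None)
    return [med if v is None else v for v in values]

# Toy columns, invented for illustration only.
rare_flag = [0] * 99 + [1]        # one value dominates: near-zero variance
balanced = [0, 1] * 50            # evenly split: kept
score = [9, None, 8, 10, 9, 7]    # one missing entry
imputed = impute_median(score)
```

Median imputation leaves a column's centre unchanged but shrinks its spread, which is one reason to compare it with k-NN imputation as the study did.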

Statistical analyses

ML analyses were performed in the R statistical program10 and the platform JADBio V.1.4.117.11

ML approach

Test/train split: A random 80% of the data was used to train the model, whereas the remaining 20% was used for validation, based on one of the standard train/test split recommendations.12–14 A 10-fold cross-validation was used to estimate the test metrics (ie, receiver operating characteristic (ROC), sensitivity and specificity) within the training data in R, whereas repeated 10-fold cross-validation (max. repeats=20) was used in JADBio.
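A minimal Python sketch of the split and fold assignment follows. It assumes the 80% sample was drawn within each class and rounded up, which is not stated in the text but is consistent with the reported counts (271 of 338 cases and 812 of 1014 controls in training); the seed and function names are hypothetical.

```python
import math
import random

def stratified_split(labels, train_frac=0.8, seed=1):
    """Return (train_idx, test_idx), sampling train_frac of each class separately."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = math.ceil(train_frac * len(idx))  # round up within each class
        train += idx[:n_train]
        test += idx[n_train:]
    return sorted(train), sorted(test)

def kfold_indices(indices, k=10, seed=1):
    """Shuffle the training indices and deal them into k folds of near-equal size."""
    rng = random.Random(seed)
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

# 1 = CP case, 0 = control, using the group sizes reported in the article.
labels = [1] * 338 + [0] * 1014
train_idx, test_idx = stratified_split(labels)
folds = kfold_indices(train_idx)  # each fold serves once as the held-out part

n_train_cp = sum(labels[i] for i in train_idx)
n_test_cp = sum(labels[i] for i in test_idx)
```

The held-out 20% is never touched during cross-validation, so metrics estimated within the folds and metrics on the validation set answer different questions: the former guide model selection, the latter estimate performance on unseen data.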

ML algorithms: Several ML algorithms were used to estimate the test metrics, including linear (regularised regression), non-linear (k-NN), boosting (stochastic gradient boosting machine (GBM)) and bagging (random forest) algorithms and their performances were compared. The ML algorithm-specific tuning parameters were used to tune the models.

Data transformations: The input variables for ML algorithms that require data standardisation were centred and scaled. Additionally, input variables were transformed using log(x+1) and Yeo-Johnson transformations. The results of the ML models with and without data transformations were evaluated.
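For reference, the transformations named above can be written out in a short Python sketch. The Yeo-Johnson parameter λ is normally estimated from the data; here it is passed in as a fixed argument purely for illustration, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev

def center_scale(xs):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

def log1p_transform(xs):
    """log(x + 1); defined for x > -1, so suited to counts and other non-negative data."""
    return [math.log1p(x) for x in xs]

def yeo_johnson(x, lam):
    """Yeo-Johnson transform of a single value for a given lambda."""
    if x >= 0:
        return math.log1p(x) if lam == 0 else ((x + 1) ** lam - 1) / lam
    return -math.log1p(-x) if lam == 2 else -(((1 - x) ** (2 - lam) - 1) / (2 - lam))

# Toy values, invented for illustration only.
values = [0.0, 1.0, 3.0, 10.0]
standardised = center_scale(values)
logged = log1p_transform(values)
```

Two properties make the relationship between the transformations easy to check: with λ=1 the Yeo-Johnson transform is the identity, and with λ=0 it coincides with log(x+1) for non-negative inputs. Unlike log(x+1), it is also defined for negative values.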

Model evaluation: The final model was selected based on its highest ROC value. The selected model was then evaluated using the test data. A confusion matrix was calculated to assess the correctness of the model predictions,15 and sensitivity and specificity were derived from it. The area under the ROC curve (AUC) was calculated using the trapezoidal rule.
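The trapezoidal AUC calculation can be illustrated with a small self-contained Python sketch. The labels and scores are toy values, not study data, and the function names are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points as the threshold is lowered; tied scores move together."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]        # label 1 counts as a true positive
            fp += 1 - ranked[i][1]    # label 0 counts as a false positive
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    """Sum the trapezoid areas between consecutive ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy example: 3 positives, 3 negatives; 8 of the 9 positive-negative pairs are ordered correctly.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_trapezoid(roc_points(y_true, scores))
```

In the toy example the AUC equals 8/9, the fraction of case-control pairs in which the case receives the higher score, which is the usual interpretation of the statistic.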

Most important predictors: A variable importance plot was generated for the final model, where the predictors were sorted from most to least important and scaled to have a maximum value of 100. To evaluate the validity of the final model, three random variables were added to the dataset. If one of these random variables appeared among the most important predictors, all predictors below that variable were discarded.
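The random-variable check can be sketched as follows. The importance values and variable names are invented for illustration and are not study results; the sketch only shows the scaling to a maximum of 100 and the rule of discarding every predictor ranked below the first random variable.

```python
def scale_importance(raw):
    """Rescale raw importance scores so that the largest equals 100."""
    top = max(raw.values())
    return {name: 100.0 * value / top for name, value in raw.items()}

def keep_above_random(importance, probe_names):
    """Keep predictors ranked above the first random (probe) variable."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    for name, score in ranked:
        if name in probe_names:
            break  # everything from here down is no better than noise
        kept.append((name, score))
    return kept

# Hypothetical raw importances; random_1..random_3 stand in for the added noise variables.
raw = {"var_a": 40.0, "var_b": 20.0, "random_1": 5.0,
       "var_c": 4.0, "random_2": 1.0, "random_3": 0.5}
scaled = scale_importance(raw)
kept = keep_above_random(scaled, {"random_1", "random_2", "random_3"})
```

The rationale is that a predictor ranked below pure noise cannot be distinguished from noise by the model, so its apparent importance should not be interpreted.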

Fully automated ML (autoML) analysis in JADBio: The same training and test datasets used in the main analysis were applied to fully automated ML by using JADBio. For simplicity, only median-imputed datasets are presented. Extensive tuning with six central processing unit cores was performed to identify the most interpretable classification model. The model was selected based on the best AUC value from all trained models. The data were preprocessed using constant removal and standardisation, whereas LASSO was used for feature selection (penalty=0.0) and ridge logistic regression with penalty hyperparameter lambda=0.1 was used as the predictive algorithm.

Results

Database description

The final database comprised 136 variables, with 135 used as predictors and 1 serving as the outcome variable (ie, CP/case group). See online supplemental table S1 for the complete list of predictors.

Baseline characteristics

The baseline characteristics of the CP and control groups are presented in table 1. CP cases were more often classified as small for their gestational age than controls. Additionally, CP cases were more frequently transferred to intensive care or therapy units and had a higher incidence of birth asphyxia as well as brain abnormalities, such as periventricular leukomalacia and severe intraventricular haemorrhage (IVH).

Table 1

Baseline characteristics and group comparison of CP and control cases

Among CP cases, the most common type of CP was spastic (87.9%), followed by dyskinetic (9.2%), ataxic (1.5%) and other (1.5%). According to the Gross Motor Function Classification System, level I constituted the majority, representing 41.4% of the cases. Levels II, III, IV and V were observed in 16.6%, 6.2%, 18.0% and 15.4% of cases, respectively.

Model selection

The ML models were trained on 271 CP and 812 control cases, whereas the validation dataset contained 67 CP and 202 control cases. The performance of the various ML algorithms is presented in figure 1. The results were obtained using median imputed data without any applied transformations. Among these algorithms, the stochastic GBM ML model with the Bernoulli loss function demonstrated the highest ROC value and was chosen as the final model. The selected model achieved a mean ROC value of 0.81, with a mean sensitivity of 0.46 and a mean specificity of 0.95. The tuning parameters of the final model had the following values: interaction depth=4, number of trees=150, shrinkage=0.1 and minimum number of observations in trees=10.

Figure 1

Comparison of mean ROC, sensitivity and specificity values of four ML algorithms: stochastic gradient boosting, regularised logistic regression, random forest and k-nearest neighbour. Bars represent 95% CIs. ML, machine learning; ROC, receiver operating characteristic.

A detailed comparison of the performance of the ML algorithms using both median and k-NN imputed data, as well as untransformed and transformed data, is presented in online supplemental figure S1. Both imputation methods and data transformations yielded comparable results in terms of mean ROC, sensitivity and specificity values across ML algorithms.

Model evaluation

Confusion matrix

The confusion matrix for the final model is presented in table 2. Of the 67 CP cases, 18 were accurately classified as CP, while 49 cases with CP were incorrectly identified as controls. Conversely, the majority of control cases (n=190) were correctly labelled, except for 12 cases that were misclassified as CP cases despite not having CP. The model achieved an AUC of 0.77, a sensitivity of 0.27 and a specificity of 0.94.

Table 2

Confusion matrix of the final model
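The reported sensitivity and specificity follow directly from the four counts given in the text, as the short Python check below shows. The overall accuracy is derived here from the same counts and is not a figure reported by the authors.

```python
# Validation-set counts as reported in the text (67 CP cases, 202 controls).
tp, fn = 18, 49    # CP cases classified as CP / as controls
tn, fp = 190, 12   # controls classified as controls / as CP

sensitivity = tp / (tp + fn)                 # 18 / 67
specificity = tn / (tn + fp)                 # 190 / 202
accuracy = (tp + tn) / (tp + fn + tn + fp)   # 208 / 269, derived, not reported

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} accuracy={accuracy:.2f}")
```

Because controls outnumber cases 3:1, accuracy is dominated by the correctly labelled controls; the sensitivity of 0.27 is the figure that shows how many children with CP the model misses.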

Variable importance

The most important variables in the final model are presented in figure 2. The results suggest that severe IVH (defined as grade 3/4 IVH)16 was the most important variable, followed by maternity hospital, postneonatal transport and gestational age at birth. The importance gradually decreased for the remaining variables, such as small for gestational age and the number of visits to the antenatal clinic. The other variables had even lower importance, indicating less significance in the overall dataset.

Figure 2

The most important variables of the final model after the incorporation of three additional random variables into the dataset. The variables are ordered from most to least important and scaled to have a maximum value of 100. BMI, body mass index; CTG, cardiotocography; GA, gestational age; IVH, intraventricular haemorrhage.

Additional analyses

To further evaluate the robustness of the results, the ML analyses were extended using JADBio by testing 2593 different ML pipelines. The final model used ridge logistic regression (penalty hyperparameter lambda=0.1) as the ML algorithm. The tested model achieved a mean AUC value of 0.75, with a mean sensitivity of 0.33 and a mean specificity of 0.89, which is comparable to the independent results of the main analysis conducted in R. Furthermore, a comparable group of variables was recognised as the most significant, as shown in online supplemental figure S2.

To mitigate potential bias due to the uneven distribution of cases among different maternity wards, the analyses were replicated exclusively for cases born in Ljubljana’s maternity hospital, which accounted for approximately half (46%) of all cases (online supplemental figure S3). The resulting training dataset included 116 CP and 381 control cases, whereas the validation dataset contained 28 CP and 95 control cases. No data transformations were employed because no difference between transformed and untransformed data was previously observed (online supplemental figure S1). The final model used GBM as the ML algorithm and achieved a mean ROC value of 0.78, a mean sensitivity of 0.38 and a mean specificity of 0.94. Factors such as severe IVH, gestational age at birth, birth weight and discharge weight continued to be prominent, and a specific birth-related intervention, namely, transfusion of packed red blood cells, emerged as a novel significant factor not previously detected in the model encompassing all maternity units.

Discussion

To the best of our knowledge, this is the first study to comprehensively examine a diverse set of perinatal risk factors associated with CP. The main finding of our study was that the variables commonly collected during prenatal and perinatal care cannot be used to reliably predict which child will develop CP. Although we evaluated 135 variables per case/control and extensively explored a number of ML algorithms to find the best predictive model, our data suggest that variables other than those used are necessary to increase the predictive value to a level that would be of practical use.

The poor performance of the ML models in differentiating individuals with CP from those without CP could be attributed to a range of factors. For instance, several factors that have been previously identified as significantly associated with CP (eg, necrotising enterocolitis and birth asphyxia) had to be excluded from the current merged database generated in this study due to their low representation. Additionally, the lack of other relevant features that are often not routinely collected in hospitals and the small sample size of the CP cohort might result in poor model performance. The overall prevalence of CP in Slovenia decreased between 1995 and 2014.17 18 In 2010–2011, there were 1.6 prenatal/perinatal CP cases per 1000 live births in Slovenia, similar to rates in other high-income European countries such as Denmark (1.6), Sweden (1.7) and Belgium (1.4).17 18 Therefore, collaborative research is crucial to increase the sample size needed for ML analysis.

The low prognostic value of perinatal factors for the development of CP is important from a medicolegal perspective. Medical malpractice litigation is increasing in most of the developed world.19 20 Pregnancy and birth events account for a large proportion of claims, generally alleging that acute or chronic peripartum asphyxia led to long-term neurological sequelae such as CP.19 20 The American College of Obstetricians and Gynecologists reported that over 80% of its members have been sued, with an average of three lawsuits per member.21 Yet, many supposed associations between perinatal events and fetal/neonatal neurological injury remain unproven and are mostly based on belief or historic allegations.22 Our data suggest that predicting CP from perinatal risk factors is at best unreliable, if not impossible. This emphasises the need for further research in the field on larger datasets in order to allow better CP prediction and evidence-based obstetric preventive measures.

Thus far, CP has primarily been associated with the following mechanisms: (1) intrauterine events, such as fetal growth restriction, (2) peripartum events, such as placental abruption and (3) neonatal events, exemplified by IVH and periventricular leukomalacia.23 CP can also develop postneonatally, with traumatic brain injury, near-drowning and meningitis as the most common causes.24 More recently, genetic aetiology has been found to be a significant contributor to the development of CP, potentially through both disrupted brain development and dysregulated responses to risk factors.25 For example, in a study involving 681 CP cases, approximately 35% of the total cases were due to genetic pathology.26 From a more practical point of view, it seems reasonable to assume that similar CP pathologies emerge from genetic pathologies, birth-related pathologies or their interplay. Therefore, more unified and technology-oriented approaches should be adopted in the future to identify CP as quickly as possible.

According to recent guidelines, GMA is one of the most sensitive predictors of CP in infants before the age of 5 months.7 GMA is used to evaluate and detect abnormal infant movement, with absent fidgety movements at 3 months strongly associated with CP.27 However, GMA requires specialised training and is therefore not part of standard clinical care across the globe. There were also no GMA data available for our study cohort, as GMA is not part of standard clinical practice. More recently, researchers have developed automated tools to detect abnormal infant movements at home.28 Combining GMA with clinical characteristics has been found to improve CP prediction compared with GMA alone.28 Therefore, future studies should focus on developing and evaluating multivariate automated models to use in clinical practice. A recent study showed that clinicians use automated tools correctly but tend to overestimate survival with major disabilities.29 Further development and testing of these tools could significantly enhance early CP prediction.

Despite the generally poor predictive performance of ML models in our study, it is important to acknowledge that the variables identified as the most important are in line with previous findings. Severe IVH emerged as the most crucial factor, which is consistent with previous studies that have highlighted the link between IVH and CP risk.30 31 Gestational age at birth was identified as another important variable, consistent with research showing that premature birth is a significant risk factor for CP.30 32 Furthermore, two variables not usually identified in CP research, maternity unit and postneonatal transport, emerged among the most important ones and warrant consideration in future studies to assess their importance. Other studies have shown that high-risk children (eg, very preterm born and very low birth weight) who are born outside a tertiary centre have a significantly higher risk of mortality33 and brain injury (periventricular leukomalacia, grade ≥3 IVH)34 35 than those born inside a tertiary centre. This might be due to maternity wards with fewer births having fewer resources and less expertise available when needed. The need for transportation to a tertiary intensive care unit after a complicated birth is not only a sign that the infant’s condition is severe but might also lead to increased time to appropriate interventions, thus significantly contributing to the risk of developing CP.

Finally, the limitations of our study need to be addressed as the sample of patients with CP consisted of a low frequency of severe cases. This frequency has a highly variable regional distribution; therefore, validation in larger cohorts that reflect a more natural distribution of patients is needed. Furthermore, ML analysis often requires a large amount of data. The suboptimal performance of ML algorithms in our study might be due to the small sample size of the CP cohort.

In conclusion, our study found a limited prognostic value of standard perinatal risk factors for identifying CP. Understanding the complex interplay between potential causative pathways and postneonatal triggers is crucial for the collection of variables that may enhance the prognostic accuracy of ML models and guide effective interventions.

Ethics statements

Patient consent for publication

Ethics approval

This retrospective case–control study was approved by the National Medical Ethics Committee of Slovenia (0120-317/2020/6).

Footnotes

  • ML and DO contributed equally.

  • Contributors SR: data acquisition; data analysis, interpretation; drafting of the manuscript. BS: design of the study; data analysis, interpretation; revising the manuscript. IV: data acquisition. ML: design of the study; data acquisition; interpretation; drafting and revising of the manuscript. DO: design of the study; data acquisition; interpretation; drafting and revising of the manuscript. David Neubauer and DO: establishing the database of cerebral palsy in Slovenia. Anja Troha Gergeli and DO: maintaining the database of cerebral palsy in Slovenia. The guarantor for the study is DO.

  • Funding The study was funded by a tertiary research project grant from the University Medical Centre Ljubljana (grant No 20210101).

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.