
Original research
Risk factors for early language delay in children within a minority ethnic, bilingual, deprived environment (Born in Bradford’s Better Start): a UK community birth cohort study
  1. Rachael W Cheung1,2,
  2. Kathryn Willan2,
  3. Josie Dickerson2,
  4. Claudine Bowyer-Crane3,4
  1. 1 Department of Health Sciences, University of York, York, UK
  2. 2 Better Start Bradford Innovation Hub, Born in Bradford, Bradford Institute for Health Research, Bradford, UK
  3. 3 National Institute of Economic and Social Research, London, UK
  4. 4 Department of Education, University of Sheffield, Sheffield, UK
  1. Correspondence to Dr Rachael W Cheung; rachael.cheung{at}


Background Preschool language skills and language delay predict academic and socioemotional outcomes. Children from deprived environments are at a higher risk of language delay, and both minority ethnic and bilingual children can experience a gap in language skills at school entry. However, research that examines late talking (preschool language delay) in an ethnically diverse, bilingual, deprived environment at age 2 is scarce.

Methods Data from Born in Bradford’s Better Start birth cohort were used to identify rates of late talking (≤10th percentile on the Oxford-Communicative Development Inventory: Short) in 2-year-old children within an ethnically diverse, predominantly bilingual, deprived UK region (N=712). Associations between known demographic, maternal, and distal and proximal child risk factors and both language skills and language delay were tested using hierarchical linear and logistic regression.

Results A total of 24.86% of children were classified as late talkers. Maternal demographic factors (ethnicity, born in UK, education, financial security, employment, household size, age) predicted 3.12% of the variance in children’s expressive vocabulary. Adding maternal language factors (maternal native language, home languages) and perinatal factors (birth weight, gestation) to the model predicted 3.76% of the variance. Adding distal child factors (child sex, child age) predicted 11.06%, and adding proximal child factors (receptive vocabulary, hearing concerns) predicted 49.51%. Significant risk factors for late talking were male sex (OR 2.07, 95% CI 1.38 to 3.09), receptive vocabulary delay (OR 8.40, 95% CI 4.99 to 14.11) and parent-reported hearing concerns (OR 7.85, 95% CI 1.90 to 32.47). Protective factors were increased household size (OR 0.85, 95% CI 0.77 to 0.95) and age (OR 0.82, 95% CI 0.70 to 0.96).

Conclusions Almost one in four children living in an ethnically diverse and deprived UK area have early language delay. Demographic factors explained little variance in early vocabulary, whereas proximal child factors held more predictive value. The results indicate further research on early language delay is warranted for vulnerable groups.

  • Psychology
  • Epidemiology
  • Health services research

Data availability statement

Data are available on reasonable request. Data for this study are available on request via the Born in Bradford website (

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:



  • Children from minority ethnic, bilingual, deprived populations are at risk of delayed school entry language skills and subsequent lower academic outcomes.

  • Existing research on early language delay (late talking) has focused largely on monolingual, white, mid-socioeconomic status populations.


  • 24.86% of 2-year-olds in a minority ethnic, bilingual, deprived population were classified as late talkers, approximately double the rate reported in other cohort studies.

  • Demographics explain little variance, whereas proximal child factors, such as receptive vocabulary, may provide more predictive value.


  • Highlights the potential need for early language intervention in vulnerable groups in the UK.

  • Provides estimates to aid public health planning.


Early language delay is associated with poor academic attainment and socioemotional well-being,1 2 and can be an early manifestation of neurodevelopmental disorders.3 Vocabulary development is sensitive to negative socioeconomic effects,4 5 and there is a well-known achievement gap between children from mid-high socioeconomic status (SES) and those from deprived6 and minority ethnic7 communities. Bilingual children with language delay are also at risk of under-detection.8 Therefore, minority ethnic children from deprived backgrounds with low English language exposure may be particularly vulnerable to early language delay and low school readiness.9

Cohort research on preschool language delay tests for ‘late talking’ status, defined as an expressive vocabulary size at or below the 10th percentile for children aged 2 years relative to population norms.10–12 Late talking prevalence rates in five English-speaking, predominantly white cohorts (Australia, Canada, UK, USA) were 9.6%–15%,10 12–15 although another study reported 19.1% (Australia11). Two cohorts with more ethnic and linguistic diversity reported similar rates (8.7%–15.4%), but only 20%–30% of participants in these cohorts were from minority ethnic or bilingual communities.16 17 Research on late talking in deprived populations is also scarce. One UK study reported that 23% of 2-year-olds in a sample skewed towards socioeconomic disadvantage had broader language delay.18 A US study identified 35.9% of children in a low-SES sample as late talking (28.4% of that sample were exposed to a second language), compared with 11.7% of a mid-SES sample (in which 14% had exposure to a second language).19 These limited results suggest that children from minority ethnic, bilingual, deprived homes may be at a higher risk of early language delay.

Of note, population-level variables alone are of limited predictive value.20 More proximal child factors may hold additional predictive value but are not always measured at population level. Demographic, maternal and distal child factors (such as age and sex) account for ~4%–6.2% of the variance in children’s expressive vocabulary, whereas proximal child factors such as receptive vocabulary and concurrent developmental skills account for ~9%–30%.11 16 21 However, we do not know whether these patterns also hold for bilingual, deprived communities in the UK, despite their potential vulnerability to early language delay. The recent Early Language Identification Measure (ELIM) can be administered by health visitors at the universal 2–2.5-year screening visit in England when they are concerned about a child’s language skills, and it includes some proximal child factors.18 However, although the ELIM was tested on a sample with diverse SES, ~70% of participants were from monolingual English-speaking homes, so further research is required to gauge need in multilingual populations.

This study uses data from Born in Bradford’s Better Start (BiBBS),22 a community birth cohort that uses routine data collection covering three regions of Bradford, UK. In BiBBS, 85% of participants are from minority ethnic backgrounds and live in deprived communities, with multiple languages spoken at home. Approximately one-third of mothers report understanding and speaking some, a little or no English during pregnancy.23 The present study sought to identify predictors of children’s vocabulary, rates of late talking compared with normative data from other cohorts, and potential risk factors (including proximal child language factors) for delay in 2-year-old children in BiBBS.



The BiBBS experimental birth cohort was established to evaluate early-life interventions (birth to 4 years old) in three inner-city areas of Bradford, funded by the National Lottery Community Fund via the Better Start Bradford programme.22 BiBBS recruits women during pregnancy and up to 2 weeks post partum, at which point they complete a baseline questionnaire and consent to linkage and research use of their own and their child’s routinely collected health, education and Better Start Bradford intervention data. BiBBS includes a representative sample of families living in the Better Start Bradford areas.23

From an interim BiBBS pre-COVID-19 data freeze of 2626 pregnancies recruited between January 2016 and November 2019, 2485 children had individual data that could be linked to early-life interventions (detail on data linkage in online supplemental information).23 Of these, 1027 attended a language screening and intervention, Talking Together, designed by the Bradford-based service BHT Early Education and Training with local speech and language therapists. BHT receive contact information about all 2-year-olds in the Better Start Bradford areas from the National Health Service (NHS) and invite them to a screening visit (Talking Together Screening). At this visit, practitioners administer the Oxford-Communicative Development Inventory: Short (CDI-S)24 and a clinical assessment that includes questions about children’s everyday language use. Children identified as at risk of language delay on the clinical assessment are referred to a 6-week home-based intervention (Talking Together Intervention; for further information, see Bowyer-Crane et al25).


Inclusion and exclusion criteria

Only children who had a Talking Together Screening visit with the outcome measure (CDI-S) completed in English were included in analyses (n=712). Children were excluded if they did not have a Talking Together Screening visit, if the outcome measure was missing or not completed in English, or if the screening language was not recorded (see Figure 1).
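The inclusion rule above amounts to a simple record filter. The sketch below assumes a hypothetical per-child record with three fields (whether a screening visit took place, the CDI-S expressive raw score and the recorded screening language); the record layout and function names are illustrative only and are not the BiBBS data dictionary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScreeningRecord:
    """Hypothetical per-child record; field names are illustrative only."""
    child_id: str
    had_screening_visit: bool
    cdi_expressive_raw: Optional[int]   # None when the CDI-S is missing
    screening_language: Optional[str]   # None when the language was not recorded


def is_included(record: ScreeningRecord) -> bool:
    """Stated rule: visit attended, CDI-S present, and completed in English."""
    return (
        record.had_screening_visit
        and record.cdi_expressive_raw is not None  # a score of 0 is still valid
        and record.screening_language == "English"
    )


def analysis_sample(records: List[ScreeningRecord]) -> List[ScreeningRecord]:
    """Keep only the records that meet every inclusion criterion."""
    return [r for r in records if is_included(r)]


if __name__ == "__main__":
    demo = [
        ScreeningRecord("a", True, 52, "English"),    # included
        ScreeningRecord("b", True, None, "English"),  # excluded: outcome missing
        ScreeningRecord("c", True, 40, "Urdu"),       # excluded: not in English
        ScreeningRecord("d", True, 61, None),         # excluded: language not recorded
        ScreeningRecord("e", False, None, None),      # excluded: no screening visit
    ]
    print([r.child_id for r in analysis_sample(demo)])
```

Checking `is not None` rather than truthiness keeps a genuine raw score of 0 in the sample.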

Figure 1

Participant flow diagram. Further details on data linkage and missing data available in online supplemental information.

Patient and public involvement

Members of the community are involved in the design and conduct of BiBBS cohort research through regular meetings of a Community Research Advisory Group composed of Better Start Bradford residents. This involvement covers recruitment, questionnaires, measures, and the interpretation and dissemination of findings.


Potential risk factors

Potential risk factors were identified from prior literature10 11 13 15 16 21 and are listed in Table 1 with data sources.

Table 1

A priori groups of variables used in analyses

Bilingual classification

Children were classified as being from bilingual-English homes based on BiBBS baseline data on maternal native language and the languages spoken in the home. Where native language was monolingual-English and languages spoken at home were English only, children were classified as monolingual-English, otherwise children were classified as bilingual-English (see online supplemental information).
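The classification rule can be expressed as a small function. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes maternal native language(s) and home languages are available as lists of language names. Missing baseline data would need separate handling and is not addressed here.

```python
from typing import Iterable


def classify_home_language(maternal_native: Iterable[str],
                           home_languages: Iterable[str]) -> str:
    """Hypothetical helper mirroring the stated rule.

    A child is 'monolingual-English' only when the mother's native language is
    English alone AND English is the only language spoken at home; every other
    combination is classified as 'bilingual-English'.
    """
    # Normalise case and surrounding whitespace before comparing
    native = {lang.strip().lower() for lang in maternal_native}
    home = {lang.strip().lower() for lang in home_languages}
    if native == {"english"} and home == {"english"}:
        return "monolingual-English"
    return "bilingual-English"
```

For example, a mother whose native language is English but who reports English and Punjabi spoken at home would place the child in the bilingual-English group.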

Outcome measures

Raw expressive vocabulary scores on the CDI-S24 collected at Talking Together Screening were used as the outcome measure (see Supporting Information for analyses with receptive vocabulary as the outcome measure). The CDI-S is a 100-item parent-reported checklist of words that children understand (receptive vocabulary) and say (expressive vocabulary). Receptive vocabulary was used as a predictor of expressive vocabulary.26 Data were collected in the child’s home by trained staff from BHT Early Education and Training, who supported parents in completing the measure. Parents were asked which language they were most comfortable using for the screening visit; only data from those who completed the visit and the CDI-S in English were included. Maternal self-reported confidence in oral English language ability, from the BiBBS baseline data, is shown in Table 2.

Table 2

Characteristics of mothers who had Talking Together Screening

Table 3

Characteristics of children who had Talking Together Screening

Late talking was defined as an expressive raw score at or below the 10th percentile using the CDI-S24 and the UK Bilingual Assessment Tool (UKBTAT) normative data,27 consistent with the literature.26 The published CDI-S data were used to calculate norms for monolingual English children aged 2 years. The published UKBTAT data (normed on 2-year-olds using the CDI-S word list) were used to estimate norms for bilingual-English children with exposure to additional languages.
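The ≤10th percentile criterion can be illustrated with a percentile-rank lookup against a normative sample for the child’s language group. This is only a sketch: it assumes raw normative scores are available as a list per group, whereas published norms are often supplied as cut-off tables and may be stratified further (for example by age or sex). The group labels and function names are hypothetical, and percentile-rank conventions vary (this one counts scores less than or equal to the child’s score).

```python
from bisect import bisect_right
from typing import Dict, Sequence


def percentile_rank(score: float, norm_scores: Sequence[float]) -> float:
    """Percentage of normative scores less than or equal to `score`."""
    ordered = sorted(norm_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def is_late_talker(expressive_raw: float, group: str,
                   norms: Dict[str, Sequence[float]],
                   cutoff_percentile: float = 10.0) -> bool:
    """Flag late talking when the raw score falls at or below the cut-off
    percentile of the normative distribution for the child's group."""
    return percentile_rank(expressive_raw, norms[group]) <= cutoff_percentile
```

With a toy normative sample of the integers 1 to 100, a raw score of 10 has a percentile rank of 10 and is flagged, while 11 is not. These toy values are placeholders, not CDI-S or UKBTAT norms.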

Statistical analysis

The analyses were preregistered ( and conducted in R. Analysis 1 used hierarchical linear regression with a priori sets of variables28 to identify predictors of children’s raw expressive vocabulary scores on the CDI-S. Models were built up consecutively by entering each variable set using forced entry, and the percentage of additional variance explained was identified at each step. Results are reported as unstandardised estimates, and predictors with p<0.05 are reported as significant risk factors. Analysis 2 used a generalised linear model (logistic regression) to predict late talking status (delay=1, no delay=0) using all predictors from Analysis 1. Results are reported as ORs with 95% CIs, with p values reported for information.29
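The hierarchical, forced-entry approach can be sketched as fitting a sequence of nested ordinary least squares models and recording the change in R² as each block is entered. The sketch below uses NumPy and synthetic data; the block names are illustrative, it does not reproduce the study’s models, and it ignores the pooling across imputed datasets that the study applied.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an OLS fit of y on X, with an intercept column added."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def hierarchical_r2(blocks, y):
    """Enter variable blocks in order (forced entry).

    Returns (block_name, cumulative_R2, delta_R2) tuples, where delta_R2 is the
    additional variance explained when that block is added to the model.
    """
    entered = []
    previous = 0.0  # the intercept-only model explains no variance
    steps = []
    for name, block in blocks:
        entered.append(np.asarray(block, dtype=float).reshape(len(y), -1))
        r2 = r_squared(np.hstack(entered), y)
        steps.append((name, r2, r2 - previous))
        previous = r2
    return steps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    demographic = rng.normal(size=(n, 2))  # synthetic stand-in for a weak block
    proximal = rng.normal(size=(n, 1))     # synthetic stand-in for a strong block
    y = 0.1 * demographic[:, 0] + 2.0 * proximal[:, 0] + rng.normal(size=n)
    for name, r2, delta in hierarchical_r2(
            [("demographic", demographic), ("proximal", proximal)], y):
        print(f"{name}: R2={r2:.3f}, delta={delta:.3f}")
```

Because the models are nested and include an intercept, cumulative R² cannot decrease as blocks are added, and the increments sum to the final model’s R².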

Missing data

A total of 82.58% of cases were complete for analysis (17.42% had at least one missing value; Figure 1; Supporting Information, online supplemental table S1). The data were not missing completely at random (MCAR; Little’s test p<0.001). Although there is no statistical test to distinguish data missing not at random (MNAR) from data missing at random (MAR), multiple imputation is considered less biased than complete-case analysis.30 We therefore used multiple imputation with 18 imputed datasets, a number similar to the percentage of incomplete cases,31 using the mice package in R. All results report the pooled estimates across the imputed datasets.
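Pooling across imputed datasets is conventionally done with Rubin’s rules, which mice applies when pooling (together with a small-sample degrees-of-freedom adjustment that is omitted here). The sketch below shows the core arithmetic for a single coefficient, plus the rule of thumb of setting the number of imputations to roughly the percentage of incomplete cases; the function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               std_errors: Sequence[float]) -> Tuple[float, float]:
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("Rubin's rules need at least two imputed datasets")
    q_bar = mean(estimates)                      # pooled point estimate
    within = mean(se ** 2 for se in std_errors)  # mean within-imputation variance
    between = variance(estimates)                # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between       # total variance
    return q_bar, math.sqrt(total)


def suggested_imputations(percent_incomplete: float) -> int:
    """Rule of thumb: about as many imputations as the percentage of incomplete cases."""
    return max(2, math.ceil(percent_incomplete))
```

With 82.58% of cases complete, `suggested_imputations(100 - 82.58)` returns 18, matching the number of imputed datasets used here.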



Characteristics are available in Tables 2 and 3. Families were predominantly in the first (83.29% of sample) or second (14.33%) most deprived decile using the Indices of Multiple Deprivation 2019. A total of 18% of children were classified as monolingual-English and 82% as bilingual-English. The mean age of children at screening was 25.21 months (SD=1.30). Children had a mean raw score of 69.06 (SD=22.01) for receptive vocabulary and 49.93 (SD=28.82) for expressive vocabulary on the CDI-S.

Language delay at 2 years old

A total of 177 children (24.86%) were of late talking status at the Talking Together screening visit, and 221 (31.04%) of the sample were referred to Talking Together Intervention, based on the clinical assessment.
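The reported percentages follow directly from the counts over the analysed sample (N=712). As an illustration of how uncertainty around such a prevalence might be expressed for service planning, the sketch below also computes a Wilson score interval; this interval is not reported in the study, and the function names are hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def prevalence(count: int, n: int) -> float:
    """Prevalence as a percentage of the analysed sample."""
    return 100.0 * count / n


def wilson_interval(count: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion, returned as percentages."""
    p = count / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return 100.0 * (centre - half_width), 100.0 * (centre + half_width)


late_talking = prevalence(177, 712)  # 24.86% when rounded to 2 d.p.
referred = prevalence(221, 712)      # 31.04% when rounded to 2 d.p.
```

The Wilson interval is preferred here over the simple normal approximation because it stays within 0% to 100% and behaves better for proportions away from 50%.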

Analysis 1: predictors of expressive vocabulary using known variables

Full results can be viewed in Table 4. Demographics explained only 3.12% of the variance. Pakistani heritage significantly predicted lower expressive vocabulary across all models: in models 1–4, children of Pakistani heritage had an average difference in expressive vocabulary of B=−10.56 to −12.33 compared with white British children, although in model 5, after proximal child factors were added, this difference was smaller (B=−3.05). A similar pattern was present for Central/Eastern European and ‘Other’ ethnicities relative to white British, although these estimates did not reach the significance threshold (models 1–4: Bs=−6.98 to −12.24; model 5: Bs=−1.86 to −2.84). Other factors associated with a lower average expressive vocabulary that also did not reach significance were a maternal education level below degree, a financial security level below ‘comfortable’ (with the exception of ‘just getting by’, which was associated with a small increase), maternal age at screening and neither parent being employed (see Table 4). Factors associated with a higher average expressive vocabulary were the mother not being born in the UK and a larger household; however, all estimates were relatively small, and only household size was significant (B=0.94–1.36).

Table 4

Predictors of expressive vocabulary as a continuous variable (N=712)

Adding maternal language factors did not significantly improve the model, contributing 0.32% of the variance, with model 2 explaining 3.44% overall. Having a bilingual mother was associated with a small decrease in expressive vocabulary, as was having a bilingual home or a home where no English was spoken at baseline, although these estimates were not significant.

Perinatal factors also did not significantly improve the model, contributing an additional 0.32%, with model 3 explaining 3.76% overall. Higher birth weight and longer gestation were largely associated with small increases in average vocabulary, although these increases were not significant.

Distal child factors significantly increased the variance explained, contributing an additional 7.30%, with model 4 explaining 11.06% of the variance overall. Male sex was associated with a significant decrease in average expressive vocabulary, similar in size to that for Pakistani heritage (Bs=−7.26 to −12.33), and increased child age was associated with a significant increase (Bs=2.96–4.32), although this effect was smaller in magnitude than those of male sex and Pakistani heritage.

Proximal child factors were also a significant addition, contributing an additional 38.45% of the variance; the final model (model 5) explained 49.51% of the overall variance in expressive vocabulary. Receptive vocabulary was associated with a small but significant increase in expressive vocabulary (B=0.82),1 and parent-reported hearing concerns were associated with a significant decrease in expressive vocabulary that exceeded that of all other predictors (B=−19.45). The model was examined for multicollinearity using commonly cited thresholds:32 33 all variance inflation factors were below 3 and all condition indices were below 30.
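As a quick consistency check on the hierarchical results, the additional variance contributed at each step should sum to each model’s reported total. The snippet below accumulates the increments reported above (in percentage points).

```python
from itertools import accumulate

# Additional variance explained (percentage points) at each step, as reported above
steps = {
    "model 1: maternal demographics": 3.12,
    "model 2: + maternal language": 0.32,
    "model 3: + perinatal": 0.32,
    "model 4: + distal child": 7.30,
    "model 5: + proximal child": 38.45,
}

# Running totals, rounded to absorb floating-point noise
cumulative = [round(total, 2) for total in accumulate(steps.values())]
for name, total in zip(steps, cumulative):
    print(f"{name}: {total:.2f}%")
```

The running totals reproduce the model totals of 3.12%, 3.44%, 3.76%, 11.06% and 49.51%.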

Analysis 2: risk factors for late talking

We tested all variables as risk factors for late talking status using a generalised linear model (logistic regression), reporting ORs (Table 5). These results were generally consistent with analysis 1. Children from Pakistani heritage families had more than twice the odds of being delayed, although the CI included 1 (OR 2.43, 95% CI 0.99 to 5.97). A larger number of people in the household was a protective factor (OR 0.85, 95% CI 0.77 to 0.95). Of the distal child factors, children who were male at birth had approximately twice the odds of being delayed (OR 2.07, 95% CI 1.38 to 3.09), whereas higher age in months at assessment was a protective factor (OR 0.82, 95% CI 0.70 to 0.96). Of the proximal child factors, being at or below the 10th percentile for receptive vocabulary was associated with an approximately eight-fold increase in the odds of late talking status (OR 8.40, 95% CI 4.99 to 14.11), as was having a parent-reported hearing concern (OR 7.85, 95% CI 1.90 to 32.47).
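ORs from a logistic model are exponentiated coefficients, and Wald-type 95% CIs are obtained by exponentiating B ± 1.96 SE; an OR describes a multiplicative change in the odds, not the probability, of late talking. The sketch below shows both directions of that conversion and a check of whether a CI excludes 1. Because the study’s estimates were pooled across imputations, recovering B and SE from the published intervals this way is only approximate, and the function names are hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta: float, se: float, z: float = Z_95) -> Tuple[float, float, float]:
    """Convert a log-odds coefficient and its SE into (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def log_odds_from_ci(lower: float, upper: float, z: float = Z_95) -> Tuple[float, float]:
    """Approximately recover (beta, SE) from a 95% CI that is symmetric on the log scale."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


def ci_excludes_one(lower: float, upper: float) -> bool:
    """True when the interval lies entirely above or entirely below OR = 1."""
    return lower > 1 or upper < 1
```

For male sex (95% CI 1.38 to 3.09), `log_odds_from_ci` implies an OR equal to the geometric mean of the limits, close to the reported 2.07; for Pakistani heritage (95% CI 0.99 to 5.97), `ci_excludes_one` returns False.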

Table 5

Risk factors for early expressive vocabulary delay at age 2 (≤10th percentile) (N=712)


We found that rates of late talking were higher in a minority ethnic, bilingual, deprived environment and that more proximal child factors may be useful when identifying early language delay in the community. Late talking rates were approximately double those found in a monolingual, mid-high SES UK sample using similar criteria,12 higher than in other monolingual English-speaking populations globally,10 11 and higher than in samples with more ethnic or linguistic diversity,16 17 even when accounting for bilingualism. Our results are similar to those of the ELIM validation sample, which included some low SES children, although it used a different language assessment.18 Head Start data from the USA suggest that low SES, rather than bilingualism, predicts lower language scores when children are assessed in both community and heritage languages,34 consistent with our results. The mechanisms linking low SES and low language skills are not fully understood, but low SES early caregiving environments appear to provide less rich language input than those of mid-high SES families,4 5 and language input correlates with child brain structure and neurophysiological function.35

This study provides evidence on potential risk factors and rates of early language delay in ethnically diverse, deprived communities, making use of bilingual normative data where possible. Although we did not have in-depth information quantifying exposure to each language in bilingual children, and levels of exposure to English affect core language skills,27 the study provides a practical estimate from population-level data for a minority ethnic, deprived, bilingual community, as required in front-line health service settings. These estimates also provide a much-needed baseline for future research, as school-entry language skills have suffered following the pandemic.36

Consistent with the existing literature,11 demographic factors explained little variance in expressive vocabulary. Some risk factors for late talking, such as male sex, hearing concerns and receptive vocabulary, were consistent with existing literature, although the CI for hearing concerns was wide, indicating an imprecise estimate. However, unlike in prior studies, maternal language and the languages spoken at home did not predict higher rates of language delay, potentially because we used bilingual norms. The finding that Pakistani heritage was a risk factor for lower expressive vocabulary is consistent with the UK Millennium Cohort Study.7 This may in part be explained by broader health inequalities that also disadvantage minority ethnic families,37 as well as by child factors that develop alongside language acquisition, such as motor skills, compounded by socioeconomic factors.38

Interestingly, a larger household was protective, contrary to findings from a Canadian cohort,10 although the beneficial effect was small. Many Pakistani heritage families have an intergenerational, integrative structure in which multiple generations live together as a cultural norm;39 one possible benefit is exposure to a greater variety of speakers, which has previously been associated with success in word learning.40 However, these possibilities require further research with in-depth access to the immediate learning environment.41

Our results also suggest that assessing individual factors such as receptive vocabulary and hearing concerns may be more useful when detecting early language delay in the community than demographic factors alone. This is consistent with the limited research which suggests that more proximal child factors hold predictive weight10 21—further research in this area would likely be fruitful. Screening programmes like ELIM have the advantage of testing some of these and of using existing home-visiting; however, they also place demand on increasingly stretched services—particularly following COVID-19.42 Any national screening programmes for language delay, therefore, must account for differential local need while planning capacity.

There are several limitations to this study. First, as the sample consisted of only participants who were sufficiently confident in their English ability to complete the Talking Together screening visit in English, families who did not speak sufficient English for the CDI-S are not represented, and may be at even greater risk of under detection of early language delay. Second, the nature of the sample is also a potential threat to validity. The sample only contained those who agreed to the universal screening visit and were from an interventional birth cohort that utilises routinely collected data from the NHS and from commissioned services, and we did not have outcome data for those who did not take part in the screening. This means the prevalence rates of late talking and predictors may not generalise to other populations, and may mean children whose parents are concerned about language skills are over-represented. However, as other community data on preschool language ability for minority ethnic, multilingual, deprived UK populations is scarce, routine data collection can provide some insight into early language delay within a population that is seldom-heard.

In addition, data on the home learning environment,7 21 family history of language delay10–12 15 or in-depth skills such as phonological development and grammar43 were not available; all of these are targets for future research. We also do not yet have data on later language skills, although we plan to use national school data to track longer-term outcomes via the BiBBS cohort. Although some research suggests that late talking children ‘catch up’, other studies find that they have persistent difficulties throughout school.44 Failing to intervene early may therefore miss a key period for intervention.45


The achievement gap between low and mid-high SES children starts early in life,46 47 and the negative effect of SES on child neurocognitive outcomes appears strongest for language when compared with other cognitive domains such as working memory or executive function.48 Early language intervention may be particularly important in deprived environments;49 however, to effectively plan services, research in minority ethnic, deprived, bilingual environments is necessary. This study indicates these communities are at risk and further research on early language delay is likely warranted for vulnerable groups in the UK.


Ethics statements

Patient consent for publication

Ethics approval

The recruitment, collection and research use of BiBBS cohort data was approved by Bradford Leeds National Health Service (NHS) Research Ethics Committee (15/YH/0455). Research governance approval was provided by Bradford Teaching Hospitals NHS Foundation Trust.


We thank Gillian Santorelli for providing statistical advice and Amy Hough and Dan Mason for assisting with additional data following review of the manuscript.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Contributors RWC: conception, design, statistical analysis, interpretation of results, first draft, revisions, guarantor. KW: data acquisition and data curation, interpretation of results, revisions. JD: design, interpretation of results, revisions. CB-C: design, interpretation of results, revisions. All authors approved the final version of the manuscript.

  • Funding This study is supported by the National Lottery Community Fund (previously the Big Lottery Fund) as part of the A Better Start programme (Ref 10094849). JD is also supported by the NIHR Yorkshire and Humber Applied Research Collaboration (ARC-YH; Ref: NIHR200166).

  • Disclaimer The funder had no role in the study design, data collection, analysis, interpretation of data, decision to publish, or in writing the manuscript.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.