Original research
Universal language development screening: comparative performance of two questionnaires
  1. Philip Wilson1,
  2. Robert Rush2,
  3. Jenna Charlton3,
  4. Vicky Gilroy4,
  5. Cristina McKean3,
  6. James Law3
  1. Centre for Rural Health, Institute of Applied Health Sciences, University of Aberdeen, Inverness, UK
  2. Independent statistical consultant, Edinburgh, UK
  3. School of Education, Communication and Language Sciences, Newcastle University, Newcastle upon Tyne, UK
  4. Institute of Health Visiting, London, UK
Correspondence to Professor Philip Wilson; p.wilson{at}

Abstract

Background and objective Low language ability in early childhood is a strong predictor of later psychopathology as well as reduced school readiness, lower educational attainment, employment problems and involvement with the criminal justice system. Assessment of early language development is universally offered in many countries, but there has been little evaluation of assessment tools. We planned to compare the screening performance of two commonly used language assessment instruments.

Methods A pragmatic diagnostic accuracy study was carried out in five areas of England comparing the performance of two screening tools (Ages and Stages Questionnaire (ASQ) and Sure Start Language Measure (SSLM)) against a reference test (Preschool Language Scale, 5th edition).

Results Results were available for 357 children aged 23–30 months. The ASQ Communication Scale using optimal cut-off values had a sensitivity of 0.55, a specificity of 0.95 and positive and negative predictive values of 0.53 and 0.95, respectively. The SSLM had corresponding values of 0.83, 0.81, 0.33 and 0.98, respectively. Both screening tools performed relatively poorly in families not using English exclusively in the home.

Conclusion The very widely used ASQ Communication Scale performs poorly as a language screening tool, missing over one-third of cases of low language ability. The SSLM performed better as a screening tool.

  • health services research

Data availability statement

Data are available in a public, open access repository. The data from this project, together with other data from the study, will be available in the Newcastle University data repository. The protocol (initial accepted response to tender), the final report, the deidentified participant data and a data dictionary will be made available with publication from Newcastle University - My data. The reports are currently available. The data will be available by March 2022. Data access will be subject to a data access agreement but beyond that will be fully accessible. Those wishing to access the data will be given support from the investigator team if required.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

What is known about the subject?

  • There is very little published research reporting the screening performance of the Ages and Stages Questionnaire (ASQ) Communication Scale compared with a reference test.

  • No studies have reported on the screening performance of the Sure Start Language Measure (SSLM).

  • No published research has reported good screening performance of questionnaires for speech and language development in primary care or in substantial population samples.

What this study adds?

  • The screening performance of the SSLM was superior to that of the ASQ Communication Scale.

  • Neither instrument performed well among families in which English was not the only language spoken at home.

  • Further work to refine the performance of language screening tools is required, particularly among families where English is not used exclusively at home.

Introduction

Early language development is a good indicator of cognitive capacity.1 Satisfactory acquisition of language also requires an adequate social environment2 and an ability to give attention to the communication of others. Delayed language acquisition in the third year of life is a strong predictor of later psychopathology3–5 as well as reduced school readiness,6 lower educational attainment,7 8 employment problems3 and involvement with the criminal justice system.9

Given our knowledge of the natural history of language disorders10–12 and their potential mutability by social factors, it may seem surprising that language screening does not yet meet internationally accepted screening criteria.13 14 There is a lack of evidence on the benefit of early treatment15 and on cost-effectiveness,16 17 although there remains a widely held view that surveillance of language development is desirable and it has been adopted internationally.18 19

Parents are often best placed to report on their child’s skills in the very early years and commonly want to discuss any concerns they may have about their child’s development, including language development, particularly around the age of 24 months, by which time parents compare their own child’s development with that of others. Where national policies on developmental assessments exist, they generally include the use of structured instruments to assess language acquisition in the third year of life.18 20 Many language screening measures exist, but few have been adequately evaluated or compared with others; such evaluation and comparison are essential first steps in building the case for or against universal screening.14 21 22

Here we compare the screening performance of two commonly used language screening tools: the Ages and Stages Questionnaire (ASQ)23–25 and the Sure Start Language Measure26 (SSLM). The ASQ has been translated into many languages and is probably the most widely used developmental screening tool internationally, although there is only one published study of the concurrent validity of its communication domain against a reference language assessment in a low-risk population-based sample,27 and the sample of children in the relevant age range in that study is small. The SSLM was developed for the English Sure Start programme26 and has been used in its large-scale evaluation. It has nevertheless undergone only one small-scale assessment against the Reynell Developmental Language Scales and has shown good predictive validity.5

In the present paper, we examine the concurrent validity of the ASQ and SSLM used with children aged between 23 and 30 months against a reference test, the Preschool Language Scale V.5 (PLS-5),28 in a substantial socially diverse sample in five areas of England.

Methods

The study was conducted as a planned component of a larger programme of work funded by Public Health England, which was designed to optimise a universal programme for language assessment and early intervention in England: ‘Identifying and Supporting Children’s Early Language Needs’.29 The SSLM, the measure used in the present paper, is an integral part of the Early Language Identification Measure (ELIM), which was created for that programme of work. The screening data were collected during a scheduled review, offered by community child nursing (health visitor) teams as a mandated part of England’s Healthy Child Programme to all children at 2.0–2.5 years of age.23

Participants

Participants were children aged 23–30 months and their parents/carers living in five areas identified by Public Health England: Derbyshire, Middlesbrough, the London Borough of Newham, Wakefield, and Wiltshire. The five sites were selected as they were considered to have good information technology systems, and they reflected a broad spread of socioeconomic status with a good representation of more disadvantaged areas.

Procedure

Parents in the five sites who were due to attend their child’s 2.0–2.5-year health visitor review were sent by post a study information leaflet and consent form alongside the Ages and Stages Questionnaire, Third Edition (ASQ-3), which forms part of the universal developmental assessment offered to parents of all children aged 2.0–2.5 years in England.23 Parents were asked to return their signed consent form to the health visitor during their review.

Across sites, the review took place in either the home or a clinical setting. The health visitor recorded the ASQ-3 Communication Scale score, administered the SSLM, and collected demographic data and information about language use in the home as part of the ELIM. All parents who had the ELIM completed were then provided with a date to attend a second assessment with a speech and language therapist (SLT). Within this appointment, the SLT completed the PLS-5 with the child. The SLT assessment was carried out within 2–4 weeks of the initial ELIM assessment. The SLTs were blind to the results of the ELIM. Not all parents attended for the SLT assessment, and as the study period spanned the early phase of the UK national COVID-19 lockdown, a small proportion of participants were unable to attend for this reason.

Measures

The ASQ-3 is a standardised parent-completed questionnaire used to screen for developmental delays in children25; it can be completed by parents in 12–18 min. Different versions are available for various ages; the 24-month, 27-month and 30-month versions were used in the current study. There are five domains: fine motor, gross motor, communication, problem-solving and personal–social. Each domain contains six questions that can be answered with yes (10 points), sometimes (5 points) or not yet (0 points), as well as nine open-ended questions. Scores obtained from each domain are compared with established cut-off points at 1 and 2 SDs below the mean, which are used to identify children at risk of developmental problems. If the score on any domain falls below the 2-SD cut-off (‘refer’), referral for further assessment is advised. If the score on any domain falls between the 1-SD and 2-SD cut-off points (‘monitor’), learning activities and monitoring of the child’s development are advised. We used data from the six communication domain questions for the present study. The ASQ-3 is a standard element in the Healthy Child Programme and the results are routinely reported in the UK.
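The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, the cut-off values are passed in as parameters because the published ASQ-3 cut-offs are not given in this article, and the handling of scores that fall exactly on a cut-off is an assumption.

```python
from typing import Sequence

# Points per response, as described in the text.
RESPONSE_POINTS = {"yes": 10, "sometimes": 5, "not yet": 0}


def score_domain(responses: Sequence[str]) -> int:
    """Sum the points for the six scored items of one ASQ-3 domain."""
    if len(responses) != 6:
        raise ValueError("an ASQ-3 domain has six scored questions")
    return sum(RESPONSE_POINTS[r] for r in responses)


def classify(score: float, monitor_cutoff: float, refer_cutoff: float) -> str:
    """Classify a domain score against the 1-SD and 2-SD cut-offs.

    monitor_cutoff (1 SD below the mean) and refer_cutoff (2 SDs below the
    mean) must come from the published ASQ-3 tables for the relevant age
    version. Treating a score equal to a cut-off as belonging to the
    higher category is an assumption made for this sketch.
    """
    if score < refer_cutoff:
        return "refer"
    if score < monitor_cutoff:
        return "monitor"
    return "above-threshold"


# Example with placeholder cut-offs (40 and 30 are NOT real ASQ-3 values).
example_score = score_domain(["yes", "yes", "sometimes", "sometimes", "not yet", "yes"])
example_class = classify(example_score, monitor_cutoff=40.0, refer_cutoff=30.0)
```

With these placeholder cut-offs, the example responses sum to 40 points and are classified as above-threshold.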

The SSLM26 is a 50-item word list originally adapted from the MacArthur Bates Communicative Development Inventory for use in Sure Start programmes in England in the early 2000s that takes approximately 5 min to complete. The clinician administering the measure ticks a box next to each word if the parent reports that their child is able to say that word. A total word count out of 50 is then recorded. For children living in families where English is not used exclusively, equivalent words in the other languages are acceptable. There is no recognised screening threshold for the SSLM.

The PLS-528 is a structured assessment of receptive and expressive language with items that range from preverbal, interaction-based skills to emerging language. The assessment is detailed and requires practitioners to observe the child; it can take half an hour or more to complete. The person administering it must hold the requisite professional accreditation; in most cases, this is an SLT who is fully trained to administer and interpret standardised language assessments. The cut-off for low language ability was set at or below the 10th centile, following recent literature reporting a population prevalence of language disorder of 9.92%.30

Statistical analysis

Characteristics of children with full data and those without the Preschool Language Scale (PLS) assessment were compared using independent t-tests, analysis of variance, Mann-Whitney or χ2 tests.
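As an illustration of the kind of categorical comparison listed above, the following sketch computes a Pearson χ2 test for a 2×2 table using only the Python standard library. The function name and the counts are hypothetical and are not study data; the analysis in the article was run in SPSS, and this is not a reproduction of it.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability of a
    chi-square distribution with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Invented counts: rows = group A / group B, columns = characteristic present / absent.
chi2, p = chi_square_2x2(80, 20, 60, 40)
print(round(chi2, 2))  # 9.52
```

A small p value here would indicate that the proportion with the characteristic differs between the two groups more than would be expected by chance.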

The optimal cut-off scores for the SSLM and ASQ-3 Communication Scale questions were determined from a receiver operating characteristic (ROC) curve in relation to the pre-selected PLS-5 threshold. The ROC curve illustrates the trade-off between sensitivity and specificity for each screening score. Areas under the ROC curve were used as an assessment of screening performance. Optimal cut-off scores from the ROC curves (best trade-off between sensitivity and specificity, derived using Youden’s Index31) and the published above-threshold/monitor/refer classifications for the ASQ-3 Communication Scale score were used to obtain screening performance in terms of sensitivity, specificity, and positive and negative predictive values.
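The cut-off selection described above can be sketched as follows. This is a minimal illustration on invented toy data, not the study's SPSS analysis: the function names are hypothetical, a child is assumed to screen positive when the score falls below the cut-off, and candidate cut-offs are taken as midpoints between adjacent observed scores (one way in which a half-point threshold can arise).

```python
from typing import List, Tuple


def roc_points(scores: List[float], is_case: List[bool]) -> List[Tuple[float, float, float]]:
    """Return (cutoff, sensitivity, specificity) for each candidate cut-off.

    A child screens positive when score < cutoff (lower score = more concern).
    """
    distinct = sorted(set(scores))
    cutoffs = [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]
    n_case = sum(is_case)
    n_non = len(is_case) - n_case
    points = []
    for c in cutoffs:
        tp = sum(1 for s, y in zip(scores, is_case) if y and s < c)
        tn = sum(1 for s, y in zip(scores, is_case) if not y and s >= c)
        points.append((c, tp / n_case, tn / n_non))
    return points


def youden_optimal(points: List[Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """Pick the cut-off maximising Youden's J = sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1)


def auc(scores: List[float], is_case: List[bool]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen case
    scores lower than a randomly chosen non-case (ties count as one half)."""
    cases = [s for s, y in zip(scores, is_case) if y]
    non_cases = [s for s, y in zip(scores, is_case) if not y]
    wins = sum(1.0 if c < n else 0.5 if c == n else 0.0 for c in cases for n in non_cases)
    return wins / (len(cases) * len(non_cases))


# Invented toy data: screening scores and reference-test status (True = low language).
toy_scores = [5, 10, 15, 20, 25, 30, 35, 40]
toy_cases = [True, True, False, True, False, False, False, False]

best = youden_optimal(roc_points(toy_scores, toy_cases))
print(best)                                   # (22.5, 1.0, 0.8)
print(round(auc(toy_scores, toy_cases), 3))   # 0.933
```

On the toy data, the cut-off of 22.5 gives the largest value of J; the sensitivity, specificity and predictive values are then read off at that cut-off.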

Data analysis was conducted using SPSS V.24.

Role of the funding source

The project was funded as ‘PHE - Corporate - Assessment Tool and Resources to Support Action by Health Visitors and Early Years Practitioners to Identify and Support Children with Early Speech, Language and Communication Needs ECM_6378’. The study sponsor (Public Health England) approved the study design but played no part in the collection, analysis and interpretation of data, in the writing of this article or in the decision to submit the paper for publication.

Patient and public involvement

Although there was substantial public engagement with the overall ELIM programme, members of the public were not involved in the design, analysis or interpretation of the research presented here.

Results

Participant flow

Across the five sites, parents/carers of 894 children consented to participate in the study. Figure 1 describes the flow of participants and data. Paired ASQ-3 and SSLM data were available for 811 (91%) of the children for whom consent was obtained. A full matched dataset was obtained for 357 of these children who attended for the PLS-5 assessment.

Figure 1

Participant flow through the study. Figure created by coauthor JC. *ASQ communication subscale, SSLM 50-word list (Q9 of ELIM) and PLS V.5 total Language standard score. ASQ, Ages and Stages Questionnaire; ELIM, Early Language Identification Measure; PLS, Preschool Language Scale; SSLM, Sure Start Language Measure.

Table 1 presents the characteristics of the children taking part in the initial assessment, of the 454 with ASQ-3 and SSLM screening data but no PLS data, and of the 357 with a full dataset including PLS-5 language classification, SSLM score and ASQ-3 Communication Scale category. Children attending for the PLS assessment were broadly similar to those who did not attend, with almost identical mean SSLM and ASQ-3 Communication Scale scores, but those who had the PLS were slightly older and were more likely to be from exclusively English-speaking families, and their mean area deprivation scores were lower.

Table 1

Characteristics of children attending for initial health visiting team assessment for whom age-appropriate ASQ-3 and SSLM data are available, and for those who had subsequent assessment by a speech and language therapist

In most cases, the reasons for non-attendance for the SLT assessment are not known, although 39 participants were unable to attend in March 2020 due to COVID-19.

Seventy-seven per cent (290/376) of the children in this sample had PLS scores above the 10th centile threshold while 23% (86) had scores below the threshold.

The age-appropriate versions of the ASQ-3 allow automatic correction for age within the sample, but the SSLM score is not age corrected. Online supplemental file 1 presents the scores for the whole sample of 865 children with SSLM data; children gained on average four new words per month within the age range 23–30 months, but correction for age does not significantly change the screening performance of the instrument.
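The article does not describe how the age correction was computed. One simple possibility, sketched here purely as an assumption, is a linear adjustment that uses the reported average gain of four words per month to express each score as if the child were at a common reference age. The function name, the reference age of 27 months and the clamping to the 0–50 range are all hypothetical choices.

```python
REFERENCE_AGE_MONTHS = 27.0  # hypothetical reference age within the 23-30 month range
WORDS_PER_MONTH = 4.0        # average gain reported in the text


def age_adjusted_sslm(raw_score: float, age_months: float) -> float:
    """Linearly adjust an SSLM word count to the reference age.

    Younger children are credited with the words they would be expected
    to gain by the reference age; older children have them subtracted.
    The result is clamped to the 0-50 range of the word list.
    """
    adjusted = raw_score - WORDS_PER_MONTH * (age_months - REFERENCE_AGE_MONTHS)
    return min(50.0, max(0.0, adjusted))


print(age_adjusted_sslm(20, 24))  # 32.0
print(age_adjusted_sslm(20, 30))  # 8.0
```

Under this assumed adjustment, the same raw score of 20 words reads quite differently for a 24-month-old and a 30-month-old.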

The screening performance of the SSLM and ASQ-3 communication scales in relation to the PLS-5 cut-off is illustrated in figure 2.

Figure 2

ROC curves illustrating screening performance of the ASQ-3 communication score and the age-adjusted SSLM word list against the Preschool Language Scale V.5 reference test. Figure created by coauthor RR. ASQ, Ages and Stages Questionnaire; ASQ-3, Ages and Stages Questionnaire, Third Edition; ROC, receiver operating characteristic; SSLM, Sure Start Language Measure.

Table 2 gives the performance of the ASQ-3 and SSLM language screening tools using both the optimal cut-off points derived from the ROC curves and, for the ASQ-3, the predefined cut-off points between the ‘above-threshold’ (normal), monitor and refer categories.

Table 2

Screening performance of the SSLM score and the ASQ-3 communication score against the 10th population centile Preschool Language Scale V.5 cut-off
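Positive and negative predictive values depend on prevalence as well as on sensitivity and specificity. The sketch below applies Bayes' rule to the sensitivity and specificity reported in the abstract for the optimal cut-offs. The prevalence of 10% is an assumption (it corresponds to the population centile used to define the PLS-5 cut-off); the article does not state which prevalence underlies its reported predictive values. The function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    tp = sensitivity * prevalence               # true positives per child screened
    fn = (1 - sensitivity) * prevalence         # false negatives
    tn = specificity * (1 - prevalence)         # true negatives
    fp = (1 - specificity) * (1 - prevalence)   # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Assumed prevalence; not stated in the article.
ASSUMED_PREVALENCE = 0.10

sslm_ppv, sslm_npv = predictive_values(0.83, 0.81, ASSUMED_PREVALENCE)
asq_ppv, asq_npv = predictive_values(0.55, 0.95, ASSUMED_PREVALENCE)

print(round(sslm_ppv, 2), round(sslm_npv, 2))  # 0.33 0.98
print(round(asq_ppv, 2), round(asq_npv, 2))    # 0.55 0.95
```

Under this assumption, the SSLM figures agree with the reported predictive values of 0.33 and 0.98. The ASQ positive predictive value comes out at 0.55 rather than the reported 0.53, a small difference that may reflect rounding of the published sensitivity and specificity.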

Online supplemental file 2 presents ROC curves for children of families speaking only English at home and for children of families speaking languages other than English (including bilingual families). Screening performance for both ASQ-3 and SSLM is poorer among families not using English exclusively at home (area under the curve (AUC) 0.740 and 0.764, respectively) than among families speaking English exclusively (AUC 0.895 and 0.912, respectively).

Discussion

Main findings

Using the lower predefined threshold for the ASQ-3 communication domain, encompassing both the monitor and refer classifications, the ASQ-3 misses 35% of children with low language ability. The higher ASQ-3 refer threshold misses over half of such children. There appears to be little scope for improving the ASQ-3 screening performance through changing the threshold score category. The SSLM performs substantially better as a screening instrument, missing 17% of children with significant language pathology but with slightly poorer specificity than the ASQ-3 at this level of sensitivity. The SSLM threshold of 17.5/50 words appears to operate effectively across the age range of 23–30 months. The performance of both the ASQ-3 and SSLM is poorer among children living in families in which English is not the only language spoken at home than it is among children living in families in which English is spoken exclusively.

Strengths

The findings are based on a real-world evaluation of an enhanced approach to a universal developmental review focused on language. The proportion (23%) of children attending for the PLS language assessment whose language attainment fell below the published population 10th centile is higher than anticipated but may reflect the characteristics of the five sites, which were relatively socially disadvantaged.2 This proportion is nevertheless very similar to that found in other studies of 2-year-old children’s language in community-ascertained samples. For example, the Early Language in Victoria Study32 found that 19.7% of children at 2 years of age fell below the 10th centile cut-point using the Communication Development Inventories,33 suggesting that samples in these studies may be more representative of the population than those used to norm the tests. The sample size was sufficient to draw sound conclusions, although only around half the children undertook the optional gold-standard language assessment. Overall, the differences between those children attending the initial assessment only and those attending for both assessments were relatively minor: both the original sample and the sample with full data available have socioeconomic profiles very similar to the English population. Our dataset incorporating both assessments is therefore likely to be broadly generalisable to the population of children attending the universal review of 2.0–2.5 years in England.

Limitations

Although uptake of the developmental review of 24–30 months in England is around 78%, no data were made available to the research team about the families failing to attend this review. It is likely that families with significant psychosocial difficulties would be over-represented among these non-attenders,34 and this might have introduced some bias. Furthermore, as outlined previously, there was substantial attrition between the initial health visitor review and the research assessment carried out by the SLT. This level of attrition was probably unavoidable, given that the second research assessment was not part of routine care and involved an extra attendance by families. Finally, figure 1 illustrates that around 10% of the initial health visitor team reviews were associated with incomplete or inaccurate data recording—this is likely to reflect the pragmatic difficulties of evaluating development of a universal child health service innovation across five geographical areas with differing organisational configurations.

Relatively few children in the study (18%) were living in households where English was not spoken exclusively. The limited evidence we were able to gather suggests that both the ASQ-3 and SSLM performed more poorly in these groups. Further work is required with bilingual and non-English-speaking families, and in the meantime, we would recommend extreme caution in the interpretation of language screening results from these children.

Findings in context of the literature

Comparison of developmental screening measures against a consistent reference test is rarely performed in low-risk populations20 23 27 but is useful in the selection of effective instruments. Screening for poor language ability, although potentially valuable and very widely practised internationally,18 does not yet meet accepted criteria for screening programmes,13 and further work is required to demonstrate the effectiveness and cost-effectiveness of early interventions for screen-positive children. Our data suggest that the SSLM yields relatively robust results, and this could potentially be used for selecting participants for intervention in a trial of a comprehensive screening programme in the future.

Conclusion

The SSLM appears to perform well as a language screening instrument for children aged 23–30 months, but further developmental work may be required to optimise its performance, particularly among families in which English is not used exclusively in the home.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants and obtained a favourable ethical opinion and approval from West Midlands/Black Country Research Ethics Committee (reference 19/WM/0114) in May 2019. Participants gave informed consent to participate in the study before taking part.

Acknowledgments

The authors would like to acknowledge other members of the “Early Identification” team: Professor Sue Roulstone and Caitlin Holme from the Bristol Speech and Language Therapy Research Unit, Rose Watson from Newcastle University, and Sheena Carr and Renvia Mason from Public Health England for their support throughout the project. Finally, we would like to thank the children and their families who attended the clinics and the health visitors and speech and language therapists who were involved in collecting data for the project.


Footnotes

  • Contributors PW drafted the manuscript. RR, JC and PW analysed the data. All authors contributed to the design of the study, edited the manuscript and approved the final submission. PW acts as guarantor for the paper.

  • Funding This work was funded by the Department for Education and managed by Public Health England under the title ‘Assessment Tool and Resources to Support Action by Health Visitors and Early Years Practitioners to Identify and Support Children with Early Speech, Language and Communication Needs ECM_6378’.

  • Competing interests The Sure Start Language Measure was originally developed for the Sure Start Programme in England by the late project chief investigator Professor James Law.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
