Original research
Development of a set of quality indicators in paediatric and perinatal care in Japan with a modified Delphi method
  1. Daisuke Shinjo1,2,
  2. Nobuaki Ozawa3,
  3. Naoya Nakadate4,
  4. Yutaka Kanamori5,
  5. Kimikazu Matsumoto6,
  6. Takashi Noguchi2,
  7. Shosuke Ohtera7,
  8. Hitoshi Kato8
  1. Department of Health Policy and Informatics, Tokyo Medical and Dental University, Bunkyo-ku, Tokyo, Japan
  2. Department of Information Technology and Management, National Center for Child Health and Development, Setagaya-ku, Tokyo, Japan
  3. Center for Maternal-Fetal, Neonatal and Reproductive Medicine, National Center for Child Health and Development, Setagaya-ku, Tokyo, Japan
  4. Division of Medical Security and Patient Safety, National Center for Child Health and Development, Setagaya-ku, Tokyo, Japan
  5. Division of Surgery, National Center for Child Health and Development, Setagaya-ku, Tokyo, Japan
  6. Children’s Cancer Center, National Center for Child Health and Development, Setagaya-ku, Tokyo, Japan
  7. Department of Health Economics, National Center for Geriatrics and Gerontology, Obu, Aichi, Japan
  8. San-ikukai Hospital, Sumida-ku, Tokyo, Japan
  1. Correspondence to Daisuke Shinjo; dshinjo.hci{at}


Background Few paediatric and perinatal quality indicators (QIs) have been developed in the Japanese setting, and the quality of care is not assured or validated. The aim of this study was to develop QIs in paediatric and perinatal care in Japan using an administrative database and confirm the feasibility and applicability of the indicators using a single-site practice test.

Methods We used a RAND-modified Delphi method that integrates evidence review with expert consensus development. QI candidates were generated from clinical practice guidelines (CPGs) available in English or Japanese and existing QIs in nine selected paediatric or perinatal conditions. Consensus building was based on independent panel ratings. The performance of QIs was retrospectively assessed using data from an administrative database at the National Children’s Hospital. Data from April 2018 to March 2019 were used; for selected conditions with small numbers of patients, data from April 2019 to March 2021 were also used. Each QI was calculated as follows: number of times the indicator was met/number of participants×100.

Results From a review of literature published between 2010 and 2020, 124 CPGs and 193 existing indicators were identified to generate QI candidates. Through the consensus-building process, 133 QI candidates were assessed and 79 QIs were accepted. The practice test revealed wide variations in the process-level performance of QIs in four categories: patient safety: median 43.9% (IQR 16.7%–85.6%), general paediatrics: median 98.8% (IQR 84.2%–100%), advanced paediatrics: median 94.4% (IQR 46.0%–100%) and advanced obstetrics: median 80.3% (IQR 59.6%–100%).

Conclusions We established 79 QIs for paediatric and perinatal care in Japan using an administrative database that can be applied to hospitals nationwide. The practice test confirmed the measurability of the developed QIs. Benchmarking these QIs will be an attractive approach to improving the quality of care.

  • Health services research
  • Epidemiology
  • Information Technology

Data availability statement

No data are available.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


  • Quality indicators (QIs) and their benchmarks between hospitals are widely used in many healthcare areas to improve the quality of care.

  • Few paediatric and perinatal QIs have been developed in Japanese settings to date; the quality of care in paediatrics and perinatal care in Japan is not validated.


  • A total of 79 QIs for paediatric and perinatal care in Japan were established based on an administrative database using the RAND/UCLA appropriateness method.

  • The practice test based on one institution confirmed the measurability of the developed QIs.


  • Developed QIs can be applied to nationwide hospitals, providing useful information that could contribute to improving quality of care and updating the health provision system and health policy.


Introduction

Many efforts have been made to improve the quality of care, including the identification of specific, measurable indicators of quality. Quality indicators (QIs) are measurable elements of practice performance for which there is evidence or consensus; they are developed systematically with reference to guidelines and can be used to assess the appropriateness of specific healthcare services and outcomes. Used as benchmarks, they can reveal quality issues, which in turn helps to identify solutions.1–3

These QIs and their benchmarks between hospitals are widely used worldwide, including in paediatric and perinatal care.4–8 For example, Braithwaite et al developed paediatric QIs in Australia, focusing on 17 clinical conditions, and revealed that the overall prevalence of adherence to quality-of-care indicators was not high.4 However, due to the variations in healthcare systems and culture, direct application of foreign-developed QIs may not be suitable for different countries like Japan, where limited paediatric and perinatal QIs exist and care quality remains unvalidated.9–12

The hospital information system is a rich data source for QI calculation, but data structures differ between vendors, so extensive preparation is required.13 Alternatively, administrative databases with standardised data structures offer an efficient, population-based approach to monitoring care quality and have been used to compare QIs in prior studies.6 14–17 Prior to publication and benchmarking, practice tests are essential for evaluating feasibility.18 Accepted QIs must undergo review, as some may not be measurable or applicable in certain healthcare systems.19

This study’s primary aim was to develop paediatric and perinatal QIs in Japan using an administrative database, facilitating nationwide benchmarking to improve care quality as part of a quality improvement framework. We selected perinatal care rather than obstetric care as a whole, considering its association with children. A secondary objective was to conduct a pilot practice test at one hospital to confirm QI feasibility and applicability.

Materials and methods

Overview of the development process

The project followed a process based on a modified Delphi technique (the Rand Corporation (RAND)/University of California, Los Angeles appropriateness method),20 which has been widely used to develop QIs.21 We applied and modified the development process based on previous literature regarding developing QIs.4 22 23 The method integrates an evidence review, a face-to-face multidisciplinary panel meeting and repeated anonymous ratings to build consensus.24 The development and ratification of QIs are depicted in figure 1.

Figure 1

Flow diagram of QI development and ratification. QIs, quality indicators.

Multidisciplinary panel

The panel comprised seven healthcare professionals (five paediatricians, one obstetrician and one public health specialist), who selected the conditions for QI and consensus development. The sampling strategy was a non-random selection aimed at seeking participants who would be informative, with recommendations by DS and approval by TN and HK. We selected members who met at least one of the following criteria: serving as a board or committee member of a medical academic society, engagement in guideline development for the selected conditions or outstanding research achievements in related research fields.

Selection of conditions for QI development

Four categories (patient safety, general paediatrics, advanced paediatrics and advanced obstetrics), followed by nine conditions regarding paediatric and perinatal care, were identified based on published research, the burden of disease, frequency of presentation and national priority areas. The term ‘advanced’ was used for conditions that are usually cared for at specialised centres such as tertiary hospitals. These included conditions with high prevalence, such as paediatric bronchial asthma, neonatal respiratory care, caesarean sections and Kawasaki disease. Kawasaki disease is one of the major diseases in Japan, with more than 28 500 patients identified between January 2019 and December 2020.25 These categories and conditions were identified through discussions at the panel meeting. Two of the conditions comprise two subconditions each: (1) ‘rare diseases’, consisting of acute lymphoblastic leukaemia and congenital diaphragmatic hernia, and (2) ‘acute abdomen’, consisting of intussusception and appendicitis.

Systematic search for evidence

Because de novo development of evidence-based QIs is costly and time-consuming, methods using existing clinical practice guidelines (CPGs) have gained interest as viable alternatives.26 Thus, for selected paediatric and perinatal conditions, we retrieved existing CPGs and QIs available in English or Japanese from QI/guideline databases and one medical literature database (PubMed). The search was run in 2020 and covered documents published between April 2010 and March 2020 (details of the search formula are shown in online supplemental etable 1). The following QI/guideline databases were used: Agency for Healthcare Research and Quality (USA), National Quality Forum (USA), National Institute for Health and Care Excellence (NICE, UK), Australian National Health and Medical Research Council, Minds with the Japan Council for Quality Healthcare and the National Hospital Organisation (Japan). Websites regarding these selected conditions, including those of related paediatric associations, were also reviewed. Furthermore, we searched manually for additional literature that might be relevant to this study.

Indicator development

Recommendations were extracted from the selected CPGs. Each recommendation was screened for eligibility on three criteria: (1) strength of recommendation (relatively strong in each condition); (2) validity and adequacy in actual clinical practice in Japanese settings; and (3) feasibility of defining indicators using administrative databases. Recommendations that did not meet these three criteria were excluded. The remaining recommendations were then converted into a standardised indicator format using the modified American College of Cardiology/American Heart Association methodology.22 Existing QIs were also converted into a standardised indicator format. We designed QIs based on the format of the Japanese administrative database.

Subcommittees for each condition

An expert coordinator was appointed to review the proposed indicators under each condition. The subcommittee consisted of five experts recruited to undertake a review of the proposed indicators. Three experts participated in the first-step ratings, and two experts participated in the second-step ratings. They assessed the proposed indicators based on their selected subcommittees’ conditions, while members of the multidisciplinary panel were responsible for assessing all proposed indicators.

Expert consensus process

The proposed indicators were reviewed and ratified by subcommittee experts for each condition and by the multidisciplinary panel through two-step, two-round independent ratings. Three subcommittee experts reviewed the proposed indicators in the first-step rating, while nine members (two subcommittee experts for each condition and the seven panel members) reviewed the indicators in the second-step rating. During each step/round, members rated the appropriateness of each QI candidate on a 9-point scale, where 1 and 9 represented ‘least suitable’ and ‘most suitable’, respectively. QIs were adopted if the median rating in each round/step was >7 and the number of members who gave a score of <3 was one or fewer in the first-step rating or two or fewer in the second-step rating. In addition, members were given the opportunity to provide comments or suggest additional candidates, particularly comments that would support panel members in their ratings. In round 1, members individually evaluated indicators with a set of documents that described the QIs, adapted from nine domains: evidence-based, interpretable, actionable, denominator, numerator, validity, reliability, feasibility and overall assessment.22 In round 2, they convened for a web-based or face-to-face meeting to discuss, revise and individually evaluate the proposed indicators, with the results of the first round shared anonymously. If additional candidates were presented after the meeting, members discussed them via email using the same questionnaire.
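The adoption rule described above can be expressed compactly in code. The following is a minimal sketch, not the authors' implementation: the function name `is_adopted`, its parameters and the example ratings are hypothetical, and the median threshold is read as strictly greater than 7, as written in the text.

```python
from statistics import median


def is_adopted(ratings, max_low_scores):
    """Check one QI candidate against the adoption rule for a single step/round.

    ratings: appropriateness scores on the 9-point scale
             (1 = least suitable, 9 = most suitable).
    max_low_scores: number of ratings below 3 that is tolerated
                    (1 in the first-step rating, 2 in the second-step rating).
    """
    if not ratings or any(r < 1 or r > 9 for r in ratings):
        raise ValueError("ratings must be non-empty and lie on the 1-9 scale")
    low_scores = sum(1 for r in ratings if r < 3)
    # Assumption: the threshold is strict (median > 7), following the text.
    return median(ratings) > 7 and low_scores <= max_low_scores


# Hypothetical ratings for illustration only; they are not study data.
first_step = [8, 9, 8]                      # three subcommittee experts
second_step = [9, 8, 8, 7, 9, 2, 8, 9, 8]   # nine raters in the second step
print(is_adopted(first_step, max_low_scores=1))
print(is_adopted(second_step, max_low_scores=2))
```

A candidate would need to pass this check in every step and round to be adopted; how ties or missing ratings were handled is not stated in the text and is left out of the sketch.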

We explained the development process to panel and subcommittee members in detail, and all members agreed to the process. The features of the study are as follows: (1) QIs were developed across a wide range of paediatric and perinatal care; (2) QI candidates were developed from guidelines and existing QIs, so each condition may have different quality standards and/or levels of evidence; and (3) QI candidates were assessed through two-round, two-step ratings by subcommittee specialists (first step and second step) and panel members (second step).

Pilot practice test for feasibility

We used data from the Japanese administrative database of the Diagnosis Procedure Combination (DPC) per-diem payment system (details of the DPC have been described elsewhere).27 In brief, the DPC is a case-mix patient classification system linked to payments at acute-care and mixed-care hospitals in Japan. The database includes anonymised clinical and administrative claims data. Clinical data comprise baseline patient information, diagnosis (based on ICD-10) and detailed medical information, including all major or minor procedures, medication and device use. The DPC database has been used in many epidemiological studies, including studies of QIs.9–12 27 28 A data-checking programme provided by the government supports data quality, and the data are collected uniformly.

We collected DPC data from the National Center for Child Health and Development, which is the only national children’s hospital, between April 2018 and March 2019. We further collected data on QIs for rare diseases between April 2019 and March 2021, considering the small number of patients. For each indicator, percentage scores (QIs) were calculated as follows: number of times the indicator was met/number of participants (excluding those who had obvious reasons for not implementing the process as defined by the indicator) ×100. The medians of the indicator scores were also computed as the overall quality score of the programme. To ensure feasibility, we further checked the data in cases where the percentage scores were lower than 5%. Data processing was performed using Microsoft SQL Server (Microsoft Corporation, Redmond, WA, USA).
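The percentage score defined above can be illustrated with a short sketch. It is an assumption-laden example rather than the study's SQL: the `Case` record, the function names and the sample data are hypothetical, and the real DPC data structure is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    """One patient record for a single indicator (hypothetical structure)."""
    met: bool        # the process defined by the indicator was carried out
    excluded: bool   # obvious reason for not implementing the process


def qi_score(cases: List[Case]) -> Optional[float]:
    """Times the indicator was met / eligible participants x 100."""
    eligible = [c for c in cases if not c.excluded]
    if not eligible:
        return None  # no eligible patients, so the indicator is not calculable
    met = sum(1 for c in eligible if c.met)
    return met / len(eligible) * 100


def needs_data_check(score: Optional[float], threshold: float = 5.0) -> bool:
    """Flag scores below 5% for the further data check described above."""
    return score is not None and score < threshold


# Made-up example: 10 patients, 2 excluded, 6 of the 8 eligible met the indicator.
cases = (
    [Case(met=True, excluded=False)] * 6
    + [Case(met=False, excluded=False)] * 2
    + [Case(met=False, excluded=True)] * 2
)
score = qi_score(cases)
print(score)                    # 75.0
print(needs_data_check(score))  # False
```

Returning `None` when no patient is eligible keeps uncalculable indicators distinct from indicators that genuinely score 0%.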


Results

Development of quality indicators

From the literature review, we extracted 124 CPGs and 193 indicators. A total of 132 QI candidates pertaining to the Japanese healthcare system were aggregated from the CPGs and existing QIs after the exclusion of those screened for eligibility. All panel members and subcommittee experts from 21 institutions agreed to participate in this study. All responded to the email-based surveys, attended meetings and fulfilled their responsibilities. The consensus development process was completed in October 2021.

Figure 1 illustrates the QI development process. The first-step ratings resulted in 98 QIs being selected from among the 132 QI candidates; 33 indicator candidates were not adopted. The second-step ratings selected 78 QI candidates and added a new indicator. Some QI candidates were modified and included as additional candidates, following the suggestions of experts, and reassessed. Consequently, 79 QIs were established (figure 2). These 79 QIs comprised 19 QIs for patient safety, 23 for general paediatrics, 23 for advanced paediatrics and 14 for advanced obstetrics (figure 3). Among the 79 QIs, 76 are process measures for expressing the proportion of patients who received appropriate care; the remaining 3 QIs are outcome measures. Examples of the indicators are shown in table 1 (a full list is in online supplemental etable 2).

Table 1

Examples and characteristics of the proposed QIs for paediatric and perinatal care in Japan

Figure 2

QI candidates and adopted QIs. QIs, quality indicators.

Figure 3

Number of QI candidates and adopted QIs by condition. *Including newly added QI candidates. ¹Acute abdomen consists of intussusception and appendicitis. ²Rare diseases consist of acute lymphoblastic leukaemia and congenital diaphragmatic hernia. QIs, quality indicators.

Performance in the pilot practice test

The results of the pilot practice test are presented in table 2. Seven QIs (rare diseases) used data for fiscal years 2018 to 2020, and the remaining 72 QIs used data for fiscal year 2018. Three of the 79 QIs could not be calculated: PS12 (pulse oximeter monitoring in paediatric sedation during MRI examination), NRC02 (budesonide inhalation for neonates at very high risk of bronchopulmonary dysplasia) and RDA03 (morphological bone marrow examination and diagnosis by a haematologist, for acute lymphoblastic leukaemia). Most QIs were process-level due to the characteristics of the database and its feasibility. Performance on the process measures varied widely across the four categories (patient safety: median 43.9%, IQR 16.7%–85.6%; general paediatrics: median 98.8%, IQR 84.2%–100%; advanced paediatrics: median 94.4%, IQR 46.0%–100%; and advanced obstetrics: median 80.3%, IQR 59.6%–100%). The details of each QI and the results of the pilot test are presented in online supplemental etable 2.
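Category-level summaries such as the medians and IQRs reported above can be derived from per-indicator scores along the following lines. This is a sketch under stated assumptions: the function name `summarise` and the scores are hypothetical, and the quartile method (`method="inclusive"`) is an assumption, since the text does not say how the IQR was computed.

```python
from statistics import median, quantiles
from typing import Iterable, Optional, Tuple


def summarise(scores: Iterable[Optional[float]]) -> Tuple[float, float, float]:
    """Return (median, Q1, Q3) of the calculable indicator scores."""
    values = sorted(s for s in scores if s is not None)  # skip uncalculated QIs
    if len(values) < 2:
        raise ValueError("need at least two calculable scores")
    # Assumption: inclusive quartiles; other conventions give slightly different IQRs.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


# Made-up scores for one category; None marks an indicator that was not calculated.
example = [10.0, 40.0, None, 50.0, 90.0, 100.0]
print(summarise(example))  # (50.0, 40.0, 90.0)
```

Dropping `None` values mirrors the handling of indicators that could not be calculated, which are excluded from the summary rather than counted as 0%.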

Table 2

Performance of proposed QIs for paediatric and perinatal care in Japan


Discussion

We developed 79 QIs for paediatric and perinatal care using a RAND-modified Delphi method based on CPGs and existing QIs. These QIs are designed to be calculated from the Japanese administrative database. They have the advantages of being a cost-effective source of information, reducing the risk of selection bias and being easily adopted by participating hospitals. The use of QIs has the potential to raise the standards of quality in paediatric and perinatal care in Japan. Our single-site practice test showed that most of the proposed QIs were measurable in real-world clinical practice using administrative databases and that there was a wide variation in their performance.

QIs and their analyses are beneficial for identifying gaps and areas for improvement, informing the creation of future best practices, tracking progress in quality improvement and providing insights into the management of each condition. Poor adherence may affect patient outcomes or lead to a substantial worsening of disease, death and increased healthcare costs. Adherence gaps and practice variation persist despite decades of development and endorsement of CPGs, and efforts have been made to close this gap.29 Efforts to enhance the quality of guidelines, measure (monitor) and analyse QIs and improve adherence to guidelines are essential for achieving better health service provision. QIs could provide useful information regarding institutional-level and regional-level disparities in quality of care, which could contribute to updating health policy and improving the health provision system. While most QIs are localised considering the difference in healthcare systems and cultures, it would be helpful to consolidate QIs and related information worldwide considering generalisability; comparison of measurable QIs between countries/regions would also help quality improvement.

Our practice test was based on one institution but included all indicators developed in this study. Three proposed QIs were not calculated, partly because the required data are not recorded under healthcare insurance regulations. Some QIs had a smaller number of patients than expected, partly because of coding practices (eg, ‘Acute focal bacterial nephritis’ tended to be coded as ‘Acute pyelonephritis’). The results also showed that the Japanese administrative database is an appropriate source of information for QIs. Further benchmarking of QIs would be attractive in the Japanese setting, although comparing QIs across countries using administrative databases appears difficult without a common data structure.30

This study has several implications for future research. First, the feasibility of using these QIs in other hospitals needs to be evaluated. Second, the use of the proposed QIs needs to be evaluated; future studies should assess if and how these QIs contribute to quality improvement, including changes in the behaviour of physicians and the frequency of unwarranted events. Third, implementing these QIs requires continuous updates, evaluations and adaptation.21

Strengths and limitations

One of the main strengths of this project is the use of routinely collected administrative data, which enables the monitoring of quality issues throughout the healthcare system. Second, administrative data are relatively inexpensive to collect compared with primary data collection.31 Third, although QIs may be biased by intrahospital heterogeneity,32 they do not suffer from sampling bias, and non-response and recall bias are avoided.33 The final strength of the project is the inclusion of a data collection and calculation phase to assess the feasibility of indicator measurement.

Despite the advantages, the study acknowledges certain limitations. The composition of panels and subcommittees may influence consensus development, potentially leading to a biased selection of QIs. Although a two-step, two-round assessment was implemented to reduce bias, the representativeness of panel members could still impact the validity of the consensus method. Notably, patient perspectives were not adequately included due to the absence of patient representation in the panels. Indeed, patient participation in QI development has been limited.34 35 This may be a general limitation of QI development based on guidelines; however, it needs to be resolved for further studies, such as updating these QIs.

Furthermore, the study focused on QIs applicable to the administrative database, excluding aspects of quality not relevant to this specific data source. The accuracy of the proposed QIs may have some limitations due to the coding process, even though efforts were made to ensure measurability by medical experts familiar with the database. Although the DPC (administrative) database has been used for epidemiological studies including QIs9–12 27 28 and efforts have been made to validate it,36 differences in coding practices between hospitals could affect QI results, and the limited generalisability of indicator performance owing to the single-site practice test also needs to be considered. Further practice tests in more hospitals are required to strengthen feasibility and to assess reliability and adaptability. In addition, the proposed QIs did not include timely administration of medications/procedures due to the unavailability of hours-level data.

It would be preferable to have a set of QIs in which the link between process and outcome is established. However, we developed QIs without considering the linkage between process measures and outcome measures. This is partly because it is not easy to define appropriate outcome measures related to process measures using the administrative database. For example, the validity of adverse event-related ICD-10 codes (T79-T88) has not been established, and the timing of diagnosis is not available in the Japanese setting. Further efforts focusing on this linkage would be an attractive approach to facilitating quality improvement.

Finally, evidence-based QIs are only as robust as the underlying evidence on which they are based. CPGs have been extensively studied, revealing issues such as redundancy, outdated information, inconsistent structure and content and overly lengthy documents. Moreover, concerns persist regarding the quality of evidence supporting CPGs, which may lead to systematic under-representation or over-representation of certain aspects in specific clinical areas.

In conclusion, this study offers valuable insights into the development and application of QIs for paediatric and perinatal care in Japan. While the use of administrative data is advantageous, there are notable limitations related to consensus development, patient perspectives, database relevance, coding accuracy, coding practices and evidence-based QIs. Awareness of these limitations helps to ensure the appropriate interpretation and utilisation of the proposed QIs in enhancing healthcare quality.


Conclusion

We have established 79 QIs for paediatric and perinatal care in Japan, drawing from pre-existing international CPGs and QIs and through an informal expert consensus process. These QIs can be applied to nationwide hospitals using an administrative database. While the practice test was conducted at a single site, it proved valuable in confirming the measurability of QIs and demonstrated how incentives, such as insurance coverage, can enhance performance in clinical practice and documentation processes. In the pursuit of improving the quality of paediatric and perinatal care, benchmarking these QIs emerges as an attractive approach.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants. This study was approved by the Institutional Review Board at the National Center for Child Health and Development (2021-146). The board waived the requirement for the acquisition of informed consent from the patients owing to the anonymisation of patient data.


The authors wish to extend their gratitude to everyone involved in the study at the National Center for Child Health and Development and Tokyo Medical and Dental University for their cooperation. Special thanks also go to Professor Susumu Yokoya of the Fukushima Medical University for his inestimable comments and suggestions and to Wataru Mimura of the National Center for Global Health and Medicine for his valuable insights and continuous support.



  • Contributors All the authors contributed to the conception and design of the study. Material preparation, data collection and analysis were performed by DS, NO, NN, YK and KM. The first draft of the manuscript was written by DS; all authors commented on the previous versions of the manuscript and participated in its revision for intellectual content. All authors have read and approved the final manuscript and agreed to be accountable for all aspects of the work. Guarantor: DS.

  • Funding This study was supported by a Grant-in-Aid for Scientific Research (B) from the Japan Society for the Promotion of Science (JSPS KAKENHI, 20H03921 (to DS)), the Council for Science, Technology and Innovation, Cross-ministerial Strategic Innovation Promotion Program, 'Innovation AI Hospital System' (Funding Agency: National Institute of Biomedical Innovation, Health and Nutrition (NIBIOHN)) (SIP 18087257 (to DS)) and a grant from the National Center for Child Health and Development (2019b-7 (to TN)).

  • Competing interests None declared

  • Patient and public involvement The board waived the requirement for the acquisition of informed consent from the patients owing to the anonymisation of patient data.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.