Analysis and validation of clinical subgroups of Kawasaki disease in children in China: a retrospective study
Abstract
Objective Although Kawasaki disease (KD) is commonly regarded as a single disease entity, clinical subgroups have recently been described. We aimed to validate previous research on clinical subgroups and establish a KD subgroup differentiation model specific to China.
Methods We analysed clinical data of 1682 patients diagnosed with KD at the Kunming Children’s Hospital from December 2014 to December 2022. We performed principal component analysis and hierarchical clustering on 13 continuous variables. Then, we grouped the patients based on the optimal number of clusters and analysed the clinical characteristics of each subgroup.
Results We ultimately identified three subgroups. Cluster 1 comprised predominantly younger patients, who had the highest risk of coronary artery aneurysm and the lowest rate of intravenous immunoglobulin resistance. Cluster 2 was characterised by high inflammatory markers and the lowest risk of coronary artery aneurysm. Cluster 3 was characterised by liver involvement, with significant elevations in alanine transaminase, gamma-glutamyl transferase and total bilirubin. Rising quintiles of these three markers were positively associated with intravenous immunoglobulin resistance. Cluster 1 and cluster 3 shared similarities with the previously identified younger age subgroup and liver subgroup, respectively, whereas cluster 2 was unique to our study.
Conclusions Our study preliminarily validated a previous KD subgroup study and established a KD subgroup model in China.
What is already known on this topic
A previous study laid the foundation for the subgroups of Kawasaki disease, but the reproducibility and universality of these subgroups still need further validation.
What this study adds
Our study identified three subgroups of Kawasaki disease in China and provided a certain level of reproducibility for previous subgroup studies.
How this study might affect research, practice or policy
Our study established the necessity of subgroup analysis for Kawasaki disease, contributing to the development of precision medicine in the clinical subgroups of Kawasaki disease.
Introduction
Kawasaki disease (KD), also known as mucocutaneous lymph node syndrome, is an acute immune-mediated vasculitis. It predominantly affects children under the age of 5 and can occur throughout the year.1 2 KD can lead to coronary artery lesions (CAL), which have become a common acquired heart disease in certain countries and regions. Intravenous immunoglobulin (IVIG) has been proven to be effective in the treatment of KD and prevention of CAL, but some children may develop resistance to IVIG.2 Additionally, adverse outcomes such as liver injury, anaemia and jaundice have also been continuously reported.3–6 Because KD exhibits diverse clinical presentations and outcomes, it is necessary to conduct further subgroup studies to enhance our understanding of KD. These studies can contribute to the improvement of KD diagnosis and treatment.
Recently, Wang et al performed a cluster analysis on 1016 patients with KD from the REDCap database at the KD Research Center, University of California San Diego.7 They classified the patients into four subgroups and summarised the clinical characteristics of each subgroup. The conclusions drawn from this study were consistent with clinical experience, and the study pioneered the discussion on the subdivision of patients with KD into subgroups. However, a cohort study conducted in a single country or specific population may not provide sufficient reproducibility. This is particularly important considering that KD is known to have a higher prevalence in East Asian regions. It is therefore necessary to validate the results of this study through research conducted in multiple centres and populations.
We therefore aimed to validate the reliability and generalisability of their research and to establish a subgroup analysis model for KD. This study will provide a theoretical foundation for the clinical diagnosis and treatment of KD.
Materials and methods
Participants
We retrospectively collected KD cases at Kunming Children’s Hospital from December 2014 to December 2022, along with their relevant clinical data, through our hospital’s medical record system. All cases were strictly diagnosed as complete KD (CKD) or incomplete KD (IKD) according to American Heart Association (AHA) criteria. Only patients with KD who were at their first visit and had complete required data were included in this study.
Comparison of clinical features
Following the approach of Wang et al, we selected 13 variables.7 The retrieved data therefore included demographic characteristics (age at onset, gender and race), physical examination findings related to KD, relevant laboratory test results (such as complete blood cell count and manual differential count at diagnosis, biochemistry, erythrocyte sedimentation rate (ESR) and C reactive protein (CRP) levels), time of diagnosis, number of days of illness at the time of the first IVIG infusion and echocardiography results (including measurements of coronary artery diameter at diagnosis and within 2 months of diagnosis). We standardised the haemoglobin concentration for age and expressed the echocardiography results as Z-scores (coronary artery diameter adjusted for body surface area).8 The complete clinical information of these cases was included in the construction of the model.
To maintain consistency with their study, we also adopted the AHA definition: we classified KD cases based on the maximum Z-score of the left anterior descending coronary artery and the right coronary artery within 60 days after fever onset. We defined normal as Z<2, dilatation as Z≥2 to <2.5 and aneurysm as Z≥2.5. In addition, we defined resistance to IVIG as continuous or recurrent fever for at least 36 hours after the first IVIG injection.1
This study used R software (V.4.2.3) for all analyses. We described normally distributed continuous variables as mean±SD and compared them across groups using one-way analysis of variance (function ‘oneway.test’). We described non-normally distributed variables as median with IQR (P25, P75) and compared them across groups using the non-parametric Kruskal-Wallis test. We presented categorical variables as proportions and determined differences between categorical variables using the χ2 test. We considered a p value <0.05 as statistically significant. We used the ggplot2 package to perform data visualisation.
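The outcome definitions above can be written down as a minimal sketch. The analyses in this study were run in R; the Python below is illustrative only. The function names and the `last_fever_hours` field (hours after the first IVIG dose at which fever was last recorded, `None` if no fever was recorded) are hypothetical and are not taken from the study database.

```python
from typing import Optional


def classify_coronary(z_lad: float, z_rca: float) -> str:
    """Classify the coronary outcome from the larger of the two Z-scores."""
    z_max = max(z_lad, z_rca)
    if z_max < 2.0:
        return "normal"
    if z_max < 2.5:
        return "dilatation"
    return "aneurysm"


def is_ivig_resistant(last_fever_hours: Optional[float]) -> bool:
    """Assumed encoding: fever still recorded 36 hours or more after the first IVIG dose."""
    return last_fever_hours is not None and last_fever_hours >= 36.0


# Made-up values, for illustration only.
print(classify_coronary(1.4, 2.2), is_ivig_resistant(40.0))
```

The boundaries are half-open, so a Z-score of exactly 2 counts as dilatation and exactly 2.5 as aneurysm, matching the thresholds stated in the text.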
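For readers unfamiliar with the Kruskal-Wallis test, the following stdlib-only Python sketch computes the tie-corrected H statistic. It is not the code used in this study, which relied on R; the function name `kruskal_wallis_h` and the toy values are assumptions made for illustration. With three groups the test has 2 degrees of freedom, for which the chi-square tail probability is exp(-H/2).

```python
import math
from collections import Counter


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Average 1-based rank for each distinct value (tied values share the mean rank).
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Standard tie correction; it is zero only when every value is identical.
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Toy data for three groups (made-up numbers, not study data).
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
p_value = math.exp(-h_stat / 2)  # valid only for 2 degrees of freedom (three groups)
print(h_stat, p_value)
```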
Cluster analysis of clinical phenotypes
Based on their study,7 we selected 13 variables for subsequent clustering analysis, including one demographic characteristic (age at onset), nine laboratory results (age-corrected haemoglobin Z-score before treatment, white cell count, platelet count, neutrophil percentage, lymphocyte percentage, ESR, CRP, alanine transaminase (ALT) and gamma-glutamyl transferase (GGT)) and the Z-score of the coronary artery segment involved before treatment. To test our hypothesis regarding the clustering of clinical features of patients with KD, we used the factoextra package to calculate the Hopkins statistic. The Hopkins statistic quantified the clustering tendency of the phenotype data: a value closer to 1 indicated a stronger clustering tendency, a value near 0.5 indicated randomly distributed data without meaningful cluster structure and a value closer to 0 indicated evenly spaced data. In the latter two cases, the data set would not be suitable for cluster analysis. We used the FactoMineR package to perform principal component analysis (PCA) and clustering, which involved normalising the data, conducting PCA, performing hierarchical clustering on the principal components, cutting the hierarchical tree at the optimal number of clusters and merging the results. As in their study, we determined the optimal number of clusters by comparing inertia.
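The idea behind the Hopkins statistic can be sketched in a few lines of stdlib-only Python. This is not the factoextra implementation used in the study. It assumes one common formulation, in which nearest-neighbour distances from uniformly placed random points (u) are compared with nearest-neighbour distances between sampled real points (w) and H = Σu / (Σu + Σw). The function name, the sample size `m` and the synthetic data are hypothetical.

```python
import math
import random


def hopkins_statistic(data, m, seed=0):
    """Approximate Hopkins statistic: near 1 = clustered, near 0.5 = random."""
    rng = random.Random(seed)
    dims = len(data[0])
    lows = [min(p[k] for p in data) for k in range(dims)]
    highs = [max(p[k] for p in data) for k in range(dims)]
    # w: distance from each sampled real point to its nearest other real point.
    sample_idx = rng.sample(range(len(data)), m)
    w = [
        min(math.dist(data[i], q) for j, q in enumerate(data) if j != i)
        for i in sample_idx
    ]
    # u: distance from each uniform random point in the bounding box to the nearest real point.
    u = []
    for _ in range(m):
        p = [rng.uniform(lows[k], highs[k]) for k in range(dims)]
        u.append(min(math.dist(p, q) for q in data))
    return sum(u) / (sum(u) + sum(w))


# Synthetic example: two tight, well-separated groups (not patient data).
gen = random.Random(42)
clustered = [(gen.gauss(0, 0.05), gen.gauss(0, 0.05)) for _ in range(50)]
clustered += [(gen.gauss(10, 0.05), gen.gauss(10, 0.05)) for _ in range(50)]
h = hopkins_statistic(clustered, m=10, seed=1)
print(round(h, 3))
```

In clustered data the real points sit close to one another while random points mostly fall in empty space, so Σu dominates and H approaches 1.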
Results
We included a total of 1682 patients with KD, comprising 1230 (73.13%) patients with CKD and 452 (26.87%) patients with IKD. Among them, 1064 (63.26%) patients were male and 618 (36.74%) were female. The ethnic background of the patients included 1431 (85.08%) patients of Han ethnicity and 251 (14.92%) patients from ethnic minorities. The average age at onset was 2.47±1.89 years.
Then, we conducted a cluster analysis. First, we calculated the Hopkins statistic, which yielded a value of 0.907, indicating a significant clustering tendency and distinct subgroups among the 13 phenotypic variables (figure 1A). Then, we determined the optimal number of clusters to be 3 (k=3) based on the inertia calculation (figure 1B,C). Moreover, we achieved dimensional reduction through PCA. Principal component 1 explained 21.2% of the variance and captured the variance caused by inflammation-related markers such as neutrophil percentage and lymphocyte percentage, while principal component 2 explained 13.1% of the variance and largely reflected the baseline coronary artery measurements, ESR and platelet count (figure 1A). We displayed the correlation between the three subgroups and various clinical features in a heat map (figure 1D).
Figure 1 (A) Three-dimensional map of hierarchical clustering and loading plot of variables on principal component analysis (PCA). (B) The optimal number of clusters determined by the calculated inertia. (C) Dendrogram of hierarchical map and identified clusters. (D) The correlation heat map of variables with clusters. %Lymph, percentage of lymphocytes; %PMN, percentage of polymorphonuclear leucocytes (neutrophils); ALT, alanine transaminase; CRP, C reactive protein; ESR, erythrocyte sedimentation rate; GGT, gamma-glutamyl transferase; IVIG, intravenous immunoglobulin; Plts, platelets; WBC, white blood cell; Z-Hgb, age-adjusted Z-score of haemoglobin; Z-LAD, Z-score of left anterior descending artery; Z-RAD, Z-score of right anterior descending artery.
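The inertia criterion for choosing the number of clusters can be illustrated with a small sketch. It assumes the usual decomposition of total inertia into within-cluster and between-cluster parts and is not the FactoMineR routine used in the study. The function names, toy points and cluster labels are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inertia_decomposition(points, labels):
    """Return (within, between) inertia for a given cluster assignment."""
    overall = centroid(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    within = 0.0
    between = 0.0
    for members in clusters.values():
        c = centroid(members)
        within += sum(sq_dist(p, c) for p in members)
        between += len(members) * sq_dist(c, overall)
    return within, between


# Toy points forming three visible groups (made up, not study data).
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
within_2, between_2 = inertia_decomposition(points, [0, 0, 1, 1, 1, 1])
within_3, between_3 = inertia_decomposition(points, [0, 0, 1, 1, 2, 2])
print(within_2, between_2, within_3, between_3)
```

Total inertia is the same for every partition, so each additional cluster moves inertia from the within-cluster part to the between-cluster part. A candidate k is attractive when adding one more cluster yields only a small further gain.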
Based on the principal component hierarchical clustering algorithm, we identified three patient subgroups with different clinical characteristics. These subgroups were as follows (table 1): Cluster 1: This subgroup was characterised by a younger age at the onset of KD. They also had lower levels of inflammatory markers such as ESR, CRP and white cell count. Additionally, they exhibited the highest Z-score for left coronary artery dimension. Cluster 2: This subgroup was characterised by higher ESR and CRP. Cluster 3: Patients in this subgroup showed liver involvement along with elevated levels of ALT, GGT and total bilirubin (TBIL).
Table 1 Summary of cluster-specific clinical characteristics
In terms of clinical manifestations, there were significant differences among the three subgroups in the occurrence of rash (p<0.001), conjunctival congestion (p<0.001), mucosal involvement (p=0.027) and cervical lymphadenopathy (p<0.001) (table 2). Cluster 1 had the lowest proportions of all four clinical manifestations, while cluster 2 was more likely to present with conjunctival congestion and mucosal involvement. Cluster 3, on the other hand, was more likely to present with rash, cervical lymphadenopathy and conjunctival congestion.
Table 2 Demographic features, clinical presentation and clinical laboratory test results of the total cohort and each subgroup
The subgroups showed significant differences in coronary outcomes and resistance to IVIG. Patients in cluster 1 had the highest risk of coronary artery involvement, including the highest left anterior descending artery Z-score at diagnosis, right coronary artery Z-score, Z-score within 2 months of diagnosis and incidence of coronary artery aneurysm formation. On the other hand, cluster 2 had the lowest risk of coronary artery involvement (p<0.001) (figure 2A). Cluster 3 had the highest occurrence of resistance to IVIG (p<0.001), while cluster 1 had the lowest occurrence (p<0.001) (figure 2B). Moreover, patients in cluster 1 were most likely to be diagnosed with IKD (post hoc χ2 test, p<0.001), whether based on echocardiography or laboratory findings (figure 2C).
Patients in cluster 1 exhibited a unique laboratory test profile at diagnosis, with lower white cell count, lower neutrophil count, high lymphocyte percentage, high platelet count and lower CRP and ESR levels (table 2). This profile may be related to the unique blood cell composition in young children. Conversely, cluster 2 demonstrated significant increases in inflammatory markers, including white cell count, ESR and CRP levels (table 2). In contrast, patients in cluster 3 were mainly distinguished by elevated ALT, GGT and TBIL levels, which were the key features of this subgroup (table 2). Interestingly, across all patients in the cohort, the odds of resistance to IVIG rose with increasing quintile of these three liver and biliary biomarkers (figure 2D).
In summary, when compared with the results of Wang et al’s study, we found that cluster 1 and cluster 3 were in line with their findings regarding the age subgroup and liver subgroup. However, cluster 2 differed, showing a higher proportion of neck lymph node swelling and a lower risk of coronary artery abnormalities.
Figure 2 (A-C) Coronary artery outcome, intravenous immunoglobulin (IVIG) resistance and Kawasaki disease (KD) diagnostic category, complete KD (CKD) or incomplete KD (IKD), based on American Heart Association (AHA) guidelines, in the different clinical subgroups. (D) OR for IVIG resistance per quintile of alanine transaminase (ALT), total bilirubin (TBIL) and gamma-glutamyl transferase (GGT).
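One plausible reading of an OR per quintile is sketched below: patients are ranked by a biomarker, split into five equal groups, and the odds of IVIG resistance in each higher quintile are compared with the lowest quintile. The text does not state how the ORs in figure 2D were computed, so this is an assumption. The function names and all values are hypothetical.

```python
def quintile_labels(values):
    """Assign each value a quintile 1..5 by rank (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(values) + 1
    return labels


def odds_ratio_vs_lowest(labels, resistant):
    """Odds ratio of IVIG resistance for quintiles 2..5 relative to quintile 1.

    Assumes no empty cells; a zero count would need a continuity correction.
    """
    counts = {q: [0, 0] for q in range(1, 6)}  # [resistant, not resistant]
    for q, r in zip(labels, resistant):
        counts[q][0 if r else 1] += 1
    ref_res, ref_non = counts[1]
    ratios = {}
    for q in range(2, 6):
        res, non = counts[q]
        ratios[q] = (res * ref_non) / (non * ref_res)
    return ratios


# Made-up ALT values and resistance flags for 20 hypothetical patients.
alt = list(range(10, 210, 10))
resistant = [1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0]
ratios = odds_ratio_vs_lowest(quintile_labels(alt), resistant)
print(ratios)
```

A published analysis would normally also report CIs and might fit a logistic regression with quintile as an ordinal predictor; the crude ratios here only illustrate the direction of the association.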
Discussion
Our study revealed clinical subgroups of patients with KD in China through unsupervised clustering analysis, validating the heterogeneity of KD and providing an important foundation for future precision medicine and personalised treatment. We also demonstrated that the subgroup study of KD by Wang et al has a certain level of reproducibility. Although subgroup analysis based on clinical data has uncovered the complexity of KD, future research needs to integrate genetic and molecular biology tools to identify and understand these subgroups more accurately.
The importance of unsupervised clustering analysis
Unsupervised clustering analysis in our study revealed three subgroups of patients with KD. These subgroups exhibited significant differences in clinical characteristics, indicating that KD is not a single homogeneous disease. Compared with Wang et al’s research, subgroups 1 and 3 resembled their young age subgroup and hepatic subgroup, respectively, while subgroup 2 showed unique features, suggesting that the heterogeneity of patients with KD may vary with regional and racial differences.9 The value of the current study lies in using clinically available data as foundational evidence, but it should be understood as a preliminary approximation rather than a final conclusion. Future research should seek to identify these subgroups reliably through genetic or molecular features to further elucidate the underlying substructure of KD.
Cluster 1: young subgroup
Cluster 1 patients are younger, with higher lymphocyte percentages and platelet counts but lower levels of CRP and ESR. This may be attributed to the unique haematological characteristics and immature immune systems of young children. Because young children have difficulty communicating their symptoms, these patients are prone to delays in diagnosis and treatment, which increase the risk of CALs and disease progression.9 Our study findings indicate that cluster 1 patients have the highest risk of coronary artery aneurysms (21.77%) and the lowest rate of IVIG resistance (5.41%). This aligns with previous research, further validating the clinical characteristics of this subgroup.10 Whether this age group has a higher incidence of incomplete or atypical presentations leading to misdiagnosis warrants further investigation.
Cluster 2: inflammatory subgroup
Cluster 2 is characterised by elevated levels of inflammatory markers such as white cell count, CRP and ESR. This subgroup shares similarities with Wang et al’s node subgroup but exhibits higher rates of cervical lymphadenopathy and a lower risk of CALs in our study. Although high inflammatory markers are typically associated with more severe vascular inflammation,11 12 the lower incidence of CALs in this subgroup may be due to early and effective treatment. Our research indicates an IVIG resistance rate of 7.12% in this subgroup, which is intermediate between the other two clusters.13 14 Furthermore, patients in cluster 2 show clinical manifestations distinct from those of the other subgroups, suggesting that this subgroup may represent a unique pathophysiological process of KD. Further research is needed to explore its mechanisms and clinical management strategies.
Cluster 3: hepatic subgroup
Patients in cluster 3 exhibit significant liver involvement characterised by elevated levels of ALT, GGT and TBIL. This aligns with Wang et al’s hepatic subgroup. Our research shows that patients in this subgroup have the highest IVIG resistance rate (10.4%), and it is more common in older children. Liver involvement in this subgroup may be due to the extension of vasculitis caused by KD to the hepatic vasculature. Patients in cluster 3 are more prone to liver dysfunction and IVIG resistance compared with other subgroups. These findings further support the importance of liver involvement in KD and suggest that future research should focus more on the relationship between these clinical markers and IVIG resistance.15 16 Additionally, we found that liver involvement may be associated with the severity and treatment response of patients with KD, indicating the importance of liver function assessment in KD management.17
Advantages and limitations of the study
Our study used retrospective data analysis, providing crucial preliminary evidence on clinical subgroups of patients with KD in China. Additionally, the study had a large sample size (1682 patients) and showed significant differences among subgroups in terms of age, inflammatory markers, liver enzyme levels and coronary artery outcomes, which offers valuable insights for future research.18 However, the study also has several limitations. First, the retrospective design may introduce issues of data completeness and accuracy, and the methods of data collection and the implications of excluding patients with incomplete data warrant more detailed discussion. Second, the study is based solely on data from a single region, potentially limiting the generalisability of the findings. Furthermore, the number and characteristics of patients excluded because of incomplete data need closer examination to assess their potential impact on the study outcomes. Future research could enhance the reliability and applicability of these findings by increasing sample sizes and conducting prospective studies across multiple centres and ethnic groups.
Conclusion
Our study revealed three clinical subgroups of patients with KD in China through unsupervised clustering analysis, providing initial validation of KD’s heterogeneity. These findings lay an important foundation for future research combining genetic and molecular biology tools. They contribute to improving the diagnosis and treatment of KD and advancing precision medicine.