Abstract
Background Autism spectrum disorder (ASD) is a diverse neurodevelopmental disease primarily distinguished by limited and stereotyped activities as well as impaired social interaction. Due to the high heritability of ASD, research on the disorder has emphasised identifying the underlying genetic and epigenetic aetiology. Many ASD loci have been identified by genome-wide association studies (GWASs). However, GWASs are susceptible to bias due to population stratification. Moreover, GWASs barely reflect the genetic aetiology of subtypes of behavioural deficits.
Methods We applied the whole-genome transmission disequilibrium test (TDT) to reveal the gene sets that are significantly associated with the four behavioural subtypes of restricted repetitive behaviours in 334 ASD trios. We further mapped the clustered genes to pathways and examined the enrichment of SFARI genes in these pathways.
Results Four unique gene clusters (181 genes in total) that are related to four different behavioural subtypes in ASD were identified. Twenty-three SFARI genes were found in these four clusters. Through pathway analysis, nine non-SFARI genes (CNDP1, ETNK1, ITPKB, KCNQ5, PDE4D, PDGFRA, PPARGC1A, ULK2, SYNJ2) were found to be linked to the SFARI genes, and these genes may contribute to the development of ASD. Furthermore, we found that the mTOR pathway, which is enriched with the CNDP1, PDE4D and ULK2 genes, is associated with neurodevelopment.
Conclusions Whole-genome TDT is a unique tool for clustering genes related to ASD subtypes of behavioural deficits. Several new candidate genes for ASD are revealed by pathway analysis of the clustered genes. These findings are useful for understanding the underlying mechanism of ASD.
- child psychiatry
Data availability statement
Data are available on reasonable request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
WHAT IS ALREADY KNOWN ON THIS TOPIC
Autism spectrum disorder (ASD) is a neurodevelopmental disease mainly manifested by limited and stereotyped activities and impaired social interaction. ASD can be divided into subgroups by behavioural deficits. It is unknown whether specific gene clusters are linked to the subgroups of ASD behaviours.
WHAT THIS STUDY ADDS
We adopted whole-genome transmission disequilibrium test analysis and found unique gene clusters that are associated with four subtypes of restricted repetitive behaviours in ASD in Chinese children.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
Our findings are important for uncovering new candidate genes of ASD and for understanding the underlying mechanism of ASD.
Introduction
Autism spectrum disorder (ASD) is a diverse neurodevelopmental disease primarily distinguished by limited and stereotyped activities as well as impaired social interaction.1 ASD affects roughly 1 in 68 people, and males are affected about four times as often as females.2 According to genetic and epidemiological studies, ASD is a complex condition caused by the interaction of genes and environment. Due to the high heritability of ASD, research on the disorder has emphasised identifying the underlying genetic and epigenetic aetiology.3 The probability of recurrence in siblings of autistic children ranges from 2% to 8%,4 and it increases to 12% to 20% for the siblings who have impairment in one or more of the three domains (communication, social interaction, stereotyped behaviour) affected by ASD.5 Furthermore, a number of twin studies have revealed that shared genes, rather than a shared environment, are the best explanation for this aggregation within families.6 7 Although heritability estimates vary from 40% to 80%, the variance of autistic features in the general population has been shown to be under a similar degree of genetic influence as ASD itself.8 These findings have initiated a significant amount of investigation into the genetic factors of ASD. Our understanding of autism has greatly advanced over the past 40 years;9 however, the pathogenic mechanism of ASD is not yet fully understood.
ASD-related genes have been identified with great success over the past ten years due to improvements in DNA sequencing methods, large patient cohorts and data sharing.10 The great majority of next-generation sequencing association studies of complex disorders are population-based studies of qualitative or quantitative features. Genome-wide association studies (GWASs) have revealed many loci that are associated with ASD.11 However, these loci, mostly derived from case–control studies, show only the relationship between genotype and ASD status and barely reflect the genetic aetiology of subtypes of behavioural deficits.
GWASs are vulnerable to confounding variables such as population stratification. The transmission disequilibrium test (TDT) for linkage and association is an alternative method that reduces the false results caused by population stratification when investigating ASD aetiology. Several GWASs have been performed in Chinese ASD cohorts.12–14 Nevertheless, these studies explain only a fraction of the genetic loci, and more uncharacterised loci remain to be defined. Family-based designs are frequently used to control confounding factors.15–17 One of the accepted techniques for testing the relationship between disease and genetic markers using parent–offspring trios is the TDT, which was first reported by Spielman et al.18 Using a conditional likelihood technique, Schaid and Sommer19 derived the optimal TDT-type statistics for recessive, additive and dominant models. The TDT-type test for additive models is equivalent to the TDT algorithm of Spielman et al.18 When an observed association may be confounded by population stratification, TDT remains a reliable method for testing linkage and association.
In this study, we collected whole genome sequencing (WGS) data from blood samples of 334 trios in which the Chinese children were diagnosed with ASD. We applied TDT to test the significance of four behavioural subtypes of restricted repetitive behaviours in autism, in order to reveal the gene sets that are significantly associated with each of these four behavioural types. We further mapped the genes to pathways and examined the enrichment of SFARI genes (https://www.sfari.org/resources/) in these pathways.20 This study attempts to explore the gene sets that are closely related to individual subtypes of behavioural deficits in ASD, and to reveal new candidate genes of ASD.
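To illustrate the test described above, the following is a minimal sketch of the classic TDT statistic for a single biallelic marker, which compares how often heterozygous parents transmit one allele versus the other to an affected child. The function name and the counts are hypothetical, chosen only for illustration; they are not taken from this study.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Return the TDT chi-square (1 df) and its p value for one marker.

    transmitted   -- number of heterozygous parents who passed the test
                     allele to the affected child
    untransmitted -- number of heterozygous parents who passed the other
                     allele instead
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p = tdt_statistic(transmitted=60, untransmitted=30)
print(f"chi2 = {chi2:.2f}, p = {p:.4g}")
```

Because only transmissions from heterozygous parents within each family are counted, the statistic does not depend on allele frequency differences between subpopulations, which is why the test is robust to population stratification.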
Materials and methods
Subjects used in this study
The Changsha cohort is the collection of trios of children diagnosed with ASD in the Second Xiangya Hospital of Central South University. Peripheral venous blood samples were collected from the probands and their parents. Autism probands were diagnosed by experienced psychiatrists using the Diagnostic and Statistical Manual of Mental Disorders (DSM)-V (American Psychiatric Association, 2013) criteria for autism.21 The trios in this study met the requirements that the child had been diagnosed with ASD and that neither parent had been diagnosed with ASD. Children with organic brain lesions found by MRI or with clear aetiology were excluded from this study.
Scale information
DSM-V information of the patients was collected for two domains of impairment: (1) social communication and interaction and (2) restricted repetitive behaviour. Restricted repetitive behaviours were grouped into four behavioural types: (1) rigid or repetitive behaviours in movement, use of objects and speech, (2) adherence to the same patterns, rigid adherence to the same order of doing things or ritualised patterns in verbal or nonverbal behaviour, (3) very limited and persistent interests that are unusual in intensity or focus and (4) over-reacting or under-reacting to sensory stimuli, or having an unusual interest in certain sensory stimuli in the environment.
Whole genome sequencing
Genomic DNA was extracted from the whole blood of all autism patients and their parents in the cohort using standard proteinase K digestion and the phenol-chloroform method. Genomic DNA was quantified using a fluorometric method such as Qubit or PicoGreen. Fragmented DNA was mixed thoroughly with End Repair Mix and A-Tailing Mix to repair the ends and adenylate the 3' end of the DNA fragments. The Adapter Mix was then added to the DNA fragments. Finally, the PCR Mix was added to the DNA fragments. The PCR reaction was performed under the following conditions: 98°C for 30 s, 10 cycles of 98°C for 10 s, 55°C for 30 s and 72°C for 30 s, with a final extension of 72°C for 5 min. The sequencing libraries were generated using the Hieff NGS OnePot Pro DNA Library Prep Kit (Illumina, San Diego, California, USA) in accordance with the manufacturer’s instructions, and the index codes were inserted. The DNA library was sequenced on an Illumina NovaSeq 6000 platform, and 300 bp paired-end reads were produced following standard protocols.
Data analysis
Paired-end reads were quality controlled by FastQC (V.0.11.3, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). After removing adaptor sequences and low-quality bases, raw reads were mapped to the hg19 reference genome by BWA-MEME.22 The mapped reads were then sorted into BAM files and duplicates were marked by GATK (V.4.3, https://github.com/broadinstitute/gatk). The processed reads were corrected after a base quality check, and variants were called with GATK to produce genomic variant call format (GVCF) files. Finally, the GVCF files were combined and converted to VCF by PLINK (V.1.9).23
Statistical analysis
In order to reduce false-positive and false-negative results, we used PLINK to clean the data. We performed the TDT analysis of the trios by PLINK.23 A χ2 test was conducted for the single-nucleotide polymorphisms (SNPs) against the behavioural subtypes to determine which genes were significantly associated. Pathway analysis was performed by the Web-based GEne SeT AnaLysis Toolkit (http://www.webgestalt.org/).
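The text does not specify how the contingency tables for the χ2 test were laid out, so the following is only a sketch under an assumed layout: a 2×2 table of carrier versus non-carrier probands against presence or absence of a behavioural subtype. The function name and counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]].

    Assumed layout (hypothetical): rows are carriers / non-carriers of an
    allele, columns are probands with / without a behavioural subtype.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2_assoc, p_assoc = chi_square_2x2(30, 10, 20, 40)
print(f"chi2 = {chi2_assoc:.2f}, p = {p_assoc:.3g}")
```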
Patient and public involvement
Patients or the public were not involved in the design, or conduct, or reporting, or dissemination plans of our research.
Results
Discovery of ASD loci based on the whole-genome TDT
Patient demographics and clinical characteristics are shown in table 1. We performed WGS for 334 trios. After quality control, 32 594 727 SNPs entered stepwise filtering (flow chart in figure 1). SNPs with more than 10% missing genotype data were removed, and 32 315 230 SNPs passed filtering. Next, one proband had more than 10% missing genotypes, and this trio was removed from the cohort. Since males have only one X chromosome, the expected X-chromosome homozygosity rate is >0.8 for males and <0.2 for females. We identified three subjects whose homozygosity rate deviated from the rate expected for their recorded sex. These three trios with discordant sex information were removed, and 330 trios remained. We then retained the SNPs located on chromosomes 1–22 and excluded those on chromosomes X and Y. SNPs with very low minor allele frequencies (MAF) are more easily affected by genotyping error, as the rarer allele occurs in only a few individuals. Therefore, SNPs with MAF<0.05 were removed and 25 555 399 SNPs were kept. Finally, SNPs that departed from Hardy-Weinberg equilibrium (HWE) were removed, and 5 495 040 SNPs were kept for further analysis (figure 1).
Figure 1 SNP data cleaning for further analysis. (A) The flow chart shows the filtering of SNPs. (B) The histogram shows the SNPs with a high proportion of missing genotypes. (C) The histogram shows individuals with discordant sex information. (D) The histogram shows the SNPs with low MAF. (E) The histogram shows the SNPs that depart from HWE. HWE, Hardy-Weinberg equilibrium; MAF, minor allele frequencies; SNPs, single nucleotide polymorphisms.
Table 1 Patient demographics and clinical features of the ASD children in trios
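The missingness, MAF and sex-check rules above can be sketched in a few lines. This is an illustration of the stated thresholds only, not the PLINK implementation used in the study. The genotype encoding (0, 1 or 2 copies of one allele, `None` for a missing call) and all function names are assumptions made for the example.

```python
def missing_rate(genotypes):
    """Fraction of missing calls; a missing genotype is encoded as None."""
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1 or 2 copies of one allele."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_snp_qc(genotypes, max_missing=0.10, min_maf=0.05):
    """Keep a SNP if missingness is at most 10% and MAF is at least 0.05."""
    return (missing_rate(genotypes) <= max_missing
            and minor_allele_frequency(genotypes) >= min_maf)


def sex_is_concordant(recorded_sex, x_homozygosity):
    """Compare recorded sex with X-chromosome homozygosity (>0.8 male, <0.2 female)."""
    if recorded_sex == "male":
        return x_homozygosity > 0.8
    if recorded_sex == "female":
        return x_homozygosity < 0.2
    return False


# Hypothetical genotype vectors, for illustration only.
common_snp = [0, 1, 2, 1, 0, 1, 0, 1, 1, 0]   # MAF 0.35, no missing calls
rare_snp = [0] * 19 + [1]                     # MAF 0.025
sparse_snp = [None, None] + [1] * 8           # 20% missing calls
print(passes_snp_qc(common_snp), passes_snp_qc(rare_snp), passes_snp_qc(sparse_snp))
```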
A genome-wide analysis is usually considered to contain 0.5–1.0 million independent SNPs, and we applied a Bonferroni correction at a family-wise error rate of 5%. The significance level for a single test was therefore set to 5×10⁻⁸, which is a commonly accepted genome-wide significance threshold.24 After performing TDT, 39 983 SNPs passed this threshold (figure 2A). We speculate that the effects of SNPs located in intergenic regions are either small or uncertain; therefore, we excluded these SNPs from this study. There were 19 647 SNPs located in the 5’-UTR or 3’-UTR (untranslated region, UTR5 and UTR3), non-coding RNA (ncRNA), intronic and exonic regions of genes (figure 2B). We found that 16 513 (84.05%) SNPs were located in intronic regions, 2702 (13.75%) SNPs in ncRNA, 390 (1.99%) SNPs in UTRs and only 42 (0.21%) SNPs in exonic regions (figure 2C). These SNPs, which may impact gene expression, were used in the following analyses.
Figure 2 Distribution of ASD loci from the whole-genome TDT. (A) Vertical axis: -log10(p) for each SNP. Horizontal axis: the chromosome (1–22) on which the SNP is located. Horizontal line: the p value threshold, which marks the genome-wide significance level and makes it easy to see which points exceed it. (B) The pie chart shows the location of ASD loci in the gene-related and intergenic regions. (C) The pie chart shows the location of ASD loci in the UTR, ncRNA, exonic and intronic regions. ASD, autism spectrum disorder; ncRNA, non-coding RNA; SNP, single-nucleotide polymorphism; TDT, transmission disequilibrium test; UTR, untranslated region.
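The threshold and the percentages in the paragraph above can be checked with simple arithmetic. The sketch below assumes one million independent tests, the upper end of the range quoted in the text; the variable names are illustrative.

```python
# Bonferroni correction: family-wise error rate divided by the number of tests.
family_wise_alpha = 0.05
assumed_independent_tests = 1_000_000  # assumption: upper end of 0.5-1.0 million
per_test_alpha = family_wise_alpha / assumed_independent_tests
print(f"per-test significance level: {per_test_alpha:.0e}")

# Gene-related SNP counts reported in the text, by annotation class.
region_counts = {"intronic": 16513, "non_coding": 2702, "UTR": 390, "exonic": 42}
total_gene_related = sum(region_counts.values())
region_percent = {
    region: round(100 * count / total_gene_related, 2)
    for region, count in region_counts.items()
}
print(total_gene_related, region_percent)
```

The four annotation classes sum to the 19 647 gene-related SNPs reported in the text.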
Fine-mapping and functional annotations of the gene-related SNPs
We mapped the 19 647 gene-related SNPs to 5386 coding and non-coding genes to investigate the association between these genes and the four behavioural subtypes of ASD in our cohort. According to the DSM-V scale information we collected, all the ASD probands showed deficits in social communication and social interaction. Other behaviours, such as limited and repetitive behaviours, interests or activities, can be grouped into four behavioural subtypes (types I–IV) as described in the Methods section. We performed a χ2 test for the relationship between genes and each of the four behavioural subtypes of the probands. We found that 41 genes were significantly correlated with behavioural type I, 40 genes with behavioural type II, 39 genes with behavioural type III and 63 genes with behavioural type IV (table 2, figure 3). Interestingly, UPK3B was significantly correlated with both behavioural type I and behavioural type IV, and TMEM39A was significantly correlated with both behavioural type III and behavioural type IV. These results suggest that different gene sets or clusters contribute more significantly to different behavioural deficits in ASD.
Figure 3 Association between genes and the four behavioural subtypes of ASD. The heatmap shows the genes that are significantly related to the four behavioural subtypes of ASD. Type I: rigid or repetitive behaviours in movement, use of objects and speech, Type II: adherence to the same patterns, rigid adherence to the same order of doing things or ritualised patterns in verbal or nonverbal behaviour, Type III: very limited and persistent interests that are unusual in intensity or focus, Type IV: overreacting or underreacting to sensory stimuli, or having an unusual interest in certain sensory stimuli in the environment. The colour bar on the right shows the -log10(p). The detailed gene list is shown in table 2. ASD, autism spectrum disorder.
Table 2 Genes significantly correlated with behavioural subtypes I–IV
Pathway analysis implicates new ASD genes for behavioural deficits
To further characterise the 181 genes that are associated with the four subtypes of ASD behaviours, we proceeded in three steps. First, we searched the SFARI database to see how many of these genes are listed there. Second, we explored whether the SFARI genes are enriched in the KEGG pathways. Third, we linked the non-SFARI genes in these four subtypes to the pathways that contain SFARI genes. We found 9, 6, 4 and 4 SFARI genes in types I–IV, respectively (figure 4A). This suggests that our enriched gene sets are likely related to ASD. These SFARI genes were mapped to different pathways (figure 4B). Through pathway analysis, the non-SFARI genes in each type that were linked to the SFARI genes are CNDP1, ETNK1, ITPKB, KCNQ5, PDE4D, PDGFRA, PPARGC1A, ULK2 and SYNJ2. These genes are distributed in six pathways of type II and one pathway of type IV. We searched the literature for these pathways to reveal the relationship between each pathway and neurodevelopment. We found that the mammalian target of rapamycin (mTOR) signalling pathway (hsa04150) has been reported to be related to neurodevelopment. This suggests that the non-SFARI genes CNDP1, PDE4D and ULK2, which are associated with the mTOR pathway, are likely involved in ASD development.
Figure 4 The non-SFARI genes in each type that were linked to the SFARI genes. (A) The histogram shows the SFARI genes in types I–IV. (B) The histogram shows the KEGG pathways in which the genes significantly correlated with each behavioural type are enriched. Red colour represents the SFARI genes enriched in the KEGG pathways. VEGF, vascular endothelial growth factor; mTOR, mammalian target of rapamycin; cAMP, cyclic adenosine monophosphate.
Discussion
ASD has a spectrum of behavioural deficits. Current studies on ASD genetic aetiology are focused on the relationship between genotype and ASD. In this study, we recruited 334 family samples for WGS to investigate whether the subtypes of behavioural deficits are correlated with particular gene clusters. Through whole-genome TDT analysis, we uncovered four unique gene clusters that are related to four different behavioural subtypes in ASD. We found several SFARI genes in each cluster and, through pathway analysis focused mostly on neurodevelopmental pathways, proposed that several non-SFARI genes may contribute to the development of ASD. These findings are useful for understanding the underlying mechanism of ASD.
The vast majority of GWASs of complex diseases are population-based studies of qualitative or quantitative traits. However, population stratification may lead to false positive or false negative results. TDT for linkage and association is an alternative method to reduce the false results caused by population stratification. Our whole-genome TDT analysis revealed 181 genes that are associated with ASD in the Chinese population. Among these 181 genes, we found 23 SFARI genes, indicating that our TDT analysis is effective in uncovering ASD genes.
There are some potential limitations of our study. First, our study focused on a specific population (Chinese children with ASD) and only examined the genetic factors related to four subtypes of restricted repetitive behaviours. This limited scope may not fully capture the complexity and heterogeneity of ASD, which may limit the generalisability of the results to other populations. A meta-analysis of different populations and studies on broad phenotypes may better reveal its genetic aetiology. Second, in the design of TDT, a control group of children without ASD was not included, which limits the ability to compare genetic differences between those with and without the disorder. A comparison of the results between GWAS and TDT may be more insightful. Third, our study did not replicate the results in a larger and more diverse cohort to establish the reliability and generalisability of the findings. Finally, although several SFARI genes were uncovered in our list of risk genes, further functional validation is required for those new candidate risk genes.
Through pathway analysis, we found that the mTOR pathway, which is associated with ASD, is enriched in our subtypes of ASD. mTOR is a highly conserved Ser/Thr protein kinase that forms a complex signalling network that integrates extracellular and intracellular signals, influences mRNA translation, and controls protein synthesis mechanisms.25 mTOR forms two different functional protein complexes, mTORC (mTOR complex) 1 and mTORC2. Recent studies have shown that it regulates various neuronal processes, including the growth and differentiation of neural precursor cells, synaptic plasticity, learning and memory, hormone secretion, food intake, and sleep.25 Pathway analysis was conducted to provide further support that some autism genes may be related to the neurodevelopmental pathways.25 We uncovered three risk genes (CNDP1, PDE4D, ULK2) that are linked to the mTOR pathway. Further molecular experiments and animal models are needed to verify whether these risk genes are actually involved in ASD pathogenesis.
Ethics statements
Patient consent for publication
Ethics approval
This study involves human participants and was approved by the Ethics Committee of Xiangya Medical School of Central South University (No. 2014031113).
References
Footnotes
KX and WL are joint senior authors.
Contributors WL, KX, CH, XN and PZ designed and supervised the study; QG and WL wrote the manuscript; QG, RG, WX, YZ and CZ analysed the data. LX and TB collected the samples and patient information. RG prepared samples for sequencing. WL is the guarantor of this manuscript.
Funding This work was partially supported by the Ministry of Science and Technology of China (2019YFA0802104; 2016YFC1000306); the National Natural Science Foundation of China (31830054, 82130043) and the Beijing Municipal Health Commission (JingYiYan 2018-5).
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.