Genetic Heterogeneity Analysis Using Genetic Algorithm and Network Science

We generate a feature co-selection network from multiple evolved feature subsets. A new synthetic feature, the community risk score (CRS), is created for each network community. The CRS quantifies the risk of a community of variables and allows for more effective heterogeneity analysis.

Authors

Zhendong Sha
Dr. Yuanzhu Chen
Dr. Ting Hu

ABSTRACT

Genome-wide association studies (GWAS) have linked thousands of genetic variants to susceptibility to many common human diseases. However, the genetic explanations of diseases are often heterogeneous, which poses a substantial challenge for GWAS. We propose a feature construction method that uses a genetic algorithm (GA) to recognize the heterogeneous risk effects of different groups of genetic variables. Multiple GA-based feature selection runs are used to collect an ensemble of high-performing feature subsets. We generate a feature co-selection network from the ensemble, where nodes represent genetic variables and edges represent their co-selection frequencies. A new synthetic feature, the community risk score (CRS), is created for each network community. The CRS quantifies the risk of a community of variables and allows for more effective heterogeneity analysis. We applied our method to two colorectal cancer GWAS datasets, one for training and the other for validation. We ran the GA-based feature selection on the training dataset and constructed the co-selection network. A CRS was then created for each community in the network. We identified three colorectal cancer subtypes by applying clustering algorithms to the CRSs on the validation dataset. Functional enrichment analysis of our results further highlighted gastric cancer-related genes, tumor suppressors, and DNA methylation genes.

KEYWORDS

genetic algorithm, feature selection, genome-wide association study (GWAS), genetic heterogeneity

How it works
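
The co-selection network described in the abstract can be sketched roughly as follows. This is only a minimal illustration, not the authors' implementation: the function names, the SNP names, and the `min_count` threshold are all hypothetical, and grouping nodes by thresholded connected components is a simple stand-in for a community detection algorithm, which the abstract does not name. The CRS computation itself is not shown because the abstract does not define it.

```python
from collections import Counter
from itertools import combinations


def co_selection_network(subsets):
    """Count how often each pair of features appears in the same subset."""
    edges = Counter()
    for subset in subsets:
        # Sort so each undirected pair is stored under one canonical key.
        for u, v in combinations(sorted(set(subset)), 2):
            edges[(u, v)] += 1
    return edges


def threshold_components(edges, min_count):
    """Group nodes linked by edges whose co-selection count >= min_count.

    Assumption: connected components replace a real community
    detection step purely for illustration.
    """
    adjacency = {}
    for (u, v), count in edges.items():
        if count >= min_count:
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical feature subsets from four GA runs (SNP names are made up).
subsets = [
    ["snp1", "snp2", "snp3"],
    ["snp1", "snp2"],
    ["snp4", "snp5"],
    ["snp4", "snp5", "snp3"],
]
edges = co_selection_network(subsets)
communities = threshold_components(edges, min_count=2)
```

In this toy input, only the pairs that are co-selected at least twice survive the threshold, so `snp3` (which co-occurs with each other variable only once) is left out of every group. A real analysis would presumably weight edges by co-selection frequency and use a dedicated community detection method.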

This work was accepted to the GECCO ’22 Companion, July 9–13, 2022, Boston, MA, USA.

Zhēndòng Shā 沙桢栋