Genetic Heterogeneity Analysis Using Genetic Algorithm and Network Science
We generate a feature co-selection network from multiple evolved feature subsets. A new synthetic feature, the community risk score (CRS), is created for each network community. The CRS quantifies the risk of a community of variables and allows for more effective heterogeneity analysis.
Authors
- Zhendong Sha
- Dr. Yuanzhu Chen
- Dr. Ting Hu
ABSTRACT
Genome-wide association studies (GWAS) have linked thousands of genetic variants to susceptibility to many common human diseases. However, the genetic explanations of diseases are often heterogeneous, which poses a substantial challenge for GWAS. We propose a feature construction method that uses a genetic algorithm (GA) to recognize the heterogeneous risk effects of different groups of genetic variables. Multiple GA-based feature selection runs are used to collect an ensemble of high-performing feature subsets. We generate a feature co-selection network from the ensemble, where nodes represent genetic variables and edges represent their co-selection frequencies. A new synthetic feature, the community risk score (CRS), is created for each network community. The CRS quantifies the risk of a community of variables and allows for more effective heterogeneity analysis. We applied our method to two colorectal cancer GWAS datasets, one for training and the other for validation. We ran the GA-based feature selection on the training dataset and constructed the co-selection network. A CRS was then created for each community in the network. We identified three colorectal cancer subtypes using the CRSs and clustering algorithms on the validation dataset. Functional enrichment analysis of our results further highlighted gastric-cancer-related genes, tumor suppressors, and DNA methylation genes.
KEYWORDS
genetic algorithm, feature selection, genome-wide association study (GWAS), genetic heterogeneity
How it works
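The co-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the SNP identifiers, the toy ensemble, and the `min_count` threshold are all hypothetical. Connected components over frequently co-selected pairs stand in for a proper community detection algorithm, which this page does not name, and the CRS computation is omitted because its formula is not given here.

```python
from collections import Counter
from itertools import combinations


def co_selection_network(subsets):
    """Count how often each pair of variables is selected together.

    Keys are (u, v) pairs with u < v; values are co-selection counts.
    """
    edges = Counter()
    for subset in subsets:
        for u, v in combinations(sorted(set(subset)), 2):
            edges[(u, v)] += 1
    return edges


def threshold_communities(edges, min_count):
    """Group variables linked by edges with count >= min_count.

    Assumption: connected components are only a simple stand-in for a
    real community detection algorithm.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), count in edges.items():
        if count >= min_count:
            parent[find(u)] = find(v)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical ensemble: each list is one evolved feature subset
# (for example, the best subset from one GA run). SNP IDs are made up.
ensemble = [
    ["snp1", "snp2", "snp3"],
    ["snp1", "snp2"],
    ["snp4", "snp5"],
    ["snp2", "snp1", "snp5"],
    ["snp4", "snp5", "snp3"],
]

network = co_selection_network(ensemble)
communities = threshold_communities(network, min_count=2)
```

In this toy ensemble, only pairs co-selected at least twice are kept as edges, so variables that rarely appear together with others fall outside every community. A per-community score such as the CRS would then be computed from the variables in each group.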

This work was accepted at GECCO ’22 Companion, July 9–13, 2022, Boston, MA, USA.