Identifying differentially expressed K-mers associated with cytomegalovirus phenotype in humans using quantitative K-mer analysis and the Deep Learning algorithm DeepTCR
Summary
The T-cell receptor (TCR) repertoire forms an individual’s immunological memory and thereby records that individual's history of immune exposure. In recent years, Next Generation Sequencing has made it possible to assemble large human cohorts containing thousands of individuals and millions of complementarity-determining region 3 (CDR3) TCRβ chains. Although CDR3 TCRβ chains have been researched, their constituent K-mers (contiguous subsequences of length K) have not been studied thoroughly. In this study, prevalence-based K-mer analysis is combined with the Deep Learning algorithm DeepTCR to predict disease-associated K-mers. A database of 666 patients with known cytomegalovirus (CMV) phenotype was analysed. The K-mer analysis filters out non-descriptive chains, and DeepTCR, a Deep Learning algorithm designed for classification tasks, then identifies K-mers associated with CMV serostatus. In total, 449 K-mers linked to CMV serostatus were identified. Moreover, after noise removal, the TCRBV10-03 and TCRBJ02-05 gene segments were identified in CMV+ patients, suggesting that they play a crucial role in CMV+ chains. Furthermore, the DeepTCR analysis uncovered CMV+-associated K-mers. Future work should apply K-mer analysis across a broad range of disease databases and, building on that, develop a diagnostic tool for disease phenotype. The field of Deep Learning on TCRs should shift its attention towards K-mers.
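The prevalence-based K-mer step can be illustrated with a minimal sketch. This is not the authors' pipeline and does not reproduce DeepTCR; it assumes that K-mers are overlapping substrings of CDR3 amino-acid sequences, that a K-mer's prevalence is the fraction of patients in a group whose repertoire contains it at least once, and that K-mers are ranked by the difference in prevalence between CMV+ and CMV- patients. The function names (`kmers`, `kmer_prevalence`, `differential_kmers`), the default K of 4 and the `min_diff` threshold are hypothetical choices, and the sequences in the example are made-up toy strings, not data from the study.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def kmers(cdr3: str, k: int) -> List[str]:
    """Return all overlapping K-mers (length-k substrings) of one CDR3 sequence."""
    return [cdr3[i:i + k] for i in range(len(cdr3) - k + 1)]


def kmer_prevalence(patients: List[Iterable[str]], k: int) -> Dict[str, float]:
    """Fraction of patients whose repertoire contains each K-mer at least once.

    Assumption: prevalence is measured per patient, not per clone.
    """
    counts: Counter = Counter()
    for repertoire in patients:
        # Use a set so each K-mer counts once per patient, however many chains carry it.
        present: Set[str] = {km for cdr3 in repertoire for km in kmers(cdr3, k)}
        counts.update(present)
    n = len(patients)
    return {km: c / n for km, c in counts.items()}


def differential_kmers(
    pos: List[Iterable[str]],
    neg: List[Iterable[str]],
    k: int = 4,            # hypothetical K-mer length
    min_diff: float = 0.5,  # hypothetical prevalence-difference threshold
) -> List[str]:
    """K-mers whose prevalence in `pos` exceeds that in `neg` by at least min_diff."""
    p = kmer_prevalence(pos, k)
    q = kmer_prevalence(neg, k)
    diffs = {km: p.get(km, 0.0) - q.get(km, 0.0) for km in set(p) | set(q)}
    kept = [km for km, d in diffs.items() if d >= min_diff]
    # Largest difference first; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda km: (-diffs[km], km))


if __name__ == "__main__":
    # Toy, made-up repertoires: one list of CDR3 strings per patient.
    cmv_pos = [["ASSLGQ"], ["ASSLGE"]]
    cmv_neg = [["ASSPTQ"], ["ARDRGY"]]
    print(differential_kmers(cmv_pos, cmv_neg, k=4, min_diff=0.75))  # → ['ASSL', 'SSLG']
```

Counting presence per patient rather than total occurrences keeps a single expanded clone from dominating a K-mer's score. A real analysis would likely replace the fixed `min_diff` cut-off with a statistical test and would pass the surviving chains on to a classifier such as DeepTCR.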