Using 3D protein models to uncover genetic risk in ALS
Summary
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease in which rare genetic variants play a prominent role. Single rare variants often cannot be tested for disease association individually. Variants are therefore usually grouped together and tested with burden tests. In numerous proteins, disease-associated variants are concentrated in specific regions (hotspots). Hotspots can be identified by clustering variants to define groups for burden testing. We extend previous ALS hotspot identification study designs, which clustered variants based on distances within the linear sequence, to include 3D spatial clustering methods. We aim to determine whether these 3D methods can identify known ALS hotspots or whether specific limitations prevent their application to ALS hotspot detection. To examine this, we studied three well-known ALS proteins (SOD1, FUS and TARDBP), each representing a different use case (no hotspot, localised hotspot and elongated hotspot, respectively), with spherical clustering and protein structure-based scan (PSCAN) methods. Our PSCAN results resemble previous findings for all three use cases, whereas spherical clustering methods fail to reproduce the expected result for the elongated hotspot use case. For our intended application, PSCAN is thus an improvement over spherical clustering, as it uses no predefined window sizes or shapes. Still, PSCAN has notable limitations. The most important, especially for neurodegenerative disease, is its reliance on AlphaFold2 models, which do not adequately represent intrinsically disordered regions. Hotspots in disordered regions cannot be identified with 3D methods until this limitation is resolved. Therefore, current 3D spatial clustering methods should only be used for ALS hotspot detection in ordered regions of proteins.
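To illustrate the difference between grouping residues by linear-sequence distance and by 3D distance, the following Python sketch compares a sequence window with a spherical window around the same residue. It is a toy example under stated assumptions, not the procedure used in this study: the function names, the coordinates, the window half-width and the radius are all hypothetical, and real analyses would take residue coordinates from a protein structure model.

```python
import math

def linear_window(center, n_residues, half_width):
    """Residue indices within `half_width` sequence positions of `center`."""
    lo = max(0, center - half_width)
    hi = min(n_residues - 1, center + half_width)
    return set(range(lo, hi + 1))

def spherical_window(center, coords, radius):
    """Residue indices whose coordinates lie within `radius` of residue `center`."""
    origin = coords[center]
    return {i for i, xyz in enumerate(coords) if math.dist(origin, xyz) <= radius}

# Hypothetical coordinates (in angstroms) for an 8-residue chain that folds
# back on itself, so the last residues end up next to the first ones in space.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (11.4, 0.0, 0.0),
    (11.4, 3.8, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]

# Group around residue 0 using sequence distance versus 3D distance.
seq_group = linear_window(0, len(coords), half_width=1)
space_group = spherical_window(0, coords, radius=6.0)

print("linear window:   ", sorted(seq_group))
print("spherical window:", sorted(space_group))
```

In this toy chain the spherical window also captures residues that are distant in sequence but close in space, which a linear window misses. A fixed radius still imposes a predefined size and shape on the group, which is the kind of constraint that the summary notes PSCAN avoids.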