Off-Target Effects integration in Machine Learning Models for siRNAs Prediction: Current Practices and Limitations
Summary
Non-coding RNAs (ncRNAs) have emerged as key regulators of gene expression, with diverse roles in disease pathogenesis and therapeutic potential. Among ncRNA-based technologies, RNA interference (RNAi) stands out for its ability to selectively silence genes via small interfering RNAs (siRNAs). While siRNAs offer high efficacy through targeted mRNA degradation, accumulating evidence has revealed significant off-target effects, including unintended transcript regulation, immune activation, and interference with endogenous RNAi pathways. Experimental strategies, such as chemical modifications and conjugate delivery systems, have been developed to mitigate these effects. However, computational tools used to design siRNAs, particularly those based on machine learning (ML), often prioritize gene silencing efficacy over specificity. Early ML models such as support vector machines and decision trees demonstrated initial promise, while recent advances in deep learning have enabled the use of large-scale datasets to predict siRNA efficacy. Nonetheless, the integration of off-target considerations into these models remains limited. This review critically examines the current landscape of ML-based siRNA prediction, with a focus on how off-target effects are addressed. We identify key limitations in existing approaches, including a lack of comprehensive off-target feature incorporation and insufficient validation using unbiased transcriptomic profiling. Finally, we highlight emerging opportunities to enhance siRNA design pipelines by integrating off-target data, proposing a path forward for the development of more accurate and therapeutically relevant ML models. By advancing integrative frameworks that optimize both efficacy and specificity, we aim to support the safer and more effective clinical translation of siRNA-based therapeutics.