A computational framework to understand translational control

Predictive models of gene expression require reliable and reusable data. We developed a software framework built around a compact, binary, hierarchical data format to efficiently store and process (Ozadam, Geng, and Cenik. Bioinformatics) and to visualize (Chacko, Ozadam, and Cenik. Bioinformatics) ribosome profiling data.

Using this infrastructure, we manually curated and uniformly reprocessed over 3,500 ribosome profiling experiments. This effort yielded a high-quality compendium of translation efficiency measurements across diverse cell types. Inspired by how co-expression of RNA reveals gene function and regulatory programs, we introduced the concept of translation efficiency covariation (TEC). TEC is a conserved property of mammalian transcriptomes that reflects coordinated translational control, and it is predictive of protein–protein interactions and gene functions. For instance, using TEC we discovered a regulator of glycolysis that was missed by RNA expression and protein abundance analyses (Liu et al. Nature Biotechnology).
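To make the idea of covariation concrete, here is a minimal sketch under simplifying assumptions. It treats translation efficiency as the log2 ratio of ribosome footprint counts to RNA counts in each sample, and it scores covariation between two genes as the Pearson correlation of those values across samples. The gene names, counts, pseudocount, and function names are all hypothetical, and the published analysis may use a different normalization and similarity measure.

```python
import math
from itertools import combinations


def translation_efficiency(ribo, rna, pseudocount=1.0):
    """Per-sample log2 ratio of ribosome footprint counts to RNA counts.

    The pseudocount is an assumption of this sketch; it avoids log(0).
    """
    return [math.log2((r + pseudocount) / (m + pseudocount))
            for r, m in zip(ribo, rna)]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def te_covariation(ribo_counts, rna_counts):
    """Correlate translation efficiency across samples for every gene pair."""
    te = {gene: translation_efficiency(ribo_counts[gene], rna_counts[gene])
          for gene in ribo_counts}
    return {(a, b): pearson(te[a], te[b])
            for a, b in combinations(sorted(te), 2)}


# Hypothetical toy counts for three genes across four samples.
ribo_counts = {
    "geneA": [50, 100, 200, 400],
    "geneB": [30, 60, 120, 240],
    "geneC": [400, 200, 100, 50],
}
rna_counts = {
    "geneA": [100, 120, 90, 110],
    "geneB": [80, 80, 100, 90],
    "geneC": [100, 100, 100, 100],
}

scores = te_covariation(ribo_counts, rna_counts)
for pair, r in scores.items():
    print(pair, round(r, 3))
```

In this toy data, geneA and geneB rise and fall together in translation efficiency and so score close to 1, while geneC moves in the opposite direction and scores negative against both. Gene pairs with high scores across many real experiments would be the candidates for shared function or physical interaction.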

We have also developed RiboNN, a deep neural network that predicts cell-type-specific translation efficiency from full-length mRNA sequences (Zheng et al. Nature Biotechnology). Trained on our large compendium, RiboNN is the most accurate model of translation to date. Beyond prediction, the model reveals sequence features linked to translation, mRNA stability, and localization. RiboNN can help interpret the effects of genetic variants on translation and guide the design of optimized mRNA-based therapies, with implications for both diagnostics and treatment of genetic diseases.

We emphasize the development of reusable, portable, and open-source software.