mRNA translation can be measured transcriptome-wide by sequencing mRNA fragments that ribosomes protect from RNase digestion. This approach, called ribosome profiling, poses unique computational challenges. Unlike RNA sequencing measurements, ribosome profiling data typically need to be analyzed as a function of read (footprint) length. This creates significant storage and processing bottlenecks, because many values must be stored for each gene and experiment. While there are specialized solutions for other data modalities, such as BAM for sequence alignment or Cooler / hic for chromosome conformation capture data, ribosome profiling experiments lacked a comparable dedicated and standardized format. We have recently designed a dedicated binary hierarchical data format to efficiently store, organize and process ribosome profiling data. We are building a computational ecosystem around this file format (.ribo). We currently have a workflow that generates these files starting from raw sequencing reads. The resulting files can be analyzed and visualized seamlessly in our downstream analysis software. We continue to improve this ecosystem by adding ribosome profiling-specific analyses, such as improved algorithms for pause site detection. We emphasize the development of reusable, portable and open source software that will be widely distributed.