Predicting the 3D structure of RNA molecules is challenging because of limited training data and RNA-specific characteristics, but it could lead to a better understanding of RNA functions and to the creation of new therapeutics. This repository maintains a curated list of awesome deep learning methods for predicting RNA 3D structure.
Note that we do not include scoring functions or inverse folding methods, which we consider separate topics.
Don't hesitate to open an issue or submit a pull request if you would like to add a paper or resource!
3D structures are generally obtained from the Protein Data Bank (PDB). Challenging structures, generally used to evaluate model performance, come from the RNA-Puzzles and CASP15 challenges.
Some models use distillation to augment the data, since available RNA structures are limited. They often use the bpRNA-1m database that contains annotated secondary structures.
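To make the idea of an annotated secondary structure concrete, here is a minimal sketch that turns a dot-bracket string into a list of base pairs. It assumes the annotation is written in plain dot-bracket notation without pseudoknots; the function name `base_pairs` is hypothetical and is not taken from bpRNA or any listed method.

```python
def base_pairs(dot_bracket: str) -> list[tuple[int, int]]:
    """Return sorted (i, j) index pairs for a pseudoknot-free dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)  # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))  # pair with the latest open bracket
        elif ch != ".":
            raise ValueError(f"unsupported symbol {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For example, a hairpin written as `((..))` pairs position 0 with 5 and position 1 with 4.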
Researchers often build their own datasets, splitting training and test sets based on sequence identity only, which does not account for structural similarities.
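The sequence-identity split described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any listed method: `ungapped_identity` is a crude position-wise proxy with no alignment, the greedy clustering and the `threshold` and `test_fraction` values are hypothetical, and real pipelines typically rely on dedicated clustering tools.

```python
def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical positions, without alignment, over the longer length."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def cluster_by_identity(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose representative
    (its first member) is at least `threshold` identical, else open a new one."""
    clusters: list[list[str]] = []
    for name, seq in seqs.items():
        for cluster in clusters:
            if ungapped_identity(seq, seqs[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def split_clusters(clusters: list[list[str]], test_fraction: float):
    """Assign whole clusters to the test set, smallest first, until it would
    exceed `test_fraction` of all sequences; the rest go to training."""
    total = sum(len(c) for c in clusters)
    train: list[str] = []
    test: list[str] = []
    for cluster in sorted(clusters, key=len):
        if len(test) + len(cluster) <= test_fraction * total:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test
```

Because whole clusters are assigned to one side, no test sequence is close in sequence to the representative of a training cluster. Two RNAs with dissimilar sequences but similar folds can still end up on opposite sides, which is the limitation noted above.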
- RNA3DB : a dataset created for machine learning training with train-test separation accounting for sequence and structural similarities.
- DeepFoldRNA : this model predicts constraints with transformer-based blocks similar to those of AlphaFold 2. The constraints then guide energy minimization simulations that produce the structures.
- RhoFold : the first structure prediction method using deep learning only. Its architecture is close to AlphaFold 2, with transformer blocks and a structure module. It uses a foundation model for sequence embedding.
- epRNA : this model predicts distances between pairs of residues from the sequence only, using convolutional neural networks. Refinement with force fields is used to obtain full-atom structures.
- DRfold : this method uses the sequence and predicted secondary structure to predict rotation/translation matrices and geometric constraints. Both are then used to guide an optimization process that leads to the structure.
- NuFold : an adaptation of AlphaFold 2's architecture to RNAs: the frame definition and vocabulary are changed, auxiliary networks are added, and templates are replaced by predicted secondary structure.
- trRosettaRNA : a transformer-based network similar to AlphaFold 2 that predicts 1D and 2D geometries, which are used as constraints to guide folding based on energy minimization.
- RoseTTAFoldNA : an adaptation of the RoseTTAFold2 model to predict protein-nucleic acid complexes. It updates 1D, 2D and 3D molecule representations that are fed into an SE(3) transformer and a model that predicts frame and rotation angles.
- Deep-RNAfold : this model predicts distance classes using an autoregressive generative model with a VAE, Monte Carlo tree search (MCTS) and a scoring model.
- AlphaFold 3 : the third iteration of Google DeepMind's AlphaFold models, which handles multiple types of molecules and complexes (including RNA). It predicts atomic coordinates with a diffusion module.
- AutoRNA : this method uses a VAE to predict distance matrices.
- RhoFold+ : an improvement over RhoFold which keeps the same global architecture and features but uses a larger training dataset and a larger model with deeper attention modules.
- Chai-1 : this model's architecture is very close to AlphaFold 3's and predicts the structure of multiple molecules and complexes. RNA structures are predicted without MSAs, and constraint features can be added to guide the prediction.
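Several entries above predict inter-residue distances or distance classes rather than coordinates directly. As a rough illustration of what such a target looks like, the sketch below computes a pairwise distance matrix from 3D coordinates and discretizes it into classes. It assumes one representative atom per residue, and the bin edges are arbitrary example values; neither choice is taken from any of the listed methods.

```python
import math
from bisect import bisect_right

Point = tuple[float, float, float]


def distance_matrix(coords: list[Point]) -> list[list[float]]:
    """Euclidean distance between every pair of residue coordinates."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def distance_classes(dmat: list[list[float]], bin_edges: list[float]) -> list[list[int]]:
    """Map each distance to a class index: class k holds distances d with
    bin_edges[k-1] <= d < bin_edges[k]; anything at or beyond the last edge
    falls into class len(bin_edges). `bin_edges` must be sorted ascending."""
    return [[bisect_right(bin_edges, d) for d in row] for row in dmat]


# Hypothetical coordinates (in angstroms) for three residues.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 10.0)]
dmat = distance_matrix(coords)
classes = distance_classes(dmat, bin_edges=[4.0, 8.0, 12.0])
```

A model trained on such targets outputs a matrix of distances or class probabilities, which a later step (force-field refinement or energy minimization in the descriptions above) converts into a 3D structure.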