Overview and Objectives
Most RNA-seq experiments are probably performed to study gene expression. This objective implies a set of experimental methods that specifically isolate mRNA molecules within an expected size range, using steps such as rRNA depletion, size selection, and/or fragmentation. Accordingly, most of this course focuses on pipelines and tools that take mRNA-targeted data as input (with all that implies) and output lists of genes that are differentially expressed between conditions. However, RNA has numerous biological functions beyond acting as mRNA, and many of these functions can be studied using the same principles, provided one accounts for the expected differences between the data of interest and gene expression-oriented data. In this section, we will discuss two major categories of 'non-mRNA' RNA-seq data: (1) sequencing to study small RNAs such as microRNAs and (2) sequencing to study RNA-protein interactions.
Studying Small RNAs
Many types of small RNA have been characterized, and their biological functions are extremely wide-ranging. The table below describes the different forms and biological functions of small RNAs.
Clearly, small RNAs carry out many biologically important functions. They can be sequenced by simply cutting (for example) the 25-50 bp range out of a size-selection gel and otherwise following normal library preparation. In addition, these species share certain qualities that allow sequencing data derived from each to be analyzed in a similar fashion. These qualities can include (but are not limited to):
- Single-end sequencing: if an RNA species is (for example) 16-25 bp long, then paired-end sequencing provides little (though some) additional data compared to a single-end run, provided the reads are long enough
- Increased adapter contamination: because the RNA is often shorter than the read length, adapter sequence is almost always included in your reads, requiring either pre-processing to remove it or alignment settings that account for it
- Low-complexity libraries: there are often far fewer members of a category of small RNA in a genome than there are reads in the data, meaning the exact same sequence will occur in multiple reads
- Extensive genomic duplication: there are often many copies of the same sequence of a given small RNA in a genome, meaning most genomic alignments will contain numerous "multi-mappers"
All of these issues can be handled effectively, and in some respects small RNA data produce results that are simpler to understand and evaluate than standard gene expression data. Our first exercise will focus on one class of small RNAs, microRNAs, and will use principles that generalize to other interesting small RNAs.
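Two of the qualities listed above, adapter contamination and low library complexity, can be illustrated with a minimal Python sketch. It is not the pipeline used in the exercises. The adapter sequence, the length window, and the function names `trim_adapter` and `collapse_reads` are assumptions chosen for illustration, and only exact adapter matches are handled (a dedicated trimming tool would also tolerate sequencing errors).

```python
from collections import Counter

# Assumption: a placeholder 3' adapter; substitute the adapter from your library kit.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_adapter(read, adapter=ADAPTER, min_overlap=6):
    """Remove an exactly matching 3' adapter, or a partial adapter at the read end."""
    idx = read.find(adapter)
    if idx != -1:
        # Full adapter found: keep only the sequence upstream of it.
        return read[:idx]
    # Otherwise look for the longest adapter prefix that ends the read.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def collapse_reads(reads, min_len=16, max_len=30):
    """Trim each read, keep inserts in the size window, and count identical sequences."""
    counts = Counter()
    for read in reads:
        insert = trim_adapter(read)
        if min_len <= len(insert) <= max_len:
            counts[insert] += 1
    return counts


if __name__ == "__main__":
    toy_reads = [
        "TAGCTTATCAGACTGATGTTGA" + ADAPTER,       # insert followed by full adapter
        "TAGCTTATCAGACTGATGTTGA" + ADAPTER[:10],  # same insert, adapter cut off by read length
        "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER,       # a second insert
        "ACGT" + ADAPTER,                         # insert too short, discarded
    ]
    for sequence, count in collapse_reads(toy_reads).most_common():
        print(sequence, count)
```

Collapsing identical sequences into counts before alignment means each unique sequence only needs to be aligned once, which is one reason low-complexity small RNA libraries can be simpler to work with than mRNA libraries.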
Studying RNA-Protein Interactions
Similarly, RNA-protein interactions are required for an equally diverse set of biological functions, and hundreds of RNA-binding proteins have been identified. It is frequently of interest to isolate protein-RNA complexes, remove the protein, and sequence the resulting RNA. The methods involved combine components of RNA-seq, because the underlying molecule is RNA, and of chromatin immunoprecipitation (ChIP), because the most common way to isolate a protein-RNA complex is with an antibody raised against a fragment of the protein of interest. Below is a sample protocol flow for a RIP-seq experiment.
For "normal" RIP-seq, one usually expects to recover full RNA molecules regardless of where on the RNA the protein was bound, since the whole molecule is 'pulled down' together. However, such protocols generally do not use any chemical or physical means to covalently attach the RNA to the protein, which leaves open the possibility that RNA and protein dissociate and re-associate during sample preparation (published papers have claimed this - see here). Moreover, proteins often bind to specific RNA sequence motifs or positions, and retrieval of the full RNA molecule provides no information about the specific binding site. To address these concerns, methods have been developed that cross-link protein to RNA in a way that leaves a signature of interaction where the protein and RNA actually come into contact. Below is a table of the three methods that modify the RNA in various ways to enable binding site detection by sequencing.
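As one illustration of such a signature, PAR-CLIP reads tend to carry T-to-C conversions at cross-linked positions. The sketch below tallies those conversions per reference position. It assumes ungapped alignments on the same strand as the reference, supplied as hypothetical `(start, read_sequence)` pairs, and the function name `count_t_to_c` is invented here. It does not replace a dedicated CLIP analysis tool.

```python
from collections import Counter


def count_t_to_c(reference, alignments):
    """Return {position: (t_to_c_reads, covering_reads)} for each covered reference T.

    Assumption: alignments are ungapped (start, read_sequence) pairs on the
    reference strand, with 0-based start coordinates.
    """
    coverage = Counter()
    conversions = Counter()
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos >= len(reference):
                break  # read runs off the end of the reference
            if reference[pos] == "T":
                coverage[pos] += 1
                if base == "C":
                    conversions[pos] += 1
    return {pos: (conversions[pos], coverage[pos]) for pos in sorted(coverage)}


if __name__ == "__main__":
    reference = "GGATTGCATGA"
    alignments = [
        (2, "ATCGCA"),  # T-to-C at position 4
        (3, "TCGCAT"),  # T-to-C at position 4
        (1, "GATTGC"),  # matches the reference
    ]
    for pos, (converted, covered) in count_t_to_c(reference, alignments).items():
        print(pos, converted, covered)
```

Positions where a large fraction of covering reads show the conversion are candidate contact sites. Positions with only an occasional mismatch are more likely sequencing errors or SNPs.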
Important Software
For standard RIP-seq, many of the methods already covered in this class are useful since one can expect to recover a full RNA molecule, and the IP and Input samples can be thought of as "conditions" to be compared by differential expression analysis. However, more specific tools do exist, particularly for CLIP-seq and its variants.
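To make the idea of treating IP and Input as "conditions" concrete, here is a minimal sketch that computes a per-gene log2 enrichment of IP over Input from raw read counts. The function name `log2_enrichment`, the counts-per-million scaling, and the pseudocount of 1 are assumptions made for illustration. A real analysis would use replicates and a proper differential expression tool rather than a bare ratio.

```python
import math


def log2_enrichment(ip_counts, input_counts, pseudocount=1.0):
    """Per-gene log2(IP / Input) after scaling each sample to counts per million.

    Assumption: both arguments are {gene_name: raw_read_count} dictionaries.
    """
    ip_total = sum(ip_counts.values())
    input_total = sum(input_counts.values())
    enrichment = {}
    for gene in sorted(set(ip_counts) | set(input_counts)):
        # The pseudocount avoids division by zero for genes missing from one sample.
        ip_cpm = (ip_counts.get(gene, 0) + pseudocount) / ip_total * 1e6
        input_cpm = (input_counts.get(gene, 0) + pseudocount) / input_total * 1e6
        enrichment[gene] = math.log2(ip_cpm / input_cpm)
    return enrichment


if __name__ == "__main__":
    ip = {"geneA": 400, "geneB": 100}
    inp = {"geneA": 100, "geneB": 400}
    for gene, value in log2_enrichment(ip, inp).items():
        print(gene, round(value, 2))
```

Genes with a positive value are over-represented in the IP relative to Input and are candidate targets of the protein. Genes with a negative value are depleted.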
Exercise #1: miRNA/small RNA Sequencing and Profiling (miRNA-seq)
Exercise #2: Ribonucleoprotein Immunoprecipitation and Sequencing (RIP-seq)
Exercise #3: Beyond RIP-seq: (PAR)CLIP-seq