Part A Take Away Points
When designing your RNA-Seq experiment
When creating RNA-Seq libraries, you have many options: strand-specific vs. non-strand-specific protocols, ribosomal RNA depletion, etc.
The right choices depend on the questions you want to answer.
For differential expression studies, the number of replicates matters most. For assembling and annotating transcriptomes, or for identifying rare or novel transcripts, depth of coverage becomes more important.
Once you get your RNA-Seq data
Check for low-quality bases, low-quality reads, overrepresented sequences, and sequence duplication using FastQC.
If needed, trim low-quality bases, filter out low-quality reads, and trim adapters. We covered fastx_toolkit for these operations.
The data are now ready for mapping.
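The quality-control and clean-up steps above could be sketched roughly as follows. This is only an illustration, not the exact commands used in the course: the input file name, the adapter sequence, the output file names, and the quality and length cutoffs are all assumptions, and the -Q33 flag assumes Phred+33 quality encoding. The commands run only if fastqc and fastx_toolkit are installed and the input file exists.

```shell
# Sketch only: file names and thresholds below are illustrative assumptions.
in_fq="Sample1_R1.fastq"      # hypothetical input file
adapter="AGATCGGAAGAGC"       # assumed adapter; use the sequence for your library kit
min_qual=20                   # assumed Phred quality cutoff
min_len=30                    # assumed minimum read length to keep

qc_and_clean() {
    # 1. Quality report for the raw reads (written into qc_raw/)
    mkdir -p qc_raw
    fastqc -o qc_raw "$1"

    # 2. Trim low-quality bases from the end of each read;
    #    reads shorter than min_len after trimming are dropped
    fastq_quality_trimmer -Q33 -t "$min_qual" -l "$min_len" -i "$1" -o trimmed.fastq

    # 3. Keep only reads in which at least 80% of bases reach min_qual
    fastq_quality_filter -Q33 -q "$min_qual" -p 80 -i trimmed.fastq -o filtered.fastq

    # 4. Clip the adapter; reads shorter than min_len after clipping are dropped
    fastx_clipper -Q33 -a "$adapter" -l "$min_len" -i filtered.fastq -o clean.fastq
}

# Run only when the tools are installed and the input file is present
if command -v fastqc >/dev/null 2>&1 &&
   command -v fastx_clipper >/dev/null 2>&1 &&
   [ -f "$in_fq" ]; then
    qc_and_clean "$in_fq"
else
    echo "fastqc/fastx_toolkit or $in_fq not found; nothing was run"
fi
```

Running fastqc again on the cleaned file is a reasonable way to confirm that the trimming and filtering had the intended effect before mapping.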
Unix reminders
When you are in a particular directory in a Linux environment, use ls or ls -l to list the files in that directory. Most errors come from trying to access a file that is not in your current directory.
You can access files in the current directory by name alone. To access a file that is not in the current directory, you need to provide either an absolute path (ex: cat /stor/home/daras/my_rnaseq_course/partA/fastqc_exercise/data/Sample1_R1.fastq) or a relative path (ex: cat ../results/Sample1_R1_fastqc.html). Use tab to autocomplete paths and filenames.
You can run executables even if they are not in your current directory, as long as the directory they are in is listed in your PATH variable. Run echo $PATH to see which locations are in your path.
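The reminders above can be tried out safely in a throwaway directory. The sketch below is only an illustration: the directory layout, the file contents, and the my_tool script are all made up for the example, and mktemp -d is used so that nothing in your course directories is touched.

```shell
# Build a temporary directory tree (hypothetical names) to practise on
work=$(mktemp -d)
mkdir -p "$work/partA/data" "$work/partA/results" "$work/tools"
echo "@read1" > "$work/partA/data/Sample1_R1.fastq"

cd "$work/partA/results"
ls -l                          # shows no files: the FASTQ is not in this directory

# Absolute path: starts at / and works from any directory
abs_out=$(cat "$work/partA/data/Sample1_R1.fastq")

# Relative path: starts from the current directory (.. means one level up)
rel_out=$(cat ../data/Sample1_R1.fastq)

# A tiny made-up executable that lives outside the current directory
printf '#!/bin/sh\necho hello from my_tool\n' > "$work/tools/my_tool"
chmod +x "$work/tools/my_tool"

echo "$PATH"                   # the directories searched for executables
command -v my_tool || echo "my_tool is not on PATH yet"

PATH="$work/tools:$PATH"       # add the tools directory for this session only
tool_out=$(my_tool)            # now found via PATH, not via the current directory

echo "$abs_out | $rel_out | $tool_out"
```

The change to PATH here lasts only for the current shell session; the temporary directory can be deleted afterwards with rm -r "$work".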