...

You can't run a web browser directly from your "dumb terminal" command line environment. The FastQC results have to be placed where a web browser can access them. One way to do this is to copy the results back to your laptop, for example by using scp from your computer (read more at Copying files from TACC to your laptop).
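As a rough sketch of the scp approach, the helper below only prints the command you would run on your laptop (not on TACC). The function name fastqc_scp_cmd, the username, and the remote path are all hypothetical placeholders, and ls6.tacc.utexas.edu is assumed to be the host you log in to; substitute your own values.

```shell
# Print (but do not run) an scp command that would copy a FastQC results
# directory from TACC into the current directory on your laptop.
# Arguments: TACC username, TACC host, remote results directory.
fastqc_scp_cmd() {
    user="$1"
    host="$2"
    remote_dir="$3"
    # -r copies the directory recursively, so the HTML report and its images come along
    printf 'scp -r %s@%s:%s .\n' "$user" "$host" "$remote_dir"
}

# Hypothetical values: replace with your own username and results path
fastqc_scp_cmd "your_username" "ls6.tacc.utexas.edu" "/path/to/your/fastqc_results"
```

Once the printed command looks right, paste it into a terminal on your laptop, then open fastqc_report.html from the copied directory in your browser.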

For convenience, we put an example FastQC report at this URL:
https://web.corral.tacc.utexas.edu/BioinformaticsResource/CoreNGS/yeast_stuff/Sample_Yeast_L005_R1.cat_fastqc/fastqc_report.html 

...

Expand
titleAnswer

The Per base sequence quality report does not look good. The data should probably be trimmed (to 40 or 50 bp) before alignment.

Newer versions of FastQC have slightly different report formats. See this example:
https://web.corral.tacc.utexas.edu/BioinformaticsResource/CoreNGS/reports/wcaar_mqc_report.html

Using MultiQC to consolidate multiple QC reports

...

Expand
titleMake sure you're in an idev session

Make sure you're in an idev session. If you're in an idev session, the hostname command will display a name like c455-020.ls6.tacc.utexas.edu. But if you're on a login node the hostname will be something like login1.ls6.tacc.utexas.edu.
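If you'd rather not eyeball the hostname, a small check along these lines can do it for you. This is only a sketch: the function name node_type is made up for illustration, and the patterns simply assume that login nodes start with "login" and compute nodes start with "c" followed by a digit, as in the example names above.

```shell
# Classify a hostname as a login node or a compute node, based on the
# naming patterns shown above (an assumption, not an official TACC rule).
node_type() {
    case "$1" in
        login*)  echo "login" ;;     # e.g. login1.ls6.tacc.utexas.edu
        c[0-9]*) echo "compute" ;;   # e.g. c455-020.ls6.tacc.utexas.edu
        *)       echo "unknown" ;;   # anything else, such as your laptop
    esac
}

# Check the machine you are currently on
node_type "$(hostname)"
```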

If you're on a login node, start an idev session like this:

Code Block
languagebash
titleStart an idev session
idev -m 120 -N 1 -A OTH21164 -r CoreNGSday3  # or -A TRA23004
# or
idev -m 120 -N 1 -A OTH21164 -p development  # or -A TRA23004



Code Block
languagebash
# Load the main BioContainers module if you have not already
module load biocontainers  # may take a while

# Load the multiqc module and ask for its usage information
module load multiqc
multiqc --help | more

...

  • The -l 50 option says that base 50 should be the last base (i.e., trim down to 50 bases)
  • The -Q 33 option specifies how base Qualities on the 4th line of each FASTQ entry are encoded.
    • The FASTX Toolkit is an older program written in the time when Illumina base qualities were encoded differently, so its default does not work for modern FASTQ files.
      • These days Illumina base qualities follow the Sanger FASTQ standard (Phred score + 33 to make an ASCII character).
    • This option is not really required here because we're just hard trimming, so the program doesn't have to interpret the quality scores. But the -Q 33 option would be required if you were trimming according to base qualities.
    • Note that the fastx_trimmer help does not document this -Q option! But they do talk about it on their website.
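To make the "Phred score + 33" encoding concrete, here is a minimal sketch that converts a single quality character back to its Phred score. The function name phred33_score is hypothetical; the only trick is that printf '%d' given a leading single quote prints a character's ASCII code.

```shell
# Decode one Sanger/Phred+33 quality character into its Phred score.
phred33_score() {
    # A leading single quote makes printf print the ASCII code of the character
    ascii=$(printf '%d' "'$1")
    echo $(( ascii - 33 ))
}

phred33_score 'I'   # ASCII 73 -> 40
phred33_score '#'   # ASCII 35 -> 2
phred33_score '!'   # ASCII 33 -> 0
```

A tool that assumes a different offset would read these same characters as different scores, which is why quality-aware trimming needs the encoding stated explicitly.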

Exercise: compressing fastx_trimmer output

...

Expand
titleHint


Code Block
languagebash
echo $(( `zcat Sample_H54_miRNA_L004_R1.cat.fastq.gz | wc -l` / 4 ))
# or
zcat Sample_H54_miRNA_L004_R1.cat.fastq.gz | wc -l | awk '{print $1 / 4}'

Read more about Arithmetic in bash and more about awk in Some Linux commands: awk


Expand
titleAnswer

Looking at the FASTQ file names, we see this is two lanes of single-end reads (L004 and L005).

The data from lane 4 has 2,001,337 reads, and the data from lane 5 has 2,022,237 reads.
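One way to get both counts in a single pass is to wrap the line-count arithmetic in a function and loop over the files. This is a sketch only: the function name count_fastq_reads is made up, and the glob assumes the lane 5 file is named like the lane 4 file with L005 in place of L004.

```shell
# Count reads in a gzipped FASTQ file: 4 lines per read.
count_fastq_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Assumed file naming: one file per lane, differing only in the L00N part
for fq in Sample_H54_miRNA_L00*_R1.cat.fastq.gz; do
    [ -e "$fq" ] || continue   # skip if the glob matched nothing
    echo "$fq: $(count_fastq_reads "$fq") reads"
done
```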

...