...

Tip
titleReservations

Use our summer school reservation (CoreNGS-Wed) when submitting batch jobs to get higher priority on the ls6 normal queue today.

Code Block
languagebash
titleRequest an interactive (idev) node
# Request a 120 minute interactive node on the normal queue using our reservation
idev -m 120 -N 1 -A OTH21164 -r CoreNGS-Wed
idev -m 120 -N 1 -A TRA23004 -r CoreNGS

# Request a 120 minute idev node on the development queue 
idev -m 120 -N 1 -A OTH21164 -p development
idev -m 120 -N 1 -A TRA23004 -p development


Code Block
languagebash
titleSubmit a batch job
# Using our reservation
sbatch --reservation=CoreNGS-Wed <batch_file>.slurm

Note that the reservation name (CoreNGS-Wed) is different from the TACC allocation/project for this class, which is OTH21164.

...

Code Block
languagebash
# Load the main BioContainers module then load the fastqc module
module load biocontainers  # may take a while
module load fastqc

FastQC has a number of options (see fastqc --help | more), but it can be run very simply with just a FASTQ file as its argument.

...

Expand
titleMake sure you're in an idev session

Make sure you're in an idev session. If you're in an idev session, the hostname command will display a name like c455-021.ls6.tacc.utexas.edu. But if you're on a login node the hostname will be something like login3.ls6.tacc.utexas.edu.

If you're on a login node, start an idev session like this:

Code Block
languagebash
titleStart an idev session
idev -m 120 -N 1 -A OTH21164 -r CoreNGS-Wed      # or -A TRA23004 
idev -m 120 -N 1 -A OTH21164 -p development  # or -A TRA23004

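A small shell function can make the hostname check described above explicit. It relies only on the naming patterns quoted here (compute nodes such as c455-021.ls6.tacc.utexas.edu, login nodes such as login3.ls6.tacc.utexas.edu); the function name node_type and its pattern matching are assumptions for illustration, not a TACC-provided tool.

```bash
# node_type is a hypothetical helper. It classifies a host name using the
# naming patterns above: login nodes start with "login", compute nodes look
# like c455-021. With no argument it checks the current machine
# (uname -n normally reports the same node name as the hostname command).
node_type() {
  host="${1:-$(uname -n)}"
  case "$host" in
    login*)          echo "login" ;;
    c[0-9]*-[0-9]*)  echo "compute" ;;
    *)               echo "unknown" ;;
  esac
}

node_type login3.ls6.tacc.utexas.edu     # login
node_type c455-021.ls6.tacc.utexas.edu   # compute

# Typical use: warn before running heavy commands on a login node
if [ "$(node_type)" = "login" ]; then
  echo "On a login node: start an idev session first." >&2
fi
```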

FASTX Toolkit is available as a BioContainers module.

...

Because the [-i INFILE] [-o OUTFILE] options are shown in brackets [ ], reading from a file and writing to a file are optional. That means that by default the program reads its input data from standard input and writes trimmed sequences to standard output:
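Because the trimmer behaves as a standard-input-to-standard-output filter, it can sit anywhere in a pipeline (for example right after zcat). The sketch below imitates that behavior with awk so the idea can be tried without the FASTX Toolkit. The function name trim_fastq_to is made up for illustration, it is not FASTX Toolkit code, and it assumes plain 4-line FASTQ records.

```bash
# trim_fastq_to is a hypothetical stand-in for a FASTQ trimmer: it reads FASTQ
# on standard input, keeps only the first N bases (and their qualities) of
# every read, and writes FASTQ to standard output.
trim_fastq_to() {
  n="$1"
  awk -v n="$n" '
    # Line 2 (bases) and line 4 (qualities) of each record are truncated
    NR % 4 == 2 || NR % 4 == 0 { print substr($0, 1, n); next }
    # Line 1 (@ header) and line 3 (+ separator) pass through unchanged
    { print }
  '
}

# One made-up 60-base read with every quality character set to "I"
seq=$(printf '%60s' '' | tr ' ' 'A')
qual=$(printf '%60s' '' | tr ' ' 'I')

# Data flows in on standard input and out on standard output,
# so the filter chains with other commands (here, head).
printf '@read1\n%s\n+\n%s\n' "$seq" "$qual" | trim_fastq_to 50 | head -n 2
```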

...

  • The -l 50 option says that base 50 should be the last base (i.e., trim down to 50 bases)
  • The -Q 33 option specifies how base Qualities on the 4th line of each FASTQ entry are encoded.
    • The FASTX Toolkit is an older program written at a time when Illumina base qualities were encoded differently, so its default setting does not work for modern FASTQ files.
    • These days Illumina base qualities follow the Sanger FASTQ standard (Phred score + 33 to make an ASCII character).
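To see what Phred+33 means in practice, the shell can turn a quality character back into a score: take the character's ASCII code and subtract the offset. The helper name qual_score is hypothetical. The offset of 64 in the last call is the convention used by older Illumina pipelines (Illumina 1.3 to 1.7); applying it to a Phred+33 file gives meaningless scores, which is the kind of mismatch the -Q 33 option avoids.

```bash
# qual_score is a hypothetical helper: convert one FASTQ quality character to a
# numeric score by subtracting an ASCII offset (default 33, i.e. Phred+33).
qual_score() {
  ch="$1"
  offset="${2:-33}"
  # A leading quote makes printf print the character's numeric (ASCII) code
  ascii=$(printf '%d' "'$ch")
  echo $(( ascii - offset ))
}

qual_score 'I'      # 40: ASCII 73 - 33
qual_score '5'      # 20: ASCII 53 - 33
qual_score '#'      # 2:  ASCII 35 - 33
qual_score '#' 64   # -29: a nonsense score from using the old Phred+64 offset
```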

...

Expand
titleMake sure you're in an idev session

Make sure you're in an idev session. If you're in an idev session, the hostname command will display a name like c455-021.ls6.tacc.utexas.edu. But if you're on a login node the hostname will be something like login3.ls6.tacc.utexas.edu.

If you're on a login node, start an idev session like this:

Code Block
languagebash
titleStart an idev session
idev -m 120 -N 1 -A OTH21164 -r CoreNGS-Wed      # or -A TRA23004 
idev -m 120 -N 1 -A OTH21164 -p development  # or -A TRA23004



Code Block
languagebash
module load biocontainers
module spider cutadapt 

module load cutadapt 
cutadapt --help | more   # or: cutadapt --help | less

A common application of cutadapt is to remove adapter contamination from RNA library sequence data. Here we'll show that for some small RNA libraries sequenced by GSAF, using their documented small RNA library adapters.
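Conceptually, removing a 3' adapter means finding where the adapter starts in a read and discarding it along with everything after it. The toy sketch below does this with shell pattern matching for exact, full-length matches only. The ADAPTER value is an example, not necessarily the GSAF small RNA adapter, and trim_adapter is a hypothetical helper; cutadapt itself (given a 3' adapter with its -a option) also tolerates sequencing errors in the adapter and recognizes adapters only partially present at the read's 3' end.

```bash
# Example adapter sequence (an assumption; use the sequence from GSAF's
# small RNA library documentation for real data).
ADAPTER="TGGAATTCTCGG"

# trim_adapter is a hypothetical helper. ${read%%"$ADAPTER"*} removes the
# longest suffix that starts with the adapter, i.e. the first adapter
# occurrence and everything after it. Reads without the adapter are unchanged.
trim_adapter() {
  read_seq="$1"
  trimmed=${read_seq%%"$ADAPTER"*}
  printf '%s\n' "$trimmed"
}

# A 22-nt miRNA-sized insert followed by adapter read-through
trim_adapter "TAGCTTATCAGACTGATGTTGATGGAATTCTCGGGTGCC"   # TAGCTTATCAGACTGATGTTGA
trim_adapter "ACGTACGTACGT"                              # ACGTACGTACGT
```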

...

Expand
titleHint


Code Block
languagebash
echo $(( $(zcat Sample_H54_miRNA_L004_R1.cat.fastq.gz | wc -l) / 4 ))
# or
zcat Sample_H54_miRNA_L004_R1.cat.fastq.gz | wc -l | awk '{print $1 / 4}'

Read more about arithmetic in bash, and about awk in Some Linux commands: awk.
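Both counting methods rely on every FASTQ record being exactly four lines. The self-contained sketch below checks them on a tiny, made-up gzipped FASTQ created in a temporary directory; the file name tiny.fastq.gz and the helper count_reads are hypothetical, so substitute a real FASTQ path on the course data.

```bash
# Create a made-up gzipped FASTQ with 3 records in a temporary directory
tmpdir=$(mktemp -d)
fq="$tmpdir/tiny.fastq.gz"
for i in 1 2 3; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done | gzip -c > "$fq"

# Each FASTQ record is exactly 4 lines, so reads = lines / 4
count_reads() {
  lines=$(zcat "$1" | wc -l)
  echo $(( lines / 4 ))
}

n_reads=$(count_reads "$fq")
echo "$n_reads"                              # 3
zcat "$fq" | wc -l | awk '{print $1 / 4}'    # 3

rm -r "$tmpdir"
```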


Expand
titleAnswer

Looking at the FASTQ file names, we see these are two lanes of single-end reads (L004 and L005).

The data from lane 4 has 2,001,337 reads, the data from lane 5 has 2,022,237 reads.

...