Content Comparison

...

Tip

title	Reservations

Use our summer school reservation (CoreNGS-Thu) when submitting batch jobs to get higher priority on the ls6 normal queue.

Code Block

language	bash
title	Request an interactive (idev) node

# Request a 180 minute interactive node on the normal queue using our reservation
idev -m 120 -N 1 -A OTH21164 -r CoreNGS
idev -m 120 -N 1 -A TRA23004 -r CoreNGS-Thu

# Request a 120 minute idev node on the development queue 
idev -m 120 -N 1 -A OTH21164 -p development
idev -m 120 -N 1 -A TRA23004 -p
development

Code Block

language	bash
title	Submit a batch job

# Using our reservation
sbatch --reseservation=CoreNGS-Thu <batch_file>.slurm

Note that the reservation name (CoreNGS) is different from the TACC allocation/project for this class, which is OTH21164.

...

After raw sequence files are generated (in FASTQ format), quality-checked, and pre-processed preprocessed in some way, the next step in many NGS pipelines is mapping to a reference genome.

...

Code Block

language	bash
title	Start an idev session

idev -m 180 -N 1 -A OTH21164 -r CoreNGS-Thu

Then stage the sample datasets and references we will use.

...

Here are the four reference genomes we will be using today, with some information about them. These are not necessarily the most recent versions of these references (e.g. the newest human reference genome is hg38 and a the most recent miRBase version is v21. (See here for information about many more genomes.)

...

Regular expressions are so powerful that nearly every modern computer language includes a "regex" module of some sort. There are many online tutorials for regular expressions, and several slightly different "flavors" of them. But the most common is the Perl style (http://perldoc.perl.org/perlretut.html), which was one of the fist and still the most powerful (there's a reason Perl was used extensively when assembling the human genome). We're only going to use simple regular expressions here, but learning more about them will pay handsome dividends for you in the future.Here's how to execute grep to list contig names in a FASTA file.

First stage the FASTA files we'll need:

Code Block

language	bash
title	grep to match contig names in a FASTA file

# Stage the FASTA files
cds
mkdir -p core_ngs/references/fasta
cd core_ngs/references/fasta
cp $CORENGS/references/fasta/*.fa .

Here's how to execute grep to list contig names in a FASTA file.

Code Block

language	bash
title	grep to match contig names in a FASTA file

cd $SCRATCH/core_ngs/references/fasta
grep -P '^>' sacCer3.fa | more

...

The -P option tells grep to Perl-style regular expression patterns.
- This makes including special characters like Tab ( \t ), carriage return ( \r ) or linefeed ( \n ) much easier that the default POSIX paterns.
- While it is not required here, it generally doesn't hurt to include this option.
'^>' is the regular expression describing the pattern we're looking for (described below)
sacCer3.fa is is the FASTA file to search.
- lines with text that match our pattern will be written to standard output
- non matching lines will be omitted
We pipe to more just in case there are a lot of contig names.

...

Expand

title	Setup (if needed)

Code Block

language	bash

# Copy the FASTA files for building references
mkdir -p $SCRATCH/core_ngs/references/fasta
cp $CORENGS/references/fasta/*.fa $SCRATCH/core_ngs/references/fasta/

Exercise: How many lines does the sacCer3 reference have, and how many contigs are there in the sacCer3 reference?

Expand

title	Hint

Code Block

language	bash

cd $SCRATCH/core_ngs/references/fasta
grep -P '^>' sacCer3.fa | wc -l

# and to count the lines:
wc -l sacCer3.fa

Or use grep's -c option that says "just count the line matches"

Code Block

language	bash

grep -P -c '^>' sacCer3.fa

...

Expand

title	Answer

There are 17 contigs, out of 243,167 total lines.

Aligner overview

There are many aligners available, but we will concentrate on two of the most popular general-purpose ones: bwa and bowtie2. The table below outlines the available protocols for them.

...

Version	Old Version 205	New Version 206
Changes made by	Anna Battenhouse	Anna Battenhouse
Saved on	Jun 07, 2024	May 29, 2025

Versions Compared

Key

Aligner overview