Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


Tip
titleReservations

Use our summer school reservation (core-ngs-class-0605) for today when submitting batch jobs to get higher priority on the ls6 normal queue.

Code Block
languagebash
titleRequest an interactive (idev) node
# Request a 180 minute interactive node on the normal queue using our reservation
idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0605  # Thursday
idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0606  # Friday

# Request a 120 minute idev node on the development queue 
idev -m 120 -N 1 -A OTH21164 -p development


# Using our reservation
Code Block
languagebash
titleSubmit a batch job
using our reservation
sbatch --reseservation=core-ngs-class-0605 <batch_file>.slurm  # Thursday
sbatch --reseservation=core-ngs-class-06050606 <batch_file>.slurm  # Friday

Note that todaythe day's reservation name (core-ngs-class-0605) is different from the TACC allocation/project for this class, which is OTH21164.

...

Code Block
languagebash
titleStart an idev session
idev -m 180120 -N 1 -A OTH21164 -r core-ngs-class-0605

Then stage the sample datasets and references we will use.

Code Block
languagebash
titleGet the alignment exercises files
# Copy the   # Thursday
idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0606  # Friday

# or, without the reservation:
idev -m 120 -N 1 -A OTH21164 -p development

Then stage the sample datasets and references we will use.

Code Block
languagebash
titleGet the alignment exercises files
# Copy the FASTA files for building references
mkdir -p $SCRATCH/core_ngs/references/fasta
cp $CORENGS/references/fasta/*.fa   $SCRATCH/core_ngs/references/fasta/

# Copy the FASTQ files that will be used for alignment
mkdir -p $SCRATCH/core_ngs/alignment/fastq
cp $CORENGS/alignment/*fastq.gz $SCRATCH/core_ngs/alignment/fastq/
cd $SCRATCH/core_ngs/alignment/fastq

...

Code Block
languagebash
titleStart an idev session
idev -m 180120 -N 1 -A OTH21164 -r core-ngs-class-0605  # orThursday
idev -m 120 -N 1 -A OTH21164 -p development  
Code Block
languagebash
module load biocontainers  r core-ngs-class-0606  # Friday

# or, without the reservation
idev -m 120 -N 1 -A OTH21164 -p development  


Code Block
languagebash
module load biocontainers  # takes a while
module load bwa
bwa

...

You should now have a SAM file (yeast_pe.sam) that contains the alignments.

ExerciseExercises:

  • How many lines does the SAM file have?
  • How does this compare to the number of input sequences (R1+R2)?
Expand
titleAnswer

wc -l yeast_pe.sam  reports 1,184,378 lines

The alignment SAM file will contain records for both R1 and R2 reads, so we need to count sequences in both files.

zcat ./fastq/Sample_Yeast_L005_R[12]*gz | wc -l | awk '{print $1/4}' reports 1,184,360 read sequences.

So the SAM file has 18 more lines than the R1+R2 total. These are the header records that appear before any alignment records.

...

Expand
titleAnswers

The SAM file must contain both mapped and un-mapped reads, because there were 1,184,360 R1+R2 reads and the same number of alignment records.

Alignment records occur in the same read-name order as they did in the FASTQ, except that they come in pairs. The R1 read comes 1st, then the corresponding R2. This is called read name ordering.

The command below uses cut to extract the 1st Tabtab-delimited field (-f 1) from the first few alignment records:

Code Block
languagebash
grep -Pv '^@' yeast_pe.sam | head | cut -f 1

Next we'll see how to tell which read is the R1 and which is the R2.

...

Expand
titleMake sure you're in a idev session


Code Block
languagebash
titleStart an idev session
idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0605  # Thursday
idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0606  # Friday

# or, without the reservation:
idev -m 120 -N 1 -A OTH21164 -p development  


...

Code Block
titleSAMtools suite usage
Program: samtools (Tools for alignments in the SAM format)
Version: 1.20 (using htslib 1.20)

Usage: samtools <command> [options]

Commands:
-- Indexing
   dict           create a sequence dictionary file
   faidx index/extract FASTA        fqidx index/extract FASTQFASTA
   fqidx index index alignment       index/extract FASTQ
   index          index alignment

-- Editing
   calmd          recalculate MD/NM tags and '=' bases
   fixmate        fix mate information
   reheader       replace BAM header
   targetcut      cut fosmid regions (for fosmid pool only)
   addreplacerg   adds or replaces RG tags
   markdup        mark duplicates
   ampliconclip   clip oligos from the end of reads

-- File operations
   collate        shuffle and group alignments by name
   cat            concatenate BAMs
   consensus      produce a consensus Pileup/FASTA/FASTQ
   merge          merge sorted alignments
   mpileup        multi-way pileup
   sort           sort alignment file
   split          splits a file by read group
   quickcheck     quickly check if SAM/BAM/CRAM file appears intact
   fastq          converts a BAM to a FASTQ
   fasta          converts a BAM to a FASTA
   import         Converts FASTA or FASTQ files to SAM/BAM/CRAM
   reference      Generates a reference from aligned data
   reset          Reverts aligner changes in reads

-- Statistics
   bedcov         read depth per BED region
   coverage       alignment depth and percent coverage
   depth         depth compute the depth
   flagstat       simple stats
   idxstats       BAM index stats
   cram-size      list CRAM Content-ID and Data-Series sizes
   phase          phase heterozygotes
   stats          generate stats (former bamcheck)
   ampliconstats  generate amplicon specific stats

-- Viewing
   flags explain  BAM flags
   head           header viewer
   tview text     alignment viewer
   view           SAM<->BAM<->CRAM conversion
   depad          convert padded BAM to unpadded BAM
   samples        list the samples in a set of SAM/BAM/CRAM files

-- Misc
   help [cmd]     display this help message or help for [cmd]
   version        detailed version information

...

The utility samtools index creates an index that has the same name as the input BAM file, with suffix .bai appended. Here's the samtools index usage:

Usage
Code Block
titlesamtools index usage
samtools index usage
Usage: samtools index -M [-bc] [-m INT] <in1.bam> <in2.bam>...
   or: samtools index [-bc] [-m INT] <in.bam> [out.index]
Options:
  -b, --bai            Generate BAI-format index for BAM files [default]
  -c, --csi            Generate CSI-format index for BAM files
  -m, --min-shift INT   Set minimum interval size for CSI indices to 2^INT [14]
  -M                   Interpret all filename arguments as files to be indexed
  -o, --output FILE    Write index to FILE [alternative to <out.index> in args]
  -@, --threads INT    Sets the number of threads [none]

...

More information about the alignment is provided by the samtools idxstats report, which shows how many reads aligned to each contig in your reference.

Note that samtools idxstats must be run on a sorted, indexed BAM file.

...