| Tip |
|---|
|
Use our summer school reservation (core-ngs-class-0605) for today when submitting batch jobs to get higher priority on the ls6 normal queue. | Code Block |
|---|
| language | bash |
|---|
| title | Request an interactive (idev) node |
|---|
| # Request a 180 minute interactive node on the normal queue using our reservation
idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0605 # Thursday
idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0606 # Friday
# Request a 120 minute idev node on the development queue
idev -m 120 -N 1 -A OTH21164 -p development
|
| Code Block |
|---|
| language | bash |
|---|
| title | Submit a batch job |
|---|
| # Using our reservation | sbatch --reseservation=core-ngs-class-0605 <batch_file>.slurm # Thursday
sbatch --reseservation=core-ngs-class-06050606 <batch_file>.slurm # Friday |
Note that todaythe day's reservation name (core-ngs-class-0605) is different from the TACC allocation/project for this class, which is OTH21164.
|
...
| Code Block |
|---|
| language | bash |
|---|
| title | Start an idev session |
|---|
|
idev -m 180120 -N 1 -A OTH21164 -r core-ngs-class-0605 |
Then stage the sample datasets and references we will use.
| Code Block |
|---|
| language | bash |
|---|
| title | Get the alignment exercises files |
|---|
|
# Copy the # Thursday
idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0606 # Friday
# or, without the reservation:
idev -m 120 -N 1 -A OTH21164 -p development |
Then stage the sample datasets and references we will use.
| Code Block |
|---|
| language | bash |
|---|
| title | Get the alignment exercises files |
|---|
|
# Copy the FASTA files for building references
mkdir -p $SCRATCH/core_ngs/references/fasta
cp $CORENGS/references/fasta/*.fa $SCRATCH/core_ngs/references/fasta/
# Copy the FASTQ files that will be used for alignment
mkdir -p $SCRATCH/core_ngs/alignment/fastq
cp $CORENGS/alignment/*fastq.gz $SCRATCH/core_ngs/alignment/fastq/
cd $SCRATCH/core_ngs/alignment/fastq |
...
| Code Block |
|---|
| language | bash |
|---|
| title | Start an idev session |
|---|
|
idev -m 180120 -N 1 -A OTH21164 -r core-ngs-class-0605 # orThursday
idev -m 120 -N 1 -A OTH21164 -p development |
| Code Block |
|---|
|
module load biocontainers r core-ngs-class-0606 # Friday
# or, without the reservation
idev -m 120 -N 1 -A OTH21164 -p development |
| Code Block |
|---|
|
module load biocontainers # takes a while
module load bwa
bwa |
...
You should now have a SAM file (yeast_pe.sam) that contains the alignments.
ExerciseExercises:
- How many lines does the SAM file have?
- How does this compare to the number of input sequences (R1+R2)?
| Expand |
|---|
|
wc -l yeast_pe.sam reports 1,184,378 lines
The alignment SAM file will contain records for both R1 and R2 reads, so we need to count sequences in both files. zcat ./fastq/Sample_Yeast_L005_R[12]*gz | wc -l | awk '{print $1/4}' reports 1,184,360 read sequences.
So the SAM file has 18 more lines than the R1+R2 total. These are the header records that appear before any alignment records. |
...
| Expand |
|---|
|
The SAM file must contain both mapped and un-mapped reads, because there were 1,184,360 R1+R2 reads and the same number of alignment records. Alignment records occur in the same read-name order as they did in the FASTQ, except that they come in pairs. The R1 read comes 1st, then the corresponding R2. This is called read name ordering. The command below uses cut to extract the 1st Tabtab-delimited field (-f 1) from the first few alignment records: | Code Block |
|---|
| grep -Pv '^@' yeast_pe.sam | head | cut -f 1 |
Next we'll see how to tell which read is the R1 and which is the R2. |
...
| Expand |
|---|
| title | Make sure you're in a idev session |
|---|
|
| Code Block |
|---|
| language | bash |
|---|
| title | Start an idev session |
|---|
| idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0605 # Thursday
idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0606 # Friday
# or, without the reservation:
idev -m 120 -N 1 -A OTH21164 -p development |
|
...
| Code Block |
|---|
| title | SAMtools suite usage |
|---|
|
Program: samtools (Tools for alignments in the SAM format)
Version: 1.20 (using htslib 1.20)
Usage: samtools <command> [options]
Commands:
-- Indexing
dict create a sequence dictionary file
faidx index/extract FASTA fqidx index/extract FASTQFASTA
fqidx index index alignment index/extract FASTQ
index index alignment
-- Editing
calmd recalculate MD/NM tags and '=' bases
fixmate fix mate information
reheader replace BAM header
targetcut cut fosmid regions (for fosmid pool only)
addreplacerg adds or replaces RG tags
markdup mark duplicates
ampliconclip clip oligos from the end of reads
-- File operations
collate shuffle and group alignments by name
cat concatenate BAMs
consensus produce a consensus Pileup/FASTA/FASTQ
merge merge sorted alignments
mpileup multi-way pileup
sort sort alignment file
split splits a file by read group
quickcheck quickly check if SAM/BAM/CRAM file appears intact
fastq converts a BAM to a FASTQ
fasta converts a BAM to a FASTA
import Converts FASTA or FASTQ files to SAM/BAM/CRAM
reference Generates a reference from aligned data
reset Reverts aligner changes in reads
-- Statistics
bedcov read depth per BED region
coverage alignment depth and percent coverage
depth depth compute the depth
flagstat simple stats
idxstats BAM index stats
cram-size list CRAM Content-ID and Data-Series sizes
phase phase heterozygotes
stats generate stats (former bamcheck)
ampliconstats generate amplicon specific stats
-- Viewing
flags explain BAM flags
head header viewer
tview text alignment viewer
view SAM<->BAM<->CRAM conversion
depad convert padded BAM to unpadded BAM
samples list the samples in a set of SAM/BAM/CRAM files
-- Misc
help [cmd] display this help message or help for [cmd]
version detailed version information |
...
The utility samtools index creates an index that has the same name as the input BAM file, with suffix .bai appended. Here's the samtools index usage:
| Code Block |
|---|
| title | samtools index usage |
|---|
| Usage |
Usage: samtools index -M [-bc] [-m INT] <in1.bam> <in2.bam>...
or: samtools index [-bc] [-m INT] <in.bam> [out.index]
Options:
-b, --bai Generate BAI-format index for BAM files [default]
-c, --csi Generate CSI-format index for BAM files
-m, --min-shift INT Set minimum interval size for CSI indices to 2^INT [14]
-M Interpret all filename arguments as files to be indexed
-o, --output FILE Write index to FILE [alternative to <out.index> in args]
-@, --threads INT Sets the number of threads [none] |
...
More information about the alignment is provided by the samtools idxstats report, which shows how many reads aligned to each contig in your reference.
Note that samtools idxstats must be run on a sorted, indexed BAM file.
...