| Tip | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||
Use our summer school reservation (core-ngs-class-0605) for today when submitting batch jobs to get higher priority on the ls6 normal queue.
Note that today's reservation name (core-ngs-class-0605) is different from the TACC allocation/project for this class, which is OTH21164. |
...
You should now have a SAM file (yeast_pe.sam) that contains the alignments.
ExerciseExercises:
- How many lines does the SAM file have?
- How does this compare to the number of input sequences (R1+R2)?
| Expand | ||
|---|---|---|
| ||
The alignment SAM file will contain records for both R1 and R2 reads, so we need to count sequences in both files.
So the SAM file has 18 more lines than the R1+R2 total. These are the header records that appear before any alignment records. |
...
| Expand | |||||
|---|---|---|---|---|---|
| |||||
The SAM file must contain both mapped and un-mapped reads, because there were 1,184,360 R1+R2 reads and the same number of alignment records. Alignment records occur in the same read-name order as they did in the FASTQ, except that they come in pairs. The R1 read comes 1st, then the corresponding R2. This is called read name ordering. The command below uses cut to extract the 1st Tabtab-delimited field (-f 1) from the first few alignment records:
Next we'll see how to tell which read is the R1 and which is the R2. |
...
| Code Block | ||
|---|---|---|
| ||
Program: samtools (Tools for alignments in the SAM format) Version: 1.20 (using htslib 1.20) Usage: samtools <command> [options] Commands: -- Indexing dict create a sequence dictionary file faidx index/extract FASTA fqidx index/extract FASTQ index index alignment -- Editing calmd recalculate MD/NM tags and '=' bases fixmate fix mate information reheader replace BAM header targetcut cut fosmid regions (for fosmid pool only) addreplacerg adds or replaces RG tags markdup mark duplicates ampliconclip clip oligos from the end of reads -- File operations collate shuffle and group alignments by name cat concatenate BAMs consensus produce a consensus Pileup/FASTA/FASTQ merge merge sorted alignments mpileup merge sorted alignments mpileup multi-way pileup sort sort alignment file split splits a file by read group quickcheck quickly check if SAM/BAM/CRAM file appears intact fastq converts a BAM to a FASTQ fasta converts a BAM to a FASTA import Converts FASTA or FASTQ files to SAM/BAM/CRAM reference Generates a reference from aligned data reset Reverts aligner changes in reads -- Statistics bedcov read depth per BED region coverage alignment depth and percent coverage depth compute the depth flagstat simple stats idxstats BAM index stats cram-size list CRAM Content-ID and Data-Series sizes phase phase heterozygotes stats generate stats (former bamcheck) ampliconstats generate amplicon specific stats -- Viewing flags explain BAM flags head header viewer tview text alignment viewer view SAM<->BAM<->CRAM conversion depad convert padded BAM to unpadded BAM samples list the samples in a set of SAM/BAM/CRAM files -- Misc help [cmd] display this help message or help for [cmd] version detailed version information |
...
The utility samtools index creates an index that has the same name as the input BAM file, with suffix .bai appended. Here's the samtools index usage:
| Code Block | |||||
|---|---|---|---|---|---|
| Usage
| ||||
Usage: samtools index -M [-bc] [-m INT] <in1.bam> <in2.bam>... or: samtools index [-bc] [-m INT] <in.bam> [out.index] Options: -b, --bai Generate BAI-format index for BAM files [default] -c, --csi Generate CSI-format index for BAM files -m, --min-shift INT Set minimum interval size for CSI indices to 2^INT [14] -M Interpret all filename arguments as files to be indexed -o, --output FILE Write index to FILE [alternative to <out.index> in args] -@, --threads INT Sets the number of threads [none] |
...
More information about the alignment is provided by the samtools idxstats report, which shows how many reads aligned to each contig in your reference.
Note that samtools idxstats must be run on a sorted, indexed BAM file.
...