...
Make sure you're in an idev session, since we will be doing some significant computation, and make bedtools and samtools available.
| Expand |
|---|
| title | Make sure you're in an idev session |
|---|
|
| Code Block |
|---|
| language | bash |
|---|
| title | Start an idev session |
|---|
| idev -m 120 -N 1 -A OTH21164 -r | CoreNGS # or -A TRA23004
core-ngs-class-0606
# or
idev -m | 90120 -N 1 -A OTH21164 -p development
module load biocontainers
module load bedtools
bedtools --version # | or -A TRA23004should be bedtools v2.27.1 |
|
Copy over the yeast RNA-seq files we'll need (also copy the GFF gene annotation file if you didn't make one).
| Code Block |
|---|
| language | bash |
|---|
| title | Setup for BEDTools multicov |
|---|
|
# Get the merged yeast genes bed file if you didn't create one
mkdir -p $SCRATCH/core_ngs/bedtools_multicov
cd $SCRATCH/core_ngs/bedtools_multicov
cp $CORENGS/catchup/bedtools_merge/merged*bed .
# Copy the BAM file
cd $SCRATCH/core_ngs/bedtools_multicov
cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* . |
Exercises:
- How many reads are represented in the yeast_mrna.sort.filt.bam file?
- How many mapped? How many proper pairs? How many duplicates?
- What is the distribution of mapping qualities? What is the average mapping quality?
| Expand |
|---|
|
samtools flagstat for the different read counts. samtools view + cut + sort + uniq -c for mapping quality distribution samtools view + awk for average mapping quality |
...
| Expand |
|---|
|
| Code Block |
|---|
| module load samtools
cd $SCRATCH/core_ngs/bedtools_multicov
samtools flagstat yeast_mrna.sort.filt.bam | tee yeast_mrna.flagstat.txt |
| Code Block |
|---|
| title | samtools flagstat output |
|---|
| 3347559 + 0 in total (QC-passed reads + QC-failed reads)
24317 + 0 secondary
0 + 0 supplementary
922114 + 0 duplicates
3347559 + 0 mapped (100.00% : N/A)
3323242 + 0 paired in sequencing
1661699 + 0 read1
1661543 + 0 read2
3323242 + 0 properly paired (100.00% : N/A)
3323242 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5) |
There are 3323242 total reads, all mapped and all properly paired. So this must be a quality-filtered BAM. There are 922114 duplicates, or about 28%. To get the distribution of mapping qualities :(BAM field 5) | Code Block |
|---|
| samtools view yeast_mrna.sort.filt.bam | cut -f 5 | sort | uniq -c |
| Code Block |
|---|
| title | distribution of mapping qualities |
|---|
| 498 20
6504 21
1012 22
355 23
1054 24
2800 25
495 26
14133 27
282 28
358 29
954 30
1244 31
358 32
6143 33
256 34
265 35
1112 36
905 37
309 38
4845 39
5706 40
427 41
1946 42
1552 43
1771 44
6140 45
1771 46
3049 47
3881 48
3264 49
4475 50
15692 51
25378 52
16659 53
18305 54
7108 55
2705 56
59867 57
2884 58
2392 59
3118705 60 |
To compute average mapping quality: | Code Block |
|---|
| samtools view yeast_mrna.sort.filt.bam | awk '
BEGIN{FS="\t"; sum=0; tot=0}
{sum = sum + $5; tot = tot + 1}
END{printf("mapping quality average: %.1f for %d reads\n", sum/tot,tot) }' |
Mapping qualities range from 20 to 60 – excellent quality! Because the majority reads have mapping quality 60, the average is 59.2. So again, there must have been quality filtering performed on upstream alignment records. |
Here's how to run bedtools multicov in stranded mode, directing the standard output to a file:
| Expand |
|---|
|
| Code Block |
|---|
| idev -m 120 -N 1 -A OTH21164 -r CoreNGS core-ngs-class-0606
# or
idev -m 120 -N 1 -A TRA23004OTH21164 -p development
module load biocontainers
module load samtools
module load bedtools
mkdir -p $SCRATCH/core_ngs/bedtools_multicov
cd $SCRATCH/core_ngs/bedtools_multicov
cp $CORENGS/catchup/bedtools_merge/merged*bed .
cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* . |
|
...
| Code Block |
|---|
| language | bash |
|---|
| title | Run bedtools multicov to count BAM alignments overlapping a set of genes |
|---|
|
cd $SCRATCH/core_ngs/bedtools_multicov
bedtools multicov -s -bams yeast_mrna.sort.filt.bam \
-bed merged.good.sc_genes.bed > yeast_mrna_gene_counts.bed |
ExerciseExercises:
- How may records of output were written?
- Where is the count of overlaps per output record?
| Expand |
|---|
|
| Code Block |
|---|
| wc -l yeast_mrna_gene_counts.bed |
6485 records were written, one for each feature in the merged.sc_genes.bed file. The overlap count was added as the last field in each output record. So here it is field 7 since the input annotation file had 6 columns. |
...
| Expand |
|---|
|
| Code Block |
|---|
| cut -f 7 yeast_mrna_gene_counts.bed | grep -v '^0' | wc -l
# or
cut -f 7 yeast_mrna_gene_counts.bed | grep -v -c '^0'
# or
cat yeast_mrna_gene_counts.bed | \
awk '{if ($7 > 0) print $7}' | wc -l |
Most of the genes (6141/6485) have non-zero read overlap counts. |
...
Make sure you're in an idev session, then prepare a directory for this exercise.
| Expand |
|---|
| title | Make sure you're in an idev session |
|---|
|
| Prepare for bedtools coverage | idev -m 120 -N 1 -A OTH21164 -r | CoreNGS # or -A TRA23004core-ngs-class-0606
# or
idev -m | 90120 -N 1 -A OTH21164 -p development | #or-ATRA23004
module load biocontainers
module load bedtools
bedtools --version # should be bedtools v2.27.1 |
|
| Code Block |
|---|
| language | bash |
|---|
| title | Prepare for bedtools coverage |
|---|
|
mkdir -p $SCRATCH/core_ngs/bedtools_genomecov
cd $SCRATCH/core_ngs/bedtools_genomecov
cp $CORENGS/catchup/bedtools_merge/merged*bed .
cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* . |
...