...
Code Block |
---|
language | bash |
---|
title | Start an idev session |
---|
|
idev -m 120 -N 1 -A OTH21164 -r CoreNGS-Fri # or -A TRA23004
# or
idev -m 90 -N 1 -A OTH21164 -p development # or -A TRA23004 |
Copy over the yeast RNA-seq files we'll need (also copy the GFF gene annotation file if you didn't make one).
...
Expand |
---|
|
Code Block |
---|
| module load samtools
cd $SCRATCH/core_ngs/bedtools_multicov
samtools flagstat yeast_mrna.sort.filt.bam | tee yeast_mrna.flagstat.txt |
Code Block |
---|
title | samtools flagstat output |
---|
| 3347559 + 0 in total (QC-passed reads + QC-failed reads)
24317 + 0 secondary
0 + 0 supplementary
922114 + 0 duplicates
3347559 + 0 mapped (100.00% : N/A)
3323242 + 0 paired in sequencing
1661699 + 0 read1
1661543 + 0 read2
3323242 + 0 properly paired (100.00% : N/A)
3323242 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5) |
There are 3323242 total reads, all mapped and all properly paired. So this must be a quality-filtered BAM. There are 922114 duplicates, or about 28%. To get the distribution of mapping qualities: Code Block |
---|
| samtools view yeast_mrna.sort.filt.bam | cut -f 5 | sort | uniq -c |
Code Block |
---|
title | distribution of mapping qualities |
---|
| 498 20
6504 21
1012 22
355 23
1054 24
2800 25
495 26
14133 27
282 28
358 29
954 30
1244 31
358 32
6143 33
256 34
265 35
1112 36
905 37
309 38
4845 39
5706 40
427 41
1946 42
1552 43
1771 44
6140 45
1771 46
3049 47
3881 48
3264 49
4475 50
15692 51
25378 52
16659 53
18305 54
7108 55
2705 56
59867 57
2884 58
2392 59
3118705 60 |
To compute average mapping quality: Code Block |
---|
| samtools view yeast_mrna.sort.filt.bam | awk '
BEGIN{FS="\t"; sum=0; tot=0}
{sum = sum + $5; tot = tot + 1}
END{printf("mapping quality average: %.1f for %d reads\n", sum/tot,tot) }' |
Mapping qualities range from 20 to 60 – excellent quality! Because the majority reads have mapping quality 60, the average is 59. So again, there must have been quality filtering performed on upstream alignment records. |
...
Expand |
---|
|
Code Block |
---|
| idev -m 120 -N 1 -A OTH21164 -r CoreNGSday5 CoreNGS # or -A TRA23004
module load biocontainers
module load samtools
module load bedtools
mkdir -p $SCRATCH/core_ngs/bedtools_multicov
cd $SCRATCH/core_ngs/bedtools_multicov
cp $CORENGS/catchup/bedtools_merge/merged*bed .
cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* . |
|
...
Expand |
---|
|
Code Block |
---|
| wc -l yeast_mrna_gene_counts.bed |
6485 records were written, one for each feature in the merged.sc_genes.bed file. The overlap count was added as the last field in each output record (. So here it is field 7 , since the input annotation file had 6 columns). |
Exercise: How many features have non-zero overlap counts?
...
Code Block |
---|
language | bash |
---|
title | Prepare for bedtools coverage |
---|
|
idev -m 120 -N 1 -A OTH21164 -r CoreNGS-day5 # or -A TRA23004
# or
idev -m 90 -N 1 -A OTH21164 -p development # or -A TRA23004
module load biocontainers
module load bedtools
mkdir -p $SCRATCH/core_ngs/bedtools_genomecov
cd $SCRATCH/core_ngs/bedtools_genomecov
cp $CORENGS/catchup/bedtools_merge/merged*bed .
cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* . |
Then calling bedtools genomecov is easy. The -bg option says to report the depth in bedGraph format.
Code Block |
---|
|
cd $SCRATCH/core_ngs/bedtools_genomecov
bedtools genomecov -bg -ibam yeast_mrna.sort.filt.bam > yeast_mrna.genomecov.bedGraph
wc -l yeast_mrna.genomecov.bedGraph # 1519274 lines |
...