Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Make sure you're in an idev session, since we will be doing some significant computation, and make bedtools and samtools available.

Expand
titleMake sure you're in an idev session


Code Block
languagebash
titleStart an idev session
idev -m 120 -N 1 -A OTH21164 -r 
CoreNGS # or -A TRA23004
core-ngs-class-0606
# or
idev -m 
90
120 -N 1 -A OTH21164 -p development 

module load biocontainers
module load bedtools
bedtools --version   # 
or -A TRA23004
should be bedtools v2.27.1


Copy over the yeast RNA-seq files we'll need (also copy the GFF gene annotation file if you didn't make one).

Code Block
languagebash
titleSetup for BEDTools multicov
# Get the merged yeast genes bed file if you didn't create one
mkdir -p $SCRATCH/core_ngs/bedtools_multicov
cd $SCRATCH/core_ngs/bedtools_multicov
cp $CORENGS/catchup/bedtools_merge/merged*bed .

# Copy the BAM file
cd $SCRATCH/core_ngs/bedtools_multicov
cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* .

Exercises:

  • How many reads are represented in the yeast_mrna.sort.filt.bam file?
  • How many mapped? How many proper pairs? How many duplicates?
  • What is the distribution of mapping qualities? What is the average mapping quality?
Expand
titleHints

samtools flagstat for the different read counts.

samtools view + cut + sort + uniq -c for mapping quality distribution

samtools view + awk for average mapping quality

...

Expand
titleAnswer


Code Block
languagebash
module load samtools

cd $SCRATCH/core_ngs/bedtools_multicov
samtools flagstat yeast_mrna.sort.filt.bam | tee yeast_mrna.flagstat.txt


Code Block
titlesamtools flagstat output
3347559 + 0 in total (QC-passed reads + QC-failed reads)
24317 + 0 secondary
0 + 0 supplementary
922114 + 0 duplicates
3347559 + 0 mapped (100.00% : N/A)
3323242 + 0 paired in sequencing
1661699 + 0 read1
1661543 + 0 read2
3323242 + 0 properly paired (100.00% : N/A)
3323242 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

There are 3323242 total reads, all mapped and all properly paired. So this must be a quality-filtered BAM.

There are 922114 duplicates, or about 28%.

To get the distribution of mapping qualities :(BAM field 5)

Code Block
languagebash
samtools view yeast_mrna.sort.filt.bam | cut -f 5 | sort | uniq -c 


Code Block
titledistribution of mapping qualities
    498 20
   6504 21
   1012 22
    355 23
   1054 24
   2800 25
    495 26
  14133 27
    282 28
    358 29
    954 30
   1244 31
    358 32
   6143 33
    256 34
    265 35
   1112 36
    905 37
    309 38
   4845 39
   5706 40
    427 41
   1946 42
   1552 43
   1771 44
   6140 45
   1771 46
   3049 47
   3881 48
   3264 49
   4475 50
  15692 51
  25378 52
  16659 53
  18305 54
   7108 55
   2705 56
  59867 57
   2884 58
   2392 59
3118705 60

To compute average mapping quality:

Code Block
languagebash
samtools view yeast_mrna.sort.filt.bam | awk '
  BEGIN{FS="\t"; sum=0; tot=0}
  {sum = sum + $5; tot = tot + 1}
  END{printf("mapping quality average: %.1f for %d reads\n", sum/tot,tot) }'

Mapping qualities range from 20 to 60 – excellent quality! Because the majority reads have mapping quality 60, the average is 59.2. So again, there must have been quality filtering performed on upstream alignment records.

Here's how to run bedtools multicov in stranded mode, directing the standard output to a file:

Expand
titleSetup (if needed)


Code Block
languagebash
idev -m 120 -N 1 -A OTH21164 -r CoreNGS  core-ngs-class-0606
# or
idev -m 120 -N 1 -A TRA23004OTH21164 -p development
module load biocontainers
module load samtools
module load bedtools

mkdir -p $SCRATCH/core_ngs/bedtools_multicov
cd $SCRATCH/core_ngs/bedtools_multicov
cp $CORENGS/catchup/bedtools_merge/merged*bed .
cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* .


...

Code Block
languagebash
titleRun bedtools multicov to count BAM alignments overlapping a set of genes
cd $SCRATCH/core_ngs/bedtools_multicov
bedtools multicov -s -bams yeast_mrna.sort.filt.bam \
  -bed merged.good.sc_genes.bed > yeast_mrna_gene_counts.bed

ExerciseExercises:

  • How may records of output were written?
  • Where is the count of overlaps per output record?
Expand
titleAnswers


Code Block
languagebash
wc -l yeast_mrna_gene_counts.bed

6485 records were written, one for each feature in the merged.sc_genes.bed file.

The overlap count was added as the last field in each output record. So here it is field 7 since the input annotation file had 6 columns.

...

Expand
titleAnswer


Code Block
languagebash
cut -f 7 yeast_mrna_gene_counts.bed | grep -v '^0' | wc -l
# or
cut -f 7 yeast_mrna_gene_counts.bed | grep -v -c '^0'
# or
cat yeast_mrna_gene_counts.bed | \
  awk '{if ($7 > 0) print $7}' | wc -l

Most of the genes (6141/6485) have non-zero read overlap counts.

...

Make sure you're in an idev session, then prepare a directory for this exercise.

Prepare for bedtools coverage
Expand
titleMake sure you're in an idev session


Code Block
languagebash
title
Start an idev session
idev -m 120 -N 1 -A OTH21164 -r 
CoreNGS # or -A TRA23004
core-ngs-class-0606
# or
idev -m 
90
120 -N 1 -A OTH21164 -p development
#
 
or

-A

TRA23004
module load biocontainers
module load bedtools
bedtools --version   # should be bedtools v2.27.1



Code Block
languagebash
titlePrepare for bedtools coverage
mkdir -p $SCRATCH/core_ngs/bedtools_genomecov
cd $SCRATCH/core_ngs/bedtools_genomecov 
cp $CORENGS/catchup/bedtools_merge/merged*bed .
cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* .

...