Content Comparison

...

Code Block

language	bash
title	Start an idev session

idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0606

# or, without a reservation
idev -m 120 -N 1 -A OTH21164 -p development 

module load biocontainers
module load bedtools
bedtools --version   # should be bedtools v2.27.1

...

Expand

title	Make sure you're in an idev session

Code Block

language	bash
title	Start an idev session

idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0606

# or, without a reservation:
idev -m 120 -N 1 -A OTH21164 -p development 

module load biocontainers
module load bedtools
bedtools --version   # should be bedtools v2.27.1

...

Make sure you're in an idev session, since we will be doing some significant computation, and make bedtools and samtools available.

Expand

title	Make sure you're in an idev session

Code Block

language	bash
title	Start an idev session

idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0606

# or, without a reservation:
idev -m 120 -N 1 -A OTH21164 -p development

module load biocontainers
module load bedtools
bedtools --version   # should be bedtools v2.27.1

Copy over the yeast RNA-seq files we'll need (also copy the merged yeast genes BED file if you didn't create one).

Code Block

language	bash
title	Setup for BEDTools multicov

# Get the merged yeast genes bed file if you didn't create one
mkdir -p $SCRATCH/core_ngs/bedtools_multicov
cd $SCRATCH/core_ngs/bedtools_multicov
cp $CORENGS/catchup/bedtools_merge/merged*bed .

# Copy the BAM file
cd $SCRATCH/core_ngs/bedtools_multicov
cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* .

Exercises:

How many reads are represented in the yeast_mrna.sort.filt.bam file?
How many mapped? How many proper pairs? How many duplicates?
What is the distribution of mapping qualities? What is the average mapping quality?

Expand

title	Hints

samtools flagstat for the different read counts.

samtools view + cut + sort + uniq -c for mapping quality distribution

samtools view + awk for average mapping quality

...

Expand

title	Answer

Code Block

language	bash

module load samtools

cd $SCRATCH/core_ngs/bedtools_multicov
samtools flagstat yeast_mrna.sort.filt.bam | tee yeast_mrna.flagstat.txt

Code Block

title	samtools flagstat output

3347559 + 0 in total (QC-passed reads + QC-failed reads)
24317 + 0 secondary
0 + 0 supplementary
922114 + 0 duplicates
3347559 + 0 mapped (100.00% : N/A)
3323242 + 0 paired in sequencing
1661699 + 0 read1
1661543 + 0 read2
3323242 + 0 properly paired (100.00% : N/A)
3323242 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

There are 3347559 alignment records in total. Excluding the 24317 secondary alignments leaves 3323242 reads, all mapped and all properly paired. So this must be a quality-filtered BAM.

There are 922114 duplicates, or about 28%.
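The duplicate percentage can also be computed from the flagstat text instead of by hand. This is a minimal sketch, assuming the flagstat layout shown above. The function name `dup_percent` is hypothetical, and the sample lines are copied from the output above.

```bash
# Hypothetical helper: percent of alignment records flagged as duplicates,
# computed from samtools flagstat text on standard input.
dup_percent() {
  awk '
    $4 == "in" && $5 == "total" { total = $1 }   # "N + 0 in total ..."
    $4 == "duplicates"          { dups  = $1 }   # "N + 0 duplicates"
    END { if (total > 0) printf("%.1f\n", 100 * dups / total) }'
}

# Sample lines copied from the flagstat output above
flagstat_text='3347559 + 0 in total (QC-passed reads + QC-failed reads)
24317 + 0 secondary
0 + 0 supplementary
922114 + 0 duplicates'

printf '%s\n' "$flagstat_text" | dup_percent   # prints 27.5
```

On the saved report this could be run as `dup_percent < yeast_mrna.flagstat.txt`. The denominator here is all alignment records, including secondary alignments.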

To get the distribution of mapping qualities (BAM field 5):

Code Block

language	bash

samtools view yeast_mrna.sort.filt.bam | cut -f 5 | sort | uniq -c

Code Block

title	distribution of mapping qualities

To compute average mapping quality:

Code Block

language	bash

samtools view yeast_mrna.sort.filt.bam | awk '
  BEGIN{FS="\t"; sum=0; tot=0}
  {sum = sum + $5; tot = tot + 1}
  END{printf("mapping quality average: %.1f for %d reads\n", sum/tot,tot) }'

Mapping qualities range from 20 to 60 – excellent quality! Because the majority of reads have mapping quality 60, the average is 59.2. So again, quality filtering must have been performed on the upstream alignment records.
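The distribution and the average can also be produced in a single pass over the alignment records. This is a sketch only: the function name `mapq_summary` and the toy records are hypothetical, and it assumes tab-separated SAM text on standard input with the mapping quality in field 5.

```bash
# Hypothetical helper: tabulate MAPQ (field 5) and its mean in one pass
# over SAM-format text on standard input.
mapq_summary() {
  awk -F'\t' '
    { count[$5]++; sum += $5; tot++ }
    END {
      # MAPQ is an integer from 0 to 255, so walk that range in order
      for (q = 0; q <= 255; q++)
        if (q in count) printf("%d\t%d\n", q, count[q])
      if (tot > 0)
        printf("mapping quality average: %.1f for %d reads\n", sum / tot, tot)
    }'
}

# Toy SAM-like records (hypothetical; only field 5 matters here)
toy_sam=$(printf 'r1\t99\tchrI\t100\t60\nr2\t147\tchrI\t250\t60\nr3\t99\tchrII\t40\t20\nr4\t147\tchrII\t90\t40\n')

printf '%s\n' "$toy_sam" | mapq_summary
```

With the real data the same function would be fed from `samtools view yeast_mrna.sort.filt.bam`.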

Here's how to run bedtools multicov in stranded mode, directing the standard output to a file:

Expand

title	Setup (if needed)

Code Block

language	bash

idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0606

# or, without a reservation:
idev -m 120 -N 1 -A OTH21164 -p development

module load biocontainers
module load samtools
module load bedtools

mkdir -p $SCRATCH/core_ngs/bedtools_multicov
cd $SCRATCH/core_ngs/bedtools_multicov
cp $CORENGS/catchup/bedtools_merge/merged*bed .
cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* .

...

Code Block

language	bash
title	Run bedtools multicov to count BAM alignments overlapping a set of genes

cd $SCRATCH/core_ngs/bedtools_multicov
bedtools multicov -s -bams yeast_mrna.sort.filt.bam \
  -bed merged.good.sc_genes.bed > yeast_mrna_gene_counts.bed

Exercises:

How many records of output were written?
Where is the count of overlaps per output record?

Expand

title	Answers

Code Block

language	bash

wc -l yeast_mrna_gene_counts.bed

6485 records were written, one for each feature in the merged.good.sc_genes.bed file.

The overlap count was added as the last field in each output record. So here it is field 7 since the input annotation file had 6 columns.
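One way to confirm the column layout is to tabulate how many tab-separated fields each output record has. This is a sketch under stated assumptions: the function name `field_counts` and the toy records (including the gene names) are hypothetical stand-ins for the real multicov output.

```bash
# Hypothetical helper: print "<number of fields><TAB><number of lines>"
# for tab-separated text on standard input.
field_counts() {
  awk -F'\t' '{ n[NF]++ } END { for (k in n) print k "\t" n[k] }' | sort -n
}

# Toy records shaped like multicov output: 6 BED columns plus a count
toy_counts=$(printf 'chrI\t100\t500\tGENE_A\t1\t+\t12\nchrI\t900\t1500\tGENE_B\t1\t-\t0\nchrII\t50\t400\tGENE_C\t1\t+\t307\n')

printf '%s\n' "$toy_counts" | field_counts
```

A single output line starting with 7 means every record has seven fields, so the count sits in field 7. On the real file this would be `field_counts < yeast_mrna_gene_counts.bed`.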

...

Expand

title	Answer

Code Block

language	bash

cut -f 7 yeast_mrna_gene_counts.bed | grep -v '^0' | wc -l
# or
cut -f 7 yeast_mrna_gene_counts.bed | grep -v -c '^0'
# or
cat yeast_mrna_gene_counts.bed | \
  awk '{if ($7 > 0) print $7}' | wc -l

Most of the genes (6141/6485) have non-zero read overlap counts.
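The same tally can be reported as a fraction and a percentage without hard-coding the column number, since awk's `$NF` always refers to the last field. This is a minimal sketch; the function name `nonzero_fraction` and the toy records are hypothetical.

```bash
# Hypothetical helper: report how many records have a non-zero value
# in their last tab-separated field.
nonzero_fraction() {
  awk -F'\t' '
    $NF > 0 { nonzero++ }
    END {
      if (NR > 0)
        printf("%d/%d (%.1f%%)\n", nonzero, NR, 100 * nonzero / NR)
    }'
}

# Toy records shaped like multicov output (hypothetical gene names)
toy_counts=$(printf 'chrI\t100\t500\tGENE_A\t1\t+\t12\nchrI\t900\t1500\tGENE_B\t1\t-\t0\nchrII\t50\t400\tGENE_C\t1\t+\t307\n')

printf '%s\n' "$toy_counts" | nonzero_fraction   # prints 2/3 (66.7%)
```

On the real output this would be `nonzero_fraction < yeast_mrna_gene_counts.bed`.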

...

Go to the UCSC Genome Browser: https://genome.ucsc.edu/
Select Genomes from the top menu bar
Select Human from POPULAR SPECIES
- under Human Assembly select Feb 2009 (GRCh37/hg19)
- select GO
In the hg19 browser page:
- the 100 Vert. Cons track is a signal track
  - the x-axis is the genome position
  - the y-axis represents the base-wise conservation among vertebrates
- customize the 100 Vert. Cons track
  - right-click on "100 Vert. Cons" text in the left margin,
    - select "Configure 100 Vert. Cons" from the menu
  - in the 100 Vert. Cons Track Settings dialog:
    - change "Track height" to 100
    - change "Data view scaling" to "auto-scale to data view"
    - click "OK"
- the Layered H3K27Ac track is a signal track
  - the x-axis is the genome position
  - the y-axis represents the count of ChIP-seq reads that overlap each position
    - where the ChIP'd protein is H3K27Ac (histone H3, acetylated on the Lysine at amino acid position 27)

The bedtools genomecov function (https://bedtools.readthedocs.io/en/latest/content/tools/coverage.html), with the -bg (bedGraph) option, produces output in bedGraph format. Here we'll analyze the per-base coverage of yeast RNA-seq reads in our merged yeast gene regions.
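To make the bedGraph layout concrete, the sketch below collapses per-base depth values into bedGraph intervals: runs of adjacent positions with the same depth become one line with a 0-based start, an exclusive end, and the depth, and zero-depth runs are skipped, as -bg does. The function name `depth_to_bedgraph` and the toy depths are hypothetical; this illustrates the format and is not how bedtools is implemented.

```bash
# Hypothetical helper: collapse per-base depth lines (chrom, 1-based
# position, depth; sorted by position) into bedGraph intervals.
depth_to_bedgraph() {
  awk 'BEGIN { FS = OFS = "\t" }
    # Print the interval collected so far, unless its depth is zero
    function emit() { if (seen && depth > 0) print chrom, start, stop, depth }
    {
      # Extend the current run when position is adjacent and depth matches
      if (seen && $1 == chrom && $2 == stop + 1 && $3 == depth) { stop = $2; next }
      emit()
      chrom = $1; start = $2 - 1; stop = $2; depth = $3; seen = 1
    }
    END { emit() }'
}

# Toy per-base depths (hypothetical values)
toy_depth=$(printf 'chrI\t1\t0\nchrI\t2\t0\nchrI\t3\t2\nchrI\t4\t2\nchrI\t5\t3\n')

printf '%s\n' "$toy_depth" | depth_to_bedgraph
# For real data, something like (output file name is an assumption):
#   bedtools genomecov -bg -ibam yeast_mrna.sort.filt.bam > yeast_mrna.genomecov.bedGraph
```

For the toy input, positions 3 and 4 (depth 2) become the interval 2 to 4, and position 5 (depth 3) becomes the interval 4 to 5.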

Make sure you're in an idev session, then prepare a directory for this exercise.


Expand

title	Make sure you're in an idev session

Code Block

language	bash
title	Start an idev session

idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0606

# or, without a reservation:
idev -m 120 -N 1 -A OTH21164 -p development

module load biocontainers
module load bedtools
bedtools --version   # should be bedtools v2.27.1

Code Block

language	bash
title	Prepare for bedtools coverage

mkdir -p $SCRATCH/core_ngs/bedtools_genomecov
cd $SCRATCH/core_ngs/bedtools_genomecov 
cp $CORENGS/catchup/bedtools_merge/merged*bed .
cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* .

...

Version	Old Version 91	New Version Current
Changes made by	Anna Battenhouse	Anna Battenhouse
Saved on	Jun 02, 2025	Jun 04, 2025
