...
| Code Block |
|---|
| language | bash |
|---|
| title | Start an idev session |
|---|
|
idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0606
# or, without a reservation
idev -m 120 -N 1 -A OTH21164 -p development
module load biocontainers
module load bedtools
bedtools --version # should be bedtools v2.27.1 |
...
| Expand |
|---|
| title | Make sure you're in an idev session |
|---|
|
| Code Block |
|---|
| language | bash |
|---|
| title | Start an idev session |
|---|
| idev -m 120 -N 1 -A OTH21164 -r core-ngs-class-0606
# or, without a reservation:
idev -m 120 -N 1 -A OTH21164 -p development
module load biocontainers
module load bedtools
bedtools --version # should be bedtools v2.27.1 |
|
...
Make sure you're in an idev session, since we will be doing some significant computation, and make bedtools and samtools available.
| Expand |
|---|
| title | Make sure you're in an idev session |
|---|
|
| Code Block |
|---|
| language | bash |
|---|
| title | Start an idev session |
|---|
| idev -m 120 -N 1 -A OTH21164 -r | CoreNGS core-ngs-class-0606
# or, | -ATRA23004# or90120 -N 1 -A OTH21164 -p development
module load biocontainers
module load bedtools
bedtools --version # should | or-A TRA23004
|
Copy over the yeast RNA-seq files we'll need (also copy the GFF gene annotation file if you didn't make one).
| Code Block |
|---|
| language | bash |
|---|
| title | Setup for BEDTools multicov |
|---|
|
# Get the merged yeast genes bed file if you didn't create one
mkdir -p $SCRATCH/core_ngs/bedtools_multicov
cd $SCRATCH/core_ngs/bedtools_multicov
cp $CORENGS/catchup/bedtools_merge/merged*bed .
# Copy the BAM file
cd $SCRATCH/core_ngs/bedtools_multicov
cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* . |
Exercises:
- How many reads are represented in the yeast_mrna.sort.filt.bam file?
- How many mapped? How many proper pairs? How many duplicates?
- What is the distribution of mapping qualities? What is the average mapping quality?
| Expand |
|---|
|
samtools flagstat for the different read counts. samtools view + cut + sort + uniq -c for mapping quality distribution samtools view + awk for average mapping quality |
...
| Expand |
|---|
|
| Code Block |
|---|
| module load samtools
cd $SCRATCH/core_ngs/bedtools_multicov
samtools flagstat yeast_mrna.sort.filt.bam | tee yeast_mrna.flagstat.txt |
| Code Block |
|---|
| title | samtools flagstat output |
|---|
| 3347559 + 0 in total (QC-passed reads + QC-failed reads)
24317 + 0 secondary
0 + 0 supplementary
922114 + 0 duplicates
3347559 + 0 mapped (100.00% : N/A)
3323242 + 0 paired in sequencing
1661699 + 0 read1
1661543 + 0 read2
3323242 + 0 properly paired (100.00% : N/A)
3323242 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5) |
There are 3323242 total reads, all mapped and all properly paired. So this must be a quality-filtered BAM. There are 922114 duplicates, or about 28%. To get the distribution of mapping qualities :(BAM field 5) | Code Block |
|---|
| samtools view yeast_mrna.sort.filt.bam | cut -f 5 | sort | uniq -c |
| Code Block |
|---|
| title | distribution of mapping qualities |
|---|
| 498 20
6504 21
1012 22
355 23
1054 24
2800 25
495 26
14133 27
282 28
358 29
954 30
1244 31
358 32
6143 33
256 34
265 35
1112 36
905 37
309 38
4845 39
5706 40
427 41
1946 42
1552 43
1771 44
6140 45
1771 46
3049 47
3881 48
3264 49
4475 50
15692 51
25378 52
16659 53
18305 54
7108 55
2705 56
59867 57
2884 58
2392 59
3118705 60 |
To compute average mapping quality: | Code Block |
|---|
| samtools view yeast_mrna.sort.filt.bam | awk '
BEGIN{FS="\t"; sum=0; tot=0}
{sum = sum + $5; tot = tot + 1}
END{printf("mapping quality average: %.1f for %d reads\n", sum/tot,tot) }' |
Mapping qualities range from 20 to 60 – excellent quality! Because the majority reads have mapping quality 60, the average is 59.2. So again, there must have been quality filtering performed on upstream alignment records. |
Here's how to run bedtools multicov in stranded mode, directing the standard output to a file:
| Expand |
|---|
|
| Code Block |
|---|
| idev -m 120 -N 1 -A OTH21164 -r CoreNGScore-ngs-class-0606
# or -A TRA23004, without a reservation:
idev -m 120 -N 1 -A OTH21164 -p development
module load biocontainers
module load samtools
module load bedtools
mkdir -p $SCRATCH/core_ngs/bedtools_multicov
cd $SCRATCH/core_ngs/bedtools_multicov
cp $CORENGS/catchup/bedtools_merge/merged*bed .
cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* . |
|
...
| Code Block |
|---|
| language | bash |
|---|
| title | Run bedtools multicov to count BAM alignments overlapping a set of genes |
|---|
|
cd $SCRATCH/core_ngs/bedtools_multicov
bedtools multicov -s -bams yeast_mrna.sort.filt.bam \
-bed merged.good.sc_genes.bed > yeast_mrna_gene_counts.bed |
ExerciseExercises:
- How may records of output were written?
- Where is the count of overlaps per output record?
| Expand |
|---|
|
| Code Block |
|---|
| wc -l yeast_mrna_gene_counts.bed |
6485 records were written, one for each feature in the merged.sc_genes.bed file. The overlap count was added as the last field in each output record. So here it is field 7 since the input annotation file had 6 columns. |
...
| Expand |
|---|
|
| Code Block |
|---|
| cut -f 7 yeast_mrna_gene_counts.bed | grep -v '^0' | wc -l
# or
cut -f 7 yeast_mrna_gene_counts.bed | grep -v -c '^0'
# or
cat yeast_mrna_gene_counts.bed | \
awk '{if ($7 > 0) print $7}' | wc -l |
Most of the genes (6141/6485) have non-zero read overlap counts. |
...
- Go to the UCSC Genome Browser: https://genome.ucsc.edu/
- Select Genomes from the top menu bar
- Select Human from POPULAR SPECIES
- under Human Assembly select Feb 2009 (GrCh37/hg19)
- select GO
- In the hg19 browser page, the Layered H3K27Ac
- the 100 Vert. Cons track is a signal track
- the x-axis is the genome position
- the y-axis represents the
count of ChIP-seq reads that overlap each position- where the ChIP'base-wise conservation among vertebrates
- customize the 100 Vert. Cons track
- right-click on "100 Vert. Cons" text in the left margin,
- select "Configure 100 Vert. Cons" from the menu
- in the 100 Vert. Cons Track Settings dialog:
- change "Track height" to 100
- change "Data view scaling" to "auto-scale to data view"
- click "OK"
- the Layered H3K27Ac track is a signal track
- the x-axis is the genome position
- the y-axis represents the count of ChIP-seq reads that overlap each position
- where the ChIP'd protein is H3K27AC (histone H3, acetylated on the Lysine at amino acid position 27)
The bedtools genomecov function (https://bedtools.readthedocs.io/en/latest/content/tools/coverage.html), with the -bg (bedgraph) option produces output in bedGraph format. Here we'll analyze the per-base coverage of yeast RNAseq reads in our merged yeast gene regions.
Make sure you're in an idev session, then prepare a directory for this exercise.
| Expand |
|---|
| title | Make sure you're in an idev session |
|---|
|
| Prepare for bedtools coverage | idev -m 120 -N 1 -A OTH21164 -r | CoreNGS -ATRA23004#or
90120 -N 1 -A OTH21164 -p development | # or -ATRA23004
module load biocontainers
module load bedtools
bedtools --version # should be bedtools v2.27.1 |
|
| Code Block |
|---|
| language | bash |
|---|
| title | Prepare for bedtools coverage |
|---|
|
mkdir -p $SCRATCH/core_ngs/bedtools_genomecov
cd $SCRATCH/core_ngs/bedtools_genomecov
cp $CORENGS/catchup/bedtools_merge/merged*bed .
cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* . |
...