Evaluating capture metrics
...
To run the program on Lonestar, there are three prerequisites: 1) A bam file and 2) a list of the genomic intervals that were to be captured and 3) the reference (.fa). As you would guess, the BAM and interval list both have to be based on exactly the same genomic reference file.
For our tutorial, the bam files are one of these:
Code Block | ||
---|---|---|
| ||
/corral-repl/utexas/BioITeam/ngs_course/human_variation/NA12878.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam /corral-repl/utexas/BioITeam/ngs_course/human_variation/NA12892.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam /corral-repl/utexas/BioITeam/ngs_course/human_variation/NA12891.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam |
I've started with one of Illumina's target capture definitions (the vendor of your capture kit will provide this) but since the bam files only represent chr21 data I've created a target definitions file from chr21 only as well. Here they are:
Code Block | ||
---|---|---|
| ||
/corral-repl/utexas/BioITeam/ngs_course/human_variation/target_intervals.chr20.reduced.withhead.intervallist /corral-repl/utexas/BioITeam/ngs_course/human_variation/target_intervals.reduced.withhead.intervallist |
And the relevant reference is:
Code Block | ||
---|---|---|
| ||
/corral-repl/utexas/BioITeam/ngs_course/human_variation/ref/hs37d5.fa
/corral-repl/utexas/BioITeam/ngs_course/human_variation/ref/hs37d5.fa.fai |
We've already copied over all of these files to scratch yesterday at $SCRATCH/BDIB_Human_tutorial
/raw_files
let's copy these to a new folder in our
If you'd like to try this, copy the intervals, bam files, and reference (.fa and .fai) to a temporary directory on your $SCRATCH, and don't forget to "module load picard" first!
...
title | If you're in a hurry... |
---|
...
our scratch directory to begin our analysis.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
cds
mkdir tmpEBDIB_Exome_Capture cdtmpE BDIB_Exome_Capture cp/corral-repl/utexas/BioITeam/ngs_course/human_variation $SCRATCH/BDIB_Human_tutorial/raw_files/NA12878.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam . cp/corral-repl/utexas/BioITeam/ngs_course/human_variation $SCRATCH/BDIB_Human_tutorial/raw_files/target_intervals.chr20.reduced.withhead.intervallist . cp/corral-repl/utexas/BioITeam/ngs_course/human_variation $SCRATCH/BDIB_Human_tutorial/raw_files/ref/hs37d5.fa . cp/corral-repl/utexas/BioITeam/ngs_course/human_variation $SCRATCH/BDIB_Human_tutorial/raw_files/ref/hs37d5.fa.fai . |
The run command looks long but isn't that complicated (like most java programs):
Code Block | ||
---|---|---|
| ||
module load picard java -Xmx4g -Djava.io.tmpdir=/tmp -jar $TACC_PICARD_DIR/CalculateHsMetrics.jar BI=target_intervals.chr20.reduced.withhead.intervallist TI=target_intervals.chr20.reduced.withhead.intervallist I=NA12878.chrom20.ILLUMINA.bwa.CEU.exome.20111114.bam R=ref/hs37d5.fa O=exome.picard.stats PER_TARGET_COVERAGE=exome.pertarget.stats |
...
Code Block |
---|
grep -A 1 '^BAIT' exome.picard.stats | awk 'BEGIN {FS="\t"} {for (i=1;i<=NF;i++) {a[NR"_"i]=$i}} END {for (i=1;i<=NF;i++) {print a[1"_"i]"\t"a[2"_"i]}}'
## below here is the output of the above command, do not paste this into the command line
BAIT_SET target_intervals
GENOME_SIZE 3137454505
BAIT_TERRITORY 1843371
TARGET_TERRITORY 1843371
BAIT_DESIGN_EFFICIENCY 1
TOTAL_READS 4579959
PF_READS 4579959
PF_UNIQUE_READS 4208881
PCT_PF_READS 1
PCT_PF_UQ_READS 0.918978
PF_UQ_READS_ALIGNED 4114249
PCT_PF_UQ_READS_ALIGNED 0.977516
PF_UQ_BASES_ALIGNED 283708397
ON_BAIT_BASES 85464280
NEAR_BAIT_BASES 49788346
OFF_BAIT_BASES 148455771
ON_TARGET_BASES 85464280
PCT_SELECTED_BASES 0.476731
PCT_OFF_BAIT 0.523269
ON_BAIT_VS_SELECTED 0.631886
MEAN_BAIT_COVERAGE 46.363038
MEAN_TARGET_COVERAGE 46.76568
PCT_USABLE_BASES_ON_BAIT 0.245533
PCT_USABLE_BASES_ON_TARGET 0.245533
FOLD_ENRICHMENT 512.716312
ZERO_CVG_TARGETS_PCT 0.009438
FOLD_80_BASE_PENALTY 23.38284
PCT_TARGET_BASES_2X 0.849372
PCT_TARGET_BASES_10X 0.484824
PCT_TARGET_BASES_20X 0.435911
PCT_TARGET_BASES_30X 0.401622
PCT_TARGET_BASES_40X 0.36876
PCT_TARGET_BASES_50X 0.335459
PCT_TARGET_BASES_100X 0.173683
HS_LIBRARY_SIZE 5325189
HS_PENALTY_10X 232.05224
HS_PENALTY_20X -1
HS_PENALTY_30X -1
HS_PENALTY_40X -1
HS_PENALTY_50X -1
HS_PENALTY_100X -1
AT_DROPOUT 2.143632
GC_DROPOUT 10.000011
SAMPLE
LIBRARY
READ_GROUP |
...