Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
titleMapping
collapsetrue
#check out our reference
less stickleback_chrom3.fasta 


#how many sequences are there?
grep "^>" stickleback_chrom3.fasta | wc -l


#index the reference
module load bwa
bwa index stickleback_chrom3.fasta


#run mapping
bwa mem stickleback_chrom3.fasta sample_fastqs/sample_AACCA-AGCGAC.1.fq sample_fastqs/sample_AACCA-AGCGAC.2.fq > sampleA.sam
bwa mem stickleback_chrom3.fasta sample_fastqs/sample_CGATC-AGCGAC.1.fq sample_fastqs/sample_CGATC-AGCGAC.2.fq > sampleB.sam
bwa mem stickleback_chrom3.fasta sample_fastqs/sample_GCATG-AGCGAC.1.fq sample_fastqs/sample_GCATG-AGCGAC.2.fq > sampleC.sam
bwa mem stickleback_chrom3.fasta sample_fastqs/sample_TCGAT-AGCGAC.1.fq sample_fastqs/sample_TCGAT-AGCGAC.2.fq > sampleD.sam


#assess mapping efficiency
module load samtools
samtools flagstat sampleA.sam > flagStatRes_sampleA.txt
samtools flagstat sampleB.sam > flagStatRes_sampleB.txt
samtools flagstat sampleC.sam > flagStatRes_sampleC.txt
samtools flagstat sampleD.sam > flagStatRes_sampleD.txt


#these results are not spectacular.
#80% mapping efficiency is OK, but the low rate of properly paired reads is disapointingdissapointing.


#convert the sam files to bam files
samtools sort sampleA.sam -O SAMBAM -o sampleA.bam
samtools sort sampleB.sam -O SAMBAM -o sampleB.bam
samtools sort sampleC.sam -O SAMBAM -o sampleC.bam
samtools sort sampleD.sam -O SAMBAM -o sampleD.bam

Genotyping

Now genotype the samples with mpileup.

Code Block
titlempileup
collapsetrue
#index the reference for samtools
module load samtools
samtools faidx stickleback_chrom3.fasta
 
#make a list of the bam files
ls *.bam > my_bamfiles.txt
 
#run mpileup
samtools mpileup -f stickleback_chrom3.fasta -t DP,AD,ADF,ADR,SP -u -b my_bamfiles.txt > mpileup_results.bcf


#now call genotypes from the mpileup results
bcftools call -vmO v -o raw_calls.vcf mpileup_results.bcf

...

Code Block
titlequality filtering
collapsetrue
#how many raw genotyped sites to be have?
vcftools --vcf raw_calls.vcf


      #After filtering, kept 4 out of 4 Individuals
      #After filtering, kept 14559 out of a possible 14559 Sites


#now perform basic quality filtering on raw calls
bcftools filter --exclude 'QUAL < 30' raw_calls.vcf | bcftools view > filt0.vcf
 
#check the new quality checked vcf
vcftools --vcf filt0.vcf


#Additionally, you can filter on representation among samples, allele frequency, and other options with VCFtools


#Filter for biallelic sites variants 
#genotyped in 75% of samples,
#with at least two reads covering the site


vcftools --vcf filt0.vcf --max-alleles 2 --max-missing 0.75 --minDP 2 --recode --out filt2