Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
titlesolution
collapsetrue
#change to directory with species-separated vcfs
cd species_separated


#select window size and step (100Kb)
WINDOW_SIZE=100000
WINDOW_STEP=100000


#loop through vcfs and build commands for running --window-pi
>piCommands
for vcfFile in *.recode.vcf
do echo "vcftools --vcf $vcfFile --window-pi $WINDOW_SIZE --window-pi-step $WINDOW_STEP --out ${vcfFile/.recode.vcf} &" >> piCommands
done


#execute them


#look at results files
ls -l *windowed.pi


#send to pungitius_exercise/ on personal computer to plot with 

Plot the resulting *weir.fst and *.pi results with plot_window_stats.R

A pseudoautosomal region (PAR) is a region of a sex chromosome that still undergoes recombination.

...

admixture result

Phylogenetics

Code Block
(on TACC)
cd phylogenetics

Build gene trees from the PAR and from the divergent regions of chr12

...

Code Block
titlesolution
collapsetrue
#subset the vcf for the two regions
vcftools --vcf chr12_for_phylo.vcf --chr chrXII --from-bp 0 --to-bp 2000000 --recode --out PAR
vcftools --vcf chr12_for_phylo.vcf --chr chrXII --from-bp 4000000 --to-bp 17000000 --recode --out SDR


#use vcf-kit to build trees
vk phylo tree upgma PAR.recode.vcf > PAR.newick
vk phylo tree upgma SDR.recode.vcf > SDR.newick


#look at the trees on TACC with view_newick.py
view_newick.py PAR.newick
view_newick.py SDR.newick

For a nicer tree viewer, Dendroscope works well.

Build gene trees in windows along chromosome

Code Block
titlesolution
collapsetrue
#set up boundaries
seq 0 250000 19750000 > lefts
seq 250000 250000 20000000 > rights
paste lefts rights > bounds.txt


#now build commands to subset the vcf with VCFtools, then pipe the subset into VCF-kit for tree building
awk '{print "vcftools --vcf chr12_for_phylo.vcf --chr chrXII --from-bp "$1 " --to-bp "$2" --recode -c | vk phylo tree upgma - > window_"$1"-"$2".newick &"}' bounds.txt > treeBuildCommands


#check the commands (should have 80 lines)
head treeBuildCommands
wc -l treeBuildCommands


#Double-check you are in an idev session
#If you are in an idev session, and had 80 lines, print to screen and execute


sh treeBuildCommands




#check that you have all trees
ls window*.newick | wc -l




#compile the resulting trees into a single file
#they need to be in order, so we can just simply cat them
#read bounds.txt again to cat them in order
echo ">allTrees.newick" >catTrees
while read bound
 do left=$(echo $bound | cut -d " " -f 1)
 right=$(echo $bound | cut -d " " -f 2)
 echo "cat window_${left}-${right}.newick | tr -d '\n' >> allTrees.newick && echo '' >>allTrees.newick" >> catTrees
done<bounds.txt


#execute the catting commands
sh catTrees


#check that the input is correct size (should be 80)
wc -l allTrees.newick


#now run Twisst on them
~/twisst/twisst.py -t allTrees.newick\
 -w topology_weights.tsvtxt\
 -g Pun_X -g Pun_Y -g Sin -g Tym\
 --groupsFile groups_file.txt\
 --method threshold\
 --thresholdTable ~/twisst/threshold_tables/binom_threshold_wilson_CI95_0.1.tsv



#This takes a bit.
#Once it's done, format the output
grep "^#" topology_weights.txt > topos.txt
grep -v "^#" topology_weights.txt > weights.txt


#send the results (all end in *.txt) to personal computer to plot with plot_twisst_results.R

...