SAMtools is a suite of commands for dealing with databases of mapped reads. You'll be using it quite a bit throughout the course. It includes programs for performing variant calling (mpileup-bcftools).
Table of Contents |
---|
Calling variants in reads mapped by
...
bowtie2
Right now, we'll be using it to call variants (find mutations) in the re-sequenced E. coli genome from the Mapping tutorial. You will need the output SAM files from that tutorial to continue here.
Warning | ||
---|---|---|
If you do not have the output from the Mapping tutorial, run these commands to copy over the output that would have been produced. Then, you can immediately start this tutorial!
|
We assume that you are still working in the main directory called intro_to_mapping
data that you created on $SCRATCH
.
...
Expand | ||||
---|---|---|---|---|
| ||||
This should work:
|
Prepare your directories
Create From inside your main mapping
directory, create a new output directory called samtools_bowtiebowtie2
or whatever makes sense to you.
...
SAM is a text file, so it is slow to access information about how any given read was mapped. SAMtools and many of the commands that we will run later work on BAM files (essentially GZIP compressed binary forms of the text SAM files). These can be loaded much more quickly. Typically, they also need to be sorted, so that when the program wants to look at all reads overlapping position 4,129,888, it can easily find them all at once without having to search through the entire BAM file.
Warning | |||||
---|---|---|---|---|---|
| |||||
Many commands past this point are computationally intensive. You should run them through an
|
Convert from SAM to BAM format.
...
What are all the options doing?
The samtools mpileup
command will take a few minutes to run. You might consider putting it in the background by pressing control-z
and then typing the command bg
so that you can do some other things in this terminal window at the same time – like starting to run variant calling on the BWA or bowtie mapping results. Remember, there are still many other processors available on this node!
Convert BCF to human-readable VCF:
...
Comparing the results of different mappers using bedtools
You will need the output from #Calling variants in reads mapped by BWA or Bowtie2 to complete this exercise.
Often you want to compare the results of variant calling on different samples or using different pipelines. Bedtools is a suite of utility programs that work on a variety of file formats, one of which is conveniently VCF format. It provides many ways of slicing, dicing, and comparing the information in VCF files. For example, we can use it to find out what predictions are the same and which are different from the variant calling on reads mapped with different programs if you generated VCF files for each one.
Set up a new output directory and copy the respective VCF files to it, renaming them so that we know where they came from:
...