...
| Code Block |
|---|
| language | bash |
|---|
| title | Alternative to head/tail/cat for examining a file without causing programs to crash |
|---|
| collapse | true |
|---|
|
less NC_012967.1.fasta.fai # can exit with "q"
|
As you can see, the less command also works perfectly well with files that are not in danger of crashing anything without cluttering your terminal with lines of a file.
Convert mapped reads from SAM to BAM, sort, and index
...
| Warning |
|---|
| title | Do not run on head node |
|---|
|
Many commands past this point are computationally intensive. You should run them through an idev shell or by qsub. We recommend idev for the tutorial. | Code Block |
|---|
| title | Example command to start an idev shell |
|---|
| idev -m 60 -q development -A UT-2015-05-18 |
| Code Block |
|---|
| language | bash |
|---|
| title | Commands to be executed in order... |
|---|
| samtools view -b -S -o SRR030257.bam SRR030257.sam
samtools sort SRR030257.bam SRR030257.sorted
samtools index SRR030257.sorted.bam |
| Tip |
|---|
| This is a really common sequence of commands, so you might want to add it to your personal cheat sheet. |
|
...
| Expand |
|---|
| title | What new files were created by these commands? |
|---|
|
| Code Block |
|---|
| language | bash |
|---|
| title | List the contents of the output directory |
|---|
| ls
samtools_bowtie2
|
| Code Block |
|---|
| NC_012967.1.fasta SRR030257.sorted.bam.bai
NC_012967.1.fasta.fai SRR030257.sam
SRR030257.bam SRR030257.sorted.bam
|
|
| Expand |
|---|
| title | Why didn't we name the output SRR030257.sorted.bam in the samtools sort command? |
|---|
|
Samtools appends an extra .bam to whatever we put here, so it would have created SRR030257.sorted.bam.bam, and then we would have had to make a joke about the Flintstones. |
| Expand |
|---|
| title | Can you guess what a *.bai file is? |
|---|
|
Sure enough, it's the index file for the BAM file. |
...
| Code Block |
|---|
| title | This is *one* command. Put it all on one line. |
|---|
|
samtools mpileup -u -f NC_012967.1.fasta.fai SRR030257.sorted.bam > SRR030257.bcf
|
...
VCF format has alternative Allele Frequency tags denoted by AF= Try the following command to see what values we have in our files.
| Code Block |
|---|
grep AF=AF1 SRR030257.vcf
|
| Expand |
|---|
| title | Optional: For the data we are dealing with, predictions with an allele frequency not equal to 1 are not really applicable. (The reference genome is haploid. There aren't any heterozygotes.) How can we remove these lines from the file? |
|---|
|
Try looking at grep --help to see what you can come up with. | Code Block |
|---|
| language | bash |
|---|
| title | Here for answer |
|---|
| collapse | true |
|---|
| grep -v *something* # The -v flag inverts the match effecitvely showing you everything that does not match your input
|
| Expand |
|---|
| | Code Block |
|---|
cat input.vcf | grep AFAF1=1 > output.vcf
|
Is not practical, since we will lose vital VCF formatting and may not be able to use this file in the future. | Code Block |
|---|
cat input.vcf | grep -v AFAF1=0 > output.vcf
|
Will preserve all lines that don't have a AF=0 value and is one way of doing this. | Code Block |
|---|
sed -i '/AF1=0/ d' input.vcf
|
Is a way of doing it in-line and not requiring you to make another file. (But it writes over your existing file!) |
|
...
| Code Block |
|---|
| language | bash |
|---|
| title | If you have done any of the optional other mapping tutorials, consider the following comparisons. Remember the use of cp -i (or cp -n on some newer linux versions) is useful to make sure you don't overwrite any existing files. |
|---|
|
mkdir comparison
cp -ni samtools_bowtie2/SRR030257.vcf comparison/bowtie2.vcf
cp -ni samtools_bwa/SRR030257.vcf comparison/bwa.vcf
cp -ni samtools_bowtie/SRR030257.vcf comparison/bowtie.vcf
cd comparison
|
...