...
Code Block | ||||
---|---|---|---|---|
| ||||
echo -e "File\tReference\tReads"; for sam in *.sam;do name=$(echo $sam | sed 's/.sam//');grep -v "^@" $sam | awk -F '\t' '{print $3}' | sort | uniq -c | awk -v sam="$name" '{sum+=$1}{print sam,$2,$1}END{print sam,"total",sum}' OFS='\t';done |
...
Again the format is not great, but can be copy pasted to Excel or similar for column alignment. and in either case this will produce a single row for each sample and single column for each reference, while at the same time, changing the * category (which from reading info about samtools you would see is a read that does not map to any reference) to a more human understood 'unmapped' classification.
You were promised a description of the complicated for loop above.
No Format |
---|
example bash command to generate input file
for sam in *.sam;do name=$(echo $sam | sed 's/.sam//');grep -v "^@" $sam | awk -F '\t' '{print $3}' | sort | uniq -c | awk -v sam="$name" '{sum+=$1}{print sam,$2,$1}END{print sam,"total",sum}' OFS='\t';done > ref_counts.tsv
for sam in *.sam; # for loop looking at sam files
do # standard bash for loop formating
name=$(echo $sam | sed 's/.sam//'); # strip .sam ending from files
grep -v "^@" $sam | # remove header lines from consideration
awk -F '\t' '{print $3}' | # Pull the 3rd column from the sam file. 3rd column is name of chromosome read mapped to
sort | # Sort chromosome names (required for uniq to work correctly)
uniq -c | # take unique chromosome names and return the count of each
awk # Start of long awk command
-v sam="$name" # define awk variable "sam" using shell variable "$name" (see above) for later printing
'{sum+=$1} # define variable 'sum' as running count of how many total reads were evaluated
{print sam,$2,$1} # for each line (chromosome) print the sam file name, chromosome name, and number of reads mapped to that chromosome
END{print sam,"total",sum}' # after all lines (chromosomes) dealt with, print sam file name, the word "total", and the value of variable 'sum' (total number of reads)
OFS='\t'; # OutputFieldSeparator set to tab so output will be tsv
done # standard bash for loop end
> ref_counts.tsv # redirect output to ref_counts.tsv file |
The above box is a copy paste from the header information found in the bt2-mapping-ref-count-table.py file.
Code Block | ||||
---|---|---|---|---|
| ||||
less `which bt2-mapping-ref-count-table.py` |
Tip |
---|
Note 1 last trick here where the text between the ` marks (backtick NOT single quote) is evaluated fist allowing you to open the script for viewing in the less viewer. Frustratingly, once you hit the first ` mark, you lose the ability to tab complete often making it easier to type the which command with tab completion, then go back to add the ` marks. |
Next steps
As noted above, the human trios tutorial deals with calling SNVs on multiple samples at once. Consider how you could combine the two steps
...