...
Often, the first thing you (or your boss) want to know about your sequencing run is simply, "How many reads are there?" For the $BI/gva_course/mapping/data/SRR030257_1.fastq file, the answer is 3,800,180. How can we figure that out?
The grep (Global Regular Expression Print) command can be used to count the lines that match some criteria, as shown above. There, we used it to search for:
- any character from the set ACTGN, with the [] marking them as a set
- matching any number of times (including zero) *
- from the beginning of the line ^
- to the end of the line $
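To see the pattern described in the list above on something small, here is a minimal sketch. It builds a three-read fastq file in a temporary location (the file and its contents are made up for illustration and are not part of the course data) and counts the lines made up entirely of A, C, T, G, or N.

```shell
# Hypothetical three-read fastq file, created only for this illustration.
demo_fastq=$(mktemp)
cat > "$demo_fastq" <<'EOF'
@read1
ACGTACGT
+
IIIIIIII
@read2
NNACGTTA
+
IIIIHHHH
@read3
TTGACCAN
+
HHHHIIII
EOF

# Count lines made up only of A, C, T, G, or N from start (^) to end ($).
seq_lines=$(grep -c '^[ACTGN]*$' "$demo_fastq")
echo "sequence lines: $seq_lines"

# Clean up the temporary file.
rm -f "$demo_fastq"
```

Because * also allows zero matches, a blank line would be counted too, and so would a quality line that happened to contain only those five letters. That is unlikely in practice, but it is one reason to look for a stricter criterion.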
Here, since we are only interested in the number of reads that we have, we can take advantage of the fact that the 3rd line of every 4-line read record in the fastq file is a + and a + only, and use grep's -c option to report the number of matching lines, which is the number of reads in the file.
```bash
grep -c "^+$" $BI/gva_course/mapping/data/SRR030257_1.fastq
```
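As a cross-check on the grep count, the total number of lines divided by 4 should give the same answer, assuming every read occupies exactly four lines (no wrapped sequence or quality lines). The sketch below shows both counts on a made-up two-read file. The file is hypothetical and only there to keep the example self-contained.

```shell
# Hypothetical two-read fastq file, created only for this illustration.
demo_fastq=$(mktemp)
printf '%s\n' '@read1' 'ACGT' '+' 'IIII' '@read2' 'TTGA' '+' 'HHHH' > "$demo_fastq"

# Count the lines that are exactly "+" (the approach used above).
plus_count=$(grep -c '^+$' "$demo_fastq")

# Cross-check: total line count divided by 4, assuming four lines per read.
total_lines=$(wc -l < "$demo_fastq")
line_count=$(( total_lines / 4 ))

echo "reads by '+' lines: $plus_count"
echo "reads by line count: $line_count"

# Clean up the temporary file.
rm -f "$demo_fastq"
```

If the two numbers disagree on a real file, that is a hint that the file is truncated, has wrapped lines, or contains a quality line that is a single + character.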
...
Info | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
In our first tutorial we mentioned how important it can be to know what version of a program you are using. When we installed the cutadapt package we didn't specify what version to install. Can you figure out what version you used, and what the most recent version of the program is?
Figuring out the most recent version is a little more complicated. Unlike programs on your computer like Microsoft Office or your internet browser, there is nothing in an installed program that tells you if you have the newest version or even what the newest version is. If you go to the program's website (easily found with Google or this link), the changes section lists all the versions that have been released, with v3.4 being released on March 30th of this year.
This won't be the last time we mention different program versions. |
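One possible way to check the installed version from the command line is sketched below. It assumes cutadapt is on your PATH and that it reports its version with the --version flag. The newer_of helper is a hypothetical name introduced only for this sketch, and it assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# Print the higher of two version strings using version-aware sorting.
# newer_of is a hypothetical helper; it assumes sort supports -V.
newer_of() {
    printf '%s\n' "$1" "$2" | sort -V | tail -n 1
}

if command -v cutadapt >/dev/null 2>&1; then
    # Ask the installed program to report its own version.
    installed=$(cutadapt --version)
    echo "installed cutadapt: $installed"
    # 3.4 is the release named on the program's website in the text above.
    echo "newer of installed and 3.4: $(newer_of "$installed" 3.4)"
else
    echo "cutadapt not found on PATH"
fi
```

Version sort matters because a plain alphabetical comparison would rank 2.9 above 2.10, while newer_of 2.9 2.10 prints 2.10.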
Optional Exercise: Improve the quality of R2 the same way you did for R1.
Unfortunately we don't have time during the class to do this, but as a potential exercise in your free time, you could improve R2 the same way you did R1 and use the improved fastq files in the subsequent read mapping and variant calling tutorials to see the difference it can make in overall results.
Return to the GVA2021 course page