Working with 3rd party program I/O
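Recall the three standard Unix streams: they each have a number, a name and redirection syntax:
- 0, standard input (stdin): read from a file with < file
- 1, standard output (stdout): write to a file with 1> file (or just > file)
- 2, standard error (stderr): write to a file with 2> file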

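A minimal bash sketch that exercises all three redirections at once. demo_streams is a hypothetical function, not a real tool: it reads a line from standard input, copies it to standard output, and writes a note to standard error. The temporary directory comes from mktemp.

```shell
# demo_streams is a hypothetical stand-in, not a real program:
# it reads one line from stdin (stream 0), copies it to stdout (stream 1)
# and writes a diagnostic note to stderr (stream 2).
demo_streams() {
  local line
  read -r line
  echo "data: $line"
  echo "note: processed one line" >&2
}

tmp=$(mktemp -d)
echo "ACGT" > "$tmp/in.txt"

# <  reads stream 0 from a file
# 1> sends stream 1 to a file
# 2> sends stream 2 to a different file
demo_streams < "$tmp/in.txt" 1> "$tmp/out.txt" 2> "$tmp/err.txt"

cat "$tmp/out.txt"   # data: ACGT
cat "$tmp/err.txt"   # note: processed one line
```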
Third-party bioinformatics tools are often written to perform sub-command processing; that is, they have a top-level program that handles multiple sub-commands. Examples include the bwa NGS aligner and the samtools and bedtools tool suites.
To see their menu of sub-commands, you usually just need to enter the top-level command, or <command> --help. Similarly, sub-command usage is usually available as <command> <sub-command> or <command> <sub-command> --help.
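The sub-command pattern can be sketched in a few lines of bash. mytool and its index and align sub-commands are hypothetical; real suites such as samtools are far more elaborate, but the idea is conceptually similar: look at the first argument, then hand the remaining arguments to that sub-command.

```shell
# mytool is a hypothetical top-level command with two made-up sub-commands.
mytool() {
  local sub=${1:-}
  case "$sub" in
    index) shift; echo "indexing $*" ;;
    align) shift; echo "aligning $*" ;;
    ""|--help)
      # No sub-command: print the menu. It goes to stderr here,
      # the same stream bwa mem uses for its usage text.
      echo "Usage: mytool <sub-command> [options]" >&2
      echo "Sub-commands: index, align" >&2
      ;;
    *)
      echo "mytool: unknown sub-command '$sub'" >&2
      return 1
      ;;
  esac
}

mytool align ref.fa reads.fq    # prints: aligning ref.fa reads.fq
menu=$(mytool 2>&1)             # 2>&1 lets $( ) capture the stderr menu
echo "$menu"
```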
Now let's see how these concepts fit together when running 3rd party tools.
Exercise 2-3 bwa mem
Display the bwa mem sub-command usage using the more pager
Answer:
Just typing bwa mem | more doesn't use the more pager!
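That's because bwa writes its usage information to standard error, not to standard output, and a pipe ( | ) only carries standard output. So you have to use the funky 2>&1 syntax, which sends standard error to the same place as standard output, before piping to more: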
bwa mem 2>&1 | more
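The effect of 2>&1 can be seen without bwa installed. In this bash sketch fake_usage is a hypothetical stand-in that, like bwa mem, prints its usage to standard error, and wc -l takes the place of more so the example runs non-interactively.

```shell
# fake_usage is a hypothetical stand-in that prints a usage line to stderr.
fake_usage() {
  echo "Usage: fake_usage [options] <idxbase> <in1.fq> [in2.fq]" >&2
}

# The pipe carries only stdout, so wc -l counts 0 lines
# while the usage text goes straight to the terminal.
fake_usage | wc -l

# 2>&1 points stderr at the same place as stdout (the pipe),
# so the usage line now reaches wc -l, which counts 1 line.
fake_usage 2>&1 | wc -l
```

Order matters: fake_usage | wc -l 2>&1 would apply the redirection to wc, not to fake_usage. In bash 4 and later, fake_usage |& wc -l is shorthand for fake_usage 2>&1 | wc -l.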
Where does the bwa mem sub-command write its output?
Answer:
The bwa mem usage says:
Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]
No output file is listed, so bwa mem writes its alignment information (SAM records) to standard output.
How can this be changed?
Answer:
The bwa mem options usage says:
-o FILE sam file to output results to [stdout]
As it runs, bwa mem also writes progress diagnostics to standard error. This is typical for tools that may run for an extended period of time, and it keeps the diagnostics out of the alignment data on standard output.
Real example:
cd ~/gzips
bwa mem /mnt/bioi/ref_genome/bwa/bwtsw/sacCer3/sacCer3.fa sm2.fq.gz > small.sam
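In the real example only standard output is redirected into small.sam, so bwa's progress messages still appear on the terminal. A bash sketch of that split, with fake_aligner as a hypothetical stand-in that writes a made-up SAM header and record to standard output and progress lines to standard error:

```shell
# fake_aligner is a hypothetical stand-in for bwa mem: SAM text goes to
# stdout, progress messages go to stderr. The SAM content is made up.
fake_aligner() {
  echo "[fake] loading index..." >&2
  printf '@HD\tVN:1.6\n'
  printf 'read1\t0\tchrI\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
  echo "[fake] done" >&2
}

tmp=$(mktemp -d)

# Like "bwa mem ... > small.sam": stdout goes to the file,
# while the two [fake] progress lines still print on the terminal.
fake_aligner > "$tmp/small.sam"

# The file holds only the 2 SAM lines, none of the progress text.
wc -l < "$tmp/small.sam"
```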
Show how you would invoke bwa mem to capture both its alignment output and its progress diagnostics. Use input from a my_fastq.fq file and ./refs/hg38 as the <idxbase>.
Answers:
Redirecting standard output and standard error to separate files:
bwa mem ./refs/hg38 my_fastq.fq 1> my_fastq.sam 2> my_fastq.aln.log
Using the -o option:
bwa mem -o my_fastq.sam ./refs/hg38 my_fastq.fq 2> my_fastq.aln.log
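The same pattern extends to several samples by deriving each output name from its FASTQ name. A minimal bash sketch; fake_aligner is a hypothetical stand-in for bwa mem ./refs/hg38 so it runs without bwa, and sampleA.fq and sampleB.fq are made-up inputs:

```shell
# fake_aligner is a hypothetical stand-in for "bwa mem ./refs/hg38 FILE":
# one made-up unmapped SAM record to stdout, progress to stderr.
fake_aligner() {
  echo "[fake] aligning $1" >&2
  printf 'read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
}

tmp=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$tmp/sampleA.fq"
printf '@read1\nACGT\n+\nIIII\n' > "$tmp/sampleB.fq"

for fq in "$tmp"/*.fq; do
  base=${fq%.fq}   # strip the .fq suffix: .../sampleA.fq -> .../sampleA
  # 1> keeps the alignments, 2> keeps the progress log, per sample
  fake_aligner "$fq" 1> "$base.sam" 2> "$base.aln.log"
done

ls "$tmp"
```

With the real tool, the loop body would be bwa mem ./refs/hg38 "$fq" 1> "$base.sam" 2> "$base.aln.log", or equivalently bwa mem -o "$base.sam" ./refs/hg38 "$fq" 2> "$base.aln.log".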
Exercise 2-4 cutadapt
The cutadapt adapter trimming command reads NGS sequences from a FASTQ file, and writes adapter-trimmed reads to a FASTQ file. Find its usage.
Answer:
cutadapt # overview; tells you to run cutadapt --help for details
cutadapt --help | less
cutadapt --help | more
Note that it also points you to https://cutadapt.readthedocs.io/ for full documentation.
Where does cutadapt write its output by default? How can that be changed?
Answer:
The cutadapt usage says that output can be written to a file using the -o option:
Usage:
cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq
The brackets around [-o output.fastq] suggest this is optional. Reading a bit further we see:
... Without the -o option, output is sent to standard output.
This suggests output can be specified in 2 ways:
- to a file, using the -o option
  - cutadapt -a CGTAATTCGCG -o trimmed.fastq small.fq
- to standard output, without the -o option
  - cutadapt -a CGTAATTCGCG small.fq 1> trimmed.fastq
Where does cutadapt read its input from by default? How can that be changed? Can the input FASTQ be in compressed format?
Answer:
The cutadapt usage suggests that an input.fastq file is a required argument:
cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq
But again, reading a bit further we see:
... Compressed input and output is supported and
auto-detected from the file name (.gz, .xz, .bz2). Use the file name '-' for
standard input/output. ...
This says the input file can be uncompressed, or compressed with gzip (.gz), xz (.xz) or bzip2 (.bz2), with the format detected from the file name.
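To illustrate what "auto-detected from the file name" means, here is a bash sketch of extension-based dispatch. open_maybe_compressed is a hypothetical helper, not part of cutadapt and not how cutadapt is implemented; with cutadapt itself you simply pass small.fq.gz as the input. It assumes the gzip, xz and bzip2 programs are installed for whichever formats you use.

```shell
# open_maybe_compressed is a hypothetical helper: it picks a decompressor
# from the file name extension and writes plain text to stdout.
open_maybe_compressed() {
  case "$1" in
    *.gz)  gzip  -dc -- "$1" ;;
    *.xz)  xz    -dc -- "$1" ;;
    *.bz2) bzip2 -dc -- "$1" ;;
    *)     cat      -- "$1" ;;   # no known extension: assume uncompressed
  esac
}

tmp=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$tmp/small.fq"
gzip -c "$tmp/small.fq" > "$tmp/small.fq.gz"

# Both calls print the same 4 FASTQ lines
open_maybe_compressed "$tmp/small.fq"
open_maybe_compressed "$tmp/small.fq.gz"
```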
And the usage also suggests input can be specified in 2 ways:
- from a file, by giving its name as the input.fastq argument
  - cutadapt -a CGTAATTCGCG -o trimmed.fastq small.fq
- from standard input, if the input.fastq argument is replaced with a dash ( - )
  - cat small.fq | cutadapt -a CGTAATTCGCG -o trimmed.fastq -
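The dash convention is easy to support in your own scripts, because many standard Unix tools, awk among them, already treat a file name of - as standard input. A bash sketch with a hypothetical count_reads helper, assuming plain 4-line FASTQ records:

```shell
# count_reads is a hypothetical helper: its argument is a FASTQ file name,
# or '-' to read standard input (awk handles '-' that way itself).
# Assumes each record is exactly 4 lines.
count_reads() {
  awk 'END { print NR / 4 }' "$1"
}

tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$tmp/small.fq"

count_reads "$tmp/small.fq"            # from a file: 2
cat "$tmp/small.fq" | count_reads -    # from standard input: 2
```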
Where does cutadapt write its diagnostic output by default? How can that be changed?
Answer:
The cutadapt usage doesn't say anything directly about diagnostics:
cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq
But reading further, in the Output: options section, we find:
-o FILE, --output=FILE
Write trimmed reads to FILE. FASTQ or FASTA format is
chosen depending on input. The summary report is sent
to standard output. Use '{name}' in FILE to
demultiplex reads into multiple files. Default: write
to standard output
Careful reading of this suggests that:
- When the -o option is omitted, trimmed reads go to standard output,
  - and diagnostics (the "summary report") are written to standard error,
  - so they can be redirected to a log file with 2> trim.log
    - cutadapt -a CGTAATTCGCG small.fq 1> trimmed.fastq 2> trim.log
- But when the trimmed output is sent to a file with the -o output.fastq option,
  - diagnostics are written to standard output,
  - so they can be redirected to a log file with 1> trim.log
    - cutadapt -a CGTAATTCGCG -o trimmed.fastq small.fq 1> trim.log
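The routing rule can be mimicked in bash to see why it makes sense: whichever stream is not carrying reads carries the report, so reads and report never mix. fake_trim is a hypothetical stand-in that accepts -a and -o like cutadapt but does no trimming, and its one-line report is a placeholder:

```shell
# fake_trim is a hypothetical stand-in: reads go to the -o file if given,
# otherwise to stdout; a placeholder report goes to the other stream.
# It copies reads unchanged and ignores the adapter.
fake_trim() {
  local OPTIND=1 opt out=""
  while getopts "a:o:" opt; do
    case "$opt" in
      a) ;;                # adapter sequence: accepted but ignored here
      o) out=$OPTARG ;;
      *) return 2 ;;
    esac
  done
  shift $((OPTIND - 1))

  if [ -n "$out" ]; then
    cat -- "$1" > "$out"          # reads to the -o file
    echo "=== Summary ==="        # report to stdout
  else
    cat -- "$1"                   # reads to stdout
    echo "=== Summary ===" >&2    # report to stderr, keeping stdout clean
  fi
}

tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' > "$tmp/small.fq"

# Without -o: reads on stdout, report on stderr
fake_trim -a CGTAATTCGCG "$tmp/small.fq" 1> "$tmp/trimmed.fastq" 2> "$tmp/trim.log"

# With -o: reads in the file, report on stdout
fake_trim -a CGTAATTCGCG -o "$tmp/trimmed2.fastq" "$tmp/small.fq" 1> "$tmp/trim2.log"
```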
Real example:
cd ~/gzips
cutadapt -a AGATCGGAAGAGCACACGTCTGA small.fq > trimmed.fq