Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

Working with 3rd party program I/O

Recall the three standard Unix streams: they each have a number, a name and redirection syntax:

Image RemovedImage Added

3rd party tool

...

file and

...

stream handling

Third party bioinformatics tools are often written to perform sub-command processing; that is, they have a top-level program that handles multiple sub-commands. Examples include the bwa NGS aligner and the samtools and bedtools tool suites.

...

Tip
title3rd party tools and standard streams

Many tools write their main output to standard output by default, but have options to write it to a file instead.

Similarly, tools often write processing status and diagnostics to standard error, and it is usually your responsibility to redirect this elsewhere (e.g. to a log file).

Finally, tools may support taking their main input from standard input, but need a "placeholder" argument where you 'd would usually specify a file. That standard input placeholder is usually a single dash ( - ) but can also be a reserved word such as stdin.

Now let's see how these concepts fit together when running 3rd party tools.

Exercise 2-3 bwa mem

Display the bwa mem sub-command usage using the more pager.

Expand
titleAnswer...

Just typing bwa mem | more doesn't use the more pager!

That's because bwa writes its usage information to standard error, not to standard output. So you have to use the funky 2>&1 syntax before piping to more:

bwa mem 2>&1 | more

Where does the bwa mem sub-command write its output?

...

bwa mem also writes diagnostic progress as it runs, to standard error. This is typical for tools that may run for an extended period of time.

...

titleReal example...

...

languagebash

...

.

...

Show how you would invoke bwa mem to capture both its alignment output and its progress diagnostics. Use input from a my_fastq.fq file and ./refs/hg38 as the <idxbase>. (The resulting expression isn't expected to work!)

Expand
titleAnswers...

Redirecting the output to a file:
bwa mem ./refs/hg38 my_fastq.fq 1> my_fastq.sam  2>my_fastq.aln.log

Using the -o option:
bwa mem -o my_fastq.sam  ./refs/hg38  2>my_fastq.aln.log

A real example:

Code Block
languagebash
cd ~/gzips

# Diagnostic progress is written to standard error, which is 
# mapped to the Terminal
bwa mem /mnt/bioi/ref_genome/bwa/bwtsw/sacCer3/sacCer3.fa \
  sm2.fq.gz > small.sam

# Diagnostic progress on standard error is redirected to a log file
bwa mem /mnt/bioi/ref_genome/bwa/bwtsw/sacCer3/sacCer3.fa \
  sm2.fq.gz > small.sam 2>small.log
cat small.log

Exercise 2-4 cutadapt

The cutadapt adapter trimming command reads NGS sequences from a FASTQ file, and writes adapter-trimmed reads to a FASTQ file. Find its usage.

Expand
titleAnswer...

cutadapt    # overview; tells you to run cutadapt --help for detailsdoesn't display help
cutadapt --help | less
cutadapt --help | more

Note that it also points you to https://cutadapt.readthedocs.io/ for full documentation.

Where does cutadapt write its output to from by default? How can that be changed?

Expand
titleHint...

Pipe the usage to less -I. The -I options does a Case Insensitive search.
Then use the / operator search for a pattern like output, and use n to look for the next occurrence if needed.


Expand
titleAnswer...

The cutadapt usage says that output can be written to a file using the -o option

Usage:
    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

The brackets around [-o output.fastq] suggest this is optional. Reading a bit further we see:

... Without the -o option, output is sent to standard output.

This suggests output can be specified in 2 ways:

  • to a file, using the -o option
    • cutadapt -a CGTAATTCGCG -o trimmed.fastq  small.fq
  • to standard output without the -o option
    • cutadapt -a CGTAATTCGCG small.fq 1> trimmed.fastq

...

Expand
titleAnswer...

The cutadapt usage suggests says that an input.fastq file is a required argument:

    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

But again, Again reading a bit further we see:

...                           Compressed input and output is supported and
auto-detected from the file name (.gz, .xz, .bz2). Use the file name '-' for
standard input/output. ...

This says that the input.fastq file can be provided in one of three compression formats.

And the usage also suggests input can be specified in 2 ways:

  • from a file, using the -o optiona filename argument
    • cutadapt -a CGTAATTCGCG -o trimmed.fastq  small.fq
  • from standard input if the input.fastq argument is replaced with a dash ( - )
    • cat small.fq | cutadapt -a CGTAATTCGCG -o trimmed.fastq  -

...

expandReal example...
Expand
titleAnswer...

The cutadapt usage doesn't say anything directly about diagnostics:

    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

But again, reading in the Output: options section:

   -o FILE, --output=FILE
        Write trimmed reads to FILE. FASTQ or FASTA format is
        chosen depending on input. The summary report is sent
        to standard output. Use '{name}'
in FILE to
        demultiplex reads into multiple filesfor demultiplexing (see
        docs).
Default: write
        to standard output

Careful reading of this suggests that:

  • When the -o option is omitted output goes to standard output,
    • and diagnostics (the "summary report") is written to standard error
      • so can be redirected to a log file with 2> trim.log
    • cutadapt -a CGTAATTCGCG small.fq 1> trimmed.fastq 2> trim.log
  • But when the trimmed output is sent to a file with the -o output.fastq option,
    • diagnostics are written to standard output
      • so can be redirected to a log file with 1> trim.log
    • cutadapt -a CGTAATTCGCG -o trimmed.fastq  small.fq 1> trim.log
title

Real examples:

Code Block
languagebash
cd ~/gzips 

# No -o option, so output is written to standard output, redirected here
# Summary report is written to standard error, which goes to the Terminal
cutadapt -a AGATCGGAAGAGCACACGTCTGA small.fq  > trimmed.fq

# Same as above, but summary report is redirected to a log file
cutadapt -a AGATCGGAAGAGCACACGTCTGA small.fq > trimmed.fq 2>trim.log
cat trim.log

# Use -o to specify the output file
# Pipe the fastq data in, specifying "-" as the placeholder argument
# Summary report will go to standard output; redirect to a log file
cat small.fq | \
  cutadapt -a AGATCGGAAGAGCACACGTCTGA -o trimmed.fq - 1>tr.log
cat tr.log