This page should serve as a reference for the many "things Linux" we use in this course. It is by no means complete – Linux is **huge** – but offers introductions to many important topics.

...

  • Macs and Linux have a Terminal program built-in
  • Windows options:

Use ssh (secure shell) to log in to a remote computer.

SSH to a remote computer

```bash
# General form:
ssh <user_name>@<full_host_name>

# For example
ssh abattenh@ls6.tacc.utexas.edu
```

...

  • ls *.bam – lists all files in the current directory that end in .bam
  • ls [A-Z]*.bam – does the same, but only if the first character of the file name is a capital letter
  • ls [ABcd]*.bam – lists all .bam files whose 1st letter is A, B, c or d.
  • ls *.{fastq,fq}.gz – lists all .fastq.gz and .fq.gz files.
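To see these patterns in action without touching real data, here is a small sketch that can be pasted into a bash session. The scratch directory and the file names are made up for illustration, and LC_ALL=C is set on the assumption that you want [A-Z] to match only capital letters regardless of your locale settings.

```bash
# Hypothetical demo: build a scratch directory with a few empty files
export LC_ALL=C                 # assumption: plain ASCII ordering for [A-Z]
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch Alpha.bam Beta.bam cat.bam dog.bam eel.bam reads.fastq.gz reads.fq.gz notes.txt

# The shell expands each pattern before the command (here echo) ever runs
all_bam=$(echo *.bam)               # every name ending in .bam
upper_bam=$(echo [A-Z]*.bam)        # .bam names starting with a capital letter
set_bam=$(echo [ABcd]*.bam)         # .bam names starting with A, B, c or d
gz_files=$(echo *.{fastq,fq}.gz)    # brace expansion: *.fastq.gz and *.fq.gz

echo "all:   $all_bam"
echo "upper: $upper_bam"
echo "set:   $set_bam"
echo "gz:    $gz_files"

# Clean up the scratch directory
cd "$start_dir"
rm -rf "$demo_dir"
```

Using echo instead of ls makes it clear that the pattern matching is done by the shell itself; ls simply receives the list of matching names.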

...

Streams and Piping

Standard streams and redirection

...

  • samtools view converts the binary small.bam file to text and writes alignment record lines one at a time to standard output.
    • the -F 0x4 option says to filter out (exclude) any records where the 0x4 flag bit is 1 (set)
    • since the 0x4 flag bit is set (1) for unmapped records, this says to only report records where the query sequence did map to the reference
  • | head -1000
    • the pipe connects the standard output of samtools view to the standard input of head
    • the -1000 option says to only write the first 1000 lines of input to standard output
  • | cut -f 5
    • the pipe connects the standard output of head to the standard input of cut
    • the -f 5 option says to only write the 5th field of each input line to standard output (input fields are tab-delimited by default)
      • the 5th field of an alignment record is an integer representing the alignment mapping quality
      • the resulting output will have one integer per line (and 1000 lines)
  • | sort -n
    • the pipe connects the standard output of cut to the standard input of sort
    • the -n option says to sort input lines according to numeric sort order
    • the resulting output will be 1000 numeric values, one per line, sorted from lowest to highest
  • | uniq -c
    • the pipe connects the standard output of sort to the standard input of uniq
    • the -c option says to count groups of adjacent lines with the same value (that's why the input must be sorted) and report the total for each group
    • the resulting output will be one line for each group that uniq sees
    • each line will have a count of the lines in the group, followed by the text for the group (here the unique mapping quality values)
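The steps above can be tried without a BAM file or samtools. The sketch below is a stand-in, not the real command: the six tab-delimited records are invented, and keep_mapped is a hypothetical helper that mimics only the 0x4 flag test described above (it drops any record whose second field has the 0x4 bit set). The rest of the pipeline uses the same head, cut, sort and uniq stages.

```bash
# Invented records: name, flag, reference, position, mapping quality
records=$(printf 'r1\t0\tchr1\t100\t60\nr2\t4\t*\t0\t0\nr3\t16\tchr2\t300\t60\nr4\t0\tchr2\t400\t30\nr5\t0\tchr1\t500\t0\nr6\t16\tchr1\t600\t60\n')

# Hypothetical stand-in for the -F 0x4 filter: keep a record only when
# the 0x4 bit of its flag field is NOT set (i.e. the read mapped)
keep_mapped() {
  while IFS="$(printf '\t')" read -r name flag rest; do
    if [ $(( flag & 0x4 )) -eq 0 ]; then
      printf '%s\t%s\t%s\n' "$name" "$flag" "$rest"
    fi
  done
}

# Same shape as the pipeline described above
# (head -n 1000 is the standard spelling of head -1000)
mapq_counts=$(printf '%s\n' "$records" \
  | keep_mapped \
  | head -n 1000 \
  | cut -f 5 \
  | sort -n \
  | uniq -c)

echo "$mapq_counts"
```

Record r2 has flag 4, so it is dropped before counting. The remaining five records have mapping qualities 60, 60, 30, 0 and 60, so uniq -c reports one line each for 0 and 30 and a count of 3 for 60 (the amount of leading padding before each count varies between systems).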

More Linux concepts

Environment variables

Environment variables are just like variables in a programming language (in fact, bash is a complete programming language). They are "pointers" that reference data assigned to them. In bash, you assign an environment variable as shown below:
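As a minimal sketch of assignment and use (the variable names and the path are made up for illustration): there are no spaces around the = sign, and the $ prefix is used only when reading the value back. The sketch also shows export, which is what makes a variable visible to programs started from the shell.

```bash
# Assign: no "$" on the left-hand side and no spaces around "="
demo_dir_name="/tmp/linux_course_demo"    # hypothetical value

# Read the value back with "$"; double quotes still allow substitution
greeting="Working in $demo_dir_name"
echo "$greeting"

# Until it is exported, a child process cannot see the variable
before_export=$(sh -c 'echo "${demo_dir_name:-unset}"')

# export places the variable in the environment passed to child processes
export demo_dir_name
after_export=$(sh -c 'echo "${demo_dir_name:-unset}"')

echo "child before export: $before_export"
echo "child after export:  $after_export"
```

The single quotes around the sh -c argument keep the current shell from substituting the value itself, so the child really is the one looking the variable up.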

...