  • cat outputs the entire contents of its input (one or more specified files and/or standard input)
    • CAUTION – only use on small files!
  • zcat <file.gz> like cat, but understands the gzip (.gz) format, and decompresses the data before writing it to standard output
    • CAUTION – only use on small files!
    • Another CAUTION – does not understand .zip or .bz2 compression formats
  • more and less "pagers"
    • both display their (possibly very long) input one Terminal "page" at a time
    • in more:
      • use spacebar to advance a page
      • use q or Ctrl-c to exit more
    • in less:
      • q – quit
      • Ctrl-f or space – page forward
      • Ctrl-b – page backward
      • /<pattern> – search for <pattern> in forward direction
        • n – next match
        • N – previous match
      • ?<pattern> – search for <pattern> in backward direction
        • n – previous match going back
        • N – next match going forward
    • use less -N to display line numbers
    • on many systems less can be used directly on .gz files (this relies on an input preprocessor such as lesspipe being configured)
  • head and tail
    • show you the top or bottom 10 lines (by default) of their input
    • head -20 shows the top 20 lines
    • tail -2 shows the last 2 lines
    • tail -n +100 shows lines starting at line 100
    • tail -n +100 | head -20 shows 20 lines starting at line 100
    • tail -f shows the last lines of a file, then follows the output as more lines are written (Ctrl-c to quit)
  • gunzip -c <file.gz> | more (or less) – like zcat, uncompresses <file.gz> and writes the result to standard output
    • <file.gz> is not altered on disk
    • always pipe the output to a pager!
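The head, tail and gunzip -c options above can be checked on a small generated file. This is a minimal sketch: numbers.txt is a hypothetical file created with seq so that each line's content equals its line number.

```shell
#!/usr/bin/env bash
# Create a scratch directory with a 200-line file (line N contains the number N).
workdir=$(mktemp -d)
seq 1 200 > "$workdir/numbers.txt"

# tail -n +100 starts at line 100; head -20 keeps the first 20 of those lines.
window=$(tail -n +100 "$workdir/numbers.txt" | head -20)
first_line=$(echo "$window" | head -1)
last_line=$(echo "$window" | tail -1)
echo "window runs from line $first_line to line $last_line"

# Compress a copy, then read it back with gunzip -c.
# The .gz file stays on disk; only standard output receives the uncompressed data.
gzip -c "$workdir/numbers.txt" > "$workdir/numbers.txt.gz"
gz_top=$(gunzip -c "$workdir/numbers.txt.gz" | head -3)
echo "$gz_top"
if [ -f "$workdir/numbers.txt.gz" ]; then gz_kept=yes; else gz_kept=no; fi

# Remove the scratch directory when done.
rm -rf "$workdir"
```

Piping into head here plays the same role as piping into a pager: only the first few lines reach the terminal, however large the file is.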

Copying files and directories

  • cp <source> <destination> copies the file <source> to the location and/or file name <destination>.
    • using . (period) as the destination means "here, with the same name"
    • -p option says to preserve file times
    • cp -r <dirname> <destination> will recursively copy the directory <dirname> and all its contents to the directory <destination>.
  • scp <user>@<host>:<source_path> <destination_path>
    • Works just like cp but copies <source_path> from the remote host machine to the local <destination_path>
    • -p (preserve file times) and -r (recursive) options work the same as cp
    • A nice scp syntax resource is located here.
  • wget <url> fetches a file from a valid URL (e.g. http, https, ftp).
    • -O <file> specifies the name for the local file (defaults to the last component of the URL)
  • rsync -arvW <source_directory>/ <target_directory>/
    rsync -ptlrvP <source_directory>/ <target_directory>/
    • Recursively copies <source_directory> contents to <target_directory>, but only if <source_directory> files are newer or don't yet exist in <target_directory>
    • Remote path syntax (<user>@<host>:<absolute_or_home-relative_path>) can be used for either source or target (but not both).

    • Always include a trailing slash ( / ) after the source and target directory names!
    • -a means "archive" mode (equivalent to -rlptgoD, so it already includes -r, -p, -t and -l)
    • -r means recursively copy sub-directories
    • -v means verbose
    • -W means Whole file only
      • Normally the rsync algorithm compares the contents of files that need to be copied and only transfers the different portions. This option disables file content comparisons, which are not appropriate for large and/or binary files.
    • -p means preserve file permissions
    • -t means preserve file times
    • -l means copy symbolic links as links (vs -L which means dereference the link and copy the file it refers to)
    • -P means show transfer Progress (useful when large files are being transferred)
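The copy commands above can be tried safely in a scratch directory. This is a minimal sketch: the directory and file names (project, data, reads.txt, project_backup, project_sync) are hypothetical, and the rsync step only runs if rsync happens to be installed.

```shell
#!/usr/bin/env bash
# Build a small, hypothetical directory tree in a temporary location.
workdir=$(mktemp -d)
mkdir -p "$workdir/project/data"
echo "sample" > "$workdir/project/data/reads.txt"

# cp -p: copy one file, preserving its timestamps.
cp -p "$workdir/project/data/reads.txt" "$workdir/reads_copy.txt"

# cp -r: copy the directory and everything below it.
# project_backup does not exist yet, so it becomes the copy of project.
cp -r "$workdir/project" "$workdir/project_backup"
copied=$(cat "$workdir/project_backup/data/reads.txt")
echo "backup contains: $copied"

# rsync with trailing slashes on both source and target directories:
# copies the contents of project/ into project_sync/.
if command -v rsync >/dev/null 2>&1; then
    rsync -ptlrv "$workdir/project/" "$workdir/project_sync/"
    synced=$(cat "$workdir/project_sync/data/reads.txt")
    echo "sync target contains: $synced"
fi

# Remove the scratch directory when done.
rm -rf "$workdir"
```

Running the rsync line a second time should transfer nothing, since the target files already exist and are not older than the source.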


  • wc -l reports the number of lines (-l) in its input
  • history lists your command history to the terminal
    • redirect to a file to save a history of the commands executed in a shell session
    • pipe to grep to search for a particular command
  • which <pgm> searches all $PATH directories to find <pgm> and reports its full pathname


cut, sort, uniq, grep, awk

  • cut -f <field_number(s)> extracts one or more fields (-f) from each line of its input
    • -d <delim> to change the field delimiter (Tab by default)
  • sort sorts its input using an efficient algorithm
    • by default sorts each line lexically, but
      • one or more fields to sort can be specified with one or more -k <start_field_number>,<end_field_number> options
    • options to sort numerically (-n), or numbers-inside-text (version sort -V)
    • -t <delim> to change the field delimiter (whitespace – one or more spaces or Tabs – by default)
  • uniq -c counts groupings of its input (which must be sorted) and reports the text and count for each group
  • grep -P '<pattern>' searches for <pattern> in its input and outputs only lines containing it
    • always enclose <pattern> in single quotes to inhibit shell evaluation!
    • -P says use Perl patterns, which are much more powerful than standard grep patterns
    • -c says just return a count of line matches
    • -n says include the line number of the matching line
    • -v (inverse match) says return only lines not matching the pattern
    • -l says return only the names of files that do contain the pattern match
    • -L says return only the names of files containing no pattern matches
    • <pattern> can contain special match meta-characters and modifiers such as:
      • ^ – matches beginning of line
      • $ – matches end of line
      • .  – (period) matches any single character
      • * – modifier; place after an expression to match 0 or more occurrences
      • + – modifier, place after an expression to match 1 or more occurrences
      • \s – matches any whitespace (\S any non-whitespace)
      • \d – matches digits 0-9
      • \w – matches any word character: A-Z, a-z, 0-9 and _ (underscore)
      • \t matches Tab; \r matches Carriage return; \n matches Linefeed
      • [xyz123] – matches any single character (including special characters) among those listed between the brackets [ ]
        • this is called a character class.
        • use [^xyz123] to match any single character not listed in the class
      • (Xyz|Abc) – matches either Xyz or Abc or any text or expressions inside parentheses separated by | characters
        • note that parentheses ( ) may also be used to capture matched sub-expressions for later use
    • Regular expression modules are available in nearly every programming language (Perl, Python, Java, PHP, awk, even R)
      • each "flavor" is slightly different
      • even the command line has several grep variants: grep, egrep (extended patterns), fgrep (fixed strings, no regex).
    • There are many good online regular expression tutorials, but be sure to pick one tailored to the language you will use.
  • awk '<script>' – a powerful scripting language that is easily invoked from the command line
    • <script> is applied to each line of input (generally piped in)
      • always enclose <script> in single quotes to inhibit shell evaluation
    • General structure of an awk script:
      • BEGIN {<expressions>}  –  use to initialize variables before any script body lines are executed
        • e.g. BEGIN {FS=":"; OFS="\t"; sum=0} says
          • use colon (:) as the input field separator (FS), and tab (\t) as the output field separator (OFS)
            • the default input field separator (FS) is whitespace
              • one or more spaces or tabs
            • the default output field separator (OFS) is a single space
          • initialize the variable sum to 0
      • {<body expressions>}  – expressions to apply to each line of input
        • use $1, $2, etc. to pick out specific input fields
        • e.g. {print $3,$4} outputs fields 3 and 4 of the input, separated by the output field separator.
      • END {<expressions>} – executed after all input is complete (e.g. print a sum)
    • Here is an excellent awk tutorial, very detailed and in-depth
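The commands in this section are usually chained together with pipes. The sketch below assumes a made-up tab-delimited file, genes.tsv, with three columns (name, chromosome, score), and assumes GNU grep so that -P is available.

```shell
#!/usr/bin/env bash
# Create a hypothetical tab-delimited file: name, chromosome, score.
workdir=$(mktemp -d)
printf 'geneA\tchr1\t10\ngeneB\tchr2\t5\ngeneC\tchr1\t7\ngeneD\tchr3\t2\ngeneE\tchr1\t1\n' \
    > "$workdir/genes.tsv"

# cut | sort | uniq -c: count how many lines mention each chromosome,
# then sort numerically (n) in reverse (r) on the count field.
chrom_counts=$(cut -f 2 "$workdir/genes.tsv" | sort | uniq -c | sort -k1,1nr)
echo "$chrom_counts"
top_chrom=$(echo "$chrom_counts" | head -1 | awk '{print $2}')

# grep -P -c: count lines whose last field is exactly two digits.
two_digit=$(grep -P -c '\t\d\d$' "$workdir/genes.tsv")
echo "lines with a two-digit score: $two_digit"

# awk: BEGIN sets the separators, the body adds up field 3, END prints the sum.
total=$(awk 'BEGIN {FS="\t"; OFS="\t"; sum=0} {sum += $3} END {print "total", sum}' \
    "$workdir/genes.tsv")
echo "$total"

# Remove the scratch directory when done.
rm -rf "$workdir"
```

Because OFS is set to a tab, the comma in print "total", sum is written as a tab in the output.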
