Introduction

What is Bash?

There are many Linux/Unix shells (bash, tcsh/csh, ksh, zsh, ...), but bash has become the most popular, probably because it is the default shell on most Linux distributions.

It is a multi-faceted execution and programming environment, intimately tied to the Linux/Unix operating system, featuring:

  • a command-line interpreter (a.k.a. Read-Eval-Print loop, or REPL)
  • a rich set of built-in commands for file system navigation & data manipulation
  • advanced utility programs (e.g. cut, join, paste, sort, grep, sed, awk, perl)
    • some of which are full-featured programming languages of their own (awk, perl)
  • many programming language features
    • variables, variable types, control structures, functions
  • a lot of weird but powerful syntax (piping, redirection)
  • a highly extensible execution environment
    • can call both built-in commands and custom scripts & programs

It is large, complex, cryptic, intimidating ... but incredibly powerful!

"It's supposed to be hard. If it wasn't hard, everyone would do it.
 The hard ... is what makes it great."

-- Jimmy Dugan, A League of Their Own

What Bash is not

Then again, bash has some significant drawbacks:

  • a lot of weird (but powerful) syntax :)
  • a meager set of built-in data types
  • functions and scripts can only return a single small integer (an exit status from 0 to 255)
    • the usual workaround is to capture their output, but this can be tricky
  • no support for object-oriented programming
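A minimal sketch of the return-value limitation and its workaround: a function's exit status carries only a small integer, so real data has to be printed to standard output and captured by the caller. The function names (`is_even`, `square`, `square_verbose`, `too_big`) are made up for illustration.

```bash
#!/usr/bin/env bash
# A function's "return value" is its exit status: a single integer, 0-255.
is_even() {
  (( $1 % 2 == 0 ))   # the arithmetic test sets the status: 0 if true, 1 if false
}

if is_even 10; then
  echo "10 is even"
fi

# To hand back real data, the function writes it to standard output...
square() {
  echo $(( $1 * $1 ))
}

# ...and the caller captures that output with command substitution.
result=$(square 12)
echo "square of 12 is $result"

# The tricky part: everything else the function prints to stdout is captured
# too, so diagnostic messages should go to stderr (>&2) instead.
square_verbose() {
  echo "computing square of $1" >&2   # shown on the terminal, not captured
  echo $(( $1 * $1 ))
}
v=$(square_verbose 5)                 # v holds only "25"

# Only the low 8 bits of a return value survive, so "return 300" yields 44.
too_big() { return 300; }
too_big
echo "status: $?"
```

Running this prints `10 is even`, `square of 12 is 144`, the stderr message from `square_verbose`, and `status: 44`.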

Why learn Bash?

Two main reasons:

  1. To improve command line productivity
  2. To write scripts

For command line productivity

The combination of piping, a large set of built-in utilities, and the ease of creating and troubleshooting long "command line one-liners" offers tremendous productivity gains compared with, for example, writing a Python program to achieve equivalent results.

The "Piping a Histogram" discussion provides a good example.
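For illustration only (this is not the example from that discussion), a classic histogram one-liner counts how often each distinct value occurs in a column by chaining four standard utilities. The helper name `histogram` and the sample data are hypothetical.

```bash
#!/usr/bin/env bash
# histogram FILE COLUMN (hypothetical helper): count how often each distinct
# value occurs in a tab-separated column, most frequent first.
histogram() {
  cut -f"$2" "$1" |   # extract the column of interest
    sort |            # bring identical values next to each other
    uniq -c |         # collapse each run into "count value"
    sort -k1,1nr      # order by count, largest first
}

# Small made-up sample: gene names and the chromosome each lies on.
sample=$(mktemp)
printf 'geneA\tchr1\ngeneB\tchr2\ngeneC\tchr1\ngeneD\tchr3\ngeneE\tchr1\ngeneF\tchr2\n' > "$sample"

histogram "$sample" 2
```

This prints three lines, a count of 3 for chr1, 2 for chr2, and 1 for chr3, with `uniq -c` right-aligning each count before its value. Each stage can be tested on its own by cutting the pipe short, which is what makes one-liners like this easy to build up and troubleshoot.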

For scripting

Because bash is an execution environment, it is uniquely well suited to executing a series of processing steps, often calling other programs or utilities, and integrating the results. Such scripts are sometimes termed pipeline scripts and can automate processes that consist of many subtasks: for example, next-gen sequencing alignment pipeline scripts that go from raw reads (FASTQ files) to aligned reads (sorted, indexed BAM files), gathering statistics along the way.
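The pipeline-script pattern can be sketched as follows, assuming plain text input instead of FASTQ/BAM files. The step functions `step_clean`, `step_sort`, and `step_stats` are hypothetical stand-ins; in a real alignment pipeline each slot would call an external program such as an aligner.

```bash
#!/usr/bin/env bash
# Stop on any error, on use of an unset variable, or on a failure anywhere in a pipe.
set -euo pipefail

# Progress messages go to stderr so they never mix with data on stdout.
log() { echo "[$(date +%H:%M:%S)] $*" >&2; }

step_clean() {   # hypothetical step: drop blank lines and '#' comment lines
  # grep exits 1 when no lines survive; accept that, but not 2 (a real error)
  grep -v -e '^$' -e '^#' "$1" > "$2" || [ $? -eq 1 ]
}

step_sort() {    # hypothetical step: sort records so later steps can rely on order
  sort "$1" > "$2"
}

step_stats() {   # hypothetical step: gather simple statistics along the way
  local n
  n=$(wc -l < "$1")
  echo "records: $((n))"   # $((n)) strips the padding some wc versions add
}

# run_pipeline INPUT WORKDIR: run each step in order, keeping intermediate files.
run_pipeline() {
  local input=$1 workdir=$2
  mkdir -p "$workdir"
  log "cleaning $input"
  step_clean "$input" "$workdir/clean.txt"
  log "sorting"
  step_sort "$workdir/clean.txt" "$workdir/sorted.txt"
  log "collecting statistics"
  step_stats "$workdir/sorted.txt" > "$workdir/report.txt"
  log "done; report in $workdir/report.txt"
}
```

Calling `run_pipeline input.txt work` runs the three steps in order. The `set -euo pipefail` line makes the script stop at the first failing step instead of carrying on with missing or partial intermediate files, and keeping each step's output in the work directory makes individual stages easy to inspect or rerun.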

Another possible programming language for writing pipeline scripts is Python, since it has a rich set of built-in features and can be easily extended. The downside of this choice is that it takes discipline to keep the scripting environment separate from the Python program environments; without that discipline, it is easy to end up with large, complex, but fragile systems with many hidden dependencies.

These days, complex pipelines may be difficult to write in any single programming language, so workflow managers are becoming increasingly popular. These tools orchestrate many different workflow components, potentially written in many different languages, manage their dependencies via rules (think make, on steroids), and can be deployed effectively in cloud environments such as AWS and Google Cloud. Both Nextflow and Snakemake, two of the most popular workflow managers, let individual steps (Nextflow processes, Snakemake rules) be written in bash as well as in other languages.

Bash in the world

bash scripting is much in demand. And while it may not be loved by programmers, developers who know it are well compensated.

From the 2020 Stack Overflow Developer Survey (https://insights.stackoverflow.com/survey/2020):

[Figure: Most "loved" programming languages]
[Figure: Most popular (and in demand) programming languages]
[Figure: Developer pay by programming language]