Error handling

Overview

One of the most common problems with writing any kind of program is lack of proper error handling.

If a program does not sanity check its operations and data, it can sometimes proceed for many subsequent steps until finally either a bad (or empty) result is generated, or some called program that does sanity check its data notices and terminates execution. It is then challenging to backtrack to where the original, causative error occurred – assuming there are even log files available to examine!

Some languages (Python, Java, R) automatically detect certain types of errors (e.g. file not found) and, by default, stop program execution and report an execution stack trace that can be displayed to the user. While these stack traces are not usually meaningful on their own, they are better than nothing, and certainly better than allowing the program to blithely continue.

But built-in runtime error checks are no help in sanity checking program data, because only the programmer knows what the data should look like! Enter user-defined error checking. Again, some languages make this less cumbersome:

Python – assert statement
Java – assert keyword
R – stopifnot function

Bash does not have built-in error checking, or the equivalent of throw/catch exception handling, which lets a program deal with errors – if possible.

So we're going to explore how to add our own error handling mechanisms. This will also force us to better understand shells & sub-shells, exit and result codes, and other communication between execution contexts.

Shells and sub-shells

Every bash program has its own execution environment (sub-shell), which is a child process of its calling parent shell. A new sub-shell is created, runs, and returns when:

an external utility (e.g. ls) is run from the command line (or from within a script); true bash built-ins such as cd run in the current shell instead
a custom script is run from the command line (or from within a script)
backtick evaluation is used to execute commands (e.g. echo `date`)
any set of commands enclosed in parentheses is run, e.g.
- ( date )
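One way to see that parentheses create a separate execution environment is to change a variable and the working directory inside them, then check the parent afterwards. This is only a minimal sketch; the names color and start_dir are made up for illustration.

```shell
color=red
start_dir=$PWD

# Changes made inside the parentheses happen in the sub-shell only
( color=blue; cd /; echo "inside:  color=$color dir=$PWD" )

# The parent shell still has its original variable and directory
echo "outside: color=$color dir=$PWD"
```

The second line of output should still report red and the original directory, because the sub-shell's changes disappear when it returns.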

Parentheses evaluation is similar to backtick evaluation, except that backtick evaluation captures the sub-shell's standard output and substitutes it, as text, into the calling command. To capture the standard output of a parentheses expression in the same way, the expression must be "evaluated" with a dollar sign. Consider:

today=`date`
today=$(date)
- the date command is run in a sub-shell, writing its data to its standard output
- the calling shell captures date's standard output stream
- the captured text is stored in the today variable
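The difference between the three forms can be tried out directly. The sketch below uses date +%Y and printf only as convenient commands with predictable output; the variable names are illustrative.

```shell
# Backticks and $( ) both capture the command's standard output
year_a=`date +%Y`
year_b=$(date +%Y)
echo "captured: $year_a and $year_b"

# Plain parentheses run the command in a sub-shell but capture nothing;
# the output simply goes to the caller's standard output
( date +%Y )

# Trailing newlines are stripped from the captured text
text=$(printf 'hello\n\n')
echo "[$text]"
```

The two capture forms behave the same here; $( ) is usually easier to read and to nest.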

Here are the main communication methods between shell environments:

Input to sub-shells
- program arguments
- environment variables
- file or stream data
Output from sub-shells
- exit code
- standard output
- file data
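Most of the channels listed above can be exercised in a single call. The following is a sketch under stated assumptions: the child program is a made-up one-liner run with bash -c, GREETING is a hypothetical environment variable, and file data is left out for brevity.

```shell
# Hypothetical child program, written inline for brevity.
# Inputs:  one argument ($1), one environment variable (GREETING),
#          and one line of standard input.
# Outputs: a line on standard output, plus an exit code that is 0
#          only if the input line was non-empty.
child='read -r line; echo "$GREETING $1, stdin said: $line"; [ -n "$line" ]'

# "child_name" becomes $0 in the child, "world" becomes $1
output=$(echo "some data" | GREETING=hello bash -c "$child" child_name world)
code=$?

echo "standard output: $output"
echo "exit code:       $code"
```

Writing GREETING=hello in front of the command sets that variable for the child only, without changing the calling shell.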

Environment variables

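In addition to passing arguments to a program, a caller may set environment variables (normal bash shell variables) that can be read in the called environment. However, by default, variables in a parent shell are not passed to the programs and scripts it runs unless they are exported using the export keyword. (A parentheses sub-shell is the exception: it starts as a copy of its parent shell, so it sees non-exported variables too. The examples below therefore start a new bash process.)

# a normal bash variable is not visible to a called program
foo=abc
bash -c 'echo "foo=$foo"'

# exported bash variables are visible to a called program
export foo
bash -c 'echo "foo=$foo"'

export bar="def"
bash -c 'echo "bar=$bar"'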

Script exit codes and function return values

Unlike most other programming languages, bash functions and scripts can only return a single integer between 0 and 255. By convention a return value of 0 means success, and any other return value is an error code.

A function can return this value using the return keyword (e.g. return 0); the return value is then stored in the special $? variable, which can be checked by the caller. A function that is called directly runs in the same execution environment (sub-shell) as its caller. Since this is not very much information, function return values are not often used or checked. Instead, as we've seen, functions are often called for their standard output, which serves as a return value proxy.
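The two styles of function result can be compared side by side. In this sketch, is_even and double are hypothetical helper names, not functions from earlier in the material.

```shell
# Return value used as a status: 0 for success, non-zero for failure
is_even() {
    [ $(( $1 % 2 )) -eq 0 ] && return 0
    return 1
}

# Standard output used as the "real" result
double() {
    echo $(( $1 * 2 ))
}

is_even 4
echo "is_even 4 returned $?"

is_even 7
echo "is_even 7 returned $?"

result=$(double 21)
echo "double 21 printed $result"
```

Note that $(double 21) runs the function in a sub-shell in order to capture its output, whereas the direct is_even calls run in the caller's own environment.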

A script can also return an integer value, called the exit code, using the exit keyword (e.g. exit 255). The exit code is returned to the script caller (in the parent shell) in the $? variable. Note that no further code in the current sub-shell is executed after exit is called.
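To watch a script's exit code arrive in the caller, a throwaway script can be written to a temporary file and run. This is a sketch only: the script contents are invented for illustration and the file name comes from mktemp.

```shell
# Create a throwaway script in a temporary file
script=$(mktemp)
cat > "$script" <<'EOF'
echo "before exit"
exit 3
echo "after exit (never printed)"
EOF

# Run it as a child process and save its exit code immediately
output=$(bash "$script")
code=$?
rm -f "$script"

echo "output:    $output"
echo "exit code: $code"
```

Only the first echo in the script produces output, since nothing after exit runs, and the caller sees 3 in $?.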

The main use of exit codes is to check that a called program completed successfully.

# A successful exit code is 0
ls
echo $?

# Any non-0 exit code is an error (here the code is 2 with GNU ls)
ls not_a_file
echo $?

Note that in the non-0 exit code case, the program may also report error information on standard error (e.g. ls: cannot access not_a_file: No such file or directory above).
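A caller can act on the exit code and collect the error text separately. The sketch below assumes that no file called not_a_file exists in the current directory; the variable names are illustrative.

```shell
# Hypothetical file name, assumed not to exist
target=not_a_file

# "if" tests the command's exit code directly: 0 takes the "then" branch
if ls "$target" > /dev/null 2>&1; then
    status="found"
else
    status="missing"
fi
echo "$target is $status"

# Capture only standard error (standard output is discarded),
# then save the exit code right away
err=$(ls "$target" 2>&1 > /dev/null)
code=$?
echo "exit code: $code"
echo "standard error: $err"
```

The exact non-0 code and the wording of the message depend on the ls implementation, so a check for "not 0" is more portable than a check for a specific number.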

Tip

The $? return code variable must be checked immediately after the called program or sub-shell completes, because any further actions in the caller will change $?. One way to do this is to save the value of $? in another variable (e.g. res=$?).
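The effect described in this tip is easy to reproduce. A minimal sketch, using a parentheses sub-shell that exits with an arbitrary code of 7:

```shell
( exit 7 )
echo "first check:  $?"   # the sub-shell's exit code
echo "second check: $?"   # now the exit code of the previous echo

( exit 7 )
res=$?                    # saved immediately
echo "some other work"
echo "saved code:   $res"
```

The second check reports the status of the echo before it rather than the sub-shell's code, while res still holds the original value after other commands have run.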

Exercise 1

On the command line, call exit with various codes in a parentheses sub-shell and check the result in the caller.

Tip

You may want to do this in a new tmux or screen session, since accidentally calling exit at top-level (instead of in a sub-shell) will log you off the server!

Solution

tmux new

( exit 0 )
res=$?
echo "exit code: $res" 

( exit 255 )
res=$?
echo "exit code: $res"
