Shell Programming and bash

This chapter contains advice about shell programming, specifically in bash. Most of the advice will apply to scripts written for other shells because extensions such as integer or array variables have been implemented there as well, with comparable syntax.

Consider Alternatives

Once a shell script is so complex that advice in this chapter applies, it is time to step back and consider the question: Is there a more suitable implementation language available?

For example, Python with its subprocess module can be used to write scripts which are almost as concise as shell scripts when it comes to invoking external programs, and Python offers richer data structures, with less arcane syntax and more consistent behavior.

Shell Language Features

The following sections cover subtleties concerning the shell programming languages. They have been written with the bash shell in mind, but some of these features apply to other shells as well.

Some of the features described may seem like implementation defects, but these features have been replicated across multiple independent implementations, so they now have to be considered part of the shell programming language.

Parameter Expansion

The mechanism by which named shell variables and parameters are expanded is called parameter expansion. The most basic syntax is “$variable” or “${variable}”.

In almost all cases, a parameter expansion should be enclosed in double quotation marks “…”.

external-program "$arg1" "$arg2"

If the double quotation marks are omitted, the value of the variable is split into words according to the current value of the IFS variable, and each resulting word is subject to pathname expansion (globbing). This may allow the injection of additional options which are then processed by external-program.
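The following sketch illustrates the difference. It assumes bash with the default IFS (space, tab, newline). count_args is a hypothetical helper, introduced only for this illustration, that prints how many arguments it received.

```shell
#!/bin/bash
# Hypothetical helper: prints the number of arguments it receives.
count_args () {
  printf '%s\n' "$#"
}

value='-rf some file'

count_args $value     # unquoted: split into three words, prints 3
count_args "$value"   # quoted: a single argument, prints 1
```

In the unquoted case, the first word “-rf” reaches the command as a separate argument, which is how additional options can be injected.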

Parameter expansion can use special syntax for specific features, such as substituting defaults or performing string or array operations. These constructs should not be used because they can trigger arithmetic evaluation, which can result in code execution. See Arithmetic Evaluation.

Double Expansion

Double expansion occurs when, during the expansion of a shell variable, the variable is not just replaced by its value, but that value is itself expanded as well. This can trigger arbitrary code execution, unless the value of the variable is verified against a restrictive pattern.

The evaluation process is in fact recursive, so a self-referential expression can cause an out-of-memory condition and a shell crash.

Double expansion may seem like a defect, but it is implemented by many shells, and has to be considered an integral part of the shell programming language. However, it does make writing robust shell scripts difficult.

Double expansion can be requested explicitly with the eval built-in command, or by invoking a subshell with “bash -c”. These constructs should not be used.
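A minimal sketch of explicit double expansion with eval, assuming bash and a writable temporary directory. The variable names and the marker file are illustrative only. The value is harmless when expanded once inside double quotes, but eval parses it again as shell code and runs the embedded command substitution.

```shell
#!/bin/bash
# Scratch directory for the marker file; removed on exit.
demo_dir="$(mktemp -d)"
cleanup () {
  rm -rf -- "$demo_dir"
}
trap cleanup 0
marker="$demo_dir/marker"

# Value as it might arrive from an untrusted source.
value='$(touch "$marker")'

# Single expansion: the value is printed literally; nothing runs.
echo "$value"
[ -e "$marker" ] && single_executed=yes || single_executed=no

# eval expands the value a second time and parses it as shell code,
# so the command substitution runs and creates the marker file.
eval "echo $value"
[ -e "$marker" ] && eval_executed=yes || eval_executed=no
```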

The following sections give examples of places where implicit double expansion occurs.

Arithmetic Evaluation

Arithmetic evaluation is a process by which the shell computes the integer value of an expression specified as a string. It is highly problematic for two reasons: It triggers double expansion (see Double Expansion), and the language of arithmetic expressions is not self-contained. Some constructs in arithmetic expressions (notably array subscripts) provide a trapdoor from the restricted language of arithmetic expressions to the full shell language, thus paving the way towards arbitrary code execution. Due to double expansion, input which is (indirectly) referenced from an arithmetic expression can trigger execution of arbitrary code, which is potentially harmful.

Arithmetic evaluation is triggered by the following constructs:

  • The expression in “$((expression))” is evaluated. This construct is called arithmetic expansion.

  • “$[expression]” is a deprecated syntax with the same effect.

  • The arguments to the let shell built-in are evaluated.

  • “((expression))” is an alternative syntax for “let expression”.

  • Conditional expressions surrounded by “[[]]” can trigger arithmetic evaluation if certain operators such as -eq are used. (The test built-in does not perform arithmetic evaluation, even with integer operators such as -eq.)

    The conditional expression “[[ $variable =~ regexp ]]” can be used for input validation, assuming that regexp is a constant regular expression. See Performing Input Validation.

  • Certain parameter expansions, for example “${variable[expression]}” (array indexing) or “${variable:expression}” (string slicing), trigger arithmetic evaluation of expression.

  • Assignment to array elements using “array_variable[subscript]=expression” triggers evaluation of subscript, but not expression.

  • The expressions in the arithmetic for command, “for ((expression1; expression2; expression3)); do commands; done”, are evaluated. This does not apply to the regular for command, “for variable in list; do commands; done”.

Depending on the bash version, the above list may be incomplete.
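The following sketch shows the trapdoor through array subscripts. It assumes bash with default shell options; as noted, behavior can differ between bash versions. The value of input is never executed directly. It is only referenced from arithmetic contexts. The names and the marker file are illustrative.

```shell
#!/bin/bash
# Scratch directory for the marker file; removed on exit.
demo_dir="$(mktemp -d)"
cleanup () {
  rm -rf -- "$demo_dir"
}
trap cleanup 0
marker="$demo_dir/marker"

# Data that hides a command substitution in an array subscript.
# Evaluating the subscript creates the marker file and yields 0.
input='a[$(touch "$marker"; echo 0)]'

# 1. Arithmetic expansion evaluates the value of input as an expression.
result=$(( input ))
[ -e "$marker" ] && arith_executed=yes || arith_executed=no
rm -f -- "$marker"

# 2. [[ ... -eq ... ]] performs arithmetic evaluation of its operands.
[[ $input -eq 0 ]]
[ -e "$marker" ] && cond_executed=yes || cond_executed=no
rm -f -- "$marker"

# 3. The test built-in only accepts literal integers; it prints an error
#    (discarded here) and fails instead of evaluating the value.
if [ "$input" -eq 0 ] 2>/dev/null ; then : ; fi
[ -e "$marker" ] && test_executed=yes || test_executed=no

echo "arithmetic expansion executed the command: $arith_executed"
echo "[[ -eq ]] executed the command: $cond_executed"
echo "[ -eq ] executed the command: $test_executed"
```

The third case corresponds to the remark that the test built-in does not perform arithmetic evaluation, even with -eq.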

If faced with a situation where using such shell features appears necessary, see Consider Alternatives.

If it is impossible to avoid shell arithmetic on untrusted inputs, refer to Performing Input Validation.

Type declarations

bash supports explicit type declarations for shell variables:

	declare -i integer_variable
	declare -a array_variable
	declare -A assoc_array_variable

	typeset -i integer_variable
	typeset -a array_variable
	typeset -A assoc_array_variable

	local -i integer_variable
	local -a array_variable
	local -A assoc_array_variable

	declare -ri integer_variable
	readonly -a array_variable
	readonly -A assoc_array_variable

Variables can also be declared as arrays by assigning them an array expression, as in:

array_variable=(1 2 3 4)

Some built-ins (such as mapfile) can implicitly create array variables.

Such type declarations should not be used because assignments to such variables (independent of the concrete syntax used for the assignment) trigger arithmetic evaluation, and thus double expansion: of the right-hand side for integer variables, and of the subscript when assigning to elements of indexed arrays. See Arithmetic Evaluation.
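A short sketch, assuming bash, of how an integer declaration changes the meaning of an assignment. The variable names are illustrative. The last assignment shows the recursive evaluation: the string ref is treated as an expression, ref names total, and total holds 5.

```shell
#!/bin/bash
# Ordinary variable: the string is stored unchanged.
plain='2 + 3'

# Integer variable: the right-hand side is evaluated arithmetically.
declare -i total
total='2 + 3'

# The evaluation is recursive: "ref" is read as a variable name, its value
# "total" is evaluated in turn, yielding 5.
ref=total
declare -i copy
copy=ref

printf '%s\n' "plain=$plain" "total=$total" "copy=$copy"
```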

Shell scripts which use integer or array variables should be rewritten in another, more suitable language. See Consider Alternatives.

Other Obscurities

Obscure shell language features should not be used. Examples are:

  • Exported functions (export -f or declare -fx).

  • Function names which are not valid variable names, such as “module::function”.

  • The possibility to override built-ins or external commands with shell functions.

  • Changing the value of the IFS variable to tokenize strings.

Invoking External Commands

When passing shell variables as single command line arguments, they should always be surrounded by double quotes. See Parameter Expansion.

Care is required when passing untrusted values as positional parameters to external commands. If the value starts with a hyphen “-”, it may be interpreted by the external command as an option. Depending on the external program, a “--” argument stops option processing and treats all following arguments as positional parameters. (Double quotes are completely invisible to the command being invoked, so they do not prevent variable values from being interpreted as options.)
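A sketch assuming bash and a cat implementation where -n is the option for numbering lines (as in GNU and BSD cat). The file name -n stands in for an untrusted value. Because the value begins with a hyphen, “--” is needed so that cat treats it as a file name.

```shell
#!/bin/bash
# Scratch directory; removed on exit.
demo_dir="$(mktemp -d)"
cleanup () {
  rm -rf -- "$demo_dir"
}
trap cleanup 0

# Untrusted value that happens to look like an option.
untrusted='-n'
printf 'hello\n' > "$demo_dir/$untrusted"

# "--" ends option processing, so cat opens the file named "-n".
# Without it, cat would enable line numbering and read standard input.
( cd -- "$demo_dir" && cat -- "$untrusted" )
```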

Cleaning the environment before invoking child processes is difficult to implement in a script. bash keeps a hidden list of environment variables which do not correspond to shell variables, and unsetting them from within a bash script is not possible. To reset the environment, a script can re-run itself under the “env -i” command with an additional parameter which indicates that the environment has been cleared and suppresses a further self-execution. Alternatively, individual commands can be executed with “env -i”.
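A sketch of the second approach, running a single command with “env -i”. DEMO_SECRET is a hypothetical variable standing in for anything inherited from the caller. The fixed PATH value is an assumption that has to be adapted to the system; env -i removes PATH as well, so it is set explicitly here.

```shell
#!/bin/bash
# Hypothetical variable inherited by child processes.
export DEMO_SECRET='do not leak'

# Normal invocation: the child sees the exported variable.
printenv DEMO_SECRET

# env -i starts the child with an empty environment; PATH is set explicitly
# because it is cleared as well (the value here is an assumption).
if ! env -i PATH=/usr/bin:/bin printenv DEMO_SECRET ; then
  echo "DEMO_SECRET is not visible to the child"
fi
```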

Complete isolation from its original execution environment (which is required when the script is executed after a trust transition, e.g., triggered by the SUID mechanism) is impossible to achieve from within the shell script itself. Instead, the invoking process has to clear the process environment (except for a few trusted variables) before running the shell script.

Checking for failures in executed external commands is recommended. If no elaborate error recovery is needed, invoking “set -e” may be sufficient. This causes the script to stop on the first failed command. However, the exit status of a pipe (“command1 | command2”) is that of its last command only, so errors in earlier commands of the pipe are ignored. This can be changed by invoking “set -o pipefail”. Alternatively, the exit statuses of all commands in the most recent pipe can be read from the PIPESTATUS array (“${PIPESTATUS[X]}”). Due to architectural limitations, only the process that spawned the entire pipe can check for failures in individual commands; it is not possible for a process to tell if the process feeding data (or the process consuming data) exited normally or with an error.
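A sketch of the three behaviors, assuming bash; false and true stand in for real commands. The exit statuses are stored in variables so that they can be inspected afterwards. Both $? and PIPESTATUS are overwritten by the next command, so they have to be read immediately after the pipe.

```shell
#!/bin/bash
# Without pipefail, a pipe's exit status is that of its last command.
false | true
status_plain=$?

# PIPESTATUS records the status of every command in the most recent pipe;
# it must be read before any other command runs.
false | true
statuses="${PIPESTATUS[0]} ${PIPESTATUS[1]}"

# With pipefail, a failing command anywhere in the pipe makes it fail.
set -o pipefail
status_pipefail=0
false | true || status_pipefail=$?

echo "without pipefail: $status_plain"
echo "PIPESTATUS: $statuses"
echo "with pipefail: $status_pipefail"
```

The “|| status_pipefail=$?” form also keeps the script running under “set -e”, where a failing pipe would otherwise stop it.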

See Creating Safe Processes for additional details on creating child processes.

Temporary Files

Temporary files should be created with the mktemp command, and temporary directories with “mktemp -d”.

To clean up temporary files and directories, write a clean-up shell function and register it as a trap handler, as shown in Creating and Cleaning up Temporary Files. Using a separate function avoids issues with proper quoting of variables.

Example 1. Creating and Cleaning up Temporary Files
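tmpfile="$(mktemp)"

# Remove the temporary file; "--" guards against names starting with "-".
cleanup () {
  rm -f -- "$tmpfile"
}

# Run cleanup when the shell exits (0 is the EXIT pseudo-signal).
trap cleanup 0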
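For a temporary directory, the same pattern can be used with “mktemp -d”; the clean-up function then has to remove the directory recursively. This is a sketch, and the file name work.txt is illustrative.

```shell
#!/bin/bash
# Create a temporary directory instead of a single file.
tmpdir="$(mktemp -d)"

# Remove the directory and everything in it when the shell exits.
cleanup () {
  rm -rf -- "$tmpdir"
}

trap cleanup 0

# Work files can now be created under "$tmpdir".
printf '%s\n' 'intermediate data' > "$tmpdir/work.txt"
```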

Performing Input Validation

In some cases, input validation cannot be avoided. For example, if arithmetic evaluation is absolutely required, it is imperative to check that input values are, in fact, integers. See Arithmetic Evaluation.

Input validation in bash shows a construct which can be used to check if a string “$value” is an integer. This construct is specific to bash and not portable to POSIX shells.

Example 2. Input validation in bash
if [[ $value =~ ^-?[0-9]+$ ]] ; then
	echo value is an integer
else
	echo "value is not an integer" 1>&2
	exit 1
fi

Using case statements for input validation is also possible and supported by other (POSIX) shells, but the pattern language is more restrictive, and it can be difficult to write suitable patterns.
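A sketch of a case-based check that should work in POSIX shells as well as in bash. is_integer is a hypothetical function name. It is intended to accept the same strings as the regular expression in Input validation in bash (an optional leading minus sign followed by one or more decimal digits), and needs several patterns because the POSIX pattern language has no repetition operator.

```shell
#!/bin/sh
# is_integer (hypothetical name): succeeds if $1 consists of an optional
# leading minus sign followed by one or more decimal digits.
is_integer () {
  case $1 in
    '' | - )    return 1 ;;  # empty string or a lone minus sign
    *[!0-9-]* ) return 1 ;;  # contains a character other than 0-9 or -
    ?*-* )      return 1 ;;  # a minus sign after the first character
    * )         return 0 ;;
  esac
}
```

The function can take the place of the “[[ $value =~ … ]]” test in Input validation in bash, as in “if is_integer "$value" ; then …”.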

The expr external command can give misleading results (e.g., if the value being checked contains operators itself) and should not be used.

Guarding Shell Scripts Against Changes

bash only reads a shell script up to the point it is needed for executing the next command. This means that if the script is overwritten while it is running, execution can jump to a random part of the script, depending on what is modified in the script and how the file offsets change as a result. (This behavior is needed to support self-extracting shell archives whose script part is followed by a stream of bytes which does not follow the shell language syntax.)

Therefore, long-running scripts should be guarded against concurrent modification by putting as much of the program logic as possible into a main function, and invoking the main function at the end of the script, using this syntax:

main "$@" ; exit $?

This construct ensures that bash will stop execution after the main function, instead of opening the script file and trying to read more commands.