basic grep

The name grep comes from the ed editor command g/re/p: globally search for a regular expression and print the matching lines.

In Unix, the grep program performs regular-expression text searching, and displays lines where the pattern text is found.

Basic usage: grep <pattern> [file] where <pattern> describes what to search for. grep can also read its input from standard input.

There are many grep regular expression metacharacters that control how the search is performed. We'll see more in Part 4: Advanced text manipulation, and at the grep command.

  • grep -i will perform a case-insensitive search
  • grep -n will display line numbers where the pattern was matched

Tip

Because grep's metacharacters are different from metacharacters in bash, it is always a good idea to enclose the <pattern> in single quotes so that the shell treats it as literal text and passes it through as-is to grep.
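As a minimal sketch of the options above, the commands below search a small throwaway file. The file contents and the variable name demo are made up for illustration and are not part of the course data.

```bash
# Create a small throwaway file to search (contents are hypothetical)
demo=$(mktemp)
printf 'Apple pie\nbanana split\napple tart\n' > "$demo"

# Case-sensitive search: matches only the line containing lowercase "apple"
grep 'apple' "$demo"

# -i makes the search case-insensitive, so "Apple pie" matches as well
grep -i 'apple' "$demo"

# -n prefixes each matching line with its line number
grep -n 'split' "$demo"

# grep can also read from standard input instead of a file
cat "$demo" | grep 'tart'
```

The pattern is single-quoted in every call so that the shell passes it to grep unchanged. The temporary file can be removed afterwards with rm "$demo".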

More at Intro Unix: Introducing grep

What is text?

So what exactly is text? Inside of files, text isn't characters at all – it is all numbers (0's and 1's), because that's all computers know.

On standard Unix systems, each text character is stored as one byte (eight binary bits) in a format called ASCII (American Standard Code for Information Interchange). Eight bits can store 2^8 = 256 values, numbered 0 - 255. In its original form, ASCII used only values 0 - 127 for the standard characters. Values 128 - 255 now comprise an Extended set. See https://www.asciitable.com/

The non-printable ASCII characters we care most about are:

  • Tab (decimal 9, hexadecimal 0x9, octal 0o011)
    • backslash escape: \t
  • Linefeed/Newline (decimal 10, hexadecimal 0xA, octal 0o012)
    • backslash escape: \n
  • Carriage Return (decimal 13, hexadecimal 0xD, octal 0o015)
    • backslash escape: \r

Code Block
languagebash
# display 2 lines of text using \n for newline and \t for Tab
echo -e "aa z\nbb\tcc"

# use the hexdump alias to view the hex values for the alphabetic
# and special characters
echo -e "aa z\nbb\tcc" | hexdump
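If the hexdump alias is not defined on a given system, a similar view is available from the standard od utility. The sketch below is an alternative, not the course's own method. It uses printf instead of echo -e because printf handles backslash escapes the same way across shells, and it prints decimal values so they can be compared with the character codes listed earlier.

```bash
# Print each byte of the text as an unsigned decimal number.
# -An suppresses the offset column; -tu1 selects 1-byte unsigned decimal.
printf 'aa z\nbb\tcc\n' | od -An -tu1
# Values to look for: 97 is "a", 32 is the space, 122 is "z",
# 10 is the newline (\n), 98 is "b", 9 is the Tab (\t), 99 is "c"

# od -c shows the same bytes as characters, with \n and \t spelled out
printf 'aa z\nbb\tcc\n' | od -c
```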

More at:

Other shell concepts

Environment variables

Environment variables are just like variables in a programming language (in fact bash is a complete programming language): they are names that hold a value assigned to them. As with all programming language variables, they have two operations:

  1. variable definition - assign a value to a variable name
  2. variable reference - use the variable name to represent the value it holds

In bash, you define (set/assign) an environment variable like this:

Code Block
languagebash
# Assign the environment variable named "varname" the value "hello world!"
varname='hello world!'

Tip
  • Do not put spaces around the equals sign when assigning environment variable values! The shell is very picky about this.
  • Always enclose environment variable values that contain spaces in single or double quotes (see below)
  • Variable names can only contain alphanumeric (A-Z, a-z, 0-9) and underscore ( _ ) characters, and must begin with a letter or underscore.

An environment variable can be referenced by putting the dollar sign ( $ ) metacharacter in front of the variable name (e.g. $varname) or by using the slightly longer syntax: ${varname}.

Examples:

Code Block
languagebash
# The variable "$USER" is evaluated and its value substituted
foo="My USER name is $USER";  echo $foo   
# Same as above using longer evaluation syntax
foo="My USER name is ${USER}"; echo $foo

# Undefined environment variables just appear as empty text
bar='chess'; echo "Today's game is: $bar"
unset bar;   echo "Today's game is: $bar"

# Evaluating an environment variable may need the longer evaluation
# syntax if the literal text right after it could be part of a name
# (a letter, digit or underscore), as with the "_bar" here.
my_var="middle"
echo "File name is: foo_${my_var}_bar.txt"

Your built-in environment variables (e.g. $USER, $MY_GROUP, $PATH) and their values can be viewed with the env command.
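Because env can print a long list, one common approach is to filter its output with grep. This is a small sketch that assumes PATH and HOME are set, which is the case in nearly every login session.

```bash
# Show only the PATH entry; ^ anchors the match at the start of the line
env | grep '^PATH='

# Referencing a variable directly shows just its value
echo "$HOME"
```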

More at: Intro Unix: Writing text: Environment variables

Quoting in the shell

When the shell processes a command line, it first parses the text into tokens ("words"), which are groups of characters separated by whitespace (one or more space characters). Quoting affects how this parsing happens, including how metacharacters are treated and how text is grouped.

There are three types of quoting in the shell:

  1. single quoting (e.g. 'some text') – serves three purposes
    • it groups together all text inside the quotes into a single token
    • it tells the shell not to "look inside" the quotes to perform any evaluation
      • all metacharacters inside the single quotes are ignored
      • in particular, any environment variables in single-quoted text are not evaluated
    • single quoting preserves any whitespace present in the text (spaces or newlines)
  2. double quoting (e.g. "some text") – also serves three purposes
    • it groups together all text inside the quotes into a single token
    • it allows environment variable evaluation, but inhibits some metacharacters
      • e.g. asterisk ( * ) pathname globbing and some other metacharacters
    • double quoting also preserves whitespace in the text
      • e.g. spaces, newlines (\n), and Tabs (\t)
  3. backtick quoting (e.g. `date`)
    • evaluates the expression inside the backtick marks ( ` )
    • the standard output of the expression replaces the text inside the backtick marks ( ` )
    • note that the syntax $(<expression>) is equivalent to `<expression>`

Note that the quote characters themselves ( '  "  ` ) are metacharacters that tell the shell to "start a quoting process" then "end a quoting process" when the matching quote is found. Since they are part of the processing, the enclosing quotes are not included in the output.
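One way to see how quoting changes the way text is grouped into tokens is to count the arguments a command receives. The helper function count_args and the variable name are hypothetical, defined only for this illustration.

```bash
# Hypothetical helper: report how many tokens (arguments) it was given
count_args() { echo "$# argument(s)"; }

name='big   world'           # the value contains three spaces

count_args hello world       # unquoted text: 2 tokens
count_args 'hello world'     # single quotes group the text: 1 token
count_args "hello $name"     # double quotes group too, and $name is evaluated: 1 token
count_args hello $name       # unquoted $name is split on whitespace: 3 tokens
count_args `echo a b c`      # unquoted backtick output is split as well: 3 tokens
count_args "$(echo a b c)"   # $( ) inside double quotes stays 1 token
```

In every case the quote characters themselves are consumed by the shell and never reach count_args.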

Tip

Always use single ( ' ) or double ( " ) quotes when you define an environment variable whose value contains spaces so that the shell sees the quoted text as one item.

Single vs double quotes examples:

Code Block
languagebash
echo "My Unix group is $MY_GROUP"   # The text "$MY_GROUP" is evaluated
                                    #   and its value substituted
echo 'My Unix group is $MY_GROUP'   # The text "$MY_GROUP" is left as-is

foo="My USER name is $USER"; echo $foo  # The text "$USER" is evaluated
                                        #   and its value substituted
foo='My USER name is $USER'; echo $foo  # The text "$USER" is left as-is

FOO="Hello world!"
echo "The value of variable 'FOO' is \"$FOO\""  # Escape the double quotes
                                                #   inside double quotes

# Same output - the shell removes whitespace
echo Hello world!; echo Hello   world!

# Echo a multi-line variable without quotes, and the
# shell removes whitespace (here the newline)
foo='aa
bb'; echo $foo

# But enclose the multi-line variable in double quotes and whitespace
# is preserved
echo "$foo"


Tip

If you see the greater than ( > ) character after pressing Enter, it can mean that your quotes are not paired, and the shell is waiting for more input to contain the missing quote of the pair (either single or double). Just use Ctrl-c to get back to the command prompt.

...

Code Block
languagebash
date          # Calling the date command just displays
              #   date/time information
echo date     # Here "date" is treated as a literal word, and
              #   written to output
echo `date`   # The date command is evaluated and its output
              #   replaces the command
today=$( date ); echo $today   # The $( ) form works the same way: variable
                               #   "today" is assigned today's date

# Assign a string including today's date to variable "today"
today="Today is: `date`"; echo $today

More at: Intro Unix: Writing text: Quoting in the shell

...

Code Block
languagebash
ls haiku.txt xxx.txt                # displays both output and error text
                                    #   on the Terminal
ls haiku.txt xxx.txt 2>/dev/null    # displays only output text on the
                                    #   Terminal
ls haiku.txt xxx.txt 1>/dev/null    # displays only error text on the
                                    #   Terminal

# And this syntax (2>&1) sends standard error to the same place as
# standard output, so data from both is written to outerr.log
ls haiku.txt xxx.txt 1>outerr.log 2>&1 
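
# To check where each stream ended up, the sketch below repeats the last
# command in a scratch directory and then inspects the log file.
# Assumptions: xxx.txt does not exist (so ls reports an error); the scratch
# directory and the empty haiku.txt are created only for this demonstration.

```bash
# Work in a scratch directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir"
touch haiku.txt          # exists; xxx.txt deliberately does not

# Both streams go to outerr.log, so nothing appears on the Terminal.
# ls returns a non-zero status because xxx.txt is missing; "|| true"
# only keeps a script running under "set -e" from stopping here.
ls haiku.txt xxx.txt 1>outerr.log 2>&1 || true

# The log holds the listing of haiku.txt and the error about xxx.txt
cat outerr.log
```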

...