...
- Intro Unix: Viewing text in files: head and tail
- Intro Unix: Viewing text in files: Text lines and the terminal
- Intro Unix: Writing text: echo - the bash print function
What is text?
So what exactly is text? Inside of files, text isn't characters at all – it is all numbers (0's and 1's), because that's all computers know.
On standard Unix systems, each text character is stored as one byte – eight binary bits – in a format called ASCII (American Standard Code for Information Interchange). Eight bits can store 2^8 = 256 values, numbered 0 - 255. In its original form values 0 - 127 were used for standard ASCII characters. Now values 128 - 255 comprise an Extended set. See https://www.asciitable.com/
The non-printable ASCII characters we care most about are:
...
- backslash escape: \t
...
- backslash escape: \n
...
| Code Block | ||
|---|---|---|
| ||
# display 2 lines of text using \n for newline and \t for Tab
echo -e "aa z\nbb\tcc"
# use the hexdump alias to view the hex values for the alphabetic
# and special characters
echo -e "aa z\nbb\tcc" | hexdump |
More at:
Other shell concepts
Environment variables
Environment variables are just like variables in a programming language (in fact bash is a complete programming language): they are names that hold a value assigned to them. As with all programming language variables, they have two operations:
- variable definition - assign a value to a variable name
- variable reference - use the variable name to represent the value it holds
...
basic grep
The word grep stands for global regular expression print.
In Unix, the grep program performs regular-expression text searching, and displays lines where the pattern text is found.
Basic usage: grep <pattern> [file] where <pattern> describes what to search for. grep can also take its input on standard input.
There are many grep regular expression metacharacters that control how the search is performed. We'll see more in Part 4: Advanced text manipulation, and at the grep command.
- grep -i will perform a case-insensitive search
- grep -n will display line numbers where the pattern was matched
| Tip |
|---|
Because grep's metacharacters are different from metacharacters in bash, it is always a good idea to enclose the <pattern> in single quotes so that the shell treats it as literal text and passes it through as-is to grep. |
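As a minimal sketch of the two options above, the following pipes three made-up lines to grep on standard input. The sample text and the variable names sample and matches are assumptions for illustration only, not part of the course files.

```bash
# Three sample lines (made up for this sketch)
sample=$(printf 'Apple pie\nbanana split\napple tart\n')

# -i ignores case, -n prefixes each matching line with its line number.
# The pattern is single-quoted so the shell passes it to grep as-is.
matches=$(echo "$sample" | grep -i -n 'apple')
echo "$matches"
# prints:
# 1:Apple pie
# 3:apple tart
```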
More at Intro Unix: Introducing grep
What is text?
So what exactly is text? Inside of files, text isn't characters at all – it is all numbers (0's and 1's), because that's all computers know.
On standard Unix systems, each text character is stored as one byte – eight binary bits – in a format called ASCII (American Standard Code for Information Interchange). Eight bits can store 2^8 = 256 values, numbered 0 - 255. In its original form values 0 - 127 were used for standard ASCII characters. Now values 128 - 255 comprise an Extended set. See https://www.asciitable.com/
The non-printable ASCII characters we care most about are:
- Tab (decimal 9, hexadecimal 0x9, octal 0o011)
- backslash escape: \t
- Linefeed/Newline (decimal 10, hexadecimal 0xA, octal 0o012)
- backslash escape: \n
- Carriage Return (decimal 13, hexadecimal 0xD, octal 0o015)
- backslash escape: \r
| Code Block | ||
|---|---|---|
| ||
# display 2 lines of text using \n for newline and \t for Tab
echo -e "aa z\nbb\tcc"
# use the hexdump alias to view the hex values for the alphabetic
# and special characters
echo -e "aa z\nbb\tcc" | hexdump |
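Where the hexdump alias is not defined, the standard od command can show the same byte values. This is a small sketch, assuming od is installed (it is a POSIX utility); the variable name bytes is made up for the example.

```bash
# Send Tab, Linefeed and Carriage Return to od, which prints each byte
# as an unsigned decimal number (-tu1) with no offset column (-An)
bytes=$(printf '\t\n\r' | od -An -tu1)

# Leave $bytes unquoted so the shell collapses od's padding spaces
echo $bytes
# prints: 9 10 13
```

The three numbers match the decimal values given for Tab, Linefeed and Carriage Return in the list above.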
More at:
Other shell concepts
Environment variables
Environment variables are just like variables in a programming language (in fact bash is a complete programming language): they are names that hold a value assigned to them. As with all programming language variables, they have two operations:
- variable definition - assign a value to a variable name
- variable reference - use the variable name to represent the value it holds
In bash, you define (set/assign) an environment variable like this:
| Code Block | ||
|---|---|---|
| ||
# Assign the environment variable named "varname" the value "hello world!"
varname='hello world!' |
| Tip |
|---|
|
An environment variable can be referenced by putting the dollar sign ( $ ) metacharacter in front of the variable name (e.g. $varname) or by using the slightly longer syntax: ${varname}.
Examples:
| Code Block | ||
|---|---|---|
| ||
# The variable "$USER" is evaluated and its value substituted
foo="My USER name is $USER"; echo $foo
# Same as above using longer evaluation syntax
foo="My USER name is ${USER}"; echo $foo
# Undefined environment variables just appear as empty text
bar='chess'; echo "Today's game is: $bar"
unset bar; echo "Today's game is: $bar"
# Evaluating an environment variable that contains an underscore
# may need to use the longer evaluation syntax, if the literal text
# before or after it is an underscore.
my_var="middle"
echo "File name is: foo_${my_var}_bar.txt"
|
Your built-in environment variables (e.g. $USER, $MY_GROUP, $PATH) and their values can be viewed with the env command.
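To look up a single variable rather than scanning the whole env listing, its output can be piped to grep. A minimal sketch follows; the variable name course_name and its value are hypothetical. It also assumes the variable has been exported, since env only lists exported variables and a plain assignment stays local to the current shell.

```bash
# Hypothetical variable; export makes it part of the environment
export course_name='Intro Unix'

# env lists NAME=value pairs; grep keeps only the line for our variable
found=$(env | grep '^course_name=')
echo "$found"
# prints: course_name=Intro Unix
```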
More at: Intro Unix: Writing text: Environment variables
Quoting in the shell
When the shell processes a command line, it first parses the text into tokens ("words"), which are groups of characters separated by whitespace (one or more space characters). Quoting affects how this parsing happens, including how metacharacters are treated and how text is grouped.
There are three types of quoting in the shell:
- single quoting (e.g. 'some text') – serves two purposes
- it groups together all text inside the quotes into a single token
- it tells the shell not to "look inside" the quotes to perform any evaluation
- all metacharacters inside the single quotes are ignored
- in particular, any environment variables in single-quoted text are not evaluated
- single quoting preserves any whitespace present in the text (spaces or newlines)
- double quoting (e.g. "some text") – also serves two purposes
- it groups together all text inside the quotes into a single token
- it allows environment variable evaluation, but inhibits some metacharacters
- e.g. asterisk ( * ) pathname globbing and some other metacharacters
- double quoting also preserves whitespace in the text
- e.g. spaces, newlines (\n), and Tabs (\t)
- backtick quoting (e.g. `date`)
- evaluates the expression inside the backtick marks ( ` )
- the standard output of the expression replaces the text inside the backtick marks ( ` )
- note that the syntax $(<expression>) is equivalent to `<expression>`
Note that the quote characters themselves ( ' " ` ) are metacharacters that tell the shell to "start a quoting process" then "end a quoting process" when the matching quote is found. Since they are part of the processing, the enclosing quotes are not included in the output.
| Tip |
|---|
Always use single ( ' ) or double ( " ) quotes when you define an environment variable whose value contains spaces so that the shell sees the quoted text as one item. |
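The token grouping described above can be seen by counting the words a command receives. This is only a sketch: count_args is a hypothetical helper written for the example, not a standard command, and the default word-splitting settings are assumed.

```bash
# Hypothetical helper: print the number of tokens it received
count_args() { echo $#; }

greeting='hello world'

count_args hello world      # prints 2: two unquoted words, two tokens
count_args 'hello world'    # prints 1: single quotes group the text
count_args "$greeting"      # prints 1: double quotes keep the value as one token
count_args $greeting        # prints 2: unquoted, the value is split at the space
```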
Single vs double quotes examples:
| Code Block | ||
|---|---|---|
| ||
echo "My Unix group is $MY_GROUP"        # The text "$MY_GROUP" is evaluated
                                         # and its value substituted
echo 'My Unix group is $MY_GROUP'        # The text "$MY_GROUP" is left as-is
foo="My USER name is $USER"; echo $foo   # The text "$USER" is evaluated
                                         # and its value substituted
foo='My USER name is $USER'; echo $foo   # The text "$USER" is left as-is
FOO="Hello world!"
echo "The value of variable 'FOO' is \"$FOO\""   # Escape the double quotes
                                                 # inside double quotes
# Same output - the shell removes whitespace
echo Hello world!; echo Hello       world!
# Echo a multi-line variable without quotes, and the
# shell removes whitespace (here the newline)
foo='aa
bb'; echo $foo
# But enclose the multi-line variable in double quotes and whitespace
# is preserved
echo "$foo" |
| Tip |
|---|
If you see the greater than ( > ) character after pressing Enter, it can mean that your quotes are not paired, and the shell is waiting for more input to contain the missing quote of the pair (either single or double). Just use Ctrl-c to get back to the command prompt. |
...
| Code Block | ||
|---|---|---|
| ||
date                               # Calling the date command just displays
                                   # date/time information
echo date                          # Here "date" is treated as a literal word, and
                                   # written to output
echo `date`                        # The date command is evaluated and its output
                                   # replaces the command
today=$( date ); echo $today       # environment variable "today" is assigned
                                   # today's date
today="Today is: `date`"; echo $today   # "today" is assigned a string including
                                        # today's date |
More at: Intro Unix: Writing text: Quoting in the shell
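Because date prints something different on every run, a repeatable command makes it easier to confirm that the backtick and $( ) forms behave the same way. A small sketch, with made-up variable names:

```bash
# grep -c '' counts every input line, so each pipeline yields 3
with_backticks=`printf 'a\nb\nc\n' | grep -c ''`
with_dollar=$(printf 'a\nb\nc\n' | grep -c '')

echo "backticks: $with_backticks, dollar-paren: $with_dollar"
# prints: backticks: 3, dollar-paren: 3
```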
...
| Code Block | ||
|---|---|---|
| ||
ls haiku.txt xxx.txt               # displays both output and error text
                                   # on the Terminal
ls haiku.txt xxx.txt 2>/dev/null   # displays only output text on the Terminal
ls haiku.txt xxx.txt 1>/dev/null   # displays only error text on the Terminal
# And this syntax (2>&1) sends standard error to the same place as
# standard output, so both are written to outerr.log
ls haiku.txt xxx.txt 1>outerr.log 2>&1 |
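The same redirections can be tried without haiku.txt present. In this sketch, both is a hypothetical function that writes one line to each stream, and command substitution (which captures only standard output) stands in for the Terminal.

```bash
# Hypothetical helper that writes one line to each output stream
both() { echo "to stdout"; echo "to stderr" 1>&2; }

out_only=$(both 2>/dev/null)   # discard standard error, keep standard output
merged=$(both 2>&1)            # send standard error to the same place as standard output

echo "$out_only"   # prints: to stdout
echo "$merged"     # prints two lines: to stdout, then to stderr
```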
...
- Understanding the tree-like structure of directories and files in the file system hierarchy
- Absolute paths start with a slash ( / ), the root of the file system hierarchy
- More at: Intro Unix: Files and File Systems: The file system hierarchy
- Knowing how to navigate the file system using the cd (change directory) command, Tab key completion, and relative path syntax:
- use the dot ( . ) metacharacter for the current directory
- use the dot-dot ( .. ) metacharacters for the parent directory
- More at:
- Selecting multiple files using pathname wildcards (a.k.a. "globbing")
- asterisk ( * ) to match any length of characters
- brackets ( [ ] ) match any character between the brackets, including hyphen ( - ) delimited character ranges such as [A-G]
- braces ( { } ) enclose a list of comma-separated strings to match (e.g. {dog,pony})
- More at: Intro Unix: Files and File Systems: Pathname wildcards
- A basic understanding of file attributes such as
- file type (file, directory)
- owner and group
- permissions (read, write, execute) for the owner, group and everyone
- More at: Intro Unix: Files and File Systems: File attributes
- Familiarity with basic file manipulation commands (mkdir, cp, mv, rm)
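The pathname wildcards and relative path syntax listed above can be tried safely in a scratch directory. This is a sketch only: the directory and file names are made up, and mktemp -d is assumed to be available for creating the temporary directory.

```bash
# Work in a scratch directory so the matches are predictable
workdir=$(mktemp -d)
cd "$workdir"
mkdir data
cd data
touch apple.txt banana.txt cherry.log dog.csv pony.csv

txt_files=$(echo *.txt)        # asterisk: any run of characters before .txt
ab_files=$(echo [a-b]*)        # brackets: names starting with a or b
pets=$(echo {dog,pony}.csv)    # braces: each listed string in turn

echo "$txt_files"   # prints: apple.txt banana.txt
echo "$ab_files"    # prints: apple.txt banana.txt
echo "$pets"        # prints: dog.csv pony.csv

cd ..               # dot-dot: move up to the parent directory
```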