Welcome to the class! This is the home page for the Introduction to Biocomputing class. From this page, you can link to all of the content needed to work through the course.
We will meet in FNT 1.104 from Tuesday, May 27, to Friday, May 30, 1:30-4:30 pm.
You are strongly encouraged to attend in person, but if you would like to attend virtually, use this Zoom link:
https://utexas.zoom.us/j/96012546979?pwd=Xx4Je9KHZVKSFJFhAQwuwWGihL1kyS.1
Zoom Instructions:
Please make sure your Zoom client is up to date and that you have a Zoom account; both are required to join a UT-sponsored Zoom session. See this link for more details about Zoom requirements: https://zoom.its.utexas.edu/home
Other setup:
Please make sure you have an SSH client installed on your computer. Macs come with Terminal, which includes one, so no installation is required. On Windows laptops, install PuTTY and WinSCP.
Course Overview
This course is designed to give you a hands-on introduction to data manipulation and analysis. It consists of lectures and guided tutorials.
This course has the following objectives:
To teach you the Unix (Bash) environment and how to work effectively at the command line.
To teach you about various Unix tools that are available for manipulating files at the command line.
To familiarize you with R and RStudio and how to perform basic data analysis tasks.
To use the above skills in an integrated manner to take data from raw files and produce clean visualizations in R.
Your Instructor
| Name | Affiliation | Expertise | How to contact? |
|---|---|---|---|
| Matt Bramble | CBRS, bioinformatics group | Unix, R, Python, epigenetics | matthew.bramble@austin.utexas.edu, or visit FNT 1.207C (room to be confirmed) |
Days 1-2: Introduction to Unix
Log in to the POD
If you are using a terminal, open it and type:
ssh username@ls6.tacc.utexas.edu
(replace username with your TACC username)
If you are using PuTTY, open it and enter the following:
Host Name: ls6.tacc.utexas.edu
When prompted, enter your username and password.
NEXT:
Open the following link in a new window or tab. Be sure to bookmark this introduction page; we will be returning to it.
If your Terminal has a dark background, the default shell colors can be hard to read. Execute this line to display directory names in yellow.
export LS_COLORS=$LS_COLORS:'di=01;33:'
We'll see later how to set this environment variable in your login script (~/.profile) so that it gets executed every time you login to this server.
For now, just copy the appropriate line above, paste it into your Terminal window (after logging on), then press Enter.
Logging in via RStudio web
If you're attending remotely or do not have access to the UT VPN, you can use the Terminal functionality in the RStudio web application.
To access the Terminal built into RStudio Server:
Click on this link:
Enter your studentNN account name and our password, then click the Sign In button
In the RStudio web application, select the Terminal tab above the type-in area
You should now see a command line in the RStudio type-in area.
In the RStudio Terminal, if the default color for directories is difficult to see against the white background, execute this line to display directory names in blue.
export LS_COLORS=$LS_COLORS:'di=1;34:fi=01:ln=01;36:'
We'll see later how to set this environment variable in your login script (~/.profile) so that it gets executed every time you login to this server.
For now, just copy the appropriate line above, paste it into your Terminal window (after logging on), then press Enter.
# UNIX Workshop
Learn the basics of using UNIX from the command line. Introductory topics include the filesystem, the shell, permissions, and text files. The course will touch on manipulating text files using standard UNIX utilities, how to string utilities together, and how to output the results to files. The goal of the course is to develop some basic comfort at the command line, get a sense of what's possible, and learn how to find help.
## Experience
### Who's Used A Command Line Before?
DOS?
Linux?
### Who has any programming experience?
## What I Hope This Class Accomplishes
## What's The Command Line?
### You Literally Enter Commands Sequentially At A Prompt
### What's Wrong With Using GUIs? Why Don't People Write Their Code So It Uses A Nice Interface?
Programming involves writing text files anyway
Command line lets you be very precise and flexible at the same time
We'll see a little later how the Unix philosophy of lots of small tools gives you flexibility
After the initial learning curve (and it's nontrivial), the command line can actually be easier/faster
You can work on remote machines from your laptop easily. We'll do that in this class.
## Get Example Files
The pieces will be explained later; for now, use this as a reference:
command
options
combined options
*Should look like weird incantations at first, will be explained later.*
### Get/Untar Files
`cp ~benni/Classes/IntroUnix/IntroUnixFiles.tar.gz .`
`tar -zxvf IntroUnixFiles.tar.gz`
`ls`
`rm IntroUnixFiles.tar.gz`
*WTF just happened? Hopefully it will be clear later.*
(Maybe explain tar for grouping files and gz for compression?)
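If tar and gzip do get explained, a small round trip can help: the sketch below bundles two throwaway files into a compressed archive and unpacks it again, using the same `-z` (gzip), `-x`/`-c` (extract/create), `-v` (verbose) and `-f` (archive name) flags as the command above. The `demo` directory and its files are made up for illustration; they are not part of the workshop files.

```shell
# Work in a scratch directory so nothing in your home directory is touched
cd "$(mktemp -d)"

# Hypothetical example files to bundle up
mkdir demo
echo "first file"  > demo/a.txt
echo "second file" > demo/b.txt

# -c create an archive, -z compress it with gzip, -v list files as they go in, -f name the archive
tar -czvf demo.tar.gz demo

# Remove the originals, then list what is inside the archive without extracting (-t)
rm -r demo
tar -tzf demo.tar.gz

# -x extracts, reversing the steps above; with no trailing &,
# extraction finishes before the next command runs
tar -zxvf demo.tar.gz
cat demo/a.txt
```

Keep `f` at the end of the option bundle, since it takes the next word on the line as the archive name.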
## Why Unix? What's Linux?
History of Unix slideshow
## Getting Around
What is the command line?
Using Terminal on a Mac (or PuTTY on Windows)
Show how to turn on Meta/option key in Preferences -> Settings -> Keyboard
Explain the meta key and the notations Ctrl-x, C-x, and ^x (all meaning Control plus x)
## Basics Of Files And Directories
### Filesystem
#### Files and directories
`ls`
`mkdir short_reads`
`cd`
`pwd` where are we?
Go up a level using `..`. Explain `.`.
`pwd` again
`rm`: can't remember how to remove a directory? Use `-r` (Google "unix remove directory")
Switch into a populated directory (show off **tab completion**)
Compare `ls`, `ls -l`, `ls -lh`
Show that the Mac is a Unix machine, run commands in terminal, show on Desktop
Copy and move
`cp` a file
`mv`: explain how it both moves and renames.
Globbing
+ `*` for arbitrary matches
+ Example of copying a number of files using a glob, or remove a lot of files using glob
+ `mv *.fastq short_reads/` to move the FASTQ files into the `short_reads` directory
+ `tree`
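The file and directory steps above can be rehearsed in a scratch directory. This sketch uses made-up empty files (`sample1.fastq`, `sample2.fastq`, `notes.txt`) in place of the real workshop data, so it is safe to run anywhere.

```shell
# Scratch directory so the demo can't clobber real files
cd "$(mktemp -d)"
pwd                                  # where are we?

# Hypothetical files standing in for the workshop data
touch sample1.fastq sample2.fastq notes.txt
mkdir short_reads
ls

# echo shows what the glob expands to; mv sees the expanded list of names
echo *.fastq
mv *.fastq short_reads/
ls short_reads

# cp copies; mv both moves and renames
cp notes.txt notes_backup.txt
mv notes_backup.txt old_notes.txt

# -l adds permissions, owner, size and date; -h makes sizes human-readable
ls -l
ls -lh

# rm on its own refuses to delete a directory; -r removes it and everything in it
rm -r short_reads
```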
*Probably don't bother with this:* `scp` to copy between servers; it stands for "secure copy"
+ Back to `gsafcbig01`
+ `~/Classes/IntroUnix`
+ `N_vespilloides_trans.fa`
+ `scp` to `IntroUnix` directory on Stampede
Useful shortcuts
Tab autocompletion
`history`
Up/down arrows
`cd -` (return to the previous directory)
Basic EMACS commands for getting around lines (don't mention EMACS)
+ `Ctrl-a`, `Ctrl-e`, `Meta-f`, `Meta-b`, `Ctrl-k`
`Ctrl-C` (interrupt the running command)
Mention the shell
What's a shell?
Bash is actually a useful programming language
Bash has a print statement, it's called `echo`. Example: `echo walrus`
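To back up the claim that Bash is a real programming language, here is a small sketch with a variable, a loop, and arithmetic, using `echo` as the print statement. The names in the loop are arbitrary.

```shell
# A variable: no spaces around =, and $ to read it back
animal="walrus"
echo "$animal"

# A for loop over a list of words, printing each one
for name in walrus seal otter; do
    echo "Hello, $name"
done

# Integer arithmetic with $(( ))
count=$(( 3 + 4 ))
echo "count is $count"
```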
### Concept of a server, SSH ("secure shell")
SSH into Stampede
I logged into Stampede without a password because I have "ssh keys" set up.
ssh into, say, `gsafcbig01` from Stampede, where I need a password
`exit`, back to Stampede.
Switch into home directory
Explain `~`
Show `ls -a`, explain dot files. Point out `.profile_user` or `.bashrc`
`pwd`
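A quick way to show what `~` means and why dot files stay out of a plain `ls`: the sketch below builds a scratch directory with one ordinary file and one dot file. The file names (`visible.txt`, `.hidden_settings`) are hypothetical stand-ins for real dot files like `.bashrc`.

```shell
# ~ expands to your home directory; these two lines print the same path
echo ~
echo "$HOME"

# Scratch directory with one ordinary file and one dot file
cd "$(mktemp -d)"
touch visible.txt .hidden_settings

ls        # shows only visible.txt
ls -a     # also shows ., .., and .hidden_settings
```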
## How To Get Help
I, personally, don't have commands memorized
`man` pages, `man ls`
`-h`, `--help` (`python --help` as an example)
How to Google for answers, example: remove directory
## Manipulating Text Files
Viewing a text file
`cat`: concatenates files, but it's also good for printing the contents of a file.
`cat jabberwocky.txt`
`cat N_vespilloides_trans.fa`
`head`, `tail`
`cat` on that FASTA file is too long! And it's only a small part of a transcriptome!
`head N_vespilloides_trans.fa`
Line wrapping: to make what's going on a little clearer, run `head jabberwocky.txt`. Note that some lines are empty.
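To make the difference between `cat`, `head`, and `tail` concrete without a huge file, this sketch uses a made-up 20-line file of numbers in place of the FASTA file, so it is obvious which lines each command prints.

```shell
cd "$(mktemp -d)"

# Hypothetical 20-line file standing in for a long text or FASTA file
seq 1 20 > numbers.txt

head numbers.txt          # first 10 lines by default
head -n 3 numbers.txt     # first 3 lines: 1, 2, 3
tail -n 2 numbers.txt     # last 2 lines: 19, 20
cat numbers.txt           # the whole file, fine only for short files
```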
`less` (show on `mobydick.txt`)
to quit out of less, use `q`
`<space>` to scroll
`b` to scroll backwards
`shift-G` to jump to end
`g` to go to beginning
Google `linux less` for options
Show off `man`?
Google `linux less show line number`
`nano`
Go into Help
explain ^ for control, M for "meta"
mention Emacs and vi
What's a text file?
Unix is based around newline-delimited text files
But how are characters specified in binary?
ASCII: a table translating 7 bits (or 8 bits) of binary into characters; [www.asciitable.com](https://www.asciitable.com)
`head -3 jabberwocky.txt`
`od -t u1 jabberwocky.txt`
What's a "string" vs. an "integer" vs. a "float"?
Tabs, CRs, invisible characters
`\t`, `\n`, `\r`? Unix (computers, actually) treats tabs and newlines as ordinary characters; only how they're rendered on screen is special.
`od -c jabberwocky.txt`
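To see invisible characters in isolation, this sketch writes one short line containing a tab and a Windows-style `\r\n` line ending (Unix files end lines with `\n` alone), then dumps it with the same two `od` forms used above. The file name `line.txt` is arbitrary.

```shell
cd "$(mktemp -d)"

# printf turns \t (tab), \r (carriage return) and \n (newline) into single bytes
printf 'a\tb\r\n' > line.txt

od -c line.txt       # the character row reads: a  \t  b  \r  \n
od -t u1 line.txt    # the same bytes as ASCII codes: 97 9 98 13 10
wc -c < line.txt     # 5 bytes in total
```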
## Streams And Coreutils
### Unix Pipes And Coreutils
Coming from other OSes
+ Large applications that try to give you options to do anything you want for your work, inside the application.
Zen of Unix is the reverse
+ Not a few large, general-purpose programs
+ The Unix way is to make many tiny, specialized programs that you can combine in creative ways.
Coming from a GUI, this way of doing things may seem alien at first. But once it becomes natural, it's a very nice and powerful way of dealing with large data quickly and reproducibly.
### ET Vocalizing?
### Walrus Sounds Example
Find out the distinct walruses
`cut -f 1 walrus_sounds.tsv`
Let's save that long list to a file: `cut -f 1 walrus_sounds.tsv > walrus_sounds_cutf1.tsv`
The same walruses recur, but the names don't occur twice in a row. Explain how `uniq` works: it only collapses *adjacent* duplicate lines, which is why the input needs to be sorted first.
`sort walrus_sounds_cutf1.tsv > walrus_sounds_cutf1_sort.tsv`
`uniq walrus_sounds_cutf1_sort.tsv`
Woohoo. We got the distinct walruses, but this is a little clumsy/tedious, and programmers are lazy.
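To show why sorting matters before `uniq`, here is the step-by-step version run on a tiny made-up stand-in for `walrus_sounds.tsv` (name, tab, vocalization). The walrus names and sounds are invented, not taken from the real file.

```shell
cd "$(mktemp -d)"

# Made-up stand-in for walrus_sounds.tsv: name <TAB> vocalization
printf 'Wally\tgrunt\nGus\tbellow\nWally\tknock\nMorse\tgrunt\nGus\tgrunt\n' > walrus_sounds.tsv

# Column 1 holds the names; repeated names are never next to each other
cut -f 1 walrus_sounds.tsv > walrus_sounds_cutf1.tsv

# uniq only collapses adjacent duplicates, so all 5 names come out unchanged
uniq walrus_sounds_cutf1.tsv

# After sorting, identical names sit together and uniq prints Gus, Morse, Wally
sort walrus_sounds_cutf1.tsv > walrus_sounds_cutf1_sort.tsv
uniq walrus_sounds_cutf1_sort.tsv
```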
### Standard Streams & Pipes
Everything in UNIX is a file
`STDIN`, `STDOUT`, `STDERR`
Redirection: `>`, `>>` and maybe `<`
Stderr: `2>`, `2>&1`
Pipes
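A compact sketch of the redirections and a pipe, assuming only standard tools. Asking `ls` for a file that does not exist (the name `no_such_file` is hypothetical) is an easy way to produce output on STDERR so you can see where it ends up.

```shell
cd "$(mktemp -d)"

echo "first line"  >  out.txt    # > creates or overwrites out.txt
echo "second line" >> out.txt    # >> appends to out.txt
wc -l < out.txt                  # < feeds out.txt to wc on STDIN (2 lines)

# The error message goes to STDERR, so only 2> captures it
ls no_such_file > listing.txt 2> errors.txt || echo "ls failed, as expected"

# 2>&1 sends STDERR wherever STDOUT is going (order matters: > first, then 2>&1)
ls no_such_file > both.txt 2>&1 || echo "ls failed again; its message is in both.txt"

# A pipe connects one command's STDOUT to the next command's STDIN
echo "walrus" | tr 'a-z' 'A-Z'
```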
### Walrus Sounds, Revisited
Find the distinct walruses in a single line:
`cut -f 1 walrus_sounds.tsv | sort | uniq`
Distinct vocalizations
`cut -f 2 walrus_sounds.tsv | sort | uniq`
We can use an option in `uniq` to count how many times each vocalization occurs: `cut -f 2 walrus_sounds.tsv | sort | uniq -c`
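The pipeline versions can be tried on a tiny invented stand-in for `walrus_sounds.tsv` (name, tab, vocalization). The trailing `sort -nr` (numeric, reversed) is an optional extra here that puts the most frequent vocalization first.

```shell
cd "$(mktemp -d)"

# Made-up stand-in for walrus_sounds.tsv: name <TAB> vocalization
printf 'Wally\tgrunt\nGus\tbellow\nWally\tknock\nMorse\tgrunt\nGus\tgrunt\n' > walrus_sounds.tsv

# Distinct walruses in one line, no intermediate files
cut -f 1 walrus_sounds.tsv | sort | uniq

# How often each vocalization occurs; grunt (3) comes out on top
cut -f 2 walrus_sounds.tsv | sort | uniq -c | sort -nr
```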
### grep
`grep` stands for "globally search for a regular expression and print" (from the `ed` command `g/re/p`).
`grep not haiku.txt`
`grep 'is' haiku.txt` I prefer to almost always quote the pattern
`grep -w 'is' haiku.txt` For matching whole words
Where did that option come from? `grep --help` (or googling)
`grep -i 'is not' haiku.txt` For case insensitivity
`grep -c 'is' haiku.txt` to count matching lines
`grep -v 'is' haiku.txt` outputs all lines that don't match, surprisingly useful
`grep '^The' haiku.txt` outputs lines that start with "The"
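Because the effect of each `grep` option is easiest to see on a file whose lines you can predict, this sketch uses a made-up four-line `haiku_demo.txt` (not the workshop's `haiku.txt`) and notes which lines each command returns.

```shell
cd "$(mktemp -d)"

# Invented four-line test file
printf 'The rain is not here\nthis mist lingers\nIs not the sea wide\nThe moon sleeps\n' > haiku_demo.txt

grep 'is' haiku_demo.txt         # lines 1 and 2 ("is", "this", "mist")
grep -w 'is' haiku_demo.txt      # line 1 only: "is" as a whole word
grep -i 'is not' haiku_demo.txt  # lines 1 and 3, ignoring case
grep -c 'is' haiku_demo.txt      # 2: matching lines, not individual matches
grep -v 'is' haiku_demo.txt      # lines 3 and 4, which contain no "is"
grep '^The' haiku_demo.txt       # lines 1 and 4, which start with "The"
```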
### FASTA file (skip in favor of grep, if time is short)
Maybe do this after grep, since I use grep.
What's a FASTA file?
How big is the file? `ls -lh` gives us the file size. `wc` gives line, word, and byte counts. `wc -l` gives us the number of lines, but that isn't the number of sequences.
`grep -c '^>'` on the FASTA file gives us the number of header lines, one per sequence
brief description of regular expressions (the concept)
Count different HGNC orthologs: `grep '^>' emergency_files/N_vespilloides_trans.fa | cut -f 2 | sort | uniq -c | sort -nr`
Mention tools like `seqtk`?
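A miniature version of the FASTA counting steps, run on an invented three-record file. The tab-delimited headers (ID, tab, ortholog label) are an assumption that mirrors the `cut -f 2` step above; `GENE_A` and `GENE_B` are placeholder labels, not real HGNC symbols.

```shell
cd "$(mktemp -d)"

# Made-up FASTA: each record is a ">" header line followed by one or more sequence lines
printf '>seq1\tGENE_A\nACGTACGT\nACGT\n>seq2\tGENE_B\nTTGACA\n>seq3\tGENE_A\nGGCC\n' > demo.fa

wc -l demo.fa            # 7 lines...
grep -c '^>' demo.fa     # ...but only 3 sequences, one per header

# Count headers per ortholog label, most frequent first
grep '^>' demo.fa | cut -f 2 | sort | uniq -c | sort -nr
```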
History
See old commands with `history`
Re-run a previous command by its history number with `!<number>`
`history | tail`
`history | tail -n 50 > recent_commands.txt`
`history | grep something`
Random topics
**Compression**: `tar`, gzip (`.gz`); on many systems `less` can read a `.gz` file directly
`tar -zxvf`, `gunzip`, `unzip`
PATH, environment variables
`echo $PATH | tr : '\n'`
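To show how PATH decides which program runs, this sketch puts a tiny script in a temporary directory and prepends that directory to PATH for the current session only. The tool name `my_tool` is hypothetical; a similar `export PATH=...` line in `~/.profile` is how such a change is usually made permanent.

```shell
# PATH is a colon-separated list of directories the shell searches for commands
echo "$PATH" | tr ':' '\n'

# Which file actually runs when you type ls?
command -v ls

# A hypothetical personal tools directory containing one tiny script
tools_dir="$(mktemp -d)"
printf '#!/bin/sh\necho "hello from my_tool"\n' > "$tools_dir/my_tool"
chmod +x "$tools_dir/my_tool"

# Prepend it to PATH for this session; now my_tool runs without a full path
export PATH="$tools_dir:$PATH"
my_tool
command -v my_tool
```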
Package managers (`apt-get` in Ubuntu, etc.)
Run a command in the background with `&`.
`top`, PID, `kill -9`
`nohup`, `screen`
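A minimal sketch of the background-job and PID ideas, using `sleep` as a stand-in for a long-running analysis. It stops the job with a plain `kill` (SIGTERM) and keeps `kill -9` (SIGKILL) as the last resort; jobs started like this can still die when you log out, which is the problem `nohup` and `screen` address.

```shell
# The trailing & starts sleep in the background and returns the prompt immediately
sleep 300 &
pid=$!                             # $! holds the PID of the most recent background job
echo "sleep is running as PID $pid"

jobs                               # list this shell's background jobs

# Ask the process to stop (SIGTERM); use kill -9 only if it ignores this
kill "$pid"
wait "$pid" 2>/dev/null || true    # reap the job; wait returns non-zero because sleep was killed

# kill -0 sends no signal; it only checks whether the PID still exists
if kill -0 "$pid" 2>/dev/null; then
    echo "still running"
else
    echo "PID $pid is gone"
fi
```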