Welcome to the class! This is the home page for the Introduction to Biocomputing class. From this page, you can link to all of the content needed to work through the course.

...

Info

Logistics:

We will meet at FNT 1.104 from Tuesday May 27 to Friday May 30, 1:30-4:30pm.

...

In-person attendance is recommended because the course is designed to be interactive and following code remotely is sometimes difficult. However, if you would like to attend virtually, use this Zoom link:

https://utexas.zoom.us/j/96012546979?pwd=Xx4Je9KHZVKSFJFhAQwuwWGihL1kyS.1

Zoom Instructions:

Please make sure your Zoom version is up to date and that you have a Zoom account. This is required to join a UT-sponsored Zoom session. See this link for more details about Zoom requirements: https://zoom.its.utexas.edu/home

Other setup:

Please make sure you have an SSH client installed on your computer. All Macs come with Terminal, so no installation is required. For Windows laptops, install PuTTY and WinSCP.

Course Overview

This course is designed to give you an introduction to methods for data manipulation and analysis in a hands-on manner. It will consist of lectures and guided tutorials.

...

The course has the following objectives: 

  1. To teach you about the Unix (Bash) environment and its tools in order to allow you to work effectively at the command line.

  2. To familiarize you with R and RStudio and how to perform basic data analysis tasks in R.

  3. To use the above skills in an integrated manner to take data from raw files and produce clean visualizations in R.

Your Instructor

Name

Affiliation

Expertise

How to contact?

Matt Bramble

CBRS, bioinformatics group

...

R, Python

...

...

Day 1, 2: Introduction to Unix

Login to the POD

...

206D

Setup

Info

Logging in to the POD

If you are using a terminal (Mac/Linux), open the terminal and type:

`ssh username@gsafcbig01.ccbb.utexas.edu`

(replace username with your student username, studentNN)

If you are using Windows PowerShell, use the same command as above (for Mac/Linux).

To set up Powershell: powershell

If you are using PuTTY (Windows PC), open it and enter the following:

...

host name: gsafcbig01.ccbb.utexas.edu

When prompted, enter your username and password.

NEXT:

Open the following link in a new window or tab. Be sure to bookmark this introduction page. We will be returning to it.

Go to Unix/Linux wiki

Putty Setup


If your Windows version does not support ssh, you can download PuTTY from:

https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html

Logging in with PuTTY

If you're using PuTTY as your Terminal from Windows:

  • Double-click the PuTTY icon

  • In the PuTTY Configuration window

    • make sure the Connection type is SSH

    • enter gsafcbig01.ccbb.utexas.edu for Host Name

      • Optional: to save this configuration for further use:

        • Enter gsafcomp into the Saved Sessions text box, then click Save

        • Next time, select gsafcomp from the Saved Sessions list, and click Load.

    • click the Open button

    • answer Yes to the SSH security question

  • In the PuTTY Terminal

    • enter your student account name after the "login as:" prompt, then press Enter

    • enter the password associated with our student accounts

      • for security reasons, the text that you enter will not be displayed

If your Terminal has a dark background, the default shell colors can be hard to read. Execute this line to display directory names in yellow.

Code Block
export LS_COLORS=$LS_COLORS:'di=01;33:'

We'll see later how to set this environment variable in your login script (~/.profile) so that it gets executed every time you log in to this server.

For now, just copy the appropriate line above, paste it into your Terminal window (after logging on), then press Enter.

...

Info

Terminal via RStudio web

If you're attending remotely or do not have access to the UT VPN, you can use the Terminal functionality in the RStudio web application.

To access the Terminal built into RStudio Server:

  • Click on this link: 

  • Enter your studentNN account name and our password, then click the Sign In button

  • In the RStudio web application, select the Terminal tab above the text entry window on the left.

You should now see a command line in the RStudio window.

In the RStudio Terminal, if the default color for directories is difficult to see against its white background, execute this line to display directory names in blue.

Code Block
export LS_COLORS=$LS_COLORS:'di=1;34:fi=01:ln=01;36:'

We'll see later how to set this environment variable in your login script (~/.profile) so that it gets executed every time you log in to this server.

For now, just copy the appropriate line above, paste it into your Terminal window (after logging on), then press Enter.


# UNIX Workshop

Learn the basics of using UNIX from the command line. Introductory topics include the filesystem, the shell, permissions, and text files. The course will touch on manipulating text files using standard UNIX utilities, how to string utilities together, and how to output the results to files. The goal of the course is to develop some basic comfort at the command line, get a sense of what's possible, and learn how to find help.


## Experience

### Who's Used A Command Line Before?

  • DOS?

  • Linux?

### Who has any programming experience?

## What I Hope This Class Accomplishes

## What's The Command Line?

### You Literally Enter Commands Sequentially At A Prompt

### What's Wrong With Using GUIs? Why Don't People Write Their Code So It Uses A Nice Interface?

  • Programming involves writing text files anyway

  • Command line lets you be very precise and flexible at the same time

  • We'll see a little later how the Unix philosophy of lots of small tools gives you flexibility

  • After the initial learning curve (and it's nontrivial), the command line can actually be easier/faster

  • You can work on remote machines from your laptop easily. We'll do that in this class.

## Get Example Files

The pieces will be explained later; keep this as a reference.

  • command

  • options

  • combined options

*Should look like weird incantations at first, will be explained later.*

### Get/Untar Files

  • `cp ~benni/Classes/IntroUnix/IntroUnixFiles.tar.gz .`

  • `tar -zxvf IntroUnixFiles.tar.gz &`

  • `ls`

  • `rm IntroUnixFiles.tar.gz`

*WTF just happened? Hopefully it will be clear later.*

(Maybe Explain Tar For Grouping Files, gz As Compression?)
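As a rough sketch of what the `tar` step does: `tar` groups files into one archive and `z` adds gzip compression on top. The directory and file names below (`demo`, `a.txt`, `b.txt`) are made up for illustration and are not part of the class files.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Make a small directory to archive (hypothetical example files).
mkdir demo
echo "hello" > demo/a.txt
echo "world" > demo/b.txt

# c = create, z = compress with gzip, f = name of the archive file
tar -czf demo.tar.gz demo
rm -r demo

# x = extract, z = uncompress with gzip, v = verbose (list files), f = archive file
tar -zxvf demo.tar.gz

extracted=$(cat demo/a.txt)

# The archive is no longer needed once its contents are unpacked.
rm demo.tar.gz
```

Single-letter options can be combined after one dash, which is why `-zxvf` means the same thing as `-z -x -v -f`.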

## Why Unix? What's Linux?

  • History of Unix slideshow

## Getting Around

  • What is the command line?

  • Using Terminal on a Mac (or PowerShell/PuTTY on Windows)

  • Show how to turn on Meta/option key in Preferences -> Settings -> Keyboard

  • Explain meta key, and Ctrl-x, C-x, ^x

## Basics Of Files And Directories

### Filesystem

#### Files and directories

  • `ls`

  • `mkdir short_reads`

  • `cd`

  • `pwd` where are we?

  • Go up a level using `..`. Explain `.`.

  • `pwd` again

  • `rm` can't remember how to remove directory, use `-r` (Google "unix remove directory")

  • Switch into a populated directory (show off **tab completion**)

  • Compare `ls`, `ls -l`, `ls -lh`

  • Show that the Mac is a Unix machine, run commands in terminal, show on Desktop
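The commands above can be strung together roughly as follows. This is a minimal sketch that runs in a temporary directory (an assumption made here so nothing in your home directory is changed).

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

mkdir short_reads     # create a directory
cd short_reads        # move into it
inside=$(pwd)         # pwd prints where we are now

cd ..                 # .. is the parent directory (. is the current one)
outside=$(pwd)

ls -l                 # long listing shows the new directory
rm -r short_reads     # -r is needed to remove a directory and its contents
ls -a                 # only . and .. are left
```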

Copy and move

  • `cp` file

  • `mv` Explain how it is both move and rename.

  • Globbing

+ `*` for arbitrary matches

+ Example of copying a number of files using a glob, or remove a lot of files using glob

+ `mv *.fastq` files to `short_reads` directory

+ `tree`

  • *Probably don't bother with this* `scp` to copy between servers, stands for "secure `cp`" ("secure copy")

+ Back to `gsafcbig01`

+ `~/Classes/IntroUnix`

+ `N_vespilloides_trans.fa`

+ `scp` to `IntroUnix` directory on Stampede
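A small sketch of globbing with `mv`, using empty placeholder files. The file names (`sample1.fastq`, `sample2.fastq`, `notes.txt`) are invented for this example.

```shell
# Work in a throwaway directory with a few empty placeholder files.
workdir=$(mktemp -d)
cd "$workdir"
touch sample1.fastq sample2.fastq notes.txt
mkdir short_reads

# The shell expands *.fastq into the matching file names
# before the command ever runs; echo shows the expansion.
echo *.fastq

# Move every matching file into the directory in one command.
mv *.fastq short_reads/

moved=$(ls short_reads | wc -l | tr -d ' ')
```

Running `echo` with a glob first is a cheap way to check what will match before using the same glob with `rm`.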

Useful shortcuts

  • Tab autocompletion

  • `history`

  • Up/down arrows

  • `cd -`

  • Basic EMACS commands for getting around lines (don't mention EMACS)

+ `Ctrl-a`, `Ctrl-e`, `Meta-f`, `Meta-b`, `Ctrl-k`

  • `Ctrl-C`

Mention the shell

  • What's a shell?

  • Bash is actually a useful programming language

  • Bash has a print statement, it's called `echo`. Example: `echo walrus`

### Concept of a server, SSH ("secure shell")

  • SSH into Stampede

  • I logged into Stampede without a password because I have "ssh keys" set up.

  • ssh into, say, `gsafcbig01` from Stampede, where I need a password

  • `exit`, back to Stampede.

  • Switch into home directory

  • Explain `~`

  • Show `ls -a`, explain dot files. Point out `.profile_user` or `.bashrc`

  • `pwd`

## How To Get Help

  • I, personally, don't have commands memorized

  • `man` pages, `man ls`

  • `-h`, `--help` (`python --help` as an example)

  • How to Google for answers, example: remove directory

## Manipulating Text Files

  • Viewing a text file

  • `cat` - concatenate files, but it's also good for printing contents of file.

  • `cat jabberwocky.txt`

  • `cat N_vespilloides_trans.fa`

  • `head`, `tail`

  • `cat` on that FASTA file is too long! And it's only a small part of a transcriptome!

  • `head N_vespilloides_trans.fa`

  • Line wrapping. To make what's going on a little clearer: `head jabberwocky.txt` Note some lines are empty.

  • `less` (show on `mobydick.txt`)

  • to quit out of less, use `q`

  • `<space>` to scroll

  • `b` to scroll backwards

  • `shift-G` to jump to end

  • `g` to go to beginning

  • Google `linux less` for options

  • Show off `man`?

  • Google `linux less show line number`

  • `nano`

  • Go into Help

  • explain ^ for control, M for "meta"

  • mention Emacs and vi

  • What's a text file?

  • Unix is based around newline-delimited text files

  • But how are characters specified in binary?

  • `head -3 jabberwocky.txt`

  • `od -t u1 jabberwocky.txt`

  • What's a "string" vs an "integer" vs a "float"

  • Tabs, CRs, invisible characters

  • `\t`, `\n`, `\r`? Unix (computers, actually) treats tabs and newlines as characters, how they're rendered on-screen is special.

  • `od -c jabberwocky.txt`
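To make the "characters are just numbers" point concrete, here is a sketch using a four-byte sample file written with `printf` (the sample content is an assumption, not one of the class files).

```shell
# Write the four bytes: a, TAB, b, NEWLINE.
sample=$(mktemp)
printf 'a\tb\n' > "$sample"

od -c "$sample"       # shows each byte as a character, with \t and \n spelled out
od -t u1 "$sample"    # shows each byte as an unsigned decimal number

# Count the bytes; the tab and the newline each take one byte.
nbytes=$(wc -c < "$sample" | tr -d ' ')

# -An drops the offset column; xargs collapses the whitespace.
codes=$(od -An -t u1 "$sample" | xargs)
echo "$nbytes bytes: $codes"
```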

## Streams And Coreutils

### Unix Pipes And Coreutils

  • Coming from other OSes

+ large applications that try to give options to do anything you want for your work, inside the application.

  • Zen of Unix is the reverse

+ Not a few large, general programs

+ The Unix way is to make many tiny, specialized programs that you can combine in creative ways.

Coming from a GUI, this way of doing things may seem alien at first. But once it becomes natural, it's a very nice and powerful way of dealing with large data quickly and reproducibly.

### ET Vocalizing?

### Walrus Sounds Example

Find out the distinct walruses

  • `cut -f 1 walrus_sounds.tsv`

  • Let's save that long list to a file: `cut -f 1 walrus_sounds.tsv > walrus_sounds_cutf1.tsv`

  • The same walruses recur, but the names don't occur twice in a row. Explain how `uniq` works, why things need to be sorted.

  • `sort walrus_sounds_cutf1.tsv > walrus_sounds_cutf1_sort.tsv`

  • `uniq walrus_sounds_cutf1_sort.tsv`

Woohoo. We got the distinct walruses, but this is a little clumsy/tedious, and programmers are lazy.

### Standard Streams & Pipes

  • Everything in UNIX is a file

  • `STDIN`, `STDOUT`, `STDERR`

  • Redirection: `>`, `>>` and maybe `<`

  • Stderr `2>`, `2>&1`

  • Pipes
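The redirection operators above might be tried out like this. It is a sketch in a temporary directory; `no_such_file` is a deliberately missing file used to provoke an error message.

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "first"  >  out.txt    # > creates the file, or overwrites it
echo "second" >> out.txt    # >> appends to the end

# < feeds a file to a command's standard input.
lines=$(wc -l < out.txt | tr -d ' ')

# ls fails here; 2> sends its error message (stderr) to a file.
ls no_such_file 2> err.txt || true

# 2>&1 sends stderr to wherever stdout is going, here both.txt.
ls no_such_file > both.txt 2>&1 || true

# A pipe connects one command's stdout to the next command's stdin.
upper=$(cat out.txt | tr 'a-z' 'A-Z' | head -n 1)
```

Note the order in `> both.txt 2>&1`: stdout is redirected first, then stderr is pointed at the same place.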

### Walrus Sounds, Revisited

Find the distinct walruses in a single line:

`cut -f 1 walrus_sounds.tsv | sort | uniq`

Distinct vocalizations

  • `cut -f 2 walrus_sounds.tsv | sort | uniq`

  • We can use an option in `uniq` to count how many times each vocalization occurs: `cut -f 2 walrus_sounds.tsv | sort | uniq -c`
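Here is a self-contained sketch of the same pipeline on a tiny stand-in table. The walrus names and sounds below are invented; the real `walrus_sounds.tsv` is assumed to be tab-separated with the walrus in column 1 and the vocalization in column 2.

```shell
# Tiny stand-in for walrus_sounds.tsv (made-up rows, tab-separated).
data=$(mktemp)
printf 'Wally\tbellow\nTusk\tgrunt\nWally\tgrunt\nTusk\tgrunt\n' > "$data"

# uniq only collapses adjacent duplicates, so without sort nothing is removed.
unsorted=$(cut -f 1 "$data" | uniq | wc -l | tr -d ' ')

# Sorting first puts identical names next to each other.
distinct=$(cut -f 1 "$data" | sort | uniq)

# uniq -c prefixes each line with its count; sort -nr puts the biggest first.
cut -f 2 "$data" | sort | uniq -c | sort -nr
top=$(cut -f 2 "$data" | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}')
```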

### grep

`grep` takes its name from the `ed` editor command `g/re/p`: globally search for a regular expression and print the matching lines.

  • `grep not haiku.txt`

  • `grep 'is' haiku.txt` I prefer to almost always use quotes

  • `grep -w 'is' haiku.txt` For matching whole words

  • Where did that option come from? `grep --help` (or googling)

  • `grep -i 'is not' haiku.txt` For case insensitivity

  • `grep -c 'is' haiku.txt` to count matching lines

  • `grep -v 'is' haiku.txt` outputs all lines that don't match, surprisingly useful

  • `grep '^The' haiku.txt` outputs lines that start with "The"
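The options above can be compared side by side on a small sample. The four lines of text below are made up for this sketch and are not the contents of `haiku.txt`.

```shell
# Four made-up lines to search.
sample=$(mktemp)
printf 'The sky is blue\nThis one matches too\nNothing here\nthe end IS near\n' > "$sample"

plain=$(grep -c 'is' "$sample")        # lines containing "is" anywhere, even inside "This"
words=$(grep -c -w 'is' "$sample")     # lines where "is" is a whole word
nocase=$(grep -c -i 'is' "$sample")    # ignore case, so "IS" also counts
inverse=$(grep -c -v 'is' "$sample")   # lines that do NOT contain "is"
starts=$(grep -c '^The' "$sample")     # ^ anchors the match to the start of the line

echo "plain=$plain words=$words nocase=$nocase inverse=$inverse starts=$starts"
```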

### FASTA file (skip in favor of grep, if time is short)

Maybe do this after grep, since I use grep.

  • What's a FASTA file?

  • How big is the file? `ls -lh` gives us filesize. `wc` gives us line, word, and byte counts. `wc -l` gives us number of lines. But that isn't the number of sequences.

  • `grep -c '^>'` gives us number of headers

  • brief description of regular expressions (the concept)

  • Count different HGNC orthologs: `grep '^>' emergency_files/N_vespilloides_trans.fa | cut -f 2 | sort | uniq -c | sort -nr`

  • Mention tools like `seqtk`?
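A sketch of why line count and sequence count differ, using a tiny invented FASTA file. The header layout (ID, then a tab, then a gene name) is an assumption chosen so that `cut -f 2` works as in the command above; real files may be laid out differently.

```shell
# Three made-up records; the first sequence is wrapped over two lines.
fasta=$(mktemp)
printf '>seq1\tGENE_A\nACGT\nACGT\n>seq2\tGENE_B\nGGCC\n>seq3\tGENE_A\nTTAA\n' > "$fasta"

# Total lines includes headers and wrapped sequence lines.
nlines=$(wc -l < "$fasta" | tr -d ' ')

# Each record has exactly one header line starting with ">".
nseqs=$(grep -c '^>' "$fasta")

# Tally the second tab-separated header field, most frequent first.
grep '^>' "$fasta" | cut -f 2 | sort | uniq -c | sort -nr
top=$(grep '^>' "$fasta" | cut -f 2 | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}')
```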

History

  • See old commands with `history`

  • Execute previous command with `!<number>`

  • `history | tail`

  • `history | tail -n 50 > recent_commands.txt`

  • `history | grep something`

Random topics

  • **Compression** tar, gz, less can read into a gz file

  • `tar -zxvf`, `gunzip`, `unzip`

  • PATH, environment variables

  • `echo $PATH | tr : '\n'`

  • Package managers (`apt-get` in Ubuntu, etc)

  • Run command in the background.

  • `top`, PID, `kill -9`
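Two of the topics above, `PATH` and background jobs, can be sketched briefly. `sleep 30` is just a stand-in for a long-running command.

```shell
# PATH is a colon-separated list of directories; print one per line.
echo "$PATH" | tr ':' '\n'
nentries=$(echo "$PATH" | tr ':' '\n' | wc -l | tr -d ' ')

# A trailing & runs the command in the background.
sleep 30 &
pid=$!                 # $! holds the PID of the most recent background job

# Plain kill asks the process to stop; kill -9 is the forceful last resort.
kill "$pid"
wait "$pid" 2>/dev/null || true

# kill -0 sends no signal; it only checks whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then still_running=yes; else still_running=no; fi
```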

...

Introduction to Unix

Open the following link in a new window or tab. Be sure to bookmark this introduction page. We will be returning here.

Go to Unix/Linux wiki

R and RStudio Part 1

Go to R/Rstudio Part I