Getting Started

Overview

About Anna

Anna is a member of the Center for Biomedical Research Support (CBRS), which is home to a number of core facilities: the Bioinformatics Consulting Group (BCG), Genome Sequencing and Analysis Facility (GSAF), Proteomics/Mass Spec, Microscopy, Mouse Genetic Engineering, and others.

  • Anna Battenhouse, Associate Research Scientist, abattenhouse@utexas.edu
    • BA English literature, 1978, Carleton College
    • Commercial software development 1982 – 2007
      • Texas Instruments, Motorola ...
      • lots of software development experience but limited Unix/Linux
    • Joined Vishwanath Iyer Lab 2007 (functional genomics)
      • “retirement career”
      • began to appreciate Linux & bash (slowly)
    • BS Biochemistry, 2013, UT
    • Current affiliations:
      • Manager, Biomedical Research Computing Facility (BRCF)
      • member, Bioinformatics Consulting Group (BCG)
      • member, Marcotte lab (systems biology/proteomics)

The Biomedical Research Computing Facility (BRCF) and the Bioinformatics Consulting Group (BCG) are CBRS core facilities that support local research computing.

Note that Anna is not a Unix guru – there's a world of things she doesn't know! But she knows enough to be considered expert-ish :)

About you

Who has had command-line experience before? (E.g. Linux, Unix/Mac Unix, DOS)

What programming language experience do you have? (e.g. Python, R, Java)

About Unix/Linux

Here are two (similar) documents describing the history of Unix/Linux.

Unix has been around a long time (~1969), since a time when many computers had no screens or hard drives – magnetic tape and paper tape were used instead. It has been written in the C programming language since the early 1970s; C was considered a high-level language at the time (compared to assembly language), but is now considered a low-level language :)

The original, fundamental "Zen" of Unix is that everything is a file – devices, printers, terminals, and of course actual files.
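As a small illustration of the "everything is a file" idea, the sketch below treats the standard device /dev/null exactly like a regular file, using ordinary redirection to write to it and read from it. The variable name null_bytes is just an illustrative choice.

```shell
# Device files live under /dev; in the ls output the leading "c" marks a character device
ls -l /dev/null

# Writing to a device uses the same redirection syntax as writing to a regular file;
# /dev/null simply discards whatever it receives
echo "this text goes nowhere" > /dev/null

# Reading from /dev/null yields an empty stream, so wc counts zero bytes
null_bytes=$(wc -c < /dev/null)
echo "bytes read from /dev/null: $null_bytes"
```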

The GNU project ("GNU's Not Unix"), started by Richard Stallman in 1983, provided many open-source tools and utilities for Unix. The popularity of Unix increased dramatically with this development. At that time, some Unix flavors were proprietary; others were open or partially open.

Linux, a "Unix-like" system, was developed in 1991 by Linus Torvalds using GNU utilities. Linux is entirely open-source, so it has become the "Unix of choice" for scientific computing. There are, however, commercial distributions of Linux (e.g. Red Hat) that have restrictions on what open-source tools can be included/used.

There are many flavors/distributions of Unix, and many flavors/distros of Linux. TACC (the Texas Advanced Computing Center) uses CentOS, an open-source version of Red Hat. Our servers run Ubuntu, from the Debian family of distributions. Distributions may differ in a number of system management processes, but generally offer a similar set of utilities.
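If you are curious which distribution a given server runs, the following sketch shows one common way to check. It assumes the system provides /etc/os-release (most modern distributions do) and falls back to a placeholder otherwise; the variable name distro is only illustrative.

```shell
# /etc/os-release is a plain-text file of KEY=value lines describing the distribution
if [ -r /etc/os-release ]; then
    # Read it in a subshell so its variables do not leak into the current shell
    distro=$( . /etc/os-release && echo "${PRETTY_NAME:-$NAME}" )
fi
# Fall back to a placeholder if the file is missing or has no name entry
distro=${distro:-unknown}
echo "Distribution: $distro"

# The kernel name and release are available on any Unix-like system
uname -sr
```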

Why use command-line Linux

The other fundamental philosophy of Unix/Linux is to provide many small tools and utilities that can easily interact with each other using their built-in Input and Output streams, which we'll learn more about soon.

  • The upside of this is that it affords tremendous flexibility, letting you perform complex data manipulations on the command line without writing a formal program or script.
  • The downside is that there are a lot of tools to learn, each with many options/switches.
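To give a flavor of how small tools combine, here is a minimal sketch that counts repeated words using only standard utilities. The function name count_words and the sample words are made up for illustration; each "|" connects the output stream of one tool to the input stream of the next.

```shell
# Hypothetical helper: read one word per line and report how often each appears
count_words() {
    sort |       # group identical lines together
    uniq -c |    # collapse each group into a count followed by the word
    sort -rn     # order numerically by the count, largest first
}

# printf writes one word per line to its output stream, which feeds the helper
printf '%s\n' apple banana apple cherry banana apple | count_words
```

No program or script file was needed here: three general-purpose tools, each doing one small job, were chained together on the command line.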

Why not use a program with a GUI (Graphical User Interface) instead?

  • Using the command line lets you be very precise and flexible at the same time
    • After the initial learning curve (which is non-trivial!!) the command line can be easier – and faster!
    • And you can easily test your command line "mini-scripts" as you develop them
  • You can easily work on remote computers from your laptop, as we'll do in this class

Challenges involved:

  • Lots of tools to learn, each with many options
  • Lots of odd syntax to learn – one misplaced character can lead to errors or other problems
  • Although there's a ton of information about Linux on the Internet, it's hard to get started with the basics

This class aims to get you started addressing these challenges.

Setup to follow hands-on

This class is designed to be hands-on, to provide you with the enjoyment :) of working in the Linux command line environment.

However, all steps and scripts are detailed on this Wiki, and you will see me exercising processes on the command line interactively. So you may decide (at any time) to just watch and listen.

Note that you will have access to this Wiki even after the class, and I will email you a link to a video recording of the course.

If you choose to follow hands-on, you'll be using the BRCF "GSAF pod", a set of 3 compute servers attached to a large, shared storage server.

Accounts and servers

We have set up 99 "student" accounts, named student01, student02 ... student99. We'll assign one to you. The password for these accounts is given in the chat.

These credentials are active for the next few weeks, but will be de-activated on Sunday May 12, 2024, in the evening.

With your studentNN account you can ssh into one of the following servers:

  • gsafcomp01.ccbb.utexas.edu – odd number studentNN accounts
  • gsafcomp02.ccbb.utexas.edu – even number studentNN accounts
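If you are unsure which server goes with your account, the odd/even rule above can be expressed as a short shell function. This is only an illustrative sketch; the function name server_for_account is hypothetical and is not something installed on the servers. It looks at the last digit of the account name rather than doing arithmetic, which avoids trouble with leading zeros in names like student08.

```shell
# Hypothetical helper: print the server name for a given studentNN account
server_for_account() {
    case $1 in
        student*[13579]) echo "gsafcomp01.ccbb.utexas.edu" ;;  # odd account numbers
        student*[02468]) echo "gsafcomp02.ccbb.utexas.edu" ;;  # even account numbers
        *) echo "not a studentNN account: $1" >&2; return 1 ;;
    esac
}

server_for_account student07
server_for_account student08
```

The first call prints gsafcomp01.ccbb.utexas.edu and the second prints gsafcomp02.ccbb.utexas.edu.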


UT VPN

If you are not on the UT campus network, you'll need to have the UT VPN service active in order to connect to these servers. See How to Connect to the UT VPN.

Logging in

You can access the servers using ssh in a Terminal program that runs on your computer. On Macs, this program is called Terminal. On Windows (Windows 10 or later) it is called Command Prompt or PowerShell. Find and open this program now on your computer.

 Find ssh in Windows 10+

Windows versions 10 and later have the ssh program built in to their Command Prompt or PowerShell Terminals.

To open it, go to the Windows Start menu → Search for Command Prompt, then press Enter to open it.

ssh is an executable program that runs on your local computer and allows you to connect securely to a remote computer. We're going to use ssh to access one of the above compute servers on the GSAF pod:

# From the UT campus network, or if you have the UT VPN active:
ssh student02@gsafcomp02.ccbb.utexas.edu
  • Answer yes to the SSH security question prompt
    • this will only be asked the first time you log in
  • Enter the class password at the password prompt, then press Enter.
    • for security reasons, the text that you enter will not be displayed

Once you've successfully logged in, you can log out by typing exit then pressing Enter. You'll then be back in your local computer's Terminal environment.
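Because the remote and local command lines can look alike, it helps to confirm where you are before typing exit. A minimal sketch using two standard commands:

```shell
# Show which account this session belongs to (e.g. your studentNN account)
echo "logged in as: $(whoami)"

# Show the name of the computer this command line is running on
echo "on host:      $(hostname)"
```

Run on the server, the host line should name one of the gsafcomp machines; run after logging out, it should name your own computer.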

If your Windows version does not support ssh, you can download PuTTY from: https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html

 Logging in with PuTTY

If you're using PuTTY as your Terminal from Windows:

  • Double-click the PuTTY icon
  • In the PuTTY Configuration window
    • make sure the Connection type is SSH
    • enter gsafcomp01.ccbb.utexas.edu for Host Name (or gsafcomp02.ccbb.utexas.edu)
      • Optional: to save this configuration for further use:
        • Enter gsafcomp into the Saved Sessions text box, then click Save
        • Next time select gsafcomp from the Saved Sessions list and click Load.
    • click the Open button
    • answer Yes to the SSH security question
  • In the PuTTY Terminal
    • enter your student account name after the "login as:" prompt, then press Enter
    • enter the password associated with our student accounts
      • for security reasons, the text that you enter will not be displayed

If you're attending remotely and do not have access to the UT VPN, you can use the Terminal functionality in the RStudio web application.

 Using the RStudio Terminal

To access the Terminal built into RStudio Server.

You should now see a command line in the RStudio type-in area.


If your Terminal has a dark background, the default shell colors can be hard to read. Execute this line to display directory names in yellow.

export LS_COLORS=$LS_COLORS:'di=1;33:fi=01:ln=01;36'

In the RStudio Terminal, yellow is the default color for directories, which can be difficult to see against its white background. Execute this line to display directory names in blue.

export LS_COLORS=$LS_COLORS:'di=1;34:fi=01:ln=01;36:'

We'll see later how to set this environment variable in your login script (~/.profile) so that it gets executed every time you log in to this server.

For now, just copy the appropriate line above, paste it into your Terminal window (after logging on), then press Enter.
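The numbers in LS_COLORS are standard ANSI terminal codes: 1 (or 01) means bold, 33 is yellow, 34 is blue and 36 is cyan, while di, fi and ln stand for directories, regular files and symbolic links. To preview how a code will look in your Terminal before choosing a line, a small sketch like this can help; the function name show_color is made up for illustration.

```shell
# Hypothetical helper: print a label using an "attribute;color" code like those in LS_COLORS.
# \033[ starts an ANSI escape sequence and \033[0m resets the text style afterwards.
show_color() {
    printf '\033[%sm%s\033[0m\n' "$1" "$2"
}

show_color '1;33' 'bold yellow, as in di=1;33'
show_color '1;34' 'bold blue, as in di=1;34'
show_color '01;36' 'bold cyan, as in ln=01;36'
```

Whichever sample line is easiest to read against your background suggests which of the two export lines to use.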