...
Info | |||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| |||||||||||||||||||||||||||||||||||||||||||||
In the above code box, you see that the names start with a period (.). When a filename starts with a . it conveys a special meaning to the operating system/command line. Specifically, it prevents that file from being displayed when you use the ls command. Let's look at a few different ways we will use the ls command:
Throughout the course you will notice that many options are supplied to commands via a single dash immediately followed by a single letter. Usually when you have multiple options supplied in this manner, you can combine all the letters after a single dash to make things easier/faster to type. Experiment a little to prove to yourself that the following 2 commands give the same output.
While knowing that you can combine options in this way helps you analyze data faster/better, the real value comes from being able to decipher commands you come across on help forums, or in publications. For ls specifically the following association table is worth making note of, but if you want the 'official' names consider using the
|
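The hidden-file behavior and the option-combining behavior described above can be checked with a small sketch like the one below. The directory and file names are made up for illustration, and the choice of -l and -a as the two options to combine is an assumption rather than the exact pair of commands from the original code box.

```shell
# Work in a throwaway directory so the example does not depend on your own files.
demo_dir=$(mktemp -d)
touch "$demo_dir/visible.txt" "$demo_dir/.hidden_file"

# Plain ls skips names that start with a dot.
plain_listing=$(ls "$demo_dir")
echo "plain ls shows: $plain_listing"

# -a (all) includes dot files; the special entries . and .. are listed too.
all_listing=$(ls -a "$demo_dir")

# Options given one at a time ...
separate=$(ls -l -a "$demo_dir")
# ... and the same options combined after a single dash.
combined=$(ls -la "$demo_dir")

if [ "$separate" = "$combined" ]; then
    echo "ls -l -a and ls -la give the same output"
fi

# A few common single-letter ls options and what the letters stand for:
#   -a  all (include names starting with .)
#   -l  long listing (permissions, owner, size, date)
#   -h  human-readable sizes (used together with -l)
#   -t  sort by modification time, newest first
#   -r  reverse the sort order

# Clean up the throwaway directory.
rm -rf "$demo_dir"
```

Reading a command such as ls -lhtr from a help forum then becomes a matter of splitting the letters apart and looking each one up.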
...
Warning |
---|
If you see anything besides " |
Setting up other shortcuts:
...
- Linux text editors installed at TACC (nano, vi, emacs). These run in your terminal window. vi and emacs are extremely powerful but also quite complex, so nano is the best choice as a first local text editor. It is powerful enough that you can still accomplish whatever you are working on, though more complex edits may be more difficult. If you are already familiar with one of the other programs you are welcome to continue using it. If this is something you plan to use long term, it is worth spending the time to learn to rely on something other than nano after this class.
- A former lab member suggested that VS Code may be the best current platform to combine much of this. While I trust his experience and suggestion, I don't have personal familiarity with it: https://code.visualstudio.com/docs/remote/ssh .
- Text editors or IDEs that run on your local computer but have an SFTP (secure FTP) interface that lets you connect to a remote computer (Notepad++ or Komodo Edit). Once you connect to the remote host, you can navigate its directory structure and edit files. When you open a file, its contents are brought over the network into the text editor's edit window, then saved back when you save the file.
- Software that will allow you to mount your home directory on TACC as if it were a normal disk e.g. MacFuse/MacFusion for Mac, or ExpanDrive for Windows or Mac ($$, but free trial). Then, you can use any text editor to open files and copy them to your computer with the usual drag-drop.
...
- The most important thing to get used to is the convention of using . or _ or capitalizing the first letter of each word in names rather than using spaces, and limiting your use of any other punctuation. Spaces are great for Mac and Windows folder names when you are using visual interfaces, but on the command line, a space is a signal to start doing something different. Imagine that instead of a BioITeam folder you wanted something a little easier to read and called it "Bio I Informatics Team". Certainly everyone would agree it's easier to read that way, but because of the spaces, bash will think you want to create 4 folders: one named Bio, another named I, a third named Informatics, and a fourth named Team. Now this is certainly behavior you can use to your advantage when appropriate, but generally speaking spaces will not be your friend. Early on in my computational learning I was told "A computer will always do exactly what you told it to do. The trick is correctly telling it to do what you want it to do".
- Name things something that makes it obvious to you what the contents are, not just today but next week, next month, and next year, even if you don't touch it for weeks-months-years.
- Prefixing file/folder names with international date format (YYYY-MM-DD) ensures that listing the contents prints them in chronological order. This can be useful when doing the same or similar analysis on new samples as new data is generated.
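The naming conventions in the list above can be tried out safely with a sketch like the following. It runs entirely inside temporary directories, and the dated folder names (such as 2023-06-01_variant_calls) are hypothetical examples, not names used elsewhere in the course.

```shell
# Throwaway directories so nothing here touches your real files.
spaces_dir=$(mktemp -d)
dated_dir=$(mktemp -d)

# Unquoted spaces: bash hands mkdir four separate arguments,
# so four directories are created instead of one.
(cd "$spaces_dir" && mkdir Bio I Informatics Team)
space_count=$(ls "$spaces_dir" | wc -l | tr -d ' ')
echo "Directories created from the name with spaces: $space_count"

# Capitalizing each word (or using . or _) keeps the name a single argument.
(cd "$spaces_dir" && mkdir BioInformaticsTeam)

# Date prefixes in YYYY-MM-DD form: created out of order on purpose.
mkdir "$dated_dir/2023-06-01_variant_calls"
mkdir "$dated_dir/2022-11-15_variant_calls"
mkdir "$dated_dir/2023-01-20_variant_calls"

# ls sorts names alphabetically, which for this date format is also chronological.
first_listed=$(ls "$dated_dir" | head -n 1)
last_listed=$(ls "$dated_dir" | tail -n 1)
echo "Oldest: $first_listed"
echo "Newest: $last_listed"

# Clean up.
rm -rf "$spaces_dir" "$dated_dir"
```

The date trick only works because the year comes first and months and days are zero-padded; a format like 6-1-2023 would not sort this way.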
...
Expand | |||||||
---|---|---|---|---|---|---|---|
| |||||||
To answer the question: yes, files/folders can have spaces. This is hidden away to keep you from accidentally thinking that this is a good idea. LET ME STRESS AGAIN: this is a horrible habit to get into and will lead to unforced errors. Instead, let's think about this from the perspective of encountering files or directories with spaces in their names that you are working with but didn't create, presumably because a colleague who didn't take this course sent you some data, and not because you thought it was a good idea personally. Spaces can be "escaped" like many other special characters. Imagine someone sent you a directory named "This is really annoying to use but I don't know it yet". To change into that directory you would have to type:
Notice that the apostrophe also had to be escaped, which should help show you not to use other punctuation.
|
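As a sketch of the escaping described above, the following creates a directory with that awkward name inside a temporary location and then changes into it two ways. The temporary location is an assumption made so the example is safe to run; the backslash before each space and before the apostrophe is standard bash escaping.

```shell
# Remember where we started and set up a throwaway location.
start_dir=$PWD
demo_dir=$(mktemp -d)
mkdir "$demo_dir/This is really annoying to use but I don't know it yet"

# Option 1: put a backslash in front of every space and the apostrophe.
cd "$demo_dir"
cd This\ is\ really\ annoying\ to\ use\ but\ I\ don\'t\ know\ it\ yet
escaped_path=$PWD

# Option 2: wrap the whole name in double quotes instead.
cd "$demo_dir"
cd "This is really annoying to use but I don't know it yet"
quoted_path=$PWD

# Both approaches land in the same directory.
if [ "$escaped_path" = "$quoted_path" ]; then
    echo "Both forms reach: ${escaped_path##*/}"
fi

# Go back and clean up.
cd "$start_dir"
rm -rf "$demo_dir"
```

Tab completion will usually insert the backslashes for you, which is another reason to let the shell do the typing when you are stuck with names like this.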
Understanding TACC
Now that we've been using stampede2 for a little bit, and have it behaving in a way that is a little more useful to us, let's get more of a functional understanding of what exactly it is and how it works.
...
When you log into stampede2 using ssh you are connected to what is known as the login node or "the head node". There are several different head nodes, but they are shared by everyone that is logged into stampede2 (not just in this class, or from campus, or even from Texas, but everywhere in the world). Anything you type onto the command line has to be executed by the head node. The longer something takes to complete, or the more commands you send at once, the slower the head node will work for you and everybody else. Get enough people running large jobs on the head node all at once (say a zoom class of summer school students) and stampede2 can actually crash, leaving nobody able to execute commands or even log in for minutes, hours, or perhaps even days if something goes really wrong. To try to avoid crashes, TACC monitors things and proactively stops processes before they get too out of hand. If you guess wrong about whether something is safe to run on the head node, you may eventually see a message like the one pasted below. If you do, it's not the end of the world, but repeated messages will lead to revoked TACC access and emails where you have to explain to TACC and your PI what you are doing, how you are going to fix it, and how you will avoid it in the future.
Code Block | ||
---|---|---|
| ||
Message from root@login1.ls4.tacc.utexas.edu on pts/127 at 09:16 ... Please do not run scripts or programs that require more than a few minutes of CPU time on the login nodes. Your current running process below has been killed and must be submitted to the queues, for usage policy see http://www.tacc.utexas.edu/user-services/usage-policies/ If you have any questions regarding this, please submit a consulting ticket. |
...
Note that this may not be an exhaustive list, as it requires the name of the program or its description to contain the word "alignment". Looking through the results you may notice some of the programs you already know and use for aligning 2 sequences to each other, such as blast. Try broadening your results a little by searching for "align" rather than "alignment" to see how important word choice is. When you compare the two sets of results you will see that one of the new results is:
Code Block |
---|
bowtiebsmap: bowtiebsmap/2.3.2 Memory-efficient short read (NGS) aligner |
...
92
BSMAP for Methylation |
If you are sure you know the name of the program you need, this list may be sufficient, but if you don't know exactly what you need, the limited information available is probably not enough to make a good decision. To learn more about a particular program, try the following 2 commands:
...
For help with the ssh command please refer back to Windows10 or MacOS tutorials. If you log out and back in 1 more time, what do you notice is different?
...
Info | ||
---|---|---|
| ||
In my own work, I recently remarked to my PI that "I wish I had started using this 5 years ago", and was reminded that "it didn't exist 5 years ago, at least in its current super usable and popular format". It is entirely possible that future classes will be taught with only minimal references to the TACC module system, and this year's course will feature far fewer than any previous year. |
...
If you have already activated your GVA-fastqc environment, the first line will not do anything, but if you have not, you will see your prompt has changed to now say (GVA-fastqc) on the far left of the line. As to the second command, like we saw with the module system above, things aren't quite this simple. In this particular case, we get a very helpful error message that can guide our next steps:
...
More information about "channels" can be found here. By the end of this course you may find that the 'bioconda' channel is full of lots of programs you want to use, and may choose to permanently add it to your list of channels so the above command conda install fastqc
and others used in this course would work without having to go through the intermediate step of searching for the specific installation commands, or finding what channel the program you want is in. Information about how to do this, as well as more detailed information about why it is bad practice to go around adding large numbers of channels, can be found here. Similarly, when we get to the read mapping tutorial, we will go over the conda-forge channel, which is also very helpful to have.
Tip | ||
---|---|---|
| ||
channels: |
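For reference, a minimal sketch of permanently adding a channel from the command line is shown below. It assumes conda is already on your PATH and that bioconda is the only channel you want to add; the variable name channel_to_add is made up for this example. Note that this edits your personal conda configuration, so only run it once you have decided you want the channel long term.

```shell
# The channel we want conda to search by default (bioconda, as discussed above).
channel_to_add="bioconda"

if command -v conda >/dev/null 2>&1; then
    # Record the channel in your user-level conda configuration.
    conda config --add channels "$channel_to_add"
    # Print the channel list so you can confirm the change.
    conda config --show channels
else
    # Nothing is changed if conda is not available in this shell.
    echo "conda was not found on PATH; no channels were changed"
fi
```

Keeping the list of channels short is deliberate: every extra channel is another place conda has to search, and another source of conflicting package versions.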
For now, use the error message you saw above to try to install the fastqc program yourself.
Expand | ||
---|---|---|
| ||
https://anaconda.org/bioconda/fastqc If you were unable to find this page, the most likely error is that after you entered fastqc into the search box and recognized that the result with 650,000+ downloads was likely the program you wanted, you clicked the first bit of hyperlink you found, which took you to the bioconda page instead of to the fastqc program page. Personally, I think the entire box should be clickable to send you to the program page, but nobody has asked me. |
...
If all goes well, the installation command should give you output similar to the following, with you answering "y" when prompted if you actually want to install the packages:
...
Genome Variant Analysis Course 2023 home.