...
- In several tutorials (MultiQC, fastp, Advanced Breseq) I gave example command-line for loops that can be used to generate commands files for large numbers of samples.
- Often I use a Python script to generate files like this, as I am more fluent in Python coding than I am in bash/command-line scripting.
- Sometimes I generate some commands in Excel.
- An example might be a formula of
'="breseq -j 6 -p -r Ref.gbk -o Run_output/" & A1 & " " & B1 & " " & C1 & " >& runLogs/" & A1'
with sample name in A1, read1 in B1 and read2 in C1.
- Sometimes I use a program like BBEdit/Notepad to copy and paste the same command multiple times, change each copy as appropriate, then paste the text directly into a nano editor on TACC.
- In the introduction tutorial, several alternative text editor programs were mentioned with different ssh connectivity
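As a rough illustration of the Python route, the sketch below builds the same breseq command as the Excel formula above, one line per sample. The sample names, read file names, and the `breseq_commands` output name are hypothetical placeholders rather than files from the course, and this is only one of many ways such a script could be written.

```python
# Hypothetical sample table: (sample name, read 1 file, read 2 file).
# These names are placeholders, not files from the course.
samples = [
    ("sampleA", "sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz"),
    ("sampleB", "sampleB_R1.fastq.gz", "sampleB_R2.fastq.gz"),
]


def build_command(sample, read1, read2):
    """Return one breseq command line, mirroring the Excel formula."""
    return (
        f"breseq -j 6 -p -r Ref.gbk -o Run_output/{sample} "
        f"{read1} {read2} >& runLogs/{sample}"
    )


def write_commands(lines, path):
    """Write one command per line to a commands file."""
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


commands = [build_command(*row) for row in samples]
for line in commands:
    print(line)
```

Calling `write_commands(commands, "breseq_commands")` would save the lines to a file (the file name is an assumption). In practice the sample list would more likely be read from a file listing or a spreadsheet export than typed in by hand.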
...
The majority of the files we have worked with have been in our $SCRATCH space. Recall that files on $SCRATCH can be deleted after a period of inactivity. Below is a list of things that you SHOULD copy to your $HOME or $WORK space.
Collecting class information via job submission
commands
Navigate to the $SCRATCH directory before doing the following.
...

    sbatch what_i_did_at_GVA2022.slurm

Evaluating your job submission
Based on our example, you may have expected 1 new file to be created during the job submission (GVA2022.output.txt), but you will instead find 2 extra files: what_i_did_at_GVA2022.e(job-ID) and what_i_did_at_GVA2022.o(job-ID). When things have worked well, these files are typically ignored. When your job fails, these files offer insight into why it failed so you can fix things and resubmit.
...

    # remember that things after the # sign are ignored by bash
    cat GVA2022.output.txt > end_of_class_job_submission.final.output
    mkdir $WORK/GVA2022
    mkdir $WORK/GVA2022/end_of_course_summary/  # each directory must be made in order to avoid getting a no such file or directory error
    cp end_of_class_job_submission.final.output $WORK/GVA2022/end_of_course_summary/
    cp what_i_did* $WORK/GVA2022/end_of_course_summary/  # note this grabs the 2 output files generated by tacc about your job run as well as the .slurm file you created to tell it how to run your commands file
    cp commands $WORK/GVA2022/end_of_course_summary/

Helpful cheat sheets
...
Copy paste alternative to transfer
As we have seen several times in class, many output files are comma, space, or tab delimited. While this type of formatting is often required for downstream applications, it can make the data very difficult to look at if you are just trying to get a feel for what is going on. One solution is to copy from the terminal screen into Excel. Sometimes Excel will recognize the delimiter character and each chunk of data will go into its own Excel cell. Other times each line goes into column A (or worse, the entire copy paste goes into cell A1). For reasons unknown to me, using the text editor BBEdit (formerly known as TextWrangler, not to be confused with any association to TACC's naming conventions) as an intermediate can automatically convert tab (and some space) delimited text to the invisible tab characters Excel expects. If it does not (or if the file uses comma delimiters), BBEdit's find/replace interface can work with the same regular expressions you are becoming increasingly familiar with from the command line programs (sed, grep, awk). While there is a paid version of BBEdit, I have never even bothered using their free 30 day trial.
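If a text editor is not handy, the same delimiter conversion can be sketched in a few lines of Python. This is a minimal sketch, not something used in the course: the function name `to_tab_delimited` and the example text are made up for illustration, and it assumes that no field contains spaces or quoted commas (such fields would be split incorrectly).

```python
import re


def to_tab_delimited(text):
    """Convert comma- or whitespace-delimited lines to tab-delimited lines.

    Assumes no field contains a space or a quoted comma.
    """
    converted = []
    for line in text.splitlines():
        # Split on a comma (with optional surrounding spaces) or on runs of whitespace.
        fields = re.split(r"\s*,\s*|\s+", line.strip())
        converted.append("\t".join(fields))
    return "\n".join(converted)


# Made-up example mixing comma and space delimiters.
example = "sample1,12,0.5\nsample2  7   0.25"
print(to_tab_delimited(example))
```

The printed text uses real tab characters between fields, which Excel generally splits into separate cells when pasted.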
Additional use for BBEdit
BBEdit is also a very lightweight plain text editor. This means that it is capable of opening large files (such as GenBank references) without the huge amounts of buffering or "destination" formatting issues that something like Microsoft Word would have. It is available at https://www.barebones.com/products/bbedit/. If in your own work you find an alternative that offers much of the same functionality described here, I would love to hear about it.
Helpful cheat sheets
Cheat sheets are commonly produced by people for commands they want to use but whose exact formatting or needed options they don't always remember. Here is a list of cheat sheets that may be helpful and what they are helpful for. They should at least give you a sense of the scope that different cheat sheets cover; if you find one lacking, others that may have what you are looking for can be found pretty quickly using Google.
- For the Linux command line:
- For conda:
Return to GVA2022 to work on any additional tutorials you are interested in.