...
Once you have your TACC account, projects, etc. set up, here are a few tips:
1. Use Ranch (tape) to archive your data, or use Corral to store large datasets where they won't get purged.
2. The BioITeam has set up many useful repositories of data (e.g. BLAST databases, reference genomes, etc.) and scripts/libraries at /corral-repl/utexas/BioITeam/.
3. Consider sourcing /corral-repl/utexas/BioITeam/bin/profile_ngs_course.bash
Some tips for using Ranch:
a) Always use bbcp (installed on Fourierseq and all TACC machines) to move data. Its syntax is the same as scp's, but it's much faster.
b) Ranch is a tape archive: your data will be moved to tape, so it is not immediately available.
c) Use "sls -2" to see the status of your files. The third line of the output lists attributes related to archiving:
   Status  Description
   O       The file is offline: removed from disk and stored only on tape.
   P       The file is offline, with part of it still online.
   E       The tape where the file resides has been flagged as "damaged." Contact TACC User Services.
   -       The file is online (and has not been copied to tape).
d) Use "stage" to bring your files back from tape so you can copy them off Ranch. Plan to do this a day or so before you need them. You won't, but that's OK: you're here to learn.
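If you are juggling many archived files, it can help to turn the status codes from the table above into a to-do list: anything marked O or P needs "stage" (and some waiting) before you copy it off Ranch. The Python sketch below assumes you have already read the one-character status for each file off the "sls -2" output; it does not run or parse "sls" itself. The helper names and the wording of the suggested actions are hypothetical, an interpretation of the table rather than TACC's own text.

```python
from __future__ import annotations

# Hypothetical mapping from the archive status codes shown by `sls -2`
# to (meaning, suggested next step). Wording is an interpretation, not TACC's.
STATUS_ACTIONS = {
    "O": ("offline, only on tape", "run stage, then wait before copying"),
    "P": ("offline, partially online", "run stage, then wait before copying"),
    "E": ("tape flagged as damaged", "contact TACC User Services"),
    "-": ("online, not yet copied to tape", "copy it now"),
}


def next_step(status: str) -> str:
    """Return a short meaning and suggested action for one status code."""
    try:
        meaning, action = STATUS_ACTIONS[status]
    except KeyError:
        raise ValueError(f"unknown archive status code: {status!r}") from None
    return f"{meaning}: {action}"


def files_to_stage(statuses: dict[str, str]) -> list[str]:
    """Given {filename: status_code}, return (sorted) the files that need staging."""
    return sorted(name for name, code in statuses.items() if code in ("O", "P"))


if __name__ == "__main__":
    # Hypothetical file names and statuses, typed in by hand from `sls -2`.
    example = {"reads_R1.fastq.gz": "O", "reads_R2.fastq.gz": "-", "ref.fa": "P"}
    print(files_to_stage(example))
    print(next_step("E"))
```

Staging everything on that list in one go, a day ahead, is what tip d) is getting at.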
4. Always start with VERY SMALL jobs submitted to the DEVELOPMENT queue and see how they scale. It's no fun to learn you've used your entire 50,000 SU allocation "testing" something, or that you waited in the queue for hours only to find a typo in your script.
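One way to act on "see how they scale" is to run the same small job at two input sizes, fit how runtime grows, and extrapolate before committing to the full run. The Python sketch below assumes runtime follows a power law in input size (t = c * n**k), so two test runs are enough to estimate k. It also assumes a simple charging model of nodes x wall-clock hours x a charge rate you supply. Both are assumptions: real tools may not follow a power law, and you should check your system's user guide for how SUs are actually billed. All function names and numbers here are hypothetical.

```python
import math


def scaling_exponent(n1: float, t1: float, n2: float, t2: float) -> float:
    """Fit t = c * n**k through two test runs and return k (ASSUMES a power law)."""
    if min(n1, t1, n2, t2) <= 0 or n1 == n2:
        raise ValueError("need two positive test runs at different input sizes")
    return math.log(t2 / t1) / math.log(n2 / n1)


def extrapolate_hours(n1: float, t1: float, n2: float, t2: float, n_full: float) -> float:
    """Predict wall-clock hours for input size n_full from the two test runs."""
    k = scaling_exponent(n1, t1, n2, t2)
    return t2 * (n_full / n2) ** k


def estimate_su(nodes: int, hours: float, su_per_node_hour: float = 1.0) -> float:
    """SU cost under an ASSUMED nodes x hours x rate charging model."""
    return nodes * hours * su_per_node_hour


def runs_affordable(allocation_su: float, su_per_run: float) -> int:
    """How many whole runs of this cost fit in the allocation."""
    if su_per_run <= 0:
        raise ValueError("cost per run must be positive")
    return math.floor(allocation_su / su_per_run)


if __name__ == "__main__":
    # Hypothetical tests on 1 node: 1M reads took 0.1 h, 2M reads took 0.4 h.
    k = scaling_exponent(1e6, 0.1, 2e6, 0.4)           # runtime grows ~n**2
    hours = extrapolate_hours(1e6, 0.1, 2e6, 0.4, 100e6)
    cost = estimate_su(nodes=1, hours=hours)
    print(f"k = {k:.2f}, ~{hours:.0f} h, ~{cost:.0f} SU, "
          f"{runs_affordable(50_000, cost)} full runs fit in 50,000 SU")
```

With those made-up numbers, doubling the input quadrupled the runtime, and the full run would take on the order of a thousand hours on one node. That is the kind of surprise you want to find in the development queue rather than after the allocation is gone.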
...