Find and download RNAseq data from run SRR390925, of experiment SRX112044, publication SRP009873. Copy the file to your home directory on Lonestar at TACC then extract the data in fastq format.
Copy from local machine to TACC (the commands below use Stampede)
scp SRR390925.sra username@stampede.tacc.utexas.edu:~/
Log in to Stampede
ssh username@stampede.tacc.utexas.edu
Check that the file is in your home directory
stamp:~ ls SRR390925.sra
Find the SRA toolkit module
stamp:~ module spider sratoolkit
----------------------------------------------------------------------------
  sratoolkit: sratoolkit/2.1.9
----------------------------------------------------------------------------
    Description:
      The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.

    This module can be loaded directly: module load sratoolkit/2.1.9

    Help:
      The sratoolkit module file defines the following environment variables:
      TACC_SRATOOLKIT_DIR for the location of the sratoolkit distribution.

      Version 2.1.9
Load the module
stamp:~ module load sratoolkit
Invoke fastq-dump with no arguments to get basic usage
stamp:~ fastq-dump

Usage:
  /opt/apps/sratoolkit/2.1.9//fastq-dump [options] [ -A ] <accession>
  /opt/apps/sratoolkit/2.1.9//fastq-dump [options] <path [path...]>

Use option --help for more information

/opt/apps/sratoolkit/2.1.9//fastq-dump : 2.1.9
Extract to fastq
stamp:~ $TACC_SRATOOLKIT_DIR/fastq-dump SRR390925.sra
Written 1981132 spots for SRR390925.sra
Written 1981132 spots total
Look at some data
stamp:~ ls
SRR390925.fastq  SRR390925.sra
stamp:~ head SRR390925.fastq
@SRR390925.1 ROCKFORD:1:1:0:1260 length=36
NCAACAAGTTTCTTTGGTTATTAACTACGACTTACC
+SRR390925.1 ROCKFORD:1:1:0:1260 length=36
#CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
@SRR390925.2 ROCKFORD:1:1:0:293 length=36
NAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+SRR390925.2 ROCKFORD:1:1:0:293 length=36
####################################
@SRR390925.3 ROCKFORD:1:1:0:330 length=36
NAAAAAAAAAAAAAAAAAAAAAAAATAAAAAAAAAA
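The head output shows the four-line FASTQ record layout: an @ header, the sequence, a + separator, and a quality string the same length as the sequence. A minimal sketch of a sanity check for that layout might look like the following. The function name check_fastq_head and the default of 1000 records are assumptions made for illustration; they are not part of the SRA toolkit, and the check assumes unwrapped, four-line records.

```shell
#!/usr/bin/env bash
# Sanity-check the first N records of a FASTQ file with 4 lines per read.
# check_fastq_head is a hypothetical helper name used for illustration.
check_fastq_head() {
    local fastq="$1"
    local nreads="${2:-1000}"   # number of records to inspect (assumed default)
    head -n $(( nreads * 4 )) "$fastq" | awk '
        # Line 1 of each record: header must start with "@"
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            print "line " NR ": header does not start with @"; bad = 1
        }
        # Line 2: remember the sequence length
        NR % 4 == 2 { seqlen = length($0) }
        # Line 3: separator must start with "+"
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            print "line " NR ": separator does not start with +"; bad = 1
        }
        # Line 4: quality string must be as long as the sequence
        NR % 4 == 0 && length($0) != seqlen {
            print "line " NR ": quality length differs from sequence length"; bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Running check_fastq_head SRR390925.fastq would print one message per problem line and return a non-zero status if any record in the inspected range looks malformed.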
Count lines and number of reads (fastq has 4 lines/read)
stamp:~ wc -l SRR390925.fastq
7924528 SRR390925.fastq
login2$ echo $((7924528 / 4))
1981132
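The line count and division above can be combined into one step. The following is a rough sketch, assuming an uncompressed FASTQ file with exactly four lines per read; count_fastq_reads is a hypothetical helper name, not a tool shipped with the SRA toolkit.

```shell
#!/usr/bin/env bash
# Count reads in an uncompressed FASTQ file, assuming 4 lines per read
# (no wrapped sequence or quality lines).
# count_fastq_reads is a hypothetical helper name used for illustration.
count_fastq_reads() {
    local fastq="$1"
    local lines
    lines=$(wc -l < "$fastq")
    # A line count that is not a multiple of 4 suggests a truncated or
    # malformed file, so warn on stderr but still report the quotient.
    if (( lines % 4 != 0 )); then
        echo "warning: $fastq has $lines lines, not a multiple of 4" >&2
    fi
    echo $(( lines / 4 ))
}
```

Called as count_fastq_reads SRR390925.fastq, this should report the same read count as the manual calculation, which in turn should match the number of spots that fastq-dump reported writing.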
Using the UCSC Genome Browser, determine whether Craig Venter or James Watson has a higher risk of Alzheimer's disease.
Craig Venter has at least one SNP associated with Alzheimer's disease.
Using the UCSC Genome Browser, find and download a list of high-sequencing-depth regions in BED format.