University Wiki Service

Core NGS Tools


    Exercise solutions

    May 25, 2015

    SRA Toolkit Exercises

    SRA Exercise 1

    Find and download the RNA-seq data for run SRR390925 (experiment SRX112044, study SRP009873). Copy the file to your home directory on Lonestar at TACC, then extract the data in FASTQ format.

    A solution

    • Go to the SRA search page: http://www.ncbi.nlm.nih.gov/sra
    • Type in SRX112044 → Search
    • On experiment summary page click SRR390925
      • takes you to the Run browser where you can see example reads
    • Under "Download", "Run" click "ftp" under .sra
      • save the file locally
    • Open a Terminal window, change into the directory where the file was stored
    • Copy from local machine to TACC

      scp SRR390925.sra username@stampede.tacc.utexas.edu:~/
      
      • the colon ( : ) after the hostname indicates this is a remote destination
      • the ~/ indicates your home directory
    • Log in to Stampede:

      ssh username@stampede.tacc.utexas.edu
      
      • check that the file is in your home directory

        stamp:~ ls
        SRR390925.sra
        
    • Find the SRA toolkit module

      stamp:~ module spider sratoolkit
      
        ----------------------------------------------------------------------------
        sratoolkit: sratoolkit/2.1.9
        ----------------------------------------------------------------------------
          Description:
            The SRA Toolkit and SDK from NCBI is a collection of tools and
            libraries for using data in the INSDC Sequence Read Archives.
      
          This module can be loaded directly: module load sratoolkit/2.1.9
      
          Help:
            The sratoolkit module file defines the following environment variables:
            TACC_SRATOOLKIT_DIR for the location of the sratoolkit distribution.
      
            Version 2.1.9
      
    • Load the module

      stamp:~ module load sratoolkit
      
    • Invoke fastq-dump with no arguments to get basic usage

      stamp:~ fastq-dump
      
      Usage:
        /opt/apps/sratoolkit/2.1.9//fastq-dump [options] [ -A ] <accession>
        /opt/apps/sratoolkit/2.1.9//fastq-dump [options] <path [path...]>
      
      Use option --help for more information
      
      /opt/apps/sratoolkit/2.1.9//fastq-dump : 2.1.9
      
    • Extract to fastq

      stamp:~ $TACC_SRATOOLKIT_DIR/fastq-dump SRR390925.sra
      Written 1981132 spots for SRR390925.sra
      Written 1981132 spots total
      
    • Look at some data

      stamp:~ ls
      SRR390925.fastq  SRR390925.sra
      
      stamp:~ head SRR390925.fastq
      @SRR390925.1 ROCKFORD:1:1:0:1260 length=36
      NCAACAAGTTTCTTTGGTTATTAACTACGACTTACC
      +SRR390925.1 ROCKFORD:1:1:0:1260 length=36
      #CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
      @SRR390925.2 ROCKFORD:1:1:0:293 length=36
      NAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
      +SRR390925.2 ROCKFORD:1:1:0:293 length=36
      ####################################
      @SRR390925.3 ROCKFORD:1:1:0:330 length=36
      NAAAAAAAAAAAAAAAAAAAAAAAATAAAAAAAAAA
      
    • Count lines and number of reads (fastq has 4 lines/read)

      stamp:~ wc -l SRR390925.fastq
      7924528 SRR390925.fastq
      login2$ echo $((7924528 / 4))
      1981132
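
The counting step above can be wrapped in a small helper, along with a basic structure check on the extracted file. This is a minimal sketch, not part of the original solution: the function names `count_fastq_reads` and `check_fastq_records` are hypothetical, and it assumes strict 4-line FASTQ records (no wrapped sequence lines), which is what the `head` output above shows.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of the SRA Toolkit.
# Assumes strict 4-line FASTQ records (no wrapped sequence lines).

# Print the number of reads: total line count divided by 4.
count_fastq_reads() {
    local fastq="$1"
    local lines
    lines=$(wc -l < "$fastq")
    echo $(( lines / 4 ))
}

# Return non-zero if any record breaks the 4-line layout:
#   line 1 starts with '@', line 3 starts with '+',
#   and the sequence (line 2) and quality (line 4) have equal length.
check_fastq_records() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }
        NR % 4 == 0 && length($0) != seqlen    { bad = 1 }
        END {
            if (NR % 4 != 0) bad = 1   # truncated final record
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

After sourcing these definitions, `count_fastq_reads SRR390925.fastq` should report the same number as the "Written ... spots" line from fastq-dump, and `check_fastq_records SRR390925.fastq && echo OK` gives a quick sanity check before the file goes into an aligner.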
      

    UCSC Genome Browser Exercises

    UCSC Exercise 1

    Using the UCSC Genome Browser, determine whether Craig Venter or James Watson has a higher risk of Alzheimer's disease.

    A Solution

    Craig Venter has at least one SNP associated with Alzheimer's disease.

    • http://genome.ucsc.edu/ → Genome Browser → submit
    • type APOE in gene box → jump
    • under "Phenotype and Disease Association" change "GWAS Catalog" from "hide" to "squish" → refresh
    • under "Variation & Repeats" click on "Genome Variants" to see subtrack information
      • note both Venter and Watson have published their genotypes here
      • deselect "1000 Genomes Pilot" tracks (click '-')
      • change "Maximum display mode" from "hide" to "pack" → Submit
    • zoom in on rs429358, then click on rs429358 under "NHGRI Catalog... tracks".
      • note the association with Alzheimer's disease
    • back in display window, note that Venter has a variant for this SNP while Watson does not

    UCSC Exercise 2

    Using the UCSC Genome Browser, find and download a list of high-sequencing-depth regions in BED format.

    A Solution
    • http://genome.ucsc.edu/cgi-bin/hgTables
    • clade: Mammal, genome: Human, assembly: hg19
    • group: Mapping and Sequencing Tracks, track: Hi Seq Depth
    • output format: BED - browser extensible data
    • filename: hi_seq_depth.bed
    • → get output
    • → get BED, save to local directory
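
Once the BED file is saved, a quick summary confirms the download looks sensible. The sketch below is an illustration rather than part of the original solution: the function name `bed_summary` is hypothetical, and it assumes a tab-separated BED file in which column 2 is the 0-based start and column 3 is the end (exclusive), so each region spans `end - start` bases. It skips any `track`, `browser`, or `#` header lines in case the download includes them.

```shell
#!/usr/bin/env bash
# Sketch only: bed_summary is a hypothetical helper name.
# Assumes tab-separated BED: chrom, 0-based start, exclusive end.

# Print the number of regions and the total bases they span.
bed_summary() {
    awk -F'\t' '
        /^(track|browser|#)/ { next }            # skip header and comment lines
        NF >= 3 { n++; bases += $3 - $2 }        # region length is end - start
        END { printf "%d regions, %d bases\n", n, bases }
    ' "$1"
}
```

For example, `bed_summary hi_seq_depth.bed` prints one line with the region count and the total bases the regions span.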