...

Log in to ls6, start an idev session, then load the BioContainers bedtools module and check its version.

Code Block
languagebash
titleStart an idev session
idev -m 120 -N 1 -A OTH21164 -r CoreNGS-Fri      # or -A TRA23004
# or
idev -m 90 -N 1 -A OTH21164 -p development   # or -A TRA23004

module load biocontainers
module load bedtools
bedtools --version   # should be bedtools v2.27.1

Input format considerations

  • Most BEDTools functions now accept either BAM or BED files as input. 
    • BED format files must be BED3+, or BED6+ if strand-specific operations are requested.
  • When comparing against a set of regions, those regions are usually supplied in either BED or GTF/GFF.
  • All text-format input files (BED, GTF/GFF, VCF) should use Unix line endings (linefeed only).
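The line-ending and column-count requirements above can be checked with standard Unix tools before running bedtools. The following is a minimal sketch on a made-up file; the names toy_crlf.bed and toy_unix.bed are hypothetical, and it runs in a temporary directory rather than in $SCRATCH.

```bash
# Work in a throwaway directory so no real data is touched
workdir=$(mktemp -d)
cd "$workdir"

# Build a toy BED6 file with Windows (CRLF) line endings
printf 'chrI\t334\t649\tYAL069W\t315\t+\r\n'    >  toy_crlf.bed
printf 'chrI\t1806\t2169\tYAL068C\t363\t-\r\n'  >> toy_crlf.bed
printf 'chrI\t7234\t9016\tYAL067C\t1782\t-\r\n' >> toy_crlf.bed

# A literal carriage return character to search for
cr=$(printf '\r')

# Count lines containing a carriage return; anything above 0 means non-Unix endings
crlf_lines=$(grep -c "$cr" toy_crlf.bed)
echo "lines with carriage returns: $crlf_lines"

# Strip carriage returns to get Unix (linefeed-only) line endings
tr -d '\r' < toy_crlf.bed > toy_unix.bed
remaining=$(grep -c "$cr" toy_unix.bed || true)
echo "lines with carriage returns after conversion: $remaining"

# Smallest number of tab-separated columns on any line:
# at least 3 is needed for BED3+, at least 6 for strand-specific operations
min_cols=$(awk -F'\t' 'NR == 1 || NF < min { min = NF } END { print min }' toy_unix.bed)
echo "minimum column count: $min_cols"
```

On a real file, the same grep and awk checks apply unchanged; only the file name differs.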

...

Which type of RNA-seq library you have depends on the library preparation method – so ask your sequencing center! Our yeast RNA-seq library is sense stranded (note, however, that most RNA-seq libraries these days, including ones prepared by GSAF, are antisense stranded).

If you have a stranded RNA-seq library, you should use either -s or -S to avoid false counting against a gene on the wrong strand.
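As a rough illustration of what the two options select: in bedtools, -s requires the read and the feature to be on the same strand, while -S requires them to be on opposite strands. The sketch below does not run bedtools. It only mimics the strand test with awk on a made-up overlap table, and the file name overlaps.txt and its column layout are assumptions for the example.

```bash
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical read/gene overlaps: read_id, read_strand, gene, gene_strand
printf 'r1\t+\tYAL069W\t+\n' >  overlaps.txt
printf 'r2\t-\tYAL069W\t+\n' >> overlaps.txt
printf 'r3\t-\tYAL068C\t-\n' >> overlaps.txt
printf 'r4\t+\tYAL068C\t-\n' >> overlaps.txt
printf 'r5\t+\tYAL068C\t-\n' >> overlaps.txt

# Same-strand overlaps: the ones a -s style test would keep
same=$(awk -F'\t' '$2 == $4 { n++ } END { print n + 0 }' overlaps.txt)

# Opposite-strand overlaps: the ones a -S style test would keep
opposite=$(awk -F'\t' '$2 != $4 { n++ } END { print n + 0 }' overlaps.txt)

echo "same strand: $same, opposite strand: $opposite"
```

Which of the two counts is the meaningful one depends on the library: reads from a sense-stranded library should fall on the gene's own strand, and reads from an antisense-stranded library on the opposite strand.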

...

  1. seqname - The name of the chromosome or contig.
  2. source - Name of the program that generated this feature, or other data source (e.g. public database)
  3. feature_type - Type of the feature, for example:
    • chromosome
    • CDS (coding sequence), exon
    • gene, transcript
    • start_codon, stop_codon
  4. start - Start position of the feature, with sequence numbering starting at 1.
  5. end - End position of the feature, with sequence numbering starting at 1.
  6. score - A numeric value, often but not always an integer. Its meaning varies by source and is usually not important.
  7. strand - Defined as + (forward), - (reverse), or . (no relevant strand)
  8. frame - For a CDS, one of 0, 1 or 2, specifying the reading frame of the first base; otherwise '.'
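Because GFF start and end are both 1-based and inclusive, a feature's length is end - start + 1, and converting to BED (0-based start, exclusive end) means subtracting 1 from the start only. The following is a small sketch of that arithmetic on a single made-up record; the file names are hypothetical, and the record includes the standard ninth attributes column, reduced here to a bare ID.

```bash
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical one-gene GFF record (tab-separated)
printf 'chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=YAL069W\n' > toy.gff

# Feature length from 1-based inclusive coordinates: end - start + 1
feature_len=$(awk -F'\t' '$3 == "gene" { print $5 - $4 + 1 }' toy.gff)
echo "feature length: $feature_len"

# Convert to BED6: chrom, start - 1, end, name, score (length used here), strand
awk -F'\t' 'BEGIN { OFS = "\t" }
  $3 == "gene" { print $1, $4 - 1, $5, $9, $5 - $4 + 1, $7 }' toy.gff > toy.bed
bed_line=$(cat toy.bed)
echo "$bed_line"
```

In the resulting BED line, end minus start equals the feature length directly, which is one practical reason for the 0-based, end-exclusive convention.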

...

Expand
titleMake sure you're in an idev session


Code Block
languagebash
titleStart an idev session
idev -m 120 -N 1 -A OTH21164 -r CoreNGS-Fri     # or -A TRA23004
# or
idev -m 90 -N 1 -A OTH21164 -p development  # or -A TRA23004

module load biocontainers
module load bedtools
bedtools --version   # should be bedtools v2.27.1


...

One of the first things you want to know about your annotation file is what gene features it contains. Here's how to find that:

Expand
titleSetup (if needed)


Code Block
languagebash
mkdir -p $SCRATCH/core_ngs/bedtools
cd $SCRATCH/core_ngs/bedtools
cp $CORENGS/catchup/references/gff/sacCer3.R64-1-1_20110208.gff . 


Read more about what's going on here at piping a histogram.

Code Block
languagebash
titleCreate a histogram of all the feature types in a GFF
cd $SCRATCH/core_ngs/bedtools
cat sacCer3.R64-1-1_20110208.gff | grep -v '^#' | cut -f 3 | \
  sort | uniq -c | sort -k1,1nr | more

...

Code Block
chrI    334     649     YAL069W         315     +       YAL069W    Dubious
chrI    537     792     YAL068W-A       255     +       YAL068W-A  Dubious
chrI    1806    2169    YAL068C         363     -       PAU8       Verified
chrI    2479    2707    YAL067W-A       228     +       YAL067W-A  Uncharacterized
chrI    7234    9016    YAL067C         1782    -       SEO1       Verified
chrI    10090   10399   YAL066W         309     +       YAL066W    Dubious
chrI    11564   11951   YAL065C         387     -       YAL065C    Uncharacterized
chrI    12045   12426   YAL064W-B       381     +       YAL064W-B  Uncharacterized
chrI    13362   13743   YAL064C-A       381     -       YAL064C-A  Uncharacterized
chrI    21565   21850   YAL064W         285     +       YAL064W    Verified
chrI    22394   22685   YAL063C-A       291     -       YAL063C-A  Uncharacterized
chrI    23999   27968   YAL063C         3969    -       FLO9       Verified
chrI    31566   32940   YAL062W         1374    +       GDH3       Verified
chrI    33447   34701   YAL061W         1254    +       BDH2       Uncharacterized
chrI    35154   36303   YAL060W         1149    +       BDH1       Verified
chrI    36495   36918   YAL059C-A       423     -       YAL059C-A  Dubious
chrI    36508   37147   YAL059W         639     +       ECM1       Verified
chrI    37463   38972   YAL058W         1509    +       CNE1       Verified
chrI    38695   39046   YAL056C-A       351     -       YAL056C-A  Dubious
chrI    39258   41901   YAL056W         2643    +       GPB2       Verified

Note the value in the 8th column. In the yeast annotations from SGD there are 3 gene classifications: Verified, Uncharacterized and Dubious. The Dubious genes have no experimental evidence, so they are generally excluded from analysis.
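One way to exclude the Dubious genes is to filter on the 8th column with awk. The following is a minimal sketch that assumes a tab-separated file laid out like the listing above; the file names toy_genes.bed and toy_genes.nodubious.bed are hypothetical, and only four rows are reproduced.

```bash
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

# Toy tab-separated copy of four rows from the gene listing
printf 'chrI\t334\t649\tYAL069W\t315\t+\tYAL069W\tDubious\n'               >  toy_genes.bed
printf 'chrI\t1806\t2169\tYAL068C\t363\t-\tPAU8\tVerified\n'               >> toy_genes.bed
printf 'chrI\t2479\t2707\tYAL067W-A\t228\t+\tYAL067W-A\tUncharacterized\n' >> toy_genes.bed
printf 'chrI\t7234\t9016\tYAL067C\t1782\t-\tSEO1\tVerified\n'              >> toy_genes.bed

# Tally genes per classification (column 8)
cut -f 8 toy_genes.bed | sort | uniq -c | sort -k1,1nr

# Keep every gene whose classification is not Dubious
awk -F'\t' '$8 != "Dubious"' toy_genes.bed > toy_genes.nodubious.bed

# Sanity checks: rows kept, and Dubious rows remaining
kept=$(awk 'END { print NR }' toy_genes.nodubious.bed)
dubious_left=$(awk -F'\t' '$8 == "Dubious" { n++ } END { print n + 0 }' toy_genes.nodubious.bed)
echo "kept $kept genes, $dubious_left Dubious remaining"
```

An exact match on column 8 is used rather than a plain grep -v Dubious so that a gene is not dropped just because the word happens to appear in another column.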

...