Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

So it is important to know which version of BEDTools you are using, and read the documentation carefully to see if changes have been made since your version. When you run the code below, you should see that the bedtools version on in the standard TACC module system is bedtools v2.2526.0. (Login to login5.ls5.tacc.utexas.edu first.)

...

  • Most BEDTools functions now accept either BAM or BED files as input. 
    • BED format files must be BED3+, or BED6+ if strand-specific operations are requested.
  • When comparing against a set of regions, those regions are usually supplied in either BED or GTF/GFF.
  • All text-format input files (BED, GTF/GFF, VCF) must should use Unix line endings (linefeed only).

The most important thing to remember about comparing regions using BEDTools, is that all region files must share the same set contig names and be based on the same reference! For example, if an alignment was performed against a human GRCh38 reference genome from Gencode, use annotations from the corresponding GFF/GTF annotations.

...

By default many bedtools utilities that perform overlapping, consider reads overlapping the feature on either strand, but it can be made strand-specific with the -s or -S option. This strandedness options for bedtools utilities refers the orientation of the R1 read with respect to the feature's (gene's) strand.

  • -s says the R1 read is sense stranded (on the same strand as the gene).
  • -S says the R1 read is antisense stranded (the opposite strand as the gene).

RNA-seq libraries can be constructed with 3 types of strandedness:

  1. sense stranded – the R1 read should be on the same strand as the gene.
  2. antisense stranded – the R1 read should be on the opposite strand as the gene.
  3. unstranded – the R1 could be on either strand

Which type of RNA-seq library you have depends on the library preparation method – so ask your sequencing center! Our yeast RNA-seq library is sense stranded (note that most RNA-seq libraries prepared by GSAF are antisense stranded).

If you have a stranded RNA-seq library, you should use either -s or -S to avoid false counting against a gene on the wrong strand.

...