...
| Sub-command | Description | Use case(s) |
|---|---|---|
| bamtobed | Convert BAM files to BED format. | You want to have the contig, start, end, and strand information for each mapped alignment record in separate fields. Recall that the strand is encoded in a BAM flag (0x10) and the exact end coordinate requires parsing the CIGAR string. |
| bamtofastq | Extract FASTQ sequences from BAM alignment records. | You have downloaded a BAM file from a public database, but it was not aligned against the reference version you want to use (e.g. it is hg19 and you want an hg38 alignment). To re-process, you need to start with the original FASTQ sequences. |
| getfasta | Get FASTA entries corresponding to regions. | You want to run motif analysis, which requires the original FASTA sequences, on a set of regions of interest. In addition to the BAM file, you must provide FASTA file(s) for the genome/reference used for alignment (e.g. the FASTA file used to build the aligner index). |
| genomecov |
|
|
| coverage |
|
|
| multicov | Count overlaps between one or more BAM files and a set of regions of interest. |
|
| merge | Combine a set of possibly-overlapping regions into a single set of non-overlapping regions. | Collapse overlapping gene annotations into per-strand non-overlapping regions before counting (e.g with featureCounts or HTSeq). If this is not done, the source regions will potentially be counted multiple times, once for each (overlapping) target region it intersects. |
| subtract | Remove unwanted regions. | Remove rRNA gene regions from a merged gene annotations file before counting. |
| intersect | Determine the overlap between two sets of regions. | Similar to multicov, but can also report (the overlapping regions, not just count ) the overlapping regionsthem. |
| closest | Find the genomic features nearest to a set of regions. | For a set of significant ChIP-seq transcription factor (TF) binding regions ("peaks") that have been identified, determine nearby genes that may be targets of TF regulation. |
...
The most important thing to remember about comparing regions using BEDTools, is that all input files must share the same set of contig names and be based on the same reference! For example, if an alignment was performed against a human GRCh38 reference genome from Gencode, use annotations from the corresponding GFF/GTF annotations.
...
By default many bedtools utilities that perform overlapping, consider reads overlapping the feature on either strand, but can be made strand-specific with the -s or -S option. This strandedness options for bedtools utilities refers the orientation of the R1 read with respect to the feature's (gene's) strand.
- -s says the R1 read is sense stranded (on the same strand as the gene).
- -S says the R1 read is antisense stranded (the opposite strand as the gene).
...
- seqname - The name of the chromosome or scaffoldcontig.
- source - Name of the program that generated this feature, or other data source (e.g. database)
- feature_type - Type of the feature. Examples of common feature types include:Some examples of common feature types are:, for example:
- CDS (coding sequence), exon
- gene, transcript
- start_codon, stop_codon
- start - Start position of the feature, with sequence numbering starting at 1.
- end - End position of the feature, with sequence numbering starting at 1.
- score - A numeric value. Often but not always an integer.
- strand - Defined as + (forward), - (reverse), or . (no relevant strand)
- frame - For a CDS, one of 0, 1 or 2, specifying the reading frame of the first base; otherwise '.'
...
| Code Block | ||||
|---|---|---|---|---|
| ||||
mkdir -p $SCRATCH/core_ngs/bedtools cd $SCRATCH/core_ngs/bedtools cp $CORENGS/yeast_rnaseq/sacCer3.R64-1-1_20110208.gffyeast_mrna.sort.filt.bam* . cp $CORENGS/yeast_rnaseq/yeast_mrna.sort.filt.bam* .catchup/references/gff/sacCer3.R64-1-1_20110208.gff . # Use the less pager to look at multiple lines less sacCer3.R64-1-1_20110208.gff # Look at just the most-important Tab-separated columns cat sacCer3.R64-1-1_20110208.gff | grep -v '#' | cut -f 1,3-5 | head -20 # Include the ugly 9th column where attributes are stored cat sacCer3.R64-1-1_20110208.gff | grep -v '#' | cut -f 1,3,9 | head |
...