Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Note that the input files here aren't technically XML files themselves, since they don't begin with an XML declaration, but this is a minor technical point.

Graphing a histogram

Graphing a histogram in CGView using each position as a datapoint will probably take lots of computational time, with marginal benefits to resolution. Averaging over some region of positions is probably a more efficient way of getting a histogram. The following script takes a table of position counts (as a text file), and outputs an XML fragment of the counts averaged over regions of length window_size which can be used as input to makeMap.sh above.

The position count table must have the form
<column 1> <column 2, probably position> <int> <int> ... <int>
<column 1> <column 2, probably position> <int> <int> ... <int>
...
where each <int> is a count.

Expand
titleScript to average a count table
Code Block
titleAverage Count Table

#! /usr/bin/env python
import sys

try:
    import argparse
except:
    print 'Try typing "module load python" and then running this again.'
    sys.exit(1)


def main():
    parser = argparse.ArgumentParser(description='Averages read depth along chromosome positions from a number of samples.')
    parser.add_argument('count_table')

    args = parser.parse_args()

    window_table = []

    window_size = 250
    window_sum = 0
    line_count = 0

    with open(args.count_table, 'r') as count_table:
        for line in count_table:
    
            fields = line.split()
            count_fields = [int(x) for x in fields[2:]]
            window_sum += sum(count_fields)

            line_count += 1
            if (line_count % window_size) == 0:
                window_table.append([line_count-window_size+1, line_count, window_sum])
                window_sum = 0

        if (line_count % window_size) != 0:
            window_table.append([line_count - (line_count % window_size) + 1, line_count, window_sum])

    max_count = 0
    for window in window_table:
        if max_count < window[2]:
            max_count = window[2]

    for window in window_table:
        window[2] = float(window[2])/max_count

    print '<featureSlot strand="direct">'
    print '<feature color="purple" decoration="arc">'
    for window in window_table:
        print '<featureRange start="{}" stop="{}" proportionOfThickness="{}" />'.format(window[0], window[1], window[2])
    print '</feature>'
    print '</featureSlot>'

if __name__ == '__main__':
    main()