tsrFinder

The tsrFinder tool identifies transcription start regions from PRO-Cap, PRO-Seq, and related sequencing experiments. This tsrFinder is a much quicker version (with few output modifications) of the previous two tsrFinders: one and two.

Usage

Usage:

PolTools tsrFinder [-h] [-t [threads]]
                          seq_file window_size min_seq_depth
                          min_avg_transcript_length max_fragment_size
                          chrom_size_file

Required Arguments

Description

Sequencing File

Bed formatted file from a sequencing experiment.

Window Size

The size of the TSRs to find.

Min Seq Depth

The minimum number of 5’ reads to be considered as a TSR.

Min avg Transcript Length

The minimum average transcript length will eliminate TSRs from sequencing artifacts.

Max fragment size

The maximum transcript length for a read to be included in tsrFinder analysis.

Chrom size file

A file containing the chromosome sizes. This can be optained using fetchChromSizes The hg38 chromsome size file can be found in the PolTools static directory.

Optional Arguments

Description

-t, –threads

Maximum number of threads. Default is the number of threads on the system. This program will not use more threads than twice the number of chromosomes in the chrom size file.

Behavior

tsrFinder generate a file named the sequencing file plus the tsrFinder parameters and -TSR.bed. This file contains the chromosome, left position, right position, the read sum, number of 5’ ends, strand, TSS left and right positions, the TSS strength, and the position of the average TSS (weighted mean location).

For example:

$ head seq_file.bed
chr1    11981   12023   A00876:119:HW5F5DRXX:1:2168:2248:1407   255     -
chr1    13099   13117   A00876:119:HW5F5DRXX:1:2203:31403:26757 255     -
chr1    13356   13423   A00876:119:HW5F5DRXX:1:2151:15808:7827  255     -
chr1    13435   13477   A00876:119:HW5F5DRXX:1:2273:15781:19241 255     -
chr1    13739   13772   A00876:119:HW5F5DRXX:1:2256:29966:10520 255     -
chr1    13741   13773   A00876:119:HW5F5DRXX:1:2235:4101:11882  255     -
chr1    14178   14203   A00876:119:HW5F5DRXX:1:2115:8241:31422  255     -
chr1    14734   14768   A00876:119:HW5F5DRXX:1:2165:23764:2440  255     -
chr1    14988   15012   A00876:119:HW5F5DRXX:1:2219:16134:32784 255     -
chr1    18337   18362   A00876:119:HW5F5DRXX:1:2149:32054:31328 255     -

$ head hg38.chrom.sizes
chr1    248956422
chr2    242193529
chr3    198295559
chr4    190214555
chr5    181538259
chr6    170805979
chr7    159345973
chrX    156040895
chr8    145138636
chr9    138394717

$ PolTools tsrFinder seq_file.bed 20 20 30 600 hg38.chrom.sizes
$ head seq_file_20_20_30_600-TSR.tab
chr1    629421  629441  738     20      +       629431  629432  4       629431
chr1    629490  629510  2263    64      +       629494  629495  14      629500
chr1    629564  629584  13877   273     +       629571  629572  188     629572
chr1    629685  629705  939     21      +       629698  629699  6       629697
chr1    629708  629728  1701    43      +       629723  629724  12      629719
chr1    629740  629760  1911    63      +       629759  629760  21      629753
chr1    629919  629939  1219    32      +       629929  629930  6       629931
chr1    630666  630686  2277    44      +       630681  630682  21      630679
chr1    630824  630844  658     21      +       630828  630829  4       630834
chr1    630879  630899  2237    49      +       630893  630894  16      630890