Lindsay Clark edited this page Jun 3, 2017 · 4 revisions

The program "exp_frag_size.py" can be used to predict the restriction digest fragment size associated with each tag or marker, which may be useful for understanding how read depth and fragment size are related. In addition to a FASTA file of the reference genome, the program requires a SAM file indicating the alignment of the tags to the reference genome. This SAM file can be generated by running Bowtie2 or BWA on the FASTA file of tags that is optionally output by tag_manager.py, or on the FASTA file output by the UNEAK pipeline.
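To make the idea of a predicted fragment size concrete, here is a minimal Python sketch. It is not the code used by exp_frag_size.py; the function names are hypothetical, and it assumes the fragment size can be approximated as the distance between the nearest cutsite at or before an alignment position and the next cutsite after it. It ignores where within the recognition sequence each enzyme actually cuts.

```python
import bisect
import re


def cut_site_positions(sequence, cutsites):
    """Return the sorted 0-based start positions of every cutsite occurrence."""
    seq = sequence.upper()
    positions = set()
    for site in cutsites:
        # A lookahead pattern also finds overlapping occurrences.
        pattern = "(?=" + re.escape(site.upper()) + ")"
        for match in re.finditer(pattern, seq):
            positions.add(match.start())
    return sorted(positions)


def flanking_fragment_size(sequence, cutsites, pos):
    """Approximate the size of the fragment that contains position `pos`.

    Returns the distance between the nearest cutsite at or before `pos`
    and the next cutsite after it, or None if `pos` is not flanked by
    cutsites on both sides.
    """
    sites = cut_site_positions(sequence, cutsites)
    i = bisect.bisect_right(sites, pos)
    if i == 0 or i == len(sites):
        return None
    return sites[i] - sites[i - 1]


# Toy reference: a PstI site (CTGCAG) followed by an MspI site (CCGG).
toy_reference = "AACTGCAGTTTTTTCCGGAA"
print(flanking_fragment_size(toy_reference, ["CTGCAG", "CCGG"], 5))
```

In this sketch `pos` stands in for the leftmost alignment position reported in the SAM file, converted to 0-based coordinates.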

At this time only a non-interactive version of the program is available. Here is an example of how to use it from the command line. First, use cd to navigate into a directory containing the SAM file and reference genome file. Copy tagdigger_fun.py and exp_frag_size.py into that directory. Then use the command:

python exp_frag_size.py -s alignment.sam -g reference.fasta -o output.csv -w C:\Users\lvclark\Documents\bestgenome -c CTGCAG,CCGG

where alignment.sam is replaced with the name of your SAM file, reference.fasta is replaced with the name of your reference genome file, output.csv is replaced with the desired name of a CSV file to output, and C:\Users\lvclark\Documents\bestgenome is replaced with the working directory where files should be read from and written to. In this example, "CTGCAG,CCGG" represents the two restriction cutsites for the PstI-MspI system. You can replace these with any number of cutsites for any other enzyme system. As an alternative to using the -c option, you can use -e with either PstI-MspI or NsiI-MspI:

python exp_frag_size.py -s alignment.sam -g reference.fasta -o output.csv -w C:\Users\lvclark\Documents\bestgenome -e NsiI-MspI
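The relationship between the -c and -e options can be illustrated with a short Python sketch. This is an assumption about how such options could be interpreted, not the program's actual parsing code. The names `ENZYME_SITES`, `parse_cutsites`, and `cutsites_for_system` are hypothetical, and the NsiI recognition sequence (ATGCAT) comes from general knowledge of the enzyme, not from this page.

```python
# Recognition sequences for the enzymes named on this page.
# PstI and MspI are taken from the example above; NsiI is an assumption
# based on the enzyme's published recognition sequence.
ENZYME_SITES = {
    "PstI": "CTGCAG",
    "MspI": "CCGG",
    "NsiI": "ATGCAT",
}


def parse_cutsites(value):
    """Split a comma-separated string such as 'CTGCAG,CCGG' into a list."""
    sites = [s.strip().upper() for s in value.split(",") if s.strip()]
    for site in sites:
        # This sketch only accepts unambiguous nucleotides.
        if set(site) - set("ACGT"):
            raise ValueError("Unexpected character in cutsite: " + site)
    return sites


def cutsites_for_system(name):
    """Translate an enzyme system name such as 'NsiI-MspI' into cutsites."""
    return [ENZYME_SITES[enzyme] for enzyme in name.split("-")]


print(parse_cutsites("CTGCAG,CCGG"))
print(cutsites_for_system("NsiI-MspI"))
```

Under these assumptions, passing -e PstI-MspI would be equivalent to passing -c CTGCAG,CCGG.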

Additionally, if your reference genome is split among multiple FASTA files, you can use the -d argument to point to a directory that contains all of the reference genome FASTA files (and no other files). In this case the -g argument can be omitted. The path supplied to -d should be either a full path or a path relative to the directory given with the -w argument.

python exp_frag_size.py -s alignment.sam -d ..\genomesequence -o output.csv -w C:\Users\lvclark\Documents\alignmentfolder -c CTGCAG,CCGG
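The path rule for -d can be sketched in Python as follows. This is only an illustration of the behavior described above, not code taken from exp_frag_size.py, and the function names are hypothetical. It assumes that every file in the directory is treated as part of the reference genome, which is why the directory should contain no other files.

```python
import os


def resolve_genome_dir(working_dir, genome_dir):
    """Return genome_dir as-is if it is a full path, else relative to working_dir."""
    if os.path.isabs(genome_dir):
        return os.path.normpath(genome_dir)
    return os.path.normpath(os.path.join(working_dir, genome_dir))


def list_genome_files(working_dir, genome_dir):
    """List every file in the resolved genome directory, sorted by name."""
    resolved = resolve_genome_dir(working_dir, genome_dir)
    paths = [os.path.join(resolved, name) for name in os.listdir(resolved)]
    # Every regular file is kept, so stray non-FASTA files would be read too.
    return sorted(p for p in paths if os.path.isfile(p))
```

With the working directory and the relative -d value from the command above, `resolve_genome_dir` would point to a folder named genomesequence located next to alignmentfolder.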