AMD, an Automated Motif Discovery Tool

AMD is a de novo motif discovery tool for identifying short sequence patterns, or motifs, that are over-represented in a given set of DNA sequences compared to a user-defined background. These DNA sequences can be promoters or 3'-UTRs of co-expressed genes, or genomic regions targeted by a specific transcription factor. AMD is organized as a two-phase pipeline. In the first phase, core motifs represented by IUPAC consensus sequences are identified in a three-step procedure: filtering, degeneration and extension. In the second phase, these core motifs are refined to obtain the final motifs, represented as position weight matrices (PWMs).
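To make "IUPAC consensus" concrete, the following minimal sketch converts a consensus string into a regular expression and counts its occurrences on one strand of a sequence. It uses the standard IUPAC nucleotide codes; the function names `consensus_to_regex` and `count_sites` are made up for this illustration, and this is not how AMD itself scans sequences.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus into a compiled regular expression."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]
        # Single-base codes stay literal; degenerate codes become a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


def count_sites(consensus, sequence):
    """Count matches on the given strand, including overlapping ones."""
    # A zero-width lookahead lets finditer report overlapping occurrences.
    lookahead = re.compile("(?=" + consensus_to_regex(consensus).pattern + ")")
    return sum(1 for _ in lookahead.finditer(sequence))
```

For example, the consensus TGASTCA stands for both TGACTCA and TGAGTCA, because S denotes C or G.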

Installation

The compiled Windows and Linux releases of AMD can be found in the 'binary' folder. The Windows binary has been tested on Windows XP with Service Pack 2 (32-bit) and Windows 7, and the Linux binary has been tested on CentOS 5. In most cases the downloaded file is ready to run and no installation is required. To compile from source, type 'make' on the command line.

Preparing the Input

AMD requires two sequence files in FASTA format, which specify the foreground and the background, respectively. AMD recognizes only DNA sequences written in the uppercase ACGT alphabet; all other characters are ignored.
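Because only uppercase A, C, G and T are recognized, a FASTA file containing lowercase (for example soft-masked) bases may need to be converted first if those bases should be analyzed. The sketch below is one possible pre-processing step, not part of AMD: the helper name `normalize_fasta` is hypothetical, and it assumes that uppercasing is what you want for your data.

```python
def normalize_fasta(lines):
    """Uppercase sequence lines and count characters outside ACGT.

    Header lines (starting with '>') are kept unchanged and blank lines
    are dropped. Returns (cleaned_lines, n_other), where n_other is the
    number of characters that are not A, C, G or T after uppercasing.
    """
    cleaned = []
    n_other = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            cleaned.append(line)
        elif line:
            seq = line.upper()
            n_other += sum(1 for ch in seq if ch not in "ACGT")
            cleaned.append(seq)
    return cleaned, n_other
```

A large `n_other` count (for instance, long runs of N) is worth checking before the run, since such characters are ignored.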

Running the program

AMD is a command-line tool, so running it on Windows and on Linux is similar. The examples below use the Windows platform; on Linux, replace "AMD.exe" with "AMD.bin".

AMD.exe
  [-F] [Foreground file], required, the foreground sequence file in FASTA format.
  [-B] [Background file], required, the background sequence file in FASTA format.
	
  [-MI], optional, perform single-strand analysis (the default is double-stranded).
  [-T],  optional, the number of top k-mer core motifs considered in the first step of motif discovery (default: 50).
  [-CO], optional, the similarity cutoff used to eliminate motif redundancy in the final step of motif discovery (default: 0.6).
  [-FC], optional, the fold-change cutoff used to remove highly degenerate motifs (default: 1.2).

Here are two simple commands, for discovering transcription factor motifs and miRNA motifs, respectively:

AMD.exe -F Example_Fg.fasta -B Example_Bg.fasta
AMD.exe -F Example_Fg.fasta -B Example_Bg.fasta -MI
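When AMD is called from a script, the command line can be assembled programmatically. The following sketch builds the argument list from the options documented above. The function `build_amd_command` and its parameter names are hypothetical, and it assumes that -T, -CO and -FC each take their value as the next space-separated argument, which the option list does not state explicitly.

```python
def build_amd_command(foreground, background, executable="AMD.exe",
                      single_strand=False, top=None,
                      similarity_cutoff=None, fold_change_cutoff=None):
    """Return the AMD command line as a list of arguments.

    Options left as None are omitted, so AMD falls back to its defaults.
    Assumption: numeric options are passed as '-T 50', '-CO 0.6', '-FC 1.2'.
    """
    cmd = [executable, "-F", foreground, "-B", background]
    if single_strand:
        cmd.append("-MI")  # single-strand analysis
    if top is not None:
        cmd += ["-T", str(top)]
    if similarity_cutoff is not None:
        cmd += ["-CO", str(similarity_cutoff)]
    if fold_change_cutoff is not None:
        cmd += ["-FC", str(fold_change_cutoff)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the Python standard library; on Linux, pass `executable="AMD.bin"`.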

Reading the output

The program outputs two files, with the extensions Details and Matrix, respectively. The Details file is a five-column table giving the motif ID (column 1), the consensus of the core motif (column 2), the consensus of the final motif (column 3), the number of sites used to model the final motif (column 4) and the MAP score (column 5). The Matrix file gives the position weight matrix of each motif. A sequence logo for each motif can be generated with packages such as enoLOGOS (www.benoslab.pitt.edu/enologos).
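The five-column Details table can be loaded for downstream processing with a few lines of Python. This is a minimal sketch under stated assumptions: columns are separated by whitespace, column 4 is an integer and column 5 is a number, and any line that does not fit this shape (such as a possible header) is skipped. The names `MotifRecord` and `parse_details` are made up for this example.

```python
from collections import namedtuple

# Field names follow the column order described for the Details file.
MotifRecord = namedtuple(
    "MotifRecord",
    "motif_id core_consensus final_consensus n_sites map_score",
)


def parse_details(lines):
    """Parse Details lines into MotifRecord tuples.

    Assumes whitespace-separated columns; lines that do not have exactly
    five fields, or whose last two fields are not numeric, are skipped.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue
        try:
            n_sites = int(fields[3])
            map_score = float(fields[4])
        except ValueError:
            continue  # e.g. a header row
        records.append(
            MotifRecord(fields[0], fields[1], fields[2], n_sites, map_score)
        )
    return records
```

With the records in hand, motifs can be sorted by score, for instance with `sorted(records, key=lambda r: r.map_score, reverse=True)`.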

Citation

Shi J, Yang W, Chen M, Du Y, Zhang J, Wang K: AMD, an automated motif discovery tool using stepwise refinement of gapped consensuses. PLoS ONE 2011, 6:e24576.

Contacts

If you have any questions or feedback, please contact us at:

Email: jshi@jimmy.harvard.edu
