Several scripts for basic bioinformatics data manipulation

gavieira/python_bioinfo

Python scripts for bioinformatics data manipulation


mitodownloader.py

Downloads all RefSeq mitogenome records available for a given taxon

usage: mitodownloader.py [-h] [-f] TAXON_NAME

positional arguments:
  TAXON_NAME        Taxon name

optional arguments:
  -h, --help   show this help message and exit
  -f, --fasta  Downloads records in fasta format (default: genbank)

extract_large_contigs.py

Gets contig information from a multifasta file. Must be used with exactly one of three options (-c, -a, -r):

usage: python3 extract_large_contigs.py [-h] [-c | -a  | -r ] infile

-c, --count    Get a list of all contigs and their size
-a , --acc     Get a single contig by ID (provide the description line without the leading '>')
-r , --range   Get the sequences of all contigs whose length falls within a min-max range. Provide the lower and upper limits in the form '12000-18000'
-h, --help     show help message and exit
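
As a rough illustration of what the -c and -r options compute, the sketch below reads a multifasta file and reports contig lengths. It is not the script's actual code: the function names (`read_fasta`, `contig_lengths`, `contigs_in_range`) are hypothetical, and treating both range limits as inclusive is an assumption.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from the lines of a multifasta file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def contig_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each contig description to its sequence length (similar to -c)."""
    return {header: len(seq) for header, seq in read_fasta(lines)}


def contigs_in_range(lines: Iterable[str], limits: str) -> List[Tuple[str, str]]:
    """Return contigs whose length lies within 'min-max' (similar to -r).

    Assumption: both limits are inclusive.
    """
    low, high = (int(value) for value in limits.split("-"))
    return [(h, s) for h, s in read_fasta(lines) if low <= len(s) <= high]
```

With a file on disk, this could be called as `contig_lengths(open("contigs.fa"))`, where `contigs.fa` is a placeholder file name.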

gb_to_fa.py

Converts a single genbank file to fasta, printing its output to the screen.

usage: python3 gb_to_fa.py sequence.gb
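
The conversion can be sketched with the standard library alone, as below. This is only an approximation of the idea, not the script's implementation (which may well rely on a library such as Biopython): the function name `genbank_to_fasta` is hypothetical, the LOCUS name is used as the FASTA identifier by assumption, and uppercasing the sequence and wrapping at 70 columns are illustrative choices.

```python
def genbank_to_fasta(gb_text: str, width: int = 70) -> str:
    """Convert the text of a single GenBank record to a FASTA string."""
    name = "unknown"
    seq_parts = []
    in_origin = False
    for line in gb_text.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                name = fields[1]  # assumption: LOCUS name becomes the FASTA ID
        elif line.startswith("ORIGIN"):
            in_origin = True  # sequence lines follow the ORIGIN keyword
        elif line.startswith("//"):
            break  # end of record
        elif in_origin:
            # Drop the position numbers and spaces, keep only the residues.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
    seq = "".join(seq_parts).upper()
    wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + name] + wrapped)
```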

generate_phylip_from_multifasta.py

Aligns a multifasta file using Clustal Omega (at the moment, clustalo-1.2.4-Ubuntu-x86_64 must be on $PATH) and converts the alignment into a relaxed PHYLIP alignment (sequence identifiers may exceed 10 characters) with no line wrapping.

The phylip alignment output can be used for the generation of phylogenetic/phylogenomic trees using PartitionFinder2.

This script only works with sequences that are less than 1 Gbp in size.

usage: python3 generate_phylip_from_multifasta.py [-h] [-t] multifasta.fa

optional arguments:
  -h, --help    show this help message and exit
  -t , --type   Type of data: {Protein, RNA or DNA(default)}
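
To make the output format concrete, here is a minimal sketch of writing an already-aligned set of sequences as relaxed, unwrapped PHYLIP. It does not run Clustal Omega and is not the script's code: the function name `to_relaxed_phylip` is hypothetical, and replacing spaces in identifiers with underscores is an assumption made so that each name stays a single token.

```python
from typing import List, Tuple


def to_relaxed_phylip(alignment: List[Tuple[str, str]]) -> str:
    """Format (name, aligned_sequence) pairs as relaxed sequential PHYLIP."""
    if not alignment:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for _, seq in alignment}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n_sites = lengths.pop()
    # Pad names to a common width; relaxed PHYLIP has no 10-character limit.
    pad = max(len(name) for name, _ in alignment) + 2
    lines = [f"{len(alignment)} {n_sites}"]
    for name, seq in alignment:
        clean = name.replace(" ", "_")  # assumption: keep names as one token
        lines.append(f"{clean.ljust(pad)}{seq}")  # whole sequence on one line
    return "\n".join(lines) + "\n"
```

The header line holds the number of sequences and the alignment length, followed by one line per sequence.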

mitos_to_artemis.py

Converts .gff files generated by MitosWebServer to a modified .gff that can be exported to the Artemis Annotation tool.

usage: python3 mitos_to_artemis.py filename.gff

remove_score_seqin.py

Removes the "score" values present in annotations produced by MitosWebServer. These values must be removed from seqin files before mitochondrial sequences can be submitted to GenBank.

usage: python3 remove_score_seqin.py annotated_sequence.seqin

sam_to_fastq.py

Extracts reads (in fastq format) from a sam file.

usage: python3 sam_to_fastq.py [-h] [-P] file.sam

optional arguments:
  -h, --help    show this help message and exit
  -P, --paired  Generates two paired-end data files (unpaired reads included)
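
The core of such an extraction can be sketched as a function that turns one SAM alignment line into a FASTQ record. This is an illustration under stated assumptions rather than the script's logic: the name `sam_line_to_fastq` is hypothetical, and skipping secondary and supplementary alignments and restoring the original read orientation for reverse-strand alignments are choices made here. The flag bits themselves (0x10, 0x40, 0x80, 0x100, 0x800) come from the SAM specification.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line: str) -> Optional[Tuple[int, str]]:
    """Return (mate, fastq_record) for a SAM line, or None if it is skipped.

    mate is 1 or 2 for paired reads and 0 for unpaired reads.
    """
    if line.startswith("@"):
        return None  # header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None  # not a complete alignment line
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x900:
        # Assumption: skip secondary (0x100) and supplementary (0x800)
        # alignments so that each read is emitted only once.
        return None
    if flag & 0x10:
        # Read aligned to the reverse strand: restore original orientation.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
    return mate, f"@{name}\n{seq}\n+\n{qual}\n"
```

For paired output, records with `mate` equal to 1 and 2 would go to separate files; how unpaired reads are distributed is left to the caller.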

sra_download.py

Downloads a list of datasets in sra file format.

The sra_download.py script reads a text file (the list of sra datasets) that should contain two tab-separated columns, accession number and species name, as shown below:

ERR1306022      Species1
ERR7295165      Species2
ERR1306034      Species3
SRR4409513      Species4

At the moment, the wget Python package is required. Install it before running the script:

pip install wget

Script usage:

python3 sra_download.py dataset_list.txt
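
Reading the two-column dataset list can be sketched as follows. The function name `parse_sra_list` is hypothetical and the error handling is an assumption; the download step itself is left out because the URLs used by the script are not described here.

```python
from typing import Iterable, List, Tuple


def parse_sra_list(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse lines of 'accession<TAB>species' into (accession, species) pairs."""
    entries = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(
                f"line {number}: expected two tab-separated columns"
            )
        accession, species = (part.strip() for part in parts)
        entries.append((accession, species))
    return entries
```

A space-separated file would be rejected by this sketch, which can help catch lists that were not saved with tab separators.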

split_multigenbank.py

Splits a multigenbank file into individual records, generating a genbank file (name_of_species.gb) for each.

Script usage:

python3 split_multigenbank.py multirecord_file.gb
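
The splitting step can be sketched by cutting the file at the `//` lines that terminate each GenBank record. This is not the script's code: the function names are hypothetical, and deriving the output file name from the ORGANISM line with spaces replaced by underscores is an assumption based on the name_of_species.gb pattern mentioned above.

```python
from typing import List


def split_genbank_records(text: str) -> List[str]:
    """Split multi-record GenBank text into one string per record."""
    records, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "//":  # record terminator
            records.append("\n".join(current) + "\n")
            current = []
    # Keep trailing content that lacks a terminator, unless it is only blank.
    if any(line.strip() for line in current):
        records.append("\n".join(current) + "\n")
    return records


def species_filename(record: str) -> str:
    """Build an output name from the ORGANISM line (assumed naming scheme)."""
    for line in record.splitlines():
        if line.strip().startswith("ORGANISM"):
            parts = line.split(None, 1)
            if len(parts) == 2:
                return parts[1].strip().replace(" ", "_") + ".gb"
    return "unknown.gb"
```

Note that two records from the same species would map to the same file name in this sketch, so a real implementation would need some way to keep them apart.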

And many other scripts...
