Python scripts for bioinformatics data manipulation

mitodownloader.py

Downloads all RefSeq mitogenome records available for a given taxon

usage: mitodownloader.py [-h] [-f] TAXON_NAME

positional arguments:
  TAXON_NAME        Taxon name

optional arguments:
  -h, --help   show this help message and exit
  -f, --fasta  Downloads records in fasta format (default: genbank)

extract_large_contigs.py

Gets contig information from a multifasta file. Has to be used with one of three options (-c, -a, -r):

usage: python3 extract_large_contigs.py [-h] [-c | -a  | -r ] infile

-c, --count    Get a list of all contigs and their size
-a , --acc     Get a single contig by ID (please provide description line without '>')
-r , --range   Get the sequences of all contigs whose length falls within a min-max range. Provide the lower and upper limits joined by a hyphen, such as '12000-18000'
-h, --help     show help message and exit
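
The counting and range options can be illustrated with a short sketch. This is not the script's actual code; the function names (read_fasta, contig_sizes, contigs_in_range) are hypothetical, and the range is assumed to be inclusive at both ends.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def contig_sizes(handle):
    """Return a list of (header, length) for every contig (like -c)."""
    return [(header, len(seq)) for header, seq in read_fasta(handle)]


def contigs_in_range(handle, size_range):
    """Return contigs whose length lies in 'min-max', inclusive (like -r)."""
    low, high = (int(value) for value in size_range.split("-"))
    return [(h, s) for h, s in read_fasta(handle) if low <= len(s) <= high]


if __name__ == "__main__":
    example = ">contig1 example\nACGT\nAC\n>contig2\nA\n"
    print(contig_sizes(io.StringIO(example)))
    print(contigs_in_range(io.StringIO(example), "2-10"))
```

Reading the file record by record keeps only one contig in memory at a time, which matters for large assemblies.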

gb_to_fa.py

Converts a single genbank file to fasta, printing its output to the screen.

usage: python3 gb_to_fa.py sequence.gb
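
As a rough illustration of the conversion, the sketch below reads the ACCESSION, DEFINITION and ORIGIN parts of a genbank record using only the standard library. It is an assumption about how such a conversion can be done, not the script's own code: the function name genbank_to_fasta is hypothetical, only the first DEFINITION line is kept, and the sequence is upper-cased by choice.

```python
def genbank_to_fasta(text, width=70):
    """Convert the text of one genbank record into a fasta-formatted string."""
    name = "unknown"
    description = ""
    residues = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end of the record
        if in_origin:
            # Sequence lines look like "       61 acgtacgtac gt": keep letters only.
            residues.append("".join(c for c in line if c.isalpha()))
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("ACCESSION"):
            fields = line.split()
            if len(fields) > 1:
                name = fields[1]
        elif line.startswith("DEFINITION"):
            # Only the first DEFINITION line is used in this sketch.
            description = line[len("DEFINITION"):].strip()
    sequence = "".join(residues).upper()
    header = f">{name} {description}".rstrip()
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + wrapped) + "\n"


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        print(genbank_to_fasta(handle.read()), end="")
```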

generate_phylip_from_multifasta.py

Aligns a multifasta file using clustal omega (at the moment, the clustalo-1.2.4-Ubuntu-x86_64 binary must be on $PATH) and converts the alignment into a relaxed phylip alignment with no line wrapping. "Relaxed" means that sequence identifiers may be longer than 10 characters.

The phylip alignment output can be used for the generation of phylogenetic/phylogenomic trees using PartitionFinder2.

This script only works with sequences that are less than 1 Gbp in size.

usage: python3 generate_phylip_from_mutlifasta.py [-h] [-t] multifasta.fa

optional arguments:
  -h, --help    show this help message and exit
  -t , --type   Type of data: {Protein, RNA or DNA (default)}
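
To show what a relaxed, unwrapped phylip file looks like, here is a minimal sketch that writes one from sequences that are already aligned. It does not call clustal omega and is not the script's own code; the function name to_relaxed_phylip and the choice to replace whitespace in identifiers with underscores are assumptions.

```python
def to_relaxed_phylip(alignment):
    """Format a list of (name, aligned_sequence) pairs as relaxed phylip text."""
    if not alignment:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for _, seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # Whitespace separates the identifier from the sequence, so remove it from names.
    names = ["_".join(name.split()) for name, _ in alignment]
    pad = max(len(name) for name in names) + 2
    # Header line: number of sequences and alignment length.
    lines = [f" {len(alignment)} {lengths.pop()}"]
    for name, (_, seq) in zip(names, alignment):
        # One line per sequence: full-length identifier, then the unwrapped sequence.
        lines.append(f"{name.ljust(pad)}{seq}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [("Species_one_long_name", "AC-T"), ("sp2", "ACGT")]
    print(to_relaxed_phylip(demo), end="")
```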

mitos_to_artemis.py

Converts .gff files generated by MitosWebServer to a modified .gff that can be exported to the Artemis Annotation tool.

usage: python3 mitos_to_artemis.py filename.gff

remove_score_seqin.py

Removes the "score" values present in annotations generated by MitosWebServer. Removing these values from seqin files is necessary to submit mitochondrial sequences to genbank.

usage: python3 remove_score_seqin.py annotated_sequence.seqin

sam_to_fastq.py

Extracts reads (in fastq format) from a sam file.

usage: python3 sam_to_fastq.py [-h] [-P] file.sam

optional arguments:
  -h, --help    show this help message and exit
  -P, --paired  Generates two paired-end data files (unpaired reads included)
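
The core of a sam-to-fastq conversion can be sketched as below. This is an illustration under stated assumptions, not the script's code: the function name sam_line_to_fastq is hypothetical, secondary and supplementary alignments are skipped, records without stored sequence or qualities are skipped, and "/1" and "/2" suffixes are added to paired reads by choice. The flag bits used (0x10, 0x40, 0x80, 0x100, 0x800) come from the SAM specification.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Return a fastq record for one sam alignment line, or None to skip it."""
    if line.startswith("@"):
        return None  # header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None  # not a complete alignment line
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x900:
        return None  # secondary (0x100) or supplementary (0x800) alignment
    if seq == "*" or qual == "*":
        return None  # sequence or qualities not stored
    if flag & 0x10:
        # Reverse-strand alignments store the reverse complement of the read,
        # so restore the original orientation.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    if flag & 0x40:
        name += "/1"  # first read of a pair
    elif flag & 0x80:
        name += "/2"  # second read of a pair
    return f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for sam_line in handle:
            record = sam_line_to_fastq(sam_line)
            if record is not None:
                sys.stdout.write(record)
```

For paired output, the same flag bits (0x40 and 0x80) could be used to route each record to one of two files.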

sra_download.py

Downloads a list of datasets in sra file format.

The sra_download.py script reads a text file (the list of sra datasets) that should contain two tab-separated columns, accession number and species name, as shown below:

ERR1306022      Species1
ERR7295165      Species2
ERR1306034      Species3
SRR4409513      Species4
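
Reading a list in this format can be sketched as follows. The function name parse_dataset_list is hypothetical and this is not the script's own code; it assumes blank lines should be ignored and that a line without a tab is an error.

```python
def parse_dataset_list(lines):
    """Return (accession, species) tuples from tab-separated lines."""
    entries = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        accession, separator, species = line.partition("\t")
        if not separator or not species.strip():
            raise ValueError(f"line {number}: expected 'accession<TAB>species'")
        entries.append((accession.strip(), species.strip()))
    return entries


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for accession, species in parse_dataset_list(handle):
            print(accession, species)
```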

At the moment, the wget Python package is required. Please install it before running the script:

pip install wget

Script usage:

python3 sra_download.py dataset_list.txt

split_multigenbank.py

Splits a multigenbank file into individual records, generating a genbank file (name_of_species.gb) for each.

Script usage:

python3 split_multigenbank.py multirecord_file.gb
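
The splitting step can be sketched with the standard library alone, relying on the "//" line that terminates every genbank record. This is an illustration, not the script's code: the function names are hypothetical, the species name is assumed to come from the ORGANISM line, and spaces in it are replaced with underscores.

```python
def species_name(record_lines):
    """Return the ORGANISM name of a record with spaces replaced by underscores."""
    for line in record_lines:
        stripped = line.strip()
        if stripped.startswith("ORGANISM"):
            name = stripped[len("ORGANISM"):].strip()
            return "_".join(name.split())
    return "unknown"


def split_genbank_records(text):
    """Yield (species_name, record_text) for each '//'-terminated record."""
    current = []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            yield species_name(current), "".join(current)
            current = []
    # Trailing text without a closing '//' is ignored in this sketch.


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for name, record in split_genbank_records(handle.read()):
            # Records sharing a species name would overwrite each other here.
            with open(f"{name}.gb", "w") as out:
                out.write(record)
```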

And many other scripts...
