scrapinghub/shub-workflow


A set of tools for controlling processing workflows with spiders and scripts running in Scrapinghub ScrapyCloud.

Installation

pip install shub-workflow

If you want S3 tools support:

pip install shub-workflow[with-s3-tools]

For Google Cloud Storage tools support:

pip install shub-workflow[with-gcs-tools]

Usage

Check the project Wiki for documentation. The code tests also provide many usage examples.
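To give a flavor of what "controlling a processing workflow" means, here is a minimal, self-contained sketch of the general idea: jobs (spiders or scripts) declared with dependencies and executed in dependency order. This is NOT shub-workflow's actual API; the Task and WorkflowManager names below are hypothetical illustrations only. See the Wiki for the real interfaces.

```python
from collections import deque

# Hypothetical names for illustration; shub-workflow's real API differs.
class Task:
    def __init__(self, name, depends_on=()):
        self.name = name
        self.depends_on = tuple(depends_on)  # names of tasks that must finish first

class WorkflowManager:
    def __init__(self, tasks):
        self.tasks = {t.name: t for t in tasks}

    def run_order(self):
        """Return task names in a valid execution order (topological sort)."""
        indegree = {name: len(t.depends_on) for name, t in self.tasks.items()}
        dependents = {name: [] for name in self.tasks}
        for t in self.tasks.values():
            for dep in t.depends_on:
                dependents[dep].append(t.name)
        ready = deque(sorted(n for n, d in indegree.items() if d == 0))
        order = []
        while ready:
            name = ready.popleft()
            order.append(name)
            for child in dependents[name]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        if len(order) != len(self.tasks):
            raise ValueError("dependency cycle detected")
        return order

manager = WorkflowManager([
    Task("crawl"),
    Task("deliver", depends_on=["crawl"]),
    Task("clean", depends_on=["deliver"]),
])
print(manager.run_order())  # → ['crawl', 'deliver', 'clean']
```

In the real library, each task would correspond to a ScrapyCloud spider or script job rather than a local function call.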

Note

The requirements for this library are defined in setup.py, as usual. The Pipfile files in the repository don't define dependencies; they are only used to set up a development environment for shub-workflow library development and testing.

For developers

To set up a development environment for shub-workflow, the package comes with Pipfile and Pipfile.lock files. Clone or fork the repository and run:

> pipenv install --dev
> cp pre-commit .git/hooks/

to install the environment, and:

> pipenv shell

to activate it.

There is a script, lint.sh, that you can run from the repo root folder whenever needed; it is also executed on each git commit (provided you installed the pre-commit hook during the setup step described above). It checks PEP 8 compliance and typing integrity via flake8 and mypy.

> ./lint.sh