Jarbas — a tool for Serenata de Amor
Jarbas is part of Serenata de Amor — we fight corruption with data science.
Jarbas is in charge of making data from CEAP more accessible. In the near future Jarbas will show what Rosie thinks of each reimbursement made for our congresspeople.
Each Reimbursement object is a reimbursement claimed by a congressperson and identified publicly by its document_id.
Details from a specific reimbursement. If receipt_url wasn't fetched yet, the server won't try to fetch it automatically.
URL of the digitized version of the receipt of this specific reimbursement.
If receipt_url wasn't fetched yet, the server will try to fetch it automatically.
If you append the parameter force (i.e. GET /api/chamber_of_deputies/reimbursement/<document_id>/receipt/?force=1) the server will re-fetch the receipt URL.
Not all receipts are available, so this URL can be null.
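As a minimal sketch of how the force parameter changes the request, the helper below builds the receipt URL described above. The function name receipt_url and the base URL http://localhost:8000 are assumptions for illustration; only the path and the force=1 query string come from this README.

```python
def receipt_url(base_url, document_id, force=False):
    """Build the receipt endpoint URL for a single reimbursement.

    base_url is an assumption: wherever your Jarbas instance is running.
    """
    url = "{}/api/chamber_of_deputies/reimbursement/{}/receipt/".format(
        base_url.rstrip("/"), document_id
    )
    if force:
        # Ask the server to re-fetch the receipt URL
        url += "?force=1"
    return url


print(receipt_url("http://localhost:8000", 111111))
print(receipt_url("http://localhost:8000", 111111, force=True))
```

Whatever HTTP client you use, remember that the receipt URL in the response can be null.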
Lists all reimbursements.
All these endpoints accept any combination of the following parameters:
- applicant_id
- cnpj_cpf
- document_id
- issue_date_start (inclusive)
- issue_date_end (exclusive)
- month
- subquota_id
- suspicions (boolean, 1 parses to True, 0 to False)
- has_receipt (boolean, 1 parses to True, 0 to False)
- year
- order_by: issue_date (default) or probability (both descending)
- in_latest_dataset (boolean, 1 parses to True, 0 to False)
For example:
GET /api/chamber_of_deputies/reimbursement/?year=2016&cnpj_cpf=11111111111111&subquota_id=42&order_by=probability
This request will list:
- all 2016 reimbursements
- made at the supplier with the CNPJ 11.111.111/1111-11
- made according to the subquota with the ID 42
- sorted by the highest probability
You can also pass more than one value per field (e.g. document_id=111111,222222).
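The filters above can be combined programmatically. The following is a minimal sketch, assuming a Jarbas instance at http://localhost:8000; the helper name reimbursement_list_url is hypothetical. It turns booleans into 1/0 and joins multiple values with commas, as described above.

```python
from urllib.parse import urlencode


def reimbursement_list_url(base_url, **filters):
    """Build the reimbursement list URL from keyword filters.

    base_url is an assumption: wherever your Jarbas instance is running.
    """
    params = {}
    for name, value in filters.items():
        if isinstance(value, bool):
            value = int(value)  # True -> 1, False -> 0
        elif isinstance(value, (list, tuple)):
            # More than one value per field: comma-separated
            value = ",".join(str(item) for item in value)
        params[name] = value
    # safe="," keeps the commas readable instead of encoding them as %2C
    query = urlencode(params, safe=",")
    return "{}/api/chamber_of_deputies/reimbursement/?{}".format(
        base_url.rstrip("/"), query
    )


print(reimbursement_list_url(
    "http://localhost:8000",
    year=2016,
    cnpj_cpf="11111111111111",
    subquota_id=42,
    order_by="probability",
))
```

Keyword arguments keep their order in the query string, so the call above reproduces the example request shown earlier.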
Lists all reimbursements of expenses from the same day as document_id.
Subquotas are categories of expenses for which congresspeople can be reimbursed.
Lists all subquotas names and IDs.
Accepts a case-insensitive LIKE filter as the q URL parameter (e.g. GET /api/chamber_of_deputies/subquota/?q=meal lists all subquotas that have meal in their names).
An applicant is the person (congressperson or the leadership of a party or government) who claimed the reimbursement.
Lists all names of applicants together with their IDs.
Accepts a case-insensitive LIKE filter as the q URL parameter (e.g. GET /api/chamber_of_deputies/applicant/?q=lideranca lists all applicants that have lideranca in their names).
A company is a Brazilian company where congresspeople have made expenses and claimed reimbursement for them.
This endpoint gets the info we have for a specific company. The endpoint expects a cnpj (i.e. the CNPJ of a Company object, digits only). It returns 404 if the company is not found.
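Since the endpoint expects digits only, a formatted CNPJ such as 11.111.111/1111-11 has to be stripped of punctuation first. The helper below is a minimal sketch of that step; the name cnpj_digits is hypothetical and the length check assumes the standard 14-digit CNPJ.

```python
import re


def cnpj_digits(cnpj):
    """Return only the digits of a CNPJ, e.g. for use in the company endpoint."""
    digits = re.sub(r"\D", "", cnpj)  # drop dots, slash and dash
    if len(digits) != 14:
        raise ValueError("A CNPJ must have 14 digits, got {!r}".format(cnpj))
    return digits


print(cnpj_digits("11.111.111/1111-11"))
```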
There is also a tapioca-wrapper for the API. The tapioca-jarbas can be installed with pip install tapioca-jarbas and can be used to access the API in any Python script.
Copy contrib/.env.sample as .env in the project's root folder and adjust your settings. These are the main variables:
- DEBUG (bool) enable or disable Django debug mode
- GOSS_VERSION (str) Version for Goss tester in Docker
- SECRET_KEY (str) Django's secret key
- ALLOWED_HOSTS (str) Django's allowed hosts
- USE_X_FORWARDED_HOST (bool) Whether to use the X-Forwarded-Host header
- CACHE_BACKEND (str) Cache backend (e.g. django.core.cache.backends.memcached.MemcachedCache)
- CACHE_LOCATION (str) Cache location (e.g. localhost:11211)
- SECURE_PROXY_SSL_HEADER (str) Django secure proxy SSL header (e.g. HTTP_X_FORWARDED_PROTO,https is transformed into the tuple ('HTTP_X_FORWARDED_PROTO', 'https'))
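The bool variables and the comma-separated form of SECURE_PROXY_SSL_HEADER above can be read from the environment roughly as follows. This is a minimal sketch, not Jarbas's actual settings code: the helper names env_bool and env_tuple, and the accepted spellings for true values, are assumptions for illustration.

```python
import os

# Assumed spellings that count as "true"; adjust to your own convention
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_bool(name, default=False, environ=os.environ):
    """Read a boolean setting such as DEBUG from the environment."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


def env_tuple(name, default=(), environ=os.environ):
    """Read a comma-separated setting as a tuple of strings."""
    raw = environ.get(name)
    if not raw:
        return tuple(default)
    return tuple(part.strip() for part in raw.split(","))


# A plain dict stands in for the real environment in this example
sample = {
    "DEBUG": "True",
    "SECURE_PROXY_SSL_HEADER": "HTTP_X_FORWARDED_PROTO,https",
}
print(env_bool("DEBUG", environ=sample))
print(env_tuple("SECURE_PROXY_SSL_HEADER", environ=sample))
```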
- DATABASE_URL (str) Database URL, must be PostgreSQL since Jarbas uses JSONField
- CELERY_BROKER_URL (str) Celery compatible message broker URL (e.g. amqp://guest:guest@localhost//)
- AMAZON_S3_BUCKET (str) Name of the Amazon S3 bucket to look for datasets (e.g. serenata-de-amor-data)
- AMAZON_S3_REGION (str) Region of the Amazon S3 (e.g. s3-sa-east-1)
- AMAZON_S3_CEAPTRANSLATION_DATE (str) File name prefix for dataset guide (e.g. 2016-08-08 for 2016-08-08-ceap-datasets.md)
- GOOGLE_ANALYTICS (str) Google Analytics tracking code (e.g. UA-123456-7)
- GOOGLE_STREET_VIEW_API_KEY (str) Google Street View Image API key
- TWITTER_CONSUMER_KEY (str) Twitter API key
- TWITTER_CONSUMER_SECRET (str) Twitter API secret
- TWITTER_ACCESS_TOKEN (str) Twitter access token
- TWITTER_ACCESS_SECRET (str) Twitter access token secret
To get these credentials, follow the python-twitter instructions.
- VIRTUAL_HOST_WEB (str) host used for the HTTPS certificate (for testing production settings locally you might need to add this host name to your /etc/hosts)
- LETSENCRYPT_EMAIL (str) Email used to create the HTTPS certificate at Let's Encrypt
- HTTPS_METHOD (str) if set to noredirect does not redirect from HTTP to HTTPS (default: redirect)
With Docker and Docker Compose there are two possible environment combinations:
- Development: simply running docker-compose … will trigger docker-compose.yml and docker-compose.override.yml with the optimum configuration for developing, such as:
  - automatically serving static files through Django
  - restarting Django on Python file changes
  - rebuilding JS from Elm files on save
  - skipping server cache
- Production: passing a specific configuration as docker-compose -f docker-compose.yml -f docker-compose.prod.yml … will launch a more robust environment with production in mind, including, among others:
  - nginx in front of Django
  - server-side cache with memcached
  - manually generating JS after edits on Elm files
  - manually running the collectstatic command if static files change
  - manually restarting the server on change
That said, the instructions here keep it simple and run with the development setup. To switch to production, always add -f docker-compose.yml -f docker-compose.prod.yml after docker-compose.
When using the production settings, remember to double check the appropriate environment variables and to create a .env.prod (separate from .env) to hold production-only values.
$ docker-compose up -d
Creating the database and applying migrations:
$ docker-compose run --rm django migrate
Seeding it with sample data:
$ docker-compose run --rm django reimbursements /mnt/data/reimbursements_sample.xz
$ docker-compose run --rm django companies /mnt/data/companies_sample.xz
$ docker-compose run --rm django suspicions /mnt/data/suspicions_sample.xz
$ docker-compose run --rm django tweets
If you're interested in having a database full of data, you can get the datasets by running Rosie.
To add a fresh new reimbursements.xz or suspicions.xz brewed by Rosie, or a companies.xz you've got from the toolbox, you just need to copy these files to contrib/data and refer to them inside the container from the path /mnt/data/.
For text search in the dashboard:
$ docker-compose run --rm django searchvector
You can access Jarbas at localhost:8000 in development mode or localhost in production mode.
$ docker-compose run --rm django reimbursements path/to/my/fresh_new_reimbursements.xz
To change any of the default environment variables defined in docker-compose.yml, just export them as local environment variables; Jarbas will pick them up when you run it.
Not sure? Test it!
$ docker-compose run --rm django check
$ docker-compose run --rm django test
Jarbas requires Python 3.5, Node.js 8, RabbitMQ 3.6, and PostgreSQL 9.6. Once you have pip and npm available, install the dependencies:
$ npm install
$ ./node_modules/.bin/elm-package install --yes # this might not be necessary https://github.com/npm/npm/issues/17316
$ python -m pip install -r requirements-dev.txt
In some Linux distros lzma is not installed by default. You can check whether you have it with $ python -m lzma. On Debian-based systems you can fix that with $ apt-get install liblzma-dev, or on macOS with $ brew install xz, but you might have to re-compile your Python.
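An alternative check from inside Python is to import the module and run a small round trip. This is only a sketch; the function name lzma_available is hypothetical.

```python
try:
    import lzma
except ImportError:
    # Python was built without lzma support
    lzma = None


def lzma_available():
    """Return True if this Python can compress and decompress .xz data."""
    if lzma is None:
        return False
    sample = b"jarbas"
    return lzma.decompress(lzma.compress(sample)) == sample


print("lzma available:", lzma_available())
```

If this reports that lzma is not available, install the system library and rebuild Python as described above.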
Setting up your environment variables basically means copying contrib/.env.sample as .env in the project's root folder, but there is an entire section on that.
Once you're done with requirements, dependencies and settings, create the basic database structure:
$ python manage.py migrate
To load data you need RabbitMQ running and a Celery worker:
$ celery worker --app jarbas
Now you can load the data from our datasets and get some other data as static files:
$ python manage.py reimbursements <path to reimbursements.xz>
$ python manage.py suspicions <path to suspicions.xz file>
$ python manage.py companies <path to companies.xz>
$ python manage.py tweets
$ python manage.py ceapdatasets
There are sample files to seed your database inside contrib/data/. You can get full datasets by running Rosie or directly with the toolbox.
For text search in the dashboard:
$ python manage.py searchvector
We generate assets through Node.js, so run it before Django collects static files:
$ npm run assets
$ python manage.py collectstatic
Not sure? Test it!
$ python manage.py check
$ python manage.py test
Run the server with $ python manage.py runserver and load localhost:8000 in your favorite browser.
If you would like to access the Django Admin for an alternative view of the reimbursements, you can access it at localhost:8000/admin/ after creating a user with:
$ python manage.py createsuperuser