entityfactspicturesmetadataharvester is a command-line tool (a Python 3 program) that reads depiction information (image URLs) from given EntityFacts sheets* (as line-delimited JSON records) and retrieves the Wikimedia Commons file metadata of these pictures (as line-delimited JSON records).
*) EntityFacts are "fact sheets" on entities of the Integrated Authority File (GND), which is provided by the German National Library (DNB).
It reads EntityFacts sheets as line-delimited JSON records from stdin.
It writes the Wikimedia Commons file metadata of each picture to stdout, one line-delimited JSON record per picture.
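To illustrate the input side, here is a minimal sketch of reading line-delimited JSON and picking out depiction URLs. It is not the tool's actual code: the field names "depiction", "url" and "@id", the helper name iter_depiction_urls, and the sample records are assumptions made for illustration, and the real sheet layout may differ.

```python
import io
import json


def iter_depiction_urls(lines):
    """Yield (sheet identifier, depiction URL) pairs from line-delimited JSON.

    Hypothetical helper: the field names used here are assumptions about
    the EntityFacts sheet layout, not taken from the tool itself.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        sheet = json.loads(line)
        depiction = sheet.get("depiction")  # assumed field name
        if not isinstance(depiction, dict):
            continue  # sheet carries no depiction information
        url = depiction.get("url") or depiction.get("@id")  # assumed field names
        if url:
            yield sheet.get("@id"), url


# Made-up sample input: one sheet with a depiction, one without.
sample = io.StringIO(
    '{"@id": "https://d-nb.info/gnd/0000000-0", '
    '"depiction": {"url": "https://example.org/File:Example.jpg"}}\n'
    '{"@id": "https://d-nb.info/gnd/0000001-0"}\n'
)
pairs = list(iter_depiction_urls(sample))
print(pairs)
```

In a real pipeline, sys.stdin would take the place of the in-memory sample, since the tool reads its records from stdin.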
entityfactspicturesmetadataharvester
optional arguments:
-h, --help show this help message and exit
example: entityfactspicturesmetadataharvester < [INPUT LINE-DELIMITED JSON FILE WITH ENTITYFACTS SHEETS] > [OUTPUT PICTURES METADATA LINE-DELIMITED JSON FILE]
Each metadata record in the result includes the GND identifier of the EntityFacts sheet in which the picture link was found. You can find it in the field 'gnd_id'.
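As a sketch of how the 'gnd_id' field might be used downstream, the following groups metadata records from the output file by GND identifier. Only 'gnd_id' comes from this description; the helper name group_by_gnd_id, the "title" field, and the sample records are hypothetical.

```python
import io
import json
from collections import defaultdict


def group_by_gnd_id(lines):
    """Group line-delimited JSON metadata records by their 'gnd_id' field."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        grouped[record.get("gnd_id")].append(record)
    return dict(grouped)


# Made-up sample output records; "title" is a hypothetical metadata field.
sample = io.StringIO(
    '{"gnd_id": "0000000-0", "title": "File:A.jpg"}\n'
    '{"gnd_id": "0000001-0", "title": "File:B.jpg"}\n'
    '{"gnd_id": "0000000-0", "title": "File:C.jpg"}\n'
)
by_gnd = group_by_gnd_id(sample)
print({gnd_id: len(records) for gnd_id, records in by_gnd.items()})
```

To process a real result, open the output file written by the command and pass the file object in place of the in-memory sample.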
- clone this git repo or just download the entityfactspicturesmetadataharvester.py file
- run ./entityfactspicturesmetadataharvester.py
- for a hackish way to use entityfactspicturesmetadataharvester system-wide, copy it to /usr/local/bin
sudo -H pip3 install --upgrade [ABSOLUTE PATH TO YOUR LOCAL GIT REPOSITORY OF ENTITYFACTSPICTURESMETADATAHARVESTER]
(which provides you with entityfactspicturesmetadataharvester as a system-wide command-line tool)
- entityfactssheetsharvester - a command-line tool (a Python 3 program) that retrieves EntityFacts sheets for the GND identifiers in a given CSV file and returns them as line-delimited JSON records
- entityfactspicturesharvester - a command-line tool (a Python 3 program) that reads depiction information (image URLs) from given EntityFacts sheets (as line-delimited JSON records) and retrieves and stores the pictures and thumbnails referenced in this information