Python module to scrape information from a PDF file with different data types (e.g. tables, graphs) and extract the largest number it can find.

casychow/pdf_scraper_extract_largest_num

PDF Scraper Extract Largest Num

Goal: Find the largest number in a large PDF document. The unit is not important (it could be dollars, years, pounds, etc.); the program outputs the greatest numerical value in the document.
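
Once the text has been pulled out of the PDF, the goal above can be sketched as follows. This is a minimal illustration, not necessarily how the script in this repository does it. The names `NUMBER_RE` and `largest_number` are hypothetical, and the sketch assumes US-style formatting (commas as thousands separators, a period as the decimal point) and ignores signs, so a range such as "10-20" is read as 10 and 20.

```python
import re

# Hypothetical pattern: digits with optional thousands separators
# (e.g. "1,234,567"), followed by an optional decimal part.
# Signs are deliberately ignored (assumption: the largest value is positive).
NUMBER_RE = re.compile(r"(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?")


def largest_number(text):
    """Return the largest numeric value found in text, or None if there is none."""
    values = [
        float(match.group().replace(",", ""))  # strip separators before converting
        for match in NUMBER_RE.finditer(text)
    ]
    return max(values) if values else None
```

For example, `largest_number("Revenue was $1,234,567.89 in 2021")` picks the revenue figure over the year, because units and currency symbols are never inspected.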

How to Run Software

  1. Download and install Python.
  2. Install the PyPDF2 library: pip install PyPDF2
  3. Ensure that the PDF file is in the same directory as the Python script.
  4. Run the program using either method:
    • In a terminal, change the working directory to the one containing the Python script and the PDF file, then run python pdf_extract_largest_num.py, or
    • Open a code editor that can run Jupyter notebook files and press "Run all" on the pdf_extract_largest_num.ipynb file.
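
For context on why step 2 installs PyPDF2: the library is what reads text out of the PDF. The sketch below shows one way to do that page by page with `PdfReader`. It is an illustration under assumptions, not the repository's actual code: the function name `page_texts` and the file name `document.pdf` are hypothetical.

```python
def page_texts(pdf_path):
    """Yield the extracted text of each page in the PDF at pdf_path."""
    # Imported here so the dependency is only needed when a PDF is read;
    # requires `pip install PyPDF2` (step 2 above).
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    for page in reader.pages:
        # Fall back to an empty string if no text comes back for a page.
        yield page.extract_text() or ""


# Hypothetical usage, with the PDF in the same directory as the script (step 3):
# for number, text in enumerate(page_texts("document.pdf"), start=1):
#     print(number, len(text))
```

Note that `extract_text()` only returns a page's text layer, so numbers that appear solely inside an image (for instance a graph embedded as a picture) would not be seen by this approach.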
