🇬🇧

📚 Book_Scraper

A full-featured Python script that automatically scrapes books from the Books to Scrape site, exports the data to CSV files by category, and downloads the cover images.

🚀 Features

✅ Scrapes all book categories
✅ Extracts each book's title, price, availability, rating, description, and image
✅ Exports one CSV file per category into the scripts/output_data/ folder
✅ Optionally downloads cover images into subfolders
✅ Automatically navigates through pages (pagination)
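
The pagination step can be pictured as a loop that follows a "next" link until none is left. The snippet below is only a minimal sketch of that idea, not the code from this repository: `iter_page_urls` and the `find_next_href` callback are hypothetical names, and the category URL is an illustrative assumption. In the real script the callback would typically fetch the page with Requests and look up the link with BeautifulSoup.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin


def iter_page_urls(
    start_url: str,
    find_next_href: Callable[[str], Optional[str]],
    max_pages: int = 1000,
) -> Iterator[str]:
    """Yield start_url, then each following page until no "next" link is found.

    find_next_href is a hypothetical callback: given a page URL, it returns the
    (possibly relative) href of the "next" link, or None on the last page.
    """
    url: Optional[str] = start_url
    for _ in range(max_pages):  # hard cap guards against accidental link loops
        if url is None:
            return
        yield url
        next_href = find_next_href(url)
        # Relative hrefs such as "page-2.html" are resolved against the current page.
        url = urljoin(url, next_href) if next_href else None


if __name__ == "__main__":
    # Assumed URL layout, used here only to show how relative links resolve.
    base = "https://books.toscrape.com/catalogue/category/books/mystery_3/"
    fake_links = {base + "index.html": "page-2.html"}
    for page in iter_page_urls(base + "index.html", fake_links.get):
        print(page)
```

Passing the link lookup in as a callback keeps the loop itself free of network access, so it can be exercised with a plain dictionary as shown.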

🛠️ Installation

git clone https://github.com/dim-gggl/Book_Scraper.git
cd Book_Scraper

Then, create a virtual environment:

python3 -m venv venv
source venv/bin/activate

And install the dependencies:

pip install -r requirements.txt

🧑‍💻 Usage

source venv/bin/activate
python3 main.py

Follow the instructions in the terminal menu.

📝 CSV files and images are automatically generated inside the scripts/output_data folder.
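
As a rough illustration of the per-category export, the sketch below groups scraped rows by their `category` field and writes one CSV per group with the standard `csv` module. It is not the repository's actual code: `export_by_category` and `slugify` are hypothetical helpers, the file naming scheme is an assumption, and the column names are taken from the Sample Data section.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path

# Column names as listed in the Sample Data section.
FIELDNAMES = [
    "universal_product_code", "title", "price_including_tax",
    "price_excluding_tax", "number_available", "category", "review_rating",
]


def slugify(name: str) -> str:
    """Turn a category name into a safe file stem (assumed naming scheme)."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def export_by_category(books, output_dir):
    """Write one CSV per category into output_dir and return the created paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    grouped = defaultdict(list)
    for book in books:
        grouped[book["category"]].append(book)

    written = []
    for category, rows in sorted(grouped.items()):
        path = out / f"{slugify(category)}.csv"
        # newline="" is what the csv module expects; utf-8 keeps the £ sign intact.
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(
                handle, fieldnames=FIELDNAMES, extrasaction="ignore"
            )
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written
```

Calling `export_by_category(books, "scripts/output_data")` with a list of dictionaries would create files such as `historical_fiction.csv` under this naming assumption.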

📂 Simplified Structure

Book_Scraper/
├── scripts/
│   ├── __init__.py
│   ├── phase_1.py
│   ├── phase_2.py
│   ├── phase_3.py
│   ├── phase_4.py
│   ├── utils.py
│   └── output_data/
│       ├── book1.csv
│       └── ...
├── __init__.py
├── main.py
├── README.md
└── requirements.txt
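
To show how cover images could end up in category subfolders under `output_data/`, here is a small sketch under stated assumptions. `save_cover` and `fetch_bytes` are hypothetical helpers, the `<category>/<title>.<ext>` layout is a guess rather than the script's documented behaviour, and `urllib.request` is used in place of Requests only to keep the example limited to the standard library.

```python
import re
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Download url and return the raw response body."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def _slug(text: str) -> str:
    """Lowercase text and replace anything that is not a-z or 0-9 with '_'."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def save_cover(image_url, title, category, output_dir, fetch=fetch_bytes) -> Path:
    """Save a cover to output_dir/<category>/<title>.<ext> and return the path."""
    # Keep the extension of the remote file; fall back to .jpg if it has none.
    extension = Path(urlparse(image_url).path).suffix or ".jpg"
    folder = Path(output_dir) / _slug(category)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{_slug(title)}{extension}"
    target.write_bytes(fetch(image_url))
    return target
```

The `fetch` parameter exists so the download can be replaced by a stub during tests, for example `fetch=lambda url: b"fake"`.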

📌 Sample Data

universal_product_code	title	price_including_tax	price_excluding_tax	number_available	category	review_rating
90fa61229261140a	Tipping the Velvet	£ 53.74	£ 53.74	In stock (20 available)	Historical Fiction	1/5
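
The values in the sample row are stored as text. If numeric values are needed later (for sorting or statistics, say), they could be converted roughly as follows. This is a sketch that assumes the exact formats shown in the sample row; the three `parse_*` helpers are hypothetical and not part of the project.

```python
import re


def parse_price(text: str) -> float:
    """Extract the numeric part of a price string such as '£ 53.74'."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return float(match.group())


def parse_stock(text: str) -> int:
    """Extract the count from 'In stock (20 available)'; 0 if no count is present."""
    match = re.search(r"\((\d+) available\)", text)
    return int(match.group(1)) if match else 0


def parse_rating(text: str) -> int:
    """Return the score part of a rating such as '1/5'."""
    score, _, _scale = text.partition("/")
    return int(score)


if __name__ == "__main__":
    row = {
        "price_including_tax": "£ 53.74",
        "number_available": "In stock (20 available)",
        "review_rating": "1/5",
    }
    print(
        parse_price(row["price_including_tax"]),
        parse_stock(row["number_available"]),
        parse_rating(row["review_rating"]),
    )  # prints: 53.74 20 1
```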

🎯 Learning Goals

Learn HTML scraping with BeautifulSoup
Automate data collection/export/processing
Prepare for more advanced projects like APIs or database interactions

📌 To Improve

Consider refactoring into an OOP (Object-Oriented Programming) approach
Improve the architecture to make it more modular
Add a simple web interface using Flask
Implement unit tests
Add logging or a verbose mode

🧠 Author

👤 Dimitri Gaggioli

Python Developer

dim-gggl

🌍 Stack

Python 3.12+
BeautifulSoup
Requests
csv, os, re, urllib (standard library modules)

🌍 License

MIT — Use it wisely.
