Implementing Bag of Visual Words (BoW) classifier
for image classification and scene recognition
Report | Python Notebook | Rendered Notebook
1. Author Info
| Name | Surname | Student ID | UniTS mail | Google mail | Master |
|------|---------|------------|------------|-------------|--------|
| Marco | Tallone | SM3600002 | marco.tallone@studenti.units.it | marcotallone85@gmail.com | SDIC |
**Warning** (Generative Tools Notice):
Generative AI tools have been used as a support for the development of this project. In particular, the GitHub Copilot generative tool, based on the OpenAI GPT-4o model, has been used as an assistance medium for the following tasks:
- writing documentation and comments in implemented functions, adhering to the NumPy Style Guide
- improving variable naming and code readability
- minor bug fixing in implemented functions
- grammar and spelling checks, both in this README and in the report
- tweaking aesthetic improvements in the report plots
- formatting the tables of results in the report
The Bag of Visual Words (BoW) model is a popular computer vision technique used for image classification or retrieval. It is based on the idea of treating images as documents and representing them as histograms of visual words belonging to a visual vocabulary, which is obtained by clustering local features extracted from a set of images.
This project implements a BoW image classifier for scene recognition by first building a visual vocabulary from a set of training images and then performing multi-class classification using K-Nearest Neighbors (KNN) and Support Vector Machines (SVM) classifiers.
In particular, the visual vocabulary is built by clustering SIFT descriptors extracted from the training images with the K-Means algorithm. Descriptors have been computed both from keypoints detected with the SIFT algorithm and from dense sampling of the images on a fixed grid, in order to compare the two approaches, as sketched below.
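As an illustration, here is a minimal sketch of the vocabulary-building step, assuming grayscale images already loaded as NumPy arrays; the function names and parameters (`step`, `size`, `n_words`) are illustrative, not necessarily the ones used in the notebook:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

def sift_descriptors(image):
    """SIFT descriptors from keypoints detected by SIFT itself."""
    _, descriptors = sift.detectAndCompute(image, None)
    return descriptors

def dense_descriptors(image, step=8, size=8):
    """SIFT descriptors computed on a fixed grid (dense sampling)."""
    keypoints = [
        cv2.KeyPoint(float(x), float(y), float(size))
        for y in range(step, image.shape[0] - step, step)
        for x in range(step, image.shape[1] - step, step)
    ]
    _, descriptors = sift.compute(image, keypoints)
    return descriptors

def build_vocabulary(images, n_words=200):
    """Cluster all training descriptors into a visual vocabulary."""
    descriptors = []
    for image in images:
        d = sift_descriptors(image)  # or dense_descriptors(image)
        if d is not None:
            descriptors.append(d)
    all_descriptors = np.vstack(descriptors)
    # The K-Means cluster centers play the role of the visual words
    return KMeans(n_clusters=n_words, random_state=0).fit(all_descriptors)
```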
In the classification phase, instead, the performance of a simple KNN classifier is compared with that of different SVM classifiers, all adopting the "one-vs-all" strategy for multi-class classification. The SVM classifiers differ in the kernel used and in the kind of input features they are trained on.
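A minimal sketch of this classification step, using scikit-learn with random stand-in features in place of the real BoW vectors (all hyperparameters here are purely illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

# Random stand-ins for the BoW feature vectors of the 15-Scenes images
rng = np.random.default_rng(0)
X_train, y_train = rng.random((150, 200)), rng.integers(0, 15, 150)
X_test, y_test = rng.random((50, 200)), rng.integers(0, 15, 50)

# Baseline: a simple K-Nearest Neighbors classifier
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# One-vs-all SVM; the kernel (and its parameters) can be swapped
svm = OneVsRestClassifier(SVC(kernel="rbf", C=1.0)).fit(X_train, y_train)

print("KNN accuracy:", knn.score(X_test, y_test))
print("SVM accuracy:", svm.score(X_test, y_test))
```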
Additionally, different ways to represent images as input feature vectors are tested. These include the classic representation as normalized histograms of visual words, the soft-assignment techniques proposed by Van Gemert et al. [3], and the spatial pyramid feature representation proposed by Lazebnik et al. [1].
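A minimal sketch of two of these representations, assuming `kmeans` is the vocabulary fitted above; the `sigma` value is illustrative, [3] also proposes per-descriptor-normalized variants, and the spatial pyramid of [1] concatenates such histograms computed over image sub-regions at multiple resolutions:

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

def hard_histogram(descriptors, kmeans):
    """Classic BoW: each descriptor votes for its nearest visual word."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()  # L1-normalized histogram of visual words

def soft_histogram(descriptors, kmeans, sigma=100.0):
    """Kernel codebook (soft assignment): each descriptor spreads a
    Gaussian-weighted vote over all visual words instead of one hard vote."""
    distances = euclidean_distances(descriptors, kmeans.cluster_centers_)
    weights = np.exp(-(distances ** 2) / (2.0 * sigma ** 2))
    hist = weights.sum(axis=0)
    return hist / hist.sum()
```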
The objectives of this study are to compare the performance of the different classifiers and image representations and to reproduce the results obtained by Van Gemert et al. [3] and Lazebnik et al. [1] on the 15-Scenes dataset.
For further details on the specific feature extraction techniques used and the machine learning algorithms implemented, as well as the results obtained with them, please refer to the official report. For a description of the implementation of the BoW classifier, read instead the dedicated notebook.
The project is structured as follows:
├── 🐍 cv-conda.yaml # Conda environment
├── 📁 datasets # Datasets folder
│ ├── test
│ └── train
├── 📁 doc # Project assignment
├── ⚜️ LICENSE # License file
├── 📓 notebooks # Jupyter Notebooks
│ ├── bow-classifier.ipynb
│ ├── results-plots.ipynb
│ └── utils.py
├── 📜 README.md # This README file
└── 📁 report # Report folder
├── images
├── main.tex
└── ...
In particular, the notebooks/ folder contains the following files:
- bow-classifier.ipynb: the main notebook, containing a step-by-step description of the implementation of the BoW classifier
- results-plots.ipynb: a notebook containing the code to generate the plots and the final results presented in the report
- utils.py: a Python script containing the utility functions implemented for this project
The project is developed in Python and mostly requires the following libraries:
- `numpy`, version `1.26.4`
- `opencv`, version `4.10.0`
- `scikit-learn`, version `1.5.2`
- `tqdm`, version `4.67.0`
All the necessary libraries can be easily installed using the `pip` package manager.
Additionally, a conda environment yaml file containing all the necessary libraries for the project is provided in the root folder. To create the environment, you need a working `conda` installation; then run the following command:
`conda env create -f cv-conda.yaml`
After the environment has been created, you can activate it with:
`conda activate cv`
For a detailed step-by-step description of the main tasks performed for this project and the implementation of the BoW classifier, please refer to the dedicated notebook.
The goal of this repository was to implement a classifier based on the Bag of Visual Words approach and to reproduce the results presented in the referenced papers, in the context of a university exam project. However, if you have a suggestion that would make this better or extend its functionality and want to share it with me, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement" or "extension".
Suggested contribution procedure:
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
[1] S. Lazebnik, C. Schmid, J. Ponce, "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories", 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), Volume 2, 2006, Pages 2169-2178, https://doi.org/10.1109/CVPR.2006.68
[2] L. Fei-Fei, P. Perona, "A Bayesian hierarchical model for learning natural scene categories", 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Volume 2, 2005, Pages 524-531, https://doi.org/10.1109/CVPR.2005.16
[3] J.C. van Gemert, J.-M. Geusebroek, C.J. Veenman, A.W.M. Smeulders, "Kernel Codebooks for Scene Categorization", Computer Vision -- ECCV 2008, 2008, Springer Berlin Heidelberg, Pages 696-709, https://doi.org/10.1007/978-3-540-88693-8_52
[4] C.-C. Chang, C.-J. Lin, "LIBSVM: A library for support vector machines", ACM Transactions on Intelligent Systems and Technology (TIST), Volume 2, Number 3, 2011, Pages 1-27, https://doi.org/10.1145/1961189.1961199
[5] P. J. Rousseeuw, "Silhouettes: A graphical aid to the interpretation and validation of cluster analysis", Journal of Computational and Applied Mathematics, Volume 20, 1987, Pages 53-65, https://doi.org/10.1016/0377-0427(87)90125-7
[6] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, "Scikit-learn: Machine learning in Python", Journal of Machine Learning Research, Volume 12, 2011, Pages 2825-2830, https://www.jmlr.org/papers/volume12/pedregosa11a/pedregosa11a.pdf
- Computer Vision and Pattern Recognition course material (UniTS, Fall 2024) (access restricted to UniTS students and staff)
- Best-README-Template: for the README template
- Flaticon: for the icons used in the README