Bag of Visual Words Classifier

Computer Vision and Pattern Recognition Course Exam Project

SDIC Master Degree, University of Trieste (UniTS)

2024-2025

Implementing Bag of Visual Words (BoW) classifier
for image classification and scene recognition


Table of Contents

 1. Author Info
 2. About The Project
         - Quick Overview
         - Built With
         - Project Structure
 3. Getting Started
 4. Usage Examples
 5. Contributing
 6. License
 7. References
 8. Acknowledgments

Author Info

| Name  | Surname | Student ID | UniTS mail                       | Google mail              | Master |
|-------|---------|------------|----------------------------------|--------------------------|--------|
| Marco | Tallone | SM3600002  | marco.tallone@studenti.units.it  | marcotallone85@gmail.com | SDIC   |

(back to top)

About The Project

Warning

Generative Tools Notice:
Generative AI tools were used to support the development of this project. In particular, the GitHub Copilot tool, based on the OpenAI GPT-4o model, was used as an assistance medium for some of the development tasks.

Quick Overview

The Bag of Visual Words (BoW) model is a popular computer vision technique used for image classification or retrieval. It is based on the idea of treating images as documents and representing them as histograms of visual words belonging to a visual vocabulary, which is obtained by clustering local features extracted from a set of images.
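The idea above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the project's code: it assumes a vocabulary of cluster centers is already available and maps an image's local descriptors to a normalized histogram of visual words:

```python
import numpy as np

def bow_histogram(descriptors: np.ndarray, vocabulary: np.ndarray) -> np.ndarray:
    """Represent one image as a normalized histogram of visual words.

    descriptors: (n, d) array of local feature descriptors for the image.
    vocabulary:  (k, d) array of cluster centers (the visual words).
    """
    # Assign each descriptor to its nearest visual word (Euclidean distance).
    distances = np.linalg.norm(
        descriptors[:, None, :] - vocabulary[None, :, :], axis=2
    )
    words = distances.argmin(axis=1)
    # Count occurrences of each word and L1-normalize, so images with
    # different numbers of descriptors remain comparable.
    histogram = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return histogram / histogram.sum()
```

For example, with a 2-word vocabulary and three descriptors, two of which fall near the first word, the resulting histogram is [2/3, 1/3].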


This project implements a BoW image classifier for scene recognition by first building a visual vocabulary from a set of training images and then performing multi-class classification with K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) classifiers.
In particular, the visual vocabulary is built by clustering, with the K-Means algorithm, the SIFT descriptors extracted from the training images. Descriptors have been computed both from keypoints detected with the SIFT algorithm and from dense sampling of the images on a fixed grid, in order to compare the two approaches.
In the classification phase, the performance of a simple KNN classifier is compared with that of different SVM classifiers, all adopting the "one-vs-all" strategy for multi-class classification. The SVM classifiers differ in the kernel used and in the kind of input features they are trained on.
Additionally, different ways of representing images as input feature vectors are tested. These include the classic representation as normalized histograms of visual words, the soft assignment techniques proposed by Van Gemert et al. [3], and the spatial pyramid feature representation proposed by Lazebnik et al. [1].
The objectives of this study are to compare the performance of the different classifiers and image representations, and to reproduce the results obtained by Van Gemert et al. [3] and Lazebnik et al. [1] on the 15-Scenes dataset. For further details on the feature extraction techniques, the machine learning algorithms implemented, and the results obtained with them, please refer to the official report. For a description of the implementation of the BoW classifier, read the dedicated notebook instead.
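The vocabulary-building step described above can be sketched with scikit-learn's KMeans (function and parameter names here are illustrative assumptions, not taken from the project's code): the descriptors from all training images are stacked into one matrix and clustered, and the cluster centers become the visual words:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptor_list, k=50, seed=0):
    """Cluster all training descriptors into k visual words.

    descriptor_list: list of (n_i, d) arrays, one per training image
                     (e.g. SIFT descriptors, with d = 128).
    Returns the (k, d) array of K-Means cluster centers.
    """
    # Stack the descriptors of every image into a single (sum n_i, d) matrix.
    all_descriptors = np.vstack(descriptor_list)
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed)
    kmeans.fit(all_descriptors)
    return kmeans.cluster_centers_
```

In the actual project, the descriptors fed to the clustering step would come either from SIFT keypoint detection or from dense grid sampling, as described above.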

Project Structure

The project is structured as follows:

├── 🐍 cv-conda.yaml  # Conda environment
├── 📁 datasets       # Datasets folder
│   ├── test
│   └── train
├── 📁 doc            # Project assignment
├── ⚜️ LICENSE        # License file
├── 📓 notebooks      # Jupyter Notebooks 
│   ├── bow-classifier.ipynb
│   ├── results-plots.ipynb
│   └── utils.py
├── 📜 README.md      # This README file
└── 📁 report         # Report folder
    ├── images
    ├── main.tex
    └── ...

In particular, the notebooks/ folder contains the following notebooks:

  • bow-classifier.ipynb: the main notebook containing a step-by-step description of the implementation of the BoW classifier
  • results-plots.ipynb: a notebook containing the code to generate the plots and the final results presented in the report
  • utils.py: a Python script containing the utility functions implemented for this project

Built With

OpenCV Jupyter Scikit-learn Conda Python

(back to top)

Getting Started

Requirements

The project is developed in Python and mainly requires the following libraries:

  • numpy, version 1.26.4
  • opencv, version 4.10.0
  • scikit-learn, version 1.5.2
  • tqdm, version 4.67.0

All the necessary libraries can be installed with the pip package manager.
Alternatively, a conda environment YAML file listing all the required libraries is provided in the root folder. With a working conda installation, create the environment with the following command:

conda env create -f cv-conda.yaml

Once the environment has been created, activate it with:

conda activate cv

Usage Examples

For a detailed step-by-step description of the main tasks performed for this project and the implementation of the BoW classifier, please refer to the dedicated notebook.
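To give a flavor of the classification step without the full pipeline, the following self-contained sketch uses synthetic stand-ins for BoW histograms (all data and parameters here are illustrative assumptions, not the project's code) and compares a KNN baseline with a linear SVM, which scikit-learn's LinearSVC trains with a one-vs-rest strategy:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-ins for BoW histograms: 2 scene classes, 50-word vocabulary.
rng = np.random.default_rng(42)
X0 = rng.dirichlet(np.ones(50), size=20)
X0[:, 0] += 0.5  # class-0 images favor visual word 0
X1 = rng.dirichlet(np.ones(50), size=20)
X1[:, 1] += 0.5  # class-1 images favor visual word 1
X = np.vstack([X0, X1])
y = np.array([0] * 20 + [1] * 20)

# KNN baseline vs. linear SVM (one-vs-rest multi-class by default).
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
svm = LinearSVC(max_iter=2000).fit(X, y)
```

On this toy data both classifiers assign a new histogram dominated by word 0 to class 0; in the project, the same fit/predict interface is applied to the histogram, soft-assignment, and spatial-pyramid feature vectors.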

(back to top)

Contributing

The goal of this repository was to implement a classifier based on the Bag of Visual Words approach and to reproduce the results presented in the referenced papers, in the context of a university exam project. However, if you have a suggestion that would improve or extend its functionality and want to share it, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement" or "extension".
Suggested contribution procedure:

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

License

Distributed under the MIT License. See LICENSE for more information.

(back to top)

References

[1] S. Lazebnik, C. Schmid, J. Ponce, "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories", 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), Volume 2, 2006, Pages 2169-2178, https://doi.org/10.1109/CVPR.2006.68

[2] L. Fei-Fei, P. Perona, "A Bayesian hierarchical model for learning natural scene categories", 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Volume 2, 2005, Pages 524-531 vol. 2, https://doi.org/10.1109/CVPR.2005.16

[3] J.C. van Gemert, J.-M. Geusebroek, C.J. Veenman, A.W.M. Smeulders, "Kernel Codebooks for Scene Categorization", Computer Vision -- ECCV 2008, 2008, Springer Berlin Heidelberg, Pages 696-709, https://doi.org/10.1007/978-3-540-88693-8_52

[4] C.-C. Chang, C.-J. Lin, "LIBSVM: A library for support vector machines", ACM Transactions on Intelligent Systems and Technology (TIST), Volume 2, Number 3, 2011, Pages 1-27, https://doi.org/10.1145/1961189.1961199

[5] P. J. Rousseeuw, "Silhouettes: A graphical aid to the interpretation and validation of cluster analysis", Journal of Computational and Applied Mathematics, Volume 20, 1987, Pages 53-65, https://doi.org/10.1016/0377-0427(87)90125-7

[6] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, "Scikit-learn: Machine learning in Python", Journal of Machine Learning Research, Volume 12, 2011, Pages 2825-2830, https://www.jmlr.org/papers/volume12/pedregosa11a/pedregosa11a.pdf

(back to top)

Acknowledgments

(back to top)
