The Birdsong Classification project aims to accurately classify different bird species based solely on their vocalizations. Specifically, the project focuses on three species:
- Common Chaffinch (Fringilla coelebs)
- House Sparrow (Passer domesticus)
- Eurasian Blue Tit (Cyanistes caeruleus)
The classification process involves two main components:
- Preprocessing (MATLAB): Processes raw audio files, extracts syllables, and generates spectrograms.
- Classification (Python): Utilizes Convolutional Neural Networks (CNN) to classify the generated spectrograms.
This approach leverages signal processing techniques in MATLAB to prepare the data, followed by machine learning methodologies in Python to achieve high classification accuracy.

Project Structure:
birdsong-classification/
├── data/
│   ├── processed/
│   │   ├── test/
│   │   │   ├── X.pickle
│   │   │   └── y.pickle
│   │   └── train/
│   │       ├── X.pickle
│   │       └── y.pickle
│   ├── raw/
│   │   ├── common_chaffinch/
│   │   │   ├── 1.mp3
│   │   │   ├── 10.mp3
│   │   │   └── 100.mp3
│   │   ├── eurasian_blue_tit/
│   │   │   ├── 1.mp3
│   │   │   ├── 10.mp3
│   │   │   └── 100.mp3
│   │   └── house_sparrow/
│   │       ├── 1.mp3
│   │       ├── 10.mp3
│   │       └── 100.mp3
│   └── spectrograms/
│       ├── common_chaffinch/
│       │   ├── 1001.jpg
│       │   ├── 1002.jpg
│       │   └── 1003.jpg
│       ├── eurasian_blue_tit/
│       │   ├── 1001.jpg
│       │   ├── 1002.jpg
│       │   └── 1003.jpg
│       └── house_sparrow/
│           ├── 1001.jpg
│           ├── 1002.jpg
│           └── 1003.jpg
├── matlab/
│   ├── config.m
│   ├── main.m
│   ├── MATLAB_Documentation.md
│   ├── src/
│   │   ├── generate_spectrograms.m
│   │   ├── preprocessing.m
│   │   ├── process_single_audio.m
│   │   ├── sample_syllables.m
│   │   ├── syllable_cut.m
│   │   └── utils/
│   │       ├── audio_utils.m
│   │       ├── constants.m
│   │       ├── filter_utils.m
│   │       ├── spectro_utils.m
│   │       └── SyllablePlayer.m
├── models/
│   ├── birdsong_classifier.h5
│   └── train_stats.npz
├── python/
│   ├── setup.py
│   ├── Python_Documentation.md
│   ├── src/
│   │   └── birdsong_classification/
│   │       ├── __init__.py
│   │       ├── data/
│   │       │   ├── dataset.py
│   │       │   ├── download_bird_songs.py
│   │       │   ├── prepare_data.py
│   │       │   └── preprocessing.py
│   │       ├── evaluation/
│   │       │   ├── evaluate.py
│   │       │   └── metrics.py
│   │       ├── models/
│   │       │   ├── model.py
│   │       │   └── train.py
│   │       ├── predict.py
│   │       └── utils/
│   │           ├── path_utils.py
│   │           └── visualization.py
├── results/
│   ├── confusion_matrix.png
│   ├── species_accuracy.png
│   └── training_history.png
├── README.md
└── project_structure.text
Prerequisites (MATLAB):
- MATLAB: Version R2020a or later.
- Toolboxes:
  - Signal Processing Toolbox
  - Parallel Computing Toolbox (for parfor loops)
Setup Instructions (MATLAB):
- Clone the Repository:
  git clone https://github.com/AliBakly/Bird-Song-Classification-Using-Spectral-Analysis-and-CNNs.git
- Navigate to the MATLAB Directory:
  cd birdsong-classification/matlab
- Configure Constants:
  - Open src/utils/constants.m in MATLAB.
  - Adjust parameters as needed (e.g., sampling rate, frequency bands).
- Verify MATLAB Toolboxes:
  - Ensure that the required toolboxes are installed and accessible.
Prerequisites (Python):
- Python: Version 3.7 or higher.
- MATLAB: Required for prediction functionalities.
- MATLAB Engine API for Python: Must be installed separately.
Setup Instructions (Python):
- Clone the Repository:
  git clone https://github.com/AliBakly/Bird-Song-Classification-Using-Spectral-Analysis-and-CNNs.git
- Navigate to the Python Directory:
  cd birdsong-classification/python
- Install Dependencies:
  pip install -e .
  This command installs the package in editable mode along with all required dependencies.
- Install MATLAB Engine API for Python:
  - Locate MATLAB Engine API:
    - Navigate to the MATLAB Engine directory, typically found at:
      <MATLAB_INSTALL_DIR>/extern/engines/python
  - Install the Engine:
    cd <MATLAB_INSTALL_DIR>/extern/engines/python
    python setup.py install
  - Verify Installation: Run the following in a Python session. If no errors are thrown, the installation was successful.
    import matlab.engine
- Verify Installation:
  python -m birdsong_classification.utils.path_utils
  Ensure no errors are thrown, indicating successful installation.
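The MATLAB Engine check above can also be scripted. The following is a minimal sketch, not part of the project: the helper name `matlab_engine_available` is hypothetical, and it only tests whether the `matlab.engine` module can be located, without starting a MATLAB session.

```python
import importlib.util


def matlab_engine_available() -> bool:
    """Return True if the matlab.engine module can be located.

    Hypothetical helper for illustration; it does not start MATLAB.
    """
    try:
        return importlib.util.find_spec("matlab.engine") is not None
    except (ImportError, ValueError):
        # ImportError is raised when the parent "matlab" package is missing.
        return False


if __name__ == "__main__":
    print("MATLAB Engine API found:", matlab_engine_available())
```

A check like this could run before prediction, which is the step that needs MATLAB, so that a missing engine is reported early with a clear message.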
See the MATLAB documentation (matlab/MATLAB_Documentation.md) for more details.
Steps (MATLAB Preprocessing):
- Organize Raw Data:
  - Place your .mp3 audio files in data/raw/, categorized by species:
    data/raw/
    ├── common_chaffinch/
    │   ├── 1.mp3
    │   ├── 10.mp3
    │   └── 100.mp3
    ├── eurasian_blue_tit/
    │   ├── 1.mp3
    │   ├── 10.mp3
    │   └── 100.mp3
    └── house_sparrow/
        ├── 1.mp3
        ├── 10.mp3
        └── 100.mp3
  - Or use the download_bird_songs.py script.
- Run the Main Script:
  - Open MATLAB.
  - Navigate to the matlab directory.
  - Execute the main function:
    main
- Process Overview:
  - Configuration: config.m sets parameters based on species.
  - Preprocessing: Extracts syllables from audio files.
  - Spectrogram Generation: Converts syllables into spectrogram images.
  - Output: Spectrograms are saved in data/spectrograms/, organized by species.
- Debugging (Optional):
  - Enable Syllable Listening:
    - In config.m, set:
      cfg.debug.listen_syllables = true;
    - This launches the SyllablePlayer UI, allowing you to play and inspect extracted syllables.
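To illustrate what the spectrogram generation step produces, here is a rough NumPy sketch of a magnitude spectrogram for a single 100 ms segment. It is not a port of the project's MATLAB code: the function name, the 22050 Hz sample rate, the frame length, the hop size, and the Hann window are all assumptions chosen for the example.

```python
import numpy as np


def magnitude_spectrogram(signal, frame_len=256, hop=64):
    """Short-time FFT magnitudes, returned as (frequency_bins, time_frames).

    frame_len and hop are illustrative values, not the project's settings.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Synthetic stand-in for one extracted syllable: a 100 ms tone at 4 kHz.
sample_rate = 22050               # assumed sample rate
n_samples = sample_rate // 10     # 100 ms
t = np.arange(n_samples) / sample_rate
syllable = np.sin(2 * np.pi * 4000 * t)

spec = magnitude_spectrogram(syllable)
freqs = np.fft.rfftfreq(256, d=1 / sample_rate)
peak_hz = freqs[spec.mean(axis=1).argmax()]  # strongest frequency bin
```

Each column of `spec` describes the frequency content of one short frame. Saving such a matrix as an image is what turns a syllable into a CNN input.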

See the Python documentation (python/Python_Documentation.md) for more details.
Steps (Data Preparation):
- Navigate to the Python Directory:
  cd birdsong-classification/python
- Download Additional Bird Songs (If Needed):
  - Use the download_bird_songs.py script to fetch more audio files.
  - Command:
    python -m birdsong_classification.data.download_bird_songs --species "Eurasian blue tit,House sparrow,Common chaffinch" --quality "A" --num-files 300
  - Parameters:
    - --species: Comma-separated list of bird species.
    - --quality: Recording quality (A-E).
    - --num-files: Number of files to download per species.
- Prepare and Preprocess Data:
  - Convert the generated spectrograms into training and testing datasets.
  - Command:
    python -m birdsong_classification.data.prepare_data
  - Process Overview:
    - Loading Data: Loads spectrogram images and labels.
    - Splitting Data: Divides data into training and testing sets.
    - Normalization: Standardizes data based on training statistics.
    - Saving Processed Data: Stores processed datasets as .pickle files.
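The split, normalization, and save steps above might look roughly like the sketch below. It is an illustration under assumptions, not the contents of prepare_data.py: the function names, the 20% test fraction, the fixed seed, and the use of a single global mean and standard deviation are all hypothetical choices.

```python
import pickle
from pathlib import Path

import numpy as np


def split_and_standardize(X, y, test_fraction=0.2, seed=0):
    """Shuffle, split, and standardize using statistics from the training set only."""
    X = np.asarray(X, dtype=np.float32)
    y = np.asarray(y)
    order = np.random.default_rng(seed).permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]

    X_train, X_test = X[train_idx], X[test_idx]
    mean = float(X_train.mean())
    std = max(float(X_train.std()), 1e-8)  # guard against division by zero

    # The test set reuses the training statistics, so no test information leaks in.
    X_train = (X_train - mean) / std
    X_test = (X_test - mean) / std
    return (X_train, y[train_idx]), (X_test, y[test_idx]), (mean, std)


def save_split(directory, X, y):
    """Write X.pickle and y.pickle into the given directory."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / "X.pickle", "wb") as f:
        pickle.dump(X, f)
    with open(directory / "y.pickle", "wb") as f:
        pickle.dump(y, f)
```

The returned mean and standard deviation need to be kept so that new spectrograms are standardized the same way at prediction time. The models/train_stats.npz file in the project tree may serve that purpose, though that is a guess from the file name.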
Steps (Training):
- Navigate to the Python Directory:
  cd birdsong-classification/python
- Train the CNN Model:
  - Use the train.py script to train the model.
  - Command:
    python -m birdsong_classification.models.train
  - Process Overview:
    - Loading Data: Imports training data from data/processed/train/.
    - Model Initialization: Builds the CNN architecture.
    - Training: Fits the model on training data for a specified number of epochs.
    - Saving Model: Stores the trained model in the models/ directory.
    - Visualization: Generates plots for training history.
Steps (Evaluation):
- Navigate to the Python Directory:
  cd birdsong-classification/python
- Evaluate the Trained Model:
  - Use the evaluate.py script to assess model performance.
  - Command:
    python -m birdsong_classification.evaluation.evaluate
  - Process Overview:
    - Loading Test Data: Imports testing data from data/processed/test/.
    - Model Loading: Retrieves the trained model from models/.
    - Evaluation: Computes overall accuracy and loss.
    - Per-Species Evaluation: Calculates accuracy for each bird species.
    - Confusion Matrix: Generates and saves a confusion matrix plot.
    - Classification Report: Outputs precision, recall, and F1 scores.
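As a small illustration of how the per-species accuracy relates to the confusion matrix, the sketch below builds both from label lists in plain Python. The function names and the label values are made up for the example, and the integer encoding of species is an assumption. It is not the code in evaluate.py or metrics.py.

```python
SPECIES = ["common_chaffinch", "house_sparrow", "eurasian_blue_tit"]


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def per_class_accuracy(matrix):
    """Fraction of each true class predicted correctly (the diagonal over the row sum)."""
    return [
        row[i] / sum(row) if sum(row) else 0.0
        for i, row in enumerate(matrix)
    ]


# Made-up labels for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 2, 2, 0]

matrix = confusion_matrix(y_true, y_pred, len(SPECIES))
for name, accuracy in zip(SPECIES, per_class_accuracy(matrix)):
    print(f"{name}: {accuracy:.2f}")
```

Computed this way, the per-species accuracy is the same quantity as the per-class recall in a classification report.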
Steps (Prediction):
- Navigate to the Python Directory:
  cd birdsong-classification/python
- Predict Bird Species from New Audio Files:
  - Use the predict.py script to classify new audio samples.
  - Command:
    python -m birdsong_classification.predict path/to/new_audio.mp3
  - Process Overview:
    - Audio Processing: Uses MATLAB to extract syllables and generate spectrograms.
    - Loading Model: Imports the trained CNN model.
    - Prediction: Classifies the bird species and outputs confidence scores.
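One recording usually yields several syllables, so the per-syllable outputs have to be combined into one answer somehow. The README does not say how predict.py does this. The sketch below shows one plausible approach, averaging the class probabilities across syllables, with a hypothetical function name and made-up probabilities.

```python
def aggregate_predictions(syllable_probs, class_names):
    """Average per-syllable class probabilities and pick the most likely class.

    Averaging is an assumed strategy for illustration, not necessarily
    what the project's predict.py does.
    """
    if not syllable_probs:
        raise ValueError("no syllables were extracted from the recording")
    n = len(syllable_probs)
    mean_probs = [
        sum(probs[i] for probs in syllable_probs) / n
        for i in range(len(class_names))
    ]
    best = max(range(len(class_names)), key=mean_probs.__getitem__)
    return class_names[best], dict(zip(class_names, mean_probs))


# Made-up softmax outputs for three syllables of a single recording.
class_names = ["common_chaffinch", "house_sparrow", "eurasian_blue_tit"]
syllable_probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
]
label, confidences = aggregate_predictions(syllable_probs, class_names)
```

Averaging keeps a single misclassified syllable from deciding the result. A majority vote over per-syllable labels would be another reasonable choice.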
The project achieved a high classification accuracy of 96.31% using the following configuration:
- Model: Model 2 with standardized RGB spectrogram images and syllable lengths set to 100 ms.
- CNN Architecture:
  - Convolutional Layers: 64 and 128 filters with ReLU activation.
  - Pooling Layers: $3 \times 3$ max-pooling to reduce computational complexity.
  - Fully Connected Layers: Increased from 32 to 128 neurons with dropout regularization.
- Training Parameters:
  - Optimizer: Adam with a learning rate of 0.0001.
  - Epochs: 50
  - Batch Size: 64
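The configuration above could be captured in code along these lines. This is a sketch, not the project's model.py. The filter counts, pooling size, dense width, optimizer, learning rate, epochs, and batch size come from the list above. The convolution kernel size, dropout rate, input shape, layer ordering, and loss function are assumptions. TensorFlow is imported only inside `build_model`, so the configuration class can be used without it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values stated in the results section.
    conv_filters: tuple = (64, 128)
    pool_size: tuple = (3, 3)
    dense_units: int = 128
    learning_rate: float = 1e-4
    epochs: int = 50
    batch_size: int = 64
    num_classes: int = 3
    # Assumptions made for this sketch only.
    kernel_size: tuple = (3, 3)
    dropout_rate: float = 0.5
    input_shape: tuple = (64, 64, 3)

    def steps_per_epoch(self, n_samples: int) -> int:
        """Number of batches needed to cover n_samples once."""
        return math.ceil(n_samples / self.batch_size)


def build_model(cfg: TrainConfig):
    """Build a Keras CNN from the config (requires TensorFlow)."""
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([keras.Input(shape=cfg.input_shape)])
    for filters in cfg.conv_filters:
        model.add(layers.Conv2D(filters, cfg.kernel_size, activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=cfg.pool_size))
    model.add(layers.Flatten())
    model.add(layers.Dense(cfg.dense_units, activation="relu"))
    model.add(layers.Dropout(cfg.dropout_rate))
    model.add(layers.Dense(cfg.num_classes, activation="softmax"))
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=cfg.learning_rate),
        loss="sparse_categorical_crossentropy",  # assumes integer labels
        metrics=["accuracy"],
    )
    return model
```

With TensorFlow installed, `build_model(TrainConfig())` returns a compiled model, and `model.fit(X_train, y_train, epochs=cfg.epochs, batch_size=cfg.batch_size)` would train it with the listed parameters.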
Confusion Matrix: see results/confusion_matrix.png.

Interpretation:
- Common Chaffinch: 93% of samples classified correctly.
- House Sparrow: 100% of samples classified correctly.
- Eurasian Blue Tit: 96% of samples classified correctly.
Training History: see results/training_history.png.

The training and validation accuracy steadily increased, while the loss decreased, indicating effective learning without significant overfitting.