This README file provides an overview of the project structure and contents related to the programming part of Assignment 1. The programming part consists of implementing the solutions for two questions using the KNN.py
script.
The directory structure for Assignment 1 is as follows:
Assignment 1 - Solution - Lior Erenreich.pdf
: A PDF file containing my solution for the assignment.Assignment 1.pdf
: The assignment document itself, which includes questions and instructions.KNN.py
: The Python script that contains the code solution for the programming part of the assignment.
The programming part of Assignment 1 consists of two questions. Here's a brief summary of each question:
(a) Generate an N × n matrix of samples from the Bernoulli(1/2) distribution using numpy. Calculate the empirical mean ¯Xi
for each row i
, where N = 200000
and n = 20
.
(b) Calculate the empirical probability that |¯Xi − 1/2| > ϵ
for 50 values of ϵ ∈ [0, 1]
. Plot the empirical probability as a function of ϵ
.
(c) Add the Hoeffding bound of that probability as a function of ϵ
to the plot.
In this question, you will analyze the performance of the Nearest Neighbor (NN) algorithm on the MNIST dataset.
(a) Implement a function knn
that accepts a set of train images, corresponding labels, a query image, and the number of nearest neighbors k
. The function should use the k-NN algorithm with the Euclidean L2 metric and return a prediction for the query image.
(b) Run the algorithm on the first n = 1000
training images for each test image, using k = 10
. Calculate the accuracy of the prediction and compare it to a completely random predictor.
(c) Plot the prediction accuracy as a function of k
for k = 1, ..., 100
and n = 1000
. Discuss the results and identify the best k
value.
(d) Run the algorithm with k = 1
on an increasing number of training images (n = 100, 200, ..., 5000
). Plot the prediction accuracy as a function of n
. Discuss the results.
To execute the code solution for the programming part of Assignment 1, follow these steps:
- Ensure you have the necessary dependencies installed, including
numpy
,matplotlib
, andscikit-learn
(version 1.0.2 or above). - Open the
KNN.py
file in a Python environment or editor of your choice. - Run the script to execute the code and generate the required plots and results.
Please note that the code includes comments to explain the different sections and functions.
- The MNIST dataset is loaded using the
fetch_openml
function fromscikit-learn
. The dataset includes handwritten digit images and their corresponding labels. - The code utilizes the
knn
function to implement the k-NN algorithm for classification based on the nearest neighbors. - The plots generated by the code will be displayed to visualize the results.