Skip to content

Latest commit

 

History

History

Assignment 1

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

Assignment 1

This README file provides an overview of the project structure and contents related to the programming part of Assignment 1. The programming part consists of implementing the solutions for two questions using the KNN.py script.

Project Structure

The directory structure for Assignment 1 is as follows:

  • Assignment 1 - Solution - Lior Erenreich.pdf: A PDF file containing my solution for the assignment.
  • Assignment 1.pdf: The assignment document itself, which includes questions and instructions.
  • KNN.py: The Python script that contains the code solution for the programming part of the assignment.

Instructions for Programming Part

The programming part of Assignment 1 consists of two questions. Here's a brief summary of each question:

Question 1: Visualizing the Hoeffding bound

(a) Generate an N × n matrix of samples from the Bernoulli(1/2) distribution using numpy. Calculate the empirical mean ¯Xi for each row i, where N = 200000 and n = 20.

(b) Calculate the empirical probability that |¯Xi − 1/2| > ϵ for 50 values of ϵ ∈ [0, 1]. Plot the empirical probability as a function of ϵ.

(c) Add the Hoeffding bound of that probability as a function of ϵ to the plot.

Question 2: Nearest Neighbor

In this question, you will analyze the performance of the Nearest Neighbor (NN) algorithm on the MNIST dataset.

(a) Implement a function knn that accepts a set of train images, corresponding labels, a query image, and the number of nearest neighbors k. The function should use the k-NN algorithm with the Euclidean L2 metric and return a prediction for the query image.

(b) Run the algorithm on the first n = 1000 training images for each test image, using k = 10. Calculate the accuracy of the prediction and compare it to a completely random predictor.

(c) Plot the prediction accuracy as a function of k for k = 1, ..., 100 and n = 1000. Discuss the results and identify the best k value.

(d) Run the algorithm with k = 1 on an increasing number of training images (n = 100, 200, ..., 5000). Plot the prediction accuracy as a function of n. Discuss the results.

Usage

To execute the code solution for the programming part of Assignment 1, follow these steps:

  1. Ensure you have the necessary dependencies installed, including numpy, matplotlib, and scikit-learn (version 1.0.2 or above).
  2. Open the KNN.py file in a Python environment or editor of your choice.
  3. Run the script to execute the code and generate the required plots and results.

Please note that the code includes comments to explain the different sections and functions.

Additional Notes

  • The MNIST dataset is loaded using the fetch_openml function from scikit-learn. The dataset includes handwritten digit images and their corresponding labels.
  • The code utilizes the knn function to implement the k-NN algorithm for classification based on the nearest neighbors.
  • The plots generated by the code will be displayed to visualize the results.