k-Nearest Neighbors (kNN) is a machine-learning algorithm for classification. It works like this: we have an existing set of example data, our training set, and we have labels for all of it, so we know which class each example belongs to. When we’re given a new piece of data without a label, we compare it to every piece of existing data. We then take the k most similar pieces of data (the nearest neighbors) and look at their labels; this is where the k comes from. (k is an integer, usually less than 20.) Lastly, we take a majority vote among those k neighbors, and the winning class is the one we assign to the new piece of data.

#### k-Nearest Neighbors

• Pros: High accuracy, insensitive to outliers, no assumptions about data
• Cons: Computationally expensive, requires a lot of memory
• Works with: Numeric values, nominal values

#### Pseudocode for kNN

To classify a new point inputX:

• for every point in our dataset, calculate the distance between inputX and that point
• sort the distances in increasing order
• take the k items with the lowest distances to inputX
• find the majority class among these items
• return the majority class as our prediction for the class of inputX
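
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not necessarily the code in kNN.py: the function name `classify`, the toy `dataset` and `labels`, and the choice of Euclidean distance (via `math.dist`, Python 3.8+) are all assumptions made for the example.

```python
import math
from collections import Counter


def classify(input_x, dataset, labels, k):
    """Predict the class of input_x by majority vote among its k nearest points."""
    # Distance from input_x to every point in the dataset
    # (Euclidean distance is an assumption of this sketch).
    distances = [math.dist(input_x, point) for point in dataset]
    # Indices of the dataset points, sorted by increasing distance.
    order = sorted(range(len(dataset)), key=lambda i: distances[i])
    # Labels of the k closest points.
    nearest_labels = [labels[i] for i in order[:k]]
    # Majority class among those k labels.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two points of class "A" and two of class "B".
dataset = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]

prediction = classify((0.1, 0.2), dataset, labels, k=3)
print(prediction)  # B
```

With k=3, the three closest points to (0.1, 0.2) are the two "B" points and one "A" point, so the vote goes to "B". Choosing an odd k for a two-class problem avoids tied votes.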

Suppose we use the Euclidean distance to calculate the distance between two vectors x and y. With two features, the distance is
$$d = \sqrt{(x_1-y_1)^2+(x_2-y_2)^2}$$
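With more features, each feature adds one more squared-difference term under the square root. If there are 4 features in the dataset, the distance between points (1, 2, 3, 4) and (4, 3, 2, 1)
would be calculated by
$$d = \sqrt{(1-4)^2+(2-3)^2+(3-2)^2+(4-1)^2} = \sqrt{20} \approx 4.47$$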
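
As a quick check of the four-feature example, the same calculation can be written in Python for vectors of any length. The helper name `euclidean_distance` is hypothetical and used only for this sketch.

```python
import math


def euclidean_distance(x, y):
    """Return the Euclidean distance between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of features")
    # Sum the squared differences feature by feature, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


d = euclidean_distance((1, 2, 3, 4), (4, 3, 2, 1))
print(round(d, 3))  # 4.472
```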

#### Implementation in Python Code

kNN.py

To use it, open a terminal in the directory that contains the code file and type the following command:

You will then see the prediction result.