The K-Nearest Neighbour algorithm is similar to the Nearest Neighbour algorithm, except that it looks at the closest *K* instances to the unclassified instance. The class of the new instance is then given by the most frequent class among those *K* instances. This is useful because it reduces the influence of anomalous instances.
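The idea can be sketched in a few lines of Python. The `knn_classify` helper, the feature tuples, and the use of Euclidean distance are all illustrative assumptions, not the page's interactive demo:

```python
from collections import Counter
import math

def knn_classify(training, query, k):
    """Classify `query` by a majority vote among its k nearest
    training instances (Euclidean distance, illustrative sketch)."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# A small made-up symptom space: one anomalous 'Allergy' instance
# sits inside a cluster of 'Strep Throat' instances.
training = [
    ((0.0, 0.0), "Strep Throat"),
    ((0.2, 0.1), "Strep Throat"),
    ((0.3, 0.3), "Strep Throat"),
    ((0.15, 0.05), "Allergy"),   # anomalous instance, closest to the query
]
query = (0.1, 0.05)

print(knn_classify(training, query, 1))  # the lone anomaly wins: Allergy
print(knn_classify(training, query, 3))  # majority vote: Strep Throat
```

With K = 1 the anomalous instance decides the classification; with K = 3 it is outvoted by the surrounding majority.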

Try this out below. If you answer 'No' to all five questions, the diagnosis will be 'Strep Throat', compared with a diagnosis of 'Allergy' under the standard Nearest Neighbour algorithm.

**Choosing K** K = 1 is the same as Nearest Neighbour, since it considers only the single closest instance. K = N (where N is the number of training instances) would be bad because it would base the classification on the class frequency of all the instances, not just the closest ones. So there must be an optimal value of K somewhere between these two extremes.
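One common way to search for that optimal value is cross-validation: try several candidate values of K and keep the one that classifies held-out instances best. The sketch below uses leave-one-out cross-validation; the helper names and the `(features, label)` training format are illustrative assumptions:

```python
from collections import Counter
import math

def knn_classify(training, query, k):
    """Majority vote among the k nearest training instances."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def choose_k(training, candidate_ks):
    """Pick K by leave-one-out cross-validation: classify each instance
    using the rest of the data, and keep the K that gets most right."""
    def accuracy(k):
        return sum(
            knn_classify(training[:i] + training[i + 1:], features, k) == label
            for i, (features, label) in enumerate(training)
        )
    return max(candidate_ks, key=accuracy)

# Two clusters plus one anomalous instance: K=1 is fooled by the
# anomaly, K=7 (= N) ignores local structure, K=3 does best.
training = [
    ((0.0, 0.0), "No"), ((0.1, 0.0), "No"), ((0.0, 0.1), "No"),
    ((0.05, 0.05), "Yes"),          # anomaly inside the 'No' cluster
    ((1.0, 1.0), "Yes"), ((1.1, 1.0), "Yes"), ((1.0, 1.1), "Yes"),
]
print(choose_k(training, [1, 3, 7]))  # 3
```

This matches the reasoning above: the extremes K = 1 and K = N score poorly on held-out instances, while an intermediate K wins.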
