
Diagnosing with Distance: How Scaling Shapes KNN Decisions
This exercise builds a K-Nearest Neighbors (KNN) classifier in Python on the Wisconsin Breast Cancer dataset, using perimeter and concavity to classify diagnoses (malignant or benign) and to show why feature scaling matters.

KNN Code
This KNN model was built with two scaled features (perimeter and concavity) to classify breast cancer diagnoses. It was trained on labeled data, produced a prediction for a hypothetical new observation, and identified that observation's five nearest neighbors.
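A minimal sketch of this pipeline, assuming the scikit-learn copy of the dataset and an illustrative new observation (perimeter 100, concavity 0.2 are made-up values, not from the original exercise):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Load the Wisconsin Breast Cancer data bundled with scikit-learn
data = load_breast_cancer()
feat = list(data.feature_names)
X = data.data[:, [feat.index("mean perimeter"), feat.index("mean concavity")]]
y = data.target  # in scikit-learn's encoding: 0 = malignant, 1 = benign

# Standardize both features so each contributes comparably to distance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Fit KNN with k = 5 on the scaled features
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_scaled, y)

# Hypothetical new observation, scaled with the same fitted scaler
new_obs = scaler.transform([[100.0, 0.2]])
prediction = knn.predict(new_obs)
distances, indices = knn.kneighbors(new_obs, n_neighbors=5)

print("Predicted class:", data.target_names[prediction[0]])
print("Indices of the 5 nearest neighbors:", indices[0])
```

Note that the scaler is fitted on the training data only and then reused to transform the new observation, so the new point lands in the same standardized space as its neighbors.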

Scaled Features
The plot shows the feature space after standardizing both features, so each contributes comparably to the distance calculation and the new observation sits in a fair comparison context. The five highlighted neighbors (yellow) visually confirm that they are the points closest to it.
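What standardization does here can be checked directly: after `StandardScaler`, each feature has mean 0 and standard deviation 1 by construction, so neither axis dominates Euclidean distance. A small sketch, again assuming the scikit-learn copy of the dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
feat = list(data.feature_names)
X = data.data[:, [feat.index("mean perimeter"), feat.index("mean concavity")]]

# Raw features live on very different numeric ranges
print("Before scaling  mean:", X.mean(axis=0), " std:", X.std(axis=0))

# After standardization both features have mean ~0 and std ~1
X_scaled = StandardScaler().fit_transform(X)
print("After scaling   mean:", X_scaled.mean(axis=0).round(6),
      " std:", X_scaled.std(axis=0).round(6))
```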

Unscaled Features
The plot shows the same space without scaling, where perimeter dominates the distance calculation: because perimeter spans a far wider numeric range than concavity, even small differences in perimeter outweigh large differences in concavity, which skews neighbor selection.
Inspired by the Linear regression, Classification, and Resampling session for the Machine Learning Software Foundations Certificate at the Data Sciences Institute, University of Toronto.
