
Diagnosing with Distance: How Scaling Shapes KNN Decisions

This exercise demonstrates a K-Nearest Neighbors (KNN) model in Python on the Wisconsin Breast Cancer dataset, using two features, perimeter and concavity, to classify diagnoses as malignant or benign and to show why feature scaling matters.

KNN Code

[Figure: P000005_KNNcode.jpg]

This KNN model was built with two scaled features (perimeter and concavity) to classify breast cancer diagnoses. It was trained on labeled data, made a prediction for a hypothetical new observation, and identified that observation's five nearest neighbors.
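The workflow described above can be sketched as follows. This is an assumed reconstruction, not the author's exact code: it loads the dataset from scikit-learn, standardizes the two features, fits a 5-neighbor classifier, and queries the neighbors of a hypothetical new observation (the perimeter/concavity values for that observation are illustrative).

```python
# Sketch of the described workflow (assumed, not the author's exact code):
# KNN on two standardized features of the Wisconsin Breast Cancer dataset.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
# Select the "mean perimeter" and "mean concavity" columns.
cols = [list(data.feature_names).index(f)
        for f in ("mean perimeter", "mean concavity")]
X = data.data[:, cols]
y = data.target  # in scikit-learn's encoding: 0 = malignant, 1 = benign

# Standardize so both features contribute comparably to distances.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_scaled, y)

# Hypothetical new observation (perimeter, concavity) -- illustrative values.
new_obs = scaler.transform([[90.0, 0.10]])
pred = knn.predict(new_obs)
dist, idx = knn.kneighbors(new_obs)  # distances and indices of 5 neighbors
print(data.target_names[pred[0]], idx[0])
```

The `kneighbors` call returns the same five points that are highlighted in the scaled-feature plot below; changing `n_neighbors` changes how many votes the prediction is based on.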

Scaled Features

[Figure: P000005_scaled.jpg]

The plot shows the feature space after standardizing both features, so the new observation is compared with the training points on an equal footing. The five identified neighbors (highlighted in yellow) are visibly the closest points.

Unscaled Features

[Figure: P000005_unscaled.jpg]

This plot shows the same space without scaling, where perimeter dominates the distance calculation. Neighbor selection is skewed as a result: small absolute changes in perimeter outweigh much larger relative changes in concavity.
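The skew is easy to see numerically. With illustrative raw values (perimeter in the tens, concavity below 1), a point whose concavity triples still ends up "closer" than a point whose perimeter shifts by just two units:

```python
# Sketch of why raw scales skew distances: perimeter is measured in tens of
# units while concavity is a fraction below 1. Values are illustrative.
import numpy as np

new = np.array([90.0, 0.10])   # (perimeter, concavity) of the new observation
a = np.array([92.0, 0.10])     # small perimeter shift, identical concavity
b = np.array([90.0, 0.30])     # identical perimeter, concavity tripled

d_a = np.linalg.norm(new - a)  # 2.0 -- driven entirely by perimeter
d_b = np.linalg.norm(new - b)  # 0.2 -- the large concavity change looks tiny
print(d_a, d_b)  # unscaled distance ranks b as "closer" than a
```

After standardization, both differences would be expressed in standard deviations of their own feature, and the ranking would no longer be an artifact of units.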

Inspired by the Linear Regression, Classification, and Resampling session of the Machine Learning Software Foundations Certificate at the Data Sciences Institute, University of Toronto.


© 2025 by Chun-Yuan Chen. Powered and secured by Wix. Licensed under CC BY-NC-ND 4.0.
