
Diagnosing with Distance: How Scaling Shapes KNN Decisions

This exercise demonstrates a K-Nearest Neighbors (KNN) model in Python on the Wisconsin Breast Cancer dataset, using two features, perimeter and concavity, to classify diagnoses as malignant or benign and to show why feature scaling matters.

KNN Code

[Figure: P000005_KNNcode.jpg]

This KNN model was built with two scaled features (perimeter and concavity) to classify breast cancer diagnoses. It was trained on labeled data, made a prediction for a hypothetical new observation, and identified that observation's five nearest neighbors.
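The workflow described above can be sketched as follows. This is an assumed reconstruction, not the author's exact code: it loads the dataset from scikit-learn, standardizes the two features, fits a 5-neighbor classifier, and queries the neighbors of a hypothetical new observation (the perimeter/concavity values for that observation are illustrative).

```python
# Sketch of the described workflow (assumed, not the author's exact code):
# KNN on two standardized features of the Wisconsin Breast Cancer dataset.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
# Select the "mean perimeter" and "mean concavity" columns.
cols = [list(data.feature_names).index(f)
        for f in ("mean perimeter", "mean concavity")]
X = data.data[:, cols]
y = data.target  # in scikit-learn's encoding: 0 = malignant, 1 = benign

# Standardize so both features contribute comparably to distances.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_scaled, y)

# Hypothetical new observation (perimeter, concavity) -- illustrative values.
new_obs = scaler.transform([[90.0, 0.10]])
pred = knn.predict(new_obs)
dist, idx = knn.kneighbors(new_obs)  # distances and indices of 5 neighbors
print(data.target_names[pred[0]], idx[0])
```

The `kneighbors` call returns the same five points that are highlighted in the scaled-feature plot below; changing `n_neighbors` changes how many votes the prediction is based on.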

Scaled Features

[Figure: P000005_scaled.jpg]

The plot shows the feature space after standardizing both features, so the new observation is compared with the training points on an equal footing. The five identified neighbors (highlighted in yellow) are visibly the closest points.

Unscaled Features

[Figure: P000005_unscaled.jpg]

This plot shows the same space without scaling, where perimeter dominates the distance calculation. Neighbor selection is skewed as a result: small absolute changes in perimeter outweigh much larger relative changes in concavity.
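The skew is easy to see numerically. With illustrative raw values (perimeter in the tens, concavity below 1), a point whose concavity triples still ends up "closer" than a point whose perimeter shifts by just two units:

```python
# Sketch of why raw scales skew distances: perimeter is measured in tens of
# units while concavity is a fraction below 1. Values are illustrative.
import numpy as np

new = np.array([90.0, 0.10])   # (perimeter, concavity) of the new observation
a = np.array([92.0, 0.10])     # small perimeter shift, identical concavity
b = np.array([90.0, 0.30])     # identical perimeter, concavity tripled

d_a = np.linalg.norm(new - a)  # 2.0 -- driven entirely by perimeter
d_b = np.linalg.norm(new - b)  # 0.2 -- the large concavity change looks tiny
print(d_a, d_b)  # unscaled distance ranks b as "closer" than a
```

After standardization, both differences would be expressed in standard deviations of their own feature, and the ranking would no longer be an artifact of units.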

Inspired by the Linear Regression, Classification, and Resampling session of the Machine Learning Software Foundations Certificate at the Data Sciences Institute, University of Toronto.


© 2025 by Chun-Yuan Chen. Powered and secured by Wix. Licensed under CC BY-NC-ND 4.0.
