
Charting KNN’s Performance Odyssey: Navigating Setups
This work uses parallel-coordinates plots on the Wisconsin Diagnostic Breast Cancer dataset to draw KNN regression configurations as intuitive paths, making the rise and fall of training and test RMSE visible at a glance.
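As a rough illustration of the kind of sweep behind these plots, the sketch below grids over training proportion, neighbor count k, and cross-validation fold count on scikit-learn's copy of the WDBC data, then draws each configuration as one path with pandas' parallel_coordinates. Treating the 0/1 diagnosis label as the regression target, the specific grid values, and the variable names are assumptions for illustration, not the original setup.

```python
# A minimal sketch, assuming training proportion, k, and CV fold count are
# the three swept settings; the grid values below are illustrative only.
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Assumption: the 0/1 diagnosis label is used as the regression target.
X, y = load_breast_cancer(return_X_y=True)

rows = []
for train_prop in (0.6, 0.8):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_prop, random_state=0)
    for k in (1, 5, 15):
        for folds in (5, 10):
            model = KNeighborsRegressor(n_neighbors=k).fit(X_tr, y_tr)
            # Resubstitution training RMSE and held-out test RMSE.
            train_rmse = np.sqrt(mean_squared_error(y_tr, model.predict(X_tr)))
            test_rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
            # Cross-validated RMSE on the training split, using the swept
            # fold count.
            cv_rmse = -cross_val_score(
                KNeighborsRegressor(n_neighbors=k), X_tr, y_tr,
                cv=folds, scoring="neg_root_mean_squared_error").mean()
            rows.append(dict(train_prop=train_prop, k=k, folds=folds,
                             train_rmse=train_rmse, cv_rmse=cv_rmse,
                             test_rmse=test_rmse))

results = pd.DataFrame(rows)

# One path per configuration, colored by neighbor count. Note the axes mix
# scales (proportions, fold counts, RMSE), so this is a qualitative view.
plot_df = (results[["train_prop", "folds", "train_rmse", "test_rmse"]]
           .assign(k=results["k"].astype(str)))
parallel_coordinates(plot_df, class_column="k", colormap="viridis")
plt.title("KNN regression configurations as parallel-coordinate paths")
plt.show()
```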

Test RMSE Extremes
This figure shows the best- and worst-performing KNN setups on the test set, illustrating how training proportion, neighbor count, and cross-validation fold count influence the model's generalization.
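Assuming the results table from the sketch above is in scope, the test-set extremes can be read off directly; the idxmin/idxmax selection here is illustrative, not the original selection code.

```python
# Continuing the sketch above: extremes on the test-RMSE axis.
best = results.loc[results["test_rmse"].idxmin()]
worst = results.loc[results["test_rmse"].idxmax()]
print("Best test RMSE: ", dict(best))
print("Worst test RMSE:", dict(worst))
```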

Training RMSE Extremes
Unlike the figure on the left, which shows extremes in test RMSE, this one shows that the lowest training RMSE does not guarantee the lowest test error. At the same training proportion, a moderate k and more cross-validation folds improve training performance.
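One subtlety is what "training RMSE" means here: resubstitution error (scoring the model on the same split it was fit on) is trivially near zero at k = 1, since each training point is its own nearest neighbor, whereas cross-validated RMSE on the training split can rank k = 1 poorly. Which reading the original figure uses is an assumption; the check below, continuing the sketch, contrasts the two.

```python
# Continuing the sketch: the configuration with the lowest training RMSE
# need not have the lowest test RMSE. Compare resubstitution training error
# against cross-validated training error (which one the original figure
# reports is an assumption).
for col in ("train_rmse", "cv_rmse"):
    winner = results.loc[results[col].idxmin()]
    print(f"Lowest {col}: k={winner['k']}, folds={winner['folds']}, "
          f"test_rmse={winner['test_rmse']:.3f}")
print(f"Lowest test RMSE overall: {results['test_rmse'].min():.3f}")
```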

Neighbor Count Extremes
This figure compares two k settings at the same training size and cross-validation fold count, showing that the setting with the minimal k (k = 1) ends up with higher RMSE on both the training and test data.
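The same comparison can be read off the sketch's results table by holding the other two settings fixed; the fixed values below are arbitrary illustrative choices, not the original figure's settings.

```python
# Continuing the sketch: fix training proportion and fold count, then compare
# the neighbor-count extremes side by side (train_prop = 0.8 and folds = 10
# are illustrative choices).
fixed = results[(results["train_prop"] == 0.8) & (results["folds"] == 10)]
print(fixed[["k", "train_rmse", "cv_rmse", "test_rmse"]].to_string(index=False))
```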
Inspired by the Linear regression, Classification, and Resampling session for the Machine Learning Software Foundations Certificate at the Data Sciences Institute, University of Toronto.
