
Charting KNN’s Performance Odyssey: Navigating Setups
This work uses parallel-coordinates plots on the Wisconsin Diagnostic Breast Cancer dataset to draw KNN regression configurations as intuitive paths, making the rise and fall of training and test RMSE visible at a glance.
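As a rough illustration of the kind of sweep behind these plots, the sketch below grids over training proportion, neighbor count k, and cross-validation fold count on scikit-learn's copy of the WDBC data, then draws each configuration as one path with pandas' parallel_coordinates. Treating the 0/1 diagnosis label as the regression target, the specific grid values, and the variable names are assumptions for illustration, not the original setup.

```python
# A minimal sketch, assuming training proportion, k, and CV fold count are
# the three swept settings; the grid values below are illustrative only.
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Assumption: the 0/1 diagnosis label is used as the regression target.
X, y = load_breast_cancer(return_X_y=True)

rows = []
for train_prop in (0.6, 0.8):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_prop, random_state=0)
    for k in (1, 5, 15):
        for folds in (5, 10):
            model = KNeighborsRegressor(n_neighbors=k).fit(X_tr, y_tr)
            # Resubstitution training RMSE and held-out test RMSE.
            train_rmse = np.sqrt(mean_squared_error(y_tr, model.predict(X_tr)))
            test_rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
            # Cross-validated RMSE on the training split, using the swept
            # fold count.
            cv_rmse = -cross_val_score(
                KNeighborsRegressor(n_neighbors=k), X_tr, y_tr,
                cv=folds, scoring="neg_root_mean_squared_error").mean()
            rows.append(dict(train_prop=train_prop, k=k, folds=folds,
                             train_rmse=train_rmse, cv_rmse=cv_rmse,
                             test_rmse=test_rmse))

results = pd.DataFrame(rows)

# One path per configuration, colored by neighbor count. Note the axes mix
# scales (proportions, fold counts, RMSE), so this is a qualitative view.
plot_df = (results[["train_prop", "folds", "train_rmse", "test_rmse"]]
           .assign(k=results["k"].astype(str)))
parallel_coordinates(plot_df, class_column="k", colormap="viridis")
plt.title("KNN regression configurations as parallel-coordinate paths")
plt.show()
```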

Test RMSE Extremes
This figure shows the best- and worst-performing KNN setups on the test set, illustrating how training proportion, neighbor count, and cross-validation fold count influence the model's generalization.
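Assuming the results table from the sketch above is in scope, the test-set extremes can be read off directly; the idxmin/idxmax selection here is illustrative, not the original selection code.

```python
# Continuing the sketch above: extremes on the test-RMSE axis.
best = results.loc[results["test_rmse"].idxmin()]
worst = results.loc[results["test_rmse"].idxmax()]
print("Best test RMSE: ", dict(best))
print("Worst test RMSE:", dict(worst))
```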

Training RMSE Extremes
Unlike the figure on the left, which shows extremes in test RMSE, this one shows that the lowest training RMSE does not guarantee the lowest test error. At the same training proportion, a moderate k and more cross-validation folds improve training performance.
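One subtlety is what "training RMSE" means here: resubstitution error (scoring the model on the same split it was fit on) is trivially near zero at k = 1, since each training point is its own nearest neighbor, whereas cross-validated RMSE on the training split can rank k = 1 poorly. Which reading the original figure uses is an assumption; the check below, continuing the sketch, contrasts the two.

```python
# Continuing the sketch: the configuration with the lowest training RMSE
# need not have the lowest test RMSE. Compare resubstitution training error
# against cross-validated training error (which one the original figure
# reports is an assumption).
for col in ("train_rmse", "cv_rmse"):
    winner = results.loc[results[col].idxmin()]
    print(f"Lowest {col}: k={winner['k']}, folds={winner['folds']}, "
          f"test_rmse={winner['test_rmse']:.3f}")
print(f"Lowest test RMSE overall: {results['test_rmse'].min():.3f}")
```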

Neighbor Count Extremes
This figure compares two k settings at the same training size and cross-validation fold count, showing that the setting with the minimal k (k = 1) ends up with higher RMSE on both the training and test data.
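The same comparison can be read off the sketch's results table by holding the other two settings fixed; the fixed values below are arbitrary illustrative choices, not the original figure's settings.

```python
# Continuing the sketch: fix training proportion and fold count, then compare
# the neighbor-count extremes side by side (train_prop = 0.8 and folds = 10
# are illustrative choices).
fixed = results[(results["train_prop"] == 0.8) & (results["folds"] == 10)]
print(fixed[["k", "train_rmse", "cv_rmse", "test_rmse"]].to_string(index=False))
```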
Inspired by the Linear regression, Classification, and Resampling session for the Machine Learning Software Foundations Certificate at the Data Sciences Institute, University of Toronto.
