Early detection of breast cancer benefits the entire health system. When screening does not depend solely on human review, machine learning can make detection more efficient and leave more time for treating the disease rather than detecting it in the first place.
For this project, we use the “Wisconsin Breast Cancer Diagnostic” dataset from the “UCI Machine Learning Repository”. The data has 569 records of cancer biopsies, each with 32 attributes.
Analysis requirements include:
Use k-NN to classify unlabelled samples by assigning each one to the class of the labelled samples it most resembles.
For k-NN to work well, we measure the distance between an unlabelled sample and each labelled sample. Note that a feature with a larger range will dominate the distance measurement over features with smaller ranges, so features are typically rescaled to a common range before distances are computed.
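The two requirements above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation: it assumes Euclidean distance and min-max rescaling, and the helper names (euclidean, fit_min_max, scale, knn_predict), the two toy features and the B/M labels are invented for the example rather than taken from the dataset.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_min_max(rows):
    """Return per-feature minimum and maximum, learned from labelled data."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def scale(row, lows, highs):
    """Rescale one sample so each feature's training range maps to [0, 1]."""
    return [
        (v - lo) / (hi - lo) if hi > lo else 0.0
        for v, lo, hi in zip(row, lows, highs)
    ]

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k labelled samples closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical values): a small-range feature and a large-range feature.
train_x = [[10.0, 300.0], [11.0, 900.0], [20.0, 350.0], [21.0, 950.0]]
train_y = ["B", "B", "M", "M"]
query = [19.0, 880.0]

# Without rescaling, the large-range second feature dominates the distance.
raw_label = knn_predict(train_x, train_y, query, k=1)

# With rescaling, both features contribute on a comparable footing.
lows, highs = fit_min_max(train_x)
scaled_x = [scale(row, lows, highs) for row in train_x]
scaled_label = knn_predict(scaled_x, train_y, scale(query, lows, highs), k=1)

print(raw_label, scaled_label)  # prints "B M"
```

In this toy case the nearest neighbour changes once the features are rescaled, which is the dominance effect described above. The scaling bounds are learned from the labelled samples only and then applied to the query, so the unlabelled sample does not influence the rescaling.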