huseyincavusbi/breast_cancer_supervised

Supervised Learning on Breast Cancer

Supervised Learning Experiments on Wisconsin Breast Cancer Dataset

This repository contains various experiments with supervised machine learning models on the Wisconsin Breast Cancer dataset.

Table of Contents

  1. Introduction
  2. Regression Study
  3. Classification Study
  4. Results
  5. Contributing
  6. License
  7. Acknowledgements
  8. Contact

Introduction

This project aims to explore and analyze the Wisconsin Breast Cancer dataset using different supervised learning techniques. The experiments are divided into two main studies: Regression and Classification.

Regression Study

  • Objective: Investigate how much the mean radius feature is affected by other tumor features.
  • Methods: Two linear regression models were utilized.
  • Approach:
    • Identify the feature that most affects the mean radius.
    • Analyze how changes in training set size impact model performance.
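The training-set-size comparison can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the scikit-learn copy of the dataset (`load_breast_cancer`), and the two training fractions (0.2 and 0.8), the random seed, and the helper name `fit_and_score` are hypothetical choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Load the features as a DataFrame; "mean radius" becomes the regression target.
frame = load_breast_cancer(as_frame=True).data
X = frame.drop(columns=["mean radius"])
y = frame["mean radius"]


def fit_and_score(train_size, seed=0):
    """Fit a linear regression on a given fraction of the data and
    return (R^2, MAE) measured on the held-out remainder."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=train_size, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    predictions = model.predict(X_test)
    return r2_score(y_test, predictions), mean_absolute_error(y_test, predictions)


# Hypothetical training fractions, chosen only to show the comparison.
scores = {size: fit_and_score(size) for size in (0.2, 0.8)}
for size, (r2, mae) in scores.items():
    print(f"train fraction {size:.1f}: R^2 = {r2:.4f}, MAE = {mae:.4f}")
```

Because each model is scored on a different held-out set, a fairer comparison would repeat the split over several seeds and average the metrics.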

Classification Study

  • Objective: Predict the diagnosis (benign or malignant) based on tumor characteristics.
  • Methods: Five supervised learning algorithms were evaluated.
  • Approach:
    • Identify the features most strongly associated with a malignant diagnosis.
    • Determine the best-performing algorithm for diagnosis prediction.
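A comparison of several classifiers might look like the sketch below. The README does not name the five algorithms, so the five used here (logistic regression, k-nearest neighbors, support vector machine, decision tree, random forest) are assumptions, as are the 5-fold cross-validation and the accuracy metric.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Assumed candidate set; scale-sensitive models are wrapped in a pipeline
# so that scaling is fitted inside each cross-validation fold.
candidates = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "k-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "support vector machine": make_pipeline(StandardScaler(), SVC()),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
results = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, accuracy in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<24} {accuracy:.3f}")

best = max(results, key=results.get)
print("best by mean accuracy:", best)
```

Accuracy alone can hide missed malignant cases, so recall on the malignant class is worth checking before choosing a model.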

Results

Regression Study

Objective: Investigate the influence of other tumor features on the mean radius.
Methods: Two Linear Regression models with different training set sizes were employed.
Key Findings:
    Model performance varies with training set size.
    The features with the strongest and weakest effects on the mean radius, and the direction (positive or negative) of each effect, were identified.
    Detailed results and performance metrics (e.g., R², MAE) can be found in the wisconsin_lnr_reg.ipynb file.
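One common way to rank features by their effect on the target is to standardize them and compare the fitted coefficients. The sketch below assumes that approach and the scikit-learn copy of the dataset; the notebook may use a different method. Several features (for example, mean perimeter and mean area) are strongly correlated with one another, so individual coefficients should be read with caution.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

frame = load_breast_cancer(as_frame=True).data
X = frame.drop(columns=["mean radius"])
y = frame["mean radius"]

# Standardize so that coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_scaled, y)

# Coefficients sorted from most negative to most positive.
coefs = pd.Series(model.coef_, index=X.columns).sort_values()

print("most negative effects:")
print(coefs.head(3))
print("most positive effects:")
print(coefs.tail(3))

# Feature whose standardized coefficient is closest to zero.
weakest = coefs.abs().idxmin()
print("weakest effect:", weakest)
```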

Classification Study

Objective: Predict the diagnosis (benign or malignant) based on tumor features.
Methods: Five supervised learning algorithms were evaluated.
Key Findings:
    The best-performing algorithm was identified based on classification reports and graphs.
    Important features associated with malignant diagnoses were highlighted.
    Detailed results and performance metrics can be found in the wisconsin_models.ipynb file.
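For illustration, a classification report and a simple feature-association ranking can be produced as follows. The choice of logistic regression, the 75/25 stratified split, and the use of correlation with a malignant indicator are all assumptions made for this sketch, not the notebook's confirmed method.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer(as_frame=True)
X = data.data
# scikit-learn encodes 0 = malignant, 1 = benign; flip it so 1 means malignant.
malignant = (data.target == 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, malignant, test_size=0.25, stratify=malignant, random_state=0
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Per-class precision, recall, and F1 on the held-out data.
report = classification_report(
    y_test, model.predict(X_test), target_names=["benign", "malignant"]
)
print(report)

# Correlation of each feature with the malignant indicator, strongest first.
correlations = X.corrwith(malignant).sort_values(ascending=False)
print(correlations.head(5))
```

Correlation captures only linear, one-feature-at-a-time association; model-based importances may rank the features differently.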

Contributing

I welcome any criticism of and contributions to this project! To contribute:

Fork the repository, then create a new branch for your feature or bug fix:

git checkout -b feature-name

Commit your changes:

git commit -m 'Add new feature'

Push to the branch:

git push origin feature-name

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgements

I would like to thank everyone who contributed to the language, packages, and dataset used in this project.

Wolberg, William, Mangasarian, Olvi, Street, Nick, and Street, W. (1995). Breast Cancer Wisconsin (Diagnostic). UCI Machine Learning Repository. https://doi.org/10.24432/C5DW2B.

Contact

If you have any questions, suggestions, or topics you would like to discuss, feel free to contact me: