Breast-Cancer-Detection

Breast cancer detection using machine learning models.

1. Dataset

We used the Breast Cancer Wisconsin (Diagnostic) dataset from the UCI Machine Learning Repository.
Link: http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29
The dataset was created by Dr. William H. Wolberg, a physician at the University of Wisconsin Hospital in Madison, Wisconsin, USA.
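
To get a first look at the data, the sketch below loads the copy of this dataset that ships with scikit-learn. Using the bundled copy is an assumption made for illustration; the notebook may read the data from a CSV file instead, and its column names may differ.

```python
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled copy of the Wisconsin Diagnostic dataset
# stands in for whatever file the notebook actually reads.
data = load_breast_cancer(as_frame=True)
df = data.frame  # 30 numeric feature columns plus a "target" column

print(df.shape)            # (569, 31)
print(data.target_names)   # ['malignant' 'benign'], i.e. 0 = malignant, 1 = benign

# Class balance: number of malignant (0) and benign (1) samples
n_malignant = int((df["target"] == 0).sum())
n_benign = int((df["target"] == 1).sum())
print(n_malignant, n_benign)  # 212 357
```

Note that in this copy the benign class is encoded as 1, which matters when choosing the positive class for precision, recall, and F1.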

2. Programming Language, Libraries and IDE

Programming Language: Python 3
Libraries: pandas, numpy, seaborn, and scikit-learn (sklearn)
IDE: Jupyter Notebook

3. Basic Mathematics

3.1 Mean

The mean is the average of the given numbers: their sum divided by how many numbers there are.
Mean of a random variable X, μ = Σ(X_i)/n
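
As a small illustration, the formula can be written directly in plain Python. The function name `mean` and the sample values are made up for this example.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of an empty sequence is undefined")
    return sum(values) / len(values)


# Illustrative sample, not taken from the dataset
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu = mean(sample)
print(mu)  # 5.0
```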

3.2 Standard Deviation

Standard deviation is a measure of how dispersed the data is in relation to the mean.
Standard deviation of a population X, σ = (Σ(X_i - μ)²/n)^(1/2)
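
A minimal sketch of the population formula in plain Python follows. The function name `population_std` and the sample values are illustrative only.

```python
import math


def population_std(values):
    """Population standard deviation: divide the squared deviations by n."""
    n = len(values)
    if n == 0:
        raise ValueError("standard deviation of an empty sequence is undefined")
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    return math.sqrt(variance)


# Illustrative sample, not taken from the dataset
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sigma = population_std(sample)
print(sigma)  # 2.0
```

Library defaults differ on the divisor: `numpy.std` divides by n (as here), while `pandas.Series.std` divides by n - 1 unless `ddof=0` is passed.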

3.3 Correlation

Correlation describes the strength of association between two variables.
The Pearson correlation coefficient between two random variables X and Y can be calculated by the formula:
r = Σ((X_i - μ_X)(Y_i - μ_Y)) / ((Σ(X_i - μ_X)²)^(1/2) * (Σ(Y_i - μ_Y)²)^(1/2))
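
The sketch below computes the Pearson coefficient in its standard form: the sum of the products of the deviations from each mean, divided by the product of the square roots of the summed squared deviations. The function name `pearson_r` and the input values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the root sums of squared deviations
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Perfectly linear relationships give r close to +1 or -1
r_up = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_down = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])
print(r_up, r_down)
```

On a pandas DataFrame, `DataFrame.corr()` returns the same coefficient for every pair of numeric columns, which is convenient for plotting a correlation heatmap.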

3.4 Standardization

Standardization scales each input variable separately by subtracting the mean and dividing by the standard deviation to shift the distribution to have a mean of zero and a standard deviation of one.
Formula for standardization: x_new = (x_old - μ)/σ
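
The following sketch applies the formula to a single variable in plain Python. The function name `standardize` and the sample values are illustrative, and the population standard deviation (dividing by n) is assumed.

```python
import math


def standardize(values):
    """Return z-scores: subtract the mean, divide by the population std."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    if sigma == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - mu) / sigma for x in values]


# Illustrative sample with mean 5.0 and standard deviation 2.0
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(sample)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

scikit-learn's `StandardScaler` performs the same operation column by column. When used with a train/test split, it is usually fitted on the training data only and then applied to the test data, so that test statistics do not leak into training.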

4. Machine Learning Models

Logistic Regression Classifier
Nearest Neighbor Classifier
Support Vector Machines Classifier
Kernel SVM Classifier
Naive Bayes Classifier
Decision Tree Classifier
Random Forest Classifier
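
One plausible way to train and compare these seven classifiers with scikit-learn is sketched below. Everything specific here is an assumption rather than the notebook's actual configuration: the bundled dataset copy, the 80/20 split, the random seeds, the mapping of "Support Vector Machines" to a linear kernel and "Kernel SVM" to an RBF kernel, and all hyperparameters. The scores it prints will therefore not necessarily match the reported results.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled copy of the dataset (1 = benign, 0 = malignant)
X, y = load_breast_cancer(return_X_y=True)

# Assumption: 80/20 stratified split with a fixed seed for repeatability
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Standardize using statistics from the training data only
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Assumed hyperparameters; the notebook's settings may differ
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Nearest Neighbor": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Machines": SVC(kernel="linear"),
    "Kernel SVM": SVC(kernel="rbf"),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    pred = model.predict(X_test_s)
    # f1_score uses class 1 (benign in this encoding) as the positive class;
    # pass pos_label=0 to score the malignant class instead
    results[name] = (accuracy_score(y_test, pred), f1_score(y_test, pred))
    print(f"{name}: accuracy={results[name][0]:.4f}, f1={results[name][1]:.4f}")
```

Tree-based models do not need standardized inputs, but feeding every model the same scaled features keeps the comparison simple.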

5. Metrics

F1 Score
Accuracy Score

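Confusion Matrix: the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN), from which the metrics are computed: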

Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
F1 Score = 2*(Precision * Recall)/(Precision + Recall)
Accuracy Score = (TP + TN)/(TP + FP + FN + TN)
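
The four formulas above can be checked with a few lines of Python. The function name `classification_metrics` and the confusion-matrix counts are hypothetical and are not taken from the notebook.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * (precision * recall) / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / total
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for a 114-sample test set
metrics = classification_metrics(tp=40, fp=2, fn=3, tn=69)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts accuracy is 109/114 while F1 is 80/85, which shows how F1 can fall below accuracy when most of the errors involve the positive class.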

6. Result

Accuracy Score:

Logistic Regression — 97.36%
Nearest Neighbor — 94.73%
Support Vector Machines — 95.61%
Kernel SVM — 98.24%
Naive Bayes — 96.49%
Decision Tree Algorithm — 95.61%
Random Forest Classification — 97.36%

F1 Score:

Logistic Regression — 96.47%
Nearest Neighbor — 93.02%
Support Vector Machines — 94.25%
Kernel SVM — 97.61%
Naive Bayes — 95.23%
Decision Tree Algorithm — 93.97%
Random Forest Classification — 96.38%
