An end-to-end machine learning project using the UCI Heart Disease dataset to predict the presence of heart disease based on clinical and demographic features.
Usage
View the notebook:
View Notebook
View the full report:
View Report
Techniques Used
Preprocessing
- Categorical values encoded numerically (e.g. chest pain type, sex, thal).
- Missing data handled with:
  - Mode/median imputation
  - K-Nearest Neighbors (KNN) imputation (tuned for the optimal k)
- `StandardScaler` used to standardize continuous variables (zero mean, unit variance).
- Dropped low-information features such as `fbs` and `restecg` based on mutual information scores.
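The imputation, scaling, and feature-ranking steps can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's notebook code. It uses synthetic data from `make_classification` with about 10% of values masked at random. The candidate values for k and the choice to drop the two lowest-scoring columns are arbitrary. The column names (`feat_0`, `feat_1`, and so on) are hypothetical placeholders rather than the real Heart Disease features.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: synthetic features, NOT the UCI Heart Disease dataset.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)
X = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(X.shape[1])])

# Mask roughly 10% of the values at random to simulate missing data.
rng = np.random.default_rng(0)
X = X.mask(rng.random(X.shape) < 0.10)

# Chain imputation -> scaling -> classifier so the imputer is fit only on
# the training folds during cross-validation.
pipe = Pipeline([
    ("impute", KNNImputer()),
    ("scale", StandardScaler()),  # zero mean, unit variance per column
    ("clf", LogisticRegression(max_iter=1000)),
])

# Treat the number of neighbors k as a hyperparameter chosen by CV accuracy.
search = GridSearchCV(pipe, {"impute__n_neighbors": [3, 5, 7, 9]}, cv=5, scoring="accuracy")
search.fit(X, y)
best_k = search.best_params_["impute__n_neighbors"]
print("best k:", best_k)

# Rank features by estimated mutual information with the target, on imputed data.
X_imputed = KNNImputer(n_neighbors=best_k).fit_transform(X)
mi = pd.Series(mutual_info_classif(X_imputed, y, random_state=0), index=X.columns)
print(mi.sort_values())

# Illustrative rule (an assumption): drop the two lowest-scoring features.
to_drop = mi.nsmallest(2).index
X_reduced = X.drop(columns=to_drop)
```

Placing `KNNImputer` inside the `Pipeline` means each cross-validation fold fits the imputer on its own training split, so the validation rows do not influence the choice of k.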
Models Trained
- Logistic Regression
  - Binary classification
  - Multiclass classification
  - Tuned using regularization strength `C`
- Applied PCA for visualization and insight
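A sketch of the binary and multiclass logistic regression setup follows. It tunes `C` by grid search and projects the features onto two principal components. Several assumptions apply. Synthetic 5-class data stands in for a 0 to 4 disease-severity label, and the binary target is taken as "label > 0". The `C` grid is illustrative, not the values used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in: synthetic 5-class data mimicking a 0-4 severity label.
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Binary target (assumption): any disease (label > 0) vs. no disease (label 0).
y_train_bin = (y_train > 0).astype(int)
y_test_bin = (y_test > 0).astype(int)

# C is the inverse of regularization strength: smaller C means a stronger L2 penalty.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

binary = GridSearchCV(make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
                      param_grid, cv=5)
binary.fit(X_train, y_train_bin)
print("binary:", binary.best_params_, "test accuracy:", binary.score(X_test, y_test_bin))

multi = GridSearchCV(make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
                     param_grid, cv=5)
multi.fit(X_train, y_train)
print("multiclass:", multi.best_params_, "test accuracy:", multi.score(X_test, y_test))

# PCA on standardized features, reduced to 2 components for plotting.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(StandardScaler().fit_transform(X))
print("explained variance ratio:", pca.explained_variance_ratio_)
```

`X_2d` can then be passed to `matplotlib.pyplot.scatter`, colored by `y`, for a 2-D view of how the classes overlap.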
Results
| Model | Accuracy | False Negatives |
|---|---|---|
| Logistic Regression (Binary, Untuned) | 80% | 23 |
| Logistic Regression (Binary, Mean Imputation) | 82% | 18 |
| Logistic Regression (Binary, KNN Imputation) | 84% | 15 |
| Logistic Regression (Multiclass, Tuned) | 57% | 11 |
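Accuracy and false-negative counts like those in the table can be computed from a confusion matrix. The labels below are made-up toy values, not the project's predictions. The multiclass false-negative rule, which counts any diseased patient predicted as class 0, is an assumption about how such a count could be defined.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical toy binary labels: 1 = disease, 0 = no disease.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0, 0, 1]

# With labels=[0, 1], ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
acc = accuracy_score(y_true, y_pred)
print("accuracy:", acc)          # 0.7
print("false negatives:", fn)    # 2

# Multiclass (0-4) toy labels. Assumed definition: a false negative is any
# patient with label > 0 who is predicted as class 0.
y_true_mc = np.array([0, 2, 1, 3, 0, 4, 1, 2])
y_pred_mc = np.array([0, 1, 0, 3, 1, 0, 1, 2])
fn_mc = int(np.sum((y_true_mc > 0) & (y_pred_mc == 0)))
acc_mc = accuracy_score(y_true_mc, y_pred_mc)
print("multiclass accuracy:", acc_mc)      # 0.5
print("multiclass false negatives:", fn_mc)  # 2
```

Under that definition, confusing one disease grade with another lowers multiclass accuracy without adding false negatives. This is one way a model can show low accuracy yet few missed diagnoses.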
Languages
Python (pandas, scikit-learn, matplotlib, seaborn)
