Machine Learning - Heart Disease

A logistic regression project, written in Python as a Jupyter Notebook, that classifies heart disease from patient health data.

An end-to-end machine learning project using the UCI Heart Disease dataset to predict the presence of heart disease based on clinical and demographic features.

Usage

View the notebook:
View Notebook

View the full report:
View Report

Techniques Used

Preprocessing

  • Categorical values encoded numerically (e.g. chest pain type, sex, thal).
  • Missing data handled with:
    • Mode/Median Imputation
    • K-Nearest Neighbors (KNN) Imputation (tuned for optimal k)
  • StandardScaler used to standardize continuous variables (zero mean, unit variance).
  • Dropped low-information features such as fbs and restecg, based on their mutual information with the target.
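
The imputation and feature-selection steps above might look roughly like the following. This is a minimal sketch, not the notebook's actual code: the data is a synthetic stand-in for the UCI table, the candidate k values are arbitrary, and placing the scaler ahead of the imputer (so that KNN distances are computed on comparable scales) is an assumption about ordering.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient table (NOT the real UCI data).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "age": rng.normal(54, 9, n),
    "chol": rng.normal(245, 50, n),
    "thalach": rng.normal(150, 22, n),
    "fbs": rng.integers(0, 2, n).astype(float),
})
# The target depends on age and thalach only, so fbs carries no signal here.
signal = (X["age"] - 54) / 9 - (X["thalach"] - 150) / 22
y = (signal + rng.normal(0, 0.5, n) > 0).astype(int)

# Blank out roughly 10% of chol values to simulate missing data.
X.loc[rng.random(n) < 0.10, "chol"] = np.nan

# StandardScaler skips NaN when fitting and passes it through on transform,
# so the imputer can run on already-scaled features.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("impute", KNNImputer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Tune k by cross-validated accuracy of the downstream classifier.
k_grid = [3, 5, 7, 9]  # hypothetical candidate values
search = GridSearchCV(pipe, {"impute__n_neighbors": k_grid}, cv=5)
search.fit(X, y)
best_k = search.best_params_["impute__n_neighbors"]

# Score each feature by mutual information with the target.
X_scaled = StandardScaler().fit_transform(X)
X_imputed = KNNImputer(n_neighbors=best_k).fit_transform(X_scaled)
mi = mutual_info_classif(X_imputed, y, random_state=0)
mi_scores = pd.Series(mi, index=X.columns).sort_values()
print("best k:", best_k)
print(mi_scores)
```

Features at the top of the sorted `mi_scores` series are the candidates for dropping. Where to set the cutoff is a judgment call that this sketch does not make.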

Models Trained

  • Logistic Regression
    • Binary classification
    • Multiclass classification
    • Tuned using regularization strength C
    • Applied PCA for visualization and insight
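
As a rough illustration of tuning C and projecting the features with PCA, the sketch below runs a cross-validated grid search on synthetic data. The grid of C values, the train/test split, and the two-component projection are all assumptions chosen for illustration, not settings taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for the real features.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Smaller C means stronger regularization in scikit-learn's LogisticRegression.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
c_grid = [0.01, 0.1, 1, 10, 100]  # hypothetical candidate values
search = GridSearchCV(
    pipe, {"logisticregression__C": c_grid}, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

best_C = search.best_params_["logisticregression__C"]
test_accuracy = search.score(X_test, y_test)
print(f"best C: {best_C}, held-out accuracy: {test_accuracy:.2f}")

# Project the standardized features onto two principal components,
# e.g. as input to a scatter plot colored by class.
X_2d = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
```

Keeping the scaler inside the pipeline means it is refit on each training fold, which avoids leaking statistics from the validation fold into the search.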

Results

Model | Accuracy | False Negatives
Logistic Regression (Binary, Untuned) | 80% | 23
Logistic Regression (Binary, Mean Imputation) | 82% | 18
Logistic Regression (Binary, KNN Imputation) | 84% | 15
Logistic Regression (Multiclass, Tuned) | 57% | 11
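
For the binary models, a false negative is a patient who has heart disease but is predicted not to. The sketch below shows one way both reported metrics could be read off a confusion matrix. The labels are made up, and it assumes 1 encodes disease and 0 encodes no disease. How false negatives were counted for the multiclass model is not stated, so that case is not covered.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration: 1 = disease, 0 = no disease (assumed encoding).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

# For binary labels [0, 1], ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
accuracy = accuracy_score(y_true, y_pred)

print(f"accuracy: {accuracy:.3f}, false negatives: {fn}")
```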

Languages

Python (Pandas, scikit-learn, matplotlib, seaborn)