Machine Learning

Airbnb Rating Prediction

Built a classification pipeline to predict whether an Airbnb listing would receive a perfect rating using structured and text-based features.

Completed | 2026-04 | Python, Pandas, Scikit-learn, XGBoost, TF-IDF

Overview

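The pipeline combines cleaned structured features with TF-IDF text features, compares logistic regression, random forest, and XGBoost models, and tunes the probability threshold used to flag a listing as likely to receive a perfect review score.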

Problem

The goal was to separate perfect-rating and non-perfect-rating listings using a mix of structured listing data and natural language text.

Data / Inputs

  • Nearly 100,000 training records.
  • Numerical, categorical, and text-based listing fields.
  • Review-oriented and host-oriented listing metadata.

Approach

  • Cleaned and standardized structured features.
  • Engineered derived columns for business-relevant signals.
  • Applied TF-IDF to text fields and reduced dimensionality where helpful.
  • Trained and compared logistic regression, random forest, and XGBoost models.
  • Tuned probability thresholds to improve the business usefulness of predictions.
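
A minimal sketch of how the preprocessing and modeling steps above can fit together in scikit-learn. This is an illustration under stated assumptions, not the project's actual code: the column names (price, room_type, description, perfect_rating), the toy rows, and the build_pipeline helper are all hypothetical, and TruncatedSVD stands in for whichever dimensionality reduction was actually applied. Only logistic regression is shown; a random forest or XGBoost classifier could be swapped into the final step.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline(numeric_cols, categorical_cols, text_col, n_components=2):
    """Hypothetical helper: structured + text preprocessing, then a classifier."""
    # Text branch: TF-IDF vectors reduced to a few dense components.
    text_branch = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("svd", TruncatedSVD(n_components=n_components, random_state=0)),
    ])
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        # A single column name (not a list) passes 1-D text to TfidfVectorizer.
        ("txt", text_branch, text_col),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ])


# Toy rows invented for illustration only.
toy = pd.DataFrame({
    "price": [80.0, 120.0, 60.0, 200.0, 95.0, 150.0],
    "room_type": ["entire", "private", "private", "entire", "shared", "entire"],
    "description": [
        "bright clean apartment near the park",
        "small room close to the station",
        "basic room shared bathroom",
        "spacious clean home with garden",
        "budget bed in shared dorm",
        "modern apartment with great view",
    ],
    "perfect_rating": [1, 0, 0, 1, 0, 1],
})

features = toy.drop(columns="perfect_rating")
pipeline = build_pipeline(["price"], ["room_type"], "description")
pipeline.fit(features, toy["perfect_rating"])

# Probability of the positive (perfect-rating) class for each listing.
probabilities = pipeline.predict_proba(features)[:, 1]
```

Keeping every step inside one Pipeline means the TF-IDF vocabulary and scaling statistics are learned from the training split only, which avoids leaking validation data into the features.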

Results

  • Reached approximately 0.815 validation ROC-AUC.
  • Improved classification performance through feature selection and threshold tuning.
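
To make the two metrics above concrete, here is a small dependency-free sketch of ROC-AUC and threshold tuning. The labels, scores, and candidate thresholds are invented for illustration, and F1 is only an assumed stand-in for the selection criterion, since the objective actually used for tuning is not stated here. Note that ROC-AUC is computed from the ranking of scores, so it does not change with the threshold; threshold tuning affects the final class decisions.

```python
def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def f1_at_threshold(labels, scores, threshold):
    """F1 score when scores at or above the threshold are predicted positive."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(labels, scores, candidates):
    """Pick the candidate threshold with the highest F1 (assumed criterion)."""
    return max(candidates, key=lambda t: f1_at_threshold(labels, scores, t))


# Invented validation labels and predicted probabilities.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.35, 0.40, 0.55, 0.70, 0.90]

auc = roc_auc(labels, scores)
chosen = best_threshold(labels, scores, [0.3, 0.4, 0.5, 0.6])
```

In practice the threshold would be chosen on a validation split and then checked on separate held-out data, so that the selected cutoff is not overfit to the rows used to pick it.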

What I Learned

This project sharpened my understanding of how feature engineering, threshold decisions, and text representation choices affect real-world classification outcomes.