00

Machine Learning

Airbnb Perfect Rating Prediction

Machine learning classification workflow that predicts perfect guest rating outcomes from structured and text-based Airbnb listing features.

Completed / Coursework · 2026-04 · Python · Pandas · Scikit-learn · XGBoost · TF-IDF

01

Case File

Overview

Built and compared machine learning models to predict whether an Airbnb listing would receive a perfect guest rating using structured and text-based listing data.

Problem

Marketplace operators and hosts benefit from understanding which listing features correlate with exceptional guest outcomes. The goal was to classify listings into perfect-rating and non-perfect-rating groups and identify which signals mattered most.

Dataset / Inputs

  • Structured Airbnb listing features
  • Text-based listing fields converted into model-ready inputs
  • Training and validation workflow for imbalanced classification
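
A minimal sketch of a stratified train/validation split, which keeps the share of perfect-rating listings the same in both sets. The data is synthetic, and the 20% positive rate and 80/20 split are assumptions made for illustration; the project's actual class balance and split are not stated here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1,000 listings, 5 numeric features.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
# Assumed imbalance: roughly 20% of listings are labelled "perfect rating".
y = (rng.random(1000) < 0.2).astype(int)

# stratify=y keeps the positive rate nearly identical in both splits.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

train_rate = y_train.mean()
val_rate = y_val.mean()
print(f"positive rate: train={train_rate:.3f}, validation={val_rate:.3f}")
```

Without stratification, a small validation set can end up with noticeably more or fewer positives than the training set, which makes validation metrics harder to interpret.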

System Architecture

  • Cleaned and prepared Airbnb listing data for modeling.
  • Engineered numerical, categorical, and text-derived features.
  • Built classification models using Logistic Regression, Random Forest, and XGBoost.
  • Evaluated model performance using ROC-AUC and accuracy.
  • Compared the models on these metrics to identify the strongest-performing approach.
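
The steps above can be sketched as a single scikit-learn pipeline. This is an illustrative reconstruction, not the project's code: the data is synthetic, every column name (`price`, `accommodates`, `room_type`, `description`, `perfect_rating`) is hypothetical, and only Logistic Regression and Random Forest are compared so the example depends on scikit-learn alone. An `xgboost.XGBClassifier` could be added to `candidates` in the same way.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for listing data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 400
listings = pd.DataFrame({
    "price": rng.uniform(40, 400, n),
    "accommodates": rng.integers(1, 9, n),
    "room_type": rng.choice(["Entire home", "Private room", "Shared room"], n),
    "description": rng.choice([
        "spotless cozy flat near park",
        "basic room close to station",
        "bright loft with great view",
        "small space shared bathroom",
    ], n),
})
# Synthetic label loosely tied to the text so the models have some signal.
signal = listings["description"].str.contains("spotless|bright").astype(int)
listings["perfect_rating"] = ((signal + rng.random(n)) > 1.2).astype(int)


def make_preprocessor():
    # Numeric, categorical, and text columns each get their own transformer.
    return ColumnTransformer([
        ("num", StandardScaler(), ["price", "accommodates"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["room_type"]),
        # A single column name (not a list) passes 1-D text to TF-IDF.
        ("text", TfidfVectorizer(), "description"),
    ])


X = listings.drop(columns="perfect_rating")
y = listings["perfect_rating"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, clf in candidates.items():
    pipeline = Pipeline([("prep", make_preprocessor()), ("clf", clf)])
    pipeline.fit(X_train, y_train)
    scores = pipeline.predict_proba(X_val)[:, 1]
    results[name] = {
        "roc_auc": roc_auc_score(y_val, scores),
        "accuracy": accuracy_score(y_val, pipeline.predict(X_val)),
    }

for name, metrics in results.items():
    print(f"{name}: ROC-AUC={metrics['roc_auc']:.3f}, accuracy={metrics['accuracy']:.3f}")
```

Fitting the preprocessing inside the pipeline means the scaler, encoder, and TF-IDF vocabulary are learned from the training split only, which avoids leaking validation data into the features.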

What I Built

  • End-to-end modeling workflow for feature preparation and classifier training
  • TF-IDF-based text feature pipeline
  • Model comparison workflow across logistic regression, random forest, and XGBoost
  • Validation framework for threshold and performance analysis
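
A minimal sketch of what threshold analysis on validation scores can look like. The labels and scores below are synthetic, the threshold grid is arbitrary, and choosing the threshold by F1 is an assumption made for illustration; the selection criterion used in the project is not stated here.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Synthetic validation labels (about 30% positive) and hypothetical scores.
rng = np.random.default_rng(1)
y_val = rng.binomial(1, 0.3, size=500)
val_scores = np.clip(0.35 * y_val + rng.normal(0.35, 0.15, size=500), 0, 1)

# Sweep decision thresholds and record precision, recall, and F1 at each one.
rows = []
for threshold in np.linspace(0.1, 0.9, 9):
    preds = (val_scores >= threshold).astype(int)
    rows.append({
        "threshold": round(float(threshold), 2),
        "precision": precision_score(y_val, preds, zero_division=0),
        "recall": recall_score(y_val, preds, zero_division=0),
        "f1": f1_score(y_val, preds, zero_division=0),
    })

# Pick the threshold with the highest F1 (one possible criterion among many).
best = max(rows, key=lambda row: row["f1"])
for row in rows:
    print(row)
print("best threshold by F1:", best["threshold"])
```

ROC-AUC summarizes ranking quality across all thresholds, while a sweep like this shows the precision and recall trade-off at the specific cut-off a deployed classifier would use.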

Tools

  • Python
  • Pandas
  • Scikit-learn
  • XGBoost
  • TF-IDF

Results / Proof Points

  • Reached a validation ROC-AUC of approximately 0.815.
  • Identified which listing features and text signals were most associated with perfect guest outcomes.

Business Value

The project demonstrates how predictive analytics can support listing optimization, marketplace decision support, and evidence-based prioritization of listing improvements.

What I Learned

This project strengthened my understanding of feature engineering, text modeling, threshold tuning, and how evaluation choices influence the usefulness of a classification system.

Next Steps

Future versions could incorporate explainability outputs, segment-level comparisons, and additional marketplace outcomes beyond perfect-rating classification.

Limitations

The workflow focused on perfect-rating classification rather than broader marketplace outcomes such as pricing, occupancy, or revenue. The problem scope can be expanded in future versions once stronger public artifacts are available.