Ticket Price Regressor

Project Motivation and Overview
As a concert lover (see my passions page), I've often been frustrated by the lack of clarity and consistency in ticket pricing. This inspired me to build a ticket price prediction model from scratch. Rather than relying on pre-cleaned datasets, the goal was to develop a realistic, practical pipeline that tackles the challenges of working with messy, multi-source data.
Key Skills
This project utilized a variety of tools and techniques across the machine learning lifecycle, including data collection from APIs, advanced feature engineering, and the integration of pre-trained language models. Key technologies included the Ticketmaster API, Spotify API, BERT embeddings, and classical machine learning algorithms like SVM, Random Forest, and XGBoost.
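The sketch below shows one way raw Ticketmaster results could be flattened into training rows before joining them with Spotify artist data. It assumes the Discovery API v2 event layout (an `_embedded.events` list, where each event may carry a `priceRanges` list with `min`, `max` and `currency`), as returned by a request to `https://app.ticketmaster.com/discovery/v2/events.json` with an `apikey` parameter. The function name `extract_price_rows` and the chosen output columns are hypothetical, not the project's actual code.

```python
from typing import Any


def extract_price_rows(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten one Discovery API events page into one row per priced event.

    Assumed layout: response["_embedded"]["events"], where each event may
    carry a "priceRanges" list of {"min": ..., "max": ..., "currency": ...}.
    Events with no usable price are skipped because they have no target.
    """
    rows = []
    for event in response.get("_embedded", {}).get("events", []):
        price_ranges = event.get("priceRanges") or []
        mins = [p["min"] for p in price_ranges if "min" in p]
        maxs = [p["max"] for p in price_ranges if "max" in p]
        if not mins:
            continue  # no listed price, so nothing for the regressor to learn
        embedded = event.get("_embedded", {})
        # Take the first venue and attraction; missing lists become empty dicts.
        venue = (embedded.get("venues") or [{}])[0]
        artist = (embedded.get("attractions") or [{}])[0]
        rows.append({
            "event_id": event.get("id"),
            "artist": artist.get("name"),  # hypothetical join key for Spotify data
            "venue": venue.get("name"),
            "city": venue.get("city", {}).get("name"),
            "date": event.get("dates", {}).get("start", {}).get("localDate"),
            "currency": price_ranges[0].get("currency"),
            "min_price": min(mins),
            "max_price": max(maxs) if maxs else None,
        })
    return rows
```

Skipping events without `priceRanges` matters in practice, since many listings omit prices and would otherwise enter the dataset with a missing target.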

While building and refining this ticket price predictor, I encountered significant hurdles caused by the heavily right-skewed distribution of ticket prices: most prices clustered at the low end, while a minority extended into disproportionately high tiers. Attempts to address this skew with log transformations of the target and alternative loss functions, such as Poisson loss, yielded only limited benefits because of inherent constraints in the data.
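To make the two remedies concrete, here is a small sketch of a `log1p` target transform (with its `expm1` inverse), a skewness check, and the mean Poisson deviance that a Poisson loss minimizes. The price list is made up for illustration and the helper names are hypothetical; for reference, XGBoost exposes Poisson regression as the `count:poisson` objective.

```python
import math


def log_target(prices):
    """Compress a right-skewed price target with log(1 + y)."""
    return [math.log1p(p) for p in prices]


def inverse_log_target(log_preds):
    """Map log-scale predictions back to the original price scale."""
    return [math.expm1(z) for z in log_preds]


def skewness(values):
    """Sample skewness (Fisher-Pearson, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def mean_poisson_deviance(y_true, y_pred):
    """Mean of 2 * (y * log(y / mu) - y + mu); needs mu > 0 and y >= 0."""
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        if mu <= 0:
            raise ValueError("Poisson predictions must be positive")
        term = y * math.log(y / mu) if y > 0 else 0.0  # y*log(y) -> 0 as y -> 0
        total += 2 * (term - y + mu)
    return total / len(y_true)


# Hypothetical prices: a cheap cluster plus two expensive outliers.
prices = [25, 30, 35, 40, 45, 50, 60, 75, 250, 900]
print(skewness(prices), skewness(log_target(prices)))
```

The log transform shrinks the skew but does not remove it when a few extreme prices dominate. Also, because `log1p` is concave, `expm1` of an average log-scale prediction lands at or below the average price (Jensen's inequality), so back-transformed predictions can sit low on the expensive tail.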

Nevertheless, by combining advanced feature engineering, hyperparameter optimization, and experiments with multiple regressors, I achieved a Mean Absolute Error (MAE) of 8.52 with Random Forest on the final test set. This performance demonstrates the model’s practical utility, although the skewed price distribution and the limited predictive power of the available features still constrain further accuracy gains.
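An MAE of 8.52 means test predictions were off by 8.52 price units on average (the currency is not stated here). The sketch below computes MAE and, as one possible reference point, the MAE of always predicting the training median, which is the best constant predictor under absolute error. The function names and sample numbers are hypothetical; the project's actual evaluation code is not shown here.

```python
from statistics import median


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the same units as the prices."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def median_baseline_mae(y_train, y_test):
    """MAE of always predicting the training median (the MAE-optimal constant)."""
    constant = median(y_train)
    return mean_absolute_error(y_test, [constant] * len(y_test))


# Hypothetical usage: compare a model's test MAE against the constant baseline.
y_train = [30, 35, 40, 55, 250]
y_test = [32, 45, 60]
model_preds = [34, 41, 57]
print(mean_absolute_error(y_test, model_preds), median_baseline_mae(y_train, y_test))
```

Because MAE weights every error linearly, a few very expensive tickets pull it up less than a squared-error metric would, which suits a skewed price target.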

In the future, I intend to enrich the dataset with more extensive textual descriptions so that BERT can capture more semantic information. These descriptions could come from Wikipedia venue descriptions or Spotify artist bios.
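Richer descriptions still have to become fixed-length features before a tabular regressor can use them. One common approach, assumed here rather than taken from the project, is to mean-pool BERT's token vectors (for example `last_hidden_state` from a Hugging Face `transformers` model) over the real tokens flagged by the tokenizer's `attention_mask`, then concatenate the result with the numeric features. To keep the sketch runnable without downloading a model, the token vectors below are tiny hand-written lists instead of 768-dimensional BERT outputs; `mean_pool` and `build_feature_row` are hypothetical names.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask is 1, ignoring padding tokens."""
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                totals[i] += x
            count += 1
    if count == 0:
        raise ValueError("attention mask has no real tokens")
    return [t / count for t in totals]


def build_feature_row(numeric_features, text_embedding):
    """Concatenate tabular features with a pooled text embedding."""
    return list(numeric_features) + list(text_embedding)


# Stand-in for BERT output: three tokens, the last one is padding.
tokens = [[0.2, -0.4], [0.6, 0.0], [9.9, 9.9]]
mask = [1, 1, 0]
row = build_feature_row([0.75, 12.0], mean_pool(tokens, mask))
print(row)
```

Masking out padding keeps batches of short and long descriptions comparable, since padded positions would otherwise drag every embedding toward the padding vector.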