The MS BAIM Experiential Learning course gives our graduate students the opportunity to apply the knowledge from their studies to a relevant area through a thesis or project. This course is ideal for students who will pursue analytical careers (e.g., Data/OR Analyst, Data Scientist, Decision Scientist, Business Consultant), and it is designed to polish and integrate the knowledge and skills you developed in your master's-level coursework by developing an analytics solution with an industry partner. Thus, most of your time will be devoted to working with your project teammates to provide the answers and deliverables specified by the partner. Project teams usually have 3-5 members. Below is a list of some of the projects the MS BAIM students worked on last year:
This study builds and assesses various predictive models to understand the effects that certain housing features have on the price of a home in many different markets. Using these predictive models, we formulate an optimization model that allows homeowners (or, better yet, future home sellers) to identify which features of their home they should invest in to maximize its value. This study is important because purchasing a home is one of the greatest investments an individual or family will make in their lifetime. Many people invest in their home over time, but many do not have a good idea of the return on investment (ROI) in their market. We provide a solution to this problem and discuss how this approach to interfacing predictive and prescriptive models can be effective for many other types of problems.
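The idea of feeding predictive outputs into a prescriptive model can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the study's actual formulation: the upgrade names, costs, and predicted price uplifts are made up, and the optimization is a brute-force search over upgrade subsets under a hypothetical budget.

```python
from itertools import combinations

# Hypothetical upgrades: (name, cost in dollars, predicted price uplift in dollars).
# In a real study the uplift would come from a fitted predictive model;
# these numbers are invented purely for illustration.
UPGRADES = [
    ("kitchen_remodel", 20000, 26000),
    ("new_roof", 12000, 13000),
    ("finished_basement", 15000, 22000),
    ("landscaping", 4000, 6000),
]


def best_upgrades(upgrades, budget):
    """Return (chosen names, total cost, net gain) for the subset of upgrades
    that maximizes predicted uplift minus cost without exceeding the budget."""
    best = ((), 0, 0)  # doing nothing is always feasible
    for r in range(1, len(upgrades) + 1):
        for combo in combinations(upgrades, r):
            cost = sum(c for _, c, _ in combo)
            if cost > budget:
                continue
            net = sum(u - c for _, c, u in combo)
            if net > best[2]:
                best = (tuple(name for name, _, _ in combo), cost, net)
    return best


if __name__ == "__main__":
    print(best_upgrades(UPGRADES, budget=30000))
```

Enumerating subsets is only practical for a handful of candidate upgrades; a larger problem would more likely be posed as an integer program.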
The project assesses the use of IBM Watson to identify an equitable mix of incoming graduate students. In this first phase, quantitative estimators of a student’s future performance in Krannert are derived from the submitted application. These estimators are an additional input to Purdue’s holistic admission review process, which considers many different elements. This research is important because it demonstrates an innovative data-informed decision support system for the admissions process while considering the unintended consequences of a predictive modelling approach.
We study the performance of machine learning algorithms that have not previously been investigated for the problem of blood donation prediction. We build models both on data sets clustered with k-means and on unclustered data to see whether clustering significantly improves performance. The motivation for this research is that demand for blood is increasing daily because of transfusions needed for accidents, surgeries, diseases, etc. Accurate prediction of the number of blood donors can help medical professionals gauge the future supply of blood and plan accordingly to entice voluntary blood donors to meet demand. We found that a non-clustered C5.0 tree achieved the best accuracy, a clustered (k=4) ANN model yielded the best, while a clustered (k=4) SVM model yielded the best specificity, which might be the best for targeted advertisement. Our current solution is within the top 8% of all current participants in the DrivenData.org blood prediction competition.
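For readers unfamiliar with the performance measures compared above, the following minimal sketch shows how accuracy and specificity can be computed from binary predictions. Sensitivity is included as an assumption, since it is the usual companion to specificity. The labels and predictions are made-up illustration data, not results from the study, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for 0/1 labels.

    Here 1 is taken to mean "donated" and 0 "did not donate" (an assumption
    made for this illustration).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # share of actual 1s caught
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # share of actual 0s caught
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Made-up labels and predictions for eight hypothetical donors.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
    print(classification_metrics(y_true, y_pred))
```

A model with high specificity rarely flags non-donors as donors, which is one reason that measure can matter when outreach is targeted.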
Coupon offering is a traditional and prevalent sales tool used both to attract potential customers and to increase the satisfaction of existing customers. With enough purchase history, it is possible to predict which shoppers, when presented with an offer, will buy a new item. However, identifying the shopper who will become a loyal buyer prior to the initial purchase is a more challenging task. The Acquire Valued Shoppers Challenge asks participants to predict which shoppers are most likely to make a repeat purchase. The challenge provides almost 350 million rows of completely anonymized transactional data from over 300,000 shoppers. It is one of the largest problems run on Kaggle to date. During the course of this project, we worked with various machine learning libraries such as H2O, XGBoost, and Vowpal Wabbit to optimize our solution and move up the leaderboard of this data challenge.
This study builds and evaluates predictive models to estimate the probability that a team member will leave an employer within a future planning horizon. We then develop a theoretical decision model that provides guidance on how one might use the prediction outputs for future decision or policy making to better manage a workforce. This study proposes a framework for using analytics in human resource workforce management. The literature on analytics in human resources is scarce in comparison to other business verticals such as sales, marketing, and operations. Most of the research we have studied discusses employee turnover on theoretical grounds rather than providing analytical decision-support solutions. Using data from a local retailer, we develop a working framework that provides guidance to human resource professionals on how descriptive, predictive, and prescriptive analytics can work together to support workforce management decision and policy making.
We investigated the effect of population purchase prevalence adjustments on probability forecasts used to support the assortment planning decision for sparse demand products. We investigated the performance of various predictive models on data sets of various sizes and various levels of class imbalance. Performance was assessed using traditional statistical performance measures as well as probability calibration plots, which help gauge how well the models perform with regard to actual purchasing behavior. Both kinds of measures are important when determining which model performs best in the case of sparse demand assortments. In this study, we found that not rebalancing consistently leads to the best overall accuracy regardless of how imbalanced the data set is. This evidence is not as conclusive with the AUC statistic, but we found many of the AUC values to be essentially no different from one another (i.e., tied) for the three rebalancing methods we researched (no rebalance/raw, down, up). Based on these findings, for all levels of class imbalance, we recommend that this data not be rebalanced in future modeling runs.
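The three rebalancing options compared above (raw, down, up) can be illustrated with a small sketch. This is a generic random down-sampling and up-sampling example under stated assumptions, not the study's code: the function names, the `(features, label)` row layout, and the toy data are all hypothetical, and both classes are assumed to be present.

```python
import random


def split_by_label(rows):
    """Split (features, label) rows into positives (label 1) and negatives (label 0)."""
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    return pos, neg


def downsample(rows, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos, neg = split_by_label(rows)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))


def upsample(rows, seed=0):
    """Randomly repeat minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos, neg = split_by_label(rows)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra


if __name__ == "__main__":
    # Toy data: 2 purchases (label 1) and 8 non-purchases (label 0).
    raw = [((i,), 1) for i in range(2)] + [((i,), 0) for i in range(2, 10)]
    for name, data in [("raw", raw), ("down", downsample(raw)), ("up", upsample(raw))]:
        pos, neg = split_by_label(data)
        print(name, len(pos), len(neg))
```

Rebalancing changes the class proportions the model sees, which is why probability forecasts from a rebalanced model may no longer match the purchase prevalence in the population.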