Data Science and Machine Learning Projects | Tools: Python, R, Stata, LaTeX & Tableau

Churn Prediction Model Deployment using AWS Elastic Beanstalk {github}

  • Trained, dockerized, tested and deployed a churn prediction logistic regression model using AWS Elastic Beanstalk.
  • Used Python 3.9, NumPy, pandas, scikit-learn, Docker, pip3 and awsebcli.
  • Executed the entire project workflow independently, from start to finish.

Streamlit-based Probability Distribution Fitter Web Application {blog | app | github}

  • Created and deployed a Streamlit-based web application on the Heroku cloud, which compares 80+ probability distributions and ranks them based on their fit to the data.
  • The application uses the fitter and scipy libraries to fit and compare distributions.
  • It offers two fit mechanisms: users can either compare the ten most common distributions or manually select distributions from a drop-down list.

House Price Prediction Model Deployment on Heroku {app | github}

  • Trained, dockerized, and deployed a house-price prediction model using GitHub Actions on the Heroku cloud.
  • Used Python 3.9, NumPy, pandas, scikit-learn and Docker.
  • Worked independently for two days to automate the model deployment using a GitHub CI/CD pipeline.

Seattle Fremont Bridge’s Daily Bicycle Count Forecasting {github}

  • Forecast two years of daily bicycle ridership using Facebook’s Prophet library.
  • Evaluated the model using a forward-chaining cross-validation mechanism with a 90-day horizon, a 30-day period and an initial training window of 730 days. The RMSE metric was estimated using a rolling window of 0.1 (10%).
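
The forward-chaining scheme above can be illustrated with a minimal sketch that uses only the Python standard library. It is not Prophet's own implementation: the function name `forward_chaining_cutoffs` and the date range are hypothetical, and only the 730-day initial window, 30-day period and 90-day horizon come from the project description.

```python
from datetime import date, timedelta


def forward_chaining_cutoffs(start, end, initial_days=730, period_days=30, horizon_days=90):
    """Return cutoff dates, oldest first (hypothetical helper, not a Prophet API).

    Each fold trains on data up to its cutoff and is scored on the
    following `horizon_days` days.
    """
    initial = timedelta(days=initial_days)
    period = timedelta(days=period_days)
    horizon = timedelta(days=horizon_days)

    cutoffs = []
    cutoff = end - horizon  # the last cutoff leaves a full horizon for evaluation
    # Step backwards while at least `initial_days` of history precede the cutoff.
    while cutoff >= start + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return cutoffs[::-1]


# Assumed date range, for illustration only.
series_start, series_end = date(2015, 1, 1), date(2018, 12, 31)
cutoffs = forward_chaining_cutoffs(series_start, series_end)
for c in cutoffs[:3]:
    print("train up to", c, "-> evaluate through", c + timedelta(days=90))
```

Every fold trains only on data before its cutoff, so each forecast is scored on dates the model has not seen.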

Lung Cancer Patients’ Survival Prediction using Survival Analysis {blog | github}

  • Predicted lung cancer patients’ survival using Cox PH and AFT models.
  • Used the lifelines Python library for survival modelling and validated the results using Stata.
  • Results revealed that the ‘Gender’ variable has the highest influence on survival: female patients’ survival time is approximately 52% higher than that of male patients.

Bayesian A/B Testing of Email Click-Through Rate {github}

  • Two variants of an email (with and without header image) were sent, and click-through data was gathered.
  • A Monte Carlo simulation was performed to identify which email variant performs better and to what extent.
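
A Monte Carlo comparison of this kind can be sketched as below, assuming a Beta-Binomial model with a uniform Beta(1, 1) prior on each click-through rate. The click and send counts are hypothetical placeholders rather than the project's data, and the function name `compare_variants` is an illustrative choice.

```python
import random


def compare_variants(clicks_a, sends_a, clicks_b, sends_b, draws=20000, seed=0):
    """Estimate P(rate_B > rate_A) and the mean difference in click-through rates.

    Assumes independent Beta(1, 1) priors, so each posterior is
    Beta(1 + clicks, 1 + non-clicks).
    """
    rng = random.Random(seed)
    wins = 0
    diff_total = 0.0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + clicks_a, 1 + sends_a - clicks_a)
        rate_b = rng.betavariate(1 + clicks_b, 1 + sends_b - clicks_b)
        wins += rate_b > rate_a          # counts draws where B beats A
        diff_total += rate_b - rate_a    # accumulates the absolute lift
    return wins / draws, diff_total / draws


# Hypothetical counts: A = without header image, B = with header image.
prob_b_better, mean_diff = compare_variants(120, 1000, 150, 1000)
print(f"P(B > A) = {prob_b_better:.3f}, mean rate difference = {mean_diff:.4f}")
```

The first value answers which variant performs better, and the second gives a rough measure of by how much.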

Key Performance Indicators Tableau Dashboards {link}

  • Created interactive Sales, HR and Accounts Key Performance Indicators dashboards to illustrate transactions.

Built choropleth maps using Python and QGIS {github}

  • Created a choropleth map using QGIS showing pedestrian deaths due to road injuries in India (2020). {link}
  • Investigated median household income across US states (2019) using QGIS. {link}
  • Created a dynamic COVID-19 map using folium and geopandas. {link}

Ph.D. Projects | Tools: Python, R, Stata, LaTeX & Tableau

Identified the Relationship between Waiting Time and Signal Violation Likelihood using Survival Analysis {link}

  • Identified the optimal waiting time of pedestrians at intersection crosswalks using Cox PH and AFT models.
  • The study proposed an optimal red-phase length for pedestrian signals at crosswalks in Kolkata city.

Factors Influencing Pedestrian Signal Violation Behaviour {link}

  • Investigated signal violation tendency of oncoming pedestrians after receiving social information disseminated from a group of pedestrians waiting at the curbside or crossing in the do-not-walk phase.
  • Results revealed that the number of pedestrians waiting for the green phase and the number crossing during the do-not-walk phase influence an oncoming pedestrian’s signal violation decision.

Analysed Usability of Foot Over Bridges Across India {link}

  • Analysed pedestrian foot-over bridge utilisation across four Indian cities using tree-based ensemble techniques.
  • Safety and security, walking environment, frequency of daily use, comfort, location type, length of travel, stairway dimensions and reduced walkable width affected the choice to use the foot over bridges (FOBs).

Factors Influencing Pedestrians’ Distracted Road Crossing Behaviour {link}

  • Designed a distraction-themed questionnaire, estimated the required sample size, trained interviewers, conducted face-to-face interviews across Kolkata city and collected 446 valid samples.
  • Identified factors influencing pedestrians’ distracted road crossing behaviour using binary logistic regression models.

Parents’ Role in School Mode Choice for their Children {link}

  • Analysed the parents’ role in school mode choice for their children in Guwahati city using a multinomial logit model.
  • The study highlights how parents’ perceptions of safety, economic standards, and child characteristics impact mode choice.

M.Tech Project

Public Transport Performance Evaluation

  • Evaluated the qualitative factors that diminished or increased system usage for both bus and train users.