My Data Science Portfolio

A showcase of my projects and my skills.

Latest Data Projects

Providing Data-Driven Suggestions for HR

My goals in this project are to analyze the data collected by the HR department and to build a model that predicts whether an employee will leave the company. Such a model can help HR identify the factors that contribute to employees leaving and act on them to improve retention, which matters because finding, interviewing, and hiring new employees is time-consuming and expensive.

Predicting Taxi Gratuities in New York City

The goal of this project was to build multiple linear regression and random forest models to predict whether a rider will leave a high gratuity. The project used data on yellow taxi trips taken in New York City during 2017. The final random forest model achieved 86% accuracy and 72% precision, and it showed which features were most important in separating low tippers from high tippers. Understanding what encourages riders to leave tips is important for helping drivers earn a livable wage.
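For readers less familiar with the two metrics quoted above, here is a minimal sketch of how accuracy and precision are computed from the counts of a binary classifier. The counts below are invented purely for illustration and are not the results of this project; the function names are my own.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Share of predicted high tippers who really were high tippers."""
    return tp / (tp + fp)


# Hypothetical counts, NOT taken from the taxi project:
# tp = predicted high tipper, actually high tipper
# fp = predicted high tipper, actually low tipper
# tn = predicted low tipper, actually low tipper
# fn = predicted low tipper, actually high tipper
tp, fp, tn, fn = 30, 10, 50, 10

print(accuracy(tp, fp, tn, fn))  # 0.8
print(precision(tp, fp))         # 0.75
```

Accuracy counts every correct call, while precision only looks at the riders the model flagged as high tippers, which is why the two numbers can differ.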

Classifying TikTok Videos

This project aimed to build XGBoost and random forest models that distinguish videos making claims from videos expressing opinions. The purpose of the model is to mitigate misinformation in videos on the TikTok platform. With a successful prediction model, TikTok can reduce the backlog of user reports and prioritize them more efficiently. Videos labeled as opinions will be less likely to be reviewed by a human moderator. Videos labeled as claims will be sorted further by a downstream process to determine whether they should be prioritized for review. For example, videos classified as claims could be ranked by how many times they were reported, and the top x% would then be reviewed by a human each day.
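The downstream prioritization idea described above can be sketched in a few lines. This is only an illustration under my own assumptions: the dictionary keys (`id`, `label`, `report_count`), the function name `select_for_review`, and the sample videos are all hypothetical, and the real process at TikTok may look quite different.

```python
import math


def select_for_review(videos, top_fraction):
    """Return the most-reported claim videos, limited to the top fraction."""
    # Keep only the videos the classifier labeled as claims.
    claims = [v for v in videos if v["label"] == "claim"]
    # Rank claims from most reported to least reported.
    ranked = sorted(claims, key=lambda v: v["report_count"], reverse=True)
    # Round up so that a non-empty queue is not truncated to zero videos
    # whenever top_fraction is greater than zero.
    n_review = math.ceil(len(ranked) * top_fraction)
    return ranked[:n_review]


# Hypothetical sample data for illustration only.
videos = [
    {"id": "a", "label": "claim", "report_count": 12},
    {"id": "b", "label": "opinion", "report_count": 40},
    {"id": "c", "label": "claim", "report_count": 3},
    {"id": "d", "label": "claim", "report_count": 27},
    {"id": "e", "label": "claim", "report_count": 8},
]

queue = select_for_review(videos, top_fraction=0.5)
print([v["id"] for v in queue])  # ['d', 'a']
```

Note that video "b" never reaches the queue despite its high report count, because it was labeled as an opinion before the ranking step.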

User Churn Prediction

This project is part of a larger effort at Waze to increase growth. Typically, high retention rates indicate satisfied users who repeatedly use the Waze app over time. In this project I will develop a churn prediction model that will help prevent churn, improve user retention, and grow Waze's business. An accurate model can also help identify specific factors that contribute to churn.

Predict Future Sales

Given daily historical sales data, I am going to forecast the total number of products sold in every shop for the test set. The list of shops and products changes slightly every month, so creating a robust model that can handle such changes is part of the challenge.

Zircon Mineral Detection

With only a small number of images, each containing many minerals, my mission is to train a model that can detect the mineral zircon.


SpaceX Falcon 9 Landing Prediction

We will collect data on different rocket launches and feed it to an appropriate machine-learning algorithm to predict whether the first stage of a rocket that is about to launch will land successfully and be reused in another launch. From that prediction, we can estimate the cost of a launch.


Healthcare System

In this project, a symptom checker feature, powered by an XGBoost model, lets users select their symptoms and then suggests a potential disease. The Healthcare System app also offers a first-aid feature, powered by an NLP model, which users can query by voice or text to get help in emergencies such as cuts, abrasions, and fever.

My Analytics Toolbox

  • Python
  • Descriptive stats
  • Probability distributions
  • Sampling distributions
  • Confidence intervals
  • Hypothesis tests
  • Linear regression
  • Machine learning
  • Structured query language (SQL)
  • Inferential Statistics
  • Exploratory data analysis (EDA)
  • Tableau
  • Feature Engineering
  • ML Pipelines and ML Operations (MLOps)
  • Ensembling
  • A/B Testing and Model Deployment

Upcoming Projects

I love constantly learning new methods and expanding the range of problems I can work on.


Foodcard

The nutrition facts label is required on most packaged food in many countries and shows which nutrients and other ingredients the food contains. My job here is to build a model that can extract nutrition facts data from a phone photo of any packaged product.


Gait Recognition

Gait recognition is a type of behavioral biometric authentication that recognizes and verifies people by their walking style and pace.


Fake News Detection
