Image by Author
# Introduction
Data science is often confused with machine learning, but it’s actually much more than that. It’s about collecting, cleaning, analyzing, and visualizing data to find useful patterns that support decision-making. Machine learning is just one small part of this bigger picture. I started this Fun Projects series to encourage practical learning because, honestly, you don’t learn data science by watching endless theory. You learn it by building.
For this article, I’ve picked five projects that cover different stages of a typical data science workflow, from basic data cleaning to exploring data, building models, and even deploying them for real-world use.
# 1. The ONLY Data Cleaning Framework You Need
This video is by Christine Jiang, a data analyst who shares a practical approach to data cleaning that I think anyone working on projects will find useful. While cleaning data, we often wonder, “How clean is clean enough?” Christine answers this with her five-step CLEAN framework. She walks through how to separate solvable from unsolvable issues, standardize values, document everything, and iterate until your data is reliable, without aiming for “perfect.” Her examples, like fixing missing country codes or inconsistent product descriptions, are very relatable, and the mindset she emphasizes is just as important as the tools. I found it a super practical guide for anyone trying to handle real-world data effectively.
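To make those fixes concrete, here is a minimal pandas sketch. The toy table, the country-code lookup, and the `needs_review` flag are invented for illustration, and the code does not reproduce the five CLEAN steps themselves, only the kinds of fixes the video discusses: standardizing values, recovering solvable gaps, flagging unsolvable ones, and logging each change.

```python
import pandas as pd

# Hypothetical toy data: inconsistent product text and missing country codes.
df = pd.DataFrame({
    "country": ["Germany", "germany ", "France", None],
    "country_code": ["DE", None, None, None],
    "product": ["  Blue T-Shirt", "blue t-shirt", "BLUE  T-SHIRT ", "Red Mug"],
})

log = []  # document every change so the cleaning can be explained and repeated

# Standardize values: trim, lowercase, and collapse repeated whitespace.
df["product"] = df["product"].str.strip().str.lower().str.replace(r"\s+", " ", regex=True)
df["country"] = df["country"].str.strip().str.title()
log.append("standardized product and country text")

# Solvable issue: a missing code can be recovered from a known country name.
# (This lookup table is a hypothetical example, not a complete mapping.)
code_lookup = {"Germany": "DE", "France": "FR"}
fillable = df["country_code"].isna() & df["country"].isin(list(code_lookup))
df.loc[fillable, "country_code"] = df.loc[fillable, "country"].map(code_lookup)
log.append(f"filled {int(fillable.sum())} country codes from country names")

# Unsolvable issue: no name and no code, so flag the row instead of guessing.
df["needs_review"] = df["country_code"].isna()
log.append(f"flagged {int(df['needs_review'].sum())} rows for manual review")

print(df)
print(log)
```

Keeping the log next to the data makes it easy to explain later exactly which values were changed and which were deliberately left for manual review.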
# 2. Exploratory Data Analysis in Pandas
This video shows why simply having data is not enough and how examining the numbers carefully can reveal hidden patterns. The presenter walks through inspecting datasets, summarizing distributions, checking for missing values and outliers, and visualizing relationships between columns using pandas and seaborn. I found it really practical because it doesn’t just show the commands; it explains why each step matters and how statistics can reveal things that aren’t obvious at first glance. This is a great guide for anyone who wants to explore real-world data and get meaningful insights before jumping into modeling.
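A minimal pandas-only sketch of those inspection steps, assuming a small hypothetical housing table (the columns and values are invented). The 1.5 * IQR outlier rule is one common convention, not necessarily the one used in the video, and the seaborn plots are left out to keep the example short.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with one missing size and one extreme price.
df = pd.DataFrame({
    "size_sqm": [50, 65, 80, 72, 90, 60, np.nan, 85],
    "price_k": [150, 190, 240, 210, 260, 175, 200, 2500],
})

# 1. Inspect structure (dtypes, non-null counts) and summarize distributions.
df.info()
print(df.describe())

# 2. Count missing values per column.
missing = df.isna().sum()
print(missing)

# 3. Flag outliers with the 1.5 * IQR rule on price.
q1, q3 = df["price_k"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["price_k"] < q1 - 1.5 * iqr) | (df["price_k"] > q3 + 1.5 * iqr)]
print(outliers)

# 4. Check how columns relate (pairwise Pearson correlation; NaNs are skipped).
print(df.corr())
```

In this toy data the 2500 price is the only row outside the fences, which is exactly the kind of value worth investigating before deciding whether it is an error or a genuine extreme.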
# 3. Data Visualization using Pandas and Plotly
This video by Greg Kamradt, founder of Data Independent, shows how telling a story with your data is just as important as building models. He walks through a hands-on tutorial using pandas for data wrangling and Plotly for interactive charts, starting with the basics of what makes a visualization effective. You’ll see how to load and shape data, pick the right chart types, and add formatting touches that make your charts clear and easy to understand. I really liked how practical it is: it includes tips on handling real-world issues like outliers, date axes, and aggregations, and it shows how small choices can improve readability. By the end, you’ll know how to create interactive, shareable charts that communicate insights effectively.
# 4. Feature Engineering Techniques For Machine Learning in Python
Once your data is clean and understood, it’s time to create better features. This tutorial focuses on the feature engineering stage, where you transform existing columns and generate new ones that can make your model smarter. The instructor explains techniques like encoding categorical variables, handling missing data, dimensionality reduction with principal component analysis (PCA), and creating interaction terms. I like that it also highlights what not to do, such as leaking data, overfitting, and over-engineering features. This is a great resource for anyone who wants to move from raw data to well-engineered features for real-world machine learning.
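Several of those techniques fit naturally into one scikit-learn pipeline. This is a minimal sketch assuming scikit-learn and an invented car-listing table; the columns, the median imputation, and the choice of three PCA components are illustrative, not the instructor’s settings. The key design point is that every fitted step (imputer, scaler, encoder, PCA) learns its statistics from the training split only, which is how the pipeline guards against data leakage.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical car-listing data (illustrative only).
df = pd.DataFrame({
    "age_years": [1, 3, np.nan, 7, 2, 10, 4, 6],
    "mileage_k": [10, 40, 55, 90, np.nan, 150, 60, 80],
    "fuel": ["petrol", "diesel", "petrol", "electric",
             "diesel", "petrol", "electric", "diesel"],
    "price_k": [30, 22, 18, 12, 27, 6, 20, 14],
})

# Interaction term: a row-wise product that uses no dataset-wide statistics.
# Rows missing either factor get NaN here and are imputed later in the pipeline.
df["age_x_mileage"] = df["age_years"] * df["mileage_k"]

X, y = df.drop(columns="price_k"), df["price_k"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

numeric = ["age_years", "mileage_k", "age_x_mileage"]
categorical = ["fuel"]

preprocess = ColumnTransformer(
    [
        # Fill missing numbers with the training median, then standardize.
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric),
        # One-hot encode categories; unseen test categories become all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    sparse_threshold=0,  # always return a dense array so PCA can consume it
)

features = Pipeline([("prep", preprocess), ("pca", PCA(n_components=3))])

# Fit on the training split only, then reuse the fitted steps on the test split.
X_train_feat = features.fit_transform(X_train)
X_test_feat = features.transform(X_test)
print(X_train_feat.shape, X_test_feat.shape)
```

Appending a regressor as a final pipeline step keeps the same leakage protection when the whole pipeline is cross-validated.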
# 5. Deploying a Machine Learning Model in a Streamlit App and Making Live Predictions
Finally, the most satisfying part: bringing your model to life. In this tutorial, Yiannis Pitsillides shows how to deploy a trained machine learning model using Streamlit. He walks through loading a stored model, setting up a clean interface with input boxes and buttons, and generating real-time predictions for car prices. The video even includes a feature importance visualization built with Plotly, so you can see which inputs matter most. I liked how practical it is, with tips on keeping raw and cleaned data separate, handling dependencies, and running the app locally or on a hosting service. It’s a short tutorial, but it does the job beautifully and gives you the end-to-end experience that many beginners miss.
# Wrapping Up
These projects cover all the key stages of a data science workflow and show how theory comes to life in practice. Grab your datasets and start experimenting. There’s no better way to learn data science than by doing.
Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.