Hiring managers reviewing data analyst resumes see the same things over and over. SQL listed under skills, a few courses on LinkedIn Learning or Coursera, maybe a certification. What they almost never see is proof. An actual project that shows the candidate sat down with real data, wrote meaningful queries, and produced something that answers a business question.
That gap between claiming SQL skills and demonstrating SQL skills is exactly where most job applications get filtered out. The fix is not more courses. It is projects. Specific, well-documented projects that live on GitHub and show an interviewer what you can actually do with a database.
This guide covers the best SQL projects for your resume at every level, with the source code and datasets you need to build them, and an explanation of what each project signals to the person reading your CV.
What Makes an SQL Project Resume-Worthy
Not every SQL project belongs on a resume. A project that creates a table, inserts five rows, and runs a SELECT statement is practice, not a portfolio piece. A resume-worthy SQL project has a few things in common.
It starts with a real business question. Not “demonstrate the use of GROUP BY” but “which product categories drove the most revenue in Q3 and how did that compare to Q2?” Business questions make projects feel professional. Technical exercises make them feel like homework.
It uses a real dataset with enough rows to be meaningful. Fifty rows of made-up data does not demonstrate that you can work with data at any realistic scale. Publicly available datasets with tens of thousands of rows do.
It is documented. A GitHub repository with no README tells a hiring manager nothing about what problem you were solving or what you found. A short write-up explaining the business context, your approach, and your key findings turns a folder of SQL files into a narrative they can follow and share with the wider team.
It uses SQL features beyond basic SELECT statements. JOINs, GROUP BY, and WHERE are table stakes. Window functions, CTEs, subqueries, and aggregate logic show depth.
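The gap is easy to show side by side. GROUP BY collapses rows into one answer per group; a window function keeps every row and adds context to it. A minimal sketch against a hypothetical orders table with region and sales columns:

-- Aggregate: one row per region
SELECT region, SUM(sales) AS region_sales
FROM orders
GROUP BY region;

-- Window function: every order, plus its share of its region's total
SELECT
    order_id,
    region,
    sales,
    ROUND(sales * 100.0 / SUM(sales) OVER (PARTITION BY region), 2) AS pct_of_region
FROM orders;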
Project 1: Retail Sales Analysis (Beginner)
This is the best starting project for anyone new to SQL portfolios because the dataset is clean, the business questions are intuitive, and the analysis covers a wide range of SQL fundamentals.
Dataset: Use the Sample Superstore dataset from Kaggle, which contains over 9,000 rows of retail order data including order dates, customer segments, product categories, sales, profit, and shipping information.
What you build: A series of analytical queries that answer questions like which product category generates the highest profit margin, which customer segment has the highest average order value, which region has the most returned orders, and how monthly sales trend across a two-year period.
Skills demonstrated: SELECT with multiple columns, GROUP BY, ORDER BY, aggregate functions like SUM, AVG, COUNT, and MAX, date functions for monthly and quarterly grouping, and basic JOINs if you split the data across multiple tables (sketched after the queries below).
Source code starting point:
-- Monthly revenue trend
SELECT
    DATE_FORMAT(order_date, '%Y-%m') AS month,
    SUM(sales) AS total_sales,
    SUM(profit) AS total_profit,
    ROUND(SUM(profit) / SUM(sales) * 100, 2) AS profit_margin_pct
FROM orders
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
ORDER BY month;

-- Top 5 product subcategories by revenue
SELECT
    sub_category,
    SUM(sales) AS total_sales,
    SUM(profit) AS total_profit,
    COUNT(DISTINCT order_id) AS number_of_orders
FROM orders
GROUP BY sub_category
ORDER BY total_sales DESC
LIMIT 5;
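If you normalize the flat Superstore file into separate tables, a basic JOIN folds naturally into the same analysis. A sketch assuming hypothetical orders and products tables (the split itself is your design choice, not part of the dataset):

-- Revenue by category, with orders and products in separate tables
-- Assumed schema: orders(order_id, product_id, sales), products(product_id, category)
SELECT
    p.category,
    SUM(o.sales) AS total_sales,
    COUNT(DISTINCT o.order_id) AS number_of_orders
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY p.category
ORDER BY total_sales DESC;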
What it signals to hiring managers: You can take a raw dataset and extract business-relevant insights using SQL. You understand aggregation, grouping, and time-based analysis. You are not just running SELECT star.
Project 2: Customer Cohort Analysis (Intermediate)
Cohort analysis is one of the most requested analytical skills in data analyst job descriptions and it is almost never demonstrated in candidate portfolios. Building this project immediately differentiates you from candidates who only show surface-level SQL.
Dataset: Use a publicly available e-commerce dataset such as the Online Retail dataset from the UCI Machine Learning Repository (also mirrored on Kaggle), which contains transactional data for a UK-based retailer with over 500,000 rows.
What you build: A cohort analysis that groups customers by the month they made their first purchase, tracks how many of them made another purchase in each subsequent month, and calculates retention rates by cohort over time.
Skills demonstrated: CTEs, window functions, date manipulation, self-joins, conditional aggregation, and the ability to think about data at the customer level rather than the transaction level.
Source code starting point:
-- Step 1: Find each customer's first purchase month
WITH first_purchase AS (
    SELECT
        customer_id,
        MIN(DATE_FORMAT(invoice_date, '%Y-%m')) AS cohort_month
    FROM transactions
    GROUP BY customer_id
),
-- Step 2: Join back to all transactions
customer_activity AS (
    SELECT
        t.customer_id,
        fp.cohort_month,
        DATE_FORMAT(t.invoice_date, '%Y-%m') AS activity_month
    FROM transactions t
    JOIN first_purchase fp ON t.customer_id = fp.customer_id
),
-- Step 3: Count active customers per cohort and months since first purchase
cohort_data AS (
    SELECT
        cohort_month,
        PERIOD_DIFF(
            REPLACE(activity_month, '-', ''),
            REPLACE(cohort_month, '-', '')
        ) AS months_since_first,
        COUNT(DISTINCT customer_id) AS active_customers
    FROM customer_activity
    GROUP BY cohort_month, months_since_first
)
SELECT * FROM cohort_data ORDER BY cohort_month, months_since_first;
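To finish the retention calculation the project calls for, divide each month's active customers by the cohort's month-zero size. A sketch that would replace the final SELECT above, assuming the cohort_data CTE as written (FIRST_VALUE needs MySQL 8+ or any database with window functions):

-- Retention rate: active customers as a share of the cohort's month-zero size
SELECT
    cohort_month,
    months_since_first,
    active_customers,
    ROUND(
        active_customers * 100.0 /
        FIRST_VALUE(active_customers) OVER (
            PARTITION BY cohort_month
            ORDER BY months_since_first
        ), 2
    ) AS retention_rate_pct
FROM cohort_data
ORDER BY cohort_month, months_since_first;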
What it signals to hiring managers: You understand how businesses think about customer lifetime value and retention. You can write multi-step SQL using CTEs. You are comfortable with the kind of analysis that directly informs product and marketing decisions.
Project 3: Employee Performance and HR Analytics (Intermediate)
HR analytics projects are valuable in portfolios because every company has employee data, and this project demonstrates that you can handle sensitive, real-world data structures with sound conditional logic.
Dataset: Use the IBM HR Analytics Employee Attrition dataset on Kaggle, which contains 1,470 employee records with attributes including department, job role, salary, performance rating, years at company, and whether the employee left.
What you build: A dashboard-ready analysis that identifies which departments have the highest attrition rates, what salary bands correspond to high versus low performance ratings, which job roles have the longest average tenure, and what combination of factors most commonly precedes an employee leaving.
Skills demonstrated: CASE statements for conditional categorization, multi-table JOINs, window functions for ranking within groups, subqueries, and percentage calculations using conditional aggregation.
Source code starting point:
-- Attrition rate by department
SELECT
    department,
    COUNT(*) AS total_employees,
    SUM(CASE WHEN attrition = 'Yes' THEN 1 ELSE 0 END) AS employees_left,
    ROUND(
        SUM(CASE WHEN attrition = 'Yes' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2
    ) AS attrition_rate_pct
FROM employees
GROUP BY department
ORDER BY attrition_rate_pct DESC;

-- Average salary by job role and performance rating
SELECT
    job_role,
    performance_rating,
    ROUND(AVG(monthly_income), 2) AS avg_monthly_income,
    COUNT(*) AS employee_count
FROM employees
GROUP BY job_role, performance_rating
ORDER BY job_role, performance_rating;
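The skills list above also mentions window functions for ranking within groups. One way to show that is ranking job roles by average tenure within each department. A sketch assuming the same employees table with a years_at_company column (the IBM dataset includes this attribute, though the column name here is an assumption):

-- Rank job roles by average tenure within each department
SELECT
    department,
    job_role,
    ROUND(AVG(years_at_company), 1) AS avg_tenure_years,
    RANK() OVER (
        PARTITION BY department
        ORDER BY AVG(years_at_company) DESC
    ) AS tenure_rank
FROM employees
GROUP BY department, job_role
ORDER BY department, tenure_rank;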
What it signals to hiring managers: You can handle multi-dimensional data, write conditional logic in SQL, and produce analysis that maps directly to business decisions HR and leadership teams make.
Project 4: COVID-19 Data Exploration (Intermediate to Advanced)
This project became popular because Alex Freberg published a widely referenced walkthrough of it, which means many hiring managers have seen it. Your version needs to go further than the standard tutorial to stand out. Use it as a foundation and add your own questions on top.
Dataset: Our World in Data COVID-19 dataset, available on GitHub, which contains country-level data on cases, deaths, vaccinations, population, and hospital capacity.
What you build: Beyond the standard death percentage and vaccination rollout queries, add analysis of the relationship between vaccination rates and subsequent case reduction, identify countries that outperformed their expected outcomes based on GDP per capita, and build a rolling 7-day average of new cases using window functions.
Skills demonstrated: Window functions including rolling averages and running totals, CTEs, complex JOINs across multiple tables, NULL handling, and the ability to ask original questions rather than just following a tutorial.
Source code starting point:
-- 7-day rolling average of new cases by country
SELECT
    location,
    date,
    new_cases,
    ROUND(
        AVG(new_cases) OVER (
            PARTITION BY location
            ORDER BY date
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ), 2
    ) AS rolling_7day_avg
FROM covid_deaths
WHERE continent IS NOT NULL
ORDER BY location, date;
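The skills list mentions running totals as well. The common version of this walkthrough splits the data into deaths and vaccinations tables, so a sketch assuming a companion covid_vaccinations table keyed by location and date, with population living on covid_deaths (both assumptions about how you load the data):

-- Running total of vaccine doses per 100 people
SELECT
    d.location,
    d.date,
    v.new_vaccinations,
    SUM(v.new_vaccinations) OVER (
        PARTITION BY d.location
        ORDER BY d.date
    ) AS running_total_doses,
    ROUND(
        SUM(v.new_vaccinations) OVER (
            PARTITION BY d.location
            ORDER BY d.date
        ) * 100.0 / d.population, 2
    ) AS doses_per_100_people  -- counts doses, not fully vaccinated people
FROM covid_deaths d
JOIN covid_vaccinations v
    ON d.location = v.location AND d.date = v.date
WHERE d.continent IS NOT NULL
ORDER BY d.location, d.date;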
What it signals to hiring managers: You can work with real-world messy data including NULLs and inconsistent entries. You know window functions. You ask original analytical questions rather than just reproducing what a tutorial told you to build.
Project 5: Financial Transactions Fraud Detection (Advanced)
This is the project that gets you noticed for data engineering and analytics engineering roles, not just analyst positions. Fraud detection logic in SQL demonstrates that you understand how to think about anomalies, thresholds, and pattern detection at scale.
Dataset: Use the PaySim synthetic mobile money transaction dataset on Kaggle, which contains over 6 million simulated transaction rows with transaction type, amount, and a fraud flag you can use to validate your detection logic.
What you build: A rule-based fraud detection system in SQL that flags transactions based on multiple criteria: transaction amounts significantly above a customer’s historical average, multiple transactions from the same account within a short time window, transactions to new recipients combined with unusually large amounts, and accounts with sudden spikes in activity.
Skills demonstrated: Window functions for calculating historical baselines, self-joins for time-window analysis, CTEs for building multi-step detection logic, and performance considerations for running complex queries against millions of rows.
Source code starting point:
-- Flag transactions more than 3x the customer's average
WITH customer_averages AS (
SELECT
customer_id,
AVG(amount) AS avg_transaction_amount,
STDDEV(amount) AS stddev_amount
FROM transactions
GROUP BY customer_id
)
SELECT
t.transaction_id,
t.customer_id,
t.amount,
ca.avg_transaction_amount,
ROUND(t.amount / ca.avg_transaction_amount, 2) AS amount_vs_average,
CASE
WHEN t.amount > ca.avg_transaction_amount * 3 THEN 'HIGH RISK'
WHEN t.amount > ca.avg_transaction_amount * 2 THEN 'MEDIUM RISK'
ELSE 'NORMAL'
END AS risk_flag
FROM transactions t
JOIN customer_averages ca ON t.customer_id = ca.customer_id
ORDER BY amount_vs_average DESC;
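The second rule in the list above, multiple transactions from the same account in a short window, can be expressed with LAG instead of a self-join, which also scales better on millions of rows. A sketch assuming a hypothetical txn_time timestamp column (PaySim itself uses an hourly step column you would adapt to):

-- Flag rapid-fire transactions: short gap since the account's previous transaction
WITH gaps AS (
    SELECT
        transaction_id,
        customer_id,
        amount,
        txn_time,  -- hypothetical column name; adapt to your load of the data
        TIMESTAMPDIFF(
            MINUTE,
            LAG(txn_time) OVER (
                PARTITION BY customer_id
                ORDER BY txn_time
            ),
            txn_time
        ) AS minutes_since_last
    FROM transactions
)
SELECT *
FROM gaps
WHERE minutes_since_last IS NOT NULL
  AND minutes_since_last <= 5  -- threshold to tune against the dataset's fraud flag
ORDER BY customer_id, txn_time;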
What it signals to hiring managers: You can build analytical systems, not just run reports. You understand business risk logic and can encode it in SQL. You have worked with large datasets and thought about query structure and performance.
How to Present These Projects on Your Resume and GitHub
The SQL itself is only half the work. How you present the project determines whether a hiring manager reads it or skips it.
On GitHub, every project needs a README that covers the business problem in one paragraph, the dataset source and size, the key questions you answered, the SQL concepts and functions used, and the key findings in two or three bullet points. Add screenshots of your query outputs or a simple visualization of the results. A project with a clear README takes ten minutes to read and leaves a clear impression. A folder of SQL files with no context gets closed in thirty seconds.
On your resume, list each project with a one-line description that leads with the business outcome, not the technical method. “Analyzed 500,000 retail transactions to identify $2.3M in revenue leakage across three product categories using SQL window functions and cohort analysis” is stronger than “Built SQL queries using CTEs and window functions on retail dataset.”
SQL Projects Cheat Sheet
| Project | Level | Dataset | Key Skills |
|---|---|---|---|
| Retail Sales Analysis | Beginner | Sample Superstore, Kaggle | Aggregation, GROUP BY, date functions |
| Customer Cohort Analysis | Intermediate | Online Retail, UCI | CTEs, window functions, retention logic |
| HR Attrition Analytics | Intermediate | IBM HR Dataset, Kaggle | CASE statements, multi-table JOINs |
| COVID-19 Exploration | Intermediate | Our World in Data | Rolling averages, NULL handling, original questions |
| Fraud Detection System | Advanced | PaySim, Kaggle | Baseline calculations, risk flagging, large datasets |
Common Mistakes to Avoid
Using tiny made-up datasets. Ten rows of data does not demonstrate anything about your ability to work with real-world data. Use publicly available datasets with at least 10,000 rows.
Copying a tutorial exactly without adding anything. Hiring managers recognize the standard COVID-19 walkthrough and the standard Superstore analysis. Follow tutorials to learn the approach, then add three or four original questions that the tutorial did not ask. That addition is what makes the project yours.
Skipping the README. A GitHub repository with no documentation is invisible to hiring managers who spend thirty seconds deciding whether to look further. The README is your pitch. Write it like one.
Only demonstrating basic SQL. If every query in your portfolio is SELECT, FROM, WHERE, GROUP BY, you are showing a hiring manager the minimum. Window functions, CTEs, and subqueries are expected for mid-level analyst roles. Include at least one project that uses them.
Building five shallow projects instead of two deep ones. Depth is more impressive than volume. Two well-documented projects that answer real business questions with advanced SQL are worth more than five projects that each demonstrate one basic concept.
The difference between a resume that gets a callback and one that gets ignored is almost never credentials. It is evidence. Projects on GitHub that a hiring manager can click through and actually read are evidence. Build them, document them well, and let the work speak louder than the bullet points on your CV.
FAQs
What SQL projects should I put on my resume as a beginner?
Start with a retail sales analysis or e-commerce dataset project that demonstrates aggregation, GROUP BY, JOINs, and date-based analysis. Use a public dataset from Kaggle with at least 10,000 rows, write queries that answer real business questions, and document everything in a GitHub README. One well-documented project beats five undocumented ones every time.
Where can I find datasets for SQL projects?
Kaggle is the best starting point and has hundreds of free datasets across every industry. The UCI Machine Learning Repository has clean, well-documented datasets ideal for analysis projects. Our World in Data provides public datasets on health, economics, and demographics. For e-commerce and retail data, the Sample Superstore dataset and the Online Retail dataset from UCI are the most commonly used.
Do SQL projects need to be on GitHub?
They do not have to be, but GitHub is the standard place hiring managers expect to find portfolio work. It is free, widely understood, and makes it easy to share a link on your resume or LinkedIn profile. If you prefer, you can also document projects in a personal blog post or a portfolio website, as long as the work is accessible and readable by someone who was not there when you built it.
How long does it take to complete an SQL portfolio project?
A beginner project with a clean dataset and straightforward queries takes two to four days including the documentation. An intermediate project with CTEs and window functions takes one to two weeks if you are learning as you go. An advanced project like the fraud detection system can take two to three weeks. Speed matters less than quality. A project you spent three weeks on and documented thoroughly is far more impressive than one you rushed through in a day.
What SQL skills do employers actually look for in interviews?
JOINs and the ability to explain when to use each type are consistently tested. Window functions, particularly ROW_NUMBER, RANK, LAG, and LEAD, come up in almost every technical interview for analyst roles. CTEs are expected at mid-level and above. Writing subqueries, debugging a query that returns unexpected results, and explaining query optimization basics round out what most employers assess. Your projects should touch all of these.