How to Perform Cohort Analysis in Excel or SQL

If you have ever wondered why some customers keep coming back month after month while others disappear after their first purchase — cohort analysis is the technique that answers that question with data.

Cohort analysis is one of the most powerful tools in any analyst’s toolkit. It reveals retention patterns, product improvement signals, and customer lifetime behavior that aggregate metrics completely hide. And unlike many advanced analytics techniques, it is genuinely achievable in both Excel and SQL without specialized tools or data science expertise.

In this guide, we will walk through exactly how to perform cohort analysis — from defining cohorts to building a complete retention matrix in both Excel and SQL, with real examples.

What Is Cohort Analysis?

Cohort analysis is a technique that groups users or customers by a shared characteristic — typically the time period when they first engaged and then tracks the behavior of each group over time.

The most common cohort is the acquisition cohort — grouping users by the month they first signed up, made their first purchase, or first used a product. You then track what percentage of each group returned in subsequent months.

Simple Analogy

Imagine you opened a restaurant in January. You want to know whether customers who came for the first time in January are more loyal than customers who first came in June.

Without cohort analysis, you just see total visitors each month — a number that mixes new and returning customers together.

With cohort analysis, you track the January cohort separately from the June cohort. You see that 40% of January customers returned in February, and only 25% of June customers returned in July. That difference tells you something important maybe your summer marketing attracted less loyal customers, or your winter menu drives better retention.

Why Aggregate Metrics Hide the Truth

Consider a product with these monthly active users:

January:  10,000
February: 11,000
March:    12,500
April:    13,000

This looks like healthy growth. But what if the breakdown is:

January cohort retained:  2,000 (20%)
February new users:       9,000
February cohort retained: 1,800 (16%)
March new users:          10,700

Each cohort is actually retaining worse than the one before. The growth is masking deteriorating retention — new user acquisition is outpacing the losses. Without cohort analysis, you would never see this until growth slows and the underlying problem becomes a crisis.

The Three Types of Cohorts

Acquisition Cohorts (Most Common)

Groups users by when they first joined, signed up, or made their first purchase.

Example: All users who signed up in January 2024 form the January cohort.

Behavioral Cohorts

Groups users by a specific action they took — not when they joined.

Example: All users who completed onboarding, all users who purchased a specific product, or all users who used a premium feature.

Size Cohorts

Groups users by a characteristic at the time of acquisition — company size, plan tier, geographic region.

Example: Enterprise customers vs SMB customers vs individual users — tracked separately to see if retention differs by segment.

This guide focuses on acquisition cohorts because they are the most universally applicable and the clearest starting point.

Setting Up the Dataset

We will use the same dataset for both the SQL and Excel approaches.

Table: user_orders
Columns: user_id, order_date, order_amount

sql

-- Sample data structure
CREATE TABLE user_orders (
    user_id     INT,
    order_date  DATE,
    order_amount DECIMAL(10, 2)
);

-- Sample records (abbreviated)
INSERT INTO user_orders VALUES
(1001, '2024-01-05', 85.00),
(1001, '2024-02-18', 120.00),
(1001, '2024-04-03', 95.00),
(1002, '2024-01-12', 200.00),
(1002, '2024-01-28', 75.00),
(1003, '2024-02-07', 150.00),
(1003, '2024-03-15', 88.00),
(1003, '2024-04-22', 210.00),
(1004, '2024-02-14', 95.00),
(1005, '2024-03-01', 175.00),
(1005, '2024-04-10', 90.00),
(1006, '2024-01-20', 310.00),
-- ... (thousands more rows in practice)

Cohort Analysis in SQL — Step by Step

Step 1: Find Each User’s First Order Date (Cohort Assignment)

The first step is identifying when each user made their very first purchase. This defines which cohort they belong to.

sql

-- Step 1: Get the first order date for each user
WITH user_cohorts AS (
    SELECT
        user_id,
        MIN(order_date) AS first_order_date,
        DATE_TRUNC('month', MIN(order_date)) AS cohort_month
    FROM user_orders
    GROUP BY user_id
)
SELECT * FROM user_cohorts LIMIT 10;

Output (sample):

user_id	first_order_date	cohort_month
1001	2024-01-05	2024-01-01
1002	2024-01-12	2024-01-01
1006	2024-01-20	2024-01-01
1003	2024-02-07	2024-02-01
1004	2024-02-14	2024-02-01
1005	2024-03-01	2024-03-01

Users 1001, 1002, and 1006 all joined in January — they form the January 2024 cohort.

Step 2: Calculate Months Since First Order

For each order, calculate how many months have passed since the user’s first order. This is the “cohort age” — Month 0 is the acquisition month, Month 1 is the following month, and so on.

sql

-- Step 2: Join orders with cohort assignments and calculate age
WITH user_cohorts AS (
    SELECT
        user_id,
        DATE_TRUNC('month', MIN(order_date)) AS cohort_month
    FROM user_orders
    GROUP BY user_id
),
orders_with_cohort AS (
    SELECT
        o.user_id,
        uc.cohort_month,
        DATE_TRUNC('month', o.order_date) AS order_month,
        -- Calculate months between order and cohort start
        DATEDIFF('month', uc.cohort_month,
                 DATE_TRUNC('month', o.order_date)) AS months_since_first
    FROM user_orders o
    JOIN user_cohorts uc ON o.user_id = uc.user_id
)
SELECT * FROM orders_with_cohort
ORDER BY cohort_month, user_id, months_since_first
LIMIT 15;

Output (sample):

user_id	cohort_month	order_month	months_since_first
1001	2024-01-01	2024-01-01	0
1001	2024-01-01	2024-02-01	1
1001	2024-01-01	2024-04-01	3
1002	2024-01-01	2024-01-01	0
1002	2024-01-01	2024-01-01	0
1003	2024-02-01	2024-02-01	0
1003	2024-02-01	2024-03-01	1
1003	2024-02-01	2024-04-01	2

User 1001 (January cohort) placed orders at Month 0, Month 1, and Month 3 — skipping Month 2.

Note on DATEDIFF syntax: The syntax varies by database. Use the appropriate version for your platform:

sql

-- PostgreSQL
DATE_PART('month', order_month) - DATE_PART('month', cohort_month)
+ 12 * (DATE_PART('year', order_month) - DATE_PART('year', cohort_month))

-- MySQL
TIMESTAMPDIFF(MONTH, cohort_month, order_month)

-- BigQuery / Snowflake
DATE_DIFF(order_month, cohort_month, MONTH)

-- SQL Server
DATEDIFF(MONTH, cohort_month, order_month)

Step 3: Count Active Users Per Cohort and Month

Count distinct users in each cohort who made at least one order in each month period.

sql

-- Step 3: Count distinct active users per cohort per month
WITH user_cohorts AS (
    SELECT
        user_id,
        DATE_TRUNC('month', MIN(order_date)) AS cohort_month
    FROM user_orders
    GROUP BY user_id
),
orders_with_cohort AS (
    SELECT
        o.user_id,
        uc.cohort_month,
        DATEDIFF('month', uc.cohort_month,
                 DATE_TRUNC('month', o.order_date)) AS months_since_first
    FROM user_orders o
    JOIN user_cohorts uc ON o.user_id = uc.user_id
),
cohort_activity AS (
    SELECT
        cohort_month,
        months_since_first,
        COUNT(DISTINCT user_id) AS active_users
    FROM orders_with_cohort
    GROUP BY cohort_month, months_since_first
)
SELECT * FROM cohort_activity
ORDER BY cohort_month, months_since_first;

Output (sample):

cohort_month	months_since_first	active_users
2024-01-01	0	850
2024-01-01	1	340
2024-01-01	2	255
2024-01-01	3	212
2024-02-01	0	720
2024-02-01	1	252
2024-02-01	2	180
2024-03-01	0	650
2024-03-01	1	195

Step 4: Get Cohort Size (Month 0 Users)

The cohort size is the number of users active at Month 0 — the total users in that cohort.

sql

-- Step 4: Add cohort size for percentage calculation
WITH user_cohorts AS (
    SELECT
        user_id,
        DATE_TRUNC('month', MIN(order_date)) AS cohort_month
    FROM user_orders
    GROUP BY user_id
),
orders_with_cohort AS (
    SELECT
        o.user_id,
        uc.cohort_month,
        DATEDIFF('month', uc.cohort_month,
                 DATE_TRUNC('month', o.order_date)) AS months_since_first
    FROM user_orders o
    JOIN user_cohorts uc ON o.user_id = uc.user_id
),
cohort_activity AS (
    SELECT
        cohort_month,
        months_since_first,
        COUNT(DISTINCT user_id) AS active_users
    FROM orders_with_cohort
    GROUP BY cohort_month, months_since_first
),
cohort_sizes AS (
    SELECT
        cohort_month,
        active_users AS cohort_size
    FROM cohort_activity
    WHERE months_since_first = 0
)
SELECT
    ca.cohort_month,
    ca.months_since_first,
    ca.active_users,
    cs.cohort_size,
    ROUND(ca.active_users * 100.0 / cs.cohort_size, 1) AS retention_rate
FROM cohort_activity ca
JOIN cohort_sizes cs ON ca.cohort_month = cs.cohort_month
ORDER BY ca.cohort_month, ca.months_since_first;

Output:

cohort_month	months_since_first	active_users	cohort_size	retention_rate
2024-01-01	0	850	850	100.0
2024-01-01	1	340	850	40.0
2024-01-01	2	255	850	30.0
2024-01-01	3	212	850	24.9
2024-02-01	0	720	720	100.0
2024-02-01	1	252	720	35.0
2024-02-01	2	180	720	25.0
2024-03-01	0	650	650	100.0
2024-03-01	1	195	650	30.0

Every cohort starts at 100% at Month 0 by definition. Retention is tracked as a percentage of that starting cohort size.

Step 5: Pivot Into a Retention Matrix

The final step pivots the long-format result into the classic cohort heatmap matrix — cohorts as rows, months as columns.

sql

-- Step 5: Pivot to retention matrix
-- (Using conditional aggregation — works in most SQL dialects)
WITH user_cohorts AS (
    SELECT user_id,
           DATE_TRUNC('month', MIN(order_date)) AS cohort_month
    FROM user_orders GROUP BY user_id
),
orders_with_cohort AS (
    SELECT o.user_id, uc.cohort_month,
           DATEDIFF('month', uc.cohort_month,
                    DATE_TRUNC('month', o.order_date)) AS months_since_first
    FROM user_orders o JOIN user_cohorts uc ON o.user_id = uc.user_id
),
cohort_activity AS (
    SELECT cohort_month, months_since_first,
           COUNT(DISTINCT user_id) AS active_users
    FROM orders_with_cohort GROUP BY cohort_month, months_since_first
),
cohort_sizes AS (
    SELECT cohort_month, active_users AS cohort_size
    FROM cohort_activity WHERE months_since_first = 0
),
retention AS (
    SELECT ca.cohort_month,
           ca.months_since_first,
           ROUND(ca.active_users * 100.0 / cs.cohort_size, 1) AS retention_rate
    FROM cohort_activity ca
    JOIN cohort_sizes cs ON ca.cohort_month = cs.cohort_month
)
SELECT
    cohort_month,
    MAX(CASE WHEN months_since_first = 0 THEN retention_rate END) AS month_0,
    MAX(CASE WHEN months_since_first = 1 THEN retention_rate END) AS month_1,
    MAX(CASE WHEN months_since_first = 2 THEN retention_rate END) AS month_2,
    MAX(CASE WHEN months_since_first = 3 THEN retention_rate END) AS month_3,
    MAX(CASE WHEN months_since_first = 4 THEN retention_rate END) AS month_4,
    MAX(CASE WHEN months_since_first = 5 THEN retention_rate END) AS month_5
FROM retention
GROUP BY cohort_month
ORDER BY cohort_month;

Result — Retention Matrix:

cohort_month	month_0	month_1	month_2	month_3	month_4	month_5
2024-01-01	100.0	40.0	30.0	24.9	19.5	15.2
2024-02-01	100.0	35.0	25.0	18.8	14.2	NULL
2024-03-01	100.0	30.0	21.5	15.9	NULL	NULL
2024-04-01	100.0	28.0	18.2	NULL	NULL	NULL

NULL appears where data does not yet exist — a March cohort cannot have Month 4 data if it is currently July. This is expected and correct.

Cohort Analysis in Excel — Step by Step

Step 1: Prepare Your Data

In Excel, start with a clean table:

user_id	order_date	order_amount
1001	05/01/2024	85.00
1001	18/02/2024	120.00
…	…	…

Ensure order_date is formatted as a date — not text. Select the column → Format Cells → Date.

Step 2: Calculate First Order Date Per User

In a new column or separate summary table, find the first order date for each user.

If your data is in columns A (user_id), B (order_date), C (amount):

In a separate column D, add a header “First Order Date” and use:

=MINIFS($B$2:$B$1000, $A$2:$A$1000, A2)

This finds the minimum date in column B where column A matches the current user ID.

Step 3: Extract Cohort Month

In column E, extract just the month and year from the first order date:

=DATE(YEAR(D2), MONTH(D2), 1)

Format this column as date with format “MMM-YYYY” for readability.

Step 4: Calculate Order Month and Months Since First

In column F, extract the month of each order:

=DATE(YEAR(B2), MONTH(B2), 1)

In column G, calculate months since first order:

=DATEDIF(E2, F2, "M")

Or using YEARFRAC:

=(YEAR(F2)-YEAR(E2))*12 + MONTH(F2) - MONTH(E2)

Step 5: Build the Summary Table Using COUNTIFS

Create a summary table with cohort months as rows and months since first (0, 1, 2, 3…) as columns.

Summary table structure (starting at cell I1):

	Month 0	Month 1	Month 2	Month 3
Jan-2024
Feb-2024
Mar-2024

In cell J2 (January cohort, Month 0 — active users):

=COUNTIFS($E$2:$E$1000, $I2, $G$2:$G$1000, J$1_value)

Where $I2 is the cohort month date and J$1_value is the month number (0 for Month 0).

More precisely, using numeric month values in row 1 (0, 1, 2, 3…):

=COUNTIFS($E$2:$E$1000, $I2, $G$2:$G$1000, J$1)

Copy this formula across all cells in the table.

Step 6: Calculate Retention Percentages

Create a parallel table showing percentages. If your count table is in columns I through N, create a percentage table in columns P through U.

In the percentage table, cell Q2 (January cohort, Month 1 retention):

=IF(J2=0, "", ROUND(K2/J2*100, 1))

Where J2 is the Month 0 count (cohort size) and K2 is the Month 1 count.

For Month 0, always show 100%:

=IF(J2=0, "", 100)

Step 7: Apply Conditional Formatting — The Heatmap

The retention matrix becomes dramatically more readable with a color heatmap — green for high retention, red for low.

Select all percentage cells in your retention table
Home tab → Conditional Formatting → Color Scales
Select Green-Yellow-Red (or Red-White-Green depending on your preference)
Power BI will automatically apply gradient colors based on values

Alternatively, use custom conditional formatting:

Home → Conditional Formatting → New Rule → Format cells based on their values
Set minimum (0%) as red, midpoint (30%) as yellow, maximum (100%) as green

The resulting heatmap makes patterns immediately visible — healthy cohorts appear as green rows that fade slowly. Problematic cohorts appear as rows that turn red quickly.

Step 8: Add a Pivot Table for Flexibility

For more dynamic cohort analysis in Excel, use a Pivot Table.

Select your full data table
Insert → Pivot Table
Row: Cohort Month (column E)
Column: Months Since First (column G)
Values: Count of user_id (Distinct Count — requires Power Pivot)

For distinct count:

Right-click on the value field → Value Field Settings → Summarize Values By → Distinct Count

This requires the Data Model (Power Pivot). Enable it:

When creating the pivot, check “Add this data to the Data Model”
Then distinct count becomes available

Reading and Interpreting the Retention Matrix

What to Look For

Down each column — Compares cohorts at the same age. Month 1 retention for January vs February vs March. If later cohorts show higher Month 1 retention, your product is improving. If it is declining, something is getting worse.

Across each row — Shows how retention decays for one cohort over time. A flat or slow-declining row indicates strong retention. A steep drop-off indicates a problem with keeping users engaged.

The diagonal — Each diagonal position represents the same calendar month but different cohort ages. The diagonal helps you see whether absolute activity levels are growing.

Patterns and What They Mean

Healthy retention pattern:

Month 0: 100%
Month 1: 40%  → Large initial drop is normal
Month 2: 35%  → Slows down — retained users are loyal
Month 3: 33%  → Stabilizes — this is your "true" retained base
Month 4: 32%
Month 5: 31%

The curve flattens after the initial drop — users who stay past Month 2 tend to stay long-term.

Worsening retention across cohorts:

Jan cohort Month 1: 40%
Feb cohort Month 1: 35%
Mar cohort Month 1: 28%
Apr cohort Month 1: 22%

Month 1 retention is declining with each cohort — a signal that recent acquisition is bringing in lower-quality users, or a product change is reducing early engagement.

Step-change improvement:

Jan cohort Month 1: 25%
Feb cohort Month 1: 26%
Mar cohort Month 1: 38%  ← Something changed
Apr cohort Month 1: 40%

A sudden improvement in March suggests a product update, onboarding change, or acquisition channel shift that took effect around that time.

Real-World Use Cases

SaaS Product — Feature Adoption Cohort

A SaaS company tracks cohorts by signup month and measures weekly active usage. They notice that cohorts from months when they introduced guided onboarding show 15 percentage points higher Month 2 retention than cohorts from before onboarding. This becomes the business case for investing further in onboarding improvement.

E-commerce — First Purchase Cohort

An e-commerce platform groups customers by their first purchase month and tracks repeat purchase rates. January cohort customers (acquired during holiday sales) show poor long-term retention — they were deal-seekers who never returned at full price. The marketing team shifts acquisition budget away from discount-heavy campaigns toward loyalty-building channels.

Mobile App — Cohort by Acquisition Channel

A mobile app splits its cohort analysis by acquisition channel — organic, paid social, app store search. Organic cohorts show 3x better Month 3 retention than paid social cohorts. The growth team reallocates budget from paid social toward content and SEO that drives organic discovery.

Common Mistakes to Avoid

Using calendar month active users instead of cohort-based counting — Aggregate monthly active users mixes cohorts together and hides the patterns that make cohort analysis valuable. Always track users relative to their acquisition date, not the calendar date
Defining cohort membership too broadly — A cohort should have a clear, single definition — first purchase month, first login month. Mixing different definitions of “first engagement” across users corrupts the analysis
Ignoring the diagonal when comparing cohorts — When comparing Month 1 retention across cohorts, you are comparing activity in different calendar months. External factors (seasonality, market events, platform changes) affect all cohorts simultaneously. Factor this in before attributing changes to cohort-specific factors
Not waiting long enough before concluding — A March cohort analyzed in April only has Month 0 and Month 1 data. Wait at least 3–6 months before drawing conclusions about long-term retention for any cohort
Treating NULL as zero in the retention matrix — NULL means data does not yet exist — the future. Zero means zero users were retained. These are completely different. Never fill NULL cells with 0 in your retention matrix
Analyzing too many cohorts simultaneously — With two years of monthly data you have 24 cohorts. Trying to find patterns across all 24 simultaneously is overwhelming. Focus on recent cohorts for improvement signals and older cohorts for long-term behavior benchmarks

Cohort analysis is one of the most revealing analytical techniques available and now you have both the SQL and Excel implementations to apply it immediately.

Here is a quick recap of the complete process:

Assign each user to a cohort based on their first order or signup month
Calculate how many months have passed between each activity and their first activity
Count distinct active users per cohort per month period
Divide by cohort size to get retention percentages
Pivot to create the retention matrix
Apply conditional formatting to create a heatmap for easy pattern recognition
Read down columns to compare cohorts at the same age, across rows to see individual cohort decay

Start with three to six months of data and your most important engagement event — first purchase, first login, first feature use. Build the matrix, apply the heatmap, and look for the patterns. They are almost always there — and they almost always point directly to where the most impactful product and marketing improvements can be made.

FAQs

What is cohort analysis?

Cohort analysis groups users by a shared characteristic — typically the time period of their first engagement and tracks the behavior of each group over time. It reveals retention patterns and product improvement signals that aggregate metrics completely hide.

What is the difference between cohort analysis and retention analysis?

Retention analysis measures what percentage of users return over time. Cohort analysis is the method used to perform retention analysis accurately — by grouping users into cohorts so you compare users at the same stage of their lifecycle rather than mixing different groups together.

How do I perform cohort analysis in SQL?

Use four steps: find each user’s first order date (MIN with GROUP BY), calculate months since first order (DATEDIFF), count distinct active users per cohort per month (COUNT DISTINCT with GROUP BY), and calculate retention percentages by dividing by cohort size. Pivot using CASE WHEN to create the matrix.

How do I perform cohort analysis in Excel?

Use MINIFS to find first order dates, calculate months since first using DATEDIF or date arithmetic, then use COUNTIFS to count active users per cohort per month period. Create a retention percentage table and apply color scale conditional formatting for the heatmap.

What is a good retention rate for cohort analysis?

It depends entirely on your industry and business model. SaaS products typically aim for Month 1 retention above 30–40%. E-commerce businesses vary widely — 20–30% Month 1 repeat purchase rate is often considered healthy. Always benchmark against your own historical cohorts and industry comparables.

How many months of data do I need for cohort analysis?

You need at least three to six months of data to see meaningful patterns. With only one or two months, you can only compare Month 0 and Month 1 behavior — not enough to see where retention stabilizes or how cohorts compare at the same age.

How to Perform Cohort Analysis in Excel or SQL

What Is Cohort Analysis?

Simple Analogy

Why Aggregate Metrics Hide the Truth

The Three Types of Cohorts

Acquisition Cohorts (Most Common)

Behavioral Cohorts

Size Cohorts

Setting Up the Dataset

Cohort Analysis in SQL — Step by Step

Step 1: Find Each User’s First Order Date (Cohort Assignment)

Step 2: Calculate Months Since First Order

Step 3: Count Active Users Per Cohort and Month

Step 4: Get Cohort Size (Month 0 Users)

Step 5: Pivot Into a Retention Matrix

Cohort Analysis in Excel — Step by Step

Step 1: Prepare Your Data

Step 2: Calculate First Order Date Per User

Step 3: Extract Cohort Month

Step 4: Calculate Order Month and Months Since First

Step 5: Build the Summary Table Using COUNTIFS

Step 6: Calculate Retention Percentages

Step 7: Apply Conditional Formatting — The Heatmap

Step 8: Add a Pivot Table for Flexibility

Reading and Interpreting the Retention Matrix

What to Look For

Patterns and What They Mean

Real-World Use Cases

SaaS Product — Feature Adoption Cohort

E-commerce — First Purchase Cohort

Mobile App — Cohort by Acquisition Channel

Common Mistakes to Avoid

FAQs

What is cohort analysis?

What is the difference between cohort analysis and retention analysis?

How do I perform cohort analysis in SQL?

How do I perform cohort analysis in Excel?

What is a good retention rate for cohort analysis?

How many months of data do I need for cohort analysis?

Leave a Comment Cancel Reply

Copyright © 2026 codewithfimi.com - All Rights Reserved