In partnership with

Your manager asks: "How is user retention looking?"

You run a quick query. 60% of all users were active this month. You report it. Your manager nods. Everyone moves on.

But that 60% number is hiding something ugly.

What if I told you that users who signed up 6 months ago have an 80% retention rate—but users who signed up last month have a 25% retention rate?

The blended, aggregate number is 60%. But the truth is, the product is actually getting worse for new users. The loyal, older users are just papering over the problem.

This is why aggregate metrics lie. And this is exactly why Cohort Analysis exists.

What is a cohort?

A cohort is simply a group of people who share a common characteristic—usually the date they started using your product.

January signups are one cohort. February signups are another. Cohort analysis means tracking each of these groups separately over time, instead of lumping everyone together into one massive bucket.

It is the difference between:

  • "60% of our total users are active." (Meaningless without context).

  • "Of the users who signed up in January, 72% are still active. Of the users who signed up in March, only 31% are still active." (Now you actually know something is broken).

Building a Retention Cohort from Scratch

Let's say you work at a subscription app. You want to know: For each month's batch of new signups, what percentage is still active 1 month later? 2 months later?

Here is the SQL to build the classic "Cohort Triangle":

WITH user_cohorts AS (
    SELECT 
        user_id, 
        DATE_TRUNC('month', signup_date) AS cohort_month
    FROM users
),
user_activity AS (
    SELECT 
        user_id, 
        DATE_TRUNC('month', activity_date) AS activity_month
    FROM events
    WHERE event_type = 'active_session'
    GROUP BY user_id, DATE_TRUNC('month', activity_date)
)
SELECT
    c.cohort_month,
    COUNT(DISTINCT c.user_id) AS cohort_size,
    COUNT(DISTINCT CASE 
        WHEN a.activity_month = c.cohort_month + INTERVAL '1 month' 
        THEN c.user_id END) AS month_1,
    COUNT(DISTINCT CASE 
        WHEN a.activity_month = c.cohort_month + INTERVAL '2 months' 
        THEN c.user_id END) AS month_2,
    COUNT(DISTINCT CASE 
        WHEN a.activity_month = c.cohort_month + INTERVAL '3 months' 
        THEN c.user_id END) AS month_3
FROM user_cohorts c
LEFT JOIN user_activity a ON c.user_id = a.user_id
GROUP BY c.cohort_month
ORDER BY c.cohort_month;

The output looks like this:

Cohort

Size

Month 1

Month 2

Month 3

Oct 2025

5,000

3,200 (64%)

2,800 (56%)

2,500 (50%)

Nov 2025

6,200

3,700 (60%)

2,900 (47%)

2,200 (35%)

Dec 2025

7,500

3,400 (45%)

2,100 (28%)

Jan 2026

8,000

2,800 (35%)

How to read this (This is crucial):

Read this table column by column, not row by row. That is where the story lives.

Look at the Month 1 column. The October cohort retained 64% of its users. But by January, the Month 1 retention had plummeted to 35%.

The product's ability to keep new users is deteriorating fast. Each newer cohort retains worse than the previous one. If you had just reported "overall retention is 48%," nobody would panic. But this table screams that a recent product update or marketing channel is broken.

The mistake 90% of juniors make

They calculate retention the lazy way.

-- ❌ DON'T DO THIS
SELECT
    COUNT(DISTINCT CASE WHEN last_active_date >= '2026-01-01' THEN user_id END) AS active,
    COUNT(DISTINCT user_id) AS total,
    ROUND(100.0 * active / total, 2) AS retention_rate
FROM users;

This divides current active users by ALL users who have ever existed in the database. It treats someone who signed up 3 years ago exactly the same as someone who signed up yesterday.

That isn't "retention." That is just the "percentage of users who haven't deleted their account."

The right way is always cohort-based: Track specific groups over consistent time periods.

AI is moving too fast to track alone. > Keep up with the latest tools, trends, and news in just 10 minutes a day. Get the morning newsletter that cuts through the AI noise.

Wake up to better business news

Some business news reads like a lullaby.

Morning Brew is the opposite.

A free daily newsletter that breaks down what’s happening in business and culture — clearly, quickly, and with enough personality to keep things interesting.

Each morning brings a sharp, easy-to-read rundown of what matters, why it matters, and what it means to you. Plus, there’s daily brain games everyone’s playing.

Business news, minus the snooze. Read by over 4 million people every morning.

Beyond Time: Behavioral Cohorts

Grouping users by signup month is the baseline. But the real magic happens when you group users by what they did.

  • Users who completed onboarding vs. those who skipped it.

  • Users who made their first purchase within 24 hours vs. those who took a week.

  • Users who came from TikTok Ads vs. Organic Google Search.

SELECT
    CASE WHEN onboarding_completed = TRUE THEN 'Completed Onboarding'
         ELSE 'Skipped Onboarding' END AS user_segment,
    COUNT(DISTINCT user_id) AS total_users,
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN active_month_1 = TRUE THEN user_id END) 
        / COUNT(DISTINCT user_id), 1) AS month_1_retention_pct,
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN active_month_3 = TRUE THEN user_id END) 
        / COUNT(DISTINCT user_id), 1) AS month_3_retention_pct
FROM user_cohort_enriched
GROUP BY 1;

The Output:

Segment

Total Users

Month 1 Retention

Month 3 Retention

Completed Onboarding

12,000

68%

45%

Skipped Onboarding

8,000

29%

11%

Now you don't just have a chart; you have an actionable business insight. Completing the onboarding flow is a massive predictor of long-term retention. If the Product team can force or incentivize users to finish onboarding (pushing that completion rate from 60% to 80%), overall company revenue will significantly increase.

This is the kind of analysis that gets presented in board meetings. It connects user behavior directly to a business outcome and suggests a clear intervention.

How to visualize it

The classic cohort visualization is a "Heatmap." In Excel, Tableau, or Power BI, simply apply conditional formatting to your pivot table matrix. Green for high retention, Red for low.

The visual makes the patterns jump out immediately. Keep the table itself clean. No 3D effects, no messy gridlines. The data is compelling enough on its own.

One thing to do this week

If you have access to a database (or a public Kaggle dataset with transaction dates), build a simple cohort table.

Post a screenshot of the heatmap on LinkedIn or your portfolio, but include a one-paragraph "Insight" explaining exactly what the table tells you about the business. The practice of writing the SQL is good; the practice of interpreting the triangle is what gets you hired.

Next week: WTF Does a Risk Analyst Actually Do? — Credit scores, fraud flags, and why banks will never stop hiring analysts.

RAJ Real Analyst Jobs

P.S. Added new 25+ fresh remote jobs: realanalystjobs.com
Added new free projects : https://realanalystjobs.com/projects
Document your 2026 Journey: https://realanalystjobs.com/journey
Talk to me : https://realanalystjobs.com/raj

Keep Reading