Data models for Subscription based products!
There's joy and agony for analytics in this very messy business model.
Subscription is a common business model for online products. It is used across domains (from SaaS tools to Coffee stores and from Netflix to nytimes!) because businesses love it.
Subscription is a great way to get long-term paying customers for your business. For these businesses, revenue comes from -
converting new users to paying members
retaining the paying members for as long as possible!
That's all the business context we need for the subscription model. If you want to know more about the business aspect, I have linked two short youtube videos at the end of this post.
If you want to dive deep into subscription-related analysis and reporting, the Data Analysis Journal has some great resources. But you need a subscription to access the best stuff!
Challenges of working with subscription data
Subscription is fascinating from a data standpoint. Let’s consider a monthly subscription for now.
When a subscription starts, a user becomes a ‘member’ for a month and we have to calculate and maintain this flag in our tables against the user. So if I need the number of members today, I have to query the database for the past 30 days (for a monthly subscription) !
Also, the definitions of metrics like ‘conversion’ look very simple till you start writing them in SQL and deal with edge cases like delayed payment, plan upgrades, and a free trial extension to prevent cancellation.
Things spiral out of control even faster if your company launches a new plan with a different membership period - as startups often do while experimenting with subscriptions. Your queries, written for a monthly subscription, start tagging the annual subscribers as ‘cancelled’ after just a month because you forgot to update all your queries! True story.
When I first worked on a subscription product, I wrote queries on raw data because the product had a weekly plan and it’s okay to query 7 days of data. Soon, a monthly plan was introduced I updated the queries to cover 30 days of data. This slowed down and almost crashed our entire dashboard!
So we quickly built some aggregated tables (or data models!) and the queries started running smoothly. This served as my introduction to data modelling and related concepts. And it was such a great entry point - In subscription, we get one data point - a row of data from the payment service for a transaction - we get this for a free trial (where the amount it 0) and each renewal after the free trial.
Based on this, we have to maintain many different models - depending on the complexity of the subscription plans.
Raw Data
The subscription data consists of one row per transaction as explained above. It has the following details:
unique_transaction_id
first_transaction_id (so we can link this id to the first transaction of a subscription plan)
user_id
plan_id
plan_type
amount (and currency)
timestamp
membership renewal date
Plan_id refers to the subscription plan. The payment provider does not have much information about it - but the product team does. So we store the plan details manually. A 'plan_details' table has the following columns:
plan_id
has_free_trial (TRUE is the plan has a free trial, FALSE otherwise)
free period
has_discounted_period
discounted period
discounted_amount
renewal period
amount
That’s it. That’s all the raw data we need for a ton of reporting and analysis.
Before we get into data models..
In subscription products, every other metric is seen in connection with subscription. This is similar to casual and hypercasual games where the goal is to increase engagement and retention.
Let's take the example of Spotify to keep the discussion simple. Let's assume that Spotify has just one plan - a monthly plan that starts with a 1-month free trial (this is the most common plan in subscription). Spotify wants subscribers to keep paying from one month to the next. So, the product managers at Spotify would be interested in -
increasing the % of new users that take a trial
increasing the % of trials to conversions (conversion means a payment)
decreasing the % of existing members who cancel
increasing the % of members who return after cancallation
Since PMs are obsessed with subscriptions, the data team has to look at subscriptions from all possible angles! The efficient way to work with subscription data is through data models.
Data models for subscription
The most important table is - user_status.
In user_status, we track the status of every member as of now. We classify users as:
Free: these are users who have not started the free trial
Free trial: these are users who are currently in the free trial duration
Expired trial: these are users who started the free trial but cancelled before first payment
Expired member: these were members before, but are now using the free version of the product
Member: these are paying members in a active subscription
Companies are obsessed with converting their users to members and so every query joins with the user_status table.
The next key thing is to keep these members as members. All of us use many subscriptions and cancel/pause them all the time. To get this data easily, we can maintain a renewal_table. Here we track the latest transaction of a user and the current user status (from the user_status table). It should have -
user_id
transaction_id
date of previous payment
current payment status
user status
With this table, we can quickly calculate:
Free trial to membership conversion
1st to 2nd renewal (and so on till the 10th renewal)
member churn (churn means a member has stopped paying.)
For a startup, the above tables would have 1000s of 10000s of rows. As this grows, it makes sense to create many more tables with filtered data.
second level layers
Often we want to query the behaviour of existing users. To do this, we can create a table active_members with the below details-
user_id
membership type (trial, paid)
membership started on
last payment date
member till
total revenue received
total renewals
plan id
plan period
We are also interested in expired members. So we can track them with a separate table -
user_id
plan expiration date
plan id
expiration reason (if any)
lifetime revenue
We can create more lower-level tables from the above as the use cases increase -
lifetime table (lifetime revenue of each user, membership start date, number of renewals till date, as well as current status) - for all previous and current payers
expired trial : users who are currently active on product but who never become payers after a free trial
Free trial: users who are on free trial
work smarter with historical data
With subscription tables, we get the current state of membership and payments. With these tables, we can get the current number of subscribers with a short query. But what if you want the number of subscibers one week ago or an year ago?
It can be really tricky to get these details over time without data models because of the time-period-based nature of subscriptions. To get the active members for a past date, you need to query a lot of historical data ( as much as your longest plan), get the latest transaction of a user, and check if the plan period is active on that date or not. Messy.
The easy to way to achieve this is by storing the data models as daily snapshots. So you can quickly refer back to the table from a month ago! As Chandler says- “Can it be any simpler?”
Resources
To appreciate the craze of subscription, check out these short videos below:
Why is everything a subscription
Benefits of a subscription model for creators
This was just an introduction (and that too barely) to subscription data modelling. I cannot cover too much of it in text (it’s pointless to talk data in text after a point), but if you are working with subscription data and need help, book a call.