Motivation: this post walks through how to build a “product qualifying” model to identify which users of your product are most likely to buy. For explanations of why this is a great idea, and you should already be doing it, see this excellent article from Madkudu, or the Product-Led Growth book. At some point I’ll write up a more expository post on my theory-of-the-case and experience using this technique in SaaS for the last 3 years. At boardable.com we saw a 7.5x increase in conversion for our product-qualified leads after building this system out! See Part 2 for a discussion of building insight tools to use the PQL scores across a SaaS business.
Ingredients
Auto-captured page views, clicks, form submits, the activity of users in your product, with a unique identifier for each account.
Heap, Mixpanel, Pendo and others provide the service. For thoughts on which things in your product are the most relevant to track: stay tuned!
Definition of “conversion.” What is the first destination you would like to get customers to? Usually this is paying you in some way (subscribing, buying, etc.)
Definition of “dropped off.” How do you know when a user is not going to convert (pay you)? If you have a time-limited trial you can use the end of the trial (with a sufficient time delay to ensure they do not reactive or convert a few days later). If it is in an endless free product something like “30 days of not logging in” works well for “dropped off.”
Conversion baseline: over the time period you are analyzing, what is your overall conversion rate? Baseline = converted accounts / (dropped off + converted) accounts.
Preparation
Collect all of the auto-captured data for users who converted and those who dropped off. Depending on how much data you have specifying a certain time period is helpful, say the 180 days from 30 days ago and back. Ignore any data from users that converted before this time period, or who are still active but not converted (these are the ones we will “score” at the end).
We want to put together a table where each row is a unique account and each column is a different event (page view, button click, etc.), and for each account we have a count of how many times that account did that event. For now we only need 0 or 1 (didn’t do the action, did the action), but having counts can enrich it later.
Table of actions completed by unique users.
See the end of this post for how to implement this on a database containing data sync’d from Heap. If you have data from another system the SQL should be reasonably modifiable to accomplish the same thing.
Theory
What we are looking for are actions in the product that indicate adoption, that the user is an “activated” user and is likely to upgrade to a paid account (or if we have packaged it well then they will see the value in the paid access). The nuts-and-bolts actions that we can auto-capture roll up to activities that make the product useful, like sending messages, adding friends, or putting items in a cart. Once we have the event tables filled out (see above), we leverage the logical relationships of necessity and sufficiency to form the building blocks for this model.
Necessity: a necessary condition is one that must true in order for another to also be true. When talking about product actions this can be understood as required events. If the only way to click the ‘buy’ button is to sign up for an account then the ‘sign up’ action is necessary for ‘buy.’ Everyone who clicks ‘buy’ will have already completed ‘sign up.’
Sufficient: a sufficient condition implies another, but is not required. At Boardable ‘delete document’ is a sufficient condition for ‘upload document’ (you can’t delete something that is not there), but if someone hasn’t deleted anything they may or may not have already uploaded something.
Necessary action: all converted users did the action.
A fully necessary action is one that all converted users completed. It is a required action, like signing up or entering a payment method, indicating progress towards conversion but with little predictive value. In set terms: all converted users are a subset of all users who completed the action.
Rather than simply look at “how many people do the action and convert?”, we instead look at the conversion population in the field of users that either did the action at least once, or never did it (see next image, the bounds of the paper represent all users in our specific time period). This catches people who maybe never saw the feature in the product and were unaware it was a capability.
How the converted circle sits across the did-the-action/did-not-do-the-action field gives us a picture of necessity. To the degree that the conversion bubble sits in the “action” side it is necessary, and if the bubble sits in the “no action” side it is completely unnecessary.
Set of users that converted are split into ones that did the action and the ones that did not.
Necessity
To calculate necessity we look at how many accounts converted without doing the action. If no one converts without doing the action we can assert that it is a necessary step on the path to converting (such as selecting a payment method).
How many users ...
Converted | Dropped Off | |
---|---|---|
did action | Ac | Ad |
did not | Nc | Nd |
Baseline = (Ac + Nc) / (Ac + Ad + Nc + Nd)
Nec = Nc / (Ac + Ad + Nc + Nd)
N = Nec / Baseline = Nc / (Ac + Nc)
At the limits, N = 0 when no one has converted and skipped this action (making it fully necessary), and N = 1 when everyone who has skipped this action converted (making it unnecessary).
Sufficiency
To obtain sufficiency we look at everyone who did the action and ask “how many accounts that did the action converted?” This can be plotted as a ratio of “Did Action” samples in the field of converted vs. dropped off users. If this ratio is higher than the baseline overall conversion rate then we have a valuable action!
All users who did the action split into those that went on to convert vs those that dropped off.
To the degree that the action bubble sits in the converted side it is sufficient and we could say “doing action X implies conversion,” but not “everyone who converted did action X.” Once we know how many of the “did the action” users converted vs failed we have a useful calculation of “sufficiency”:
S = Ac / (Ac + Nc) is incorrect, stated as “out of all the users that converted, how many did this action?” This is a Necessity calculation, again.
In this formulation, S = 0 (Ac ~ 0) only when no one does the action and converts (aka it’s a conversion killer action, the “cancel now” button), and S = 1 (Nc ~ 0) when it is a necessary action (everyone that converts does the action), which we already capture above.
S = Ac / (Ac + Ad) where Ac+Ad > 0
At the limits, S = 0 when everyone who does the action drops off (Ac approaches 0), and S = 1 when everyone who does the action converts (Ad approaches 0).
We can now define 3 categories of actions:
Necessary but not Sufficient (N ~ 0, S ~ 0): required actions, indicating average conversion if a user does the action, and very low chance if they don’t do it (account creation, selecting payment, etc.)
Necessary and Sufficient (0 < N < 1, 0 < S < 1): actions that indicate engagement. Low chance of conversion if the user hasn’t done it, and a high chance if they do.
Sufficient but not Necessary (N ~ 1, 0 < S < 1): more users that do these actions end up converting than not, but they are not required. These are the most interesting.
Application
I have had great success splitting actions into these 3 categories (along the lines of the 'Activation, Engagement, Acceleration’ framework described by Madkudu), and then creating a “score” for each active account. I worked out a formula that combines both the N & S calculations to create a single estimate of the Product-Qualifying value for each action, and then sum these values to create an account’s Product-Qualified Lead score.
Finally, all active accounts are broken down into Low, Medium, and High PQL groups in order to deliver targeted messaging and automation routing. Continue on to Part 2 of this post series for more on delivering value across teams using PQL scores.
Code
The following breaks down the calculations in SQL sub-queries (this is Redshift SQL, a variant of PostgreSQL). The data it operates on consists of:
.“Users” table with a unique entry for each individual user, containing
“user_id”: a unique identifier
“trial_ends_at”: a custom property that indicates the date the user’s free trial ended. We only want events that happened during each user’s free trial, not things they do after converting.
“plan”: a custom property that indicates the user converted to a paid account. Change this to whatever you have that indicates conversion.
An event table, “sign_up_complete” in this example. The event table has 2 data points we need:
“user_id” that matches the unique identifiers on the users table
“time” when the event took place
WITH user_table AS (
SELECT
"users"."user_id",
trial_ends_at,
-- if user has a plan they converted, switch to a boolean flag
CASE WHEN plan IS NULL THEN
0
ELSE
1
END AS "converted"
FROM
"users"
WHERE
-- use 180 days of data, but start 30 days back to ignore most recent data
trial_ends_at >= GETDATE () - INTERVAL '210 DAY'
AND trial_ends_at < GETDATE () - INTERVAL '30 DAY'),
event_table AS (
-- Join users with the event we're analyzing. Replace 'sign_up_complete' with your event table
SELECT
user_table.*,
sign_up_complete.time
FROM
-- Get all users, their trial end date, and an indicator of conversion
user_table
-- LEFT JOIN so we keep all users who did not do the event
LEFT JOIN sign_up_complete ON sign_up_complete.user_id = user_table.user_id),
data_table AS (
-- Count how many times each user did the action
SELECT
user_id,
trial_ends_at,
converted,
-- "time" is when the event happened. If there is more than one "time" (the user repeated the action) we count it as 1.
CASE WHEN count("time") > 0 THEN
1
ELSE
0
END AS "action_count"
FROM
event_table
WHERE
-- Only get events that each user completed during the trial period, or users that did not do the event
"time" IS NULL
OR "time" < trial_ends_at
GROUP BY
user_id,
trial_ends_at,
converted),
metric_table AS (
-- calculate our basic metrics:
SELECT
COUNT(DISTINCT (user_id)) "accounts",
SUM(converted) "count_converted",
SUM(converted * action_count) "converted_action",
SUM(action_count) "count_action"
FROM
data_table),
matrix_table AS (
-- calculate the converted/failed/did action/didn't do action matrix
SELECT
1.0 * count_converted / accounts AS "baseline",
converted_action,
count_converted - converted_action AS "converted_no_action",
count_action - converted_action AS "failed_action"
-- the 4th component we don't use:
-- (accounts - count_converted) - (count_action - converted_action) AS "failed_no_action"
FROM
metric_table
)
-- Finally, we calculate our necessary and sufficient components!
SELECT
baseline,
1.0 * converted_action / (converted_action + failed_action) "sufficient",
1.0 * converted_no_action / (converted_no_action + converted_action) "neccessary"
FROM
matrix_table;
Dr. Ben Smith is a Data Scientist and thinker, fascinated by the appearance of computers in our daily lives, creativity, and human struggles. He has had the privilege to think, learn, and write at the University of Illinois, the National Center for Supercomputing Applications, the Cleveland Institute of Art, Case Western Reserve U., IUPUI, and at Boardable: Board Management Software, Inc.
If you have feedback or questions please use Contact me to get in touch. I welcome thoughtful responses and constructive critique.