Getting started: Segments
A user segment is a collection of customers defined by some combination of behavioral conditions. For example, we may be interested in analyzing a segment of users having completed ArticleRead
followed by PhotoUpload
.
Let’s begin by writing a query for all users completed both events in the past 30 days, along with when they completed it:
set app_id=12345;
set lookback_period_in_days=-30;
SELECT localytics_device_id,
event_name,
client_time
FROM fact_events
WHERE app_id = $app_id
AND client_date <= dateadd(day ,$lookback_period_in_days, current_date())
AND (event_name = 'ArticleRead' OR event_name = 'PhotoUpload')
This will give:
localytics_device_id | event_name | client_time |
---|---|---|
user2 | PhotoUpload | Sat, 11 Jan 2017 11:30:04 +0000 |
user1 | ArticleRead | Sat, 14 Jan 2017 09:30:21 +0000 |
user2 | ArticleRead | Sat, 15 Jan 2017 12:32:14 +0000 |
user1 | PhotoUpload | Sat, 18 Jan 2017 20:30:12 +0000 |
… | … |
Note that user1
completes PhotoUpload
after ArticleRead
, according to client_time
. Let’s modify change our query to only retrieve localytics_device_id's
:
set app_id=12345;
set lookback_period_in_days=-30;
SELECT localytics_device_id
FROM fact_events
WHERE app_id = $app_id
AND client_date <= dateadd(day ,$lookback_period_in_days, current_date())
AND (event_name = 'ArticleRead' OR event_name = 'PhotoUpload')
GROUP BY localytics_device_id
This query only gives us users who completed both events, but we’re only interested in users who did them in a particular order. That is, for each user, we want the timestamp for the latest PhotoUpload
to be greater than the first time the user did ArticleRead
. We can retrieve these timestamps using MAX
and MIN
functions:
set app_id=12345;
set lookback_period_in_days=-30;
SELECT localytics_device_id, MAX(
CASE
WHEN
event_name = 'PhotoUpload' THEN client_time
END
) as maximum_photo_upload_time,
MIN
(CASE
WHEN
event_name = 'ArticleRead' THEN client_time
END
) as minimum_article_read_time
FROM fact_events
WHERE app_id = $app_id
AND client_date <= dateadd(day ,$lookback_period_in_days, current_date())
AND (event_name = 'ArticleRead' OR event_name = 'PhotoUpload')
GROUP BY localytics_device_id
This gives:
localytics_device_id | minimum_article_read_time | maximum_photo_upload_time |
---|---|---|
user1 | Sat, 14 Jan 2017 09:30:21 +0000 | Sat, 18 Jan 2017 20:30:12 +0000 |
user2 | Sat, 15 Jan 2017 12:32:14 +0000 | Sat, 11 Jan 2017 11:30:04 +0000 |
… | … |
Finally, we just need to filter the aggregated view only users who completed PhotoUpload
after ArticleRead
. All we need to is add to the HAVING
clause
set app_id=12345;
set lookback_period_in_days=-30;
SELECT localytics_device_id, MAX(
CASE
WHEN
event_name = 'PhotoUpload' THEN client_time
END
) as maximum_photo_upload_time,
MIN
(CASE
WHEN
event_name = 'ArticleRead' THEN client_time
END
) as minimum_article_read_time
FROM fact_events
WHERE app_id = $app_id
AND client_date <= dateadd(day ,$lookback_period_in_days, current_date())
AND (event_name = 'ArticleRead' OR event_name = 'PhotoUpload')
GROUP BY localytics_device_id
HAVING maximum_photo_upload_time > minimum_article_read_time
Adding additional conditions
Adding an additional condition to your segment is easy. In our original example, maybe you only want to consider ArticleRead
events wherecountry=us
. You can add this condition as follows:
set app_id=12345;
set lookback_period_in_days=-30;
SELECT localytics_device_id, MAX(
CASE
WHEN
event_name = 'PhotoUpload' THEN client_time
END
) as maximum_photo_upload_time,
MIN
(CASE
WHEN
(event_name = 'ArticleRead' AND country = 'us') THEN client_time
END
) as minimum_article_read_time
FROM fact_events
WHERE app_id = 40957
AND client_date <= Dateadd(day ,-30, current_date())
AND (event_name = 'ArticleRead' OR (event_name = 'PhotoUpload' and country='us'))
GROUP BY localytics_device_id
HAVING maximum_photo_upload_time > minimum_article_read_time
Segment Template
Here is a general template to create user segments for a users who completed Event1
followed by Event2
with some conditions/filters <conditions>
:
set app_id=12345;
set lookback_period_in_days=-30;
SELECT localytics_device_id,
FROM fact_events
WHERE app_id = $app_id
AND client_date <= dateadd(day ,$lookback_period_in_days, current_date())
AND ((event_name = 'Event1' AND <conditions>) OR (event_name = 'Event2' AND <conditions>))
GROUP BY localytics_device_id
HAVING (
CASE
WHEN
MAX(
CASE
WHEN
(event_name = 'Event2' AND <conditions>) THEN client_time
END
) > MIN(
CASE
WHEN
(event_name = 'Event1' AND <conditions>) THEN client_time
END
) THEN 1
ELSE 0
END
) = 1 -- Change to "0" if you want to answer "Which users complete Event1 and then never complete Event2?"