Getting started: Segments

A user segment is a collection of customers defined by some combination of behavioral conditions. For example, we may be interested in analyzing a segment of users having completed ArticleRead followed by PhotoUpload.

Let’s begin by writing a query for all users completed both events in the past 30 days, along with when they completed it:

set app_id=12345;
set lookback_period_in_days=-30;

SELECT localytics_device_id,
       event_name,
       client_time
FROM fact_events
WHERE app_id = $app_id
AND client_date <= dateadd(day ,$lookback_period_in_days, current_date())
AND (event_name = 'ArticleRead' OR event_name = 'PhotoUpload')

This will give:

localytics_device_id event_name client_time
user2 PhotoUpload Sat, 11 Jan 2017 11:30:04 +0000
user1 ArticleRead Sat, 14 Jan 2017 09:30:21 +0000
user2 ArticleRead Sat, 15 Jan 2017 12:32:14 +0000
user1 PhotoUpload Sat, 18 Jan 2017 20:30:12 +0000
 

Note that user1 completes PhotoUpload after ArticleRead, according to client_time. Let’s modify change our query to only retrieve localytics_device_id's:

set app_id=12345;
set lookback_period_in_days=-30;

SELECT localytics_device_id
FROM fact_events
WHERE app_id = $app_id
AND client_date <= dateadd(day ,$lookback_period_in_days, current_date())
AND (event_name = 'ArticleRead' OR event_name = 'PhotoUpload')
GROUP BY localytics_device_id

This query only gives us users who completed both events, but we’re only interested in users who did them in a particular order. That is, for each user, we want the timestamp for the latest PhotoUpload to be greater than the first time the user did ArticleRead. We can retrieve these timestamps using MAX and MIN functions:

set app_id=12345;
set lookback_period_in_days=-30;

SELECT localytics_device_id, MAX(
    CASE
      WHEN
        event_name = 'PhotoUpload' THEN client_time
    END
    ) as maximum_photo_upload_time,
   MIN
    (CASE
      WHEN
        event_name = 'ArticleRead' THEN client_time
    END
    ) as minimum_article_read_time
FROM fact_events
WHERE app_id = $app_id
AND client_date <= dateadd(day ,$lookback_period_in_days, current_date())
AND (event_name = 'ArticleRead' OR event_name = 'PhotoUpload')
GROUP BY localytics_device_id

This gives:

localytics_device_id minimum_article_read_time maximum_photo_upload_time
user1 Sat, 14 Jan 2017 09:30:21 +0000 Sat, 18 Jan 2017 20:30:12 +0000
user2 Sat, 15 Jan 2017 12:32:14 +0000 Sat, 11 Jan 2017 11:30:04 +0000
 

Finally, we just need to filter the aggregated view only users who completed PhotoUpload after ArticleRead. All we need to is add to the HAVING clause

set app_id=12345;
set lookback_period_in_days=-30;

SELECT localytics_device_id, MAX(
    CASE
      WHEN
        event_name = 'PhotoUpload' THEN client_time
    END
    ) as maximum_photo_upload_time,
   MIN
    (CASE
      WHEN
        event_name = 'ArticleRead' THEN client_time
    END
    ) as minimum_article_read_time
FROM fact_events
WHERE app_id = $app_id
AND client_date <= dateadd(day ,$lookback_period_in_days, current_date())
AND (event_name = 'ArticleRead' OR event_name = 'PhotoUpload')
GROUP BY localytics_device_id
HAVING maximum_photo_upload_time > minimum_article_read_time

Adding additional conditions

Adding an additional condition to your segment is easy. In our original example, maybe you only want to consider ArticleRead events wherecountry=us. You can add this condition as follows:

set app_id=12345;
set lookback_period_in_days=-30;

SELECT localytics_device_id, MAX(
    CASE
      WHEN
        event_name = 'PhotoUpload' THEN client_time
    END
    ) as maximum_photo_upload_time,
   MIN
    (CASE
      WHEN
        (event_name = 'ArticleRead' AND country = 'us') THEN client_time
    END
    ) as minimum_article_read_time
FROM fact_events
WHERE app_id = 40957
AND client_date <= Dateadd(day ,-30, current_date())
AND (event_name = 'ArticleRead' OR (event_name = 'PhotoUpload' and country='us'))
GROUP BY localytics_device_id
HAVING maximum_photo_upload_time > minimum_article_read_time

Segment Template

Here is a general template to create user segments for a users who completed Event1 followed by Event2 with some conditions/filters <conditions>:

set app_id=12345;
set lookback_period_in_days=-30;

SELECT localytics_device_id,
FROM fact_events
WHERE app_id = $app_id
AND client_date <= dateadd(day ,$lookback_period_in_days, current_date())
AND ((event_name = 'Event1' AND <conditions>) OR (event_name = 'Event2' AND <conditions>))
GROUP BY localytics_device_id
HAVING (
CASE
  WHEN
    MAX(
    CASE
      WHEN
       (event_name = 'Event2' AND <conditions>) THEN client_time
    END
    ) > MIN(
    CASE
      WHEN
       (event_name = 'Event1' AND <conditions>) THEN client_time
    END
    ) THEN 1
  ELSE 0
END
) = 1 -- Change to "0" if you want to answer "Which users complete Event1 and then never complete Event2?"