Lesson #5: Learn to Create Basic Recency, Frequency and Monetary Segments

RFM (Recency, Frequency, Monetary) analysis is a simple, easy to comprehend, highly used marketing model for behavior based customer segmentation. It groups customers based on their transaction history – how recently did they transact, how often did they transact and how much did they purchase.

RFM models helps divide customers into various categories or clusters to identify customers who are more likely to respond to promotions and also for future personalization services.

See also

As with all of our lessons if you want to catalog this code in Aginity Pro or Aginity Team click the catalogicon to skip right to the queries to add to your catalog.

Step 1 - The Required Data

For the sake of this example we want to make it simple but let you expand it against any data you have available to you.

We are going to mock up some data using a basic SQL UNION ALL command. We will use two different tables which are very typical in RFM modeling, TRANSACTION_HEADER and TRANSACTION_DETAIL.

Here is the structure of each table.

TRANSACTION_HEADER

Column name Column Description Data Type
customer_id This is a unique identifier of a customer that purchased. Integer
transaction_id A unique identifier of a transaction. Integer
transaction_date The date on which the transaction occurred. Date

TRANSACTION_DETAIL

Column name Column Description Data Type
transaction_id A unique identifier of a transaction. FK to TRANSACTION HEADER. Integer
quantity The quantity of items purchased. Integer
net_amount The total amount of items purchased. Decimal(14,3)

Sample RFM Data Scripts

The following SQL Scripts are portable and used to simulate the RFM model.

TRANSACTION_HEADER

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
  -- transaction_header data

  select cast(123456 as integer) as customer_id, cast(11111 as integer) as transaction_id, cast('2019-01-01' as date) as transaction_date
  union all
  select cast(123456 as integer) as customer_id, cast(11112 as integer) as transaction_id, cast('2019-01-04' as date) as transaction_date
  union all
  select cast(123456 as integer) as customer_id, cast(11113 as integer) as transaction_id, cast('2019-01-07' as date) as transaction_date
  union all
  select cast(123456 as integer) as customer_id, cast(11114 as integer) as transaction_id, cast('2019-01-10' as date) as transaction_date
  union all
  select cast(123456 as integer) as customer_id, cast(11115 as integer) as transaction_id, cast('2019-01-14' as date) as transaction_date
  union all
  select cast(123456 as integer) as customer_id, cast(11116 as integer) as transaction_id, cast('2019-01-17' as date) as transaction_date
  union all
  select cast(123456 as integer) as customer_id, cast(11117 as integer) as transaction_id, cast('2019-01-20' as date) as transaction_date
  union all
  select cast(123456 as integer) as customer_id, cast(11118 as integer) as transaction_id, cast('2019-01-27' as date) as transaction_date
  union all
  select cast(234567 as integer) as customer_id, cast(21115 as integer) as transaction_id, cast('2019-01-14' as date) as transaction_date
  union all
  select cast(234567 as integer) as customer_id, cast(21116 as integer) as transaction_id, cast('2019-01-15' as date) as transaction_date
  union all
  select cast(234567 as integer) as customer_id, cast(21117 as integer) as transaction_id, cast('2019-01-16' as date) as transaction_date
  union all
  select cast(234567 as integer) as customer_id, cast(21118 as integer) as transaction_id, cast('2019-01-17' as date) as transaction_date

TRANSACTION_DETAIL

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
      --transaction_detail data

      select cast(11111 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
      union all
      select cast(11112 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
      union all
      select cast(11112 as integer) as transaction_id, cast(1 as integer) as quantity, cast(7.25 as decimal(13,2)) as net_amount
      union all
      select cast(11113 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
      union all
      select cast(11112 as integer) as transaction_id, cast(15 as integer) as quantity, cast(1.10 as decimal(13,2)) as net_amount
      union all
      select cast(11114 as integer) as transaction_id, cast(1 as integer) as quantity, cast(25.34 as decimal(13,2)) as net_amount
      union all
      select cast(11114 as integer) as transaction_id, cast(2 as integer) as quantity, cast(14.32 as decimal(13,2)) as net_amount
      union all
      select cast(11114 as integer) as transaction_id, cast(1 as integer) as quantity, cast(7.10 as decimal(13,2)) as net_amount
      union all
      select cast(11115 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
      union all
      select cast(11116 as integer) as transaction_id, cast(1 as integer) as quantity, cast(8.10 as decimal(13,2)) as net_amount
      union all
      select cast(11117 as integer) as transaction_id, cast(2 as integer) as quantity, cast(23.10 as decimal(13,2)) as net_amount
      union all
      select cast(11118 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
      union all
      select cast(21115 as integer) as transaction_id, cast(14 as integer) as quantity, cast(4.10 as decimal(13,2)) as net_amount
      union all
      select cast(21116 as integer) as transaction_id, cast(16 as integer) as quantity, cast(8.10 as decimal(13,2)) as net_amount
      union all
      select cast(21117 as integer) as transaction_id, cast(4 as integer) as quantity, cast(23.10 as decimal(13,2)) as net_amount
      union all
      select cast(21118 as integer) as transaction_id, cast(1 as integer) as quantity, cast(43.10 as decimal(13,2)) as net_amount

Step 2 - The RFM Query

We are using a WITH statement to collapse two passes into one required SQL statement. Let’s start with the statement itself that returns the aggregated analytics and the segment each customer falls in.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
    with customer_metrics as
    (
    select
     th.customer_id,
     count(distinct th.transaction_id) as trips_per_period,
     sum(td.quantity * td.net_amount) as total_spend_per_period,
     datediff(DAY,current_date, max(th.transaction_date)) AS days_since_last_transaction
    from  TRANSACTION_HEADER th
    join  TRANSACTION_DETAIL td on th.transaction_id = td.transaction_id
    where th.transaction_date > dateadd(day, cast ($lookback_days as integer)*-1,current_date)
    AND td.quantity > 0             -- returns ignored
    group by th.customer_id),
    rfm as
    (
    select customer_id,
      ntile($buckets) over (order by days_since_last_transaction desc) as r,
      ntile($buckets) over (order by trips_per_period desc) as f,
      ntile($buckets) over (order by total_spend_per_period desc) as m,
      trips_per_period,
      total_spend_per_period,
      days_since_last_transaction
    from customer_metrics
    )
    select customer_id
         , r
         , f
         , m
         , trips_per_period
         , total_spend_per_period
         , days_since_last_transaction
         , ntile($buckets) over (order by 1.0*r+1.0*f+1.0*m)
    from rfm;  -- weights on final calculation - default to 1.0

Breakdown of the SQL

There are two queries embedded in the WITH statement: customer_metrics and rfm.

The customer_metrics query will aggregate:

  1. The trips_per_period by counting distinct transaction id’s (F or Frequency).
  2. The total_spend_per_period by summing net_sales and the quantity sold (M or Monetary).
  3. The days_since_last_transaction by finding the difference between the current date and the last purchase date (R or Recency).

In this query we have two parameters: $lookback_days which tells you how long from current date do you want to segment customers purchases by and $buckets which signifies the number of segments you want the query to return.

The rfm query then uses the windowing function, ntile, which will take the ordered data from the customer_metrics query and segment them into equal size (number of rows per group).

The final query as shown below brings together all the information from the WITH queries and displays it along with a final ntile of the RFM calculation.

1
2
3
4
5
6
7
8
9
    select customer_id
          , r
          , f
          , m
          , trips_per_period
          , total_spend_per_period
          , days_since_last_transaction
          , ntile($buckets) over (order by 1.0*r+1.0*f+1.0*m)
     from rfm;  -- weights on final calculation - default to 1.0

Note

The 1.0 you see in the query above represents equal weight to the R, F and M calculation. In some cases an organization may want to weight each measure differently to perform the final segmentation. For instance, you may choose to apply a weight of 2 to monetary and 1 and 1 to frequency and recency.

Ignore this answer, it does not work: Better use the answer from Louis

For anchor, you may define “short” anchor names like this:

RFM Sample Aginity Catalog Assets

There are six assets you can add to your catalog. I chose to add them as shown below.

../../_images/RFMCatalog.png

These queries are written using ANSII standard SQL so should work across most database platforms. Just select a connection in the Pro/Team Editor and either double click the catalog item and execute or drag and drop the catalog item which will expose the code and run them.

DATA-transaction_header

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
    (select cast(123456 as integer) as customer_id, cast(11111 as integer) as transaction_id, cast('2019-01-01' as date) as transaction_date
    union all
    select cast(123456 as integer) as customer_id, cast(11112 as integer) as transaction_id, cast('2019-01-04' as date) as transaction_date
    union all
    select cast(123456 as integer) as customer_id, cast(11113 as integer) as transaction_id, cast('2019-01-07' as date) as transaction_date
    union all
    select cast(123456 as integer) as customer_id, cast(11114 as integer) as transaction_id, cast('2019-01-10' as date) as transaction_date
    union all
    select cast(123456 as integer) as customer_id, cast(11115 as integer) as transaction_id, cast('2019-01-14' as date) as transaction_date
    union all
    select cast(123456 as integer) as customer_id, cast(11116 as integer) as transaction_id, cast('2019-01-17' as date) as transaction_date
    union all
    select cast(123456 as integer) as customer_id, cast(11117 as integer) as transaction_id, cast('2019-01-20' as date) as transaction_date
    union all
    select cast(123456 as integer) as customer_id, cast(11118 as integer) as transaction_id, cast('2019-01-27' as date) as transaction_date
    union all
    select cast(234567 as integer) as customer_id, cast(21115 as integer) as transaction_id, cast('2019-01-14' as date) as transaction_date
    union all
    select cast(234567 as integer) as customer_id, cast(21116 as integer) as transaction_id, cast('2019-01-15' as date) as transaction_date
    union all
    select cast(234567 as integer) as customer_id, cast(21117 as integer) as transaction_id, cast('2019-01-16' as date) as transaction_date
    union all
    select cast(234567 as integer) as customer_id, cast(21118 as integer) as transaction_id, cast('2019-01-17' as date) as transaction_date)

DATA-transaction_detail

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
      (select cast(11111 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
      union all
      select cast(11112 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
      union all
      select cast(11112 as integer) as transaction_id, cast(1 as integer) as quantity, cast(7.25 as decimal(13,2)) as net_amount
      union all
      select cast(11113 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
      union all
      select cast(11112 as integer) as transaction_id, cast(15 as integer) as quantity, cast(1.10 as decimal(13,2)) as net_amount
      union all
      select cast(11114 as integer) as transaction_id, cast(1 as integer) as quantity, cast(25.34 as decimal(13,2)) as net_amount
      union all
      select cast(11114 as integer) as transaction_id, cast(2 as integer) as quantity, cast(14.32 as decimal(13,2)) as net_amount
      union all
      select cast(11114 as integer) as transaction_id, cast(1 as integer) as quantity, cast(7.10 as decimal(13,2)) as net_amount
      union all
      select cast(11115 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
      union all
      select cast(11116 as integer) as transaction_id, cast(1 as integer) as quantity, cast(8.10 as decimal(13,2)) as net_amount
      union all
      select cast(11117 as integer) as transaction_id, cast(2 as integer) as quantity, cast(23.10 as decimal(13,2)) as net_amount
      union all
      select cast(11118 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
      union all
      select cast(21115 as integer) as transaction_id, cast(14 as integer) as quantity, cast(4.10 as decimal(13,2)) as net_amount
      union all
      select cast(21116 as integer) as transaction_id, cast(16 as integer) as quantity, cast(8.10 as decimal(13,2)) as net_amount
      union all
      select cast(21117 as integer) as transaction_id, cast(4 as integer) as quantity, cast(23.10 as decimal(13,2)) as net_amount
      union all
      select cast(21118 as integer) as transaction_id, cast(1 as integer) as quantity, cast(43.10 as decimal(13,2)) as net_amount)

FML - Days Since Last Transaction (Recency)

This asset is a reusable formula that calculates the days between execution run time (current date) and the maximum transaction date for each customer.

1
current_date - max(th.transaction_date)

FML - Total Spend in Period (Monetary)

This asset is a reusable formula that calculates the aggregation (sum) of net sales, defined as quantity multiplied by net_amount over the specified time period.

1
sum(td.quantity * td.net_amount)

FML - Trips per Period (Frequency)

This asset is a reusable formula that counts the number of distinct transactions within the specified time period.

1
  count(distinct th.transaction_id)

Sample - RFM Query

This asset uses the formulas above and then calculates the segementation using the windowed analytic function ntile.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
      with customer_metrics as
      (
      select
      th.customer_id,
      @{/Samples/Sample Data Science Queries - All Platforms/Recency-Frequency-Monetary Samples/FML - Trips per Period (Frequency)} as trips_per_period,
      @{/Samples/Sample Data Science Queries - All Platforms/Recency-Frequency-Monetary Samples/FML - Total Spend in Period (Monetary)} as total_spend_per_period,
      @{/Samples/Sample Data Science Queries - All Platforms/Recency-Frequency-Monetary Samples/FML - Days Since Last Transaction (Recency)} AS days_since_last_transaction
      from  @{/Samples/Sample Data Science Queries - All Platforms/Recency-Frequency-Monetary Samples/DATA-transaction_header}
      join  (SELECT * FROM @{/Samples/Sample Data Science Queries - All Platforms/Recency-Frequency-Monetary Samples/DATA-transaction_detail}) td
      on th.transaction_id = td.transaction_id
      where td.quantity > 0             -- returns ignored
      --and th.transaction_date > dateadd(day, cast($lookback_days as integer)*-1,current_date)  -- consider the past 365 days - customization opportunity (1)
      group by th.customer_id),
      rfm as
      (
      select customer_id,
      ntile($buckets) over (order by days_since_last_transaction desc) as r,  -- split into 10 bins - customization opportunity (2)
      ntile($buckets) over (order by trips_per_period desc) as f,
      ntile($buckets) over (order by total_spend_per_period desc) as m,
      trips_per_period,
      total_spend_per_period,
      days_since_last_transaction
      from customer_metrics
      )
      select customer_id
         , r
         , f
         , m
         , trips_per_period
         , total_spend_per_period
         , days_since_last_transaction
         , ntile($buckets) over (order by 1.0*r+1.0*f+1.0*m)
      from rfm