A Direct Marketing In-flight Forecasting System

Shannon Terry & Ben Ogorek
Nationwide Insurance
Contact us: terrys@nationwide.com, ogorekb@nationwide.com

Summary

R's powerful built-in functions and ease of web-based deployment led to its role in the In-flight Forecasting System, a browser-based utility designed to forecast the total incremental benefit of a marketing tactic when only a fraction of the marketing responses have been observed. This article shows how R can be used to estimate this total incremental benefit early, as well as how R can be run in an interactive, business-friendly format over the web to deliver this methodology to the business.

Introduction

One early morning, the Vice President of Customer Analytics, Wes Hunt, called a surprise meeting. He started off with a simple question:

"Can we get an early read on our direct marketing tactics?"

The question turned out to be deceptively simple. On one hand, our marketing professionals must make decisions at the speed of business and thus require timely information. On the other hand, it was well understood that the benefits of a direct marketing campaign may materialize over a period of months. Other than waiting for results to arrive, it was not immediately clear how to proceed.

From Wes's question sprung forth an initiative to find a solution, starting with a search for the software that would provide the backbone for our early read methodology. Knowing that publication-grade, easy-to-create graphics were essential, R quickly became the leading contender. What we couldn't know was how an obscure statistical function built into base R, isoreg(), would prove so helpful in solving our statistical problem.

This How-To article is separated into three main sections. The first demonstrates how to use R to estimate the incremental effectiveness of a marketing technique when the responses are delayed by weeks or months. The second section shows how this learning may be applied to future marketing campaigns that are in-flight, allowing for real-time estimates of total benefit. Finally, we demonstrate how to deploy these R programs to business users via a user-friendly web interface.

The Marketing Engine: Incremental Effects Through Time

For an arbitrary direct marketing intervention, which we call a "tactic", we seek to learn about

(1) the total number of marketing responses due to the intervention, and
(2) the distribution of these responses over time.

The total incremental lift, (1), quantifies the total effect (benefit) of the marketing tactic. The response time distribution, (2), describes how this effect was manifested over time. Though seemingly of a lower priority, the response time distribution is complementary to the total incremental lift. Clearly, considering the time value of money, marketing tactics that work more quickly are financially desirable. But most important for our application is that (2) gives the proportion of the total benefit still outstanding at any given time, allowing for an early forecast of the total incremental lift.

For concreteness, consider a binary (0/1) marketing response for a collection of units in discrete time, with weeks w = 1,...,13. Assume that within this collection of units, some are randomly assigned to a "treatment" condition where the marketing intervention is experienced at the start of week w = 1. The rest of the units are "held out" to form a control group and do not experience the marketing intervention during the 13 week period. For all units, a 0 is recorded each week until the first response. For the response week and all subsequent weeks, a 1 is recorded. Many units in both treatment and control groups will not respond during the study; they will have a value of 0 at every week w = 1,...,13.

For the rest of this article, we work with the simulated R data frame simulation.df, created by the code below.

rm(list = ls())
set.seed(4380) 
 
## INPUTS ##
## Sample Sizes ##
n.treatment <- 50000
n.control <- 50000

## Status Quo ##
# These are purely hypothetical rates for illustration purposes
baseline.rate <- .002   # probability that an individual responds at Time t with no treatment

### Critical Input ###
effectiveness.pct <- .003   # proportion of treatment group that will respond if contacted but not otherwise!

## Assumptions ##
# Examples of values that could help derive other success criteria for a DM program
response.value <- 100   # dollar value of a response
cost.per.unit <- .30
### END INPUTS ###
 
n.incremental <- n.treatment*effectiveness.pct
ROI <- effectiveness.pct*n.treatment*response.value - n.treatment*cost.per.unit
 
## Echo inputs and implications
cat("treatment sample size:", n.treatment,
    "\n control sample size:", n.control,
    "\n baseline rate of acquisition:", baseline.rate*100, "percent",
    "\n incremental effectiveness:", effectiveness.pct*100, "percent",
    "\n assumed response value:", response.value, "dollars",
    "\n assumed cost per unit:", cost.per.unit, "dollars",
    "\n implied ROI:", ROI, "dollars\n")
 
####  BEGIN SIMULATION #####
sim.data <- data.frame(week = NULL, control.responses = NULL, baseline.responses = NULL)
 
control.remaining <- n.control
baseline.remaining <- n.treatment #Hypothetical world where there is no treatment given to the treatment group.
for(week in 1:13){
	control.responses <- rbinom( 1, control.remaining, baseline.rate)
	control.remaining <- control.remaining - control.responses
	baseline.responses <- rbinom( 1, baseline.remaining, baseline.rate)
	baseline.remaining <- baseline.remaining - baseline.responses
 
	sim.data <- rbind(sim.data, data.frame(week, control.responses, baseline.responses))
}
 
### Universe 2: Everything as it would have been, plus INCREMENTAL ADDITIONS according to a discretized Weibull ###
alpha <- 2.0; lambda <- 3.5 
discretized.Weibull.draws <- table( round(.5 + rweibull(10000, alpha, lambda) ))
simulation.distribution <- (1/10000)*c(discretized.Weibull.draws, rep(0, times = 13 - length(discretized.Weibull.draws)))
plot(simulation.distribution, main = "Simulated distribution of incremental response times", 
     type = "h", xlim = c(1,13), xaxt="n", xlab = "week", ylab = "Probability" )
axis(1, at=c(1:13), labels = c(1:13))
 
if(n.incremental > 0){
	incremental.responses <- as.data.frame( table( round(.5 + rweibull(n.incremental, alpha, lambda) )) )
	names(incremental.responses) <- c("week", "incremental.responses")
 
	simulation.df <- merge(sim.data, incremental.responses, by = "week", all.x = TRUE)
	simulation.df[is.na(simulation.df$incremental.responses) == TRUE, "incremental.responses"] <- 0
} else {
	simulation.df <- cbind(sim.data, incremental.responses = 0)
}
 
simulation.df$treatment.responses  <- simulation.df$baseline.responses + simulation.df$incremental.responses
simulation.df$treatment.sample.size <- n.treatment
simulation.df$control.sample.size <- n.control
simulation.df$baseline.responses <- NULL
simulation.df$incremental.responses <- NULL
 
### OUTPUT DATA SET: simulation.df ###

Running the code above produces the data set seen below (note that the random seed has been set).

> simulation.df
   week control.responses treatment.responses treatment.sample.size control.sample.size
1     1                95                 113                 50000               50000
2     2                99                 131                 50000               50000
3     3                99                 123                 50000               50000
4     4               115                 124                 50000               50000
5     5               105                 132                 50000               50000
6     6               119                 111                 50000               50000
7     7                95                 118                 50000               50000
8     8                95                  99                 50000               50000
9     9                94                 115                 50000               50000
10   10                93                  96                 50000               50000
11   11                95                  91                 50000               50000
12   12                95                 100                 50000               50000
13   13               109                  95                 50000               50000

Based on the simulated distribution of the incremental additions, virtually all effects of the marketing intervention have been realized by week 13. An intuitive estimate of the total incremental lift (1) is to compare the cumulative rate of response by week 13 for the treatment and control groups

treatment.p <- sum(simulation.df$treatment.responses)/n.treatment
control.p <- sum(simulation.df$control.responses)/n.control
lift <- treatment.p - control.p
lift

to achieve an estimate of 0.0028, quite close to the true value of 0.0030. We also estimate the standard error using the Normal approximation (note that the rule of thumb that np be large is satisfied, since 50,000 × .002 = 100).

lift.se <- sqrt( treatment.p*(1-treatment.p)/n.treatment  + control.p*(1-control.p)/n.control )
lift.se
 
# An approximate 95% confidence interval for the lift
cat("Approximate 95% CI for lift: [", lift - 1.96*lift.se, ", ", lift + 1.96*lift.se, "]")

The standard error of 0.0010 leads to an approximate 95% confidence interval of [0.0008, 0.0048]. Note that 0 is not included in the interval.

Though it seems obvious, take a moment to consider why subtracting averages (proportions in the binary case) leads us to an estimate of the total incremental lift. For each response from the treatment group, it is unknown whether it would have occurred had the tactic not been employed. This recognition that there is an alternative universe, one that we can never see and that the truth critically depends on, is referred to as the fundamental problem of causal inference (Holland, 1986, cited by Rubin, 2005). By averaging, we indulge in what Holland (1986) calls the "statistical solution" to the fundamental problem. Averaging works in this case because the assignment mechanism to the treatment or control is statistically unrelated to the outcomes from either of the hypothetical universes.
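To make the statistical solution concrete, here is a toy potential-outcomes simulation (the rates are hypothetical and unrelated to our campaign data): both universes are simulated for every unit, but each unit reveals only one of them, and because assignment is random, the simple difference in observed means recovers the true average incremental effect.

```r
# Toy illustration of the "statistical solution" (hypothetical rates):
# simulate both potential outcomes per unit, observe only one.
set.seed(1)
n <- 100000
y0 <- rbinom(n, 1, .002)            # response had the unit NOT been treated
extra <- rbinom(n, 1, .003)         # incremental response if treated
y1 <- pmax(y0, extra)               # response had the unit been treated
treated <- rbinom(n, 1, .5) == 1    # random assignment, unrelated to (y0, y1)

observed <- ifelse(treated, y1, y0) # each unit reveals only one universe
estimated.lift <- mean(observed[treated]) - mean(observed[!treated])
true.lift <- mean(y1 - y0)          # knowable only because this is a simulation
```

With random assignment, estimated.lift lands within sampling error of true.lift, even though no individual incremental response is ever identified.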

With an estimate of the total incremental lift in hand (though it required waiting 13 weeks), we now turn to the response time distribution (2) of marketing responses caused by the treatment. We would like to proceed as in standard analysis of time-to-event data but face two complications. The first is that we don't know which events are due to the treatment (the fundamental problem). The second is that many responses will never happen.

To bypass the first problem, we may use Holland's statistical solution at each week and appeal to the random treatment assignment. Define G(w) to be the cumulative probability that a unit receiving the treatment responds because of the treatment at or before week w. We get a naive estimate of G(w) by differencing the cumulative rates of response for the treatment and control groups at weeks w = 1,...,13.

T <- cumsum(simulation.df$treatment.responses)/simulation.df$treatment.sample.size
C <- cumsum(simulation.df$control.responses)/simulation.df$control.sample.size
 
delta <- T - C   # naive estimate of G(w)
delta

A few points are worth noting. First, our naive estimate of G(13) is exactly our estimate of the total incremental lift (1), which is 0.0028 in our simulated data set. Second, assuming we have waited until all (or virtually all) of the incremental responses have arrived, G(13) is approximately equal to G(+∞), the cumulative sum of all the incremental value that the marketing tactic will ever provide. As G(+∞) describes a moment indefinitely far from any observable measurement period, we label this ideal quantity the terminal lift.

Notice that G(w) is like a cumulative distribution function (CDF) in that G(w) = Pr(W ≤ w), but it is improper since G(+∞) is approximately 0.0028 << 1. Following the approach of Zhao and Zhou (2006), we hypothesize that there exists a proper CDF F(w), i.e., our response time distribution, such that F(w) = G(w) / G(+∞).
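As a tiny numeric illustration (the values below are made up, not from our simulation): an improper CDF that levels off at 0.003 rescales to a proper CDF that ends at 1.

```r
# Made-up improper CDF that levels off at 0.003 (its G(+infinity))
G.example <- c(0.0005, 0.0015, 0.0025, 0.0029, 0.0030, 0.0030)

# Dividing by the terminal value yields a proper, non-decreasing CDF
F.example <- G.example / tail(G.example, 1)
```

The rescaled F.example retains the shape of G.example but now integrates the usual CDF requirements: non-decreasing and terminating at 1.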

At this point, we have an estimate of G(w) and G(+∞), but we are not in a position to apply the formula F(w) = G(w) / G(+∞). The reason is that our naive estimate of G(w), which was just a difference of cumulative proportions, will likely be decreasing in spots (especially after the effects of the marketing tactic have diminished). The estimate would not be a valid CDF as the non-decreasing requirement is violated.

To obtain a non-decreasing estimate of G(w), we turn to an obscure technique known as isotonic regression (Barlow and Brunk, 1972). Showcasing the power and convenience of R, the function isoreg() comes built in to base R and requires very little syntax:

delta.isoreg <- isoreg(delta)
terminal.lift <- tail(delta.isoreg$yf, 1)
cat("Isotonic regression estimate of terminal lift:", terminal.lift)

Visually, the isotonic regression fit can be inspected with the plot() method for the isoreg object:

plot(delta.isoreg, main = "Isotonic regression estimate", xaxt="n", xlab = "week", ylab = "Cumulative response rate")
abline( h = terminal.lift , lty=2, col="purple")
text(x = 4, y = terminal.lift*1.025, paste("Terminal lift =", round(terminal.lift, digits=4)) , col= "blue", cex = 1)
axis(1, at=c(1:13), labels = c(1:13))

The fitted values of the isotonic regression are stored in the element "yf" of the isoreg object (a list, here delta.isoreg). Because the fit is monotonically non-decreasing, the last element of delta.isoreg$yf is also the largest; this becomes our final estimate of the terminal lift.

The fitted values from the isotonic regression, truncated below at zero, become our non-decreasing estimates of G(w), w = 1,...,13:

G <- ifelse(delta.isoreg$yf < 0, 0, delta.isoreg$yf)

Now we are able to comfortably apply the formula F(w) = G(w)/ G(+∞).

delta.proper.CDF <- G / terminal.lift

At this point, it may seem that little has been gained past our original computation of the total incremental lift. After all, we still had to wait until the effects of the marketing tactic were (practically) exhausted! However, it is when we have learned of a stable F(w), over time and similar marketing tactics, that we can take the next step - the early read forecast.

From Past Learning to Future Prediction

The downside to the methods described in the previous section is that the analyst is required to wait to estimate G(+∞) until after the impact of a campaign is exhausted. In this section we enable "early read" forecasts by considering the case where a marketing response time distribution F(w) has already been estimated and has been determined to be stable over time. We will estimate G(+∞), the ultimate incremental lift of the marketing tactic, starting at week w = 1 and updating the estimate for weeks 2,...,13.

Note that a simple algebraic rearrangement of the formula F(w) = G(w) / G(+∞) gives G(+∞) = G(w) / F(w). That is, the observed lift up to week w is "scaled up" based on the percent of completion. From this point on, we refer to the early read estimate of G(+∞) made at week w as G(+∞)(w). In this section, we operate under the assumption that F(w) has been sufficiently discovered from past tactics. Continuing with our simulated data set, simulation.df, we will assume for the purposes of this section that the true F(w) underlying the data generation, simulation.distribution, is known (i.e., has been discovered).

# For demo purposes - the response time distribution is known #
G.inf <- G / cumsum(simulation.distribution)

As in all statistical estimation tasks, we'd like to quantify the uncertainty in G(+∞)(w). Here we describe a working method of obtaining confidence intervals that sidesteps some of the complexity. In principle, by the delta method we could derive the asymptotic variance of G(+∞)(w) as a function of the variances of our G(w) and F(w) estimates. In our prototype application, however, we treat F(w) as a known constant; this simplification means that the only random element is G(w), which is a difference of two independent proportions. The textbook formulas are easily coded in R.

treatment.variance <- T*(1-T) / simulation.df$treatment.sample.size
control.variance <- C*(1-C) / simulation.df$control.sample.size
 
# For demo purposes - the response time distribution is known #
working.se <- (1/cumsum(simulation.distribution))*sqrt(treatment.variance + control.variance)
 
working.ucl <- G.inf + 1.96*working.se
working.lcl <- G.inf - 1.96*working.se

The R code below plots the forecasts and the working confidence intervals. In practice, response data will arrive over time, and a new forecast and confidence interval will be available each week. The code below has been intentionally stubbed off to simulate only the first 4 weeks of actual responses. The code comments indicate how to see all forecasts that would have been made during the response window.

# Plot the early read forecasts  
#Set up window
plot(1~1, col = "olivedrab", pch = 19, cex=1,
  xaxt="n",
  xlim = c(1, 13),
  ylim = c(min(working.lcl, na.rm = TRUE) - .005, max(working.ucl, na.rm = TRUE)+.005),
  main = "Early Read Forecasts and Confidence Intervals"
  , xlab = "Weeks (from Treatment Start)"
  , ylab = "Response Rate Delta"
)
axis(1, at = seq(1, 13, by = 1))
 
legend("bottomright"
  , c("Response Rate Forecast", "95% Confidence Limits", "No Response")
  , lty = c(NA, 3, 1)
  , lwd = c(1, 1, 1)
  , pch = c(19, 17, NA)
  , col = c("olivedrab", "red", "darkred")
  , title="Legend"
  , cex = .5)
 
points(working.ucl[1] ~ simulation.df$week[1], col = "red", bg = "red", pch=25, cex= 0.5)
points(working.lcl[1] ~ simulation.df$week[1], col = "red", bg = "red", pch=24, cex= 0.5)
 
# Add a basic reference line
abline(h=0, col= "darkred", lwd = 1)
 
# simulate data coming in over time and show new forecasts
for (k in 1:4){      # replace 4 with nrow(simulation.df) to see all weeks
	j <- k - 1
	points(G.inf[k] ~ simulation.df$week[k], col = "olivedrab", pch = 19, cex=1)
	points(working.ucl[k] ~ simulation.df$week[k], col = "red", bg = "red", pch=25, cex=.5)
	points(working.lcl[k] ~ simulation.df$week[k], col = "red", bg = "red", pch=24, cex=.5)
	if (j > 0){
		lines(working.ucl[j:k] ~ simulation.df$week[j:k], col = "red", lty = 3)
		lines(working.lcl[j:k] ~ simulation.df$week[j:k], col = "red", lty = 3)  
 
		x1 <- simulation.df$week[j]
		x2 <- simulation.df$week[j]
		x3 <- simulation.df$week[k]
		x4 <- simulation.df$week[k]
		y1 <- working.lcl[j]
		y2 <- working.ucl[j]
		y3 <- working.ucl[k]
		y4 <- working.lcl[k]
		polygon(x=c(x1,x2,x3,x4), y = c(y1,y2,y3,y4), border = NA, col = rgb(0,0,0,.1))
 
	    # Slow it down so we can see forecasts and confidence changing as time plays out
		Sys.sleep(1.5)
	}
}

After demonstrating this technique, a question arose as to why the confidence bands did not shrink to zero length at week 13. This brought up an important point about the nature of the uncertainty in this situation. If our goal were to estimate the values of the week-13 data set at week w = 1,...,12, then indeed the uncertainty would shrink to zero as w incremented to 13. However, recall that when we first approached the estimation of the total incremental lift, given all 13 weeks of simulation data, we still had to estimate the incremental difference between the treatment and control groups. The fundamental problem and its associated uncertainty always persist.

Business Deployment

We started building the In-Flight Forecasting System using R on our laptops. The initial idea was to create a dashboard which, much like the media's projection of a winner or loser on election night, made early-read projections about the final results of a particular direct marketing tactic. Since we were analyzing completed campaigns retroactively, we created animations of a tactic's estimated finish starting at the first week and updated weekly.

After demonstrating the dashboard to our peers, our direct mail marketing expert, Diana Surtihadi, suggested a format that her business partners were accustomed to. Agreeing that the tool must be available to our marketing partners when they need it, we set about to create a web-based version of the program. The key requirement was a user interface that allowed for comparison of multiple tactics regarding current performance and future projections.

While R is a great tool for statistical analysis and rapid prototyping, it can also be the basis for an application server. The first step of our business deployment was to migrate the R code running on our laptops to a Linux-based server in our analytic lab. Using CGI processing, such as what is available in David Firth's excellent CGIwithR package, R can be employed as a web scripting language capable of processing form inputs and providing outputs in the form of calculations, tables, and graphs. So, we installed R, a basic web server (e.g. Apache), and the CGIwithR package on our server. Style sheets were created resembling a popular presentation template that we used internally, and supporting HTML functions were built.

The result is a fully functional browser-based version of the code.

Notice that the URL ends with ".R" since we're literally having the web server invoke R scripts via CGI. R has some very respectable packages, such as R2HTML and hwriter, that can output HTML tags and produce pages similar to the one shown above; however, we elected to create our own package of functions due to some specific internal requirements. We would encourage you to inspect the available packages of HTML functions before trying to build your own.

CGIwithR (available on CRAN) is highly recommended because of its ease of use. Executing an R script on the web requires fairly straightforward changes to your base R code. For example, the first line in your R script indicates that the script will be processed as a CGI script and executed using R along the specified server path:

#! /usr/lib/R/bin/R

Inside your CGI R script, you must follow the above line with commands that will output the dynamically generated HTML that you wish to present to your users. The output can be done using simple cat() functions in R or via specific package functions such as those found in hwriter or R2HTML.
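As a minimal sketch of such a script (the page content here is purely illustrative, and we print a content-type header ourselves for clarity; consult the CGIwithR documentation for its exact header and wrapper conventions):

```r
#! /usr/lib/R/bin/R

# Sketch of a CGI R script: build the page as one string, then cat() it to
# standard output, where the web server picks it up and serves it.
page <- paste(
    "Content-type: text/html\n",
    "<html><head><title>In-flight Forecast</title></head><body>",
    "<h1>Early Read</h1>",
    "<p>Dynamically generated forecast output goes here.</p>",
    "</body></html>",
    sep = "\n")
cat(page)
```

In a real script, the static placeholder paragraph would be replaced by calls that compute and format the current forecasts.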

The following is a highly-simplified version of a custom function that can be used to output HTML and CGI form input options using an R data frame. With some simple CSS, you can output the selection options using alternating row colors as shown here:

WriteHTMLdf.checklistinput<- function(df) {
    # Highly simplified example of how to build web form options using an R data frame
 
    # Check our input
    if (!is.data.frame(df)) {
        stop("What?!? Are you crazy?!?  The passed variable is not a data frame!")
    }
 
    # Get dimensions of the df
    k <- nrow(df)    # height
 
    # Start the list...
    cat('<ul class="somestyle">\n')
 
    # Write the list elements
    for(y in 1:k) {
        tmp.opt <- as.character(y)
        tmp.opttxt <- paste(df[y,2],' -- ', df[y,3])
        if (y %% 2 == 1) {
            cat(paste('<li class="alt"><label for="', tmp.opt, '"><input id="', tmp.opt, '" name="', tmp.opt, '" type="checkbox" />', sep=""), tmp.opttxt, '</label></li>\n')
        } else  {
            cat(paste('<li>            <label for="', tmp.opt, '"><input id="', tmp.opt, '" name="', tmp.opt, '" type="checkbox" />', sep=""), tmp.opttxt, '</label></li>\n')
        }   
    }
 
    # End the list
    cat('</ul>\n')
}

If the above code resides in a file called dm_selector.R, for example, you can submit the user form input to a second R script called dm_processor.R using syntax such as this as part of your dynamically generated HTML page:

cat('    <form action="/cgi-bin/R.cgi/dm_processor.R" method="POST">')

In the WriteHTMLdf.checklistinput example function above, the options displayed to the user have a value name that is equal to their index within the data frame. This makes processing the selections much easier in the form processing R script, dm_processor.R. For example, you could list the selections submitted with the following code:

cat('<p>You selected the following tactics:</p><br>')
 
for (i in seq_along(formData)) {
    cat('<p>', paste(DM.tactics[names(formData)[i], 2:length(DM.tactics)]), '</p>')
}

Note that we're retrieving the correct data frame indices in this example by using the names in the formData list object.

Just remember that you'll need to appropriately guard your web server against erroneous user input and you'll also need to appropriately handle errors that may occur in your R code. Also, if you plan on presenting plots to your users, you may wish to investigate packages such as Cairo that allow higher-quality images to be generated from within your R scripts.
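For example, since our checklist options were named by data frame row index, one simple guard (a sketch of our own, not a CGIwithR facility) is to keep only the form field names that parse as in-range integers before using them as indices:

```r
# Sketch: keep only form field names that are small positive integers.
# 'formData' is assumed to be the named list CGIwithR builds from the
# submitted form; 'max.index' would be nrow(DM.tactics).
sanitize.selections <- function(formData, max.index) {
    idx <- suppressWarnings(as.integer(names(formData)))  # non-numeric -> NA
    idx[!is.na(idx) & idx >= 1 & idx <= max.index]        # drop out-of-range
}
```

A call such as sanitize.selections(formData, nrow(DM.tactics)) then yields indices that are safe to use when subsetting the tactics data frame.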

We would encourage readers who are interested in using R on the web to explore this exciting space, review the many contributions within the R community, and to share your own ideas with the community.

Most importantly, this web-based deployment of R allows our business partners to access the power of R via their favorite web browser without having to install any software or submit any R code directly. In fact, the only way our business partners would even know R is running is to notice that the URL ends with ".R". Since R is a business-owned and managed tool at our organization, this general approach has allowed us to rapidly deploy many analytic applets and widgets that serve as prototypes. The best of these mini applications move out of our lab and are rebuilt with the assistance of our IT partners in a more scalable R environment.

Discussion

Using R's statistical and graphical capabilities to generate early reads for direct marketing tactics, our prototype In-flight Forecasting System has the potential to provide substantial value to our business partners. As marketing responses become available, the system is updated automatically. With R's deployment to the web, the system is consistently available to our marketing professionals, providing them the information they need, when they need it. For poorly performing tactics, such timeliness can prevent wasted marketing spend; for successful tactics, it means identifying them sooner and capitalizing on their momentum.

References

Barlow, R.E. and Brunk, H.D. (1972). “The isotonic regression problem and its dual.” Journal of the American Statistical Association. Vol. 67, No. 337, pp. 140-147.

Firth, D., with extensions by Temple Lang, D. (2010). CGIwithR: CGI Programming in R. R package version 0.73-0. http://CRAN.R-project.org/package=CGIwithR

Holland, P.W. (1986), "Statistics and causal inference" (with discussion). Journal of the American Statistical Association. Vol. 81, No. 396, pp. 945-970.

Lecoutre, E. (2003). "The R2HTML package." R News, Vol. 3, No. 3, Vienna, Austria.

Pau, G. (2010). hwriter: HTML Writer - Outputs R objects in HTML format. R package version 1.3. http://CRAN.R-project.org/package=hwriter

R Development Core Team (2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.

Rubin, D.B. (2005). “Causal inference using potential outcomes: design, modeling, decisions.” Journal of the American Statistical Association. Vol. 100, No. 469, pp. 322-331.

Urbanek, S. and Horner, J. (2011). Cairo: R graphics device using cairo graphics library for creating high-quality bitmap (PNG, JPEG, TIFF), vector (PDF, SVG, PostScript) and display (X11 and Win32) output. R package version 1.5-0. http://CRAN.R-project.org/package=Cairo

Zhao, X. and Zhou, X. (2006). "Proportional hazards models for survival data with long-term survivors." Statistics and Probability Letters. Vol. 76, Issue 15, pp.1685-1693.

Comments

Rronan:

Great work.
Do you know how to enable support for non-ASCII characters with CGIwithR?

EDIT: FastRWeb is similar to CGIwithR and supports non-ASCII characters.

I translated this article into French: Un système de prévision à la volée de marketing direct
