4

Estimating Price Response

Chapter 3 introduces the concept of a price-response function that specifies how demand for a product offered to a segment of customers through a particular channel varies as a function of price, and Chapter 5 shows that prices can be optimized, assuming that we have appropriate price-response functions in hand. It is clear that a seller would love to have a price-response function for each cell in the pricing and revenue optimization (PRO) cube, but it is not clear where these functions would come from. This chapter describes a number of approaches for estimating price response. We focus primarily on data-driven approaches in which future price response is estimated on the basis of an analysis of historical data. A major benefit of data-driven approaches is that they are objective: if future customer behavior is similar to past behavior, then data-driven estimation should provide a good predictor of how demand will respond to price in the future. In addition, data-driven approaches facilitate ongoing evaluation and updating. New observations of demand can be used to evaluate the predictive power of the current model and to update its parameters to reflect changes in market conditions or customer behavior.

The primary obstacle to using data-driven approaches is the availability of data. The ideal data set would contain a large number of observations specifying how past customers responded to a wide range of prices. If a seller does not have access to such a data set—if the number of historical observations is too small or does not include a sufficiently wide range of prices—then an alternative approach such as conjoint analysis or customer surveys needs to be used. Such data-free approaches do not require extensive historical data, but they can be limited in their ability to accurately represent price response across a large number of pricing segments.

In this chapter, we start by considering the type of data required to support a data-driven approach and some different ways that a seller can obtain the necessary data, such as randomized tests and A/B tests. The chapter illustrates regression-based approaches to estimating price-response functions with an extended example. It shows how data on potential demand can be used to improve price-response estimation. We discuss the issues of collinearity and endogeneity in the data and how they can bias the estimation of price sensitivity. Finally, we discuss the situation in which historical data are not available. In this case, alternative approaches such as surveys or conjoint analyses may be useful.

4.1 DATA SOURCES FOR PRICE-RESPONSE ESTIMATION

Any data-driven approach to estimation requires a data set that includes information about historic sales. At a minimum, the data set needs to include prices and total sales by time period. In some cases, this may be the only data available. However, in many cases there may be more data about such relevant factors as competitive prices or promotions. A particularly powerful piece of data that is often available in online retailing is a measure of potential demand—the number of customers who took an action (such as clicking on a link) to observe the price of a product. As discussed later, when potential demand information is available, using it in an estimation of the price-response function can significantly improve accuracy. Other data that may be available include day of week, seasonality, and competitive prices, along with other variables. To the extent that these improve the accuracy of estimation, they should be included in the price-response model.

Two conditions are necessary for a data set of historical outcomes to support price-response estimation. First, there must be a sufficient number of observations for the resulting estimates to be statistically significant. In the case of a single price-response function based on price alone, this means at least 40 observations. If additional variables are incorporated in the price-response function, then more observations will be required. Second, there must be enough variation among the prices in the data to provide a basis for estimation. If the same price was always offered for the same product, then it is extremely difficult (if not impossible) to estimate a price-response function. If these conditions are not both met by the data currently available, then the seller needs either to collect more data or to utilize a data-free approach.

There are several ways in which a seller might obtain data to support price-sensitivity estimation. The ideal is to perform randomized price tests. If this is not possible, A/B tests are a good alternative. In cases in which neither randomized tests nor A/B tests are feasible, a seller may need to rely on so-called natural experiments, in which different prices are offered to similar groups for reasons other than price testing. In addition, changes in price at either side of a pricing-segment boundary can be used as part of a regression discontinuity design. In the remainder of this section, we consider each of these approaches to obtaining a data set to support price-response estimation.

The methods described in this section apply to list-pricing situations in which a take-it-or-leave-it price is displayed for a particular product for each customer arriving through a particular channel. This list-pricing modality is the standard practice for most retail markets (both online and offline) as well as for consumer services such as travel, hotels, rental cars, and so on. In contrast, in many business-to-business settings and in consumer credit, customers approach a seller that offers the desired product, and an individualized price can be quoted to each customer based on knowledge of the customer's characteristics and desired product. Chapter 13 covers the issue of estimating price response and optimizing prices in such cases.

4.1.1 Randomized Price Testing

The gold standard for statistical inference is randomized testing. In a randomized test, members of the population are assigned either to a control group or to a test group (sometimes called the treatment group). There may be one or more test groups. The test groups receive different treatments while the control group does not receive any treatment. In a price test, the control group is quoted business-as-usual prices and the treatment groups are quoted alternative prices. The idea behind randomized testing is, as far as possible, to eliminate any systematic differences among the test groups and the control group that could influence the outcome. If this is done successfully, it is reasonable to infer that differences in outcome observed among the groups can be attributed solely to the differences in treatment.1 In price testing, the inference is that any systematic differences in demand rate among groups are attributable to the differences in prices.

Ideally, a randomized test should meet four conditions:

• Assignment to the treatment and control groups must be imposed: consumers cannot choose the group to which they are assigned. If individuals are allowed to select their group, then selection bias might influence the results: an individual's choice of a particular treatment could be correlated with her response.

• The test and control groups must be balanced with respect to observable characteristics that might influence outcome. For this reason, so-called randomized tests are often not truly randomized: individuals are often systematically assigned to the treatment and control groups in a way that ensures that the groups are balanced with respect to observable characteristics. For an online retailer, this might mean assigning customers to test and control groups so that aspects such as total purchases in the last year are balanced. For a brick-and-mortar retailer, this might mean assigning stores to test and control groups so that relevant store characteristics are balanced.

• There must be a sufficient number of observations in each group for the differences in outcome to be statistically significant.

• The treatment (e.g., different prices) should be the only difference in the way the control group and the test group are treated—for example, the groups should not be exposed to different marketing campaigns (unless that is part of the treatment being tested).
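The balanced-assignment idea in the second condition can be sketched in code: sort customers on the observable to be balanced and split each consecutive pair between test and control at random. This is a minimal illustration, not a production assignment procedure; the customer records and the "spend" observable are hypothetical.

```python
# Sketch: balanced assignment of customers to test and control groups.
# The customer records and the "spend" observable are hypothetical.
import random

random.seed(7)
customers = [{"id": i, "spend": random.randint(0, 1000)} for i in range(100)]

# Sort on the characteristic to balance, then split consecutive pairs
# at random so both groups cover the full range of spend.
test, control = [], []
ordered = sorted(customers, key=lambda c: c["spend"])
for a, b in zip(ordered[0::2], ordered[1::2]):
    first, second = random.sample([a, b], 2)  # coin flip within the pair
    test.append(first)
    control.append(second)

def avg(group):
    return sum(c["spend"] for c in group) / len(group)

print(len(test), len(control), round(avg(test) - avg(control), 1))
```

Because each pair contains adjacent spend values, the two group means can differ by at most the average within-pair gap, so the groups are balanced on the observable by construction.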

In a price test with a single test group, customers are assigned either to the test group or to the control group according to the principles listed above. Customers assigned to the control group are quoted the control price pc, and customers assigned to the treatment group are quoted a test price pt ≠ pc. After a sufficient number of observations, demands from the test group and the control group, which we label d̂t and d̂c, respectively, are observed. (Recall that the caret above a variable indicates that it is an observation from the data.) Referring to Equation 3.7, the arc elasticity between pt and pc can be estimated as [(d̂c − d̂t)(pc + pt)] / [(d̂c + d̂t)(pt − pc)]. Ideally, price testing would use multiple test groups, each with a different price. These prices and corresponding demands can be considered points on a price-response function, and the estimation approaches that we consider later in this chapter can be used to estimate the corresponding parameters.
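The arc-elasticity estimate can be computed directly; the control and test prices and demands below are hypothetical values for illustration, not figures from the text.

```python
# Arc elasticity from a single-test-group price test (hypothetical data).
p_c, p_t = 100.0, 110.0  # control and test prices
d_c, d_t = 520.0, 410.0  # observed control and test demands

# Arc elasticity between p_t and p_c, in the form given in the text.
elasticity = ((d_c - d_t) * (p_c + p_t)) / ((d_c + d_t) * (p_t - p_c))
print(round(elasticity, 2))
```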

The internet is generally well suited for randomized testing. In an online price test, each arriving customer can randomly be quoted a price chosen from a number of predetermined alternatives. The test ends when a sufficient number of responses have been obtained at each price. Furthermore, online retailers usually (but not always) have access to information on potential demand, which can be used to derive more accurate estimates of price response, as we see in Section 4.2.2.

4.1.2 A/B Testing

In an A/B test, customers with some characteristic are chosen to receive a control price and customers with a different characteristic are chosen to receive a test price. The difference from randomized testing is that selection of customers into the treatment and control groups is not random—although the idea is to choose the test and control groups so that assignment is as close to random as possible. In a well-constructed A/B test, the characteristic used to separate customers into test and control groups is not correlated with the variable of interest—in this case, price response.

Brick-and-mortar retailers perform A/B price tests by offering test prices through a set of test stores. The remaining stores offer pricing as usual. In evaluating the outcome, demand at the test stores should not simply be compared to overall sales at all other stores. Rather, each test store should be paired with a control store such that the members of each pair are as similar as possible to one another in terms of characteristics such as demographics, urban or rural location, and competitive environment. This pairing should be done before running the test to avoid selection bias. The goal is to eliminate all other possible factors in order to isolate the influence of price on demand.

The problem with an A/B test is that, by definition, it is not randomized—changes in competition or the underlying economic environment can influence outcomes at the control store but not at the test stores, or vice versa. These potential influences are the source of many of the objections that can be raised after the fact to the results of A/B tests, such as “The test stores performed better because those regions saw stronger economic growth than the control stores.”

In the absence of true randomization, objections to A/B tests can be hard to answer. One approach to controlling for external influences in A/B tests is difference-in-differences (DiD). For a test-store and control-store pair, DiD compares the change in response in test stores before and after the price test to the change in response in control stores. The idea is that the magnitude of the price effect is best measured not by the difference in demand between the test store and the control store during the test period but by the difference between the changes in demand before and during the test period.
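The DiD computation itself is a single subtraction of differences. A minimal sketch for one test-store/control-store pair, using hypothetical before/during sales figures (not the numbers in Table 4.1):

```python
# Difference-in-differences for one test-store/control-store pair.
# All sales figures are hypothetical.
sales = {
    ("test", "before"): 110, ("test", "during"): 80,
    ("control", "before"): 100, ("control", "during"): 105,
}

change_test = sales[("test", "during")] - sales[("test", "before")]
change_control = sales[("control", "during")] - sales[("control", "before")]

# The DiD estimate attributes the gap between these changes to the
# treatment (the price change at the test store).
did_estimate = change_test - change_control
print(did_estimate)
```

Note that a naive during-test comparison of the two stores would understate the effect here, because the control store's sales were rising at the same time.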

Example 4.1

A retailer wants to test the price sensitivity of a popular television. The price of the television has been $550 at all its stores during the month of March. To run a test, he chooses two stores—Store A and Store B—that are similar in terms of customer demographics. For the month of April, he raises the price of the television set to $600 in Store A (the test store) and holds the price at $550 in Store B (the control store). The corresponding sales in each store are shown in Table 4.1.

The difference-in-differences approach relies on the parallel-trends assumption: in the absence of the treatment, it is assumed that the test group and the control group would have evolved in parallel. For Example 4.1, this is equivalent to the assumption that sales at Store A would have grown at the same rate as those of Store B if the price had been the same for both stores in April. If the parallel-trends assumption does not hold, then the resulting analysis of test results will be invalid.

One difficulty is that we cannot verify directly whether the parallel-trends assumption has held for an A/B test because we cannot observe what sales would have been for the control set under the treatment. However, it is useful to verify that the parallel-trends assumption held for the test and control sets before the test. If so, then it is at least reasonable to assume that it held during the test. If the parallel-trends assumption did not hold before the test, then it is not likely that it held during the test. The idea is illustrated in Figure 4.1. Each of the two graphs in this figure shows sales for a test store and a control store for several weeks prior to and during the test. In the left-hand graph, sales in the test and control move largely in parallel before the start of the test. In this case, it is plausible to claim that the parallel-trends assumption holds and that the difference in sales during the test is due primarily to the treatment. In the right-hand graph, sales in the test and the control groups were moving in different directions prior to the start of the test. It is thus much less likely that the difference in sales during the test is due primarily to the treatment; it would appear that factors other than just price are driving the difference in sales between the two stores.
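One informal diagnostic in the spirit of Figure 4.1, not prescribed by the text, is to fit a linear trend to each store's pre-test sales and compare the slopes; roughly equal slopes are consistent with, though they cannot prove, parallel trends. The weekly sales values below are hypothetical.

```python
# Compare pre-test sales trends for a test store and a control store.
# Weekly sales values are hypothetical.
def slope(ys):
    """Least-squares slope of ys against week index 0, 1, ..., n-1."""
    n = len(ys)
    mx, my = (n - 1) / 2, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(ys))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den

pre_test = [100, 104, 107, 111, 115]  # test store, pre-test weeks
pre_ctrl = [90, 94, 98, 101, 105]     # control store, same weeks

print(round(slope(pre_test), 2), round(slope(pre_ctrl), 2))
```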

A criticism of A/B tests is that they may be subject to the so-called Hawthorne effect: if the staff of a store know that it has been chosen for a test, they may change their behavior in a way that influences the outcome. If local staff know that their store is being used to test special pricing, they may increase the effort they put into selling the product. This is particularly true if they believe that the results will be scrutinized more closely than usual. This change in behavior may in itself change sales, thereby confounding the results of the test. For this reason, a store's staff should ideally be unaware that their store is part of an A/B test. If this is not possible, then both the test and the control stores should be monitored to ensure that they are providing consistent treatment to customers.

TABLE 4.1

Television demand for Store A and Store B in Example 4.1

Figure 4.1 Examples in which demand appears to meet the parallel-trends assumption (left graph) and violate the parallel-trends assumption (right graph) in the pre-test period.

Proper A/B tests can supply important information about price response, but they also have drawbacks. First, for any product, estimating the price-sensitivity function requires measuring response at more than just two points. Furthermore, sellers want to estimate price sensitivity not just for a single product but for all cells in the PRO cube. This may mean estimating price-sensitivity functions for hundreds or thousands of price segments. In many cases, information technology limitations or corporate policy may limit the ability of a seller to run a sufficient number of controlled price tests to cover all pricing segments.

4.1.3 Switchback Tests

Switchback tests are often used in cases when a seller does not want to offer different prices to different customers for the same product at the same time—a practice that many consumers feel is unfair (see Chapter 14). A switchback test is a special form of A/B test in which the control price is charged to all customers through a given store or channel for a period of time, then the price is switched to a test price for a period of time, then it is changed back to the control price (or to another test price) for some period of time, and so on. For example, an online retailer might charge the control price for an hour, then switch to a test price for an hour, and then switch back to the control price for the next hour. In a switchback test, the customers who arrive during the periods when the control price is charged serve as the control group, and the customers who arrive during the periods when a test price is charged serve as the test group. A retailer who switched back and forth hourly between test and control prices for a week would have 84 hours of test observations and 84 hours of control observations at the end of the week. A brick-and-mortar retailer might switch prices daily or weekly at one of his stores to observe the response.

Switchback tests need to be designed so that time is not a confounding factor—online customers shopping in the morning may have very different price responses than customers shopping in the evening, and customers who do their shopping during the weekend may be very different from those shopping during a weekday. For this reason, the switchbacks need to be designed so that both the control and test prices are offered across the full range of days and times.
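One way to build such a balanced schedule is to alternate prices hourly and flip the starting price each day, so that each price is observed at every hour of the day and every day of the week. The hourly granularity and the daily flip are design choices assumed here for illustration, not a prescription from the text.

```python
# Construct a balanced one-week hourly switchback schedule.
# "C" = control price, "T" = test price.
schedule = {}
for day in range(7):
    for hour in range(24):
        start = "C" if day % 2 == 0 else "T"  # flip the starting price daily
        other = "T" if start == "C" else "C"
        schedule[(day, hour)] = start if hour % 2 == 0 else other

n_test = sum(1 for price in schedule.values() if price == "T")
n_control = len(schedule) - n_test
print(n_test, n_control)  # 84 hours of each, matching the text
```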

4.1.4 Natural Experiments

The term natural experiment refers to a situation in which different treatments—in our case, different prices—were offered to different groups of customers for reasons other than running a test or optimizing prices. If the differential treatments were imposed randomly, then a natural experiment can supply useful information about price response. An example of a natural experiment would be a software defect or data entry error that caused some customers to be quoted erroneous prices while others were quoted standard prices. If the erroneous prices were quoted to a random set of customers, then the difference in demand between customers quoted the erroneous prices and those quoted standard prices can be used to estimate price sensitivity. Another example of a natural experiment might be the imposition of a local regulation that causes prices in one jurisdiction to differ from those in a neighboring jurisdiction. In this case, differences in demand on either side of the jurisdictional boundary might be used to estimate price sensitivity.

By definition, natural experiments cannot be planned, and they rarely provide ideal input for analysis. However, it is worthwhile for sellers to be aware of the possibility and to consider any situation in which different prices were offered to similar populations as an opportunity to better understand price response.

4.1.5 Regression Discontinuity Design

A seller who has not varied prices much over time may be able to use a regression discontinuity design (RDD) to estimate elasticities at the boundaries of his pricing segments. In most situations, we would expect that price sensitivity varies continuously across a segment boundary, while price jumps from one side of the boundary to the other. We would also expect that customers close to the boundary on either side should share similar price sensitivities, and we can use the price jump across the boundary as a sort of natural experiment to estimate price sensitivity.

Regression discontinuity design was used to estimate the sensitivity of customers to the price of an Uber ride. The researchers used data from the first 24 weeks of 2015. At that time, Uber was applying a surge multiplier to prices based on the balance between supply and demand—if demand exceeded the available supply of drivers, a multiplier greater than one would be applied to the standard fare, with the multiplier increasing the more that demand exceeded supply. (More information on how ride-sharing companies calculate dynamic prices can be found in Section 7.6.4.) At the time, Uber calculated the surge price to a high number of decimal places but displayed the surge price only at discrete increments to facilitate the user experience. Thus, Uber might calculate that the ideal surge would be 1.6214, but it would charge the customer 1.6 times the standard fare. In all cases, Uber would round to the nearest surge increment. It is this rounding that provides the price discontinuity that can be exploited—a 1.249 surge would be rounded down to 1.2, while a 1.251 surge would be rounded up to 1.3. It is reasonable to assume that conditions (and the characteristics of customers) when the calculated surge is 1.249 are almost the same as those when the calculated surge is 1.251, so the fact that one group was charged 1.2 and the other 1.3 is very close to a randomized experiment. By applying this methodology to a sample of 50 million UberX customers in Uber's four largest cities, the researchers were able to estimate price elasticities, which fell largely between 0.4 and 0.6. (For more details on this analysis, see Cohen et al. 2016.)
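The core of the rounding-based comparison can be sketched as follows. The session records below are hypothetical and are not drawn from the Uber data set described above; they simply illustrate comparing outcomes just below and just above a rounding cutoff.

```python
# Regression-discontinuity sketch: compare booking rates for sessions
# whose calculated surge fell just below vs. just above a rounding
# cutoff (1.25 rounds down to a displayed surge of 1.2, up to 1.3).
sessions = [
    # (calculated surge, 1 if the customer booked, else 0) -- hypothetical
    (1.244, 1), (1.247, 1), (1.249, 0), (1.248, 1), (1.246, 1),
    (1.251, 0), (1.252, 1), (1.254, 0), (1.253, 0), (1.256, 1),
]

below = [booked for surge, booked in sessions if surge < 1.25]
above = [booked for surge, booked in sessions if surge >= 1.25]
rate_below = sum(below) / len(below)  # booking rate at displayed surge 1.2
rate_above = sum(above) / len(above)  # booking rate at displayed surge 1.3
print(rate_below, rate_above)
```

Because the two groups are nearly identical except for the displayed price, the drop in booking rate across the cutoff can plausibly be attributed to the price difference.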

4.2 PRICE-RESPONSE ESTIMATION USING HISTORICAL DATA

Assume we have a data set that includes at least historical demand and prices. It may also contain additional information relevant to demand such as competitive prices and promotional activity. In its most general form, the problem of estimating price response can be specified as finding a functional form and parameters for a model that best fits the data. We consider models of the form

dijkt = fijk(pt, xt; θ) + εijkt,    (4.1)

where

• fijk is a price-response function for product i sold to customer segment j through channel k,

• dijkt is demand for product i sold to customer segment j through channel k during time period t,

• pt is the vector of prices for all products sold through all channels to all customer segments,

• xt is a vector of independent variables (or features) that influence demand,

• θ is a vector of parameters for the price-response function, and

• εijkt is an error term, typically assumed to be drawn from a normal distribution with zero mean.

The specification in Equation 4.1 is general in the sense that it allows demand in any particular cell of the PRO cube to be a function of the prices in all other cells. This allows for the possibility of substitute and complementary products. Equation 4.1 also includes a vector of independent variables xt = (x1t, x2t, . . . , xKt). Elements of the vector xt can include any information we have available at the time we are setting price that could influence demand and/or price sensitivity. Examples include day-of-week, holidays, seasonality, prices offered by competition, and promotional activity. When we start the estimation process, we have historic values of dijkt, pt, and xt for some set of time periods t = 1, 2, . . . , T. The goal of estimation is to find the values of the parameters θ = (θ1, θ2, . . . , θL) such that the function fijk (pt, xt; θ) best fits or predicts2 the values of dijkt for t = 1, 2, . . . , T. Regression is the standard approach to this type of problem.

To illustrate the process of price-sensitivity estimation, we draw on an extended example. We assume a seller has been conducting switchback price tests in which he has set a different price each day for nine weeks and observed the resulting demand. In order to compare different approaches on an apples-to-apples basis, the seller has designated the data from the first six weeks (42 days) of the test as the training set and the data of the last three weeks (21 days) of the test as the test set. The parameters of different models will be estimated using the training set (a process known as training) and the performance of the models compared on the test set. The purpose of maintaining separate test and training sets is to prevent overfitting, which we discuss in more detail in Section 4.3.2.

The data for the 42 days in the training set are shown in Table 4.2 and the data for the test set in Table 4.3. We assume for simplicity that the seller maintained adequate inventory on hand so that he did not run out of stock on any day during the training period or the test period, which means that the observed sales are equal to demand on each day.

TABLE 4.2

Price and demand data from the example training set

TABLE 4.3

Price and demand data from the example test set

Tables 4.2 and 4.3 include two additional columns. Weekend is equal to 1 if the day is Saturday or Sunday and is equal to 0 otherwise.3 Potential is a measure of potential demand. For the moment, we ignore these two variables and apply regression to a model where only demand and price are available. In Section 4.2.2, we extend the approach to include additional variables, in particular potential demand.

4.2.1 Regression

Let us start with the simplest interesting model using the data in Table 4.2; that is, let us fit a model of the general form d(p) = f(p), where f(p) is a price-response function. In this case, the three simplest potential models that could be used are those described in Section 3.1.4 and listed in the first three rows in Table 3.3—namely,

• Linear: d(p) = D(1 − bp)^+

• Exponential: d(p) = Ae^(bp)

• Constant elasticity: d(p) = Ap^ε

The goal of the estimation process is to determine the values of the parameters of each of these functions that best fit the historic data—that is, the values of D and b for the linear function, the values of A and b for the exponential function, and the values of A and ε for the constant-elasticity function—such that the corresponding predictions of demand based on actual price are as close as possible to those shown in Table 4.2. This is a regression problem. There are different forms of regression, depending on the nature of the data and the functional form being fit. Of these, linear regression is the simplest and most familiar. Linear regression is a standard function in any statistical software including SAS, R, and the Analysis ToolPak in Excel. (The discussion that follows assumes familiarity with linear regression and, for the homework problems, access to software with linear regression capabilities.)

Linear regression can be applied directly to find the parameters of the linear price-response function. While the exponential and constant-elasticity price-response functions are both nonlinear, they can be transformed into linear relationships by taking the logarithm of each side. The exponential price-response function then becomes

ln[d(p)] = ln[A] + bp,    (4.2)

and the constant-elasticity price-response function becomes

ln[d(p)] = ln[A] + ε ln[p],    (4.3)

where ln[x] indicates the natural logarithm of the value x. Equations 4.2 and 4.3 are both linear, which means that we can apply linear regression to these transformed versions to estimate the parameters for each function. For the linear price-response function, the x variable is price and the y variable is demand. For the exponential price-response function, the x variable is price and the y variable is the logarithm of demand. For the constant-elasticity price-response function, the x variable is the logarithm of price and the y variable is the logarithm of demand.
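All three fits can be carried out with one-variable ordinary least squares on the raw or log-transformed variables. The sketch below uses a hand-rolled OLS routine and hypothetical price/demand observations standing in for the training data in Table 4.2.

```python
# Fit linear, exponential, and constant-elasticity price-response
# functions by (transformed) linear regression. Data are hypothetical.
import math

def ols(xs, ys):
    """Intercept and slope minimizing squared error of y ~ a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

prices = [6.5, 6.7, 6.9, 7.1, 7.3]
demands = [620.0, 480.0, 380.0, 300.0, 240.0]

# Linear: demand regressed on price.
a_lin, b_lin = ols(prices, demands)
# Exponential: log demand regressed on price.
lnA_exp, b_exp = ols(prices, [math.log(d) for d in demands])
# Constant elasticity: log demand regressed on log price.
lnA_ce, eps = ols([math.log(p) for p in prices],
                  [math.log(d) for d in demands])

print(round(b_lin, 1), round(b_exp, 3), round(eps, 2))
```

As expected for a price-sensitive product, all three slope estimates come out negative: demand falls as price rises under each functional form.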

Applying linear regression to each of the price-response functions using the data from the training set in Table 4.2 gives us the parameter values shown in Table 4.4, along with the associated r² values estimated for each of the three models based on the training data set.4 Note first that the product whose price we are testing appears to be highly price-elastic, with an estimated elasticity of 12.8. This is observable from the raw data, as demand is generally significantly lower on days when the price is high. Second, based on the r² values reported in the table, the three methods appear to have about the same quality of fit to the historic data. However, this can be misleading: the r² values computed for the exponential and constant-elasticity price-response functions are based on transformed data. In particular, they are based on how well ln[d(p)] fits the logarithm of the demand observations rather than on how well d(p) fits actual demand. To perform an apples-to-apples comparison, we need to convert the predictions of ln[d(p)] generated by the exponential and constant-elasticity functions to d(p) and perform the comparison on that basis. Following that, we can evaluate the quality of model fit based on the demand values of the test set in Table 4.3.

TABLE 4.4

Results from three different price-response models trained on the data in Table 4.2

NOTE: A “hat” over a variable in this context indicates that it is a statistically based estimate of the underlying parameter. Thus, Â is a statistically based estimate of the real parameter A.

TABLE 4.5

Actual demand and predicted demand for linear, exponential, and constant-elasticity price-response models applied to the test data set in Table 4.3

We convert the ln(p) and ln(d) values to prices and demands using the identity x = e^(ln(x)). Table 4.5 shows the actual demands for each day during the test period along with the demands predicted by each of the three models.

At this point, an obvious question would be which of these three models is better. To answer that question, we need measures of how well the predictions of each model match the observed data. Three metrics are commonly used to measure model fit.

Root-mean-square error (RMSE). RMSE is a measure of the average difference between a forecast value and an observed value. It is defined as

RMSE = √( (1/T) ∑t [d(pt) − d̂t]² ).

Here, pt is the price for period t, and d̂t is the corresponding demand (the “hat” in d̂t indicates that it is an observation); d(pt) is the model prediction of demand based on pt, and T is the total number of observations.

Mean absolute percentage error (MAPE). MAPE measures the average absolute percentage difference between forecast values and observed values. It is defined as

MAPE = (1/T) ∑t |d(pt) − d̂t| / d̂t.

MAPE is an average of the percentage errors in the forecasts relative to the observations. A forecast that is 10% high contributes the same amount to MAPE as one that is 10% low.

Weighted MAPE. MAPE has an obvious shortcoming—a high percentage difference between forecast and observation is worse when the magnitude of the item being forecast is large than when it is small. If we forecast demand of 1 item for a day and actual demand turns out to be 3, the absolute percentage error is 200%. This is the same percentage error as if we forecast demand of 200 items and demand turns out to be 600. Both of these are treated the same in computing MAPE—however, the second missed forecast seems much worse. If we are dealing with observed values that have a wide range in magnitude, we should expect to do better on a percentage basis in forecasting the larger values and recognize that a 200% miss in forecasting a value of 200 is worse than the same percentage miss in forecasting a value of 1. One way to incorporate this effect is to weight each entry in MAPE by the magnitude of the observation—in our case, demand. This gives us weighted MAPE, which is calculated by

weighted MAPE = ∑t d̂t (|d(pt) − d̂t| / d̂t) / ∑t d̂t = ∑t |d(pt) − d̂t| / ∑t d̂t.
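Computed on a small hypothetical set of observations and predictions (not the Table 4.5 values), the three measures look like this:

```python
# RMSE, MAPE, and weighted MAPE for a hypothetical forecast comparison.
import math

observed = [100.0, 250.0, 50.0, 400.0]   # hypothetical observed demands
predicted = [110.0, 240.0, 60.0, 380.0]  # hypothetical model predictions

T = len(observed)
rmse = math.sqrt(sum((f - d) ** 2 for f, d in zip(predicted, observed)) / T)
mape = sum(abs(f - d) / d for f, d in zip(predicted, observed)) / T
# Weighting each percentage error by observed demand collapses to
# total absolute error divided by total demand.
wmape = sum(abs(f - d) for f, d in zip(predicted, observed)) / sum(observed)
print(round(rmse, 2), round(mape, 4), round(wmape, 4))
```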

Each of the three measures of fit calculated for each model based on the test set data are shown in Table 4.6. The exponential and constant-elasticity models have virtually identical performance on all three measures. The linear model is slightly better than the other two in terms of RMSE and weighted MAPE but slightly worse in terms of MAPE. All three price-response functions are shown graphically in Figure 4.2. In the region of the test—that is, for prices between $6.45 and $7.35—the exponential and constant-elasticity functions are essentially indistinguishable, which explains why their performance is so similar.

The situation in Table 4.6 is quite common—test results often give no clear-cut winner among competing price-response functions, and some functions give almost identical results. One possibility in this case is to pick a winner among the three and set the corresponding optimal price. The rightmost column in Table 4.6 shows the optimal price calculated from each of the three price-response models, assuming that the incremental cost per unit is $6.00. Since the linear model outperformed the other two on RMSE and weighted MAPE, the seller could simply set a price of $6.76 and move forward. A second option is to do more testing. One outcome of the test is that the optimal price under any of the models falls between $6.50 and $6.80. The seller could run another few weeks of tests, considering only prices within that range to refine the parameters of the model.

TABLE 4.6

Performance of each price-response model on the test set data in Table 4.3

Figure 4.2 Linear, exponential, and constant-elasticity price-response functions with parameters shown in Table 4.4.

Another, perhaps more appealing, approach is to incorporate additional variables into the estimation. Table 4.2 includes information on whether a test date was a weekend or a weekday and on potential demand, neither of which was included in the model estimation. Both of these variables, along with others that might be available, have the potential to improve model fit (Exercise 1). For purposes of price-sensitivity estimation, however, incorporating potential demand data is particularly powerful.

4.2.2 Incorporating Potential Demand

In brick-and-mortar retail, sellers can usually only observe realized demand—the number of products that they sold. They do not have a good idea of the number of people who might have actively considered purchasing a product but did not—either because the price was too high or for some other reason. In contrast, online retailers can usually see the number of people who visited a product page and were quoted a price. For example, Amazon provides sellers in its marketplace with statistics on glance views, defined as customer visits to a product detail page. Amazon also provides marketplace sellers with their glance view conversion rates for each product, defined as the number of orders divided by the number of glance views. Information such as glance views, which we call potential demand, can be extremely valuable in enabling more accurate price-response estimates.

It is likely that a brick-and-mortar seller would not have access to information on the potential demand for every product in his store. In this case, the seller can only estimate price response using data on product sales along with additional features such as day of week and seasonality. On the other hand, an online retailer is likely to have access to information on potential demand—for example, that shown in the rightmost column of Table 4.2. When such information is available, we can use it to decompose the price-response function into potential demand (which we assume is independent of price) and a conversion rate, which incorporates customer response to price. In this case, we can write the price-response function in Equation 4.1 as

dijk(pijkt) = Dijkt ρijk(pijkt) + εt,     (4.4)

where Dijkt is potential demand during time period t and ρijk is a price-response function that estimates the fraction of potential demand that converts into sales as a function of price and the other explanatory variables. For the case of this single product during a single time period, we can drop the subscripts and write the price-response function with potential demand information as d(p) = Dρ(p) + ε, which is very similar to Equation 3.1, recognizing that the fraction of potential demand that will result in a purchase at a price is equal to the fraction of customers who have a willingness to pay greater than the price, plus an error term. When we can observe potential demand D, we can treat the problem as one of estimating the parameters of the conversion-rate function ρ(p) based on empirical observations of the conversion rate ρ̂t = d̂t/D̂t and the prices pt for values in the training set t = 1, 2, . . . , T.

Under the assumption that each customer purchases at most one item, we will have 0 ≤ ρ̂t ≤ 1 for all t in the training set. (We should remove all observations for which D̂t = 0 from the training set.) Two of the price-response functions introduced in Chapter 3 and listed in Table 3.3 lend themselves particularly well to this setting: the logit and the probit. As discussed in Chapter 3, the logit corresponds to a logistic distribution of customer willingness to pay and the probit to a normal distribution of customer willingness to pay. While the normal might seem to be the more natural choice, the two functions are quite similar, and they predict similar levels of price response when presented with the same data. The logit, however, is much easier to work with and is much more commonly used, and we use it for the remainder of this section.

The logit conversion-rate function has the form

ρ(p) = e^(a − bp) / (1 + e^(a − bp)).     (4.5)

We want to determine the values of the parameters a and b so that the resulting function fits as closely as possible to the observed values ρ̂1, ρ̂2, . . . , ρ̂T. After applying some algebra, we can write Equation 4.5 as

e^(a − bp) = ρ(p) / (1 − ρ(p)).

Taking the logarithm of both sides, we obtain

a − bp = ln[ρ(p) / (1 − ρ(p))].     (4.6)

The quantity on the right side of Equation 4.6 is called the log conversion odds. It is the logarithm of the fraction of quotes that converted at a given price divided by the fraction that did not convert. To estimate the parameters of the conversion-rate function, we regress the observed log conversion odds against the price by applying linear regression to Equation 4.6:

ln[ρ̂t / (1 − ρ̂t)] = a − bpt + εt,  t = 1, 2, . . . , T.

This gives estimated parameter values of â = 36.67 and b̂ = 5.25. When we apply the logit function with these parameters to the test data set in Table 4.3, we find that the resulting RMSE = 4.40, MAPE = .19, and weighted MAPE = .08. These values are substantially lower (i.e., better) than the fit statistics for the linear, exponential, and constant-elasticity price-response functions shown in Table 4.6, demonstrating that incorporating potential demand information can significantly improve the fit of the price-response function; this result is generally independent of the functional form used (Exercise 2). In fact, sellers that have access to potential demand data have an advantage because the data allow them to separate demand influences that are not price related, along with noise in potential demand, from actual customer price response as embodied in the conversion function.5
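As an illustrative sketch (the prices and conversion rates below are synthetic, chosen only to echo the numbers in the text), the log-odds regression can be carried out with ordinary least squares:

```python
import math

def logit_conversion(p, a, b):
    """Predicted conversion rate rho(p) = e^(a - bp) / (1 + e^(a - bp))."""
    z = math.exp(a - b * p)
    return z / (1.0 + z)

def fit_log_odds(prices, conv_rates):
    """Fit ln(rho/(1 - rho)) = a - b*p by ordinary least squares."""
    y = [math.log(r / (1.0 - r)) for r in conv_rates]  # observed log odds
    n = len(prices)
    p_bar = sum(prices) / n
    y_bar = sum(y) / n
    slope = sum((p - p_bar) * (yi - y_bar) for p, yi in zip(prices, y)) / \
            sum((p - p_bar) ** 2 for p in prices)
    a = y_bar - slope * p_bar   # intercept of the fitted line
    b = -slope                  # report b so that log odds = a - b*p
    return a, b

# Recover known parameters from noise-free synthetic conversion rates:
prices = [6.45, 6.75, 7.05, 7.35]
rates = [logit_conversion(p, 36.67, 5.25) for p in prices]
a_hat, b_hat = fit_log_odds(prices, rates)  # close to (36.67, 5.25)
```

With noisy observed conversion rates, the same regression returns the least-squares estimates of a and b rather than the exact values.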

4.3 THE ESTIMATION PROCESS

In Section 4.2, we went through the process of fitting three different model forms (linear, exponential, constant-elasticity) to a training data set and then measuring the quality of the model predictions on a test data set. While this is a toy problem, at a high level the steps are similar to those that would be performed in a real exercise. In particular, a data-driven estimation of price response typically involves the following steps:

1. Obtain historical data on prices and demands and other explanatory variables via one of the approaches described in Section 4.1. Sufficient historical data must be gathered to support estimation, and the data typically require some preparation to be put in the format required for estimation.

2. Divide the data into a training set and a test set.

3. Estimate different models using different variables with the training set, and measure their performance on the test set. Choose the best among the candidate models.

Each of these steps is discussed in the following sections.

4.3.1 Data Preparation

Alternatives for obtaining price and demand data to estimate price response are described in Section 4.1. A couple of comments need to be added. First, while the problem of obtaining data is straightforward to describe, experience has shown that obtaining a sufficiently clean data set to perform useful analysis is often the most painful and time-consuming part of the entire process. Often, the raw data are quite messy and unsuited for analysis. As a simple example, in many cases returns are entered as negative sales on the day they are received. Not only can this result in negative sales for a day, but it is also not clear that this is how returns should be incorporated in price-response estimation in the first place. Most organizations seeking to estimate price response for the first time find that cleansing the data takes far more time and resources than initially planned.

Another common issue is the treatment of stockouts and sellouts. A retailer who sells out of a brand of bottles of aspirin by Wednesday does not know what total demand would have been had the aspirin remained in stock. Similarly, an airline that sells all of the seats for a particular flight a week before departure does not know what the total demand for that flight would have been if additional seats were available. This is a problem of censored data—for a given price, we do not observe the corresponding demand; rather, we observe sales, which is the minimum of demand and capacity.

There are several approaches to estimating demand in the face of censored data. Perhaps the simplest is a scaling approach. While a seller may not be able to observe potential demand once he has stocked out, it is likely that he can observe historical sales by period as well as the period in which the stockout occurred. The scaling approach simply scales the observed sales by one over the fraction of the demand that normally occurred by the stockout time. For example, an airline might sell out a flight one week before departure. Looking at its history of flights that did not sell out, the airline could calculate that bookings up to one week prior to departure typically represented 3/4 of sales for the flight, with the remaining 1/4 booking in the last week. In this case, it could estimate the demand for the flight as 4/3 times sales.

Example 4.2

A retailer stocks 30 bottles of a popular multivitamin every Friday night. His average weekly sales are 24.8 bottles. When he does not stock out, his average sales for each day are as follows: Saturday, 6.2 bottles; Sunday, 4.8 bottles; Monday, 1.6 bottles; Tuesday, 2.2 bottles; Wednesday, 3.4 bottles; Thursday, 3.4 bottles; and Friday, 3.2 bottles. He runs a low-price test and sells his entire stock of 30 bottles by Wednesday evening. Using the scaling model, he estimates that demand for the entire week would have been (24.8/18.2) × 30 = 40.9 bottles.
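The calculation in Example 4.2 can be sketched directly; the day labels and values are those given in the example.

```python
# Scaling estimate for censored demand, using the numbers from Example 4.2.
daily_avg = {"Sat": 6.2, "Sun": 4.8, "Mon": 1.6, "Tue": 2.2,
             "Wed": 3.4, "Thu": 3.4, "Fri": 3.2}
weekly_avg = 24.8          # average weekly sales in weeks without a stockout
sold_before_stockout = 30  # entire stock gone by Wednesday evening

# Average sales that normally occur by Wednesday evening
through_wed = sum(daily_avg[d] for d in ["Sat", "Sun", "Mon", "Tue", "Wed"])

# Scale observed sales by 1 / (fraction of demand normally seen by the stockout)
estimated_demand = weekly_avg / through_wed * sold_before_stockout
print(round(estimated_demand, 1))  # 40.9
```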

4.3.2 Choosing a Training Set and a Test Set

The reason for partitioning the historical data into a training set and a test set is to avoid the problem of overfitting, which occurs when some of the variables included in the model actually reduce its predictive power. If we start with a very simple model, we will find that adding more variables always improves performance as measured on the training set: adding any external variable to the regressions we performed in the previous section would decrease the training-set RMSE for every one of the functional forms. However, at some point, additional variables are being fit only to the noise in the training set data. The situation is illustrated in Figure 4.3: as more and more variables are added to a model, the model error on the training set—whether measured by RMSE, MAPE, or another metric—will always decrease. However, at some point adding more variables begins to reduce the predictive power of the model; at this point the model is said to be overfit. If a model is overfit, it will have a worse fit on the test set than an alternative model, even though it has a better fit on the training set. For this reason, we use the training set to train (i.e., fit) models and the test set to measure their performance.

Figure 4.3 Typical behavior of model fit error as measured on a test set and on the training set as the number of variables in the model is increased.

Opinions differ on how much of the data set should be held out as a test set. Common practice is 33%, but some sources advocate 50%. Perhaps more important than the size is the requirement that the test set and the training set be balanced to the extent possible—that is, the profile of the customers purchasing and the external influences on them should be matched as closely as possible between the two sets.
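A minimal holdout split might look as follows; the function name and the 33% default are illustrative assumptions. Shuffling with a fixed seed keeps the split repeatable, though for strongly seasonal data a stratified or time-aware split would do a better job of keeping the two sets balanced.

```python
import random

def train_test_split(observations, test_fraction=0.33, seed=42):
    """Shuffle the observations and hold out a fraction as the test set."""
    data = list(observations)
    random.Random(seed).shuffle(data)   # fixed seed -> repeatable split
    n_test = int(len(data) * test_fraction)
    return data[n_test:], data[:n_test]  # (training set, test set)

# Split 30 illustrative (price, demand) observations roughly 2:1
data = [(6.45 + 0.05 * i, 100 - 2 * i) for i in range(30)]
train, test = train_test_split(data)
```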

4.3.3 Model Selection

Once data have been acquired and test and training data sets established, the final phase in estimation is model selection. This includes the selection of the functional form for the model (e.g., linear versus logit), as well as estimation of the parameters for the chosen functional form.

Typically, a functional form is chosen prior to estimation. This means that the primary task is to choose the variables to be included in the model. For example, given the data in Table 4.2, should we include Weekend as an explanatory variable in the model? When relatively few exogenous variables are available, we can test including all of them and exclude those that do not improve the model fit. We can create new explanatory variables by transforming exogenous variables (e.g., taking the logarithm) and by crossing variables (e.g., creating a new variable such as Price times Weekend). The goal of model selection is to determine a set of explanatory variables and corresponding parameters that results in the best prediction of observed demand.

To initiate the modeling process, we choose a simple initial model, apply logistic regression to estimate the coefficients of the model on the training set, and then measure the fit of this model on the test data set. This model is called the champion. Once we have established an initial champion, we can determine a final model using a champion-challenger process, which proceeds in five steps:

1. Choose a set of variables and variable transformations that has not been tried before.

2. Run a regression using the variables, variable combinations, and variable transformations on the training set. The resulting model is called the challenger.

3. Measure the performance of the challenger on the test data. If the challenger performs better than the champion, then the challenger becomes the new champion. If the champion outperforms the challenger, then the champion remains the champion.

4. Go to step 1 and continue until performance is no longer improving as new challengers are evaluated.

5. Once a final champion has been chosen, reestimate the coefficients of the model using all of the available data (training data and test data).

This process is repeated each time a new challenger model is proposed. This means that, at any time, the current champion has demonstrated the best performance of all of the considered models as measured using the test data. It is important to update the test data periodically with recent observations to ensure that model performance is measured against data that reflects the most current market conditions.

The approach of starting from a simple model and adding variables and cross-variables in a way that sequentially improves the model (as measured on the test data set) is known as forward stepwise regression. You may wonder, why not take the opposite approach: start by including all the variables and all their transformations and crosses, and then sequentially eliminate those that are not significant—all those with high p-values (indicating low statistical significance), for example. This approach is called backward stepwise regression. It is a bad idea for three reasons. First, if the number of combinations and variables to be tested is greater than the number of observations in the training set, most regression packages will not converge. This limits the ability to consider variables and combinations. The second reason is that using all possible variables and combinations of variables will lead to a model that is overfit—that is, it looks great in fitting the training set but does a terrible job on the test set. In this case, the p-values may not give a good signal regarding which variable to remove. Finally, it could be the case that multiple variables are correlated and that no single one of them has a strong influence on demand, so all of them would be excluded. However, if the others had been excluded, one of the remaining variables might have a significant influence and should be included. This is an issue of collinearity, which is described in the next section.

4.4 CHALLENGES IN ESTIMATION

The previous section describes a forward stepwise regression process for estimating the coefficients of a price-response model. While the process of choosing explanatory variables can be complex, the overall idea should be clear: sequentially add new variables and combinations of variables to find the model that best fits the historic data as measured by performance on the test set. Unfortunately, finding a model that simply fits the data well is not sufficient for purposes of pricing optimization. To determine an optimal price, we need more than a price-response function that predicts demand; we need a price-response function that accurately represents the causal link between price and demand. This is a stricter requirement than predictive accuracy, and it requires testing our model, and potentially modifying it, for two possible conditions: collinearity, in which one or more of the exogenous variables influences price, and endogeneity, in which some variable not in the model influences both price and demand. If either of these situations exists, we will need to modify our model if we want to accurately represent the dependence of demand on price. We consider each in turn.

4.4.1 Collinearity

Collinearity occurs when one or more of the other explanatory variables in the model influences both demand and price. As a simple example, the process of forward stepwise regression might reveal that the previous period’s price is significant in predicting this period’s demand.

Example 4.3

A seller estimates a linear model based on historical data using both price and unit cost for an item as explanatory variables. He obtains a model that fits historical data well:

d(p, c) = (100 – 1.3p – .24c)+,

where c is unit cost. Furthermore, both coefficients have p-values < .01. However, the seller runs a regression of price on cost using the same set of historical data and finds that the equation p = 1.21c + .02 has an extremely good fit. This indicates that (not surprisingly) price is highly collinear with cost and that cost should probably be dropped from the price-response function.

In Example 4.3, the source of collinearity is clear: cost influences demand through its influence on price; that is, higher costs lead to higher prices, which leads to lower demand. It is unlikely (although not impossible) that cost has a direct influence on demand; it is more likely that the significance of cost in the estimation was the result of cost “reaching through” price to influence demand. In this case, dropping cost as an explanatory variable would almost certainly result in a model that better represented the causal influence of price on demand.

The test for collinearity is straightforward—a linear regression is run using price as the target variable and the exogenous variables as explanatory variables. That is, we test a model of the form p = a1x1 + a2x2 + . . . + anxn + ε, where p is price, xi is an exogenous variable, and ai is the corresponding coefficient. We run a linear regression using the training set observations. If any of the exogenous variables comes up as significant in the regression, then it is possible that that variable is collinear with price and should be eliminated from the estimation of price response. (More sophisticated approaches to dealing with collinearity are discussed in econometrics texts such as Wooldridge 2015.)
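A rough version of this screen, in the spirit of Example 4.3: regress price on the suspect exogenous variable and inspect R². The data below are synthetic, generated so that p ≈ 1.21c + .02 as in the example.

```python
import numpy as np

def r_squared(x, y):
    """R^2 from a simple linear regression of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(1)
cost = rng.uniform(4.0, 6.0, 200)
price = 1.21 * cost + 0.02 + rng.normal(0.0, 0.05, 200)

# An R^2 close to 1 suggests price is collinear with cost, so cost should
# probably be dropped from the price-response regression.
print(r_squared(cost, price) > 0.9)  # True
```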

Note that collinearity is an issue only when price is correlated with other variables in the model—collinearity among nonprice explanatory variables alone is not a problem. Also, collinearity is only a problem when the source of historical data is not the result of a randomized experiment or A/B test. When prices are assigned randomly, the link between price and other explanatory variables is broken and—assuming that the randomization has been effective—collinearity with price will not be an issue.

4.4.2 Endogeneity

Collinearity is an issue for price-response estimation when price is correlated with other explanatory variables. Endogeneity can be an issue when anticipated demand influences price or when both demand and price are influenced by exogenous variables not included in the model—so-called omitted variables. Airlines provide an excellent example of the phenomenon. Figure 4.4 shows the average fare and the corresponding total demand for 45 past departures of a flight with a 100-seat aircraft assigned. A naïve reading of this data would be that demand for this flight does not obey the law of demand—higher prices seem to result in higher demand. The reality is, of course, that the pattern in Figure 4.4 results from the fact that airline revenue managers are good at anticipating demand and adjusting the price to reflect imbalances between supply and demand (we discuss in detail how they do this in Chapters 8 and 9). As a result, it is higher demand—or more precisely the anticipation of higher demand on the part of revenue managers—that is driving higher prices rather than vice versa.

A (very) naïve approach to the data shown in Figure 4.4 would be to assume that the demand for the flight followed a linear price-response function d(pt) = D − bpt + ε, where ε is a mean-zero error term. Applying this model to the data in Figure 4.4 and performing linear regression would give the nonsensical result of an upward-sloping demand curve. A more realistic model might include variables that represent the actions of opening and closing booking classes—call the vector of these variables x. The problem is not simply that x is omitted from the model; it is that x is omitted and it is correlated both with price (because opening and closing booking classes changes the average price on the flight) and with the demand on the flight (because the revenue managers anticipate demand in making their decisions). When one or more variables have been omitted and those variables influence both price and demand, then endogeneity is likely to confound the estimation of price sensitivity.

Figure 4.4 Historic average fares and corresponding demands for a flight with 100 seats.

Detecting endogeneity and deriving methods to develop estimation procedures that account for (or clear) the endogeneity have been among the most active areas of econometric research of the last few years. The lesson for price-response estimation is that all price-response regressions should be tested for endogeneity. The Durbin-Wu-Hausman test is the most popular test and is available in most statistical packages, such as R and Stata. If significant endogeneity is present in the estimation, then you need to consider carefully how to proceed. One approach is to find and include all variables that have been omitted from the model and are correlated with both price and demand. However, in many cases historic values of these variables might not be available. Another approach is to find an instrumental variable that is correlated with price but uncorrelated with the error term. Chapter 13 discusses this approach to bid-response estimation in the context of customized pricing.
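A sketch of the instrumental-variable idea using manual two-stage least squares; everything here is synthetic and illustrative. An unobserved demand shock drives both price and demand (as when revenue managers price up in anticipation of demand), so a naïve regression of demand on price produces an upward-sloping estimate, while instrumenting price with an exogenous variable that moves price but not demand recovers the true negative slope.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a simple linear regression of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

rng = np.random.default_rng(7)
n = 5000
u = rng.normal(0.0, 1.0, n)      # unobserved demand shock (omitted variable)
z = rng.uniform(0.0, 2.0, n)     # instrument: shifts price, not demand

price = 5.0 + z + 0.8 * u        # endogenous: price responds to the shock
demand = 50.0 - 2.0 * price + 4.0 * u   # true price slope is -2

naive = ols_slope(price, demand)        # biased: comes out positive here

# Two-stage least squares: (1) regress price on z; (2) regress demand on
# the fitted price from stage 1.
X1 = np.column_stack([np.ones_like(z), z])
beta1, *_ = np.linalg.lstsq(X1, price, rcond=None)
price_hat = X1 @ beta1
two_stage = ols_slope(price_hat, demand)  # close to the true slope of -2
```

In practice one would use a packaged 2SLS estimator with proper standard errors; this sketch only shows why instrumenting changes the sign of the estimate.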

Finally, the ultimate cure for endogeneity, as for collinearity, is price testing. Data from a properly randomized price test will be free from both collinearity and endogeneity.

4.5 UPDATING THE ESTIMATES

The coefficients of a price-response function cannot simply be estimated once and then held constant for all time. Rather, the coefficients need to be updated over time in response to changing market and competitive conditions. Most sellers periodically update the coefficients of their price-response functions at intervals ranging from one month to six months. One consideration is the number of new observations that have been received—there is no gain in updating when there have not been sufficient observations to lead to a statistically significant difference. However, it is also important to reestimate coefficients whenever a substantial change in market or competitive conditions has occurred. A large rise or fall in incremental cost or competitive prices could mean that the price of an item moves to a range in which customer response is very different. It is important to recognize that the estimation techniques described in this chapter are valid only near the range of prices in the historical data. For example, the estimates of the parameters for the models shown in Table 4.4 are valid only for the range of prices in the training set—that is, $6.45 to $7.35. If prices significantly outside the test region need to be considered, then the model may not be valid. In this case, the seller should monitor demand carefully and reestimate price-sensitivity coefficients as soon as practical.

In addition to making periodic updates, a prudent seller will continually compare actual demand to the predictions of his model. This approach is consistent with the closed-loop pricing process shown in Figure 2.5. If the price-response model predicts that take-up in a particular pricing segment at the offered price should be 85% but actual take-up is only 65%, this is a signal that the price-sensitivity coefficients for that segment should be updated. However, it can be the case that price sensitivity within a segment may change in a way that is not reflected in changes in take-up rate. To the extent that it is feasible, the most prudent approach for a seller is to perform periodic price tests to ensure that current estimates of elasticity are still valid and, if they are not, to reestimate the coefficients of the price-response function.

4.6 DATA-FREE APPROACHES TO ESTIMATION

The approaches to price-sensitivity estimation discussed so far rely on the existence of a database containing a significant amount of price and demand history. In some cases, these data may not be available either because the product is too new to have generated a significant number of observations or because there is insufficient price variation in the data to support estimation. In this case, it may be impossible to use data-driven approaches to estimate price sensitivity, and alternative, data-free approaches must be used. The most popular of these approaches include the following:

Surveys and focus groups: Both surveys and focus groups can be used to elicit information about the importance of price relative to other aspects of a product. Both rely on extrapolating response from a relatively small group of customers to the entire population.

Conjoint analysis: Conjoint analysis is a systematic method for eliciting customer trade-offs among price, brand, and other features. Each participant in a conjoint study is given a series of choices between two different possible offerings with different features and prices. For example, a particular choice might be “Would you prefer a 20-year fixed mortgage of $100,000 at an APR of 5.5% or a 25-year $100,000 mortgage with an initial fixed APR of 3.5% for a year that converts to a variable-rate mortgage at prime plus 2% after the first year?” For each choice presented, participants specify either which alternative they prefer or that they are indifferent between the two alternatives. To estimate price sensitivity, price must be included along with other attributes such as term, brand, and channel. The idea behind conjoint analysis is that, by structuring the alternatives and the sequence of choices appropriately, the preference structure of each participant can be determined with a relatively small number of questions. If a study includes a sufficient number of participants with backgrounds representative of the population as a whole, the results can be used to construct a price-response function for the entire population. Conjoint analysis has been widely used in designing and pricing new products across a wide range of industries, and there is an extensive literature on how to conduct effective conjoint analyses.

Data-free approaches to price-sensitivity estimation have one important strength: they can be used to elicit information about combinations of features and price that the seller has not previously offered. This is relevant for new products and may be important if market conditions have changed so dramatically that past customer behavior is not a reliable predictor of future behavior.

However, data-free approaches have their drawbacks. Surveys are subject to response bias: recipients who return a survey may not be representative of the population as a whole. For conjoint analyses, it may be difficult to obtain results from groups of participants who are both representative of the population as a whole and large enough to provide statistically reliable results. In addition, participants in surveys and focus groups often give the answer that reflects the choice they think they should make rather than the choice they actually would make in the real world. In many cases, they overestimate their willingness to pay for additional features.

Finally, surveys and conjoint analyses typically present choices in a structured manner (“How important is customer service to you in choosing an online seller?” or “Would you prefer product x to product y?”) that does not necessarily reflect the way purchase decisions arise and are addressed by customers in their daily lives, in which purchase options are not generally evaluated in a pairwise manner. For this reason, focus groups, surveys, and conjoint analyses can give sellers an idea of the general level at which prices should be set, but optimization of the prices across a number of pricing segments typically requires a data-driven approach.

4.7 SUMMARY

• The gold standard for estimating price response is the randomized price test. The goal of randomization is to isolate the effect of price on take-up by choosing test and control populations that are balanced in terms of all other variables that might influence take-up. Ensuring balance may require assigning participants to groups in a fashion that is not truly random.

• Many sellers find price testing to be difficult or impossible. As a result, the data used to estimate price response are usually derived either from (nonrandomized) A/B price tests or from natural dispersion in historical pricing, or both. This dispersion might result from so-called natural experiments (accidents), from pricing discontinuities, or from field pricing discretion.

• To fit a price-response function to a data set including prices and demands, the data are divided into a training set and a test set. For a particular set of candidate variables, regression is performed using the training set. The fit of the resulting model is measured on the test set using measures such as root mean square error (RMSE) and MAPE. Candidate models are tested by adding, transforming, combining, and removing predictive variables. Each time a new model is estimated, its performance on the test data is compared to that of the previous model. If the new model is more predictive, it becomes the new champion. This process continues until new models no longer demonstrate improved performance.

• In estimating the parameters of a price-response function, we are not only interested in the predictive accuracy of the model; we are also interested in the causal link between price and demand. This means that we need to ensure that the results of a regression exercise do not demonstrate collinearity, in which price is influenced by other variables in the model, or endogeneity, in which variables that influence both demand and price have not been included in the model.

• If historical data are not available or there is no price dispersion in historical data, then data-free methods such as surveys or conjoint analysis can be used. It is generally advisable that price testing be used to supplement these approaches moving forward so that price response is ultimately based on actual market observations.

4.8 FURTHER READING

Some of the material in this chapter was adapted from my book Pricing Credit Products (2018), which discusses the problem of estimating price response in the context of consumer lending.

Two instructive examples of the use of the difference-in-differences (DiD) method in retailing are Gallino and Moreno 2014, which uses DiD to evaluate the effect of a buy-online/pick-up-in-store program for a retail chain, and Blake, Nosko, and Tadelis 2015, which uses DiD to estimate the effectiveness of online advertisement purchases.

Many classic econometrics texts cover the process of linear regression in detail, along with such topics as the treatment of collinearity and endogeneity. Examples include Wooldridge 2015 and Greene 2003. A book that takes a modern, machine-learning-based approach to estimation is Business Data Science, by Matt Taddy (2019). Chapters 2, 5, and 6 are particularly relevant.

Testing for endogeneity and developing estimation approaches that produce unbiased coefficient estimates in the presence of endogeneity has been an extremely active area of econometric and statistical research over the past decades. In fact, a survey of the literature found that the treatment of endogeneity was the primary source of difference among estimates of price sensitivity in similar industries (Bijmolt, Heerde, and Pieters 2005). Davidson and MacKinnon 1993 provides a particularly clear description of endogeneity testing and the use of instrumental variables to account for endogeneity in estimation. A good overview of approaches to estimation in the presence of endogeneity is found in the book Mostly Harmless Econometrics: An Empiricist's Companion, by Joshua Angrist and Jörn-Steffen Pischke (2009).

A good, high-level introduction to conjoint analysis is Getting Started with Conjoint Analysis, by Bryan Orme (2009). For a survey of applications in marketing, see the survey article by Paul Green and V. Srinivasan (1990).

4.9 EXERCISES

1. For the three price-response functions listed in Table 4.4—linear, exponential, and constant-elasticity—fit a model on the training data in Table 4.2 that includes Weekend as an explanatory variable along with Price.

a. What are the coefficients for Price and Weekend in each model?

b. Calculate RMSE, MAPE, and weighted MAPE for the models on the test data in Table 4.3 including Weekend. How do the fits of the models including Weekend compare to the fits of the previous models? Which model best fits the data?

2. Fit a logistic model to the training data in Table 4.2 using Price, Weekend, and Potential Demand as explanatory variables.

a. What are the coefficients for each explanatory variable?

b. Calculate RMSE, MAPE, and weighted MAPE for the logistic model on the test data in Table 4.3. How does the fit of this model compare to the models estimated in the text and the model estimated in Exercise 1?

NOTES

1. The requirement that the treatment is the only difference between the test and control groups is enforced in medical research by the use of triple-blind trials in which the patient, the administering physician, and those evaluating the results do not know which individuals belong to which group. This prevents patients from changing their behavior according to whether they are receiving the treatment or the placebo. It also prevents physicians from unconsciously providing a different level of care to a patient on the basis of whether that patient has been assigned to the treatment or control group. Finally, it prevents researchers from evaluating outcomes differently, depending on whether a patient was in the control or test group.

2. In this section and in the field of statistical estimation, the terms predict and prediction are used to refer to the ability of a model to fit historical data—in our case, the ability of our model to generate good values of historical demand dt given historical prices pt and historical values of the exogenous variables xt. This is different from the everyday use of the terms to refer to the ability to predict future values.

3. This book uses the common convention that the names of variables representing values in the data are italicized. Thus, Weekend represents the observed value of whether an observation occurred on a weekend or not.

4. The values in the table and in subsequent analyses of this data were derived using the Regression function in Excel v13.6 running on a Mac. Using a different package or a different computer might give slightly different answers.

5. A difficulty arises if take-up is equal to 1 on a particular day. In this case, the left side of Equation 4.7 is undefined. It is common practice to replace the left side of the equation with a large positive number.
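A minimal sketch of the workaround this note describes (the function name and cap value are illustrative; Equation 4.7 is assumed to be the standard logit transform of take-up, which is undefined at take-up rates of exactly 0 or 1):

```python
import math

# An arbitrary "large" value to stand in for the undefined logit;
# the choice of cap is a modeling judgment, not a rule from the text.
CAP = 10.0

def safe_logit(r):
    """Logit transform log(r / (1 - r)), capped where it is undefined."""
    if r <= 0.0:
        return -CAP
    if r >= 1.0:
        return CAP
    return max(-CAP, min(CAP, math.log(r / (1.0 - r))))

print(safe_logit(0.5), safe_logit(1.0))  # 0.0 and the cap value
```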
