Appendix B: Probability

This appendix provides a brief review of the concepts and formulas from probability theory that are used in basic pricing and revenue optimization. A fuller treatment can be found in any text on the subject, such as Mood, Graybill, and Boes 1974; Jaynes 2003; or Bertsekas and Tsitsiklis 2008.

We use the notation Pr{X} to refer to the probability that event X will occur. X can be any event—for example, “Total demand during the week for a product will be greater than 100” or “No-shows will be exactly equal to 10 and shows will be equal to 95.” However, no matter what event X represents, its probability of occurrence will always be between 0 and 1, so 0 ≤ Pr{X} ≤ 1. The probability that event X does not occur is 1 – Pr{X}.

If X takes on numeric values—such as the number of heads when a fair coin is flipped 100 times or the number of sweaters that will be sold in a month—then X is termed a random variable. By convention, random variables are denoted with capital letters.

For any two events, X and Y, we use Pr{X, Y} (or sometimes Pr{XY}) to refer to the probability that both X and Y occur. Thus, Pr{Shows are greater than 80, Demand is equal to 100} means “The probability that shows are greater than 80 and demand is equal to 100.” This should be contrasted with the notation Pr{X|Y}, which denotes the conditional probability of event X given that event Y occurs. Thus, Pr{Shows are greater than 80|Demand is equal to 100} means “the probability that shows are greater than 80 given that demand is equal to 100.” The two concepts are linked by the formula

Pr{X | Y} = Pr{XY}/Pr{Y}

for Pr{Y} > 0. Note that since Pr{Y} ≤ 1, Pr{X | Y} ≥ Pr{XY}.

Two events X and Y are called independent if Pr{X | Y} = Pr{X}. Intuitively, independence refers to the case in which the probability that event X will occur is unaffected by the value of Y. For example, if we say that weekly sales of product A are independent of product B, it means that our probability that sales of product B would exceed 10 units in that week would not change if we learned that sales of product A were 0, 10, or 100 units. Independence is always mutual; X independent of Y means that Y is independent of X, and vice versa. If X and Y are independent, then Pr{XY} = Pr{X}Pr{Y}.
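
To make this concrete, the following Python sketch works with a small, invented joint distribution of weekly sales for two products A and B. It computes Pr{A = 10 | B = 10} from the joint probabilities using Pr{X | Y} = Pr{XY}/Pr{Y} and compares it with the unconditional Pr{A = 10}; the two would coincide only if sales of A and B were independent.

    # Invented joint distribution over (sales of A, sales of B); probabilities sum to 1.
    joint = {
        (0, 0): 0.10, (0, 10): 0.15,
        (10, 0): 0.20, (10, 10): 0.25,
        (20, 0): 0.10, (20, 10): 0.20,
    }

    pr_b10 = sum(p for (a, b), p in joint.items() if b == 10)   # Pr{B = 10}
    pr_a10_b10 = joint[(10, 10)]                                # Pr{A = 10, B = 10}
    pr_a10 = sum(p for (a, b), p in joint.items() if a == 10)   # Pr{A = 10}

    print(pr_a10_b10 / pr_b10)   # Pr{A = 10 | B = 10} ≈ 0.417
    print(pr_a10)                # Pr{A = 10} = 0.45, so A and B are not independent here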

B.1 PROBABILITY DISTRIBUTIONS

A probability distribution is an assignment of probabilities to a set of events that are “mutually exclusive and collectively exhaustive.” Mutually exclusive means that at most one of the events will occur. Collectively exhaustive means that at least one of the events must occur. Together they guarantee that the sum of the probabilities of all the events (or the integral of the probability density, if the events form a continuum) equals 1.

For a discrete distribution, f(i) is the probability that event i will occur. In this book, we consider only discrete distributions defined on the nonnegative integers—that is, on i = 0, 1, 2, . . . .¹ By the mutually exclusive and collectively exhaustive property, we must have

∑_{i=0}^{∞} f(i) = 1.

The function f(i) is known as a probability density function. Given a discrete probability density function, we define the cumulative distribution function by

F(i) = f(0) + f(1) + · · · + f(i) = ∑_{j=0}^{i} f(j).

If f is the probability density function on demand in a particular week, then F(i) is the probability that demand will be less than or equal to i. Note that F(i) is increasing in i, f(0) = F(0), and lim_{i→∞} F(i) = 1.

Although discrete distributions are by far the most relevant for pricing and revenue optimization, we will use continuous distributions in some cases. For a continuous distribution, the cumulative distribution function F(x) denotes the probability that the random variable X will take a value less than or equal to x. We assume that F(x) is continuous and differentiable, in which case the corresponding probability density function can be computed as f(x) = dF(x)/dx. Then

F(x) = ∫_{–∞}^{x} f(t) dt.

For a continuous distribution, the probability that a random variable will be between a and b, with a < b, is F(b) – F(a). The cumulative distribution function F(x) is increasing in x, and lim_{x→∞} F(x) = 1.

For both continuous and discrete distributions, we define the complementary cumulative distribution function (CCDF) as F̄(x) = 1 – F(x). The CCDF F̄(x) is the probability that a random variable X with cumulative distribution function F(x) will be greater than x. It is a nonincreasing function of x.
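
As a small illustration, the Python sketch below builds the cumulative distribution function and the CCDF from an assumed discrete density f(i) on i = 0, . . . , 4 (the probabilities are invented for the example).

    # Invented discrete density on i = 0, 1, ..., 4; the probabilities sum to 1.
    f = [0.1, 0.2, 0.4, 0.2, 0.1]

    F = []                           # cumulative distribution function: F(i) = f(0) + ... + f(i)
    running = 0.0
    for p in f:
        running += p
        F.append(running)

    F_bar = [1.0 - Fi for Fi in F]   # complementary CDF: Pr{X > i}

    print(F)       # ≈ [0.1, 0.3, 0.7, 0.9, 1.0] (up to floating-point rounding)
    print(F_bar)   # ≈ [0.9, 0.7, 0.3, 0.1, 0.0]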

B.1.1 Expectation

Let the random variable X be distributed according to f. Then, the expectation or mean of X is defined by

E[X] = ∑_{i=0}^{∞} i f(i)

when f(·) is a discrete distribution and by

E[X] = ∫_{–∞}^{∞} x f(x) dx

when f(·) is continuous. The notation E[·] denotes the expectation operator, meaning that it is the probability-weighted sum (for a discrete variable) or integral (for a continuous variable) of the function inside the brackets. The mean of the variable itself is denoted by μ.

The mean of a distribution should not be confused with the mode of the distribution, which is the most likely observation given a single sample. Nor should it be confused with the median of the distribution, which is the value such that the probability that a single observation will exceed that value is the same as the probability that it will be less than the value.

An alternative formula for calculating the mean of a discrete distribution defined over the nonnegative integers i = 0, 1, . . . is

E[X] = ∑_{i=0}^{∞} [1 – F(i)] = ∑_{i=0}^{∞} F̄(i).  (B.1)

A similar result holds for a conditional probability distribution for a random variable X defined for X ≥ 0—namely,

E[X | Y] = ∑_{i=0}^{∞} Pr{X > i | Y}.

The concept of expectation can be extended to any function. Thus, if g(i) is a function of i, the expectation of g(i) is defined as

E[g(i)] = ∑_{i=0}^{∞} g(i) f(i)

for a discrete variable and

E[g(x)] = ∫_{–∞}^{∞} g(x) f(x) dx

for a continuous variable. Note that, in general, E[g(i)] ≠ g[E[i]], with equality guaranteed only if the function g is linear.

For any random variable X and any number a, E[aX] = aE[X]. Furthermore, for any random variables Xi and Xj, E[Xi + Xj] = E[Xi] + E[Xj]. This property does not apply to general functions of random variables—for example, E[X/Y] ≠ E[X]/E[Y] in general.
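
A quick Monte Carlo sketch in Python makes the distinction concrete. The distributions chosen here (both uniform on [1, 3]) are arbitrary illustrations; the sample mean of X + Y matches E[X] + E[Y], while the sample mean of X/Y settles near ln 3 ≈ 1.10 rather than E[X]/E[Y] = 1.

    import random

    random.seed(1)
    n = 100_000
    xs = [random.uniform(1, 3) for _ in range(n)]   # illustrative: X ~ Uniform(1, 3), E[X] = 2
    ys = [random.uniform(1, 3) for _ in range(n)]   # illustrative: Y ~ Uniform(1, 3), E[Y] = 2

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    print(sum(x + y for x, y in zip(xs, ys)) / n)   # ≈ 4.0 = E[X] + E[Y]
    print(mean_x + mean_y)                          # ≈ 4.0
    print(sum(x / y for x, y in zip(xs, ys)) / n)   # ≈ 1.10 = E[X/Y]
    print(mean_x / mean_y)                          # ≈ 1.0 = E[X]/E[Y], which differs from E[X/Y]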

For purposes of pricing and revenue optimization, the expectation of the minimum of a random variable and a fixed capacity is often important. In other words, we want to calculate E[min(i, C)], where C is a fixed capacity and i follows some discrete distribution f(i) on the nonnegative integers. The following formula is often useful in this case:

E[min(i, C)] = ∑_{j=0}^{C–1} [1 – F(j)] = ∑_{j=0}^{C–1} F̄(j).  (B.2)

We are also often interested in E[(i – C)+] = E[max(i – C, 0)]. If f(i) is a probability density function of total demand and C is capacity, then E[(i – C)+] is the expected number of customers in excess of capacity—that is, the expected number of customers turned away. Since the expected number of customers served plus the expected number turned away must equal expected demand, we can combine Equations B.1 and B.2 to derive

E[(i – C)+] = E[i] – E[min(i, C)] = ∑_{j=C}^{∞} F̄(j).  (B.3)
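
The following Python sketch checks Equations B.2 and B.3 on a small, invented demand distribution with an assumed capacity of C = 4: expected sales E[min(i, C)] are computed both directly and as the sum of F̄(j) over j = 0, . . . , C – 1, and expected spill E[(i – C)+] both directly and as expected demand minus expected sales.

    # Invented demand distribution f(i) on i = 0, ..., 6 and an assumed capacity C = 4.
    f = [0.05, 0.10, 0.20, 0.25, 0.20, 0.15, 0.05]
    C = 4

    mu = sum(i * p for i, p in enumerate(f))                     # expected demand = 3.10
    exp_sales = sum(min(i, C) * p for i, p in enumerate(f))      # E[min(i, C)] directly

    F_bar = [sum(f[j + 1:]) for j in range(len(f))]              # F_bar[j] = Pr{demand > j}
    exp_sales_alt = sum(F_bar[:C])                               # Equation B.2

    exp_spill = sum(max(i - C, 0) * p for i, p in enumerate(f))  # E[(i - C)+] directly

    print(exp_sales, exp_sales_alt)    # both ≈ 2.85
    print(exp_spill, mu - exp_sales)   # both ≈ 0.25, consistent with Equation B.3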

B.1.2 Variance and Standard Deviation

Two other important properties of probability distributions are the variance and the standard deviation, both of which measure the spread of the distribution. The variance of a distribution is the expected square of the distance between a sample from the distribution and the mean. That is,

Var[X] = E[(X – μ)²].

The variance of a distribution is always greater than or equal to 0; the higher the variance, the broader the spread of the distribution. For independent random variables, the variance of the sum is equal to the sum of the variances; that is,

Var[X1 + X2 + · · · + Xn] = Var[X1] + Var[X2] + · · · + Var[Xn].  (B.4)

The standard deviation, denoted by σ, is equal to the positive square root of the variance: σ = √Var[X]. Like the variance, σ ≥ 0 for any distribution. Let σ[Xi] be the standard deviation associated with random variable Xi. Then, from Equation B.4, we can derive the formula for the standard deviation of the sum of independent random variables:

σ[X1 + X2 + · · · + Xn] = √(σ²[X1] + σ²[X2] + · · · + σ²[Xn]).  (B.5)

Note that Equations B.4 and B.5 hold only when X1, X2, . . . , Xn are independent.
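
A short simulation in Python illustrates Equation B.4 for two independent random variables (an illustrative discrete uniform on {0, . . . , 9}, with variance 8.25, and a Bernoulli with p = 0.3, with variance 0.21): the variance of the sum comes out close to the sum of the variances.

    import random

    random.seed(2)
    n = 100_000
    x1 = [random.randrange(10) for _ in range(n)]                # X1 uniform on {0, ..., 9}
    x2 = [1 if random.random() < 0.3 else 0 for _ in range(n)]   # X2 Bernoulli(0.3)

    def variance(values):
        # Population variance: average squared deviation from the mean.
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    print(variance([a + b for a, b in zip(x1, x2)]))   # ≈ 8.46
    print(variance(x1) + variance(x2))                 # ≈ 8.46 = 8.25 + 0.21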

The ratio of the standard deviation to the mean of a distribution is sometimes called the coefficient of variation, or CV, where CV = σ/μ.

B.2 CONTINUOUS DISTRIBUTIONS

B.2.1 The Uniform Distribution

The uniform distribution is “the distribution of maximum ignorance.” It is used when we know nothing about the distribution of X except that it lies between bounds a and b, with a < b, and we have no reason to believe that it is more likely to lie at any one point within that interval than at another. The uniform distribution has the probability density function

f(x) = 1/(b – a) for a ≤ x ≤ b, and f(x) = 0 otherwise.

The mean of the uniform distribution is μ = (a + b)/2, and its standard deviation is σ = (b – a)/√12.

B.2.2 The Exponential Distribution

The exponential distribution has a single parameter, λ > 0, and a probability density function,

f(x) = λe^(–λx)

for x ≥ 0. The mean of the exponential distribution is μ = 1/λ, and the standard deviation is the same as the mean: σ = 1/λ.

B.2.3 The Normal Distribution

The normal (or Gaussian) distribution is the famous bell-shaped curve, the most widely used distribution in statistics and mathematical modeling. There are two reasons for its popularity. The first is that theory has shown that the normal distribution is, in many situations, a reasonable distribution to use when the actual underlying distribution is unknown or when the underlying distribution is the result of many different random influences. The second reason is that many practical techniques have been developed for making calculations with the normal distribution. Such techniques are easily accessible in most software and spreadsheet packages, such as Excel.

The probability density function for the normal distribution is

f(x) = [1/(σ√(2π))] e^(–(x – μ)²/(2σ²)),

where μ is the mean and σ is the standard deviation.

The normal distribution with mean 0 and standard deviation 1 is so important that its density has its own notation:

ϕ(x) = [1/√(2π)] e^(–x²/2),

where ϕ(x) denotes the probability density function of the normal distribution with mean 0 and standard deviation 1. The cumulative distribution function of the normal distribution with mean 0 and standard deviation 1 is denoted by Φ(x). Both ϕ(x) and Φ(x) are included in all statistical and spreadsheet packages. This is useful because they can be used to calculate f(x) and F(x) for any normal distribution by means of the following transformations:

f(x) = (1/σ)ϕ[(x – μ)/σ]  and  F(x) = Φ[(x – μ)/σ],

where f(x) is the normal density function with mean μ and standard deviation σ, and F(x) is the corresponding cumulative distribution function.

Example B.1

Monthly demand for a product has been observed to follow a normal distribution with mean of 10,000 and standard deviation of 8,000. The probability that demand will be less than 12,000 units is then given by Pr{Demand less than 12,000 units in a month} = Φ[(12,000 – 10,000)/8,000] = Φ(0.25) = 0.60.
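
In Excel, Φ is available as NORM.S.DIST(z, TRUE); in Python it can be computed from the error function. The short sketch below reproduces the calculation in Example B.1.

    from math import erf, sqrt

    def std_normal_cdf(z):
        # Φ(z): cumulative distribution of the standard normal, via the error function.
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    mu, sigma = 10_000, 8_000
    print(std_normal_cdf((12_000 - mu) / sigma))   # Φ(0.25) ≈ 0.599, i.e., about 0.60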

The normal distribution is widely used. However, a few words of caution are in order. First, the normal distribution is continuous, so care needs to be taken in applying it to situations where outcomes are discrete. These issues are addressed in Section B.3.4 on the discrete normal distribution. Second, bear in mind that the normal distribution puts positive probability on all outcomes from –∞ to ∞. This can become a problem in situations where outcomes less than zero are not physically meaningful (e.g., demand, sales, no-shows, prices). Third, caution is particularly important when using the normal distribution in a situation where the standard deviation is large relative to the mean. For example, when the mean is equal to the standard deviation (i.e., the coefficient of variation is 1), the normal distribution places about a 16% probability on the outcome’s being less than zero. In cases where the coefficient of variation is high, it may be necessary to use another distribution to obtain reasonable results.
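
As a quick check of the 16% figure, the probability that a normal outcome falls below zero when the mean equals the standard deviation is Φ(–1), which can be evaluated the same way:

    from math import erf, sqrt

    # Pr{X < 0} when μ = σ: Φ((0 - μ)/σ) = Φ(-1).
    print(0.5 * (1.0 + erf(-1.0 / sqrt(2.0))))   # ≈ 0.159, about 16%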

B.3 DISCRETE DISTRIBUTIONS

B.3.1 The Bernoulli Distribution

The Bernoulli distribution is the simplest interesting discrete distribution. It is defined as

f(1) = p and f(0) = 1 – p.

The Bernoulli distribution applies to the situation in which a particular event has a probability p of occurring and, thus, a probability 1 – p of not occurring. In this case, the outcome of the event’s occurring can be arbitrarily labeled 1 and the outcome where the event does not occur can be labeled 0. The probability 1 – p of the event’s not occurring is often denoted by q.

Example B.2

A fair coin is to be flipped once. Let 0 denote heads and 1 denote tails. Then the outcome of the flip follows a Bernoulli distribution with p = 0.5.

The mean of the Bernoulli distribution is p and its variance is p(1 – p) = pq, so its standard deviation is σ = √[p(1 – p)] = √(pq).

B.3.2 The Discrete Uniform Distribution

The discrete uniform distribution is appropriate whenever a number of mutually exclusive events all have equal probability of occurring. Then, if there are n events, the probability that any one of them will occur is 1/n. We assume that the events are indexed by consecutive integers i = a, a + 1, a + 2, . . . , b, where a ≥ 0 and b > a (so that n = b – a + 1). In this case, we can write the uniform probability density function as

f(i) = 1/(b – a + 1) for i = a, a + 1, . . . , b.

The cumulative distribution function for the uniform distribution is

F(i) = (i – a + 1)/(b – a + 1) for i = a, a + 1, . . . , b.

The complementary cumulative distribution function for the uniform distribution, F̄(i) = 1 – F(i), is linear in i between a and b. This leads to the connection between the uniform willingness-to-pay distribution and the linear demand function discussed in Chapter 3.

The mean of the discrete uniform distribution is μ = (a + b)/2, and its standard deviation is

σ = √{[(b – a + 1)² – 1]/12}.

Example B.3

A seller believes that if he prices a new shirt at $79.00, then total demand for the shirt will have equal probability of being any amount between 10,000 and 20,000 units. This corresponds to a discrete uniform distribution with a = 10,000 and b = 20,000. This means the probability that demand will be less than or equal to 18,000 will be F(18,000) = (18,000 – 10,000 + 1)/(20,000 – 10,000 + 1) ≈ 0.8. Mean demand is (20,000 + 10,000)/2 = 15,000 shirts, with a standard deviation of approximately 2,887 shirts.
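
The figures in Example B.3 can be reproduced with a few lines of Python (the shirt-demand numbers are simply those assumed in the example):

    from math import sqrt

    a, b = 10_000, 20_000
    n = b - a + 1                    # number of equally likely demand levels

    def F(i):
        # Cumulative distribution function of the discrete uniform on {a, ..., b}.
        return (i - a + 1) / n

    print(F(18_000))                 # ≈ 0.80
    print((a + b) / 2)               # mean demand = 15,000
    print(sqrt((n ** 2 - 1) / 12))   # standard deviation ≈ 2,887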

B.3.3 The Binomial Distribution

The binomial distribution arises whenever each of a known number of people makes a decision between two mutually exclusive alternatives when each person has the same probability of choosing the first alternative and the decisions are all made independently (in the probabilistic sense). If the number of people choosing between the alternatives is n and the probability that any individual will choose the first alternative is p, then the total number choosing the first alternative will follow a binomial distribution with parameters n and p. In addition, the total number choosing alternative 2 will follow a binomial distribution with parameters n and q, where q = 1 – p.

Example B.4

One hundred passengers have booked the same flight. Each passenger has probability of 0.8 of showing, and passengers will make independent decisions whether or not to show. In this case, the number of shows will follow a binomial distribution with p = 0.8 and n = 100. Furthermore, no-shows will follow a binomial distribution with p = 0.2 and n = 100.

Example B.5

Ten thousand shoppers will log on to the website of a car loan provider. The supplier believes that if he offers an annual percentage rate of 6.9%, each shopper will have a 0.05 probability of filling out an application for a loan. The distribution of applications filled out would then be binomial with p = 0.05 and n = 10,000.

The classic example of a binomial distribution is drawing balls from a large urn with replacement. If an urn contains a large number of black and white balls such that p is the fraction of black balls, and n balls are drawn from the urn with the drawn balls replaced, then the number of black balls drawn will be binomial with parameters n and p. Similarly, if a fair coin is tossed n times, the number of heads tossed will be binomial with parameters n and p = 0.5.

The probability density function for the binomial distribution is

f(i) = C(n, i) p^i (1 – p)^(n–i),  i = 0, 1, . . . , n,  (B.7)

where the binomial coefficient C(n, i) is given by

C(n, i) = n!/[i!(n – i)!].

The mean of the binomial distribution is μ = pn, the variance is p(1 – p)n, and the standard deviation is σ = √[p(1 – p)n] = √(pqn).

Example B.6

A bed-and-breakfast has 8 rooms and has accepted 10 bookings. On average, 85% of its bookings show, with the rest canceling. Assuming a show rate of .85, the expected number of guests who will show up is .85 × 10 = 8.5, with a variance of .85 × .15 × 10 = 1.275 and a standard deviation of 1.13. The probability that exactly 7 guests will show up is

f(7) = C(10, 7)(0.85)^7(0.15)^3 ≈ 0.13.
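
The calculation in Example B.6 can be checked with Python’s math.comb, which computes the binomial coefficient directly:

    from math import comb, sqrt

    n, p = 10, 0.85
    print(n * p)                                # expected shows = 8.5
    print(n * p * (1 - p))                      # variance = 1.275
    print(sqrt(n * p * (1 - p)))                # standard deviation ≈ 1.13

    # Probability that exactly 7 of the 10 bookings show.
    print(comb(n, 7) * p ** 7 * (1 - p) ** 3)   # ≈ 0.13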

Because of the binomial coefficient terms in Equation B.7, the binomial distribution can be very difficult to work with in practice. In particular, the term n! grows very large very quickly. Using Equation B.7 to calculate show probabilities given 100 bookings would require calculating 100!, which is a 158-digit number. In addition, there is no closed-form expression for the cumulative distribution function F(i) of a binomial distribution. Fortunately, for large n, the binomial distribution is well approximated by the normal distribution with mean μ = pn and standard deviation σ = √(pqn).

Example B.7

An online tour broker believes that 30,000 people will visit his website in the next week, with each visitor having a 2% chance of purchasing a tour. The number of tours sold would then follow a binomial distribution with p = 0.02 and n = 30,000. His probability density function on demand can be approximated by a normal distribution with mean 0.02 × 30,000 = 600 and standard deviation √(0.02 × 0.98 × 30,000) ≈ 24.25. The tour operator wants to know the probability that more than 580 customers will seek to purchase a tour. He can estimate this as 1 – Φ[(580 – 600)/24.25] = 1 – Φ(–0.82) ≈ 0.79. Therefore, there is about a 79% chance that demand will be greater than 580.
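
The approximation in Example B.7 takes only a few lines of Python (the visitor count and purchase probability are those assumed in the example, and the std_normal_cdf helper is the same error-function construction used earlier):

    from math import erf, sqrt

    def std_normal_cdf(z):
        # Φ(z): cumulative distribution of the standard normal, via the error function.
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    n, p = 30_000, 0.02
    mu = n * p                       # 600
    sigma = sqrt(n * p * (1 - p))    # ≈ 24.25

    # Normal approximation to Pr{demand > 580}.
    print(1.0 - std_normal_cdf((580 - mu) / sigma))   # ≈ 0.795, roughly the 79% figure above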

B.3.4 The Discrete Normal Distribution

The normal distribution is a continuous distribution. However, in some cases we want to apply it to situations in which outcomes are discrete. For example, we might want to model demand for one-night arrivals at a hotel as “normal with mean 200 and standard deviation 100.” Furthermore, in most of these cases only nonnegative values will be meaningful—therefore, we need to restrict the distribution to the nonnegative integers. We do this by defining

F(i) = Φ[(i – μ)/σ] for i = 0, 1, 2, . . .

and

f(0) = F(0),  f(i) = F(i) – F(i – 1) for i = 1, 2, . . . .

Defining the normal distribution in this fashion makes it easy to use standard statistical functions—in Excel, for example—for computations. However, a word of caution: this convenience comes at a cost. The discrete normal distribution as defined here will have a mean that differs from (and is generally higher than) μ and a standard deviation that is somewhat lower than σ. Whether this is important depends on the application.
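
Under the definition above, the discrete normal for the hotel example (μ = 200, σ = 100) can be tabulated directly from Φ. In the Python sketch below, the upper limit of 800 is just an arbitrary cutoff far in the right tail; the computed mean lands a little above 200 and the standard deviation a little below 100, illustrating the caution just mentioned.

    from math import erf, sqrt

    def std_normal_cdf(z):
        # Φ(z): cumulative distribution of the standard normal, via the error function.
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    mu, sigma = 200.0, 100.0
    upper = 800                                   # arbitrary cutoff far beyond the right tail

    F = [std_normal_cdf((i - mu) / sigma) for i in range(upper + 1)]
    f = [F[0]] + [F[i] - F[i - 1] for i in range(1, upper + 1)]

    disc_mean = sum(i * p for i, p in enumerate(f))
    disc_var = sum((i - disc_mean) ** 2 * p for i, p in enumerate(f))
    print(disc_mean, sqrt(disc_var))              # roughly 201 and 98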

B.4 SAMPLE STATISTICS

So far, this appendix has discussed concepts such as the mean, standard deviation, and variance in terms of somewhat abstract probability distributions. In practice, we are often interested in calculating the best estimates of these statistics from data. Of course, all statistical software packages provide built-in functions for generating these estimates; for example, the AVERAGE function in Excel estimates the mean of a set of data, and the STDEV.P function computes its standard deviation.

Assume that we have N numerical data observations x1, x2, . . . , xN. Then the mean of the observations is given by

x̄ = (1/N) ∑_{i=1}^{N} xi,

the variance is given by

s² = (1/N) ∑_{i=1}^{N} (xi – x̄)²,

and the standard deviation, by

s = √(s²).

These statistics are often measures of interest in themselves. For example, it may be of interest to know that the mean daily demand for a product over the last month was 140.3, with a standard deviation of 61.2. However, the mean and standard deviation can also be used as estimates of the properties of an underlying distribution; thus, we might model future demand for the product using a normal distribution with mean 140.3 and a standard deviation of 61.2. By the law of large numbers, the sample statistic estimates will approach the “true” values for a population as the number of observations N increases.
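
In Python, the same estimates can be obtained with the standard library’s statistics module, which mirrors the Excel functions mentioned above (the daily demand figures below are invented for illustration):

    from statistics import mean, pstdev, pvariance

    data = [112, 96, 141, 163, 128, 104, 150]   # illustrative daily demand observations

    print(mean(data))        # sample mean (Excel: AVERAGE)
    print(pvariance(data))   # variance with an N denominator
    print(pstdev(data))      # standard deviation (Excel: STDEV.P)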

NOTE

1. Discrete distributions can also be defined with a range that includes negative integers. However, this book considers only distributions over nonnegative integers.
