4. Bayesian Analysis of Historical Methods

As one analyst put it, BT actually explains “what are regarded as sound methodological procedures” and reveals “the infirmities of what are acknowledged as unsound procedures” in almost any empirical field.¹ In other words, Bayes's Theorem underlies all other methodologies and thus explains why certain methods are regarded as sound, and others not—even when advocates or detractors of various methods are unaware of BT's capability in this regard. This entails a testable prediction, that all valid empirical methods reduce to BT: any method you propose will either be logically invalid or it will be described by BT. One might challenge how universally that's true, but here I will demonstrate that it at least holds for historical methods. I'll start with the most widely applicable examples, increasing in degrees of generalization, then test a few common methods of narrower scope.

Years ago I described two historical methods as defining the best that historians have deployed: the Argument to the Best Explanation (ABE) and the Argument from Evidence (AFE).² The literature on historical method and the epistemology of history essentially supports this conclusion, all of it being reducible to one or the other.³ Yet I have since discovered that everything I had argued can be better framed in Bayesian terms, especially since neither the ABE nor the AFE solves the problems I now identify as plaguing every historical method: establishing logical validity and epistemic sufficiency; in other words, why should the conclusion of these arguments be deemed logically valid and when is the evidence enough to warrant belief in the conclusion?

THE ARGUMENT FROM EVIDENCE (AFE)

According to the AFE, there are at least five respected categories of historical evidence, and the more a conclusion has support from each category, the more likely it is to be true. Those categories are:

Physical-Historical Necessity: the degree to which history could not have proceeded as it did had the event(s) not occurred; that is, the degree to which the event is required to account for all subsequent history.

Direct Physical Evidence: archaeological evidence, material evidence, evidence physically produced by the event(s) or person(s) in question.

Unbiased or Counterbiased Corroboration: witnesses who have no known motive to lie or exaggerate, or even a motive to lie or exaggerate in the opposite direction.

Credible Critical Accounts: accounts by known scholars of the period that exhibit the use of a critical analysis and evaluation of multiple lines of evidence (as opposed to just repeating a story).

Eyewitness Accounts: accounts by actual eyewitnesses of the event(s) or person(s) in question.

All these amount to saying “these categories of evidence are unlikely to exist unless the proposed event happened,” which in Bayesian terms means a low P(e|~h.b) relative to a high P(e|h.b). That's how you represent the fact in BT that the evidence (e.g., subsequent history, eyewitness testimony, etc.) is very improbable unless h (the hypothesized event[s]) happened. For example, Julius Caesar's conclusive capture of Rome and his unchallenged firsthand account of crossing the Rubicon would both have been improbable unless Caesar actually crossed the Rubicon. Not impossible, but improbable—certainly less probable than if Caesar had indeed done that. Which is simply a colloquial way of saying P(e|~h.b) is low (because any plausible alternative account of things would render that evidence unlikely) and P(e|h.b) is high (as the event having occurred makes all that evidence very likely indeed). The five categories of evidence in the AFE just represent five different ways evidence can be more probable on h than ~h. And that list of five isn't even exhaustive. Though additions to it are likely to have much less weight (these five being typically the strongest types of evidence to have), there still are other kinds of evidence. Hence, BT is more generic and thus more universal, and all the premises of the AFE are logically included in the premises of BT.

This is confirmed by observing the effect of taking evidence away. If the physical-historical necessity of an event is minimal, then given solely that factor P(e|~h.b) and P(e|h.b) are about equal, because then the evidence (of the subsequent course of history) is as likely to have occurred even if our hypothesis is false. We thus need other evidence. Indeed, if we should expect the subsequent course of history to have been different on h, then the value of that evidence is actually reversed and counts against our hypothesis. Then it is P(e|~h.b) that is high, and P(e|h.b) that is low. Likewise, if there is no physical evidence and if that absence of evidence is just as likely on either h or ~h (such as due to the rarity of evidence surviving, as is frequently the case in ancient history), then again the consequents are equal (or near enough—see my discussion of ‘evidence loss’ in chapter 6, page 219). But if that absence of evidence is unusual on h (as is often the case in more modern history, where we actually expect physical and documentary evidence), then its absence argues against h, and P(e|~h.b) is then higher than P(e|h.b), sometimes very much higher (I'll examine the specific logic of an Argument from Silence later in this chapter, page 117). The same reasoning can be followed through for the other three categories of evidence. In fact, you can use BT to fully analyze the consequences of differing degrees of evidence. For example, just how unbiased or counterbiased a source is may not be so black and white. If it is only slightly unbiased, then the degree by which it will lower P(e|~h.b) will be smaller than if it were an ideally neutral source (such as someone who doesn't even understand the significance of what they are attesting to).⁴
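The reasoning in this paragraph can be checked with a little arithmetic. What follows is only a minimal sketch in Python, assuming the standard form of BT for h against ~h. The function name `posterior` and every number are hypothetical, chosen just to show the direction of each effect, with the prior held at 0.5 so that only the consequents vary.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(h|e.b) by Bayes's Theorem for h against its negation ~h."""
    prior_not_h = 1.0 - prior_h  # the two priors must sum to one
    numerator = prior_h * p_e_given_h
    return numerator / (numerator + prior_not_h * p_e_given_not_h)

# All values are hypothetical, chosen only to show the direction of the effect.
favors_h = posterior(0.5, 0.9, 0.1)   # evidence expected on h, unexpected on ~h
neutral = posterior(0.5, 0.3, 0.3)    # evidence equally likely either way
against_h = posterior(0.5, 0.1, 0.9)  # evidence expected on ~h, unexpected on h

print(f"favors h:  {favors_h:.2f}")
print(f"neutral:   {neutral:.2f}")
print(f"against h: {against_h:.2f}")
```

With these inputs the results come out at about 0.9, 0.5, and 0.1: evidence that is equally likely either way leaves the prior where it was, and evidence that is more expected on ~h pushes the probability of h below its prior.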

In every case, the degree to which the consequents differ from each other will reflect the degree to which the evidence is unexpected on either h or ~h. It should not be difficult to select an a fortiori measure of degree from the canon of probabilities supplied in chapter 3 (and repeated in the appendix, page 286). If you want to be sure whether h is credible, and the evidence does entail P(e|h.b) is higher than P(e|~h.b), then select a degree of difference that is smaller than the one you are certain of. For example, if you are sure the odds must be way lower than one in one hundred that all this evidence would exist if ~h, then select one in one hundred (0.01), or even one in twenty (0.05), as reflecting the degree of difference between the consequents. Thus you might assign P(e|h.b) = 1 and P(e|~h.b) = 0.05. Then if the result you get supports believing h, the true probabilities will support that conclusion even more strongly, since you know any correction of your estimated probabilities can only raise the final epistemic probability that h is true (as explained in chapter 3, page 85).
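To see why an a fortiori estimate is safe, it helps to run the numbers. The sketch below is illustrative only. It assumes equal priors (which has not been established for any particular case), and the name `posterior` and the third value, 0.001, are hypothetical additions. Only P(e|h.b) = 1 and the values 0.05 and 0.01 come from the example above.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(h|e.b) by Bayes's Theorem for h against its negation ~h."""
    prior_not_h = 1.0 - prior_h  # the two priors must sum to one
    numerator = prior_h * p_e_given_h
    return numerator / (numerator + prior_not_h * p_e_given_not_h)

PRIOR = 0.5  # assumed equal priors, for illustration only

# Candidate values for P(e|~h.b), from most to least generous toward ~h.
candidates = [0.05, 0.01, 0.001]
results = [posterior(PRIOR, 1.0, p) for p in candidates]

for p_e_not_h, result in zip(candidates, results):
    print(f"P(e|~h.b) = {p_e_not_h}: P(h|e.b) = {result:.4f}")
```

Each smaller value of P(e|~h.b) yields a higher result, so a conclusion that already holds at 0.05 can only be strengthened if the true value is lower.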

What about prior probability? The AFE as stated implicitly assumes all competing hypotheses are equally likely prior to considering the evidence for them—which limits its validity. For example, the famous rain miracle of Marcus Aurelius only has extant evidence supporting magic or miracle as its explanation and description.⁵ To what extent are we obliged to conclude that those reports are correct, rather than preferring a hypothesis of what happened that explains these reports and the event without recourse to sorcery or meddling gods? The AFE can't answer that question in any logically valid way. But it can answer the question of whether the evidence we have is very likely or unlikely on any given hypothesis, which corresponds to the consequent probabilities in BT. So when the priors are indeed equal, or close enough that the evidence-produced disparity in consequents would easily overwhelm them, the AFE gives intuitively correct results. Thus, BT formally represents the logic of the AFE, and the AFE is only valid insofar as it can be validly represented with BT.
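The role of the prior can be made concrete in the same way. This is a hedged sketch, not an estimate for the rain miracle or any real case: the function name `posterior`, the consequents (1 and 0.05), and both priors are hypothetical values picked to contrast equal priors with a very low one.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(h|e.b) by Bayes's Theorem for h against its negation ~h."""
    prior_not_h = 1.0 - prior_h  # the two priors must sum to one
    numerator = prior_h * p_e_given_h
    return numerator / (numerator + prior_not_h * p_e_given_not_h)

# Same hypothetical evidence in both cases: twenty times likelier on h than on ~h.
P_E_GIVEN_H = 1.0
P_E_GIVEN_NOT_H = 0.05

equal_priors = posterior(0.5, P_E_GIVEN_H, P_E_GIVEN_NOT_H)   # what the AFE tacitly assumes
low_prior = posterior(0.001, P_E_GIVEN_H, P_E_GIVEN_NOT_H)    # h very atypical on background knowledge

print(f"equal priors: {equal_priors:.3f}")
print(f"low prior:    {low_prior:.3f}")
```

On these assumed numbers, the same evidence that makes h about 95 percent probable under equal priors leaves it at roughly 2 percent when its prior is one in a thousand, which is the consideration the AFE by itself leaves out.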

THE ARGUMENT TO THE BEST EXPLANATION (ABE)

According to the ABE, there are five qualities a theory can possess in respect to the evidence, such that the more it fulfills those qualities over any alternative explanation of the same evidence, the more likely it is to be true. Those qualities are (quoting and paraphrasing McCullagh⁶):

Plausibility: the hypothesis must conform to the expectations set by our background knowledge; formally, “it must be implied…by a greater variety of accepted truths than any other, and be implied more strongly than any other; and its probable negation must be implied by fewer beliefs, and implied less strongly than any other.”

Ad Hocness: the hypothesis must rely on the fewest ad hoc assumptions possible to explain the evidence, that is, assumptions for which there is no evidence or established agreement, or things just made up to force the hypothesis to fit; formally, “it must include fewer new suppositions about the past,” and about the nature of man and the world, “which are not already implied to some extent by existing beliefs.”

Explanatory Power: the hypothesis must make the evidence we have very probable; formally, “it must make the observation statements it implies more probable than any other.”

Explanatory Fitness: the hypothesis must not contradict any evidence or well-established beliefs, or at least contradict them much less than any competing theory does (since contradictory evidence can be explained away by various devices, sometimes legitimately; indeed a new result contradicting a prior belief is exactly how we discover a prior belief is false); formally, “when conjoined with accepted truths it must imply fewer observation statements…which are believed to be false.”

Explanatory Scope: the hypothesis must explain more of the evidence we have than any other hypothesis can; formally, “it must imply a greater variety of observation statements” that can be checked against surviving evidence.

This list is essentially just a lay summary of BT. For each criterion, the question hinges on how much h exceeds all the alternatives (which together constitute ~h), which requires a measure of degree, which is by definition mathematical. And such a measure of degree (by which h exceeds ~h) is exactly what BT employs. With the ABE, the end result requires combining all five factors, which can be complicated if competing hypotheses match or exceed each other on different criteria. Yet the ABE provides no means of ascertaining the effect of combining all five criteria. Even in straightforward cases, where h exceeds ~h on every criterion, what degree of belief is warranted by that degree of superiority on each of those five criteria? The ABE alone cannot answer that question. BT can; likewise on complex cases. Hence, BT is superior to the ABE.

In fact, the ABE criteria themselves are just colloquial versions of the premises in BT. Plausibility combined with Ad Hocness is simply a description of prior probability. Hence, prior probability in BT is the combination of ‘Plausibility’ and ‘Ad Hocness’ in the ABE. The more our background evidence renders our theory more typical, the higher its prior. The less typical our background evidence renders our theory, the lower its prior. And requiring new suppositions (about the world and the past) entails an untypical explanation, whereas an explanation fully (and indeed better) supported by background knowledge is thereby, by definition, more typical. BT represents this fact by a difference in priors. And this can only be validly represented when all possible explanations are represented in relative terms to each other, which only a proper application of BT ensures (as in BT the sum of all priors must equal one). Adding ad hoc elements likewise includes the tactic of inventing ‘excuses’ for the evidence not fitting your theory, which also lowers prior probability (as I demonstrated in chapter 3, page 80), which means that not depending on such excuses necessarily raises the prior (by exactly as much as depending on them would have lowered it).

The last three criteria then combine to entail the consequent probabilities in BT. Explanatory Power is almost an exact description of consequent probability. It only lacks reference to the circumstance of a hypothesis entailing a low consequent—which is accomplished by the Criterion of Explanatory Fitness. In fact, those two criteria are obviously two sides of the same measure: one refers to evidence that is very expected on a hypothesis, the other to evidence that is very unexpected on that same hypothesis; the one entailing a high consequent, the other a low one. The only thing missing is the middle possibility: evidence some hypotheses neither predict nor contradict. And that's essentially what Explanatory Scope picks up, by addressing facts a theory makes likely but that another theory makes neither likely nor unlikely. Combine all three, along with the fact that it is stated in each that you are measuring the degree of difference between the tested hypothesis and all its alternatives, and you simply have the difference of consequent probabilities measured in BT.

To see why they're the same, once again we only need examine what happens when evidence is added or taken away. Increasing Explanatory Scope (relative to a competing hypothesis) entails decreasing P(e|~h.b), since not explaining those facts renders them less probable, while explaining them renders them probable, keeping P(e|h.b) high. In contrast, increasing Explanatory Fitness or Power directly entails increasing P(e|h.b), while decreasing either of them entails decreasing it. And insofar as a competing hypothesis itself has a high or low Fitness or Power, P(e|~h.b) is also rendered high or low accordingly; likewise if ~h has the greater Scope, then it's P(e|h.b) that drops instead of rises. Thus, BT describes and in fact legitimizes the ABE. And yet BT is superior to the ABE, by having the precision that guarantees the logical clarity and validity of its results and ensures that ambiguous and unstated assumptions (about measures of relative degree) become clear and stated, and thus open to challenge and thus requiring sounder defense, thereby also ensuring its premises will be more sound than is likely under the vague structure of the ABE.

Thus, BT achieves greater soundness and validity than the ABE. Which reminds us again that any argument against the applicability of BT to history logically entails the same argument against the applicability of the ABE to history, and the AFE as well; and, I predict, all methodologies whatever, insofar as they have any validity to begin with.

THE HYPOTHETICO-DEDUCTIVE METHOD (HDM)

In fact, any valid form of hypothetico-deductive method is described by BT. Even the principle of Ockham's Razor (when validly formulated) follows necessarily from BT.⁷ The Hypothetico-Deductive Method (HDM) is the procedure of forming a hypothesis h, deducing observations that would be made if h is true and observations that would be made if h is false, and making many observations until the probability of h either far exceeds all known alternatives, or drops below all credibility. This exactly describes the BT ratio of consequents (or else the iterative use of BT, on which see chapter 5, page 168). But you can always redesign any alternative hypothesis so that ~h also predicts all the same observations as h. So how do you tell the difference? In practice, if your predictions are good ones (i.e., the evidence that results from many diverse observations normally entails a high ratio of consequents favoring h), then redesigning a new hypothesis so as to “just happen” to make all those same predictions will require an enormous Rube Goldbergesque contraption of additional assumptions, whereas the tested hypothesis is simple and requires few new assumptions; at which point, Ockham's Razor is invoked, and h thus prevails over ~h.
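The iterative use of BT in HDM can be sketched as repeated updating, where each result becomes the prior for the next observation. This is a simplified illustration under stated assumptions: the function names `update` and `iterate` are hypothetical, the starting prior and consequents are invented, and each observation's consequents are treated as unaffected by the earlier observations, which real evidence will not always satisfy.

```python
def update(prior_h, p_e_given_h, p_e_given_not_h):
    """One application of Bayes's Theorem for h against its negation ~h."""
    numerator = prior_h * p_e_given_h
    return numerator / (numerator + (1.0 - prior_h) * p_e_given_not_h)

def iterate(prior_h, observations):
    """Apply BT once per observation, feeding each posterior back in as
    the next prior. Returns the whole sequence of probabilities."""
    p = prior_h
    history = [p]
    for p_e_given_h, p_e_given_not_h in observations:
        p = update(p, p_e_given_h, p_e_given_not_h)
        history.append(p)
    return history

# Five hypothetical observations, each twice as likely on h as on ~h.
observations = [(0.8, 0.4)] * 5
history = iterate(0.5, observations)

for step, p in enumerate(history):
    print(f"after {step} observations: P(h) = {p:.4f}")
```

Here every observation is twice as likely on h as on ~h, so the odds favoring h double at each step, and five such observations carry h from 0.5 to about 0.97.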

This difference between such explanations is described by the logic of prior probability in BT.⁸ Thus, Ockham's Razor is merely a declaration of that fact: the more assumptions you tack onto any h (especially novel assumptions), the lower its prior must be (as I demonstrated in chapter 3, page 80). Because any collection of “coincidences” like that is less typical, and thus less probable, than causes that do not require them (much less so many of them). This follows not merely from the addition of novel assumptions, but also the addition of improbable assumptions. For example, that the CIA might meddle with science experiments is not impossible (in fact such meddling in other affairs is actually attested, as are some of the means and motives necessary to meddle in the same way with scientific research), yet any h that relied on the unverified assumption that the CIA was meddling with your experiment would by virtue of that fact be far less probable than almost any h excluding that assumption, and thus ‘CIA interference’ is usually (and rightly) axed by Ockham's Razor. And that represents the lowering of priors based on our background knowledge regarding what is and isn't typical (the CIA meddling in science experiments isn't typical).
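The way added assumptions lower a prior can be illustrated with a simplified calculation. The sketch below assumes, purely for illustration, that the two hypotheses are otherwise equally plausible and that each extra assumption is independent of the others, so that their probabilities multiply. The function names `relative_weight` and `normalize`, the hypothesis labels, and all the numbers are hypothetical.

```python
from math import prod

def relative_weight(assumption_probs):
    """Relative plausibility of a hypothesis that needs every listed
    assumption to hold (independence is a simplifying assumption)."""
    return prod(assumption_probs)  # prod of an empty list is 1

def normalize(weights):
    """Rescale relative weights so that the priors sum to one."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

weights = {
    "simple h": relative_weight([]),                  # no new assumptions
    "contrived ~h": relative_weight([0.1, 0.2, 0.1]), # three improbable ones
}
priors = normalize(weights)

for name, prior in priors.items():
    print(f"{name}: prior = {prior:.4f}")
```

A hypothesis needing three such assumptions ends up with a prior near 0.002 against about 0.998 for the simple one, and normalizing keeps the priors summing to one as BT requires.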

Thus all scientific methods, which are simply iterations of HDM, are described by BT.⁹ Historical methods are identical to scientific methods in this respect, being just another set of iterations of HDM. In fact, many sciences are historical, for example, geology, cosmology, paleontology, and criminal forensics, all of which explore not merely scientific generalizations but historical particulars, such as when the Big Bang occurred, or how the solar system formed, or exactly when or where a large asteroid struck the earth, or when a volcano erupted and what resulted from it, or what happened to a specific species in a specific historical period, or who committed what crime when. Not even the claim that historians must deal with human thoughts and intentions makes a difference, as these are as much a necessary occupation of psychologists, economists, sociologists, and anthropologists. They are also fundamental to the scientific study of game theory and all of cognitive science. Nor is there any demarcation based on the role of controlled experiments. Much of science does not rely on experiments but primarily involves field observations (e.g., astronomy, zoology, ecology, paleontology), an approach to evidence directly analogous to the historian's (most clearly parallel in the science of archaeology, but “field observations” of the artifacts we call “texts” and “documents” are just as analogous). Conversely, experiments sometimes do have a place in historical methodology.¹⁰ And as noted in chapter 3 (page 47), science is actually dependent on history, just as much as history depends on science.

So there is no qualitative difference. History is thus continuous with science. The difference between them is only quantitative: history must work with much less data, of much less reliability. Therefore, its results have less certainty and less precision. BT even explains this: the data available to science are of such scale and quality as to raise the final epistemic probability of its results to incredible heights (so high, in fact, often no one even bothers to calculate them, nor need they). But in history we are almost always dealing with final probabilities that, however high they may be, nevertheless allow a possibility of being mistaken that isn't negligible—to such a degree, in fact, that scientists would reject comparably uncertain results. But their rejecting such results does not mean those results are not believable, only that they do not obtain a scientific degree of certainty. The demarcation between science and nonscience is not the demarcation between believable and unbelievable conclusions. It is merely the demarcation between conclusions that are, for all intents and purposes, decisively certain (albeit still revisable), and conclusions that are not. But many conclusions can be believed with legitimate confidence without being decisively certain. We meet with such beliefs routinely in journalism, economics, and daily life. So as long as we face this fact and accept history is like that, we can proceed scientifically without pretending to the certainty of scientific results.

FORMAL PROOF OF UNIVERSAL APPLICABILITY

Since BT fully describes HDM without remainder, and HDM is a higher-level generalization of all historical methods, including the AFE and ABE, we could simply conclude here and now that Bayes's Theorem models and describes all valid historical methods. No other method is needed, apart from the endless plethora of techniques that will be required to apply BT to specific cases—of which the AFE and ABE represent highly generalized examples, but examples at even lower levels of generalization could be explored as well (such as the methods of textual criticism, demographics, or stylometrics). All become logically valid only insofar as they conform to BT and thus are better informed when carried out with full awareness of their Bayesian underpinning.

This should already be sufficiently clear by now, but there are always naysayers. For them, I shall establish this conclusion by formal logic.

P1. BT is a logically proven theorem.
P2. No argument is valid that contradicts a logically proven theorem.
C1. Therefore, no argument is valid that contradicts BT.

P1 is an established fact (see note 9 for chapter 3, page 300). P2 is true by definition, that is, what it is to be a logically valid form of argument is to be consistent with formal logic, and all logical proofs are consistent with formal logic, ergo, to be inconsistent with a logical proof is to be inconsistent with formal logic, which entails by definition an invalid argument. Formally, if B = ‘we must accept the sound conclusions of formal logic,’ A = ‘BT is true,’ and C = ‘there is some historical method that is logically valid but contradicts BT,’ then:

P3. If B, then A.
P4. If C, then ~A.
C2. Therefore, if B, then ~C.
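
The validity of this inference can also be checked mechanically by enumerating every combination of truth values. This is just a brute-force sketch in Python (the helper `implies` is a hypothetical name), treating A, B, and C as plain propositions and reading each "if, then" as material implication.

```python
from itertools import product

def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

# Look for any assignment where both premises hold but the conclusion fails.
counterexamples = [
    (a, b, c)
    for a, b, c in product([True, False], repeat=3)
    if implies(b, a)            # P3: if B, then A
    and implies(c, not a)       # P4: if C, then ~A
    and not implies(b, not c)   # negation of C2: if B, then ~C
]

print(counterexamples)  # an empty list means the inference is valid
```

The list is empty because no assignment makes both premises true and the conclusion false.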

This is a logically necessary truth.¹¹ Therefore there can be no valid historical method that contradicts BT. This leaves only two other possibilities: either (a) all valid historical methods are fully modeled and described by BT (and are thereby reducible to BT), or (b) there is at least one valid historical method that does not contradict BT but that nevertheless entails a different epistemic probability than BT for at least one historical claim h. The only way that can be logically possible is if there is something that could be said about the epistemic probability of h that is not said about the epistemic probability of h in BT. Because if BT already says that about h (i.e., if it already contains a premise about the effect of that same fact on the epistemic probability of h), then the only way any method can say anything different (about the effect of that fact on the epistemic probability of h) is by contradicting BT, which we just demonstrated is logically impossible. Any method that did that would have to be logically invalid.

Can that point be proven? Yes. About the epistemic probability of h a method can say all the same things as BT (in which case it must give the same conclusion as BT, or else it is contradicting BT and therefore contradicting logic), or it can say more things than BT, or it can say fewer things than BT. Methods that say less than BT, yet declare an epistemic probability for h, can only be methods that ignore known facts that affect the epistemic probability of h, and such methods are necessarily invalid and must thereby be excluded, which leaves only methods that say more than BT. This does not mean, however, all methods are invalid that only seem to say less than BT but that in fact (implicitly) say all the same things as BT. Just as the AFE is logically valid only if it is allowed to implicitly assume all prior probabilities are equal, so, too, any valid statistical arguments that ignore considerations of prior probability are still implicitly assuming some prior probability, whereas if they are not assuming that, then they are logically invalid.¹² For not assuming the priors are equal entails assuming the priors must be different, and any method that assumes the prior probability of h is different from ~h but does not enter that difference somehow into its calculation of the final probability of h is willfully illogical—at any rate, it thereby directly contradicts BT, which, as proved, is contrary to logic.¹³ So all that's left are methods that say more than BT. But we know of nothing that can be said that would validly affect the epistemic probability of h other than what is already said by the premises in BT. And if a method says nothing different about the probability of a claim being true than is already said by BT, that method can be fully replaced by BT without logical consequence.

Accordingly, I propose the following testable hypothesis:

P5. Anything that can be said about any historical claim h that makes any valid difference to the probability that h is true will either (a) make h more or less likely on considerations of background knowledge alone or (b) make the evidence more or less likely on considerations of the deductive predictions of h given that same background knowledge or (c) make the evidence more or less likely on considerations of the deductive predictions of some other claim (a claim which entails h is false) given that same background knowledge.

Thus to reject my conclusion (that all valid historical methods are reducible to BT) requires providing a counter-example to P5, which I predict no one can do. It's probably impossible, as by definition b and e encompass all data (i.e., the union of those two sets produces the set of all things known), and h and ~h encompass all theories, and BT logically includes every probability entailed by b and e on every theory. For example, P(h|b.e) is identical to P(h|e.b), since the order of a conjunction makes no difference, and is by definition the probability that h is true given all available knowledge (the union of e and b), so there is by definition no other knowledge that can alter that probability. Yet P(h|e.b) follows by logical necessity from P(h|b), P(e|h.b), and P(e|~h.b); therefore no other probability can have any relevance to determining P(h|e.b).

That means the following is true by definition:

P6. Making h more or less likely on considerations of background knowledge alone is the premise P(h|b) in BT; making the evidence more or less likely on considerations of the deductive predictions of h on that same background knowledge is the premise P(e|h.b) in BT; making the evidence more or less likely on considerations of the deductive predictions of some other claim that entails h is false is the premise P(e|~h.b) in BT; any value for P(h|b) entails the value for the premise P(~h|b) in BT; and these exhaust all the premises in BT.

Formally, if C = ‘a valid historical method that contradicts BT,’ D = ‘a valid historical method fully modeled and described by (and thereby reducible to) BT,’ and E = ‘a valid historical method that is consistent with but only partly modeled and described by BT,’ then:

P8. Either C, D, or E. (proper trichotomy)

P9. ~C. (from C2.)

C3. Therefore, either D or E.

P10. If P5 and P6, then ~E.

P11. P5 and P6.

C4. Therefore, ~E.

P12. If ~C and ~E, then only D. (from P8.)

P13. ~C and ~E. (from C2 and C4.)

C5. Therefore, only D.

Therefore, the only valid historical methods that exist are those fully modeled and described by (and thereby reducible to) BT. In other words, no other valid historical methods exist.¹⁴

Indeed, I believe that for any claim h (whether in history or any other subject of knowledge whatever), if h is capable of being true or false, then its probability of being true (given what you happen to know at the time) is exactly and only the probability entailed by BT. Therefore, no one can reject the valid and sound conclusions of BT in any subject of factual inquiry, and no one can claim that BT does not or cannot determine the probability that any h is true (given all present knowledge). The preceding proof already entails this must be true for claims about history, and that is sufficient for my present purpose. But I shall pause to demonstrate the broader thesis as well, as it reinforces the narrower.

The broader thesis follows from an argument I developed in chapter 3 (pages 83–88). Of the four probabilities in BT, one entails another (the two priors always sum to one, so P(~h|b) = 1 − P(h|b)), so there are only three independent statements of probability in BT. For each of those you either know that its value is higher than 0.5 or lower than 0.5, or else so far as you know it is 0.5, because if you don't know whether it's higher or lower, then by definition so far as you know it's as likely as not. If it weren't as likely as not, then by definition that would mean you know it is not 0.5, which entails you know it is either higher or lower. Therefore, for every premise in BT, you always know its probability is either A (0.5), B (higher than 0.5), or C (lower than 0.5), “so far as you know,” and that exhausts all logical possibilities. But no matter what value thereby results for each and every premise in BT (whether A, B, or C), a conclusion necessarily then follows regarding the probability that h is true, given what you are thus claiming to know, even if that probability is flat out 50/50. For example, for any claim h, if you know nothing about what any of these three probabilities for it are, then so far as you know they are all 0.5, which logically entails that the posterior epistemic probability, that is, the probability that h is true “given what you know so far,” is 0.5. And because BT is logically valid, you must always accept its conclusions when you accept its premises. Thus the only way to deny such a conclusion is to affirm different premises (e.g., that one of those three probabilities is not 0.5), in other words, to affirm that you know what those probabilities are (at least well enough to know they aren't the probabilities just affirmed).
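The claim that total ignorance yields a posterior of 0.5 is easy to verify. This is a minimal sketch, assuming the standard form of BT for h against ~h. The name `posterior` and the alternative value 0.25 are hypothetical, used only to show that denying the conclusion requires changing a premise.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(h|e.b) by Bayes's Theorem for h against its negation ~h."""
    prior_not_h = 1.0 - prior_h  # the two priors must sum to one
    numerator = prior_h * p_e_given_h
    return numerator / (numerator + prior_not_h * p_e_given_not_h)

# Knowing nothing: all three independent premises sit at 0.5.
ignorant = posterior(0.5, 0.5, 0.5)

# Affirming a different premise: the evidence is held to be less likely on ~h.
revised = posterior(0.5, 0.5, 0.25)

print(f"all premises at 0.5:   {ignorant:.3f}")
print(f"P(e|~h.b) set to 0.25: {revised:.3f}")
```

With every premise at 0.5 the result is 0.5, and it moves (here to about 0.667) only once one of the premises is affirmed to have a different value.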

Of course by arguing a fortiori you might say something like it's “0.5 or higher,” but even in that case mathematically you are entering a 0.5 in the equation, so it amounts to affirming that the probability is simply 0.5. What you then do with the conclusion is determined by whether you think the “true” probability could be higher or could be lower (the method of arguing a fortiori, which I discussed in chapter 3, page 85). But mathematically you still have to choose to set the limit of your margin of error at 0.5, or above 0.5, or below 0.5. There is no fourth alternative. That some conclusion is then always entailed I demonstrate with a logical flow chart in the appendix (page 286), where BT conclusions are shown to follow even from the simplest trichotomy here proposed, that is, that each premise has a value of either 0.5 or < 0.5 or > 0.5. But the same analysis follows for any degree of precision your knowledge can honestly claim.

For example, if you must admit P(h|b) is > 0.5, then for any arbitrary number above 0.5, for instance n = 0.75, either you believe (A) P(h|b) is n, or (B) P(h|b) is greater than n, or (C) P(h|b) is less than n, or else (D) you can't claim to know anything more than that n is above 0.5. And so on, for any other n. You cannot deny any of these possibilities without affirming one of them, and once one of them is affirmed, a conclusion always necessarily follows from BT as to what the probability is that h is true given what you know. It might still be that or higher, or that or lower, depending on which limit of your margin of error you are defining, but that's still a probability that cannot be rejected without giving in and accepting BT and just affirming different values for its premises.

To be a little more specific, if you affirm “very high confidence” that h would turn out to be true before looking at the specific evidence for h, then you cannot logically believe P(h|b) is 0.75 or less. Because to simultaneously assert “I have a very high confidence that h will turn out to be true” and “I believe h will be true only three out of every four times” is to hold two contradictory beliefs. To believe h will be true only three out of four times is simply not to believe “with very high confidence” that h will turn out to be true. That would be a confidence only somewhat high. Just ask yourself again whether you'd get into a car that had a one in four chance of exploding, and then ask why you would still claim to have a “very high confidence” in the safety of that car. Or if that context is too extreme, ask yourself if you'd bet your career on a result that had a one in four chance of soon being refuted, and then ask why you'd have a “very high confidence” in a result like that. This becomes all the clearer as your certainty increases. If you are certain h will turn out to be false, because you rightly believe it is wildly improbable for h to be true (e.g., as when h = “Caesar rode a winged horse and camped on the moon”), then you cannot believe that P(h|b) is even as high as 0.01 (or 1 percent), much less any higher. This was already evident in chapter 3's opening example of the sun being supernaturally eclipsed for three hours.

It follows that no other method of inference, such as using ordinal or qualitative rankings of confidence absent any reference to probabilities, can supplant BT. It can work alongside it, as a heuristic or simplified way of getting the same result. But you can't get a valid conclusion from any other method and have that conclusion contradict BT. And for every hypothesis, BT entails some conclusion (for whatever state of knowledge you are presently in). Because for any probability in BT (whether the prior or either consequent), you will always have some confident belief that it is at least some value or higher, or some value or lower, and this belief logically requires you to accept the conclusions that follow from that belief, in the very manner BT entails.

Again, for example, if you cannot deny that the prior probability that the sun went out for three hours is “not higher than 0.01,” and you cannot deny that the consequent probability that no one else would notice this is “not higher than 0.01,” and you cannot claim that the consequent probability is anything significantly less than 1 that some sacred writing about Jesus would claim the sun went out if that claim was fabricated, then you simply cannot deny that the epistemic probability that the sun went out is not higher than 0.0001 / (0.0001 + 0.99) ≈ 0.000101, or roughly one hundredth of 1 percent (and is likely much lower). You are logically obligated to agree that this is true, unless and until you can demonstrate any of those underlying probabilities to be different. Resorting to other methods of inference simply cannot extricate you from this obligation. At best they can only confirm the same result, and thus simply corroborate BT.
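Written out in full, and taking the three stated limits as exact values purely for illustration (P(h|b) = 0.01, P(e|h.b) = 0.01, and P(e|~h.b) = 1, so that P(~h|b) = 0.99), the calculation would run as follows:

```latex
P(h \mid e.b)
  = \frac{P(h \mid b)\,P(e \mid h.b)}
         {P(h \mid b)\,P(e \mid h.b) + P({\sim}h \mid b)\,P(e \mid {\sim}h.b)}
  = \frac{0.01 \times 0.01}{(0.01 \times 0.01) + (0.99 \times 1)}
  = \frac{0.0001}{0.9901}
  \approx 0.000101
```

Because the first two inputs are upper limits, the result is itself an upper limit on the probability that the sun went out.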

My conclusion is therefore inescapable. For each of the three premises in BT (the prior and the two consequents), for any claim h, there is always some probability that you will be confident declaring, such that any amount beyond it you would not be confident declaring. That probability is then the only one that will entail a conclusion you can be confident in, because to get a different conclusion out of BT, you must input a different probability, yet if you are inputting a probability you are not confident in, then you cannot be confident in whatever conclusion that probability entails. Because every weakness in an argument's premises always translates to the conclusion. Thus a conclusion that can only be arrived at by affirming premises you are not confident are true is by definition a conclusion you are not confident is true. But for every probability you are confident in, BT entails a conclusion you are logically compelled to be equally confident in.

There is no way around this. Because no matter how ignorant you claim to be, some value always necessarily follows for P(h|e.b), since by definition that is a probability conditional on what you know, and therefore a probability follows even if what you know is nothing (that probability would always be 0.5). So there is always a Bayesian probability that h is true given everything you know. This is due to the key difference between epistemic and physical probabilities as demarcated in chapter 2 (page 24). With regard to physical probabilities, you can legitimately say you simply have no idea what the probability is (and therefore you are not obligated to pick any one of the only logical possibilities available), but you cannot legitimately say this with regard to epistemic probabilities. To say you have no idea, before looking at evidence e, whether h or ~h is true logically entails that for you (i.e., so far as you presently know), h is as likely as ~h (until you see some evidence). Because the only way you can claim to know that h is not as likely as ~h is to claim to know that h is more likely than ~h or that h is less likely than ~h; but you have already affirmed that you do not know either. Therefore, so far as you know, h is as likely as ~h. Which when translated into mathematical notation simply means P(h|b) = 0.5. So even denying there is a probability always entails there is a probability—for you, and given the information you have at that point in time. It follows that you are always in some state of knowledge or ignorance that entails an epistemic probability for any h according to BT, and no other method of inference can validly contradict it. BT therefore underlies all valid empirical reasoning. Or so I contend.

Even if you disagree with that broader thesis, you still cannot deny that BT underlies all valid historical reasoning, as that at least I formally proved earlier. Applying this knowledge now allows us to test the validity of any methodological principle in the study of history. Two major examples are the ‘Argument from Silence’ and what's amusingly called the ‘Smell Test.’ The following analysis of these can serve as a model by which to evaluate any other methodological principle in history.

BAYESIAN ANALYSIS OF THE ‘SMELL TEST’

The ‘Smell Test’ is a common methodological principle in the study of myth, legend, and hagiography. This test can be most simply stated as “if it sounds unbelievable, it probably is.” When we hear tales of talking dogs and flying wizards, we don't take them seriously, even for a moment. We immediately rule them out as fabrications. We usually don't investigate. We don't wait until we can find evidence against the claim. We know right from the start the tale is bogus. Yet the only basis for this judgment is the Smell Test. Is that test valid?

It is certainly ubiquitously accepted by historians in every field. It is suspiciously only rejected by religious believers, and then only when it's applied to amazing claims they prefer to believe. They ground this rejection in the claim that we shouldn't be biased against the supernatural, and God can do anything. Yet if they honestly believed in those principles they would be compelled to concede the miracle claims of every religion “because you shouldn't be biased against the supernatural, and God can do anything.” This includes all the pagan miracles (incredible apparitions of goddesses, mass resurrections of cooked fish, wondrous healings, and teleportations), Muslim miracles (splitting moons, wailing trees, flights to outer space), Buddhist miracles (bilocation, levitation, creating golden ladders with a mere thought), and indeed every and any amazing claim whatever. Tales “proving” reincarnation? We can't reject them—because God can do anything. Ghosts confirming to the living that heaven is run by a Chinese magnate and his staff? We can't rule it out. That would be bias against the supernatural.

Honestly living that way would be impossible. You would have to believe everything you read or hear unless you can specifically present evidence sufficient to discount it: an impossible task. You would be left with a belief system hopelessly frightening and contradictory—and mired in a thousand false beliefs. Such behavior also goes against all established background knowledge, which contains endless examples of miracle claims refuted by fortuitous inquiry (and no good case of any miracle claim surviving such inquiry).¹⁵ In other words, our bias against the supernatural is warranted, just as our bias against the honesty of politicians is warranted: we've caught them being dishonest so many times it would be foolish to implicitly trust anyone in politics. Likewise, amazing tales: we've caught them being fabricated so many times it would be foolish to implicitly trust any of them.

The Smell Test thus represents an intuitive recognition of: (a) the low prior probability of the events described (i.e., P(h|b) << 0.5); (b) the ease with which the evidence could be fabricated (i.e., P(e|~h.b) is always high, unless we have sufficient evidence to the contrary), in fact often the ease with which such an event if real would produce or entail much better evidence (i.e., P(e|h.b) is often low); (c) how typically miracle claims are deliberately positioned in places and times where a reliable verification is impossible (and when such verification is possible, are refuted), which fact alone makes them all inherently suspicious; and (d) sometimes the similarity of a miracle story to other tales told in the same time and culture is additionally suspect, like the odd frequency with which gods in the ancient West rose from the dead, transformed water into wine, or resurrected dead fish, oddities that curiously never occur anymore, and which are so culturally specific as to suggest more obvious origins in storytelling.¹⁶

Both (c) and (d) can raise the consequent probability of nonmiraculous explanations, and also reduce the consequent probability of the miraculous. But condition (a) is the point just made: such claims are contrary to reality as we know it. This doesn't mean only miracles, but all wonders, like implausible coincidences and unrealistic social reactions and behaviors. Hence, the issue is not the presumption that miracles never happen, but the documented fact that, if they happen, they happen exceedingly rarely (just as implausible coincidences and unrealistic human behaviors do), whereas false tales of the fantastic happen with exceeding frequency; and likewise, the fact that miracles suspiciously happen all the time only in historical periods (or geographical regions) that are comparatively illiterate, superstitious, or unenlightened, in conditions lacking the means of verifying no shenanigans were involved (in either the event or its telling), whereas in ages and places where we have widespread education and organized skepticism and the tools and opportunity to test wild claims, the phenomena always disappear. Both are established facts in b (our background knowledge) and thus all our estimates of probability must be conditioned on these facts. Even if you are a firm believer in the miraculous, the facts remain the same: most wondrous claims (by far) are bogus. Your priors must reflect that, regardless of your worldview. And like the Tibetan peasant claim in chapter 3 (page 72), when we lack specific evidence to confirm a claim, or lack the means to verify it by reliable tests, the priors must dictate what is reasonable to believe. That reasoning is both logically valid and sound. Thus, BT confirms that the Smell Test is valid, even on point (a) alone.

Conditions (b), (c), and (d) only strengthen this conclusion. Even outside the context of wondrous claims, ancient texts are full of lies and falsehoods, even when generated by eyewitnesses, contemporaries, and critical historians, or anyone who ought to have known better.¹⁷ Our background knowledge also establishes how easy it is to rapidly fabricate and disseminate false stories, even without challenge (like the darkening of the sun with which we began in chapter 3; and more examples I'll explore in the next volume), and how easy it is for a claimed miracle to entail evidence we curiously don't have. The darkening of the sun predicts a vast quantity of evidence that, by not existing, disconfirms the story. Likewise, the frequency of resurrection stories in antiquity entails a phenomenon that should still be observed with the same frequency, yet is not (except in such mundane ways as to refute any miracle claimed to be analogous—such as from the application of CPR and ordinary cases of misdiagnosed death). Thus, the disappearance of this phenomenon is an unexpected piece of evidence on the theory that any resurrection is real, just as the disappearance of angels and gods who used to descend and deliver speeches with surprising frequency in antiquity is unexpected on the theory that these things ever really happened. It's not impossible that “things just changed,” but it is improbable—because we cannot predict from any established theory that such a change would indeed have happened, much less happened conveniently as soon as we had better methods and means to test such claims (and it is precisely that coincidence that is otherwise very improbable). 
Any logically valid argument must take this improbability into account.¹⁸

Thus, incredible claims can only pass the Smell Test if they have correspondingly strong evidence in their support, which means evidence that is even more improbable on the claim's being false than the claim's being true is already improbable on prior considerations ((a) through (d)). For example, if in all past cases a claim's being true is a tenth as likely as its being false, then to believe that claim we need evidence that's over ten times more unlikely on any other explanation. That is, if P(~h|b) = 10 × P(h|b), then P(e|h.b) must exceed 10 × P(e|~h.b) for h to be credible; and if P(~h|b) = 1,000 × P(h|b), then P(e|h.b) must exceed 1,000 × P(e|~h.b) for h to be credible; and so on. In other words, the Smell Test simply reduces to the principle “extraordinary claims require extraordinary evidence” (see chapter 3, page 72; chapter 5, page 177; and chapter 6, page 253). Which means the Smell Test reduces to BT.
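The threshold can be derived in one step from BT. The derivation assumes that “credible” means a posterior probability above 0.5 (an interpretive assumption, since no exact cutoff is fixed here) and that all the probabilities involved are greater than zero:

```latex
P(h \mid e.b) > 0.5
  \iff P(h \mid b)\,P(e \mid h.b) > P({\sim}h \mid b)\,P(e \mid {\sim}h.b)
  \iff \frac{P(e \mid h.b)}{P(e \mid {\sim}h.b)} > \frac{P({\sim}h \mid b)}{P(h \mid b)}
```

So when P(~h|b) = 10 × P(h|b) the right-hand side is 10, and the evidence must be more than ten times as probable on h as on ~h before h becomes more likely than not.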

BAYESIAN ANALYSIS OF THE ARGUMENT FROM SILENCE

Historians routinely rely on Arguments from Silence: when something isn't said or attested, we conclude it didn't happen. Such reasoning is often challenged with the quip “absence of evidence is not evidence of absence.” But the truth is, absence of evidence is evidence of absence—but only when that evidence is expected. You also sometimes hear the axiom “you can't prove a negative,” but that's also false. Negatives are often quite easy to prove and we prove them all the time. In fact, logically, every positive claim entails a converse negative claim, thus merely in the act of proving a positive we have always proven a negative; often a great number of them.

The question of whether Jesus existed, for example, would be decisively proven in the negative by the recovery of an authenticated letter signed by the Apostle Peter outright saying that Jesus was only a cosmic being whose sojourn on earth was merely a symbolic myth, and who was only known to anyone through mystical perception. And we could have had a great deal more evidence than that—as we do for the ahistoricity of Betty Crocker, for example. Hence, proving a negative in principle is no difficulty. The ahistoricity of Moses and Abraham and all the other patriarchs is now generally accepted by scholars the world over as an established fact, quite rightly, and yet without even need of such a smoking gun as a contemporary epistle declaring them a fiction.¹⁹ But can we validly argue that if Jesus didn't exist we would have such a letter from Peter (or any such evidence), and therefore the fact that we don't argues against the notion? Unfortunately, no, because we have little reason to expect such evidence to have survived for us to now have it. Indeed, there would be no reason for anyone actually to say Jesus didn't walk the earth until someone started saying he did. If that only happened after Peter died, he won't ever have written a letter gainsaying it. Whether we can expect someone to have done so, however, is a question I must ask in the next volume.

For the present, our concern is with when an Argument from Silence is valid and sound—and when it is not. The logical conditions have already been correctly stated:

To be valid, the argument from silence must fulfill two conditions: the writer whose silence is invoked in proof of the non-reality of an alleged fact, would certainly have known about it had it been a fact; [and] knowing it, he would under the circumstances certainly have made mention of it. When these two conditions are fulfilled, the argument from silence proves its point with moral certainty.²⁰

That would be a slam-dunk case. But a relatively weaker deployment is possible, to the extent that either condition is less certain. So it may only be “somewhat certain” that the relevant authors knew the fact and would mention it, in which case this argument can produce only a “somewhat certain” conclusion. Generally speaking, based on the hypothesized fact itself, and in conjunction with everything we know on abundant, reliable evidence, should we expect to have evidence of that fact? If the answer is yes, and yet no such evidence appears, then an Argument from Silence is strong. If the answer is no, then it's weak. Not having more evidence of the sun going out (examined in chapter 3) is a strong Argument from Silence, but not having a letter from the first Apostles explicitly declaring Jesus a fiction is weak. The examples of Caesar shaving or playing dice with a hooker (examined in chapter 2) are weaker still, being exactly what historians have in mind when declaring absence of evidence is not evidence of absence. Yet as the sun case proves, that rule does not always apply.

Once again, BT describes the logic of this argument. If on h we should expect some evidence e₁ given b (all our background knowledge) and yet we don't have e₁, then the consequent probability of h must be reduced, by exactly as much as lacking e₁ is unlikely (because the absence of that evidence is a part of the full e that must be explained by h). This same rule operates on the consequent of ~h as well, if ~h entails evidence we don't have. The tricky bit is the effect b has on this estimate. Our background knowledge establishes a very low expectation for the survival of evidence from antiquity, particularly the kind of evidence we would expect if Jesus didn't exist (I shall discuss the more generic problem of ‘lost evidence’ in chapter 6, page 219). However, that same background knowledge establishes a rather high expectation that the evidence that did survive would possess certain characteristics, and some scholars have argued that the surviving evidence is of a different character entirely. In my next volume I will discuss this oddity and how it might be dealt with. The point to observe here is that the Argument from Silence is a commonly accepted historical tool, and is logically valid precisely and only when it conforms to BT.
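The difference between a strong and a weak Argument from Silence can be sketched numerically. Everything in the sketch is an assumption made for illustration: the function name, the 0.5 prior, the two expectation values, and the simplification that silence is fully expected on ~h (a probability of 1). None of these are estimates argued for in this book.

```python
def posterior_given_silence(prior_h, p_evidence_expected_on_h, p_silence_on_not_h=1.0):
    """Posterior of h when evidence e1, expected on h, is absent.

    p_evidence_expected_on_h -- how probable it is, on h and b, that we
                                would now have e1
    p_silence_on_not_h       -- how probable the same silence is on ~h
    """
    # Lacking e1 is exactly as unlikely on h as having e1 was expected.
    p_silence_on_h = 1.0 - p_evidence_expected_on_h
    numerator = prior_h * p_silence_on_h
    denominator = numerator + (1.0 - prior_h) * p_silence_on_not_h
    if denominator == 0:
        raise ValueError("the silence is impossible on both h and ~h")
    return numerator / denominator


# Strong case: on h we would almost certainly have the evidence, yet we don't.
strong = posterior_given_silence(0.5, 0.99)

# Weak case: on h the evidence was unlikely to exist or survive anyway.
weak = posterior_given_silence(0.5, 0.05)

print(round(strong, 4), round(weak, 4))
```

With these hypothetical inputs the strong case drives the posterior down to roughly 0.01, while the weak case leaves it near 0.49, barely below the prior. Silence counts against h only in proportion to how much the missing evidence was expected.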

What else will Bayes's Theorem teach us about the methods particularly used by Jesus scholars? To that we now turn.
