
CHAPTER FOUR

Conducting Empirical Tests

The world can never be well known by theory: practice is absolutely necessary.

—Lord Chesterfield

In theory there is no difference between theory and practice. In practice there is.

—Yogi Berra

CHAPTERS 2 AND 3 DEVELOPED these central hypotheses: battle outcomes affect war-termination offers; third-party activities affect war-termination offers; severe postwar commitment fears encourage a state to ignore combat outcomes and pursue absolute victory; fears of escalating costs can push a fearful state to accept a limited outcome; if a fearful state sees almost no chance of eventual victory, it will make concessions to end the war; capture of a good that reduces the commitment problem can encourage a belligerent to accept a limited war outcome. This book tests these central hypotheses (the cases did not lend themselves to testing the peacekeeping hypothesis presented in chapter 3). This chapter describes how these hypotheses were tested. The first section describes the principal mode of analysis employed, qualitative case study techniques. The second section explores the possibility of executing quantitative tests, explaining why conducting such tests with satisfactory validity is difficult.

A QUALITATIVE APPROACH

In this book, the dependent variable is the decision of a wartime belligerent at a particular point in a war to demand more concessions, offer more concessions, or make no change in its war-termination offer. A war-termination offer is a belligerent’s proposal to end the war under certain described conditions, such as redrawing an international territorial border or a change in the national leadership of the adversary. Note that there can be several cases (decisions) per belligerent per war. To test the hypotheses, an empirical sample needs to include cases of intrawar behavior in which belligerents made decisions about war-termination behavior, namely whether to raise, lower, or leave unchanged their war-termination offers.

The central empirical approach here is qualitative, the in-depth examination of war-termination behavior in a small number of wars. The specific strategy taken, especially within the individual empirical chapters, is within-case analysis, employing process tracing. Within-case analysis “focuses not on the analysis of variables across cases, but on the causal path in a single case.”¹ Process tracing, “a procedure for identifying steps in a causal process leading to the outcome of a given dependent variable of a particular case in a particular historical context,”² is almost by definition an integral part of within-case analysis.

Process tracing enjoys some important advantages over a pure congruence method, in which the covariation between independent and dependent variables is observed. Establishing universal rules for combat success and combat failure is difficult, because interpreting battle outcomes is highly context-dependent. Sometimes, capturing territory constitutes combat success, and sometimes it does not. The only way to assess with high confidence whether the hypothesized information-proposition relationship holds, namely that combat failure makes concessions more likely, is to assess directly how a particular combat outcome was interpreted by the national political leadership, and then whether and how that interpretation was translated into a change in war-termination demands.

This book analyzes the war-termination behavior of belligerents in several wars. An array of wars and their belligerents have been included in the sample, including the American Civil War, the Korean War, World War II in both Europe and the Pacific, World War I, the Winter War, and the Continuation War. Although this sample is not large or random, it does have several important virtues. First, each war contains several different war-termination decisions, meaning that the size of the sample of decisions exceeds twenty, providing more information with which to test the theory. Second, and relatedly, there is variance in the dependent variable. For the variety of war-termination decisions, in some instances greater concessions were demanded (as after the Union victory at Gettysburg, after the Soviets broke the Mannerheim line in Finland in 1940, and on the U.S. side of the Korean War in September 1950), in some instances greater concessions were offered (as on the Soviet side in the Continuation War, on the Finnish side towards the end of the Winter War, and on the Japanese side in August 1945), and in some instances there was no change in war-termination demands (such as Japan during almost all of the Pacific War, Britain in 1940, and the Confederacy during almost all of the American Civil War). Relatedly, some belligerents maintained limited war aims (such as the Soviet Union in the second half of the Winter War, the U.S. in the Korean War after the Chinese intervention, and Japan during World War II), and some pursued absolute war aims (such as the Allies during World War II, the Soviet Union at the outset of the Winter War, and the U.S. in the Korean War in September 1950).

Third, there is variation in the independent variables, and there is variation in information received on combat capabilities. Belligerents received favorable and unfavorable information from the battlefield, since sometimes belligerents got encouraging information (such as the Union at the end of the Civil War and the Soviet Union in February–March 1940), sometimes belligerents received discouraging information (such as the U.S. in summer 1950, the Soviet Union in 1941, and the Soviet Union in December 1939–January 1940), sometimes the information the belligerent received was expected (such as Japanese successes up through middle 1942), and sometimes it was unexpected (such as the Red Army’s setbacks against Finland in December 1939 and the poor performance of South Korean and American forces in summer 1950). There is also variation in third-party behavior, as sometimes third parties do intervene, which in turn changes belligerents’ calculations (such as China’s intervention in the Korean War in October 1950 and Soviet intervention against Japan in August 1945), sometimes third-party intervention seems possible but never arrives (such as plans for Anglo-French intervention in the Winter War), sometimes there is the hope for long-term third-party intervention that does eventually arrive (British recognition in 1940 that American intervention was not imminent, though possible in the medium or long term), sometimes third parties consider but then dismiss intervention (such as the European powers eventually declining to intervene on behalf of the Confederacy during the Civil War), and sometimes there is no variation in the role of third parties (Germany remained a Finnish ally throughout the Continuation War).

There is variation on some of the commitment-related variables. Generally, there is little consequential variation on the central commitment variable, fear that the adversary might renege on a war-ending deal. The theoretical discussion in chapter 3 explains this minimal variation. Anarchy means that a state is always fearful of the possibility of another state violating the terms of an agreement. The temptation to break an agreement is high, both because of greed and fear. Although trust can be a basis for making cooperation between states possible, belligerents have especially low levels of trust, both because of the ongoing violence between them, and because the initiation of the war likely broke a neutrality or border agreement.

There is variation on other variables which mediate the effect of credible commitment concerns on war-termination behavior. The theory predicts that belligerents’ fears about adversaries breaking postwar agreements should be more intense when the postwar balance of power changes, when first strike advantages might appear, and/or when the preferences of the adversary change, perhaps because of a change in leadership. Some belligerents were fearful that troop demobilizations following war’s end would shift the balance of power and create a temporary window of opportunity for the adversary (the U.S. in 1950 fearing that troop withdrawal would encourage a North Korean reattack). Sometimes belligerents feared shifts in the material balance of power (such as U.S. fears about the growing power of the Communist bloc in 1950, as well as belligerents such as Britain in 1940 and the Union in the Civil War, which worried that the terms of the peace deal would cause a shift in the balance of power; see next paragraph). Some belligerents feared that possible domestic political changes might encourage the adversary to renege on an agreement (the Confederacy feared that laying down its arms and rejoining the Union would tempt Northern politicians to ignore moderate reconstruction terms,³ and Roosevelt in 1942 feared that less than absolute victory might permit a repetition of the collapse of the 1919 Versailles agreement, which was caused by the rise of Hitler and the Nazi Party). Some feared that the adversary might in the future attract a powerful ally (the Soviet Union feared in 1939 that Finland would join Germany or Britain in a future war). Fear existed among some belligerents that in the future conditions favorable to a new war would present themselves, and the adversary would reattack (such as general German paranoia about Britain and France in World War I). This last set of belligerents understood that a change in the balance of power can make war more likely, but also projected that variations in the balance of power are inevitable, and sooner or later an attractive opportunity for a new war will reappear. In other words, fear of an unfavorable shift in the balance of power is endemic, and such belligerents maintained these fears without necessarily deep consideration of the structural determinants of the balance of power (e.g., the comparative growth in each side’s population).

Variance occurred across the cases to the extent that the good itself affected the intensity of the commitment problem. Sometimes belligerents perceived that capture of some increment of the good would reduce the commitment problem (such as German capture of Belgium in World War I and Soviet capture of bits of Finnish territory in the Winter and Continuation Wars), sometimes they feared that making concessions on the good exacerbated the commitment problem (such as Churchill’s fear that a peace deal with Hitler in summer 1940 would require the sacrifice of British naval power, Lincoln’s concerns about the dangers of giving up on the emancipation issue, and American fears in the Korean War that concessions on the prisoners of war issue would swing the balance of power), and sometimes they perceived that the good had no effect on the balance of power (Japan felt confident that abandoning its colonial claims would not affect the balance of power or the likelihood of American compliance with a peace deal).

Variance also occurred in belligerents’ perceptions about the costs of continuing to fight, and the chances for eventual victory if fighting continued. Some fearful belligerents receiving discouraging information still maintained some hope of eventual victory (such as Britain in 1940 hoping for eventual American intervention to turn the tide, the U.S. in 1942 counting on long-term economic mobilization, and Japan up to August 1945 hoping that it could eventually force the U.S. to make concessions if it could inflict on the U.S. a decisive battle defeat), and some fearful belligerents had almost no hope of eventually winning (such as Japan in August 1945 and perhaps the Soviet Union in 1941). Some belligerents feared that the costs of continuing to fight the war and/or pursue absolute victory promised to escalate steeply (such as Japan in August 1945, the Soviet Union in February–March 1940, and the United States after Chinese intervention) and some belligerents saw the costs of continuing to fight the war as being acceptable (such as the U.S. in World War II and the Union in the Civil War). There is virtually no variation in the promise of postwar peacekeeping (with the arguable exception of U.S. discussion from 1951 forward of postwar troop deployments in Korea), so that hypothesis was not tested.

The main hypotheses aside, the cases vary along other potentially relevant dimensions. There are wars with roughly symmetric power relationships (World Wars I and II) and wars with more asymmetric power relationships (Korea, the Winter War, and the Continuation War). The sample includes wars taking place in Asia, Europe, and North America. The wars take place across a wide temporal range, from the mid-nineteenth to the mid-twentieth century. The duration of the wars varies from months (such as the Winter War) to years (such as the Civil War and Korean War). The belligerents include democracies (the U.S., Britain, and Finland), dictatorships (the Soviet Union), and mixed regimes having characteristics of both (Germany in World War I and Japan).

The analysis in each empirical chapter usually focuses on a handful of key decisions in each war, rather than evaluating all war-termination decisions throughout the war. The chapters are designed to test the variety of propositions presented in chapters 2 and 3. One goal is to test the information propositions of chapter 2 as being themselves sufficient explanations of war-termination behavior, since some might consider the information propositions alone to be the conventional wisdom as to how states make war-termination decisions. In service of this goal, many of these key decisions constitute “easy” tests for the information proposition alone. They are episodes in which incoming information was clearly encouraging or discouraging for at least one belligerent, meaning that the independent variables of combat outcomes and expectations about the war’s future course are easily coded, and the information propositions can make clear predictions. If information (such as combat outcomes) is highly favorable, then an approach focusing just on information would strongly predict that the belligerent should demand more concessions. If information is highly unfavorable, then the information-only approach would strongly predict that the belligerent should offer more concessions.

[Table 4.1 not reproduced: image removed by publisher]

Table 4.1 lists the twenty-two cases of belligerent war-termination decision-making analyzed across chapters 6–10.

The case studies explore decision-makers’ perceptions across a number of variables, including commitment fears, perceptions about the current and future course of the war, the likelihood of third-party intervention, and so on. The cases rely on a variety of types of evidence of decision-makers’ perceptions and beliefs, such as public and private statements by the principals themselves. The evidence will be assessed critically, since such evidence suffers from at least three potential flaws. First, individuals sometimes misstate their intentions, either strategically or because of simple error. Franklin Roosevelt falsely declared that the idea of unconditional surrender “popped into his mind” at the January 1943 Casablanca conference. We now know that by January 1943 he had been committed to this idea for at least several months.⁴ Winston Churchill falsely declared in his history of World War II that no one in May 1940 considered negotiating with Adolf Hitler. U.S. government official John Allison mischaracterized the events of the first weeks of the Korean War in his autobiography.

Some might say that diary entries are the best available source of evidence of intentions. Because they are reflections offered immediately, they are also the freshest recollections, undecayed by the passage of time. They are less likely to be aimed at a particular audience, too (in contrast to, for example, a public speech). However, even diary entries have limitations. Sometimes diary entries are more reflective of emotional tides than a considered opinion that actually motivates decisions and policy-making. For example, in two separate diary entries in 1952, Truman fumed that the only way to break the deadlock in the Korean War negotiations with the Communists would be to threaten them with nuclear destruction. He wrote in his diary that the U.S. should demand that the Communists accept peace terms within ten days, and if they did not, “This means all out war. It means that Moscow, St. Petersburg, Mukden, Vladivostok, Peking, Shanghai, Port Arthur, Darien, Odessa, Stalingrad and every manufacturing plant in China and the Soviet Union will be eliminated. This is the final chance for the Soviet Government to decide whether it desires to survive or not.”⁵ Of course, the U.S. never adopted such a policy, nor did Truman ever introduce it for discussion among his advisers.

Second, the documentary record is incomplete, and it is sometimes systematically incomplete. For example, an important question regarding the German decision not to negotiate in 1917–18 concerns whether or not German economic special interests influenced German political and military decision-making. One historian claims that the personal records of a key military staff member who may have had industrial ties were systematically purged of any documents that might have indicated a link.⁶ The official minutes of the 1942 United States committee that first drew up the unconditional surrender demand were merely summary, generally truncating or omitting the arguments made by the committee members in those sessions.⁷ The official minutes of the National Security Council are also quite brief, merely summarizing themes discussed rather than elaborating specific arguments.

Third, scholars themselves sometimes differ in evaluating the same materials. Sometimes differences occur in the translation of foreign language documents into English.⁸ Differences also crop up in how to treat different documentary materials. In examination of Japanese war termination at the end of World War II, Tsuyoshi Hasegawa and Richard Frank differ over how to treat documentary records regarding Hirohito’s statements at the crucial August 15, 1945 conference in which statements were made describing the motivations behind the Japanese decision to surrender.⁹

The empirical tests also consider competing explanations for the observed war-termination decisions. Perhaps the most prominent alternative set of war-termination explanations concerns domestic politics. Chapter 1 presented two different domestic politics explanations: the first, that because democratic leaders are especially casualty-sensitive, they are more likely to make concessions as the costs of war mount; and the second, that mixed regimes are more likely to increase their war aims when they face moderate defeat, in order to increase the likelihood of achieving a victory that will provide sufficient goods to distribute to the leader’s governing coalition, thereby keeping the leader in power. Each chapter will assess one or both of the domestic politics explanations as appropriate, as well as other possibly relevant explanations.

The empirical analysis serves two further functions beyond testing hypotheses. One function is to present and solve specific empirical puzzles. Although external validity and generalizability are leading goals of empirical research, the satisfactory explanation of individual, historically significant episodes is also an important goal. The cases in this book answer some important historical questions, such as: why the United States escalated the Korean War to pursue the conquest of North Korea, a decision that directly led to Chinese involvement and a three-year war costing hundreds of thousands of lives; why the American Civil War dragged on for so long, costing hundreds of thousands of lives and gouging scars in the American social, political, and geographic landscape felt to this day; why Germany in 1917–18 declined the opportunity to end the war and digest its vast gains in the East, choosing instead to fight on, eventually incurring its own decisive defeat; why during World War II even in the darkest moments the Allies refused to concede, and instead pressed on for total victory and the annihilation of humanity’s most dangerous creation, militaristic fascism.

A second function is theory building. Political science has greatly benefited from the spare and fruitful bargaining metaphor as a way of enriching our theoretical speculation about the nature of war. However, these very clean ideas have almost never been applied to the very messy reality of actual war. Empirical tests of these ideas will provide greater insight as to which components of the theory are worth keeping, which appear worth limiting or outright rejecting, and where there is unexplained variance demanding greater theoretical elaboration and development. These issues are given special treatment in the conclusion.

THE DIFFICULTIES OF CONDUCTING QUANTITATIVE TESTS

Although quantitative methods would provide advantages of rigor and external validity, severe internal validity problems prevent their application here. Consider first that there are two possible approaches to building a quantitative dataset appropriate for testing the hypotheses of chapters 2 and 3, assuming one has a predefined dataset on all (interstate) wars such as that produced by the Correlates of War project. The first approach would be to break up each war into separate combat events, like battles. Each case would be a single battle, and testing the central information perspective hypothesis would require, in particular, coding the outcome of each battle, which would then have predicted impacts on war-termination behavior. However, some wars, such as insurgencies, do not lend themselves to being accurately characterized as a string of discrete battles. The appropriate size of a battle would also differ across wars, making it difficult to establish a universal definition of battle size.

An alternative to using battle-level data would be to use time units during war as cases, such as the war-month. In this approach, for each war-month (for example) the independent variables are coded (such as which side achieved greater success in battle) and the dependent variables are coded (how did war-termination offers change, if at all). In a particular war-month, there may be no battles, one battle, or several battles, and a single aggregate coding for each war-month would be produced.
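The war-month design described above can be made concrete with a small sketch. What follows is an illustration under stated assumptions, not the coding scheme of any existing dataset: the names `Battle`, `OfferChange`, and `code_war_months`, the +1/0/-1 battle coding, and the month labels are all hypothetical, and the toy records are invented purely to show the mechanics.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class OfferChange(Enum):
    """Dependent variable: how a belligerent's war-termination offer moved."""
    DEMAND_MORE = "demand more concessions"
    OFFER_MORE = "offer more concessions"
    NO_CHANGE = "no change"


@dataclass(frozen=True)
class Battle:
    month: str    # hypothetical war-month label
    outcome: int  # hypothetical coding: +1 win, 0 draw, -1 loss


def code_war_months(months, battles, offer_changes):
    """Return one row per war-month:
    (month, number of battles, aggregate combat coding, offer change)."""
    outcomes_by_month = defaultdict(list)
    for battle in battles:
        outcomes_by_month[battle.month].append(battle.outcome)

    rows = []
    for month in months:
        outcomes = outcomes_by_month.get(month, [])
        net = sum(outcomes)
        combat = (net > 0) - (net < 0)  # sign of the net outcome: +1, 0, or -1
        change = offer_changes.get(month, OfferChange.NO_CHANGE)
        rows.append((month, len(outcomes), combat, change))
    return rows


# Invented toy records: no battles in m1, one loss in m2, two wins and a loss in m3.
toy_battles = [Battle("m2", -1), Battle("m3", 1), Battle("m3", 1), Battle("m3", -1)]
toy_offers = {"m3": OfferChange.DEMAND_MORE}

for row in code_war_months(["m1", "m2", "m3"], toy_battles, toy_offers):
    print(row)
```

Note one consequence of the single aggregate coding: under this sketch, a month with no battles and a month whose wins and losses cancel out both receive a combat coding of 0, so the aggregation discards information that a leader on the spot would have had.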

Difficulties arise in choosing the time period, for example war-year, war-month, or war-week. The disadvantage of choosing larger time periods (such as a month or year) is that some wars may last less than a single time period, making it impossible to assess whether the flow of battle information changed war-termination behavior. However, using smaller time periods such as days or weeks introduces problems, as daily and weekly combat performance data is generally unavailable for even the best-documented wars. Further, reducing the time slice may make it too easy to falsify the information proposition that combat results cause changes in war-termination offers. Leaders rarely have the organizational capacity to make daily adjustments in their war-termination offers, so the record would reveal lots of variance on the independent variable of combat outcomes, and little variance on the dependent variable of war-termination behavior.

The issue of choosing the unit of analysis aside, testing the information proposition requires coding data on intrawar combat performance. To code outcomes quantitatively, there are three general approaches: coding casualties, coding the gain or loss of territory, and a more subjective coding of who wins each battle. A general problem with the first two approaches is the heterogeneity of wars since sometimes leaders focus more on territorial gains and losses than on casualties (such as Germany and the Soviet Union in World War II), and sometimes they focus more on casualties (such as the U.S. in the Vietnam War). It gets more complex. Among belligerents focusing on casualties, sometimes leaders focus on relative casualty rates (such as the U.S. in the Vietnam War), and sometimes they focus on the absolute casualty rates of their adversary (such as Japan in the Pacific War and North Vietnam). This can mean a battle can be both a tactical defeat and a strategic victory. In the December 1862 Battle of Fredericksburg during the U.S. Civil War, even though the Confederates repelled the assault and suffered half as many casualties as Union forces, Lincoln speculated “that if the same battle were to be fought over again, every day, through a week of days, with the same relative results, the army under [Confederate General Robert E.] Lee would be wiped out to its last man, the Army of the Potomac would still be a mighty host, the war would be over, the Confederacy gone.”¹⁰ The Confederate leadership saw it the same way. Confederate Vice President Alexander Stephens wrote in his memoirs that the Union “could afford to lose any number of battles, with great losses of men, if they could thereby materially thin our ranks. In this way by attrition alone, they would ultimately wear us out.”¹¹

Indeed, leaders often talk about accepting tradeoffs between measurable indices of progress, such as territory, lives, and time, and the key point is that what leaders value (and hence what they consider to be an indicator of success) varies widely across wars. Sometimes a military sacrifices lives for time, as when the British government in May 1940 ordered a force contingent at Calais to fight to the death in order to increase the amount of time that the rest of the British Expeditionary Force would have to evacuate from the beaches of Dunkirk.¹² During the October 1941 battles of Bryansk and Vyazma between the German and Soviet armies, the Germans killed, wounded, or captured nearly a million Soviet soldiers at the cost of far fewer German losses. However, this was in some sense a strategic victory for the Soviet Union, as these human losses bought the Red Army critically needed time to ready the defenses of Moscow.¹³ Sometimes, giving up territory is part of a military’s defensive scheme, such as employing a defense-in-depth strategy to meet a blitzkrieg attack.¹⁴ In 1812, the Russian military welcomed the long French advance towards Moscow, knowing it would be Napoleon’s undoing. General Mikhail Kutuzov remarked that, “Napoleon is a torrent which we are as yet unable to stem. Moscow will be the sponge that will suck him dry.”¹⁵ On an operational level, sometimes an adversary will give up territory temporarily as a feint. Hitler lured France into moving its forces forward into Belgium, making Allied forces vulnerable to encirclement following the German breakthrough in the Ardennes.¹⁶ Making matters even worse, the relative importance of a particular indicator of success may vary not only across wars, but may also vary within wars. In the Korean War, the U.S. (and probably the Communist side as well) shifted in 1951 from a focus on territory to a focus on casualties.¹⁷ Fortunately, these heterogeneity/contextuality problems can be solved with a qualitative approach: appreciation of the specifics of the war and the belligerent’s strategy for victory can indicate what constitutes favorable combat outcomes and what constitutes unfavorable combat outcomes.

These conceptual issues aside, there are severe data availability problems. More objective data on combat outcomes is scarce. Data on territorial control during war is even harder to find, and essentially impossible to collect given the generally thin description in the historical record of the exact location of front lines. Casualty data are very difficult to come by. For example, there are not even official aggregate casualty estimates (never mind casualties on a monthly basis) on the Communist side of the Korean War. Estimates vary widely across Chinese and non-Chinese sources, from 152,000 dead to 3 million dead.¹⁸ In the American Civil War, official battle casualty estimates are about 75 percent missing on the Confederate side.¹⁹

In sum, single indicators of combat success like casualty ratios or territory captured are unlikely to capture the variety of meaning of combat outcomes within and across wars. Another approach might be to code battle outcomes subjectively, classifying them on the basis of a comprehensive view of the historical record as being a win for one side or the other, or a draw. HERO, the only available dataset that provides comprehensive battle-level codings, suffers severe limitations.²⁰ More generally, a substantial amount of judgment is required for the coding of each case, because it is based on the context of belligerent political–military strategy, and the available evidence on leader perceptions. The better strategy is to make these coding decisions transparent in the context of case studies, rather than obscure the thinking behind each individual coding by merely assigning a numerical value to each coding.

In sum, there are simply too many serious problems to permit the widespread application of quantitative analysis techniques for testing the hypotheses in chapters 2 and 3. The main empirical strategy of this book is the use of case studies. The next six chapters apply case study methods to an array of conflicts, fought over more than a century, to help understand how belligerents try to end their wars.
