Quantitative and Econometric Methodologies

Authored by: Govinda Clayton

Routledge Handbook of Civil Wars

Print publication date: February 2014
Online publication date: February 2014

Print ISBN: 9780415622585
eBook ISBN: 9780203105962
Adobe ISBN: 9781136255786




Quantitative research is a form of inquiry based upon the collection and analysis of numerical data. The quantitative method has two principal purposes: to describe the main features of a body of data (descriptive statistics) and to draw conclusions that extend beyond the data being observed (inferential statistics). Both descriptive and inferential statistics have led to significant advances in our understanding of civil conflict. Descriptive analyses of conflict databases have provided insights into the characteristics of individual conflicts, and helped to reveal larger trends in the nature of contemporary violence. For instance, quantitative analysis has provided us with a greater appreciation of the frequency and deadliness of civil conflict, highlighted the geographic distribution of civil strife and illustrated the relatively consistent decline in all forms of violence over the past two millennia (e.g. Lacina and Gleditsch 2005; Pinker 2011; Themnér and Wallensteen 2012). Inferential statistics also play a central role in the civil war research programme. Scholars using econometric tools have uncovered much of what we now know about the onset, duration and outcome of civil war. For example, quantitative methods have been responsible for the widespread consensus that now exists on the conflict-inducing effects of factors such as inequality, low economic opportunity, natural resources, ethnic dominance and political instability (e.g. Cederman et al. 2011; Cederman et al. 2010; Collier and Hoeffler 2004; Fearon and Laitin 2003). Quantitative literature is also at the heart of the key controversies within civil war studies. For example, the “greed vs. grievance” debate is largely a contest between quantitative scholars attempting to highlight the greater significance of economic or socio-political drivers of civil wars.


Providing a review of the burgeoning body of influential statistical studies on civil war is beyond the scope of this chapter, and much of this scholarship is discussed in the collection of thematic chapters later in the volume. Similarly, detailed instruction in how to undertake quantitative analysis is not within the remit of this compendium. Instead this chapter will provide an overview of the quantitative study of civil war, focusing on the development of quantitative conflict studies; the basics of the quantitative method; the prominent sources of civil conflict data; and the strengths and weaknesses of using quantitative methods to analyse civil war.

The emergence of the quantitative method

Quantitative conflict research first emerged in the late 1950s in conjunction with the behaviourist revolution that swept the social sciences. In this period conflict studies began to mature into a fully-fledged academic discipline, bringing together scholars from a diverse range of subjects including economics, politics, history, anthropology and social psychology. This new generation of conflict scholars took conscious steps to model themselves on the natural sciences, embracing the positivist principles 1 of observation, empirical data and measurement. Researchers sought to acquire knowledge through the identification of patterns from within large collections of data, and used mathematical approaches to model social and international processes. To advance this new scientific analysis of conflict, new research tools, concepts and journals were created.

Kenneth Boulding played a pivotal role in the behaviourist revolution, helping to promote a research programme centred on the collection and systematic analysis of data. In 1957 Boulding, along with mathematician-biologist Anatol Rapoport, social psychologist Herbert Kelman, and sociologist Robert Cooley Angell, set up the Journal of Conflict Resolution, which published and promoted conflict literature with a scientific methodology (Boulding 1957). Building on this success Boulding and Rapoport later launched the Peace Science Society, an interdisciplinary effort to develop an individual set of concepts, techniques and data to better understand and mitigate conflict (Ramsbotham et al. 2011). This was quickly followed by the release of the most influential dataset in conflict studies, the Correlates of War project (COW). Originally led by David Singer and Melvin Small (1966, 1970), the first iteration of the COW project provided data on all conflicts from 1816 to 1965. 2 Importantly, the COW data also offered information on a number of explanatory variables, facilitating the statistical analysis of the determinants of violent conflict. At the same time publications such as Lewis Fry Richardson’s (1957) posthumously published The Statistics of Deadly Quarrels achieved widespread attention, helping to promote a variety of quantitative and econometric techniques that were previously uncommon in the study of conflict.

Yet the behaviourist principles advocated by Boulding, Singer and Richardson were not welcomed by all conflict scholars. Critics argued that behaviourists reduced the complexity of the social world to those aspects that could be measured, thus ignoring the wider body of factors driving human behaviour, such as ideas, beliefs, meanings and reasons (Kurki and Wight 2007). This struggle largely divided conflict theorists; on the one hand explanatory conflict scholars sought generalised inferences by codifying and measuring key concepts, while interpretive theorists rejected the generalising approach and instead focused upon the interpretation of the unobservable and immeasurable forms of action using qualitative, discursive and historical analysis (Kurki and Wight 2007). This division was largely seen in relation to geographic boundaries and institutional membership, with North American scholars commonly adopting the behaviourist approach and membership in the Peace Science Society, while scholars outside the United States more often rejected the behaviourist principles and leaned towards the International Peace Research Association (Isard 2001).

Today there remains a large divide between quantitative and non-quantitative scholars, with researchers often defined as much by their methodological approach as by the substantive area of research that they pursue. Some argue that this schism of conflict studies into quantitative–systematic–generalising and qualitative–humanistic–discursive is deepening, for

[a]s the former becomes more sophisticated in the analysis of statistical data (and their work becomes less comprehensible to those who have not studied their techniques), the latter becomes more and more convinced of the irrelevance of such analysis to the seemingly non-replicable and non-generalisable events in which its practitioners are interested (King et al. 1994: 4).

This division extends to all aspects of the discipline, from the form of graduate training programmes offered within different institutions, to the journals in which research using different methodologies is published. This is unfortunate for, as the final section of this chapter discusses, both quantitative and qualitative approaches are often required to generate a full understanding of civil strife.

The basics of the quantitative method

Quantitative research is essentially any form of analysis that utilises numerical data. Most of the concepts relevant to civil war studies – such as inequality and resource dependency – do not naturally assume a numerical value. Therefore the first stage of quantitative research is always to develop a method through which social concepts can be transformed into numerical values. This can involve the researcher devising the methods in which the phenomena of interest can be observed and systematically represented by numbers (the coding procedure), or alternatively drawing upon conventional methods of measurement used in other literature. Creating a dataset has obvious advantages, such as allowing the researcher to tailor the variables and methods of operationalisation to the requirements of a research question. However, creating datasets can involve a significant investment of time and resources, and therefore more commonly researchers rely upon growing collections of pre-existing data sources (see below), merging or altering collections to meet the individual researcher’s needs.

Once in possession of a dataset there is a wide range of analytical techniques available to a researcher. Broadly speaking these techniques fall into two categories: descriptive and inferential statistics.

Descriptive statistics

Descriptive statistics seek to illustrate the distribution of the data by providing simple descriptions of interesting characteristics. This commonly includes frequency tables, measures of central tendency and indicators of the level of dispersion. To describe data in the most parsimonious fashion researchers often rely upon a form of statistical modelling. A statistical model is a mathematical representation of reality which can be used to describe data. For instance, the mean (average) is a statistical model that measures the central tendency of a collection of data. By summing the values in a dataset and dividing this sum by the total number of observations, you can represent the entire dataset using only one number. GDP per capita is an example of this technique, providing an approximation of the mean income of a state’s population.

Statistical modelling of this nature is relatively straightforward and can often be undertaken with only minimal training. Using basic statistical packages such as Excel or SPSS, a researcher can highlight the total number of civil conflicts in a time period, the average duration of these conflicts, and the level of variance across regions.
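The kind of summary described above can also be produced in a few lines of code. The following is a minimal sketch in Python rather than Excel or SPSS. The conflict records, region labels and variable names are invented purely for illustration and are not drawn from any dataset discussed in this chapter.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical records (region, start_year, end_year); invented for illustration.
conflicts = [
    ("Africa", 1991, 2002),
    ("Africa", 1998, 2003),
    ("Asia", 1983, 2009),
    ("Asia", 1996, 2006),
    ("Europe", 1992, 1995),
    ("Americas", 1980, 1991),
]


def duration(start_year, end_year):
    """Duration in calendar years, counting both the first and the last year."""
    return end_year - start_year + 1


durations = [duration(start, end) for _, start, end in conflicts]

total_conflicts = len(conflicts)      # frequency
average_duration = mean(durations)    # central tendency

# Group durations by region, then compare the regional averages.
by_region = defaultdict(list)
for region, start, end in conflicts:
    by_region[region].append(duration(start, end))

mean_by_region = {region: mean(values) for region, values in by_region.items()}

# Dispersion of the regional averages (population variance).
variance_across_regions = pvariance(list(mean_by_region.values()))

print(total_conflicts, average_duration, mean_by_region, variance_across_regions)
```

With real data the same three quantities (a count, a mean and a variance) would be computed over the records of an actual conflict dataset rather than a hand-typed list.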

Descriptive analysis can often illuminate findings that were previously overlooked by researchers. In particular the graphical representation of descriptive statistics can more clearly illuminate important trends when dealing with large bodies of data. Figure 3.1 illustrates this process, representing the evolving frequency of different forms of conflict since the Second World War. The chart simply presents the total number of inter-, intra- and extrasystemic conflicts in the post-war period. In their raw form these data are unwieldy and challenging to appreciate, but when displayed in this manner they clearly highlight the evolving frequencies of different forms of violent conflict.


Figure 3.1   Armed conflict by type, 1946–2011 3

Descriptive statistics can also help to illustrate relationships between two (bivariate) or more (multivariate) variables. This can give an indication of a causal relationship, suggesting cases in which a change in one (independent) variable produces a change in a different (dependent) variable. This form of analysis cannot conclusively demonstrate a causal process, but can often highlight a relationship that is worthy of additional quantitative or qualitative assessment. For example, using descriptive statistics Lacina (2006) demonstrated that secessionist civil conflicts are almost as deadly (in absolute deaths and deaths per year) as non-secessionist conflicts, contradicting the previous belief that secessionist wars are limited on account of their geographically isolated nature. However, she also found that wars of secession induce far fewer deaths per capita, which she argues is the result of their tendency to occur within large, populous states. This finding has motivated a number of subsequent studies into the influence of conflict type, population size and conflict location (e.g. Buhaug 2006, 2010; Raleigh and Hegre 2009).

Inferential statistics

Inferential statistics go further than descriptive analysis, allowing a researcher to make claims beyond those cases that are under investigation. Inferential statistics attempt to make predictions, or inferences, about the wider population, using observations and analyses from a sample (a subset of the population selected for analysis). Put differently, by studying a subset of the population researchers attempt to produce findings that can be generalised to the larger population of which the sample is a part. This is achieved by applying a more sophisticated form of statistical model to the conflict data. Inferential models rest upon a series of probabilistic assumptions concerning the distribution of the data, how the parameters of that distribution change over time, and the dependence of one observation on another. Once a model has been selected an estimator is chosen. An estimator is a function of the sample data that provides an estimate of an unknown parameter. In most cases models can be estimated using a number of different estimators. While a number of the assumptions that justify the selection of an estimator can be assessed using statistical tests, more often theoretical and substantive knowledge are the best guides to model choice. Therefore the art of statistical analysis often lies in the researcher’s ability to select the most appropriate models and estimators in relation to their theory and data.
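The idea of an estimator can be illustrated with a deliberately simple case. The sketch below uses only the Python standard library and treats the sample mean as an estimator of an unknown population mean. The "population" of conflict durations is simulated, and the distribution, sample size and variable names are assumptions made for the example rather than features of any real dataset.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

# Simulated "population": durations (in years) of 2,000 hypothetical conflicts.
population = [random.expovariate(1 / 8) for _ in range(2000)]

# The sample is the subset of cases the researcher actually observes.
sample = random.sample(population, 200)

# Estimator: the sample mean, used to estimate the unknown population mean.
estimate = mean(sample)
standard_error = stdev(sample) / sqrt(len(sample))

# Approximate 95% interval, relying on a normal approximation.
interval = (estimate - 1.96 * standard_error, estimate + 1.96 * standard_error)

print(round(estimate, 2), tuple(round(bound, 2) for bound in interval))
```

The interval expresses the uncertainty involved in generalising from the 200 sampled cases to the full population. More elaborate models follow the same logic with more parameters and stronger assumptions.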

Regression is the most common technique for modelling the relationship between variables. Regression analysis estimates the typical change in the dependent variable that occurs when an independent variable is varied, while at the same time holding constant other variables that could plausibly account for the change in the dependent variable. The broad family of regression-based models offers researchers a wide range of tools for the analysis of all aspects of civil violence. The appropriateness of the different forms of regression models is dependent upon the theoretical assumptions and data. When research is attempting to explain a dichotomous outcome – such as civil war onset or termination – logit and probit models are the most common choice. In the logit model the log odds of the outcome is modelled as a linear combination of the predictor variables. In the probit model, the inverse of the standard normal cumulative distribution function, applied to the probability of the outcome, is modelled as a linear combination of the predictors. When an outcome takes more than two categories in which there is a clear ordering, but in which the space between values is not the same across all levels of the variable – for example, conflict management outcomes – then ordered logit (or ordered probit) models are generally preferred, while multinomial logit models suit outcomes whose categories have no natural ordering. Alternatively when the outcome under evaluation is some form of time interval – such as conflict duration – then a duration or hazard model is often most appropriate. Finally, in those cases in which the sample selection is related to or correlated with the dependent variable – such as mediation onset/outcome – a Heckman or Sartori selection model is required. 4
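The logit and probit specifications described above can be written compactly. The notation is introduced here for illustration and does not appear in the original text: p_i is the probability that the outcome (for example, civil war onset) occurs in case i, x_{i1}, ..., x_{ik} are the predictor variables, the β terms are the coefficients to be estimated, and Φ is the standard normal cumulative distribution function.

```latex
\begin{align}
\text{logit:}  \quad & \ln\!\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} \\
\text{probit:} \quad & \Phi^{-1}(p_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}
\end{align}
```

The two models share the same linear predictor on the right-hand side and differ only in the function that links it to the probability.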

In civil war studies most regression analysis takes place on pooled cross-sectional datasets. This data specification generally has repeated observations (e.g. years) on fixed units (e.g. states). A standard pooled cross-sectional dataset would be formed of cross-sectional data on n states and t time periods, producing n × t observations. For example, a dataset covering 100 states for 68 years (1946–2013) would produce a time series cross-sectional dataset of 6,800 observations.
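The structure of such a dataset can be sketched directly. The example below builds the state-year skeleton for the case described above, assuming 100 units observed annually from 1946 to 2013. The state identifiers are placeholders invented for illustration; a real study would use established country codes and attach covariates to each row.

```python
from itertools import product

# Placeholder unit identifiers (hypothetical, not real country codes).
states = [f"state_{i:03d}" for i in range(1, 101)]   # n = 100 units
years = range(1946, 2014)                            # t = 68 periods, 1946-2013 inclusive

# One row per state-year: the basic unit of observation in pooled analysis.
panel = [{"state": state, "year": year} for state, year in product(states, years)]

n_observations = len(panel)
print(len(states), len(years), n_observations)
```

Each dictionary stands in for one row of the pooled dataset, so the row count is simply the number of states multiplied by the number of years.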

Time series cross-sectional analysis has a number of advantages. First, this approach helps to reduce the small-N problem, which occurs when analysts are faced with a limited number of units (states) and/or a limited number of available data points in a time period (e.g. post-Cold War). This can lead to a problem of too many variables with too few cases, which occurs when a large number of potentially explanatory variables require assessment on a small sample (Landman 2003). 5 Second, time series cross-sectional analysis allows researchers to empirically assess variables that rarely (or never) vary (e.g. the presence of natural resources or institutional structure of a state). By assessing a wider pool of cross-sectional observations (across both time and space) the researcher increases the variability of the data (Hicks 1994: 170–171). Finally, time series cross-sectional analysis allows researchers to assess the variation across two dimensions simultaneously. Rather than assessing the cross-section of cases at one point in time (e.g. all states in 2000), or one country across a distinct time period (e.g. Sierra Leone from 1946 to 2012), the analyst can assess all countries through time (e.g. all states from 1946 to 2012) (Podestà 2006). Using this approach, inferences can therefore be drawn on a wider range of cases. These significant advantages have enabled pooled analysis to assume a central role in the quantitative analysis of civil war.

However, time series cross-sectional analysis also presents a number of problems (for a more detailed account, see Beck and Katz 1995; Hicks 1994). These primarily relate to violations of the standard assumptions about the error term. When multiple observations are generated from the same unit (state), it is likely that the errors in country j at time t are correlated with the errors in country j at time t + 1. Similarly, errors are likely to be correlated within certain subsets of units. For example, regional trends could lead Kenya and Uganda to share common features, and France and Germany to do likewise, while the two pairs remain independent of each other. The problems of serial correlation, temporal dependence, contemporaneous correlation and heteroscedasticity can potentially bias statistical results (Podestà 2006). Increasingly sophisticated methods have been devised to overcome these challenges, but they remain serious obstacles that quantitative researchers using pooled methods must continue to address.
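These problems can be stated more precisely with a little notation, which is introduced here for illustration and is not taken from the sources cited above. Let y_{jt} be the outcome for country j at time t, x_{jt} a vector of explanatory variables, β the coefficients and ε_{jt} the error term of a simple pooled linear model.

```latex
\begin{align}
y_{jt} &= \mathbf{x}_{jt}'\boldsymbol{\beta} + \varepsilon_{jt} \\
\text{standard assumptions:}\quad & \operatorname{Var}(\varepsilon_{jt}) = \sigma^2, \qquad
  \operatorname{Cov}(\varepsilon_{jt}, \varepsilon_{is}) = 0 \ \text{ for } (j,t) \neq (i,s) \\
\text{serial correlation:}\quad & \operatorname{Cov}(\varepsilon_{jt}, \varepsilon_{j,t+1}) \neq 0 \\
\text{contemporaneous correlation:}\quad & \operatorname{Cov}(\varepsilon_{it}, \varepsilon_{jt}) \neq 0 \ \text{ for some } i \neq j \\
\text{heteroscedasticity:}\quad & \operatorname{Var}(\varepsilon_{jt}) = \sigma_j^2, \ \text{ with } \sigma_j^2 \text{ differing across } j
\end{align}
```

Each of the last three lines relaxes one part of the standard assumptions in the second line, which is why estimates and standard errors computed under those assumptions can mislead.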

Data sources

The collection of conflict data is a relatively recent phenomenon. The first systematically collected dataset was not released until 1937, when Pitirim Sorokin (1937) published his three-volume series that quantitatively assessed the temporal and qualitative changes in civilisations, in which the history of warfare was one element. Other early pioneers included Quincy Wright (1942) and Lewis Fry Richardson (1957), who both led the way in the collection of systematic conflict data. Conflict datasets in the form we now know them did not truly emerge until the 1960s with the launch of the Correlates of War project (COW).

The COW dataset has been hugely influential in all forms of conflict studies, and remains one of the most frequently utilised data resources (Eck 2005). The data include all major armed conflicts (involving 1,000 or more battlefield deaths) that have taken place since 1816, including interstate war (conflict involving at least one member of the international system on each side), extrasystemic war (imperial, colonial and internationalised civil war), international war (conflict involving only one member of the international system), civil war (conflicts fought within state borders between a government and non-government force) and inter-communal war (conflict fought between two non-governmental actors). In addition to the original data project, the COW dataverse includes a diverse range of variables (alliances, contiguity, material capabilities and trade) that make the COW data an indispensable resource for a wide range of empirical studies.

The rapid growth in quantitative conflict studies in the post-Cold War era led to a significant increase in the availability of high quality datasets. The Uppsala/PRIO Armed Conflict Dataset 6 (Themnér and Wallensteen 2012) is the most prominent of the new resources, and is now utilised (at least) as frequently as the COW data. The Uppsala/PRIO data is formed on a broader definition of conflict, including all contested incompatibilities between at least two parties (at least one of which is the government) where the use of armed force results in at least 25 battle-related deaths. The Uppsala/PRIO data is therefore different from the COW data in two respects. First, it has a far lower threshold for inclusion, coding conflicts that produce a significantly lower death count. This allows researchers to assess important differences between high intensity large-scale civil war (1,000+ battlefield deaths) and the lower-level conflicts that produce as few as 25 fatalities. Second, the Uppsala/PRIO data requires all conflict to be fought over an incompatibility, either concerning government (e.g. the type of political system, the replacement or change in the composition of the central government), or territory (e.g. the change of the state in control of a certain territory, secession or autonomy).
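The practical consequence of the two fatality thresholds can be sketched as a simple coding rule. The function below is a hypothetical illustration only: it applies the 25 and 1,000 death thresholds mentioned above and deliberately ignores every other inclusion criterion (such as the requirement of a stated incompatibility, the identity of the parties, or the period over which deaths are counted).

```python
UCDP_PRIO_THRESHOLD = 25   # battle-related deaths, as described in the text
COW_THRESHOLD = 1000       # battlefield deaths, as described in the text


def thresholds_met(battle_deaths):
    """Return the datasets whose fatality threshold this death count reaches.

    Simplified illustration: only the fatality criterion is checked.
    """
    met = []
    if battle_deaths >= UCDP_PRIO_THRESHOLD:
        met.append("Uppsala/PRIO")
    if battle_deaths >= COW_THRESHOLD:
        met.append("COW")
    return met


for deaths in (10, 25, 400, 1000):
    print(deaths, thresholds_met(deaths))
```

A conflict with several hundred deaths would therefore appear in an analysis based on the lower threshold but not in one restricted to the higher threshold, which is the difference that allows comparisons between low-level conflict and large-scale war.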

While both the COW and Uppsala/PRIO datasets focus on the existence of conflicts, the International Crisis Behaviour (ICB) project instead provides information on the onset and outcome of international crises. Crisis events do not necessarily imply the use of violence, but can instead result from verbal threats and actions that demonstrate a willingness to use physical force (Öberg et al. 2009). The ICB data contains a rich level of information on crisis situations taking place both between and within states, including precise information on the initiation (trigger), characteristics (e.g. level of violence), management (e.g. type of mediation) and outcome (e.g. tension reduction) of all crises between 1918 and 2007 (see Wilkenfeld and Brecher 2000).

As well as the large data projects housed within major conflict research centres, a number of individuals have also created their own datasets. Some of the most prominent sources focus specifically on civil conflict, most notably those produced by James Fearon (Fearon and Laitin 2003) and Nicholas Sambanis (2000). These resources are based upon individual coding procedures, and therefore complement the larger data projects, offering researchers the opportunity to test the robustness of their findings on a range of different data collections.

In addition to the multiple collections of cross-national data, “micro-level” events data now offers researchers the ability to empirically assess a range of features below the level of the state. Disaggregated geo-referenced events-level data facilitate the assessment of theoretical arguments that account for local-level dynamics. Focusing on subnational or individual levels of analysis has obvious advantages. For example, we now appreciate that civil wars rarely encompass entire states, and that local processes, including the relations between specific groups in limited locales, can often have a fundamental impact on national-level dynamics (Cederman and Gleditsch 2009). Disaggregated data allow researchers to assess the geographical variation of key variables within a state, and thus more accurately assess the local-level causes of civil conflict (Buhaug 2010; Raleigh and Hegre 2009). The two leading events-based datasets are the Uppsala Conflict Data Program Geo-referenced Events Dataset (UCDP GED), and the Armed Conflict Location and Event Data (ACLED). Both data resources capture geographically and temporally disaggregated conflict events, and are continually being developed to cover events within a broader collection of states. 7 The Social Conflict in Africa Database (SCAD) also focuses on events data, but casts a broader net, including forms of social conflict not systematically tracked in other conflict datasets. Other events-based datasets have tended to focus predominantly on violent events in a civil conflict, while SCAD includes detailed information on pre-civil conflict actions, including protests, riots, strikes, inter-communal conflict and government violence against civilians (Salehyan et al. 2012). 8

Complementing the significant advancements in conflict datasets have been the growing collections of data focusing on the correlates of civil conflict. This includes: disaggregated geo-referenced resource data (e.g. diamonds, gemstones, hydrocarbons and narcotic production) (see Lujala, Chapter 10 this volume); geo-referenced measures of ethnic group location (e.g. Weidmann et al. 2010); increasingly sophisticated methods of measuring inequality (e.g. Cederman et al. 2011); a wide range of alternative indices capturing the nature of political regimes (e.g. Gleditsch and Ruggeri 2010; Cheibub et al. 2010); indicators of relative rebel group strength (e.g. Cunningham et al. 2009); geo-coded measures of distance and terrain (e.g. Buhaug 2010); and a range of variables capturing the transnational dynamics of civil violence (e.g. Salehyan and Gleditsch 2006).

While the collection of conflict data has grown rapidly since the end of the Cold War, data focused on the management and resolution of civil war have remained comparatively sparse. Thankfully recent data collection projects have begun to address this imbalance, providing systematic data on a range of variables related to civil war resolution. Most notably the Civil War Mediation (CWM) dataset (DeRouen et al. 2011) and the Managing Intrastate Low-level Conflict (MILC) database (Melander et al. 2009) now offer the opportunity to assess the effectiveness of different forms of third party intervention in a range of contexts. The CWM data builds upon the Uppsala Armed Conflict Termination data (ACT) (Kreutz 2010), providing information on all mediation attempts within conflicts that meet the UCDP/PRIO definition of civil war. The data is organised both by mediation cases and conflict episode, and includes a range of variables relevant to studies of mediation (e.g. actors, timing, strategy). The MILC database is event based, capturing a range of third party activities (e.g. indirect talks, direct talks, use of good offices). The MILC data is the first collection of data which facilitates the systematic study of third party conflict management in low-level armed conflicts. In addition to these two large-scale data projects, a number of other datasets have been produced by scholars such as Patrick Regan (2002) and Isak Svensson (2007), which offer additional opportunities to assess the conditions that facilitate civil war resolution.

The advantages of quantitatively analysing civil war

Analysing civil conflict using quantitative methods has a number of advantages. First, the higher level of abstraction in the specification of concepts allows researchers to analyse aggregate datasets that include the entire population of applicable cases. This scope facilitates stronger inferences (generalisations) and theory building, since empirical relationships can be shown to exist with a greater degree of certainty (Landman 2003: 24). For example, Cederman and colleagues (2011) illustrate the strong correlation between ethnonationalist conflict and horizontal inequalities by analysing spatial wealth estimates for the global population of ethnic group settlement areas.

In contrast, small-N researchers select cases on account of the occurrence (or non-occurrence) of some particular outcome. This can lead qualitative researchers to overstate the strength of a causal relationship, either by selecting cases with the knowledge that both the independent and dependent variables vary in the hypothesised direction, or failing to consider cases that might contradict a theory. Selecting cases on the dependent variable can be an appropriate research method for some purposes, for example when research hopes to identify potential causal mechanisms, or ascertain which variables are not necessary or sufficient conditions for a certain outcome (George and Bennett 2005: 23–24). However, focusing on a small selection of cases seriously restricts researchers’ ability to generalise their findings outside of the cases included in their sample.

Second, quantitative methods allow researchers to isolate the effect of causal forces by controlling for the influence of rival causal explanations. Civil conflict is a complex phenomenon driven by a multitude of factors. To confidently argue for the importance of a causal mechanism researchers must rule out other potentially confounding factors. This is often challenging, if not impossible, within small-N research, in which the number of potentially explanatory variables often eclipses the number of cases being studied. In comparison, the statistical analysis of large datasets offers researchers the ability to hold constant other rival variables that are not the focus of the study. 9 Therefore quantitative scholars commonly increase the validity of their findings by controlling for a range of features that have previously been shown to exert a strong influence on conflict onset, duration or outcome (e.g. population size, GDP per capita, conflict history). For example, Salehyan and Gleditsch (2006) use logit regression to demonstrate the relationship between refugee flows and civil conflict onset. To isolate the influence of refugees the authors control for other potential causes of civil conflict, including regime type, GDP per capita, population and ethnic heterogeneity. Similarly, the quantitative approach allows researchers to assess the strength of one causal process in relation to another. For example, Collier and Hoeffler (2004) demonstrated the important role that “greed” and opportunity play in the onset of civil conflict by controlling for motivational factors that had previously been thought to be the key drivers of civil violence.
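The logic of controlling for a rival explanation can be shown with simulated data. The sketch below is not a replication of any study cited above and uses a linear model rather than logit, purely to keep the arithmetic transparent. All variables, coefficients and names are invented: a confounder drives both the factor of interest and the outcome, and the true effect of the factor is set to 0.5. The controlled estimate is obtained by first removing the confounder's linear influence from both variables and then regressing one set of residuals on the other, which is equivalent to including the confounder as a control in a linear regression.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def slope(x, y):
    """Bivariate least-squares slope of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after removing its linear association with x."""
    b = slope(x, y)
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return [(yi - mean_y) - b * (xi - mean_x) for xi, yi in zip(x, y)]


N = 5000
TRUE_EFFECT = 0.5  # assumed effect of the factor of interest on the outcome

confounder = [random.gauss(0, 1) for _ in range(N)]
factor = [0.8 * z + random.gauss(0, 1) for z in confounder]
outcome = [TRUE_EFFECT * x + 1.0 * z + random.gauss(0, 1)
           for x, z in zip(factor, confounder)]

# Without the control, the factor absorbs part of the confounder's influence.
naive_estimate = slope(factor, outcome)

# With the control: partial out the confounder from both variables first.
controlled_estimate = slope(residuals(confounder, factor),
                            residuals(confounder, outcome))

print(round(naive_estimate, 2), round(controlled_estimate, 2))
```

In this simulation the uncontrolled estimate overstates the effect because the factor and the outcome share a common cause, while the controlled estimate lies close to the value built into the data.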

Finally, quantitative research offers a level of transparency and replicability not often possible in other forms of research. All leading political science journals now operate strict data replication policies. Quantitative publications are therefore accompanied by the release of both the dataset and statistical procedures used to calculate the reported results. 10 This reflects the objective assumptions that underpin quantitative methods, offering other scholars the opportunity to assess and potentially reinterpret key findings. Open access to statistical data and procedures also helps the development of objective knowledge, as researchers can easily build upon existing research.

The challenges associated with quantitatively analysing civil war

One of the most challenging aspects of quantitative analysis is transforming social concepts into numerical values. This difficulty means that many of the variables used to capture theoretical constructs represent crude indicators of the real concept (Gleditsch and Ruggeri 2010). For example, the methods of measurement used to code concepts like power, resource dependence, relative military strength and inequalities are often at best basic approximations of the abstract theoretical idea. As a result, measurement error is probably the rule rather than the exception in quantitative conflict studies, as even the best measures routinely (systematic error) or randomly (random error) capture additional elements that are not directly related to the concept (Call 2012). In a related problem, scholars can also disagree as to how certain variables should be interpreted. For example, while Fearon and Laitin (2003) use GDP per capita and oil export dependence as proxies for state weakness, Collier and Hoeffler (2004) use these same variables to measure economic opportunity. Over the past decade improvements in the quality, coverage and quantity of civil war data have helped to reduce the problems associated with measurement error. Researchers can draw upon a range of cross-national data sources to minimise the likelihood of misguided inferences born of measurement difficulties. Yet despite the significant advancements in both the collection and analysis of conflict data, scholars must remain modest about what their proxies capture and be mindful of the ongoing perils of measurement error.
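One consequence of random measurement error can be shown with a short simulation. The sketch below rests on invented data and on assumptions that are not in the chapter: a linear relationship, a true effect of 2.0, and a proxy equal to the underlying concept plus random noise of equal variance. Under those assumptions the estimated effect of the proxy is pulled towards zero, here to roughly half of the true effect.

```python
import random

def ols_slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

rng = random.Random(7)
n = 5000
true_effect = 2.0  # assumed effect of the underlying concept on the outcome

# The theoretical concept itself, which in practice is never observed directly.
concept = [rng.gauss(0, 1) for _ in range(n)]
outcome = [true_effect * c + rng.gauss(0, 1) for c in concept]

# A crude proxy: the concept plus random error with the same variance.
proxy = [c + rng.gauss(0, 1) for c in concept]

slope_concept = ols_slope(concept, outcome)  # close to the true effect
slope_proxy = ols_slope(proxy, outcome)      # attenuated towards zero

print(f"using the concept: {slope_concept:.2f}, using the proxy: {slope_proxy:.2f}")
```

The example covers only random error in an explanatory variable. Systematic error, which the paragraph above also mentions, can bias estimates in either direction and is not modelled here.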

Quantitative research can also be guilty of overstating questionable statistical relationships, in particular on occasions in which small differences in model specification produce variance in the results reported (for example through the inclusion of controls, linear or non-linear terms). 11 This is particularly pertinent when results are interpreted with little regard for the implied effects of the estimates and model uncertainty (Ward et al. 2010). Civil war research has traditionally evaluated hypotheses on observed (in-sample) data and not considered to what extent existing research provides us with a basis for predicting civil war onset and termination out-of-sample. The focus on hypothesis testing on observed data increases the chances of overfitting, or fitting to idiosyncrasies of the specific sample rather than stable structural relationships between a response and predictors (Clayton and Gleditsch 2014; Ward et al. 2010). Researchers are now increasingly addressing this issue by assessing the validity of statistical results using out-of-sample validation (Clayton and Gleditsch 2014; Gleditsch and Ward 2013), yet it remains an issue of which quantitative researchers must remain mindful.
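The difference between in-sample fit and out-of-sample performance can be sketched as follows. Everything in the example is an assumption made for illustration: the data are simulated, the two "models" are deliberately stylised, and the Brier score is only one of several possible measures of predictive accuracy. The overfitted model simply memorises the training cases, so it fits the observed sample perfectly but predicts held-out cases worse than a much simpler model.

```python
import random

def brier(predictions, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)

def make_data(n, rng):
    """Invented data: onset probability is 0.4 when x > 0 and 0.1 otherwise."""
    rows = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        p = 0.4 if x > 0 else 0.1
        rows.append((x, 1 if rng.random() < p else 0))
    return rows

def fit_simple(train):
    """Two-parameter model: the observed onset rate below and above x = 0."""
    low = [y for x, y in train if x <= 0]
    high = [y for x, y in train if x > 0]
    rate_low = sum(low) / len(low)
    rate_high = sum(high) / len(high)
    return lambda x: rate_high if x > 0 else rate_low

def fit_memoriser(train):
    """Overfitted model: copy the outcome of the nearest training case."""
    def predict(x):
        nearest = min(train, key=lambda case: abs(case[0] - x))
        return float(nearest[1])
    return predict

def score(model, rows):
    return brier([model(x) for x, _ in rows], [y for _, y in rows])

rng = random.Random(42)
train = make_data(500, rng)    # the observed sample used for estimation
holdout = make_data(500, rng)  # fresh cases never used for estimation

simple = fit_simple(train)
memoriser = fit_memoriser(train)

in_sample = {"simple": score(simple, train), "memoriser": score(memoriser, train)}
out_of_sample = {"simple": score(simple, holdout), "memoriser": score(memoriser, holdout)}

print("in-sample Brier scores:", in_sample)
print("out-of-sample Brier scores:", out_of_sample)
```

Lower Brier scores indicate better predictions. Judged on the training data alone the memorising model looks superior; the held-out cases reverse the ranking.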

The current state of conflict data and modelling techniques can also limit the ability of researchers to accurately assess the full spectrum of issues relating to civil violence. First, statistical studies commonly struggle to capture the longer processes of escalation and de-escalation that define civil conflict. Conflict data is generally coded in relation to casualty thresholds, which can fail to capture the continuous chains of interaction in civil war (Florea 2012; Gleditsch 2002). Isolated incidents of civil violence might therefore often be better understood as “mere variation along the escalation–de-escalation continuum within the same conflict” (Florea 2012: 82, original italics). Second, most statistical approaches are built upon the assumption that the regressors are uncorrelated with the error term. This requires that the model account for all variables that both drive the dependent variable and are correlated with the other regressors. It is currently not possible to account for the full range of variables that drive civil conflict. For example, there is no rigorous cross-national data on leaders’ characteristics, the exploitation of certain social groups and the opportunities available to youths. Potentially important variables are therefore omitted from statistical analyses, which can bias the results. Third, econometric studies of civil war must account for the endogenising effect of civil war on other variables. Civil war commonly lowers institutional capacity and reduces economic growth, two of the primary conditions that are consistently shown to motivate civil violence. Scholars have grown more capable of modelling this process (Blomberg and Hess 2002), but still too frequently fail to capture the endogenising effect of civil conflict on other variables (Gates 2002). Finally, civil war is a relatively rare event. In a panel of conflict data many countries therefore experience no civil war, meaning that “country specific indicator variables corresponding to the all-zero countries perfectly predict the zeroes in the outcome variable (no civil war)” (Gates 2002: 22). This problem can be alleviated by using datasets that increase the number of observations (by lowering the death threshold for inclusion) and models that account for this issue (e.g. rare events logit). However, the rare nature of civil conflict can still cause serious problems in a number of econometric models. 12
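The all-zero problem described by Gates can be made concrete with a toy panel. The country labels, years and outcomes below are entirely hypothetical. The sketch identifies the countries that never experience an onset, for which a country-specific indicator would perfectly predict the outcome, and reports the share of observations that carry no information once such indicators are included.

```python
from collections import defaultdict

# Hypothetical country-year panel: (country, year, civil_war_onset).
panel = [
    ("A", 2000, 0), ("A", 2001, 1), ("A", 2002, 0),
    ("B", 2000, 0), ("B", 2001, 0), ("B", 2002, 0),
    ("C", 2000, 0), ("C", 2001, 0), ("C", 2002, 0),
    ("D", 2000, 1), ("D", 2001, 0), ("D", 2002, 0),
]

def all_zero_units(rows):
    """Return the countries whose outcome is zero in every year."""
    totals = defaultdict(int)
    for country, _year, onset in rows:
        totals[country] += onset
    return sorted(c for c, total in totals.items() if total == 0)

# A dummy for each of these countries perfectly predicts "no civil war".
peaceful = all_zero_units(panel)

# Only countries with some variation in the outcome help to estimate
# the effects of the other covariates in a model with country indicators.
informative = [row for row in panel if row[0] not in peaceful]
share_lost = 1 - len(informative) / len(panel)

print("all-zero countries:", peaceful)
print(f"share of observations that are uninformative: {share_lost:.0%}")
```

In real conflict panels the share of all-zero countries is typically large, which is why the choice of dataset and estimator matters.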

Mixed methods research

Combining research methods can help to enhance the validity of both quantitative and qualitative research. Quantitative analysis of conflict data involves the presentation of statistical associations, along with arguments as to why the variation in the independent variable causes the variation in the dependent variable. Theories proposed to explain correlations can be evaluated in terms of their internal consistency and deductive validity, yet in many cases there are multiple consistent stories that can be offered to explain the co-variation between variables. Statistical methods are often unable to untangle competing causal stories or determine causal ordering, and require a deeper analysis to validate a proposed mechanism. In this context case studies can complement statistical analysis, demonstrating the internal validity of a causal process. For example, a researcher can use their statistical analysis to guide case selection for in-depth assessment of the theoretical mechanism and suggest directions for more structured focused comparisons (Lieberman 2005). By examining cases in a greater level of depth than was required to code the statistical data, researchers can illustrate the validity of a statistically supported mechanism, increasing the plausibility of a proposed theory. More generally the combination of methods can help quantitative researchers address measurement issues, assess outliers, discuss variables omitted from the large-N analysis, and examine cases incorrectly predicted by econometric models (Gates 2002).
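One simple way of letting a statistical model guide case selection is to rank cases by how well the model predicts them. The sketch below assumes a model has already produced a predicted probability of civil war for each case; the case labels, probabilities and outcomes are hypothetical, and choosing two cases of each kind is an arbitrary choice. Well-predicted cases are candidates for tracing the proposed mechanism, while poorly predicted cases are candidates for exploring omitted variables or measurement problems.

```python
# Hypothetical cases: (label, predicted probability of civil war, observed outcome).
cases = [
    ("case_1", 0.85, 1),
    ("case_2", 0.10, 0),
    ("case_3", 0.75, 0),
    ("case_4", 0.05, 1),
    ("case_5", 0.60, 1),
    ("case_6", 0.30, 0),
]

def rank_by_fit(rows):
    """Sort cases from best to worst predicted (smallest absolute residual first)."""
    return sorted(rows, key=lambda row: abs(row[2] - row[1]))

ranked = rank_by_fit(cases)

# Cases the model predicts well: suitable for probing the proposed mechanism.
well_predicted = [label for label, _prob, _outcome in ranked[:2]]

# Cases the model predicts badly: suitable for looking for what the model misses.
poorly_predicted = [label for label, _prob, _outcome in ranked[-2:]]

print("well predicted:", well_predicted)
print("poorly predicted:", poorly_predicted)
```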

Similarly, nesting qualitative research within a statistical analysis can offer a more stringent test of hypotheses generated from small-N research (Lieberman 2005). Case-based analysis commonly suffers from two fundamental problems: non-generalisability and selection bias. While the analysis of a limited number of cases is not capable of explaining broader patterns across all civil wars, it can offer a more nuanced description of the chain of causal processes that lead to particular wars. Undertaking a case study prior to a large-N analysis can therefore help to generate testable hypotheses that can be assessed using a statistical approach.

The benefits of mixed methods research designs have been clearly illustrated in a number of prominent studies of civil war (Collier and Sambanis 2005; Doyle and Sambanis 2006; Fortna 2004; Kalyvas 2006). Yet unfortunately the bifurcation of conflict studies into qualitative and quantitative branches makes this practice less common than is desirable. Given the significant advantages associated with mixing research methods, an increased focus upon combining quantitative and qualitative methods would probably improve the strength of both branches of research, and thus represents a potentially fruitful approach for future researchers to pursue.


By identifying empirical regularities and assessing the causal influence of key concepts, quantitative scholars have greatly enhanced our knowledge of all aspects of civil strife. The ever-increasing theoretical and methodological sophistication of quantitative analysis suggests that quantitative methods will continue to lead the way in developing a greater understanding of the forces driving dangerous and destructive civil violence. In particular, the movement towards increasingly disaggregated data sources, out-of-sample statistical validation and potentially mixed methods analysis promises to significantly enhance the already flourishing large-N civil war research programme.


1. Positivism is a theory of science, and most positivists adopt an empiricist epistemology. The empiricist approach is based on the belief that the only real knowledge we can have of the world is based upon the facts humans experience through their senses. Scientific knowledge therefore requires empirical validation; hence positivists privilege observation, empirical data and measurement (Kurki and Wight 2007).

2. The COW data is regularly updated.

3. Figure 3.1 is taken from Themnér and Wallensteen (2012).

4. For a more detailed discussion on the appropriateness of statistical models refer to the literature on statistical methods, for example Long (1997).

5. In more technical terms, the number of explanatory variables exceeds the degrees of freedom required for analysis.

6. The armed conflict dataset was originally developed as part of the Uppsala conflict data programme in the 1980s, but was backdated to 1946 with the help of the Peace Research Institute Oslo (PRIO) in 2001.

7. For a comparison of the coverage and quality of UCDP GED and ACLED see Eck (2012).

8. UCDP GED focuses specifically on violent events. ACLED includes both violent and non-violent events; however, distinguishing between the two is problematic (Eck 2012).

9. Statistical controls are a robust means of ruling out alternative explanations and form a central element of quantitative research. However, including controls without careful consideration of their effects can lead to incorrect inferences. For example, treating substantive variables (antecedents, moderators or mediators) as control variables leads to treating relevant variance as error variance. Furthermore, the indiscriminate use of control variables can increase Type II errors by partialling true variance from the relationships of interest (Becker 2005). To avoid these issues researchers must justify the inclusion of all controls, clearly describe the methods used to measure each control variable and both report and interpret the descriptive and substantive results for all control variables.

10. While quantitative studies now commonly make the data used to generate their results publicly available, this transparency can on occasion be undermined by unclear coding decisions made in creating the dataset.

11. For example, between 2002 and 2006 Paul Collier and his associates published a number of influential papers discussing the determinants of conflict reoccurrence. In this time the authors reported reoccurrence rates of 50 per cent, 44 per cent, 23 per cent and 21 per cent (Call 2012: 61).

12. One method of overcoming this challenge is to utilise a time series cross-sectional dataset. However, as discussed above, this presents a whole range of additional difficulties (e.g. non-independence, unmeasured heterogeneity, endogeneity).


Beck, Nathaniel and Jonathan N. Katz (1995) “What to do (and not to do) with Time-Series Cross Section Data” American Political Science Review 89(3): 634–647.
Becker, Thomas E. (2005) “Potential Problems in the Statistical Control of Variables in Organizational Research: A Qualitative Analysis with Recommendations” Organizational Research Methods 8(3): 274–289.
Blomberg, Brock S. and Gregory D. Hess (2002) “The Temporal Links between Conflict and Economic Activity” Journal of Conflict Resolution 46(1): 74–90.
Boulding, Kenneth (1957) “An Editorial” Journal of Conflict Resolution 1(1): 1–2.
Buhaug, Halvard (2006) “Relative Capability and Rebel Objective in Civil War” Journal of Peace Research 43(6): 691–708.
Buhaug, Halvard (2010) “Dude, Where’s My Conflict? LSG, Relative Strength, and the Location of Civil War” Conflict Management and Peace Science 27(2): 107–128.
Call, Charles T. (2012) Why Peace Fails Washington, DC: Georgetown University Press.
Cederman, Lars-Erik and Kristian Skrede Gleditsch (2009) “Introduction to Special Issue on ‘Disaggregating Civil War’” Journal of Conflict Resolution 53(4): 487–495.
Cederman, Lars-Erik, Nils B. Weidmann and Kristian Skrede Gleditsch (2011) “Horizontal Inequalities and Ethno-Nationalist Civil War: A Global Comparison” American Political Science Review 105(3): 478–495.
Cederman, Lars-Erik, Andreas Wimmer and Brian Min (2010) “Why Do Ethnic Groups Rebel? New Data and Analysis” World Politics 62(1): 87–119.
Cheibub, José Antonio, Jennifer Gandhi and James Raymond Vreeland (2010) “Democracy and Dictatorship Revisited” Public Choice 143(1): 67–101.
Clayton, Govinda and Kristian Skrede Gleditsch (2014) “Will We See Helping Hands? Predicting Civil War Mediation and Likely Success” Conflict Management and Peace Science Forthcoming.
Collier, Paul and Anke Hoeffler (2004) “Greed and Grievance in Civil War” Oxford Economic Papers 56(4): 563–595.
Collier, Paul and Nicholas Sambanis (2005) Understanding Civil War: Evidence and Analysis, Volume 1: Africa Washington, DC: The World Bank.
Cunningham, David E., Kristian Skrede Gleditsch and Idean Salehyan (2009) “It Takes Two: A Dyadic Analysis of Civil War Duration and Outcome” Journal of Conflict Resolution 53(4): 570–597.
DeRouen, Karl, Jr., Jacob Bercovitch and Paulina Pospieszna (2011) “Introducing the new Civil Wars Mediation (CWM) Dataset” Journal of Peace Research 48(5): 663–672.
Doyle, Michael W. and Nicholas Sambanis (2006) Making War and Building Peace Woodstock: Princeton University Press.
Eck, Kristine (2005) “A Beginner’s Guide to Conflict Data: Finding and Using the Right Dataset” UCDP Research Paper Series No. 1, available at: www.pcr.uu.se/research/ucdp/publications/#2005 (accessed 1 February 2013).
Eck, Kristine (2012) “In Data We Trust? A Comparison of the UCDP GED and ACLED Conflict Events Datasets” Cooperation and Conflict 47(1): 124–141.
Fearon, James D. and David Laitin (2003) “Ethnicity, Insurgency, and Civil War” American Political Science Review 97(1): 75–90.
Florea, Adrian (2012) “Where Do We Go from Here? Conceptual, Theoretical, and Methodological Gaps in the Large-N Civil War Research Program” International Studies Review 14(1): 78–98.
Fortna, Page (2004) Peace Time: Cease-Fire Agreements and the Durability of Peace Princeton: Princeton University Press.
Gates, Scott (2002) “Empirically Assessing the Causes of Civil War” paper presented at the annual meeting of the International Studies Association, New Orleans, 24–27 March.
George, Alexander L. and Andrew Bennett (2005) Case Studies and Theory Development in the Social Sciences Cambridge: Belfer Center for Science and International Affairs.
Gleditsch, Kristian Skrede (2002) All Politics is Local: The Diffusion of Conflict, Integration and Democratization Ann Arbor: University of Michigan Press.
Gleditsch, Kristian Skrede and Andrea Ruggeri (2010) “Political Opportunity Structures, Democracy, and Civil War” Journal of Peace Research 47(3): 299–310.
Gleditsch, Kristian Skrede and Michael D. Ward (2013) “Forecasting is Difficult, Especially about the Future: Using Contentious Issues to Forecast Interstate Disputes” Journal of Peace Research 50(1): 17–31.
Hicks, Alexander M. (1994) “Introduction to Pooling” in Thomas Janoski and Alexander M. Hicks (eds) The Comparative Political Economy of the Welfare State Cambridge: Cambridge University Press.
Isard, Walter (2001) “Historical Material on the Formative and Early Years of the Peace Science Society (International)” Peace Economics, Peace Science and Public Policy 7(1): 1–21.
Kalyvas, Stathis (2006) The Logic of Violence in Civil War New York: Cambridge University Press.
King, Gary, Robert O. Keohane and Sidney Verba (1994) Designing Social Inquiry: Scientific Inference in Qualitative Research Princeton: Princeton University Press.
Kreutz, Joakim (2010) “How and When Armed Conflicts End: Introducing the UCDP Conflict Termination Dataset” Journal of Peace Research 47(2): 243–250.
Kurki, Milja and Colin Wight (2007) “International Relations and Social Science” in Tim Dunne, Milja Kurki and Steve Smith (eds) International Relations Theories: Discipline and Diversity Oxford: Oxford University Press.
Lacina, Bethany (2006) “Explaining the Severity of Civil Wars” Journal of Conflict Resolution 50(2): 276–289.
Lacina, Bethany and Nils Petter Gleditsch (2005) “Monitoring Trends in Global Combat: A New Dataset of Battle Deaths” European Journal of Population 21(2/3): 145–166.
Landman, Todd (2003) Issues and Methods in Comparative Politics: An Introduction London: Routledge.
Lieberman, Evan S. (2005) “Nested Analysis as a Mixed-Method Strategy for Comparative Research” American Political Science Review 99(3): 435–452.
Long, J. Scott (1997) Regression Models for Categorical and Limited Dependent Variables London: Sage.
Melander, Erik, Frida Möller and Magnus Öberg (2009) “Managing Intrastate Low-Intensity Armed Conflict 1993–2004: A New Dataset” International Interactions 35(1): 58–85.
Öberg, Magnus, Frida Möller and Peter Wallensteen (2009) “Early Conflict Prevention in Ethnic Crises, 1990–98: A New Dataset” Conflict Management and Peace Science 26(1): 67–91.
Pinker, Steven (2011) The Better Angels of Our Nature: A History of Violence and Humanity St Ives: Penguin Books.
Podestà, Federico (2006) “Comparing Time Series Cross-Section Model Specifications: The Case of Welfare State Development” Quality and Quantity 40(4): 539–559.
Raleigh, Clionadh and Havard Hegre (2009) “Population Size, Concentration, and Civil War: A Geographically Disaggregated Analysis” Political Geography 28(4): 224–238.
Ramsbotham, Oliver, Tom Woodhouse and Hugh Miall (2011) Contemporary Conflict Resolution 3rd edition, Cambridge: Polity.
Regan, Patrick (2002) “Third Party Interventions and the Duration of Intrastate Conflict” Journal of Conflict Resolution 46(1): 55–73.
Richardson, Lewis Fry (1957) Statistics of Deadly Quarrels Pittsburgh: Boxwood Press.
Salehyan, Idean, Cullen S. Hendrix, Christina Case, Christopher Linebarger, Emily Stull and Jennifer Williams (2012) “The Social Conflict in Africa Database: New Data and Applications” International Interactions 38(4): 503–511.
Salehyan, Idean and Kristian Skrede Gleditsch (2006) “Refugees and the Spread of Civil War” International Organization 60(2): 335–366.
Sambanis, Nicholas (2000) “Partition as a Solution to Ethnic War: An Empirical Critique of the Theoretical Literature” World Politics 52(4): 437–483.
Singer, David J. and Melvin Small (1966) “The Composition and Status Ordering of the International System: 1815–1940” World Politics 18(2): 236–282.
Singer, David J. and Melvin Small (1970) “Patterns in International Warfare, 1816–1965” Annals of American Academy of Political and Social Science 391: 145–155.
Sorokin, Pitirim (1937) Social and Cultural Dynamics: A Study of Change in Major System of Art, Truth, Ethics, Law, and Social Relationships New York: Porter Sargent.
Svensson, Isak (2007) “Bias, Bargaining, and Peace Brokers: How Rebels Commit to Peace” Journal of Peace Research 44(2): 177–194.
Themnér, Lotta and Peter Wallensteen (2012) “Armed Conflict, 1946–2011” Journal of Peace Research 49(4): 565–575.
Ward, Michael D., Brian D. Greenhill and Kristin M. Bakke (2010) “The Perils of Policy by p-value: Predicting Civil Conflicts” Journal of Peace Research 47(4): 363–375.
Weidmann, Nils B., Jan Ketil Rød and Lars-Erik Cederman (2010) “Representing Ethnic Groups in Space: A New Dataset” Journal of Peace Research 47(4): 491–499.
Wilkenfeld, Jonathan and Michael Brecher (2000) “Interstate Crises and Violence: Twentieth-Century Findings” in Manus I. Midlarsky (ed.) Handbook of War Studies II Ann Arbor: University of Michigan Press.
Wright, Quincy (1942) A Study of War Chicago: University of Chicago Press.