Introducing SpatialGridBuilder: A new system for creating geo-coded datasets

Researchers in the conflict research community have become increasingly aware that we can no longer depend on state-aggregated data. Numerous factors at the substate level affect the nature of human interactions, so if we really want to understand conflict, we need to find more appropriate units of analysis. However, while many conflict researchers have realized this, actually taking the next step and performing data analysis on spatial data grids has remained a rather elusive goal for many because of the difficulty of learning the new techniques to perform such analyses. This paper introduces SpatialGridBuilder, a new, freely available, open-source system with the goal of empowering conflict researchers with no background in GIS methods to start their own spatial analyses. SpatialGridBuilder allows the researcher to: (a) create entirely new spatial datasets, based on the needs of their own research; (b) import their own spatial data; (c) easily add a range of important variables to the datasets, including commonly used conflict variables, plus new variables that have not been presented before; and (d) visualize graphical renderings of this data. Having done this, SpatialGridBuilder will then export the dataset for the researcher to analyse using conventional statistical methods. This article introduces the new program, and demonstrates how it can be used to set up such a statistical analysis. It also shows how different results can be achieved by building grids of different resolutions, thereby encouraging researchers to choose grid resolutions appropriate to their research questions and data. The article also introduces a novel means of determining infrastructure complexity, using Google maps.

Introduction
In recent years, conflict researchers have moved away from the state as a unit of analysis, looking instead for more appropriate units.1 One recent expression of this disaggregation is the creation of the PRIO-GRID (Tollefsen, Strand and Buhaug 2012). Yet while many conflict researchers have managed to use Geographic Information Systems (GIS) methods successfully, there are many more who want to, but have not done so because of the steep learning curve in methods and software. A new generation of researchers looking at issues such as improvised explosive devices (IEDs), rebel bases, drug cultivation, mineral/fossil fuel reserves and climate change are gathering data but have difficulty building them into a geo-spatial framework.
To address this and to encourage conflict researchers to start using spatial data, this paper introduces SpatialGridBuilder, a freely available, open-source system for creating spatial data sets. The program can be downloaded from http://sf.net/projects/spatialgridbuilder/. The program uses a graphical user interface (see Figure 1), runs on MS Windows and is currently being ported to Linux and Mac OS. One of the inspirations for SpatialGridBuilder is Bennett and Stam's (2000) EUGene (Expected Utility Generation and Data Management) program. SpatialGridBuilder is quite different to EUGene in many ways: it is not motivated by expected utility theory, and it is not primarily focused on dyads. But in some ways, SpatialGridBuilder can be seen as a spatial version of EUGene. Its over-arching aim is to help the researcher to create a new data set, based on both built-in variables and, more importantly, researcher-provided variables. SpatialGridBuilder gives the researcher the tools to create a spatial data set precisely tailored to the needs of the research question, at the macro or micro level. Like EUGene, SpatialGridBuilder does not attempt to analyse your data for you. Instead, it helps the researcher to create a data set which can then be analysed using conventional statistical software, using all of the methods with which researchers are already familiar.
Figure 1 goes about here

Creating a new spatial data grid
A spatial data grid can be conceived of as a sheet of graph paper: a page full of empty boxes which can be given x and y coordinates. All spatial data grids have a resolution, which is commonly measured in degrees per grid cell. For the PRIO-GRID, this resolution is 0.5 degrees per grid cell, which for most analyses is sufficiently granular. Indeed, when a data set covers the entire world, a grid of 0.5 degrees per grid cell can become computationally slow to process, especially when many variables are involved, or when the data involved are time variant. However, in some cases, such as those involving small states or confined regions, 0.5 degrees per cell can be too coarse: at the equator, a half degree grid cell covers about 3063 km². SpatialGridBuilder allows the researcher to decide the resolution and spatial confines of the grid. This is, in fact, the first decision the researcher must make when creating a new grid: what the resolution is, and which parts of the world are of interest. For instance, while a researcher looking at all of Africa may choose to use 0.5 degrees as the grid resolution, a researcher looking at East Timor may choose to use more granular grid cells, of perhaps 0.1 or 0.05 degrees per grid cell. SpatialGridBuilder allows the user to choose any resolution they like, and experiment with different resolutions.2 To illustrate how to go about creating a new data set and applying variables to it, the example of a researcher interested in China will be used.

Land mask
Figure 2 goes about here
When a researcher first creates a new grid, they are essentially creating a series of empty boxes: aside from an ID number, the latitude and longitude, plus the area of the grid cell (as this decreases with distance from the equator3), the grid contains no information at all. So, as has been mentioned, we will assume that our researcher is interested in China. To begin, it is necessary to determine China's 'bounding box': the coordinates within which China would fit (see Figure 2). By looking at a map or atlas of Asia, the researcher could determine that a box which runs from 55 degrees north of the equator to 15 degrees north of the equator, and from 70 degrees east to 140 degrees east of Greenwich, would neatly contain China. The researcher then needs to choose a resolution. Because we are not dealing with the whole world, it is possible to select a higher resolution than that used by the PRIO-GRID. However, China is a big country, so it would be unwise to use too high a resolution. Accordingly, the researcher chooses 0.1 degrees per grid cell. By entering these details into SpatialGridBuilder, it is possible to create an empty grid as is shown in Figure 3.
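The grid-creation step can be sketched in a few lines of Python. This is a minimal illustration rather than SpatialGridBuilder's own code: the function names `grid_shape` and `build_grid` are hypothetical, and the sketch assumes that each cell is identified by its centre point and that ID numbers run row by row from the north-west corner.

```python
def grid_shape(north, south, west, east, resolution):
    """Number of rows and columns needed to tile the bounding box."""
    n_rows = round((north - south) / resolution)
    n_cols = round((east - west) / resolution)
    return n_rows, n_cols


def build_grid(north, south, west, east, resolution):
    """Return one (id, lat, lon) tuple per cell, using cell-centre coordinates.

    Assumption: IDs start at 0 in the north-west corner and increase row by row.
    """
    n_rows, n_cols = grid_shape(north, south, west, east, resolution)
    cells = []
    for row in range(n_rows):
        lat = north - (row + 0.5) * resolution
        for col in range(n_cols):
            lon = west + (col + 0.5) * resolution
            cells.append((row * n_cols + col, lat, lon))
    return cells


# The China bounding box from the text at 0.1 degrees per cell.
rows, cols = grid_shape(55, 15, 70, 140, 0.1)
print(rows, cols, rows * cols)
```

Under these assumptions the China box yields 400 rows by 700 columns, or 280,000 empty cells, which gives a sense of why a much finer resolution would quickly become unwieldy for a country of this size.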

Figure 3 goes about here
At this stage, the grid is of little value. One of the first things we need, then, is a binary indicator of whether each cell is on land or not (in most instances, we will only be interested in land cells, although some researchers may be interested in naval conflict, or in exclusive economic zones (EEZs): as will be shown later, SpatialGridBuilder provides this option). The researcher can use SpatialGridBuilder to populate the grid with a land mask dummy variable, as is visualised in Figure 4.
2 More grid cells are not always better: sometimes, this just increases the number of observations, which may not be appropriate. However, if you are looking at a small region and have data which have been coded to a precise level, it is usually better to build a higher resolution grid.
3 For instance, a one degree grid cell at the equator covers an area of approximately 110 km east to west by 110 km north to south (12,100 km²); whereas, at the poles, the north-south extent would remain 110 km, but the east-west extent would be zero. Somewhere in between, such as a one degree grid cell centred on Oslo, the north-south extent would still be 110 km, but the east-west extent would be 56 km (6,160 km²).
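The shrinking of cell area with latitude can be reproduced with the standard formula for the area of a latitude-longitude rectangle on a sphere. The sketch below is an illustration only: it assumes a spherical Earth with a mean radius of 6371 km, and the function name `cell_area_km2` is hypothetical, so its results differ slightly from the rounded figures quoted in the footnote.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def cell_area_km2(centre_lat, resolution):
    """Area of a resolution x resolution degree cell centred at centre_lat."""
    top = math.radians(centre_lat + resolution / 2)
    bottom = math.radians(centre_lat - resolution / 2)
    width = math.radians(resolution)
    return EARTH_RADIUS_KM ** 2 * width * (math.sin(top) - math.sin(bottom))


# Equator, roughly the latitude of Oslo, and the last cell before the pole.
for lat in (0.0, 59.9, 89.5):
    print(lat, round(cell_area_km2(lat, 1.0)))
```

A one degree cell at the latitude of Oslo comes out at about half the area of an equatorial cell, and the cell nearest the pole at under two per cent of it, in line with the pattern the footnote describes.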
Figure 4 goes about here

State masks
Even after introducing a land mask binary, the spatial grid may still contain cells that the researcher is not interested in. In the example above, while the bounding box has been drawn around China, the grid still contains cells from many other countries.4 To solve this problem, SpatialGridBuilder includes the option of adding state mask binaries. The researcher can select China, which will add a new dummy to the data grid identifying all of the grid cells in Chinese territory, marked up by the researcher's choice of identifier: Gleditsch-Ward (1999), Correlates of War (2011), or ISO 3166 (see Figure 5). The state maps are based on Weidmann, Kuse and Gleditsch's (2010) CShapes library.
Figure 5 goes about here

State identifiers
As well as providing state dummy variables with which to identify a single state, the program allows the researcher to add an additional variable which will identify all states, as is shown in Table ??. Again, these are based on Weidmann, Kuse and Gleditsch (2010) and can be identified using Gleditsch-Ward (1999), Correlates of War (2011), ISO or plain text state names.
Table ?? goes about here
Grid cells in which there is no state (i.e., the sea, or territories beyond the writ of Westphalia) are coded as 'NA'.

Importing new data
Now that the basic grid is ready, with dummy variables to determine whether each cell is on land and, if so, which state it is in, the researcher can import their own data. SpatialGridBuilder offers four ways in which to do this: point data, radial point data, polygon data and raster data.

Using your own data 1: polygon data
Let us assume that our China researcher is looking at levels of pollution throughout the country. The researcher finds an ESRI shapefile showing high fluoride pollution sources (USGS, n.d.). SpatialGridBuilder allows the researcher to import shapefiles and apply them to the grid (see Figure 6). It does this by determining whether the centre point of each grid cell falls within the polygons in the shapefile.
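The centre-point test can be illustrated with the classic ray-casting (even-odd) rule. This is a sketch of one common way to perform such a test, not necessarily the method the program uses; it assumes the shapefile has already been read into simple lists of (lon, lat) vertices, and the names `point_in_polygon` and `polygon_mask` are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Even-odd ray casting: True if (lon, lat) lies inside the polygon.

    polygon is a list of (lon, lat) vertices; the last vertex is joined
    back to the first automatically.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def polygon_mask(cells, polygons):
    """1 for each (id, lat, lon) cell whose centre is in any polygon, else 0."""
    return [
        1 if any(point_in_polygon(lon, lat, p) for p in polygons) else 0
        for _, lat, lon in cells
    ]


# A made-up square polygon and two cell centres, one inside and one outside.
square = [(100.0, 30.0), (102.0, 30.0), (102.0, 32.0), (100.0, 32.0)]
print(polygon_mask([(0, 31.0, 101.0), (1, 31.0, 103.0)], [square]))
```

Because only the centre point is tested, a polygon that clips the corner of a cell without covering its centre leaves that cell coded 0, which is worth bearing in mind when choosing the grid resolution.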

Using your own data 2: point data
Let us now assume that our China researcher wants to see if there is some relationship between pollution and airports. The researcher takes the list of latitudes and longitudes of 9000 airports developed by Cazemier and van der Molen (n.d.) and imports them to SpatialGridBuilder in a simple .csv file. Once imported, these can be visually represented as in Figure 7.
Figure 7 goes about here
Again, SpatialGridBuilder will not tell the researcher whether there is a relationship between airports and pollution in China: this is where conventional statistical software should be used. But SpatialGridBuilder will help the researcher to create the data set which can then be imported into the statistical software.
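Assigning point data to a grid amounts to converting each latitude/longitude pair into a row and column index. The sketch below shows one way this could be done; the column names `lat` and `lon`, the function name `count_points_per_cell` and the sample coordinates are all assumptions made for illustration, and the actual layout of the airport file may differ.

```python
import csv
import io
import math
from collections import Counter


def count_points_per_cell(csv_text, north, west, resolution, n_rows, n_cols):
    """Count points per (row, col) cell; points outside the grid are skipped.

    Assumes a header row with hypothetical column names 'lat' and 'lon'.
    Row 0 is the northernmost row and column 0 the westernmost column.
    """
    counts = Counter()
    for record in csv.DictReader(io.StringIO(csv_text)):
        lat, lon = float(record["lat"]), float(record["lon"])
        row = math.floor((north - lat) / resolution)
        col = math.floor((lon - west) / resolution)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            counts[(row, col)] += 1
    return counts


# Made-up coordinates for illustration; not taken from the airport list.
sample = (
    "name,lat,lon\n"
    "A,40.08,116.58\n"
    "B,40.12,116.61\n"
    "C,31.14,121.80\n"
    "D,10.00,100.00\n"
)
print(count_points_per_cell(sample, 55, 70, 0.5, 80, 140))
```

Here points A and B fall in the same half-degree cell, so the result is a count rather than a simple dummy; point D lies south of the bounding box and is dropped.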

Using your own data 3: radial point data
Finally, our China researcher wants to look at Asia more broadly and uses the PRIO Conflict Site 1989-2008 data (Dittrich Hallberg 2012). Conflict Site lists 791 conflicts in 75 countries, but our researcher just wants to look at conflicts in countries which have territory in Asia. This gives 426 conflict events in 24 countries.5

Figure 8 goes about here
Our researcher starts by creating a grid at 0.1 degrees per cell with a simple land mask, represented in Figure 8A. Then, identifiers for each of the 24 Asian conflict states are added, as is seen in Figure 8B.
Each conflict event in Conflict Site is coded with a latitude/longitude coordinate pair, plus a radius, ranging from 50 to 1300 kilometres. Accordingly, our researcher imports these data and visualises them in Figure 8C.
Finally, as our researcher is only interested in the Asian states coded by the PRIO Conflict Site data, the state identifiers are merged to form a PRIO-inclusion dummy. By then merging this variable with a dummy of the radial conflict zones, our researcher is able to create a mask which indicates territory which is both in a PRIO Conflict Site state and within one of the conflict radii (Figure 8D).6
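The radial import and the final merge can be sketched as follows. This is an illustration under stated assumptions: distances are measured from cell centres using the haversine formula on a sphere of radius 6371 km, and the names `haversine_km`, `radial_mask` and `merge_masks`, together with the sample coordinates, are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def radial_mask(cells, events):
    """1 if a cell centre lies within the radius of any event, else 0.

    cells: (id, lat, lon) tuples; events: (lat, lon, radius_km) tuples.
    """
    mask = []
    for _, lat, lon in cells:
        hit = any(haversine_km(lat, lon, e_lat, e_lon) <= r for e_lat, e_lon, r in events)
        mask.append(1 if hit else 0)
    return mask


def merge_masks(mask_a, mask_b):
    """Cell-wise AND of two dummy variables of equal length."""
    return [a * b for a, b in zip(mask_a, mask_b)]


# Made-up cells, one made-up event with a 50 km radius, and a made-up state dummy.
cells = [(0, 34.0, 70.3), (1, 34.0, 71.0), (2, 34.0, 70.1)]
in_zone = radial_mask(cells, [(34.0, 70.0, 50.0)])
in_state = [1, 1, 0]
print(in_zone, merge_masks(in_zone, in_state))
```

Only the first cell is both inside the conflict radius and inside the state, so it is the only one retained by the merged mask, which mirrors the logic behind Figure 8D.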

Using your own data 4: raster data
Sometimes, data are made available in a raster format: this is quite common for remote-sensing data, such as night lights, rainfall, topography and land use. Accordingly, SpatialGridBuilder allows the import of raster data. The program will automatically determine how the data items in the raster file are separated (comma, semi-colon, tab or space). It will then ask the user which parts of the world are covered in the data. For instance, many raster-based global data sets will only cover the parts of the world in which people live; they will therefore exclude the polar regions in order to save memory and processing time. Once the user has entered the spatial bounds of the data set, SpatialGridBuilder will apply the raster data to the existing grid, based on the matching centre point of each grid cell (or, when the grid is of a different resolution to the raster data, the value from the nearest neighbour in the raster file).
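The raster import can be sketched in three steps: guess the separator, parse the values, and look up the raster pixel nearest to each grid cell centre. The code below is a simplified illustration with hypothetical function names; it assumes the first row of the file is the northernmost, and its separator check is a crude heuristic that would be fooled by files using decimal commas.

```python
import math


def detect_delimiter(line):
    """Guess the separator by trying the four candidates named in the text."""
    for candidate in (",", ";", "\t"):
        if candidate in line:
            return candidate
    return " "


def parse_raster(text):
    """Parse delimited text into rows of floats (first row = northernmost)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    delimiter = detect_delimiter(lines[0])
    if delimiter == " ":
        return [[float(v) for v in ln.split()] for ln in lines]
    return [[float(v) for v in ln.split(delimiter)] for ln in lines]


def sample_nearest(raster, north, west, raster_resolution, lat, lon):
    """Value of the raster pixel whose centre is nearest to (lat, lon).

    The pixel containing the point is also the one with the nearest centre.
    Points beyond the raster bounds are clamped to the edge pixel.
    """
    row = math.floor((north - lat) / raster_resolution)
    col = math.floor((lon - west) / raster_resolution)
    row = min(max(row, 0), len(raster) - 1)
    col = min(max(col, 0), len(raster[0]) - 1)
    return raster[row][col]


# A made-up 2 x 2 raster covering 0-10 degrees north and 0-10 degrees east.
raster = parse_raster("1;2\n3;4\n")
print(sample_nearest(raster, 10, 0, 5, 7.5, 2.5))
```

A production importer would also need to handle missing-data codes and decide what to do with grid cells outside the raster's coverage rather than clamping them silently.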

Built-in variables
SpatialGridBuilder includes a number of variables commonly used by conflict researchers in their analyses. These are included to help researchers develop their data sets more quickly. They can be used as control variables, or as more substantive parts of hypotheses.

ACLED
To make it easier for the researcher to analyse conflict, ACLED (Armed Conflict Location Event Dataset) conflicts are built into the program (Raleigh, Linke, Hegre and Karlsen 2010). At the time of writing, all ACLED conflicts for Africa have been included; other regions will be added in a later version.
The researcher is able to select between the eight different conflict types7 or choose to include all ACLED conflict types. Of course, ACLED offers more ways to divide data than just these eight conflict types, and researchers may wish to have more fine-grained control over the conflicts. In this case, the researcher can modify ACLED in the conventional way (with a statistical package, spreadsheet, text editor, etc.) and then import the data into SpatialGridBuilder as point data (discussed above).

UCDP GED
Additionally, SpatialGridBuilder includes all conflicts from the UCDP GED (Uppsala Conflict Data Program Georeferenced Event Dataset; Sundberg, Lindgren and Padskocimaite 2010; Melander and Sundberg 2011). At the time of writing, the only option is to include all conflicts. Future versions of SpatialGridBuilder will allow more control over this, but in the meantime, the researcher can still modify and import UCDP data using the method discussed above.

Distance
In the considerable research on the subject, distance has been shown to have an important effect on conflict. While many commentators have argued that globalisation has meant that distance is no longer an important factor for interaction opportunities, Gleditsch, Buhaug and Walter (2006) argue that the death of distance has probably been exaggerated. Indeed, data sets such as those developed by Gleditsch and Ward (2001) continue to form an important part of conflict research. Accordingly, the option to calculate a distance variable is included. This will determine the minimum distance (based on the haversine great circle formula) of each grid cell to any other variable the researcher has added to the grid. For instance, as our researcher is building a grid based on China, the distance of each grid cell to the nearest point on the border of China could be calculated (see Figure 9). This could also be useful for capital cities, conflict zones, or any other point or polygon data which the user wishes to use.
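A minimum-distance variable of this kind can be sketched with the haversine formula and a brute-force search over the target points. The function names are hypothetical and the Earth radius of 6371 km is an assumption; for polygon data, the targets would be the points (or grid cells) that make up the feature of interest.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def min_distance_km(cells, targets):
    """For each (id, lat, lon) cell, the distance to the nearest (lat, lon) target."""
    return [
        min(haversine_km(lat, lon, t_lat, t_lon) for t_lat, t_lon in targets)
        for _, lat, lon in cells
    ]


# Two made-up cells on the equator and two made-up target points.
cells = [(0, 0.0, 0.0), (1, 0.0, 1.5)]
targets = [(0.0, 1.0), (0.0, 2.0)]
print([round(d, 1) for d in min_distance_km(cells, targets)])
```

The brute-force search compares every cell with every target, so its running time grows with the product of the two; for large grids a spatial index would be the usual remedy.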

Mountainous/ rugged terrain
Mountains have often been analysed in the conflict research literature. Yet many conflict researchers have used data which suffer from a key problem: they are based on a binary which is of little relevance to conflict research (see the data sets presented in Collier and Hoeffler 2000/2004, plus Fearon and Laitin 2003; all ultimately based on the research of Gerrard 1990/2000). While many conflict researchers start articles by saying they wish to analyse the relationship between terrain and conflict, they finish their articles by using the percentage of mountains as a proxy for this (often a state-aggregated proxy).
SpatialGridBuilder addresses this problem by introducing a linear measure of ruggedness.

Figure 10 goes about here
As can be seen in Figure 10, the program analyses each cell in the grid and looks at the elevation variance of that cell from its immediate neighbours, based on elevation data from the Shuttle Radar Topography Mission (Farr et al. 2007). This gives a variable which is of much more use to the conflict researcher than either the percentage of mountains, or simple elevation data. Indeed, looking at the ruggedness rendering in Figure 11, we can see the ruggedness variable demonstrates a very clear border between China and Nepal. Dupuy (1985) also adopts a similar approach.
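One plausible reading of the neighbour-based ruggedness measure is the mean squared difference in elevation between a cell and its adjacent cells. The sketch below implements that reading; the exact formula used by the program is not given here, so the definition, the use of up to eight neighbours and the function name `ruggedness` are all assumptions.

```python
def ruggedness(elevation):
    """Mean squared elevation difference between each cell and its neighbours.

    elevation is a list of rows of numbers. Up to eight neighbours are used;
    cells on the edge of the grid simply have fewer. This definition is an
    assumption, not the program's documented formula.
    """
    n_rows, n_cols = len(elevation), len(elevation[0])
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            diffs = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        diffs.append((elevation[rr][cc] - elevation[r][c]) ** 2)
            result[r][c] = sum(diffs) / len(diffs) if diffs else 0.0
    return result


# A flat plain scores zero everywhere; a lone peak scores highest at the peak.
plain = [[5, 5], [5, 5]]
peak = [[0, 0, 0], [0, 8, 0], [0, 0, 0]]
print(ruggedness(plain))
print(ruggedness(peak)[1][1])
```

Unlike a mountain dummy, a measure of this kind distinguishes a high but flat plateau (low ruggedness) from broken terrain at moderate altitude (high ruggedness).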
More recently (and less impressionistically/arbitrarily) both Buhaug and Rød (2006) and Raleigh and Hegre (2009) have used ESRI's Digital Chart of the World to gather road data. This data set was released in the 1990s. The method employed by Buhaug and Rød is to take the logged total length of major roads in each grid cell, normalised by the country mean. They find that separatist conflict is more likely in regions with less than average road density, while governmental conflicts tend to occur in areas with a more developed road network.
Raleigh and Hegre employ a different method: they determine whether their grid cells include primary, secondary, or informal/no roads. They find that '[c]onflict events are 47% less likely to happen in squares with no roads or only informal roads than in squares with primary roads.' It is worth quoting their explanation:
The results ran counter to our initial expectations - conflicts are assumed to occur in faraway and inaccessible regions. However, the finding may not be so counter-intuitive after all. First, battle events occur where rebel group and army units encounter each other. Such meeting places are normally reached by road. Second, rebel groups tend to target high-value places (villages, military installation, pipelines, mines, etc.), and roads also often connect such places. Third, there is also a reporting bias at play here - media report incidences primarily in accessible areas. (Raleigh and Hegre 2009: 234).
However, some concerns have been raised by users of the Digital Chart of the World as to the accuracy and completeness of its data (for detailed surveys of user feedback on the DCW, see Langaas (1995) and Smith and Langaas (1995)). As such, this paper presents an alternative means of measuring the complexity, or level of development, of infrastructure based on the complexity of maps on the Google Maps server (see Figure 12). Google Maps include information on roads, buildings, etc., and the more information there is in each image, the more complicated the infrastructure is. This information is easy to capture: as the maps themselves are stored in PNG format, a more complex map will inherently have a larger file size.
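The link between visual complexity and file size can be demonstrated without contacting any map server. PNG images are compressed with DEFLATE, the same algorithm exposed by Python's `zlib` module, so the sketch below compresses synthetic tiles of increasing complexity and compares the results. The tiles are made up for the purpose of the illustration (they are not Google Maps tiles), and the names `synthetic_tile` and `compressed_size` are hypothetical.

```python
import random
import zlib

TILE_SIZE = 64  # pixels per side; an arbitrary choice for this illustration


def synthetic_tile(n_features, seed=0):
    """A plain white tile with n_features randomly placed darker pixels."""
    rng = random.Random(seed)
    pixels = bytearray([255]) * (TILE_SIZE * TILE_SIZE)
    for _ in range(n_features):
        pixels[rng.randrange(len(pixels))] = rng.randrange(200)
    return pixels


def compressed_size(pixels):
    """Size in bytes after DEFLATE compression, standing in for PNG file size."""
    return len(zlib.compress(bytes(pixels), 9))


# An empty tile, a sparsely featured tile and a densely featured tile.
for n in (0, 50, 1000):
    print(n, compressed_size(synthetic_tile(n)))
```

The empty tile compresses to almost nothing, while each additional feature adds bytes to the output, which is the property the infrastructure proxy relies on.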

Figure 13 goes about here
A rendering of this infrastructure proxy can be seen in a grid of Asia, presented in Figure 13. This grid is built at a resolution of 0.1 degrees per grid cell, giving a size of 1300 × 660 grid cells. Accordingly, to determine the infrastructure complexity for this grid, SpatialGridBuilder sends 858,000 file size requests to the Google Maps servers. At first glance, the resulting grid looks rather like a night light image.
SpatialGridBuilder also includes night light data (see section below) and while there are similarities between the images, there are also significant differences. Indeed, the correlation between the two is only 0.52.
It is hoped that researchers will use this approach as a valuable new method of determining the complexity of infrastructure. A first analysis of this new variable will be made at the end of this paper.8

Night lights
Another source which has often been used in the literature as a proxy for not only infrastructure, but also population and economic development, is the brightness of night lights. SpatialGridBuilder uses NOAA data for the year 2000; future versions will include data from the years 1997-2010 for researchers interested in temporal variance.

Rainfall
As the recent special issue of the Journal of Peace Research on climate change and conflict (January 2012) made clear, there is an increasing expectation among many outside of the conflict research community that climate change will have an impact on conflict, but within the conflict research community, there is little evidence of a relationship. However, there is broad acknowledgement that more work needs to be done. Accordingly, SpatialGridBuilder includes rainfall data. While the spatial resolution of the rainfall data is quite low (2.5 degrees per grid cell), the data set makes up for this in its temporal coverage: it offers monthly rainfall data covering the years 1979-2011, based on Xie and Arkin (1997).
With these data, the researcher is able to include climate-related data in models. More climate-related data, e.g. temperature, plus higher resolution rainfall data, will be included in future versions of SpatialGridBuilder.

Infant mortality
Following the suggestion by the State Failure Task Force group (Esty et al., 1998), Urdal (2005) argues that infant mortality is a better (inverse) proxy for development than more conventional economic measures, such as GDP or energy consumption per capita. Urdal goes on to find a strong relationship between high infant mortality rates and armed conflict. Accordingly, SpatialGridBuilder includes data on infant mortality (see Figure 14). However, as the purpose of SpatialGridBuilder is spatial disaggregation, it was necessary to use different data than the UN data used by Urdal. As such, the infant mortality measure is based on that compiled by Columbia University's Center for International Earth Science Information Network (CIESIN).

Figure 14 goes about here
As well as giving an indication of the rates of infant mortality in China and surrounding countries, Figure 14 is useful in that it also shows some of the problems facing researchers using spatial methods in their analyses. First is the problem of missing data. The CIESIN data are coded in raster format, with the value '-9999' indicating 'NODATA'. At first glance at the data set, it would appear that the -9999 values indicate the oceans, which in most cases they do. However, it is only when we apply the data to a grid and apply a country mask, using a tool such as SpatialGridBuilder, that we notice that many of the missing data values are on land and in some cases comprise whole countries (looking again at Figure 14, we see that Taiwan is rendered in red, indicating missing data). As with non-spatial data, researchers must take measures to deal with missing values.
Looking at China, we can see great variation in the levels of infant mortality in the eastern part of the country, but as we move west, we can see that there are large regions with the same level of infant mortality. This is because in China, as with many countries, data are collected at the county level, and Chinese counties tend to grow larger as you move (north-) west. Even when spatial data are stored in a raster format, and even with population data, this does not mean that the raster pixel gives a true indication of the value at that location: it is often inferred from the mean of a larger aggregate unit, such as a county. Researchers need to remember this when analysing spatial data, as unit size and shape are usually not constant throughout a country, and are seldom comparable between countries. This does not mean that we cannot do spatial analysis at all; it just means we need to understand the limitations of our data before making any inferences.9
By combining SpatialGridBuilder with conventional statistical software, or indeed a simple spreadsheet, it is possible to create new variables. One such example is exclusive economic zones (EEZs). Part V of the UN Convention on the Law of the Sea defines EEZs as regions beyond the territorial sea of a state, extending 200 nautical miles from the state's coast, in which the state has 'sovereign rights for the purpose of exploring and exploiting' (UNCLOS 1982). Figure 15 gives an illustration of China's EEZ. While SpatialGridBuilder does not have an option to automatically calculate states' EEZs, it is possible to do this by (a) using SpatialGridBuilder to create a China dummy; (b) using statistical software or a spreadsheet to create a 'not China but still land' dummy; (c) using SpatialGridBuilder to calculate the distance of each cell to both China and the 'not China' cells; and (d) using statistical software or a spreadsheet to create a dummy for all cells where (i) the cell is not on land; (ii) the cell is 200 nautical miles (370.4 km) or less from China; and (iii) the distance from the cell to China is less than the distance to any other country.10 Clearly, EEZs are very politically contentious, and many states claim differing EEZs based on their claims to remote islands. By working with different understandings of state territory (e.g., with or without key islands), SpatialGridBuilder allows the researcher to calculate different EEZs for each state and thereby determine regions of the seas which may face future disputes. Indeed, the number of militarized interstate disputes involving small islands indicates that EEZs are a useful area for spatial conflict analysis.
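Step (d) of the EEZ recipe reduces to a cell-by-cell test on three columns exported from the earlier steps. The sketch below assumes those columns are already available as plain lists (a land dummy, the distance to the state and the distance to the nearest other land); the function name `eez_dummy` and the sample values are hypothetical.

```python
EEZ_LIMIT_KM = 370.4  # 200 nautical miles


def eez_dummy(is_land, dist_to_state_km, dist_to_other_land_km):
    """Apply conditions (i) to (iii) cell by cell and return a list of 0/1."""
    result = []
    for land, d_state, d_other in zip(is_land, dist_to_state_km, dist_to_other_land_km):
        in_eez = (
            not land                       # (i) the cell is at sea
            and d_state <= EEZ_LIMIT_KM    # (ii) within 200 nautical miles
            and d_state < d_other          # (iii) closer to the state than to others
        )
        result.append(1 if in_eez else 0)
    return result


# Four made-up cells: land, sea in the EEZ, sea too far out, sea nearer a neighbour.
land = [1, 0, 0, 0]
to_state = [0.0, 100.0, 400.0, 100.0]
to_other = [300.0, 500.0, 900.0, 50.0]
print(eez_dummy(land, to_state, to_other))
```

Re-running the same test with a different land or state mask, for example one that includes or excludes a disputed island, gives the alternative EEZs discussed in the text.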

Example
Now that the functions of SpatialGridBuilder have been introduced, it is possible to give an example of how it can be used. This example will look at conflict in Asia, based on the conflicts in ACLED Version 1, 11 during the period 1997-2010. One of the key advantages of SpatialGridBuilder is that it allows the researcher to construct grids of any resolution. ACLED only includes conflicts for six Asian countries (Afghanistan, Cambodia, Laos, Myanmar, Nepal and Pakistan), so as this only encompasses a relatively small part of the world, it is possible to build quite a high resolution grid. For this example, a grid with cells of 0.05 degrees will be built. Also, to give a comparison, and to highlight how different resolutions can lead to different results, a separate lower resolution grid of 0.5 degrees per grid cell will also be built (the same resolution as the PRIO-GRID).
To begin, we first need to determine the 'bounding box' of our grid, using the method illustrated earlier in Figure 2. By consulting an atlas and looking at our six countries, we can see that northern and southern bounds of 38 and 6 degrees latitude, and western and eastern bounds of 60 and 108 degrees longitude will cover our six states nicely. We can then create two grids: a low (coarse) resolution grid (0.5 degrees per grid cell) and a high (fine) resolution grid (0.05 degrees per grid cell). Once the empty grids are created, we can populate them with a land mask, plus dummy identifiers for the six states. The first ten lines of the low and high resolution grids, plus visualisations of the two grids, are presented in Figure 16.
10 Future versions of SpatialGridBuilder will allow the user to do all of the steps within the program. Currently, it is possible to construct simple dummies in the program, but not based on multiple variables. SpatialGridBuilder comes with several tutorials, and details on how to calculate EEZs or similar variables are included in one of the tutorials.
11 More recent versions of ACLED exclude conflicts in Asia.
Figure 16 goes about here
By importing these two data sets into a statistical package, we can start to do some analysis. For instance, we can make some preliminary observations as to how many grid cells we have in our two data sets, how many of those cells are on land, and how many of those cells are in each state. Additionally, as the ACLED data are restricted to these six states, we may want to create an ACLED dummy, so that our analysis will only look at territory within the six states. This can be done easily by merging the values of the six state dummies into a new ACLED dummy using straightforward commands in the statistical package.12
Table ?? goes about here
As can be seen in Table ??, the high resolution grid has approximately 100 times more grid cells than the low resolution grid.13 An important point to make here is that bigger is not always better: by increasing our resolution, we are increasing the number of observations, but this is not necessarily a good idea for spatial analysis. It is often the case that spatial data are only available at a resolution of approximately 0.5 degrees per grid cell, such as with the PRIO-GRID. In such a case, the researcher should make a grid of 0.5 degrees per grid cell. However, if the researcher has access to more granular data, then a grid at a higher resolution would be preferable.

Figure 17 goes about here
For the sake of this example, let us assume that we are interested in violence against civilians in urban areas in Asia. Accordingly, a measure of the level of infrastructure development would be useful for our analysis. While some of the world's largest cities may sprawl for many tens of kilometres, most urban areas will only spread for a few kilometres at most. If we use a low resolution grid, then, the chances are greater that we will miss these urban areas.14
12 For instance, in R, the code would be acled_mask <- AFG_mask + MMR_mask + NPL_mask + PAK_mask + KHM_mask + LAO_mask
13 Remember that a tenfold finer resolution applies both longitudinally and latitudinally, or horizontally and vertically, so it yields 100 times more grid cells. The slight differences from 100 are due to rounding.
14 As was mentioned earlier, a cell in a grid built at 0.5 degrees per grid cell covers approximately 3063 km² at the equator (a cell of about 55.35 km × 55.35 km), which few cities would fully occupy. A cell in a grid built at 0.05 degrees per cell, however, would cover around 30.6 km² at the equator (a cell of 5.53 km × 5.53 km), which many urban areas would fully occupy.
degrees latitude, 70.1 degrees longitude (see Pakistan A). If we build a grid of 0.05 degrees per grid cell, the nearest location we can include is 33.9 degrees latitude, 70.1 degrees longitude (see Pakistan B), just 280 metres away. However, the nearest location in a grid built at 0.5 degrees per cell would be 34 degrees latitude, 70 degrees longitude (see Pakistan C), a forest area in the mountains, 14.67 kilometres away from Parachinar. Similarly, ACLED codes an incident of conflict in Bangkok: the high resolution grid cell (Thailand B), located 0.47 kilometres away, is also found in the centre of the city, while the low resolution grid cell is in farmland, 27.21 kilometres away.
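The effect of resolution on where an event ends up can be checked with a short spot test that snaps a location to the nearest grid point and measures how far it has moved. The sketch assumes that grid points sit on exact multiples of the resolution, which is how the coordinates quoted above appear to be laid out; the event location used is hypothetical rather than an actual ACLED record, and the function names are illustrative.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def snap_to_grid(lat, lon, resolution):
    """Nearest grid point, assuming points lie on multiples of the resolution."""
    return round(lat / resolution) * resolution, round(lon / resolution) * resolution


def snap_error_km(lat, lon, resolution):
    """Distance between a location and the grid point it is assigned to."""
    g_lat, g_lon = snap_to_grid(lat, lon, resolution)
    return haversine_km(lat, lon, g_lat, g_lon)


event = (33.8975, 70.1003)  # hypothetical event location, for illustration only
for res in (0.5, 0.05):
    print(res, round(snap_error_km(event[0], event[1], res), 2))
```

For this made-up location the fine grid moves the event by a few hundred metres and the coarse grid by well over ten kilometres, the same order of magnitude as the displacements reported above.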
Table ?? goes about here
Returning to SpatialGridBuilder, our researcher can now progressively add some other control variables to the models: ruggedness of terrain, tree cover, annualised rainfall, night light, population and infant mortality rate. Results are presented in Tables ?? and ??. The important point to note here is the difference in the infrastructure variable between the two grids. In the low resolution grid, the effect of the infrastructure variable maintains significance across most models, but completely loses significance in the final model H. However, in the high resolution grid, significance is maintained throughout the models. This is in line with expectations based on the example in Figure 11. The high resolution grid is more likely to capture a measure of infrastructure which is based in the same urban area as the one coded by ACLED, while the low resolution grid cell tends to fall in a more rural area, which will have a lower value for the infrastructure variable.
The take-home message here is not that researchers should always build higher resolution grids; indeed, just building higher resolution grids for the sake of it merely increases the number of non-conflict events (thus skewing results), and leads to more time spent processing. Instead, researchers should spend time determining the appropriate resolution for their research before finalising their research design. This can be done by building test grids at multiple resolutions. If, as in this example, our researcher is interested in infrastructure, then he or she should start with a low resolution grid and run some spot checks to determine whether the grid is capturing urban areas; if it is not, gradually increase the resolution, and keep testing until sufficient urban areas are being captured. Alternatively, if all of the variables the researcher is interested in are at a relatively low resolution, then build the grid at that resolution. SpatialGridBuilder gives the researcher the freedom to make these decisions. Moreover, by allowing the researcher to build grids at multiple resolutions, SpatialGridBuilder not only provides additional opportunities for robustness checks, but also allows for a more inductive and flexible approach to predictive modelling.15
15 The author is grateful to one of the anonymous reviewers for making this observation.

Conclusion
The past few years have seen a strong movement toward the use of GIS. This is a function of many things: the availability of computers that are able to work on very large data sets; the increasing degree to which the Internet has become an essential part of many researchers' lives; the willingness of many institutions, researchers and programmers to make their data and methods freely available; and finally, the evolution of GIS methods themselves as a result of these changes. Some excellent examples of geospatial software are being developed, but the steep learning curve has discouraged many researchers from using them. This paper has introduced a new means through which conflict researchers can start to introduce spatial data analysis into their work. It is hoped that by providing a program which is as straightforward to use as possible, but which does not in any way reproduce the statistical methods with which researchers are already familiar, this program will help researchers to start making new analyses that further our understanding of conflict.

Bibliography
in Figure 18. In grid A, cells are clearly autocorrelated: the black cells are all next to each other, and the white cells are all next to each other. A standard measure of spatial autocorrelation is Moran's I, which in this case would have a value close to 1. Grid B, the chessboard pattern, is at the other end of the spectrum: the correlation is perfectly negative, as the black and white cells are completely dispersed. This yields a Moran's I close to -1. Finally, grid C has a pseudo-random distribution and accordingly its Moran's I is close to 0. When testing the grid based on the PRIO Conflict Site data, it was found that the Moran's I value was close to 1, indicating strong positive spatial autocorrelation. Accordingly, it is unwise to use PRIO Conflict Site grid cells as a dependent variable, as ordinary regression models do not control for spatial dependency. Instead, the PRIO Conflict Site cells could be used as an inclusion mask: instead of looking at all conflicts within a state, try looking at all conflicts within the conflict zone, but do not compare them with cells outside the zone. For more on spatial autocorrelation and Moran's I, see Moran (1950) and Bivand, Pebesma and Gómez-Rubio (2008). Luc Anselin's work is especially useful for identifying and dealing with spatial autocorrelation: see Anselin (1988); Anselin and Bera (1988); Anselin et al. (1996); Anselin (2001).
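The three patterns described above can be reproduced with a short Python sketch. It is illustrative only and is not the procedure used to test the PRIO Conflict Site grid: it assumes binary rook-contiguity weights (cells sharing an edge are neighbours), a 20 x 20 grid, and hypothetical variable names. Under those assumptions Moran's I is computed as I = (N / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2, where m is the mean cell value and W is the sum of all weights.

```python
import random

def morans_i(grid):
    """Moran's I for a 2-D list of numbers, using rook (edge-sharing) neighbours.

    Assumption: binary weights, w_ij = 1 if cells i and j share an edge, else 0.
    The grid must not be constant, otherwise the denominator is zero.
    """
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    # Denominator: sum of squared deviations from the mean.
    denom = sum((value - mean) ** 2 for row in grid for value in row)
    cross = 0.0   # sum of w_ij * (x_i - mean) * (x_j - mean)
    weights = 0   # sum of all w_ij; each neighbour pair is counted in both directions
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weights += 1
    return (n / weights) * (cross / denom)

size = 20
# Grid A: left half "black" (1), right half "white" (0).
grid_a = [[1 if c < size // 2 else 0 for c in range(size)] for r in range(size)]
# Grid B: chessboard pattern.
grid_b = [[(r + c) % 2 for c in range(size)] for r in range(size)]
# Grid C: pseudo-random pattern with a fixed seed for repeatability.
rng = random.Random(0)
grid_c = [[rng.randint(0, 1) for c in range(size)] for r in range(size)]

for name, grid in (("A", grid_a), ("B", grid_b), ("C", grid_c)):
    print(f"Grid {name}: Moran's I = {morans_i(grid):.3f}")
```

Other weighting schemes, such as queen contiguity or distance-based weights, would give somewhat different values, so the choice of weights should be reported alongside the statistic.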

Figure 6 goes about here

Figure 9 goes about here

Figure 11 goes about here

Figure 15 goes about here

9 Researchers should also consider the modifiable areal unit problem, or MAUP; see Openshaw (1983).

Figure 17 illustrates this. The top row is based on Google Earth images of Parachinar, a small town (population approximately 70,000) in the Kurram Valley, close to the border with Afghanistan. ACLED records an incident of violence against civilians in Parachinar; the data set codes the location of the violence as having the coordinates 33.8975

Figure 2: Getting started with SpatialGridBuilder: begin by determining the 'bounding box' of the area of interest (in this case, China). Map based on CIA map of Asia, 2013.

Figure 8: Asian states with conflicts in the PRIO Conflict Site 1989-2008 data set (Dittrich Hallberg 2012). 'A' constructs the grid at 0.1 degrees per grid cell and applies a land mask; 'B' creates masks for the 24 states covered by PRIO Conflict Site; 'C' plots the radial conflict zones (brighter colours equal more conflicts); 'D' merges the conflict states and the conflict zones to create a conflict-state-zone mask, thereby removing the radial zones which are outside of the PRIO Conflict Site states.

Figure 9: Distance of each grid cell to the nearest border point of China. The brighter the shade of grey, the greater the distance.

Figure 10: King's move method to determine ruggedness of terrain

Figure 12: London, Tokyo and 'Area 51,' Nevada: Google Maps can be used as a proxy for the level of infrastructure development

Figure 14: Infant mortality in China and surrounding countries. A brighter shade of grey indicates a higher level of infant mortality. Red indicates inferred missing data.

Figure 17: Comparison of two incidents of violence against civilians recorded in ACLED. The top row shows the location of an incident in Parachinar, Pakistan. 'A' indicates the actual coordinates recorded in ACLED; 'B' is the nearest coordinate pair in a 0.05 degree grid; 'C' is the nearest location in a 0.5 degree grid. The bottom row shows a separate occurrence of violence against civilians in Bangkok.

Figure 18: Explaining spatial autocorrelation. 'A' is positively autocorrelated, so would have a Moran's I value close to 1. 'B' is negatively autocorrelated, so its Moran's I would be close to -1. 'C', the pseudo-random distribution, has a Moran's I close to 0.