Emma Waugh

Check out my work and reflections in Open Source GIScience!


Reproducibility & Replicability in Open Source GIS

September 27, 2021

Open source GIS plays an important role in making science more reproducible and replicable. Besides making tools and data more accessible, I imagine that accountability increases when it’s easier for researchers to check the credibility of their peers’ claims, which in turn should enhance public trust in geographic research.

R & R in GIS

Open source GIS exposes both the data and the analysis, and it certainly helps address issues with R&R as long as the code is well documented. Code is essentially cross-checked when it is applied to other study areas or in different contexts to determine whether the initial result was anomalous. The difference between idiographic and nomothetic approaches to geography seems to make it more of a challenge to quantify replicability, though geographers are discussing how to integrate the two (Sui & Kedron 2021).

Additionally, the development of infrastructure in the form of software, guides, and reporting standards within organizations increases accountability in the field (Sui & Kedron 2021). The open source GIS community also seems very engaged in each other’s work, which drives more collaboration and innovation in addressing spatial problems. I’m sure other disciplines, such as computer and data science, are making a coordinated effort to maintain the availability of raw data and source code. As I’ve seen in some of my analyses, the health sciences also have a lot to gain from open source research. The National Institutes of Health is beginning to require its funded research to make data publicly available.

Limitations of R & R in Geography

However, there are some problems with reproducibility and replicability that open source GIS will not address on its own. I’m not sure that open source has improved precision compared to other methods. Also, my understanding is that open source GIS only requires that code be available, not that other decisions or data sources be included. Depending on the types of data being incorporated, I can see how specific methodological choices may not be reproducible without very thorough documentation; for example, field sampling or survey procedures can vary a lot and are not necessarily documented. It seems like these limitations would be more easily avoided in GIS than in other sciences, however. In addition, decisions about organizing and cleaning data can play a big role in the results (NASEM 2019, Box 3.2), though well-documented code should capture those. And, as mentioned above, idiographic research within geography

References

NASEM. 2019. Reproducibility and Replicability in Science. Washington, D.C.: National Academies Press. DOI: 10.17226/25303.

Sui, D., and P. Kedron. 2021. Reproducibility and Replicability in the Context of the Contested Identities of Geography. Annals of the American Association of Geographers 111 (5):1275–1283. DOI: 10.1080/24694452.2020.1806024.
