Crafting mines from satellite images

By Victor Maus, alumnus of the IIASA Ecosystems Services and Management Program and researcher at the Vienna University of Economics and Business

The mining of coal, metals, and other minerals causes the loss of natural habitats across the entire globe. However, available data are insufficient to measure the extent of these impacts. IIASA alumnus Victor Maus and his colleagues mapped more than 57,000 km² of mining areas worldwide using satellite images.


© Pix569 | Dreamstime.com

Our modern lifestyles and consumption patterns cause environmental and social impacts that are geographically displaced to production sites thousands of kilometres away, where the raw materials are extracted. Complex supply chains connecting mineral mining regions to consumers often obscure these impacts. Our team at the Vienna University of Economics and Business is investigating these connections and the associated impacts on a global scale (www.fineprint.global).

However, many mining impacts remain poorly documented across the globe; for example, it is largely unknown where, and over how much area, metals, coal, and other essential minerals are extracted. This information is necessary to assess environmental implications, such as the forest and biodiversity loss associated with mining activities. To close this data gap, we analyzed satellite images of more than 6,000 known mining regions around the world.

Visually identifying such a large number of mines in these images is no easy task. Imagine you are looking out of the window of a plane: how many objects on the Earth’s surface can you identify, and how fast? Using satellite images, we searched for and mapped mines across the whole globe. It was a time-consuming and exhausting task, but we also learned a lot about what is happening on the ground. Besides, it was fascinating to virtually visit such a vast range of mining places across the globe and to see the wide variety of ecosystems affected by our increasing demand for nature’s resources.

The result of our adventure is a global data set covering more than 21,000 mapped areas adding up to around 57,000 km² (roughly the size of Croatia or Togo). These areas cover open cuts, tailings dams, piles of rocks, buildings, and other infrastructure related to mining activities, some of them extending almost 10 km (see figure below). We also learned that around 50% of the mapped mining area is concentrated in just five countries: China, Australia, the United States, Russia, and Chile.

Examples of mines viewed from Google Satellite images. (a) Carajás iron ore mine in Brazil, (b) Batu Hijau copper-gold mine in Indonesia, and (c) Super Pit gold mine in Australia. In purple is the data collected for these mines (Figure source: www.nature.com/articles/s41597-020-00624-w).

Using these data, we can improve the calculation of environmental indicators of global mineral extraction and thus support the development of less harmful ways to extract natural resources. Further, linking these impacts to supply chains can help answer questions related to our consumption of goods. For example, what impacts does the extraction of the minerals used in our smartphones cause, and where on the planet do they occur? We hope that many others will use the mining areas data for their own research and applications; the data is therefore fully open to everyone. You can explore the global mining areas using our visualization tool at www.fineprint.global/viewer, or download the full data set from doi.pangaea.de/10.1594/PANGAEA.910894. A complete description of the data and methods is in our paper, available at www.nature.com/articles/s41597-020-00624-w.
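For readers who want to work with the polygons programmatically, the minimal sketch below loads and summarizes the data set in Python with geopandas. The file name and column names are assumptions, so check the documentation on PANGAEA for the actual ones.

```python
# A minimal sketch for exploring the mining polygons with geopandas.
# File name and column names are assumptions; see the PANGAEA
# documentation for the actual ones.
import geopandas as gpd

# The data set is assumed to ship as a GeoPackage of polygons
mines = gpd.read_file("global_mining_polygons_v1.gpkg")

# Total mapped extent (the AREA column is assumed to be in km²)
print(f"{len(mines)} polygons, {mines['AREA'].sum():,.0f} km² in total")

# Countries ranked by mapped mining area
by_country = mines.groupby("COUNTRY_NAME")["AREA"].sum()
print(by_country.sort_values(ascending=False).head(5))
```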

This blog post first appeared on the Springer Nature “Behind the paper” website. Read the original post here.

Note: This article gives the views of the authors, and not the position of the Nexus blog, nor of the International Institute for Applied Systems Analysis.

Open science has to go beyond open source

By Daniel Huppmann, research scholar in the IIASA Energy Program

Daniel Huppmann sheds light on how open-source scientific software and FAIR data can bring us one step closer to a community of open science.

© VectorMine | Dreamstime.com

Over the past decade, the open-source movement (e.g., the Free Software Foundation (FSF) and the Open Source Initiative (OSI)) has had a tremendous impact on the modeling of energy systems and climate change mitigation policies. It is now widely expected – in particular by and of early-career researchers – that data, software code, and tools supporting scientific analysis are published for transparency and reproducibility. Many journals actually require that authors make the underlying data available in line with the FAIR principles – this acronym stands for findable, accessible, interoperable, and reusable. The principles postulate best-practice guidance for scientific data stewardship. Initiatives such as Plan S, requiring all manuscripts from projects funded by the signatories to be released as open-access publications, lend further support to the push for open science.

Alas, the energy and climate modeling community has so far failed to realize and implement the full potential of the broader movement towards collaborative work and best practice in scientific software development. To live up to the expectation of truly open science, the research community needs to move beyond “only” open source.

Until now, the main focus of the call for open and transparent research has been on releasing the final status of scientific work under an open-source license – giving others the right to inspect, reuse, modify, and share the original work. In practice, this often means simply uploading the data and source code for generating results or analysis to a service like Zenodo. This is obviously an improvement compared to the previously common “available upon reasonable request” approach. Unfortunately, the data and source code are still all too often poorly documented and do not follow best practice of scientific software development or data curation. While the research is therefore formally “open”, it is often not easily intelligible or reusable with reasonable effort by other researchers.

What do I mean by “best practice”? Imagine I implement a particular feature in a model or write a script to answer a specific research question. I then add a second feature – which inadvertently changes the behavior of the first feature. You might think that this could be easily identified and corrected. Unfortunately, given the complexity and size to which scientific software projects tend to quickly grow, one often fails to spot the altered behavior immediately.

One solution to this risk is “continuous integration” and automated testing, a practice common in software development: for each new feature, we write specific tests against an as-simple-as-possible example at the same time as implementing the function or feature itself. These tests are then executed every time a new feature is added to the model, toolbox, or software package, ensuring that existing features continue to work as expected when new functionality is introduced.
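As a minimal sketch of what this looks like in practice, a test written with the Python pytest framework might be the following; the function here is a hypothetical stand-in for a model feature, not code from any actual model.

```python
# test_emissions.py - a minimal automated-testing sketch using pytest.
# The function under test is a hypothetical stand-in for a model feature.
import pytest


def total_emissions(activity, emission_factor):
    """Emissions as activity level times emission factor."""
    if activity < 0:
        raise ValueError("activity must be non-negative")
    return activity * emission_factor


def test_total_emissions_simple_case():
    # An as-simple-as-possible example with a known answer
    assert total_emissions(100, 0.5) == 50.0


def test_total_emissions_rejects_negative_activity():
    # If a later feature inadvertently changes this guard, the test fails
    with pytest.raises(ValueError):
        total_emissions(-1, 0.5)
```

A continuous integration service then runs the whole test suite automatically on every proposed change, so an inadvertently altered behavior is caught before it reaches the main code base.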

Other practices that modelers and all researchers using numerical methods should follow include using version control and writing documentation throughout the development of scientific software rather than leaving this until the end. Moreover, not just the manuscript and results of scientific work should be scrutinized (aka “peer review”); such appraisal should also apply to the scientific software code written to process data and analyze model results. Like the mentoring of early-career researchers, such a review should not just come at the end of a project but should be a continuous process throughout the development of the manuscript and the related analysis scripts.

In the course that I teach at TU Wien, as well as in my work on the MESSAGEix model, the Intergovernmental Panel on Climate Change Special Report on Global Warming of 1.5°C scenario ensemble, and other projects at the IIASA Energy Program, I try to explain to students and junior researchers that following such best-practice steps is in their own best interest. This is true even when it is just a master’s thesis or some coursework assignment. However, I always struggle to find the best way to convince them that following best practice is not just a noble ideal in itself, but actually helps in doing research more effectively. Only when one has experienced the panic and stress caused by a model not solving or a script not running shortly before a submission deadline can a researcher fully appreciate the benefits of well-structured code, explicit dependencies, continuous integration, tests, and good documentation.

A common trope says that your worst collaborator is yourself from six months ago, because you didn’t write enough explanatory comments in your code and you don’t respond to emails. So even though it sounds paradoxical at first, spending a bit more time following best practice of scientific software development can actually give you more time for interesting research. Moreover, when you then release your code and data under an open-source license, it is more likely that other researchers can efficiently build on your work – bringing us one step closer to a community of open science!

Note: This article gives the views of the authors, and not the position of the Nexus blog, nor of the International Institute for Applied Systems Analysis.

What did we learn from COVID-19 models?

By Sibel Eker, researcher in the IIASA Energy Program

IIASA researcher Sibel Eker explores the usefulness and reliability of COVID-19 models for informing decision making about the extent of the epidemic and the healthcare problem.

© zack Ng 99 | Dreamstime.com

In the early days of the COVID-19 pandemic, when facts were uncertain, decisions were urgent, and stakes were very high, both the public and policymakers turned not to oracles, but to mathematical modelers to ask how many people could be infected and how the pandemic would evolve. The response was a plethora of hypothetical models shared on online platforms and numerous better-calibrated scientific models published in online repositories. A few such models were announced as support for governments’ decision-making processes in countries like Austria, the UK, and the US.

With this announcement, a heated debate began about the accuracy of model projections and their reliability. In the UK, for instance, the model developed by the MRC Centre for Global Infectious Disease Analysis at Imperial College London projected around 500,000 and 20,000 deaths without and with strict measures, respectively. These different policy scenarios were misinterpreted by the media as a drastic variation in the model assumptions, and hence as a lack of reliability. In the US, projections of the model developed by the University of Washington’s Institute for Health Metrics and Evaluation (IHME) changed as new data were fed into the model, sparking further debate about its accuracy.

This discussion about the accuracy and reliability of COVID-19 models led me to rethink model validity and validation. In a previous study, based on a vast scientific literature on model validation and practitioners’ views, my colleagues and I showed that validity is often equated with how well a model represents reality, which is in turn often measured by how accurately the model replicates observed data. However, representativeness does not always imply usefulness. A commentary following that study emphasized the tradeoff between representativeness and the propagation error it causes, cautioning against an exaggerated focus on extending model boundaries and against modeling hubris.

Following these previous studies, in my latest commentary in Humanities and Social Sciences Communications, I briefly reviewed the COVID-19 models used in public policymaking in Austria, the UK, and the US in terms of how they capture the complexity of reality, how they report their validation, and how they communicate their assumptions and uncertainties. I concluded that the three models are undeniably useful for informing the public and policy debate about the extent of the epidemic and the healthcare problem. They serve the purpose of synthesizing the best available knowledge and data, and they provide a testbed for altering our assumptions and creating a variety of “what-if” scenarios. However, they cannot be seen as accurate prediction tools: not only is no model capable of this, but, according to their reports in late March, these models also lacked thorough formal validation. While it may be true that media misinterpretation triggered the debate about accuracy, there were expressions of overconfidence in the reporting of these models, while the communication of uncertainties and assumptions was not fully clear.

© Jaka Vukotič | Dreamstime.com

The uncertainty and urgency associated with pandemic decision-making are familiar from many other policymaking situations, from climate change mitigation to sustainable resource management, so the lessons learned from the use of COVID-19 models can resonate in other disciplines. Post-crisis research can analyze how useful these models were in the discourse and decision making, so that we can better prepare for the next outbreak and better utilize policy models in any situation. Until then, we should treat the prediction claims of any model with caution, focus on the scenario analysis capability of models, and remind ourselves once more that a model is a representation of reality, not reality itself, just as René Magritte noted that his perfectly curved and brightly polished pipe is not a pipe.

References

Eker S (2020). Validity and usefulness of COVID-19 models. Humanities and Social Sciences Communications 7 (1) [pure.iiasa.ac.at/16614]

Note: This article gives the views of the author, and not the position of the Nexus blog, nor of the International Institute for Applied Systems Analysis.

Mapping habitats in support of biodiversity research

By Martin Jung, postdoctoral research scholar in the IIASA Ecosystems Services and Management Program.

IIASA postdoc Martin Jung discusses how a newly developed map can help provide a detailed view of important species habitats, contribute to ongoing ecosystem threat assessments, and assist in biodiversity modeling efforts.

Biodiversity is not evenly distributed across our planet. To determine which areas potentially harbor the greatest number of species, we need to understand how habitats valuable to species are distributed globally. In our new study, published in Nature Scientific Data, we mapped the global distribution of habitats. The habitats we used are based on the International Union for Conservation of Nature (IUCN) Red List habitat classification scheme, one of the most widely used systems for assigning species to habitats and assessing their extinction risk. The latest map (2015) is openly available for download here. We also built an online viewer on the Google Earth Engine platform, where the map can be explored interactively: simply click on the map to find out which habitat class has been mapped in a particular location.

Figure 1: View on the habitat map with focus on Europe and Africa. For a global view and description of the current classes mapped, please read Jung et al. 2020 or have a look at the online interactive interface.

The habitat map was created by intersecting various best-available layers on land cover, climate, and land use (Figure 1). Specifically, we created a decision tree that determines, for each area on the globe, the likely presence of one of the 47 habitat classes currently mapped. For example, by combining data on tropical climate zones, mountain regions, and forest cover, we were able to estimate the distribution of subtropical/tropical moist mountainous rainforests, one of the most biodiverse ecosystems. The habitat map also draws on the best available land use data to map human-modified or artificial habitats such as rural gardens or urban sites. Notably, and as a first, our map also integrates upcoming new data on the global distribution of plantation forests.
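To give a flavor of this decision-tree logic, the following is a much-simplified sketch in the Google Earth Engine Python API; the asset IDs, band names, class codes, and thresholds are placeholders, not the actual inputs and rules behind the published map (for those, see Jung et al. 2020).

```python
# A highly simplified decision-tree sketch in the Earth Engine Python
# API. Asset IDs, bands, codes, and thresholds are placeholders; the
# published map combines different inputs in a much larger ruleset.
import ee

ee.Initialize()

# Example input layers: a land cover product and an elevation model
land_cover = ee.ImageCollection(
    "COPERNICUS/Landcover/100m/Proba-V-C3/Global").first()
elevation = ee.Image("USGS/SRTMGL1_003").select("elevation")

# Placeholder masks: forest from a range of land cover codes,
# mountains from a simple elevation threshold, and a tropical-climate
# mask (here a constant standing in for a real climate-zone layer)
lc = land_cover.select("discrete_classification")
forest = lc.gte(111).And(lc.lte(126))
mountain = elevation.gt(1000)
tropical = ee.Image.constant(1)

# One branch of the tree: a moist mountainous rainforest class is
# inferred where all three conditions hold; other habitat classes
# follow from analogous combinations of layers.
moist_montane_forest = forest.And(mountain).And(tropical)
```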

What makes this map so useful for biodiversity assessments? It can provide a detailed view of the remaining coverage of important species habitats, contribute to ongoing ecosystem threat assessments, and assist in global and national biodiversity modeling efforts. Since the map’s thematic legend (that is, the set of habitat classes it distinguishes) follows the same scheme the IUCN uses for assessing species extinction risk, we can easily refine the known distributions of species (Figure 2). Until now, such refinements were based on crosswalks with land cover products (Figure 2b), but with the additional data integrated into the habitat map they can be much more precise (Figure 2c). We have, for instance, conducted such range refinements as part of the Nature Map project, which ultimately helped to identify global priority areas of importance for biodiversity and ecosystem services.

Figure 2: The range of the endangered Siamang (Symphalangus syndactylus) in Indonesia and Malaysia according to the IUCN Red List. Until now, refinements of its range were based on land cover crosswalks (b), while the habitat map allows a more complete refinement (c).
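As a rough illustration of the refinement shown in Figure 2, the sketch below masks a rasterized species range by the habitat classes the species is known to use; the file names and habitat codes are purely illustrative, and both rasters are assumed to share the same grid.

```python
# A minimal range-refinement sketch: keep only those pixels of a
# species' published range that fall in suitable habitat classes.
# File names and habitat codes are illustrative; both rasters are
# assumed to be aligned on the same grid.
import numpy as np
import rasterio

with rasterio.open("habitat_map.tif") as src:
    habitat = src.read(1)
with rasterio.open("siamang_range.tif") as src:
    in_range = src.read(1).astype(bool)

# Habitat codes suitable for the species, taken from its IUCN Red
# List habitat preferences (placeholder values)
suitable_codes = [105, 109]

refined = in_range & np.isin(habitat, suitable_codes)
print(f"Refined range keeps {refined.sum()} of {in_range.sum()} "
      "range pixels")
```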

As with other global maps, this new map is certainly not without errors. Even though a validation showed good accuracy at high resolution for many classes, we stress that, given the global extent and uncertainty involved, there are likely fine-scale errors that propagate from some of the input data. Some inputs, such as the global distribution of pastures, are currently clearly insufficient, with existing global products being either outdated or not finely resolved enough to be useful. Luckily, with the decision tree implemented on Google Earth Engine, a new version of the map can be created within just two hours.

In the future, we plan to further update the habitat map and ruleset as improved or newer data become available. For instance, the underlying land cover data from the European Copernicus Program is currently only available for 2015; however, new annual versions up to 2018 are already being produced. Incorporating these new data would allow us to create time series of the distribution of habitats. There are also plans to map currently missing classes such as the IUCN marine habitats – think, for example, of the distribution of coral reefs or deep-sea volcanoes – as well as to improve the mapped wetland classes.

Lastly, if you, dear reader, want to update the ruleset or create your own habitat type map, this is also possible. All input data, the ruleset, and the code to fully reproduce the map in Google Earth Engine are publicly available. The map is currently at version 003, but we have no doubt that the ruleset and map can continue to be improved in the future and form a truly living map.

Reference:

Jung M, Raj Dahal P, Butchart SHM, Donald PF, De Lamo X, Lesiv M, Kapos V, Rondinini C, & Visconti P (2020). A global map of terrestrial habitat types. Nature Scientific Data DOI: 10.1038/s41597-020-00599-8

Note: This article gives the views of the author, and not the position of the Nexus blog, nor of the International Institute for Applied Systems Analysis.

How citizen science can fill data gaps for the SDGs

By Dilek Fraisl, researcher in the IIASA Ecosystems Services and Management Program and chair of the WeObserve SDGs and Citizen Science Community of Practice.

How can we address the data gaps for achieving the United Nations’ Sustainable Development Goals (SDGs)? What is the potential of citizen science to track progress on the SDGs as a new source of data? How can we harness citizen science data effectively for evidence-based policymaking and SDG achievement?

These were just some of the questions we had in mind when we started research into the contributions of citizen science to SDG monitoring at the Sustainable Solutions Development Network (SDSN) Thematic Research Network on Data and Statistics (TReNDS). We were aware that citizen science has a role to play, but we didn’t know what the extent of that role would be. We wanted to show where exactly the real potential of citizen science lies in the global SDG indicator framework and also to understand what we can do to bring all the key players together to fully realize this potential.

This research led to our paper “Mapping Citizen Science Contributions to the UN Sustainable Development Goals”, which was recently published in the journal Sustainability Science.

© Litter Intelligence by Sustainable Coastlines

Our most remarkable finding was that citizen science could contribute to the achievement of all 17 SDGs by providing data for 33% of all SDG indicators. There are currently 247 SDG indicators, defined in an evolving framework of 17 goals and 169 targets, so this represents huge potential.

We first investigated the metadata and work plans of all the SDG indicators and then searched for citizen science initiatives at global, national, and even local scales that could potentially contribute data to the monitoring of these indicators. This work was carried out with volunteer members of the SDGs and Citizen Science Community of Practice (SDGs CoP), which was launched a year and a half ago as part of the WeObserve project.

We also looked at the overlap between citizen science and earth observation contributions in our study. GEO’s mapping exercise identified 29 indicators that earth observations could support; of these, citizen science could support 24. This shows great potential for citizen science and earth observation approaches to complement each other. One example is Picture Pile, a flexible tool that ingests imagery from satellites, unmanned aerial vehicles (UAVs), or geotagged photos for rapid assessment and classification.

In Picture Pile, volunteers are shown a pair of images taken at different times and asked, for example, whether they see any tree loss (to identify deforestation), damaged buildings after a disaster (for post-disaster damage assessment), or marine plastics (to understand the extent of the plastics problem), or they are asked to assess levels of poverty (to map poverty). Picture Pile thus combines earth observation and citizen science approaches that could be used for monitoring several SDG indicators, to name but a few: 1.5.2 Direct economic loss attributed to disasters in relation to global gross domestic product (GDP); 11.1.1 Proportion of urban population living in slums, informal settlements, or inadequate housing; 14.1.1b Floating plastic debris density; and 15.1.1 Forest area as a proportion of total land area. Exploring and realizing this potential of citizen science and earth observation is one of our priorities at the GEO Community Activity on Citizen Science (GEO-CITSCI).

Thanks to this study, we now know which initiatives could be leveraged to contribute to SDG monitoring, and we have the groundwork to show project teams, National Statistical Offices (NSOs), and custodian agencies to start discussions around how to realize this potential fully.

The SDG indicators where citizen science projects are “already contributing” (in green), “could contribute” (in yellow) or where there is “no alignment” (in grey). The overall citizen science contributions to each SDG are summarized as pie charts. Black borders around indicators show the overlap between citizen science and EO, as identified by GEO (2017).

The Picture Pile application (both online and for mobile devices) is designed to be a generic and flexible tool for ingesting imagery that can then be rapidly classified by volunteers. Picture Pile, IIASA.

Another important finding of our work was that the greatest potential for citizen science, when existing and potential future contributions are combined, lies in SDG 15 (Life on Land), SDG 11 (Sustainable Cities and Communities), SDG 3 (Good Health and Wellbeing), and SDG 6 (Clean Water and Sanitation), in that order. This shows that citizen science has the greatest potential for input to the environmental SDG indicators.

Of the 93 environmental indicators in the SDG indicator framework identified by the United Nations Environment Programme (UNEP), citizen science could provide inputs for 37 (around 40%). UNEP has also found that 68% of these environmental indicators lack sufficient data. Given that only 10 years remain to achieve the SDGs, we need to start thinking about how to leverage this potential of citizen science for SDG monitoring.

To effectively monitor and ultimately achieve the SDGs, traditional ways of data collection such as censuses and household surveys will not be sufficient; they would also be too expensive to cover the wide range of the SDGs, with their 169 targets and 247 indicators, on a regular basis. We urgently need to act on the results of this study and utilize the potential of new ways of data collection such as citizen science if we are to achieve the SDGs by 2030. But how? Where do we start?

We need to keep demonstrating the value of citizen science in the global data ecosystem through initiatives such as the WeObserve SDGs CoP, building partnerships around citizen science data that involve all the stakeholders, and encouraging investment to leverage the use of citizen science data for the SDGs. We should develop case studies and success stories about the use of citizen science by NSOs, and design citizen science initiatives together with NSOs and other government agencies to ensure that their data quality requirements are met.

I believe it is important to mention that citizen science is not only a source of data that could fill gaps, but it is also a great way to mobilize action and get everyone on board to play their part in addressing the world’s greatest challenges by engaging the public in scientific research. Working together, we can harness the potential of citizen science to achieve the UN Sustainable Development Goals (SDGs).

This post first appeared on the Group on Earth Observations (GEO) blog.

Note: This article gives the views of the author, and not the position of the Nexus blog, nor of the International Institute for Applied Systems Analysis.

The IIASA COVID-19 dashboard

By Tadeusz Bara-Slupski, Artificial Intelligence for Good initiative leader, Appsilon Data Science

Tadeusz Bara-Slupski discusses the Artificial Intelligence for Good initiative’s recent collaboration with IIASA to develop an interactive COVID-19 data visualization tool.

Number of hospital beds per 1000 population © IIASA

Public institutions rely on external data sources and analysis to guide policymaking and intervention. Through our AI for Good initiative, we support organizations that provide such inputs with our technical expertise. We were recently approached by IIASA to create a dashboard to visualize COVID-19 data. This builds on our previous collaboration, in which we delivered a decision-making tool for natural disaster risk planning in Madagascar. In this article, we provide an example of how to help policymakers navigate the ocean of available data with dashboards that turn these data into actionable information.

Data is useful information when it creates value…or saves lives

The current pandemic emergency has put an unprecedented strain on both public health services and policymaking bodies around the world. Government action has been constrained in many cases by limited access to equipment and personnel. Adequate policymaking can help to coordinate the emergency relief effort effectively, make better use of scarce resources, and prevent such shortages in the future. This, however, requires access to secure, timely, and accurate information.

Governments commission various public bodies and research institutes to provide such data for planning and coordinating the response. For instance, in the UK, the government commissioned the National Health Service (NHS) to build a data platform that consolidates a number of data providers into a single source. However, for the data to be useful, it must be presented in a way that is consistent with the demands of an emergency situation. The NHS therefore partnered with a number of tech companies to visualize the data in dashboards and to provide deeper insights. Raw data, regardless of its quality, is not useful information until it is understood in a way that creates value – or, in this case, informs action that could save lives.

IIASA approached us to support them in making their COVID-19 data and indicators more useful to policymakers. The institute’s research is used by policymakers around the world to make critical decisions. We appreciated the opportunity to use our skills to support their efforts by creating an interactive data visualization tool.

IIASA COVID-19 report and mapbook

Research indicates that while all segments of the population are vulnerable to the virus, not all countries are equally vulnerable at the same time. Therefore, there is a need for accurate socioeconomic and demographic data to inform the allocation of scarce resources between countries and even within countries.

IIASA responded to this need with a regularly updated website and data report: “COVID-19: Visualizing regional socioeconomic indicators for Europe”. The reader is introduced to a range of demographic, socioeconomic, and health-related indicators for European Union member countries and sub-regions in five categories:

  • Current COVID-19 trends – information about the number of cases and effectiveness of policy response measures
  • Demographic indicators – age, population density, migration
  • Economic indicators – GDP, income, share of workers who work from home
  • Health-related indicators – information about healthcare system capacity
  • Tourism – number of visitors, including foreign visitors

The indicators and data were chosen for their value in assisting epidemiological analysis and balanced policy formulation. Policymakers often face the challenge of weighing pandemic mitigation efforts against long-term impacts like unemployment, production losses, and supply-chain disruptions. IIASA’s series of maps and graphs facilitates understanding of these impacts while maintaining the focus on containing the spread of the virus.

Our collaboration – a dashboard for policymakers

Having taken the first step to disseminate the data as information in the form of a mapbook, Asjad Naqvi decided to make these data even more accessible by turning the maps into an interactive and visually appealing tool.

IIASA had previously approached Appsilon Data Science with a data visualization project, in which we improved the features and design of Visualize, a decision support tool for policymakers in natural disaster risk management. Building on this experience, we set out to assist Naqvi with creating a dashboard that delivers the data to end users even faster.

The application allows users to browse a list of 32 indicators and visualize them on an interactive map. The list is not final: indicators are reviewed, added, and retired on a weekly basis.

White circles indicate the number of cases per 1 million citizens.
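To give a concrete sense of how such a dashboard turns a regional indicator table into an interactive map, here is a minimal sketch in Python with Plotly. This is an illustration only, not the production application; the boundary URL, region codes, and indicator values below are hypothetical.

```python
# A minimal sketch of an interactive regional-indicator map in Python
# with Plotly. The boundary URL, region codes, and values are
# hypothetical; the production dashboard is a separate implementation.
import json
import urllib.request

import pandas as pd
import plotly.express as px

# Regional boundaries as GeoJSON (placeholder URL)
url = "https://example.org/nuts2_regions.geojson"
with urllib.request.urlopen(url) as f:
    regions = json.load(f)

# Hypothetical indicator values per NUTS2 region
df = pd.DataFrame({
    "region": ["AT13", "ES42", "DE21"],
    "hospital_beds_per_1000": [7.4, 3.0, 6.2],
})

fig = px.choropleth(
    df,
    geojson=regions,
    locations="region",
    featureidkey="properties.NUTS_ID",  # match GeoJSON region IDs
    color="hospital_beds_per_1000",
)
fig.update_geos(fitbounds="locations", visible=False)
fig.show()
```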

The application will continue to provide the latest and most relevant information for tracking regional performance in Europe, including in the post-pandemic phase:

The pandemic has had a disproportionate impact on women’s employment and has revealed some systemic inequalities.

Social distancing measures, for instance, have a large impact on sectors with high female employment rates, and the closure of schools and daycare facilities particularly affects working mothers. Indicators such as the female unemployment rate can inform appropriate remedial action in the post-COVID world and highlight regions of special concern, like Castilla-La Mancha in Spain.

Given the urgency of the pandemic emergency, we managed to develop and deploy this application within five days. We believe such partnerships between data science consultancies and research institutes can transform the way policymakers utilize data. We are looking forward to future collaborations with IIASA and other partners to help transform data into accessible and useful information.

This project was conducted as part of our Artificial Intelligence for Good initiative. The application is available to explore here.

Note: This article gives the views of the author, and not the position of the Nexus blog, nor of the International Institute for Applied Systems Analysis.