Big Data : Explaining its Uses to Environmental Sciences - Africa Green Magazine


Defining Big Data

Big Data was the buzz phrase of 2017, but in truth, the concept has been around far longer than that. We know what data is - it is the raw information collected from any study, particularly in science. Data science is the study of this data. Big Data takes this concept one step further; it is a data set of such complexity that it would be impossible to process, examine, manipulate and present using traditional methods. The intended results are often so complex (1) that they are difficult to process even using tried and tested electronic methods. It's important to note that the term does not necessarily denote the size of the data set (although sometimes a large volume of data is unavoidable), merely its complexity. Big Data is determined using five metrics

How Do the Environmental Sciences Use Big Data? Real World Case Studies

It should hardly surprise us that government bodies and university research departments all over the world are already using Big Data to aid research and decision-making. Here is a selection of applications and success stories.

The EPA and Public Health

One of the biggest areas in the US for unifying Big Data with environmental science is public and environmental health (16). Already, we've seen improvements in the monitoring and mitigation of toxicological issues arising from industrial chemicals released into the atmosphere. Monitoring has always relied on tried and tested methods such as localized environmental sampling, but now that we can process such data through computational methods, the results are more accurate, more up-to-date and faster to produce, with more analytical information to help experts make informed decisions. Big Data allows for high throughput (more resources, a longer period of time), combined data sets (bringing together multiple, otherwise seemingly disparate data sets), meta-analysis (studies that compile existing studies to create a more thorough and hopefully more accurate picture), and deeper analysis of the results produced from these studies.
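To make the idea of meta-analysis concrete, here is a minimal Python sketch of one common way to pool results from several studies: a fixed-effect, inverse-variance weighted average. It is an illustration only and does not describe the EPA's actual methods; the function name `pooled_estimate` and the example numbers are hypothetical.

```python
from math import sqrt


def pooled_estimate(estimates, std_errors):
    """Combine per-study estimates with inverse-variance weights.

    Hypothetical helper for illustration: each study contributes in
    proportion to 1 / (standard error squared), so more precise
    studies count for more.
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / (se ** 2) for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    # Standard error of the pooled value under the fixed-effect assumption.
    pooled_se = sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Made-up numbers: three studies measuring the same quantity.
    estimate, std_error = pooled_estimate([2.0, 2.6, 2.2], [0.5, 0.8, 0.4])
    print(f"pooled estimate: {estimate:.2f} (standard error {std_error:.2f})")
```

Under these assumptions the pooled standard error is always smaller than that of any single study, which is one sense in which a compilation of studies can give a more thorough picture than its parts.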

The EPA is presently using data acquired through Big Data Analytics to synthesize more accurate predictions for areas where data either does not exist or is difficult to acquire. Researchers can also identify gaps in the data and potential vulnerabilities in the system and process of investigation. Overall, this mitigates those problems and improves the data available for decision making on public health concerns. The agency is now working with NCDS (National Consortium for Data Science) to identify current challenges that it hopes to address through big data science (16).

For Geographic Data

Few tools have proven as useful to so many environmental sciences as the map. Maps have evolved from simple cartography for naval navigation and geographic surveying to modern Geographic Information Systems (databases of data sets from which we can produce digestible maps and create visually striking imagery for an intended audience), and GIS thrives on Big Data. Much of the strength of GIS lies in its ability to consolidate, utilize and present statistical data. The more data you have from a geographic area, the better the quality of the output and the more informed the decision making is likely to be. Its biggest contribution (so far) seems to be in spatial analytics, and that's good news for GIS technicians and for those people charged with making decisions based on the outputs of their data.

One example is in disaster and emergency relief (17). As recently as 2017, a researcher showed in a seminal study that it would be possible in future to parse textual references to GIS databases to pinpoint, up to the minute, problem areas currently suffering from tsunamis, flooding, and earthquakes. This would not have been possible before due to the sheer intensity of the cross-referencing required. Satellite data and aerial imagery have already informed GIS in disaster management, with Hurricane Katrina being one of the first and best-known cases of the technology in use. In future, Big Data will further enhance its efficacy.

Further, the EPA is using geographic data to inform research into public health through the Environmental Quality Index (16). Big Data is bringing a number of areas together in the most comprehensive analysis of its kind, examining air, water, land, the built environment and socio-economic data (18). It is expected that this information will inform public health decisions and allow for medical research into health disparities linked to child mortality and poverty.

Climate Change and Planetary Monitoring

In 2013, the UK government announced large-scale investment in Big Data infrastructure for science, particularly in the environmental sector. Of particular note to global research was a commitment to maintaining funding for a program called CEMS (Climate and Environmental Monitoring from Space) (19). This allowed for the creation of larger databases to cope with the upcoming Big Data revolution and to allow research partner organizations to work with more data and produce more results. With a specific focus on climate change and planetary monitoring, CEMS storage removed the need to download enormous data sets while reducing the cost of access (20). It provides the tools as well as the data, allowing for greater efficiency, sharing in the academic community, and providing resources once beyond the reach of many institutes due to budgetary restrictions alone. Along with Cloud data, this is now the standard globally for some of the world's top research institutes.

At the same time, one of the UK's top universities announced plans to open a Big Data center for environmental science research and analysis. It intends to bridge the “data gap” between those who research global environmental problems and those charged with making decisions to remedy such issues (21). That's also at the core of the relationship between the US-based Lighthill Risk Network - an insurance representative organization - and the UK's Institute for Environmental Analytics - a data research organization. Working in partnership to see how big data can be applied to a variety of issues in risk management and natural disasters, particularly in light of the increased frequency of erratic and extreme weather, Lighthill is now committed to developing global databases and making the business case for sharing data (22). Such partnerships, both across governments and between industry and government, are working, as shown by the previously discussed EPA programs and the EU-wide Copernicus Climate Change Service, which recently went live.

Finally, there are immense implications for the use of Big Data in climate modeling. As early as 2010, NASA was utilizing Big Data capture and storage to create its most accurate climate projection models yet (30). It is estimated that the agency stores as much as 32 petabytes of information for modeling purposes. Models thrive on enormous data sets, complex data and accumulated metadata. As far as the sciences are concerned, climate modeling could be the single most important area of academia for Big Data applications.


With an ever-growing global population putting more pressure on resources, agritech is going to have to invest in some important developments. It's projected that, barring major disaster, the global population in 2060 will be 9 billion (23, p401), with the highest growth in poorer countries. Coping with this will mean a lot of investment in agricultural systems. One such investment is GM technology, expected to help the world's poorest communities grow resilient crops for sustainable food supply and economies. However, GM alone is not going to solve this problem.

Essential resource management plans will need to be put into place to ensure we are making the most of agricultural land and effectively using ground nutrients, limiting deforestation, properly managing water resources and developing new methods of farming that could use even less space than before. In the US, some notable agricultural organizations are already using crowdsourced data in conjunction with remote sensing and publicly available data such as weather forecast information (23, p402). This allows the creation of Big Data sets so domestic farmers can improve land use efficiency, maximizing productivity and revenue. Here, Big Data is used in environmental engineering to tell farmers which crops they should plant this year and even when their machinery is likely to break down. In the first instance, this information may be used for crop management (to cope with predicted extreme weather); in the second, to order parts ahead of time so that work is not lost.

This is expected to be even more important in the developing world for people who live in so-called “marginal landscapes” (29). These are areas where agricultural production is low due to erratic water supply, low precipitation, or particularly acidic or alkaline soils. Some people have had to use such landscapes through little choice; they may be bad choices, but they are still the best available to them. The use of Big Data here is two-fold: first, providing mitigation and management tools for marginal landscapes already in use; second, identifying the best uses for marginal landscapes not already turned over to agriculture (24).

Genetic Studies

Although genetics is not technically an environmental science, it has many uses beneficial to the environment, from GM technology and gene mapping to examination of the spread and transmission of infectious diseases in vital food crops, such as Panama Disease in bananas (25). It's useful in a wide range of biological sciences. We expect many advances in genetics to come thanks to the advent of Big Data. When the human genome was decoded in the early 2000s, the process took over 10 years. Now, with Big Data analytics, the OECD estimates that the exact same process, if carried out for the first time today, would take just 24 hours (26). Faster research into genetic structures means faster identification of, and reaction to, problematic genes and faster implementation of mitigation measures.

Citizen Science

One of the unexpected benefits of Big Data to any science, but particularly environmental science, is so-called “Citizen Science”. This is the accumulation of data reported voluntarily by people in geographic locations all over the world, offering information on conditions where they live. It is often beyond the financial and time resources of researchers to investigate all claims directly, so they rely on local people to report such information. The practice is not new, but the term “citizen science” and the overt public engagement are. Indeed, there are many examples of successful citizen science projects already, such as the Christmas Bird Census of 1900 (27), which came long before global communication, cloud storage and mobile technology - arguably the three technologies that have enabled public engagement like no other.

When many people report phenomena, it reduces the possibility of hoaxes, misinterpretation and fake reporting. While anecdotal evidence is not useful in some areas and indeed counterproductive in others, science organizations all over the globe are inviting input from interested amateurs and stoking interest in environmental science. The Christmas Bird Census may have been born out of collective horror at the mass slaughter of native North American birds, but it later raised consciousness of the potential ecological problems of such a “tradition” and of how citizens themselves could help with conservation if engaged in the right way. Even widespread voluntary human drug trials for new pharmaceuticals can be considered “citizen science”, with volunteers from a wide range of lifestyles engaging in experiments and reporting side effects and effects on medical conditions back to researchers (28).

Anthropology & Archaeology

The study of people in the past (and their material remains) may not be the first outlet you might consider for Big Data application, mostly because these disciplines tend to study small groups of individuals on specific sites. However, compiling such data can benefit studies over large areas to determine the spread of technology and cultural evolution, and even to track the spread of ancient farming practices such as slash and burn. Accumulated digital data is not new to these two areas. Statistics are and have always been a useful tool in such methods as aerial survey and remote sensing, both of which are profoundly useful to relatively new technologies such as GIS (Geographic Information Systems) (31). In 2017, it was suggested that Big Data could be used to plow through old excavation reports to “data mine” in the hope of extracting new information.

Archaeologists and anthropologists often deal with complex data, comparing site analyses and trying to marry up otherwise seemingly disparate data sets. In theory, Big Data could make large-scale investigations into the affairs of humans in the past much faster, broader and more complex. This should result in more useful and better-informed results in cultural studies, improved visualizations and greater computing power (32). This can be just as useful in studying modern populations as for societies in the past.

In Environmental Conservation

It was reported in 2014 that Big Data was not yet part of the world of sustainability and environmental conservation (33). Although some applications have proven useful in climate science and climate modelling, Big Data has so far seen little use in areas such as land conservation, sustainability and local environmental mitigation. The seminal report did go on to acknowledge a number of essential areas that could (in theory) benefit from the application of Big Data and Big Data Analytics in the future (33, p7). These included:
  • Environmental NGOs may use data as evidence when lobbying governments to introduce laws or other measures to protect individual landscapes. As these groups are often at the forefront of both advocacy and application, they produce the data and could use it in support of their findings
  • Third-party specialists and consultants can accumulate data and provide it in reports for clients, similar to the work of the NGOs noted in the first point
  • Corporate entities may employ Big Data in two ways: firstly, as evidence that they are complying with government regulation pertaining to their industry and sector; secondly, to launch investigations to determine the cause of an environmental problem
  • International organizations working in environmental policy research may use it to make recommendations to other international decision-making organizations and lobby groups
  • Government bodies may use it in determining policy and bills on environmental regulation and sustainability. At present, the US is working with the Dutch government to ensure open data policy for Big Data analytics in this area

In Regional Planning

Urban landscapes are often overlooked when discussing environmental sciences. But urban centers are environments too, sometimes with their own ecosystems. They are a curious ecology: they impact the environment and are impacted by it, providing life and work for residents and becoming self-contained ecological islands. Yet studies in urbanism represent some of the best and earliest examples of the application of Big Data. In 2014, a report on China's use of applied statistics and Big Data to examine urban systems and urban-rural planning highlighted the project (begun in 2000) as a major success (34). 2014 was also the year the practice began to expand rapidly. It requires a unification of data across information technology, geography, logistics and urban planning.

Big Data can be applied to examine problem areas for traffic (and aid decision making on where to place new roads), crime centers (and where to focus law enforcement resources), and health problems (and to attempt to understand why certain areas experience certain health problems - pollution, poverty, poor access to resources, etc.). Standard data sets are insufficient and lack depth; urban planning requires information from disparate sources - demographics, geographic information, resources, employment figures, pollution, health and many more - to understand the complex parts that go into making an urban center function.

Big Data should improve the process of urban planning and resource allocation. In fact, it's already doing so. More recently, studies have shown the usefulness of Big Data in “smart urban planning” (35) through large data sets, and its likely usefulness in future. It's expected to be both a time saver and a money saver.

What are the Advantages of Using Big Data in Environmental Sciences?

As you can see from the above section, government departments and university research facilities across the world are already using or preparing for big data. Few have made as many strides as the US EPA (Environmental Protection Agency) (16). The advantages are numerous.

Collecting, Sorting, Analyzing, Presenting Quickly

As hinted in the scenarios presented above, Big Data's major advantage is the capacity to collect masses of data and analyze it quickly; it's a realistic cost- and resource-saving tool in areas that are often drastically underfunded and having to cut costs. The storage capacity now exists to collect and collate, and affordable computing power exists to process and manipulate the data in any way necessary. This is most obvious in climatology, even if the community has been relatively slow to adopt it (36). These two processes alone make Big Data vital for environmental science presentation and accuracy.

Error Mitigation

How to handle errors in data, reporting, rogue data and anomalous results has been one of the biggest problems facing any science. When sample sizes are too small, anomalous data can be given more importance than it deserves. But studies are often limited by sample size alone due to resource factors. The larger a data set, the more likely a rogue piece of information will fall in significance and not damage the overall result (37). Coupled with the cost and resource saving, environmental studies can, in theory, become larger and more thorough, producing more accurate results.
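The effect of sample size on a rogue value can be checked with a few lines of Python. The sketch below uses made-up readings and a hypothetical helper named `outlier_shift` to measure how far a single anomalous value moves the mean as the sample grows. It assumes the simplest possible case: identical typical readings plus one anomaly.

```python
from statistics import mean


def outlier_shift(n, typical=10.0, rogue=1000.0):
    """Return how far one rogue reading moves the mean of n readings.

    Hypothetical helper for illustration: compares a sample of n
    identical typical readings with the same sample where one reading
    is replaced by the rogue value.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    clean = [typical] * n
    contaminated = [typical] * (n - 1) + [rogue]
    return mean(contaminated) - mean(clean)


if __name__ == "__main__":
    # The same rogue value matters less and less as the sample grows.
    for size in (10, 100, 1000, 10000):
        print(f"n = {size:>5}: mean shifts by {outlier_shift(size):.3f}")
```

In this simplified setting the shift equals (rogue - typical) / n, so it shrinks in direct proportion to the sample size. Real data sets are noisier, and measures other than the mean respond differently, but the direction of the effect is the same.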

Better Environmental Management

Big Data also applies to urban management, as our cities continue to undergo rapid and vast changes in line with changing technology and the demands of residents. In one study, the Norwegian capital of Oslo was able to reduce its energy consumption through the application of Big Data Analytics when examining its energy resources (38). Similarly, in Denver, predictive reporting and risk analysis at the city's police department helped reduce serious crime by around 30%. Portland, Oregon, used a similar system to analyze stoplight changes at intersections in order to manage traffic flow better. After just six years, the city had eliminated 157,000 metric tonnes of CO2 emissions. Traffic flow varies as a city grows; a stoplight pattern that was once sufficient may no longer be.

Better Decision Making

By sheer weight of numbers, Big Data and the analytical tools used in its processing are able to process and analyze more past data than ever before. Previously, this too was limited by resources, but with increased access and availability, Big Data is expected to permit easier presentation and reporting and deliver more confident results, and therefore better aid decision makers and policy development professionals. Scientists and government can work together more efficiently in future, not just to react to the environmental problems of today, but to work with greater foresight today to make better decisions for tomorrow.

What are the Current Challenges for Big Data in Environmental Sciences?

Big Data is not expected to be a panacea for all the world's environmental problems or for research or applied science in general. Nor is it designed to be a one-size-fits-all answer. Like any other emerging technology, there are problems and limitations to keep in mind when extolling the virtues of Big Data.

Technical Limitations

Due to the complexity of so-called Big Data, the method presents a number of challenges to those who seek to acquire and use it. For instance, the frameworks for each of the following may not always have sufficient capacity:
  • Methods for capturing data
  • The capacity for storing it
  • Analyzing the data when captured
  • Searching, sharing and transferring during the utilization process
  • Visualization and querying of data
  • Updating the information in line with recent changes
  • Data security, privacy issues and the sources of storage
This is especially true where the volume of data quickly outstrips the capacity of present computing technology to perform any of the above functions. Big Data's growth is, and so far has been, exponential. To keep up, hardware in all of the areas above will need to match, if not exceed, the necessary capacities. We must also not underestimate the problems with human error - wrongly entered data, poor processing due to mistakes, and misinterpretation of that data. The information may not lie, but humans can and do make mistakes.

Ethical Limitations

In all this, it's important to remember that some sciences concern data pertaining to humans. Issues include cultural sensitivities, as in archaeology and anthropology (32). Some critics are concerned that in reducing populations to Big Data information, we reduce their humanity, their individuality. However, with the improvements in disaster response time, applications in climate science, and the processing of the enormous data sets involved in examining archaeological/anthropological information, it's likely that these human sciences and humanities concerned with the environment will benefit in the long term.

Also, we must be aware of the legal ramifications of data storage. Here in the US, HIPAA protects a patient's rights to their medical history. The European Union is introducing a set of regulations called GDPR (General Data Protection Regulation) in May 2018. This will affect the USA, especially researchers, scientific institutes and anybody handling Big Data from entities operating in the European Union, or information relating to any citizen living within a member state of the EU or EEA (39). It is understood that the US government is watching closely to see how GDPR functions and how it might adopt such a law in future. It's likely such information will receive protection, with deletion required at the owner's request; this will certainly have ramifications for information stored about people.

Lack of Widespread “Open Access”

Research institutes and businesses are often incredibly protective of their research data, especially where mass profitability is involved. Yet there has been a move in recent decades to call for subscription-free public access to scientific data, known as Open Access. In some disciplines, not enough strides have been made in this area, so Big Data Analytics is not presently reaching its full potential. Much data is restricted, meaning that - although studies can call on more data and do more with it - there is still a large amount of data that could prove useful in environmental science but is held privately with limited or no access. Although fear of handing over information to competitors is part of the issue, other problems include a lack of resources to share data or a lack of awareness of how useful Open Access can be (32). Together, Open Access and Big Data have the capacity to be a powerful force in research science, but the latter is being held back by a lack of the former.

