Big Data: Explaining Its Uses in Environmental Sciences
Defining Big Data
Big Data was the buzz phrase of 2017, but in truth, the concept has been around
far longer than that. We know what data is - it is the raw information
collected from any study, particularly in science. Data science is the study
of this data. Big Data takes this concept one step further; it is a data set of
such complexity that it would be impossible to process, examine, manipulate and
present using traditional methods. The intended results are often so complex
(1) that they are difficult to process even using tried and tested electronic
methods. It's important to note that the term does not necessarily denote the
size of the data set (although sometimes a large volume of data is
unavoidable), merely its complexity. Big Data is determined
using five metrics
How Do Environmental Sciences Use Big Data? Real World Case Studies
It should hardly surprise us that government bodies and university research
departments all over the world are already using Big Data to aid research and
decision-making. Here is a selection of applications of Big Data and their
success stories.
The EPA and Public Health
One of the biggest areas in the US for unifying Big Data with environmental
science is public and environmental health (16). Already, we've seen
improvements in the monitoring and mitigation of toxicological issues arising
from industrial chemicals released into the atmosphere. Monitoring has always
used tried and tested methods such as localized environmental sampling, but now
we can process such data through computational methods. The result is more
accurate, more up to date and faster to produce, with more analytical
information to allow experts to make an informed decision. Big Data allows for
high throughput (more resources, a longer period of time), combined data sets
(bringing together multiple, otherwise seemingly disparate data sets),
meta-analysis (studies that compile existing studies to create a more thorough
and hopefully more accurate picture), and deeper analysis of the results
produced from these studies.
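To make the idea of meta-analysis concrete, here is a minimal sketch of one
common way to pool the results of several studies: a fixed-effect,
inverse-variance weighted average. This is an illustration only, not the
method used by the EPA or any study cited here; the function name and the
study values are hypothetical.

```python
from math import sqrt


def pooled_estimate(studies):
    """Fixed-effect, inverse-variance pooled estimate.

    `studies` is a list of (estimate, standard_error) pairs.
    Returns (pooled_estimate, pooled_standard_error).
    """
    # More precise studies (smaller standard error) receive more weight.
    weights = [1.0 / (se ** 2) for _, se in studies]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / total
    return pooled, sqrt(1.0 / total)


# Hypothetical measurements of the same quantity from three separate studies.
studies = [(12.0, 2.0), (10.0, 1.0), (11.0, 2.0)]
pooled, se = pooled_estimate(studies)
print(f"pooled estimate: {pooled:.2f} (standard error {se:.2f})")
```

The pooled standard error is smaller than that of any single study, which is
the sense in which a compilation of studies can give a more thorough picture.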
The EPA is presently using data acquired through Big Data Analytics to
synthesize more accurate predictions for areas where data either does not exist
or is difficult to acquire. Researchers can also identify gaps in the data and
potential vulnerabilities in the system and process of investigation. Overall,
this mitigates those problems and improves the data available for decision
making on public health concerns. The agency is now working with the NCDS
(National Consortium for Data Science) to identify current challenges that it
hopes to address through big data science (16).
For Geographic Data
Few tools have proven as useful to so many environmental sciences as the map.
Maps have served everything from simple cartography for naval navigation and
geographic surveying to modern Geographic Information Systems (GIS): databases
of data sets from which we can produce digestible maps and create visually
striking imagery for an intended audience. GIS thrives on Big Data. Much of the
strength of GIS lies in its ability to consolidate, utilize and present
statistical data. The more data you have from a geographic area, the better the
quality of the output and the more informed the decision making is likely to
be. Big Data's biggest contribution (so far) seems to be in spatial analytics,
and that's good news for GIS technicians and for the people charged with making
decisions based on their outputs.
One example is in disaster and emergency relief (17). As recently as 2017, a
researcher showed in a seminal study that it would in future be possible to
parse textual references against GIS databases to identify, up to the minute,
problem areas currently suffering from tsunamis, flooding, and earthquakes.
This would not have been possible before due to the sheer intensity of the
cross-referencing required. Satellite data and aerial imagery have already
informed GIS in disaster management, with Hurricane Katrina being one of the
first and best-known cases of the technology in use. In future, Big Data will
further enhance its efficacy.
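As a rough illustration of what parsing textual references against a
geographic database might involve, the sketch below matches place names in
short text reports against a small gazetteer and tags each match with the
hazard mentioned. It is not the method of the study cited above; the gazetteer,
place names, coordinates and hazard keywords are all invented for the example,
and a real system would need far more robust language handling.

```python
import re

# Hypothetical gazetteer: place name -> (latitude, longitude). All invented.
GAZETTEER = {
    "Riverton": (10.0, 20.0),
    "Port Alder": (11.5, 21.25),
}

# Hypothetical hazard keywords, matched as simple substrings.
HAZARDS = ("flood", "tsunami", "earthquake")


def geoparse(messages, gazetteer=GAZETTEER, hazards=HAZARDS):
    """Return (place, coordinates, hazard) for each report naming both."""
    hits = []
    for text in messages:
        lowered = text.lower()
        hazard = next((h for h in hazards if h in lowered), None)
        if hazard is None:
            continue  # no hazard mentioned, so nothing to map
        for place, coords in gazetteer.items():
            pattern = r"\b" + re.escape(place) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits.append((place, coords, hazard))
    return hits


reports = [
    "Severe flooding reported in Riverton this morning",
    "Lovely weather in Port Alder",
    "Earthquake felt near port alder",
]
for place, coords, hazard in geoparse(reports):
    print(place, coords, hazard)
```

Each hit carries coordinates, so it could be plotted as a point on a map layer;
the cross-referencing burden mentioned above comes from doing this for very
large numbers of reports and place names at once.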
Further, the EPA is using geographic data to inform research into public health
through the Environmental Quality Index (16). Big Data is informing a number of
areas and bringing them together in the most comprehensive analysis of its
kind, examining air, water, land, the built environment and socio-economic data
(18). It is expected that this information will inform public health decisions
and allow for medical research into health disparities related to child
mortality and poverty.
Climate Change and Planetary Monitoring
In 2013,
the UK government announced large-scale investment in Big Data infrastructure
for science, particularly in the environmental sector. Of particular note to
global research was a commitment to maintaining funding for a program called
CEMS (Climate and Environmental Monitoring from Space) (19). This
allowed for the creation of larger databases to cope with the upcoming Big Data
revolution and to allow research partner organizations to work with more data
and produce more results. With a specific focus on climate change and planetary
monitoring, CEMS storage removed the need to download enormous data sets while
reducing the cost of access (20). It provides the tools as well as
the data, allowing for greater efficiency, sharing in the academic community,
and providing resources once beyond the reach of many institutes due to
budgetary restrictions alone. Along with Cloud data, this is now the standard
globally for some of the world's top research institutes.
At the
same time, one of the UK's top universities announced plans to open a Big Data
center for environmental science research and analysis. It intends to bridge
the “data gap” between those who research global environmental problems and
those charged with making decisions to remedy such issues (21).
That's also at the core of the relationship between the US-based Lighthill Risk
Network - an insurance representative organization - and the UK's Institute for
Environmental Analytics - a data research organization. Working in partnership
to see how big data can be applied to a variety of issues in risk management
and natural disasters, particularly in light of increased frequency of erratic
and extreme weather, Lighthill is now committed to developing global databases
and making the business case for sharing data (22). Such cross-government
partnerships, and partnerships between industry and government, are working, as
shown by the previously discussed EPA programs and the EU-wide Copernicus
Climate Change Service, which recently went live.
Finally, there are immense implications for the use of Big Data in climate
modeling. As early as 2010, NASA was utilizing Big Data capture and storage to
create the most accurate climate projection models yet (30). It is estimated
that the agency stores as much as 32 petabytes of information for modeling
purposes. Models thrive on enormous data sets, complex data and accumulated
metadata. As far as the sciences are concerned, climate modeling could be the
single most important area of academia for Big Data applications.
Agritech
With an ever-growing global population putting more pressure on resources,
agritech is going to have to invest in some important developments. It's
projected that, barring major disaster, the global population in 2060 will be 9
billion (23, p401), with the highest growth in poorer countries. This means a
lot of investment in agricultural systems to cope. One such development is GM
technology, expected to help the world's poorest communities grow resilient
crops for sustainable food supply and economies. However, GM alone is not going
to solve this problem.
Essential resource management plans will need to be put into place to ensure we
are making the most of agricultural land and effectively using ground
nutrients, limiting deforestation, properly managing water resources and
developing new methods of farming that could use even less space than before.
In the US, some notable agricultural organizations are already using
crowdsourced data in conjunction with remote sensing and publicly available
data such as weather forecast information (23, p402). This allows the creation
of Big Data sets so domestic farmers can improve land use efficiency,
maximizing productivity and revenue. Here, Big Data is used in environmental
engineering to tell farmers which crops they should plant this year and even
when their machinery is likely to break down. The first kind of information
may be used for crop management (to cope with predicted extreme weather); the
second allows parts to be ordered ahead of time so that work is not lost.
This is expected to be even more important in the developing world for people
who live in so-called “marginal landscapes” (29). These are places where
agricultural production is low due to erratic water supply, low precipitation,
or particularly acidic or alkaline soils. Some people have had to use such
landscapes through little choice; they may be bad choices, but they are still
the best available to them. The use of Big Data here is two-fold: first,
providing mitigation and management tools for marginal landscapes already in
use; second, identifying the best uses for marginal landscapes not already
turned over to agriculture (24).
Genetic Studies
Although genetics is not technically an environmental science, it has many uses
beneficial to the environment, from GM technology to gene mapping and the
examination of the spread and transmission of infectious diseases in vital food
crops, such as Panama Disease in bananas (25). It's useful in a wide range of
biological sciences. We expect many advances in genetics to come thanks to the
advent of Big Data. When the human genome was decoded in the early part of the
last decade, the process took over 10 years. Now, with Big Data analytics, the
OECD estimates that the exact same process, if carried out for the first time
today, would take just 24 hours (26). Faster research into genetic structures
means faster identification of, and reaction to, problematic genes and faster
implementation of mitigation measures.
Citizen Science
One of the unexpected benefits of Big Data to any science, but particularly
environmental science, is so-called “Citizen Science”. This is the accumulation
of data reported by people in geographic locations all over the world who
voluntarily offer information on conditions where they live. It is often beyond
the financial and time resources of researchers to investigate all claims
directly, so they rely on local people to report such information. This is not
new, but the term “citizen science” and the overt public engagement are.
Indeed, there are many examples of successful citizen science projects, such as
the Christmas Bird Census of 1900 (27), which came long before global
communication, cloud storage and mobile technology - arguably the three
technologies that have enabled public engagement like no other.
When many people report phenomena, it reduces the possibility of hoax,
misinterpretation and fake reporting. While anecdotal evidence is not useful in
some areas and, indeed, counterproductive in others, science organizations all
over the globe are inviting input from interested amateurs and stoking interest
in environmental science. The Christmas Bird Census may have been born out of
collective horror at the mass slaughter of native North American birds, but it
later raised consciousness of the potential ecological problems of such a
“tradition” and of how citizens themselves could help with conservation if
engaged in the right way. Even widespread voluntary human drug trials for new
pharmaceuticals can be considered “citizen science”, with volunteers from a
wide range of lifestyles engaging in experiments and reporting side effects and
effects on medical conditions back to researchers (28).
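The point that many independent reports reduce the risk of hoax or
misinterpretation can be sketched as a simple corroboration filter: an
observation is accepted only when enough distinct observers report the same
thing at the same place. This is a minimal sketch under assumed names; the
report format, the threshold and the example data are hypothetical and do not
describe any particular project.

```python
from collections import defaultdict


def corroborated(reports, min_observers=3):
    """Return the (location, phenomenon) pairs reported by at least
    `min_observers` distinct observers.

    `reports` is an iterable of (observer_id, location, phenomenon) tuples.
    """
    observers = defaultdict(set)
    for observer_id, location, phenomenon in reports:
        # A set ignores repeat submissions from the same observer.
        observers[(location, phenomenon)].add(observer_id)
    return {key for key, ids in observers.items() if len(ids) >= min_observers}


# Hypothetical reports: three observers agree on one sighting, while a
# single observer submits another sighting twice.
reports = [
    ("a", "Lake Site", "algal bloom"),
    ("b", "Lake Site", "algal bloom"),
    ("c", "Lake Site", "algal bloom"),
    ("a", "Lake Site", "algal bloom"),
    ("d", "Hill Site", "rare bird"),
    ("d", "Hill Site", "rare bird"),
]
print(corroborated(reports))
```

Counting distinct observers rather than raw submissions is the design choice
that matters here: one person reporting repeatedly adds no corroboration.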
Anthropology & Archaeology
The study of people in the past (and their material remains) may not be the
first outlet you might consider for Big Data application, mostly because these
disciplines tend to study small groups of individuals on specific sites.
However, compiling such data can benefit studies over large areas that seek to
determine the spread of technology and cultural evolution, and even to track
the spread of ancient farming practices such as slash and burn. Accumulated
digital data is not new to these two areas. Statistics are, and have always
been, a useful tool in such methods as aerial survey and remote sensing, both
of which are profoundly useful to relatively new technologies such as GIS
(Geographic Information Systems) (31). In 2017, it was suggested that Big Data
could be used to plow through old excavation reports to “data mine” them in the
hope of extracting new information.
Archaeologists and anthropologists often deal with complex data, comparing site
analyses and trying to marry up otherwise seemingly disparate data sets. In
theory, Big Data could make large-scale investigations into the affairs of
humans in the past much faster, broader and more complex. This should bring
improved visualizations, greater computing power and more informed, more useful
results in cultural studies (32). This can be just as useful in studying modern
populations as in studying societies of the past.
In Environmental Conservation
It was reported in 2014 that Big Data was not yet part of the world of
sustainability and environmental conservation (33). Although some applications
have proven useful in climate science and climate modelling, Big Data has so
far found little use in areas such as land conservation, sustainability and
local environmental mitigation. The seminal report did go on to acknowledge a
number of essential areas that could (in theory) benefit from the application
of Big Data and Big Data Analytics in the future (33, p7). These included:
- Environmental NGOs may use data as evidence
for lobbying governments to instigate laws or other measures to protect
individual landscapes. As these groups are often at the forefront of
advocacy because they are at the forefront of application, they produce
the data and could use it in support of their findings
- Third-party specialists and consultants who
can accumulate data and provide such information in reports for clients,
similar work to the NGOs noted in the first point
- Corporate entities may employ Big Data in two
forms: firstly as evidence that they are complying with government
regulation pertaining to their industry and sector; secondly to launch
investigations into issues to determine the cause of an environmental
problem
- International organizations who work in environmental
policy research to make recommendations to other international
decision-making organizations and lobby groups
- Government bodies in determining policy and
bills on environmental regulation and sustainability. At present, the US
is working with the Dutch government in ensuring open data policy for Big
Data analytics in this area
In Regional Planning
Urban landscapes are often overlooked when discussing environmental sciences.
But urban centers are environments too, sometimes with their own ecosystems.
They are a curious ecology: they impact the environment and are impacted by it,
providing life and work for residents and becoming self-contained ecological
islands. Yet studies in urbanism represent some of the best and earliest
examples of the application of Big Data. In 2014, a report on China's use of
applied statistics and Big Data to examine urban systems and urban-rural
planning highlighted the project (begun in the year 2000) as a major success
(34). 2014 was also the year in which the practice was rapidly expanded. It
requires a unification of data across information technology, geography,
logistics and urban planning.
Big Data can be applied to examine problem areas for traffic (and aid decision
making on where to place new roads), crime centers (and where to focus law
enforcement resources) and health problems (and to attempt to understand why
certain areas experience certain health problems - pollution, poverty, poor
access to resources, etc.). Standard data sets are insufficient and lack depth;
urban planning requires information from disparate sources - demographics,
geographic information, resources, employment figures, pollution, health and
many more - to understand the complex parts that go into making an urban center
function.
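Bringing disparate sources together usually starts with joining them on a
shared key. The sketch below merges several per-district tables into one
record per district and keeps missing values visible, so that gaps in the data
can be found rather than hidden. The function name, district names and figures
are hypothetical and purely illustrative.

```python
def combine_by_district(**sources):
    """Merge several {district: value} tables into
    {district: {source_name: value}}; missing values become None."""
    districts = set()
    for table in sources.values():
        districts.update(table)
    return {
        district: {name: table.get(district) for name, table in sources.items()}
        for district in sorted(districts)
    }


# Hypothetical inputs from two separate agencies.
pollution = {"North": 41.0, "South": 18.5}    # e.g. an air quality reading
employment = {"North": 0.91, "East": 0.84}    # e.g. an employment rate

combined = combine_by_district(pollution=pollution, employment=employment)

# List the gaps: districts for which one source has no figure.
gaps = [
    (district, name)
    for district, row in combined.items()
    for name, value in row.items()
    if value is None
]
print(combined)
print(gaps)
```

In practice this kind of join is done with database or data-frame tools over
many more sources; the principle of a shared key and explicit gaps is the same.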
Big Data should improve the process of urban planning and resource allocation.
In fact, it's already doing so. More recently, studies have shown the
usefulness of large data sets in “smart urban planning” (35), and the likely
usefulness of the approach in future. It's expected to be both a time saver and
a money saver.
What are the Advantages of Using Big Data in Environmental Sciences?
As you can see from the above section, government departments and university
research facilities across the world are already using or preparing for Big
Data. Few have made as many strides as the US EPA (Environmental Protection
Agency) (16). The advantages are numerous.
Collecting, Sorting, Analyzing, Presenting Quickly
As hinted in the scenarios presented above, Big Data's major advantage is the
capacity to collect masses of data and analyze it quickly; it's a realistic
cost- and resource-saving tool in areas that are often drastically underfunded
and having to cut costs. The storage capacity now exists to collect and
collate; the computing power to process and manipulate data in any way
necessary is also affordable. This is most obvious in climatology, even if the
community has been relatively slow to adopt it (36). These two processes alone
make Big Data vital for environmental science presentation and accuracy.
Error Mitigation
How to handle errors in data and reporting, rogue data and anomalous results
has been one of the biggest problems facing any science. When sample sizes are
too small, anomalous data can be given more importance than it deserves. But
studies are often limited in sample size by resource factors alone. The larger
a data set, the more likely it is that a rogue piece of information will fall
in significance and not damage the overall result (37). Coupled with the cost
and resource saving, environmental studies can, in theory, become larger and
more thorough, producing more accurate results.
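The effect of sample size on a single rogue value can be shown with simple
arithmetic. The sketch below measures how far one anomalous reading moves the
mean of n otherwise identical readings; the function name and the values are
hypothetical, chosen only to make the numbers easy to follow.

```python
def mean_shift_from_outlier(n, typical=20.0, rogue=200.0):
    """How far one rogue reading moves the mean of n readings."""
    clean = [typical] * n
    contaminated = clean[:-1] + [rogue]  # replace one reading with the rogue
    return sum(contaminated) / n - sum(clean) / n


for n in (10, 1_000, 100_000):
    print(n, round(mean_shift_from_outlier(n), 4))
```

The shift equals (rogue - typical) / n, so it shrinks in proportion to the size
of the data set. Note that this dilution only helps with isolated errors; a
systematic bias that affects every reading is not reduced by collecting more
data.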
Better Environmental Management
This applies to urban management as our cities continue to undergo rapid and
vast changes in line with changing technology and the demands of residents. In
one study, the Norwegian capital of Oslo was able to reduce its energy
consumption through the application of Big Data Analytics when examining its
energy resources (38). Similarly, in Denver, predictive reporting and risk
analysis at the city's Police Department was able to reduce serious crime by
around 30%. Portland, Oregon, used a similar system to analyze stoplight
changes at intersections in order to manage traffic flow better. After just six
years, the city had eliminated 157,000 metric tonnes of CO2 emissions. Traffic
flow varies as a city grows; a stoplight pattern that was once sufficient can
cease to be so.
Better Decision Making
By sheer weight of numbers, Big Data and the analytical tools used in its
processing are able to process and analyze more past data than ever before.
Previously, this too was limited by resources, but with increased access and
availability, Big Data is expected to permit easier presentation and reporting,
delivering more confident results and therefore better aiding decision makers
and policy development professionals. Scientists and government can work
together more efficiently in future, not just to react to the environmental
problems of today, but to work with greater foresight today to make better
decisions for tomorrow.
What are the Current Challenges for Big Data in Environmental Sciences?
Big Data
is not expected to be a panacea for all the world's environmental problems or
for research or applied science in general. Nor is it designed to be a
one-size-fits-all answer. Like any other emerging technology, there are
problems and limitations to keep in mind when extolling the virtues of Big
Data.
Technical Limitations
Due to the complexity of so-called Big Data, the method presents a number of
challenges to those who seek to acquire and use it. For instance, the
frameworks for each of the following may not always have the capacity required,
especially where the volume of data quickly outstrips the capacity of present
computing technology to perform these functions:
- Methods for capturing data
- The capacity for storing it
- Analyzing the data when captured
- Searching, sharing and transferring during
the utilization process
- Visualization and querying of data
- Updating the information in line with recent
changes
- Data security, privacy issues and the sources
of storage
Big Data's growth is, and so far has been, exponential. To keep up, hardware in
all of the areas above will need to match, if not exceed, the necessary
capacities. We must also not underestimate the problems of human error -
wrongly entered data, poor processing due to mistakes, and misinterpretation of
that data. The information may not lie, but humans can and do make mistakes.
Ethical Limitations
In all
this, it's important to remember that some sciences concern data pertaining to
humans. Issues will include problems such as cultural sensitivities as in
archaeology and anthropology (32). Some critics are concerned that
in reducing populations to Big Data information, we reduce their humanity,
their individuality. However, with the improvements in disaster response time,
applications in climate science, and in the enormous data processes when
examining archaeological/anthropological information, it's likely that these
human sciences and humanities concerned with the environment will benefit in
the long-term.
Also, we must be aware of the legal ramifications of data storage. Here in the
US, HIPAA protects a patient's rights to their medical history. The European
Union is introducing a set of regulations called GDPR (General Data Protection
Regulation) in May 2018. This will affect the USA, especially researchers,
scientific institutes and anybody handling Big Data from entities operating in
the European Union, or information relating to any citizen living within a
member state of the EU and EEA (39). It is understood that the US government is
watching closely to see how GDPR functions and how it might adopt such a law in
future. It's likely that such information will receive protection, with
deletion required at the owner's request, and this will certainly have
ramifications for any information stored about people.
Lack of Widespread “Open Access”
Research institutes and businesses are often incredibly protective of their
research data, especially where mass profitability is involved. Yet there has
been a move in recent decades to call for subscription-free public access to
scientific data, known as Open Access. In some disciplines, so few strides have
been made in this area that Big Data Analytics is not presently reaching its
full potential. Much data is restricted, meaning that - although studies can
call on more data and do more with it - there is still a large amount of data
that could prove useful in environmental science but is held privately with
limited or no access. Although fear of handing over information to competitors
is part of the issue, other problems include a lack of resources to provide
access or a lack of awareness of how useful Open Access can be (32). Together,
Open Access and Big Data have the capacity to be a powerful force in research
science, but the latter is being held back by a lack of the former.