Earth and environmental systems science (ESS) research is evidence-based and relies on the analysis and modeling of diverse, multi-scale datasets. The volume of ESS data has risen sharply in recent years, with more data gathered by the minute. That growth may sound like good news, but much of this data remains unarchived, difficult to access, or even unusable. Among other challenges, many scientists lack the resources and ability to archive and share their data using consistent methods. To address these problems, the Earth science community has moved toward adopting the Findable, Accessible, Interoperable, and Reusable (FAIR) data principles.
A new paper authored by the Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) team seeks to address these issues and presents 11 novel reporting formats for organizing and describing various types of Earth science data in public databases. The paper was published in Scientific Data, and the ready-to-use formats are available in the ESS-DIVE data repository as well as on ESS-DIVE’s community GitHub space. ESS-DIVE provides a centralized location to store and share open, standardized datasets to enhance scientific collaboration and data reuse.
“This publication is the result of a dedicated and collaborative effort across six U.S. DOE national labs, and a testament to the value of computational and Earth science researchers partnering for positive impact,” said Charuleka Varadharajan, a Scientist in the Earth and Environmental Sciences Area at Berkeley Lab, and lead of its Earth AI and Data program. “These reporting formats come at a time when they are urgently needed to enable our ability to extract insights from complex environmental systems data.”
A community effort for a FAIRer future
Supported by the U.S. Department of Energy (DOE) Office of Science Biological and Environmental Research program, ESS-DIVE brought together teams of scientists across the DOE National Lab Network with the aim of helping researchers within its ESS community provide more standardized and well-described data. Together, they identified and created instructions and templates for formatting diverse environmental data types. The community-centered process involved reviewing over 100 existing data standards, conventions, and other reporting formats, and gathering input from 247 scientists representing more than 100 institutions.
“A highlight of the reporting format development process was monthly meetings that convened many of the scientists leading the reporting format development process,” said Robert Crystal-Ornelas. “During these working sessions, we could harmonize on key terminology relevant across reporting formats, and share successes and challenges with the broader reporting format group.”
The reporting formats cover data types commonly used by DOE. Some are intended to standardize common descriptions of the data, referred to as “metadata,” such as information about dataset locations and the samples from which the data were generated. Others provide instructions for formatting and describing data files, such as comma-separated values (CSV) files, or guidelines for organizing model data. The remaining reporting formats are more domain-specific and focus on data types important to ESS research, such as leaf-level gas exchange, soil respiration, water and sediment chemistry, hydrologic monitoring, and microbial amplicon abundances.
Crystal-Ornelas added that the scale of the outreach, and of the input received on the reporting formats, underscores how great the need was for this type of standardization within the Earth and environmental sciences. He is excited to see the formats used by researchers around the world, with input from both inside and outside the National Lab Network.
Shreyas Cholia, Group Leader for the Integrated Data Systems Group (Scientific Data Division) at Berkeley Lab, said: “ESS-DIVE is designed as a scalable framework that allows data providers to contribute standardized, structured, and high-quality data. The reporting formats are a vitally important contribution that supports long-term data stewardship, reproducible research, and data standardization across the community.”
This collaborative approach paves the way for future innovation around FAIRer data and may serve as a model for other organizations that would like to develop community (meta)data reporting formats for other types of data.