ESS-DIVE

Deep Insight for Earth Science Data

CONTACT
  • DATA
    • SEARCH DATA
    • SUBMIT DATA
    • ACCESS DATA PORTALS
    • DATA PRESERVATION
    • TERMS OF USE
  • ABOUT
    • WHAT WE DO
    • OUR TEAM
    • OUR COMMUNITY
    • OUR COMMUNITY PROJECTS
    • OPPORTUNITIES
  • GET STARTED
    • GUIDE TO USING ESS-DIVE
    • DATA SUBMISSION GUIDELINES
    • PROPOSAL GUIDELINES
    • DATA REPORTING FORMATS
    • DATA USE AND CITATION
    • FAQs
  • LEARN MORE
    • NEWSROOM
    • WEBINAR LIBRARY
    • TRAINING EVENTS
    • TRAINING VIDEOS
    • PUBLICATIONS
    • ACRONYMS/GLOSSARY

Standardizing Water Quality Data with New ESS-DIVE Community Reporting Formats

January 2, 2022 by Dylan O'Ryan

Photo by Hans Reniers on Unsplash

Dylan O’Ryan is a Student Assistant with the Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository. He writes here about his experience with ESS-DIVE’s development and implementation of data reporting formats [community data standards] in collaboration with six teams of scientists in the US DOE Environmental System Science (ESS) community. Dylan first began working with ESS-DIVE as part of a Community College Internship (CCI), where he standardized existing water quality data using ESS-DIVE’s community data reporting formats.

Some of ESS-DIVE’s data reporting formats, such as those for soil and water quality, are specific to research domains. Other reporting formats apply more generally to a wide range of data, such as Comma Separated Value (CSV) files and sample collections. These standard reporting formats are designed to make data more Findable, Accessible, Interoperable, and Reusable (FAIR) from the perspective of our ESS community scientists. Data reporting formats standardize data to enable the creation of better tools that allow advanced search, integration, and visualization of data within and across multiple datasets.

Kristin Boye, a Staff Scientist at SLAC National Accelerator Laboratory, developed a water/soil/sediment chemistry reporting format as part of her collaboration with ESS-DIVE. She developed this reporting format by synthesizing recommendations from other generalized reporting formats, such as CSV and FLMD (File-level Metadata), and incorporating community feedback on how to format water/soil/sediment chemistry data. 

Storm Drain Detectives is a community-based water quality monitoring program in Lodi, California. For the past seven years, I’ve helped the program measure water quality parameters such as dissolved oxygen (DO), pH, temperature, and bacteria. This experience with water quality testing helped me better understand the datasets that I was converting for ESS-DIVE. I used the community water/soil/sediment chemistry data reporting format to convert existing datasets within the Lawrence Berkeley National Laboratory (LBNL) Watershed Function Scientific Focus Area (WFSFA) project. I converted water quality datasets whose metadata was already published on ESS-DIVE, including ICP-MS, DIC/NPOC/TDN, Ammonia-N, Anion, and Isotope data.

Here is the step-by-step process that details how I converted existing datasets to the water/soil/sediment chemistry reporting format [See Image 1 for workflow diagram]:

  • Retrieve the water quality data file and locate the associated metadata published on ESS-DIVE. 
  • Populate the methods file template. The methods file is where you store information on the samples’ methods of collection, analysis, storage, etc. I entered the methods information that the data provider had supplied in the associated dataset metadata. See Image 2 for an example of methods information converted to the reporting format methods file.
  • Populate the data file template. Like most data files, this is where you enter sample information and measurements; however, the reporting format data file is also designed to include the information needed for future interpretation and reuse: unique sample names and methods information (collection/analysis procedures, detection limits, analysis precision) alongside the data. The data file template also allows for standardized variable names and units across files. Standardized names and units can be included in the term list. See Image 3 for an example of a data provider’s data file converted to the reporting format data file template.
    • Note: I first filled in the methods information and header rows before populating the sample data. 
  • As part of this reporting format, you can choose to fill out an optional terminology file. The terminology file can include all terms that would benefit from additional description and definition (e.g., data flags or other codes used throughout the data and method files). We note that the terminology file is different from the required data dictionary file that is part of ESS-DIVE’s file-level metadata reporting format. In the data dictionary, you provide definitions of column or row names, and their units. The terminology file is specifically designed for terms that are not captured in the column or row names. See Image 4 for an example of a terminology file.
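
The data file step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ESS-DIVE template itself: the provider column names (`Sample`, `Ca_ppb`, `Mg_ppb`), the standardized names and units, the `Methods_ID` row, and the three-row header layout are all hypothetical. The real names and layout come from the reporting format’s templates and term list.

```python
import csv
import io

# Hypothetical mapping: provider column -> (standardized name, unit).
# Real names and units would come from the reporting format's term list.
COLUMN_MAP = {
    "Sample": ("Sample_ID", "N/A"),
    "Ca_ppb": ("Calcium", "ppb"),
    "Mg_ppb": ("Magnesium", "ppb"),
}


def convert(provider_csv: str, methods_id: str) -> str:
    """Return the provider data rewritten with standardized header rows."""
    reader = csv.DictReader(io.StringIO(provider_csv))
    columns = reader.fieldnames or []
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header rows are written first, before any sample data.
    writer.writerow([COLUMN_MAP[c][0] for c in columns])
    writer.writerow([COLUMN_MAP[c][1] for c in columns])
    # Hypothetical methods row: the first cell is a label, and each
    # measurement column points at an entry in the methods file.
    writer.writerow(["Methods_ID"] + [methods_id] * (len(columns) - 1))
    for row in reader:
        writer.writerow([row[c] for c in columns])
    return out.getvalue()


provider = "Sample,Ca_ppb,Mg_ppb\nWF-001,12.5,3.1\nWF-002,11.9,2.8\n"
converted = convert(provider, "ICP-MS-01")
print(converted)
```

Keeping the column mapping in one place means the same header rows can be reused for every data file produced with the same collection and analysis procedures.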

The water/soil/sediment chemistry reporting format was straightforward to use. I quickly caught on to the template and the requirements of the reporting format, and transferring datasets is easy once you understand the general structure. With practice I became faster: I could create a methods file and a data file with over 200 samples within 30 minutes.

Here are a few more tips and tricks related to converting a multitude of datasets:

  • You generally only need to create one methods file for a particular measurement (e.g., ICP-MS); after that, you only need to adjust the data file to include the samples you tested.
  • Similarly, the data file headers and associated terms can be repeated if there are no collection or analysis procedure changes. 

I found that utilizing ESS-DIVE’s reporting formats was straightforward and made the data easier to find, understand, and use in new ways. The converted datasets include unique sample names, contextual information describing the data (metadata), standardized formatting of missing values, and many more qualities that increase the usability of the data. The examples of converted water quality datasets are now being utilized by some WFSFA data providers in order to standardize their data and metadata.
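
Two of the qualities mentioned above, unique sample names and consistent missing values, are easy to check automatically. The sketch below is a hypothetical helper, not an ESS-DIVE tool: it assumes a data file with a single header row, a `Sample_ID` column, and `-9999` as the agreed missing-value code, and the list of ad hoc missing markers is illustrative.

```python
import csv
import io

MISSING_CODE = "-9999"  # assumed missing-value code; confirm against the format's guidance
AD_HOC_MISSING = {"", "NA", "NaN", "nd", "null"}  # hypothetical provider conventions


def check_data_file(text: str, id_column: str = "Sample_ID") -> list[str]:
    """Return human-readable problems found in a converted data file."""
    problems = []
    seen = set()
    # start=2 because line 1 of the file is the header row.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        sample = row[id_column]
        if sample in seen:
            problems.append(f"line {line_no}: duplicate sample name {sample!r}")
        seen.add(sample)
        for column, value in row.items():
            if (value or "").strip() in AD_HOC_MISSING:
                problems.append(
                    f"line {line_no}: {column} uses {value!r}; expected {MISSING_CODE}"
                )
    return problems


sample_file = "Sample_ID,Calcium\nWF-001,12.5\nWF-001,NA\nWF-003,-9999\n"
issues = check_data_file(sample_file)
for issue in issues:
    print(issue)
```

A check like this can be run on each converted file before it is uploaded, so that problems are caught while the original data provider is still easy to reach.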

Other reporting formats may also help you standardize your data and metadata. CSV, File Level Metadata (FLMD), Sample Identifiers and Metadata, and Model Data Archiving Guidelines are high-level reporting formats that apply across multiple domains. Domain-specific formats include the Leaf Gas Exchange reporting format, intended for leaf-level gas exchange data; the Soil Respiration reporting format, intended for soil respiration data and metadata; and the Hydrological Monitoring reporting format, designed for water parameters measured by in situ meters/probes. Two more reporting formats are in development: 16S Amplicon Sequencing and Locations Metadata. See Image 5 for a workflow for use of ESS-DIVE’s reporting formats.

The ESS-DIVE team is available to answer questions and help anyone who wants to use the reporting formats. Please email ess-dive-support@lbl.gov or use the “Contact Us” feature on the ESS-DIVE website.

Image 1: Workflow for Conversion of Reporting Format

 

Image 2: Example conversion of methods information to methods file

 

Image 3: Example conversion of data file to reporting format data file template

 

Image 4: Example of terminology file

 

Image 5: Workflow for use of reporting formats

Filed Under: news

ESS-DIVE at AGU 2021

December 7, 2021 by lncore


The ESS-DIVE team is excited to present their work and connect with the Earth and Environmental Systems Science (ESS) community at the upcoming American Geophysical Union (AGU) Fall Meeting 2021. The event will take place in New Orleans, LA and online everywhere on 13-17 December 2021. Several ESS-DIVE team members are presenting on relevant topics, ranging from best practices for data curation and publication to approaches to support metadata synthesis. ESS-DIVE will be involved with a total of 6 oral and eLightning presentations.

#AGU21 is the leading forum for advancing Earth and space science and leveraging this research toward solutions for societal challenges. The Earth and space science community is gathering both in person and virtually for this annual meeting to learn and collaborate around the theme of “Science is Society.” With more than 25,000 individuals from 100+ countries expected to attend representing the global Earth and space sciences community, the event will consist of inspiring plenary talks, cutting-edge science presentations and more. Most sessions will be recorded and available to this global community of researchers, scientists, educators, students, policymakers, partners, science enthusiasts, journalists, and communicators. With in-person and worldwide online participation, attendees will have numerous opportunities to network with government regulators, scientific visionaries, and industry thought-leaders.

  • Madison Burrus will present Community Engagement Efforts to Encourage and Incentivize Data Archiving in the Environmental System Sciences Community during the poster session on Tuesday 14 December from 14:00 – 16:00 PT. 
  • Joan Damerow will present How do we make interdisciplinary sample data more FAIR (Findability, Accessibility, Interoperability, and Reusability)? on Wednesday, 15 December from 12:51-12:57 PT. 
  • Dylan O’Ryan is presenting Applying Community Data Reporting Formats to Open-Source Water Quality Data during the poster session on Thursday, 16 December 2021; 14:00 – 16:00 PT. 
  • Emily Robles is presenting Bringing more Tropical data to the table through the NGEE-Tropics data archive during the poster session on Thursday, 16 December 2021; 14:00 – 16:00 PT.
  • Joan will also chair three sessions on Connecting Disciplines and Data in Earth and Environmental Synthesis Research: Enabling International and Interdisciplinary Data Discovery, Integration, and Reuse on Thursday, 16 December 2021; 14:00 – 16:00 PT (poster session), and Friday, 17 December 2021; 7:45 – 9:00 PT (eLightning) and 10:45 – 12:00 PT (oral session).
  • Deb Agarwal will present Enabling Citations of Large Numbers of Diverse Dataset on Friday, 17 December 2021 from 07:57 – 08:00 PT.
  • Robert Crystal-Ornelas will present Fundamentals for Collaborating on Research Projects Using GitHub on Friday, 17 December 2021 from 07:50 – 08:20 PT. He will also present Community Data Standards for More Reusable Data in Earth and Environmental Science during the poster session on Friday, 17 December 2021; 14:00 – 16:00 PT.
  • Emily Robles will present FAIR Dataset Metadata: An Analysis of Requirements across Environmental Science Data Repositories on Friday, 17 December 2021 from 12:48 –12:53 PT.
  • Shreyas Cholia will present Fostering Growth in the ESS-DIVE Repository on Friday, 17 December 2021 from 14:03 – 14:06 PT.
  • Community Fund Partners Pamela Weisenhorn and Kathleen Beilsmith will present Applying Data Standards and Reproducible Workflows To Advance Earth System Science during the poster session on Friday, 17 December 2021; 08:03 – 08:06 PT.

 

ESS-DIVE is enthusiastic about the opportunity to engage in this collaborative and interdisciplinary event. The interactive nature of this event will serve as a platform to share research findings, discuss use cases, and more. The team looks forward to not only sharing their knowledge, but also gaining new insights and experiences. 

ESS-DIVE is funded by the Data Management program within the Earth and Environmental Systems Science Division under the DOE’s Office of Science Biological and Environmental Research program and is maintained by the Lawrence Berkeley National Laboratory.

Filed Under: Homepage Features, news

Community Data Workshop Offers Guidance in Environmental Data Management and Sharing

May 14, 2021 by Charuleka Varadharajan

A community of researchers will learn how critical DOE Environmental System Science (ESS) data is managed, stored, and discovered.

We will host our first hands-on workshop dedicated to working with and fostering a community around Environmental System Science (ESS) data. The workshop, hosted online by the ESS-DIVE team at Lawrence Berkeley National Laboratory (LBNL), takes place Monday and Tuesday, May 24-25 from 9am-2pm PT (12pm-5pm ET) and is free for all registrants.

The workshop is designed both to introduce newcomers to ESS-DIVE and to help those familiar with ESS-DIVE to sharpen their data practices. It includes discussions of ESS-DIVE’s present and future; instruction on querying, submitting, and describing ESS data; and hands-on tutorials for those both new and experienced with the repository. It is a valuable opportunity for personnel associated with projects funded by the DOE’s ESS program to learn how to archive data in ESS-DIVE and for ESS-DIVE to work with the community to make the process as easy as possible.

Environmental data, and the models and software that depend on it, have helped us gain a much better understanding of the natural world in recent years. We are in the midst of a cultural and paradigm shift where open-access data increasingly provides the foundation on which scientific progress is built. In light of global challenges, such as climate change, it is more important than ever to have open and reliable data to make scientific breakthroughs and sound decisions. However, historically data has not been well documented, managed, stored, and reused. As such, ESS-DIVE is working with a community of DOE ESS-funded researchers to improve the long-term efficiency of data management and to maximize the value of ESS data.

At the same time, the complexity, diversity, and sheer volume of data in the ESS-DIVE repository will also grow, from terabytes of data just a few years ago to petabytes in the future. That volume places extra demands on the investigators who contribute data generated by DOE-funded projects. It also makes it more difficult for students, teachers, decision makers, and interested members of the public to discover and use critical publicly-funded data.

Many ESS projects are uniquely collaborative and interdisciplinary. They involve specialists and data across a range of environmental disciplines, including hydrology, ecology, climate, geology, geophysics, geochemistry, and microbiology. A key challenge of ESS-DIVE is serving such diverse and multidisciplinary data that is often connected by a given research question or location. Many types of data are required to address critical questions such as how ecosystems process carbon or how contaminants cycle through soil and water. ESS-DIVE will discuss use cases during the workshop to ensure that our reporting formats and tools support the researchers’ science goals.

To LBNL Senior Scientist and Data Science and Technology Department Head Deb Agarwal, who is the lead Principal Investigator of ESS-DIVE, the workshop represents a chance to broaden participation in an important collective endeavor.

“Data is a valuable research output in and of itself. ESS-DIVE was established to acknowledge the importance of sound environmental research data archiving and to make that data widely and readily available to anyone interested in using it.

The Community Data Workshop is an important opportunity for the ESS-DIVE team to introduce the repository to the community and to discuss upcoming features in development. There are many ESS project teams that are new to the repository and this workshop will help them get a head start on the process of effectively managing and archiving data.

We are really excited to bring together new and already engaged users to discuss how we can work together to archive their ESS data. We look forward to working together to make sure that ESS-DIVE is serving the needs of the ESS community.”

ESS-DIVE is funded by the Data Management program within the Earth and Environmental Systems Science Division under the DOE’s Office of Science Biological and Environmental Research program and is maintained by the Lawrence Berkeley National Laboratory.

Registration for the ESS-DIVE Community Data Workshop is closed. Please check in next year! To learn more about the workshop, or about ESS-DIVE in general, please visit the workshop event page or contact ess-dive-support@lbl.gov.

Filed Under: news

Insights from the 2021 ESIP Winter Meeting

February 2, 2021 by rcrystalornelas

The Earth Science Information Partners (ESIP) community gathered virtually for the winter 2021 meeting to learn and collaborate around the theme of “Leading Innovation in Earth Science Data Frontiers.” Inspiring plenary talks addressed building a culture of innovation and the exhilarating, but also lonely and frightening, nature of exploring new frontiers in science. There was much discussion around machine learning and AI, and work needed towards assessing and achieving AI-readiness of data across agencies and industries.  

Joan Damerow and Rob Crystal-Ornelas from ESS-DIVE attended the meeting, and recap some of the highlights and resources that may be useful for the ESS-DIVE community.  

  • Joan helped organize a kickoff meeting for the new ESIP Physical Samples Curation Cluster, which will focus initial efforts on identifying high-level recommendations to journal publishers on providing FAIR sample data. We will be meeting on a regular basis to outline core/basic recommendations for sample identifiers and metadata, relevant across disciplines.  
  • A session on Linking Knowledge in the Earth and Space Sciences discussed how knowledge systems bring together data in a meaningful way to answer useful scientific questions. At ESS-DIVE, we want to know how you would like to search, link, integrate, and reuse data within a larger network, and to ensure that your related data is effectively linked when published.
  • Plenary talks on Innovation and New Frontiers in ML/AI introduced real-world examples of how ML/AI can be applied to a range of research priorities, such as monitoring crops and identifying areas of food insecurity in developing countries. Speakers emphasized the continued need for humans in the loop to characterize remote sensing and other data types for ML/AI approaches. Training data is still hard to get, very time-intensive and expensive.

  • The Innovating in a Documentation Ecosystem session leveraged the experiences of the audience to identify needs to more effectively link related data within and across agencies and organizations. The ingredients for a connected system involve community engagement, innovative tools and infrastructure, use of metadata standards and conventions, and persistent identifiers. We learned about one particularly useful tool, the metadata editor (mdEditor), and the important role of open APIs. Our next step is to address the challenge of community coordination and work to convince stakeholders to invest in more connected data ecosystems that improve data management efficiency and reusability of data.  
  • During plenary talks on Innovation in Open Search and Discovery, we heard from representatives of Schema.org, Google, and Google Dataset Search. The audience promoted the metadata element “variableMeasured” as one of the most important within schema.org for supporting discoverability of datasets. Natasha Noy introduced work exploring the content of Google Dataset Search and how people are searching for data. 

  • We explored data publication workflows from geoscience researchers to data repositories, and journal publications to identify problem areas. Our next step is to communicate and collaborate with journal publishers towards streamlining the data publication process, and ensure that data is open and useful upon publication.
  • A session called “Jupyter Notebooks: Harnessing the full potential” introduced participants to the many ways that the web interface Jupyter can be used for open source code development. We heard about examples of scientists using the Jupyter ecosystem to do everything from creating interactive data dashboards, to publishing markdown-style books, to authoring manuscripts for peer reviewed journals.
  • In the final organized session of ESIP’s 2021 winter meeting, attendees discussed the challenges and opportunities for defining AI-ready data. The session moderators highlighted the importance of AI-ready data for efficiently using novel computing technologies like exascale computing while also recognizing that the definition of AI-ready data is a work in progress. Some of the potential elements of an AI-ready dataset include data completeness, documentation (through metadata and data dictionaries), and clear end user licenses.
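
To illustrate the “variableMeasured” element highlighted in the open search and discovery talks, here is a minimal sketch of a schema.org Dataset description built in Python and serialized as JSON-LD. The dataset name, description, variables, and units are hypothetical; only the schema.org types and property names (`Dataset`, `PropertyValue`, `variableMeasured`, `unitText`) come from the schema.org vocabulary.

```python
import json

# Hypothetical dataset description; names and values are illustrative only.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example stream water chemistry dataset",
    "description": "Illustrative record showing how measured variables can be listed.",
    "variableMeasured": [
        {"@type": "PropertyValue", "name": "pH"},
        {"@type": "PropertyValue", "name": "Dissolved oxygen", "unitText": "mg/L"},
        {"@type": "PropertyValue", "name": "Water temperature", "unitText": "degree Celsius"},
    ],
}

# Serialize to JSON-LD text that could be embedded in a dataset landing page.
json_ld = json.dumps(dataset, indent=2)
print(json_ld)
```

Listing each measured variable explicitly gives dataset search tools something concrete to index beyond the title and free-text description.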

Filed Under: news

ESS-DIVE at AGU 2020

November 11, 2020 by Charuleka Varadharajan

The ESS-DIVE team is looking forward to participating in the 2020 AGU Fall Meeting. Below are several abstracts that we will be presenting. We look forward to (virtually) seeing you there!

 

Addressing Model Data Archiving Needs for the Department of Energy’s Environmental System Science Community (IN008-01)

 

Presenter: Maegen Simmonds
Presentation Type: eLightning 
Session Date and Time*: Tuesday, 8 December 2020; 10:30 – 10:33 Pacific
Session Number and Title: IN008 – Best Practices and Realities of Research Data Repositories: Which One Should I Choose to Publish My Data? III eLightning 
Session URL: https://agu.confex.com/agu/fm20/prelim.cgi/Paper/755236

Abstract

Maegen Simmonds1, William J Riley1, Mario Melara2, Shreyas Cholia1 and Charuleka Varadharajan1, (1)Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2)Lawrence Berkeley National Laboratory, Berkeley, United States 

Researchers in the Department of Energy’s Environmental System Science (ESS) program use a variety of models to advance robust, scale-aware predictions of terrestrial and subsurface ecosystems. ESS projects typically conduct field observations and experiments coupled with modeling exercises using a model-experimental (ModEx) approach that enables iterative co-development of experiments and models, and ensures that experimental data needed to parameterize and test models are collected. Thus, preserving the “model data” comprising the outputs from simulations, as well as driving, parameterization and validation data with associated codes is becoming increasingly important. The Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) repository currently stores all types of data associated with ESS projects; however, it has not yet been optimized for ingesting and serving large data volumes associated with model outputs. Furthermore, we have lacked community consensus on which model data are scientifically useful to archive. Thus, to scale and optimize ESS-DIVE for model data, we surveyed and interviewed the ESS community to identify the needs for archiving, sharing, and utilizing model data, and to begin developing archiving guidelines to ensure that archived data are scientifically useful, findable, and accessible. Here, we present the results of the survey and the proposed guidelines. This initial assessment of the community needs is an important step in supporting ESS-DIVE’s long-term vision to broadly enable data-model integration, and knowledge generation from model and observational data. This vision will be achieved through close partnerships with the ESS community.

 

 

Connecting Environmental Systems Science and Digital Library Practices (IN008-02)

 

Presenter: Joan Damerow
Presentation Type: eLightning 
Session Date and Time*: Tuesday, 8 December 2020; 10:33 – 10:36 Pacific
Session Number and Title: IN008 – Best Practices and Realities of Research Data Repositories: Which One Should I Choose to Publish My Data? III eLightning
Session URL: https://agu.confex.com/agu/fm20/prelim.cgi/Paper/73618

Abstract

Joan E Damerow1, Charuleka Varadharajan1, Kristin Boye2, Madison Burrus1, K. Dana Chadwick3, Shreyas Cholia1, Robert Crystal-Ornelas1, Kim S Ely4, Valerie C Hendrix1, Matthew B. Jones5, Christopher S. Jones5, Zarine Kakalia1, Ken M Kemner6, Annie B Kersting7, Katharine Maher8, Mario Melara9, Nancy Shiao-Lynn Merino10, Fianna O’Brien1, Zach Perzan11, Emily Robles1, Cory Snavely12, Patrick Sorensen13, James Stegen14, Pamela Weisenhorn15, Karen Whitenack1, Mavrik Zavarin16 and Deb Agarwal17, (1)Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2)SLAC National Accelerator Laboratory, Stanford Synchrotron Radiation Lightsource, Menlo Park, CA, United States, (3)Stanford University, Earth System Science, Stanford, CA, United States, (4)Brookhaven National Laboratory, Environmental and Climate Sciences Department, Upton, NY, United States, (5)National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, United States, (6)Argonne Natl Lab, Argonne, IL, United States, (7)LLNL, Livermore, CA, United States, (8)Stanford-Geology & Env Science, Stanford, CA, United States, (9)Lawrence Berkeley National Laboratory, Berkeley, United States, (10)Lawrence Livermore National Laboratory, Livermore, United States, (11)Stanford University, Earth Systems Science, Stanford, CA, United States, (12)Lawrence Berkeley National Laboratory, NERSC, Berkeley, CA, United States, (13)Lawrence Berkeley National Laboratory, Earth and Environmental Sciences, Berkeley, CA, United States, (14)Pacific Northwest National Laboratory, Richland, WA, United States, (15)Argonne National Laboratory, Argonne, United States, (16)Lawrence Livermore National Laboratory, Livermore, CA, United States, (17)LBNL, Berkeley, CA, United States

 

The U.S. Department of Energy’s (DOE’s) Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) stores and publicly distributes data from observational, experimental, and modeling research funded by the DOE’s Environmental Systems Science activity. The diversity of data and interdisciplinary nature of projects presents challenges in developing recommendations for data management, reporting, and publication. Part of our role as a community-focused data repository is to synthesize, interpret, and make good data curation practices easier and more useful for our community. As representatives of Environmental Systems Science researchers, we can also provide valuable feedback within the informatics community and influence existing practices to better support interdisciplinary science. 

In this presentation, we demonstrate a community-focused approach in connecting our scientists with best practices for data curation and publication developed in broader informatics and digital library communities. For example, we conducted a pilot test involving many of our scientific projects on the use of persistent identifiers for physical samples–specifically, International Geo/General Sample Numbers (IGSNs). We compared existing sample-related templates and shared vocabulary terms, and evaluated the experience of users to more efficiently describe biological and geological samples from interdisciplinary studies. We explore other challenges encountered as a broad, interdisciplinary repository, such as efficiently curating interdisciplinary data types, ensuring that data is FAIR and of high quality, and that authors receive appropriate credit for contributing quality datasets. Overall, the success of our repository relies on our ability to support specific community needs, and incorporate practices that help maximize the value of Environmental Systems Science data now and in the future.  

 

The ESS-DIVE repository and next steps toward a usable, trusted, and FAIR repository (IN008-03)

 

Presenter: Deb Agarwal
Presentation Type: eLightning
Session Date and Time*: Tuesday, 8 December 2020; 10:36 – 10:39 Pacific
Session Number and Title: IN008 – Best Practices and Realities of Research Data Repositories: Which One Should I Choose to Publish My Data? III eLightning
Session URL: https://agu.confex.com/agu/fm20/prelim.cgi/Paper/772655

Abstract

Deb Agarwal1, Shreyas Cholia1, Charuleka Varadharajan1, Valerie C Hendrix1, Joan E Damerow1, Madison Burrus1, Robert Crystal-Ornelas1, Hesham Elbashandy1, Emily Robles1, Fianna O’Brien1, Zarine Kakalia1, Mario Melara2, William J Riley1, Cory Snavely3, Makayla Shepherd2, Maegen Simmonds4, Karen Whitenack1, Matthew B. Jones5, Christopher S. Jones5 and Peter Slaughter5, (1)Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2)Lawrence Berkeley National Laboratory, Berkeley, United States, (3)Lawrence Berkeley National Laboratory, NERSC, Berkeley, CA, United States, (4)University of California Davis, Davis, CA, United States, (5)National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, United States

The US Department of Energy’s (DOE) Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository is in its third year of operation. The repository focus is on three areas of development: expanding adoption and use by ESS users, standardization of data, and support for projects providing data to the repository. Our approach is designed around user experience methods and involves significant discussion and involvement of the community. The priorities of the repository are continually revised and refined based on input from the community.

Our current focus is on expanding the user-base and functionality of ESS-DIVE through five key innovations: (1) understanding user needs; (2) supporting early data archiving by projects; (3) reaching a broader portion of the ESS community; (4) supporting search of extracted ESS-DIVE data with a fusion database; and (5) federating with other repositories. We are focused on providing a scalable, robust repository and long-term curation of ESS data that adhere to Findable, Accessible, Interoperable, and Reusable (FAIR) principles, with the goal of increasing the ease and capacity of storing data in the repository. A key goal is enhancing usability of the data. For example, many of the projects contributing data to ESS-DIVE have large teams, last many years, and generate a large number of data packages. We are working with our community to evaluate the available methods of providing usable citations for large subsets of the data from a project.

Our end goal is to have a repository that is trusted by the community and that is the preferred storage facility for data generated by the DOE ESS program and the preferred provider of ESS data. One challenge is that FAIR principles are designed to address the needs of the data user, and largely ignore the needs of the data provider. The CoreTrustSeal is not yet well known, so there is no pressure from our user community or funders to complete the application process. However, now that at least one repository based on the same software, MetaCat, has been certified, the process might be less work for ESS-DIVE. As publishers move to require CoreTrustSeal certification, we expect to see increased pressure to obtain the certification.

 

Tackling the Challenges of Earth Science Data Synthesis: Insights from (meta)data standards approaches  (IN012-07)

 

Presenter: Valerie Hendrix
Presentation Type: Oral 
Session Date and Time*: Tuesday, 8 December 2020; 20:54 – 20:58 Pacific
Session Number and Title: IN012 – Data and Information Services for Interdisciplinary Research and Applications in Earth Science II
Location: Virtual
Session Link: https://agu.confex.com/agu/fm20/prelim.cgi/Paper/749418

Abstract

Valerie C Hendrix, Danielle S Christianson, Charuleka Varadharajan, Madison Burrus, Shreyas Cholia, You-Wei Cheah, Housen Chu, Robert Crystal-Ornelas, Joan E Damerow, Zarine Kakalia, Fianna O’Brien, Gilberto Pastorello, Emily Robles and Deb Agarwal, Lawrence Berkeley National Laboratory, Berkeley, CA, United States

 

Diverse, complex data are a significant component of Earth Science’s “big data” challenge. Some Earth Science data, like remote sensing observations, are well understood, are uniformly structured, and have well-developed standards that are adopted broadly within the scientific community. Unfortunately, for other types of Earth Science data, like ecological, geochemical and hydrological observations, few standards exist and their adoption is limited. The synthesis challenge is compounded in interdisciplinary projects in which many disciplines, each with its own culture, must synthesize data to solve cutting-edge research questions.

Data synthesis for research analysis is a common, resource-intensive bottleneck in data management workflows. We have faced this challenge in several U.S. Department of Energy research projects in which data synthesis is essential to addressing the science. These projects include AmeriFlux, Next Generation Ecosystem Experiment (NGEE) – Tropics, Watershed Function Scientific Focus Area, Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE), and a DOE Early Career project using data-driven approaches to predict water quality.

In these projects, we have taken a range of approaches to support (meta)data synthesis. At one end of the spectrum, data providers apply well-defined standards or reporting formats before sharing their data, and at the other, data users apply standards after data acquisition. As these projects continue to evolve, we have gained insights from these experiences, including the advantages and disadvantages of each approach, how project history and resources led to the choice of approach, and how each approach enabled data harmonization. In this talk, we discuss the pros and cons of the various approaches, and also present flexible applications of standards to support diverse needs when dealing with complex data.

 

 

Letting the community lead the way to data integration: Data standards and documentation developed by domain experts and the ESS-DIVE repository (IN015-07)

 

Presenter: Rob Crystal-Ornelas
Presentation Type: Oral
Session Date and Time*: Wednesday, 9 December 2020; 17:54 – 17:58
Session Number and Title: IN015 – Best Practices and Realities of Research Data Repositories: Which One Should I Choose to Publish My Data? II
Session Link: https://agu.confex.com/agu/fm20/prelim.cgi/Paper/712622 

Abstract

Robert Crystal-Ornelas (1), Charuleka Varadharajan (1), Ben P Bond-Lamberty (2), Kristin Boye (3), Madison Burrus (1), Shreyas Cholia (1), Joan E Damerow (1), Ranjeet Devarakonda (4), Hesham Elbashandy (1), Kim S Ely (5), Amy E Goldman (2), Susan L Heinz (6), Valerie C Hendrix (1), Christopher S. Jones (7), Matthew B. Jones (7), Zarine Kakalia (1), Mario Melara (8), Fianna O’Brien (1), Stephanie Pennington (9), William J Riley (1), Emily Robles (1), Alistair Rogers (5), Makayla Shepherd (8), Maegen Simmonds (1), Peter Slaughter (7), Terri Velliquette (10), Pamela Weisenhorn (11), Karen Whitenack (1) and Deb Agarwal (12); (1) Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2) Pacific Northwest National Laboratory, Richland, WA, United States, (3) SLAC National Accelerator Laboratory, Stanford Synchrotron Radiation Lightsource, Menlo Park, CA, United States, (4) Oak Ridge National Laboratory, Oak Ridge, TN, United States, (5) Brookhaven National Laboratory, Environmental and Climate Sciences Department, Upton, NY, United States, (6) Oak Ridge National Laboratory, Kingston, TN, United States, (7) National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, United States, (8) Lawrence Berkeley National Laboratory, Berkeley, United States, (9) Pacific Northwest National Laboratory, Joint Global Change Research Institute, College Park, MD, United States, (10) Oak Ridge National Laboratory, Oak Ridge, United States, (11) Argonne National Laboratory, Argonne, United States, (12) LBNL, Berkeley, CA, United States

 

Earth and Environmental Science data repositories are tasked with storing data that come in a wide range of formats. Many repositories, including the US Department of Energy’s (DOE) Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) repository, see data integration and synthesis as a key step in harnessing the power of the large datasets contained within the repositories. However, the lack of standardization in data contributed by users can impede data reuse and integration.

To kickstart the generation of reporting standards, the ESS-DIVE repository funded six community partners from national labs around the US to develop seven metadata- and data-related standards. In this talk, we begin by describing how our community partners achieved consensus on standards for some of the most common data types uploaded to ESS-DIVE. One challenge community partners faced was providing robust documentation so that any data producer could adopt the standards prior to uploading their data to ESS-DIVE. Documentation also needed to be dynamic so that standards could be modified relatively easily when required.

To overcome this challenge, ESS-DIVE has begun to implement a software versioning-style framework that allows data standards to be developed and updated transparently. When standards are expanded or updated by community consensus, our versioning framework allows a clear view of any modifications. Data uploaded to the ESS-DIVE repository that adhere to these community standards will be more interoperable and reusable, facilitating synthesis across datasets. These standardized data contributions to ESS-DIVE would then enable a deeper integrated search of the individual data files within the repository through the ESS-DIVE “fusion database”. Ultimately, by developing standards, providing clear documentation, and offering a transparent way of updating standards, ESS-DIVE provides a sustainable path toward data integration through community-driven standard development.
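
As a rough illustration of what a software versioning-style approach to a data standard could look like, the Python sketch below applies semantic-versioning-like rules to a reporting format and records each change in a changelog. The change categories ("breaking", "addition", "editorial"), the bump rules, and all class and function names are assumptions made for this example; they are not ESS-DIVE's actual framework.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FormatVersion:
    """Hypothetical MAJOR.MINOR.PATCH version of a reporting format."""
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"

    def bump(self, change_kind: str) -> "FormatVersion":
        """Return the next version for an assumed category of change."""
        if change_kind == "breaking":    # e.g. a required field is renamed
            return FormatVersion(self.major + 1, 0, 0)
        if change_kind == "addition":    # e.g. a new optional field is added
            return FormatVersion(self.major, self.minor + 1, 0)
        if change_kind == "editorial":   # e.g. documentation wording is fixed
            return FormatVersion(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown change kind: {change_kind!r}")


def build_changelog(
    start: FormatVersion, changes: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Apply (change_kind, description) pairs in order and return
    (version, description) entries so every modification stays visible."""
    changelog = []
    current = start
    for change_kind, description in changes:
        current = current.bump(change_kind)
        changelog.append((str(current), description))
    return changelog
```

Under these assumed rules, a data user can tell from the version number alone whether files prepared against an older version of a format are likely to need changes.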

 

Incorporating Data Management Best Practices into Scientific Workflows (IN016-07)

 

Presenter: Zarine Kakalia
Presentation Type: eLightning
Session Date and Time*: Wednesday, 9 December 2020; 20:48 – 20:51
Session Number and Title: IN016 – A Call to Action for FAIR, Reproducible, and Transparent Science: Analytical Code, Workflows, Services, Models, and Conclusions eLightning
Session Link: https://agu.confex.com/agu/fm20/prelim.cgi/Paper/684965

Abstract

Zarine Kakalia (1), Charuleka Varadharajan (1), Madison Burrus (1), Danielle S Christianson (1), Robert Crystal-Ornelas (1), Joan E Damerow (1), Dipankar Dwivedi (1), Boris Faybishenko (1), Valerie C Hendrix (1), Emily Robles (1), Roelof Versteeg (2), Karen Whitenack (1) and Deb Agarwal (1); (1) Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2) Subsurface Insights, Hanover, NH, United States

 

The U.S. Department of Energy’s Watershed Function Scientific Focus Area (SFA) in the East River, Colorado generates and uses interdisciplinary data from hydrological, geochemical, geophysical, microbiological and remote sensing observations. The project has developed an end-to-end infrastructure to acquire the SFA’s multi-scale data, generate data products, and enable internal and public data access. Maintaining FAIR data throughout this pipeline is challenging due to the diversity of the data and scientific workflows. To ensure data pipelines generate integratable products and meet repository standards, the SFA Data Management Team engages with field scientists to incorporate best data management practices throughout the scientific workflow. SFA data is published through the DOE’s Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository, and thus adopts standards and metadata quality criteria required for publication through ESS-DIVE.

To overcome the challenge of acquiring critical metadata from diverse data streams, the SFA Data Management Team developed an integrated field-data workflow. Field scientists are required to use persistent location identifiers for long-term sites and register field locations prior to site creation. Scientists are encouraged to use International Geo Sample Numbers (IGSNs), which are persistent identifiers for their samples that are recommended by the ESS-DIVE repository. The SFA has completed two IGSN pilot tests with ESS-DIVE to begin incorporating sample tracking into the end-to-end data pipeline. This required extensive time and education on the part of the field team, showing that shifting scientists’ processes to curate better data requires substantial effort. Finally, scientists are asked to provide sensor data following practices adopted by the DOE’s AmeriFlux network. Datasets are reviewed and compiled internally, and final data products and the associated metadata are published on ESS-DIVE.

This integrated workflow makes it easier to apply data to downstream analysis, synthesis and models. We found that developing project data/metadata standards and workflows in line with repository requirements is an effective way to develop FAIR and transparent data practices throughout the field-data pipeline. 

 

Optimizing the efficiency of metadata curation in large scale data repositories (IN047-09)

 

Presenter: Emily Robles
Presentation Type: eLightning 
Session Date and Time*: Thursday, 17 December 2020; 04:24 – 04:27 Pacific
Session Number and Title: IN047 – Recent Advancements in Earth Science Data Discovery and Metadata Stewardship Practices
Session URL: https://agu.confex.com/agu/fm20/prelim.cgi/Paper/767519
 

Abstract

Emily Robles (1), Charuleka Varadharajan (1), Shreyas Cholia (1), Valerie C Hendrix (1), Joan E Damerow (1), Madison Burrus (1), Robert Crystal-Ornelas (1), Hesham Elbashandy (1), Zarine Kakalia (1), Mario Melara (2), Fianna O’Brien (1), Makayla Shepherd (2), Maegen Simmonds (3), Karen Whitenack (1), Matthew B. Jones (4), Christopher S. Jones (4), Peter Slaughter (4) and Deb Agarwal (5); (1) Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2) Lawrence Berkeley National Laboratory, Berkeley, United States, (3) University of California Davis, Davis, CA, United States, (4) National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, United States, (5) LBNL, Berkeley, CA, United States

The Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository stores highly diverse Earth and environmental science data generated by projects funded by the U.S. Department of Energy (DOE). A system of metadata quality standards was developed through extensive community collaboration to ensure the data submitted to ESS-DIVE remain findable, accessible, interoperable, and reusable (FAIR) for data users. However, ongoing implementation of these checks requires a metadata review process capable of scaling with the growth of the repository as increasing emphasis is placed on the importance of data archival within the environmental sciences.

To address this challenge, ESS-DIVE created a robust data package review workflow incorporating both automated and manual checks for each data package submitted for publication. A suite of automated metadata quality FAIR checks was developed by the National Center for Ecological Analysis and Synthesis (NCEAS) and tailored to fit ESS-DIVE needs through research into metadata best practices, review of journal metadata requirements, and community feedback. The results are compiled into Metadata Quality Reports, which provide instantaneous feedback to both the data contributor and ESS-DIVE reviewers on problem areas within the metadata upon submission. Reviewers then carry out manual checks focused on metadata content and complete post-review assessments that collect the length of time each review takes. Standardized feedback responses are generated by both series of checks and are used by the reviewer to collaborate 1:1 with contributors until the data package is eligible for publication.
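
To make the automated half of this workflow more concrete, the Python sketch below runs a few rule-based checks over a metadata record and collects standardized feedback into a simple report. The field names, the 20-word abstract threshold, and the feedback messages are all hypothetical; this is not the NCEAS check suite or the actual ESS-DIVE Metadata Quality Report.

```python
from typing import Any, Callable, Dict, List, Tuple

# Each hypothetical check: (name, test function, standardized feedback on failure).
CHECKS: List[Tuple[str, Callable[[Dict[str, Any]], bool], str]] = [
    (
        "title_present",
        lambda m: bool(str(m.get("title", "")).strip()),
        "Add a descriptive title.",
    ),
    (
        "abstract_length",
        # Assumed threshold of 20 words, chosen only for illustration.
        lambda m: len(str(m.get("abstract", "")).split()) >= 20,
        "Expand the abstract to at least 20 words.",
    ),
    (
        "creator_listed",
        lambda m: len(m.get("creators") or []) > 0,
        "List at least one dataset creator.",
    ),
]


def quality_report(metadata: Dict[str, Any]) -> Dict[str, list]:
    """Run every automated check and group results into passed and failed."""
    report: Dict[str, list] = {"passed": [], "failed": []}
    for name, test, feedback in CHECKS:
        if test(metadata):
            report["passed"].append(name)
        else:
            report["failed"].append({"check": name, "feedback": feedback})
    return report
```

In a workflow like the one described, a report of this kind would be returned to the contributor on submission, leaving reviewers free to concentrate on the content-oriented manual checks.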

This system has improved the quality of ESS-DIVE data while decreasing review time by ~60% from the start of implementation. The integration of automation allows our team members to focus efforts on the content-oriented manual metadata checks, which are the most commonly failed metadata requirements. Post-review assessments inform future automation efforts to continuously increase efficiency. This system of metadata review will sustain and support higher volumes of publication requests, ensuring that metadata quality standards are enforced throughout the continued growth of the ESS-DIVE repository. 

 

 

Increasing visibility of historical datasets through modern repository practices (IN047-10)

 

Presenter: Madison Burrus
Presentation Type: eLightning 
Session Date and Time*: Thursday, 17 December 2020; 04:27 – 04:30 Pacific
Session Number and Title: IN047 – Recent Advancements in Earth Science Data Discovery and Metadata Stewardship Practices
Session URL: https://agu.confex.com/agu/fm20/prelim.cgi/Paper/771167 

Abstract

Madison Burrus (1), Fianna O’Brien (1), Charuleka Varadharajan (1), Valerie C Hendrix (1), Shreyas Cholia (1), Hesham Elbashandy (1), Jannean Elliott (2), Christopher S. Jones (3), Matthew B. Jones (3), Zarine Kakalia (1), Emily Robles (1), Crystal Sherline (2), Peter Slaughter (3), Cory Snavely (4), Sara Studwell (5), Karen Whitenack (1) and Deb Agarwal (1,6); (1) Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2) Department of Energy Oak Ridge, Oak Ridge, United States, (3) National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, United States, (4) Lawrence Berkeley National Laboratory, NERSC, Berkeley, CA, United States, (5) Department of Energy Oak Ridge, Oak Ridge, TN, United States, (6) LBNL, Berkeley, CA, United States

 

The Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) repository preserves, expands access to, and improves usability of Earth and environmental scientific data. Amongst several efforts to improve the visibility of ESS-DIVE data, we’ve adapted the “Portals” feature from the National Center for Ecological Analysis and Synthesis (NCEAS) Metacat platform, providing our repository users a space to showcase their custom data collections.

Here, ESS-DIVE demonstrates the utility of Portals for data discovery using the legacy data collection of Carbon Dioxide Information Analysis Center (CDIAC) datasets. CDIAC, a DOE climate-change data archive containing high-value fossil fuel emission and vegetation response data, ceased operations in 2017. Originally, the CDIAC data were available through web pages with limited metadata, which restricted their discoverability to web search engines. When ESS-DIVE took on the responsibility of maintaining these decades’ worth of vital climate change data, we had the opportunity to increase the discoverability of these datasets using a modern, manageable user interface.

In collaboration with the DOE’s Office of Scientific and Technical Information (OSTI), we enhanced the CDIAC metadata previously obscured from users and coupled the datasets and metadata into packages on ESS-DIVE. Then we created a portal to view all CDIAC data packages and transferred in project information from the original webpages, providing an archive-centric view of CDIAC data. Portals are a permanent feature in ESS-DIVE that any user can leverage to create custom, branded landing pages about their research topic with any related datasets published on ESS-DIVE.

As an interdisciplinary archive for Earth science data, ESS-DIVE considered the preservation and modernization of data previously held by CDIAC to be paramount. Through metadata improvements and the CDIAC Portal on ESS-DIVE, we increased the findability and accessibility of these data.

 

Community Fund Partners

Making leaf physiology FAIR: a new standard for leaf-level gas exchange data and metadata (IN045-03) 

Presenter: Kim Ely (Brookhaven National Lab)
Presentation Type: Oral
Session Date and Time*: Wednesday, 16 December 2020; 08:38 – 08:42 Pacific
Session Number and Title: IN045 – Improving Infrastructure for Trustworthy Digital Repositories to Enable Current and Future Use of Open Data in Developed and Developing Countries II
Session URL: https://agu.confex.com/agu/fm20/meetingapp.cgi/Paper/695004
 

Abstract 

With the advancement of ecological data archiving, there is an increased awareness of the FAIR principles, a call to improve Findability, Accessibility, Interoperability and Reusability of data. A particular challenge in meeting these goals is presented by long-tail data: low-volume, diverse data types with no widely used community standards. Leaf-level gas exchange data provide mechanistic understanding of plant and ecosystem fluxes of carbon and water. These data yield important parameterizations for terrestrial biosphere models and are necessary to understand the response of plants to global change. Collection of these data is both specialized and time-consuming, and individual studies generally focus on limited species or restricted geographic regions. The high value of these data is recognized, as evidenced by the many publications that reuse and synthesize gas exchange data; however, there are currently no published standards in use to facilitate data re-use, making enhanced use of gas exchange data by the community challenging and somewhat ad hoc.

We have developed a standard for leaf-level gas exchange data and metadata to provide guidance to data contributors on how to store data in data repositories to maximize the value of those data and facilitate efficient data re-use. For data users, the standard will expand the capacity of data repositories to optimize data search and extraction, and will improve how ready those data are for incorporation into synthesis products. The standard encompasses metadata elements, standard vocabularies, required variables and a crosswalk across the outputs of common instruments to enable accurate data compilation. Currently the standard covers survey measurements, dark respiration, CO2 and light response curves, and parameters derived from those measurements.
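
One way to picture a crosswalk across instrument outputs is as a per-instrument lookup table from native column names to a shared vocabulary. The Python sketch below shows that idea only; the instrument labels, column names, and standard variable names are placeholders invented for this example and are not taken from the published standard.

```python
from typing import Any, Dict

# Hypothetical crosswalk: instrument label -> {native column: standard variable}.
CROSSWALK: Dict[str, Dict[str, str]] = {
    "instrument_a": {"Photo": "net_assimilation", "Cond": "stomatal_conductance"},
    "instrument_b": {"A": "net_assimilation", "gsw": "stomatal_conductance"},
}


def harmonize(record: Dict[str, Any], instrument: str) -> Dict[str, Any]:
    """Rename one instrument record's columns to the shared vocabulary.

    Columns without a crosswalk entry are dropped in this sketch; a real
    compilation would need an explicit policy for unmapped columns.
    """
    if instrument not in CROSSWALK:
        raise KeyError(f"no crosswalk defined for instrument {instrument!r}")
    mapping = CROSSWALK[instrument]
    return {mapping[col]: value for col, value in record.items() if col in mapping}
```

Once records from different instruments share variable names, they can be compiled into a single table for synthesis without per-file renaming.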

The standard is being developed for the U.S. Department of Energy’s (DOE) Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository. However, development of the standard has considered global needs for these data, and subject matter experts from institutions around the world have been invited to review and contribute to the standard. We hope that this broad community engagement will lead to wide acceptance and uptake of the published standard across the leaf-level gas exchange community.

 

Filed Under: news, Uncategorized
