ESS-DIVE

Deep Insight for Earth Science Data


ESS-DIVE at AGU 2020

November 11, 2020 by Charuleka Varadharajan

The ESS-DIVE team is looking forward to participating in the 2020 AGU Fall Meeting. Below are several abstracts that we will be presenting. We look forward to (virtually) seeing you there!

 

Addressing Model Data Archiving Needs for the Department of Energy’s Environmental System Science Community (IN008-01)

 

Presenter: Maegen Simmonds
Presentation Type: eLightning 
Session Date and Time*: Tuesday, 8 December 2020; 10:30 – 10:33 Pacific
Session Number and Title: IN008 – Best Practices and Realities of Research Data Repositories: Which One Should I Choose to Publish My Data? III eLightning 
Session URL: https://agu.confex.com/agu/fm20/prelim.cgi/Paper/755236

Abstract

Maegen Simmonds1, William J Riley1, Mario Melara2, Shreyas Cholia1 and Charuleka Varadharajan1, (1)Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2)Lawrence Berkeley National Laboratory, Berkeley, United States 

Researchers in the Department of Energy’s Environmental System Science (ESS) program use a variety of models to advance robust, scale-aware predictions of terrestrial and subsurface ecosystems. ESS projects typically conduct field observations and experiments coupled with modeling exercises using a model-experimental (ModEx) approach that enables iterative co-development of experiments and models, and ensures that experimental data needed to parameterize and test models are collected. Thus, preserving the “model data” comprising the outputs from simulations, as well as driving, parameterization and validation data with associated codes is becoming increasingly important. The Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) repository currently stores all types of data associated with ESS projects; however, it has not yet been optimized for ingesting and serving large data volumes associated with model outputs. Furthermore, we have lacked community consensus on which model data are scientifically useful to archive. Thus, to scale and optimize ESS-DIVE for model data, we surveyed and interviewed the ESS community to identify the needs for archiving, sharing, and utilizing model data, and to begin developing archiving guidelines to ensure that archived data are scientifically useful, findable, and accessible. Here, we present the results of the survey and the proposed guidelines. This initial assessment of the community needs is an important step in supporting ESS-DIVE’s long-term vision to broadly enable data-model integration, and knowledge generation from model and observational data. This vision will be achieved through close partnerships with the ESS community.

 

 

Connecting Environmental Systems Science and Digital Library Practices (IN008-02)

 

Presenter: Joan Damerow
Presentation Type: eLightning 
Session Date and Time*: Tuesday, 8 December 2020; 10:33 – 10:36 Pacific
Session Number and Title: IN008 – Best Practices and Realities of Research Data Repositories: Which One Should I Choose to Publish My Data? III eLightning
Session URL: https://agu.confex.com/agu/fm20/prelim.cgi/Paper/73618

Abstract

Joan E Damerow1, Charuleka Varadharajan1, Kristin Boye2, Madison Burrus1, K. Dana Chadwick3, Shreyas Cholia1, Robert Crystal-Ornelas1, Kim S Ely4, Valerie C Hendrix1, Matthew B. Jones5, Christopher S. Jones5, Zarine Kakalia1, Ken M Kemner6, Annie B Kersting7, Katharine Maher8, Mario Melara9, Nancy Shiao-Lynn Merino10, Fianna O’Brien1, Zach Perzan11, Emily Robles1, Cory Snavely12, Patrick Sorensen13, James Stegen14, Pamela Weisenhorn15, Karen Whitenack1, Mavrik Zavarin16 and Deb Agarwal17, (1)Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2)SLAC National Accelerator Laboratory, Stanford Synchrotron Radiation Lightsource, Menlo Park, CA, United States, (3)Stanford University, Earth System Science, Stanford, CA, United States, (4)Brookhaven National Laboratory, Environmental and Climate Sciences Department, Upton, NY, United States, (5)National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, United States, (6)Argonne Natl Lab, Argonne, IL, United States, (7)LLNL, Livermore, CA, United States, (8)Stanford-Geology & Env Science, Stanford, CA, United States, (9)Lawrence Berkeley National Laboratory, Berkeley, United States, (10)Lawrence Livermore National Laboratory, Livermore, United States, (11)Stanford University, Earth Systems Science, Stanford, CA, United States, (12)Lawrence Berkeley National Laboratory, NERSC, Berkeley, CA, United States, (13)Lawrence Berkeley National Laboratory, Earth and Environmental Sciences, Berkeley, CA, United States, (14)Pacific Northwest National Laboratory, Richland, WA, United States, (15)Argonne National Laboratory, Argonne, United States, (16)Lawrence Livermore National Laboratory, Livermore, CA, United States, (17)LBNL, Berkeley, CA, United States

 

The U.S. Department of Energy’s (DOE’s) Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) stores and publicly distributes data from observational, experimental, and modeling research funded by the DOE’s Environmental Systems Science activity. The diversity of data and the interdisciplinary nature of projects present challenges in developing recommendations for data management, reporting, and publication. Part of our role as a community-focused data repository is to synthesize and interpret good data curation practices, and to make them easier and more useful for our community. As representatives of Environmental Systems Science researchers, we can also provide valuable feedback within the informatics community and influence existing practices to better support interdisciplinary science.

In this presentation, we demonstrate a community-focused approach in connecting our scientists with best practices for data curation and publication developed in broader informatics and digital library communities. For example, we conducted a pilot test involving many of our scientific projects on the use of persistent identifiers for physical samples–specifically, International Geo/General Sample Numbers (IGSNs). We compared existing sample-related templates and shared vocabulary terms, and evaluated the experience of users to more efficiently describe biological and geological samples from interdisciplinary studies. We explore other challenges encountered as a broad, interdisciplinary repository, such as efficiently curating interdisciplinary data types, ensuring that data is FAIR and of high quality, and that authors receive appropriate credit for contributing quality datasets. Overall, the success of our repository relies on our ability to support specific community needs, and incorporate practices that help maximize the value of Environmental Systems Science data now and in the future.  

 

The ESS-DIVE repository and next steps toward a usable, trusted, and FAIR repository (IN008-03)

 

Presenter: Deb Agarwal
Presentation Type: eLightning
Session Date and Time*: Tuesday, 8 December 2020; 10:36 – 10:39
Session Number and Title: IN008 – Best Practices and Realities of Research Data Repositories: Which One Should I Choose to Publish My Data? III eLightning
Session Link:  https://agu.confex.com/agu/fm20/prelim.cgi/Paper/772655

Abstract

Deb Agarwal1, Shreyas Cholia1, Charuleka Varadharajan1, Valerie C Hendrix1, Joan E Damerow1, Madison Burrus1, Robert Crystal-Ornelas1, Hesham Elbashandy1, Emily Robles1, Fianna O’Brien1, Zarine Kakalia1, Mario Melara2, William J Riley1, Cory Snavely3, Makayla Shepherd2, Maegen Simmonds4, Karen Whitenack1, Matthew B. Jones5, Christopher S. Jones5 and Peter Slaughter5, (1)Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2)Lawrence Berkeley National Laboratory, Berkeley, United States, (3)Lawrence Berkeley National Laboratory, NERSC, Berkeley, CA, United States, (4)University of California Davis, Davis, CA, United States, (5)National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, United States

The US Department of Energy’s (DOE) Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository is in its third year of operation. The repository focuses on three areas of development: expanding adoption and use by ESS users, standardizing data, and supporting projects that provide data to the repository. Our approach is designed around user experience methods and involves significant discussion with and involvement of the community. The priorities of the repository are continually revised and refined based on input from the community.

Our current focus is on expanding the user base and functionality of ESS-DIVE through five key innovations: (1) understanding user needs; (2) supporting early data archiving by projects; (3) reaching a broader portion of the ESS community; (4) supporting search of extracted ESS-DIVE data with a fusion database; and (5) federating with other repositories. We are focused on providing a scalable, robust repository and long-term curation of ESS data that adhere to Findable, Accessible, Interoperable, and Reusable (FAIR) principles, with the goal of increasing the ease and capacity of storing data in the repository. A key goal is enhancing usability of the data. For example, many of the projects contributing data to ESS-DIVE have large teams, last many years, and generate a large number of data packages. We are working with our community to evaluate the available methods of providing usable citations for large subsets of the data from a project.

Our end goal is to have a repository that is trusted by the community and that is the preferred storage facility for data generated by the DOE ESS program and the preferred provider of ESS data. One challenge is that FAIR principles are designed to address the needs of the data user, and largely ignore the needs of the data provider. The CoreTrustSeal is not yet well known, so there is no pressure from our user community or funders to complete the application process. However, now that at least one repository based on the same software, Metacat, has been certified, the process might be less work for ESS-DIVE. As publishers move to require CoreTrustSeal certification, we expect to see increased pressure to obtain the certification.

 

Tackling the Challenges of Earth Science Data Synthesis: Insights from (meta)data standards approaches (IN012-07)

 

Presenter: Valerie Hendrix
Presentation Type: Oral 
Session Date and Time*: Tuesday, 8 December 2020; 20:54 – 20:58 Pacific
Session Number and Title: IN012 – Data and Information Services for Interdisciplinary Research and Applications in Earth Science II
Location: Virtual
Session link:  https://agu.confex.com/agu/fm20/prelim.cgi/Paper/749418 

Abstract

Valerie C Hendrix, Danielle S Christianson, Charuleka Varadharajan, Madison Burrus, Shreyas Cholia, You-Wei Cheah, Housen Chu, Robert Crystal-Ornelas, Joan E Damerow, Zarine Kakalia, Fianna O’Brien, Gilberto Pastorello, Emily Robles and Deb Agarwal, Lawrence Berkeley National Laboratory, Berkeley, CA, United States

 

Diverse, complex data are a significant component of Earth Science’s “big data” challenge. Some Earth Science data, like remote sensing observations, are well understood, are uniformly structured, and have well-developed standards that are adopted broadly within the scientific community. Unfortunately, for other types of Earth Science data, like ecological, geochemical and hydrological observations, few standards exist and their adoption is limited. The synthesis challenge is compounded in interdisciplinary projects in which many disciplines, each with their own cultures, must synthesize data to solve cutting-edge research questions.

Data synthesis for research analysis is a common, resource-intensive bottleneck in data management workflows. We have faced this challenge in several U.S. Department of Energy research projects in which data synthesis is essential to addressing the science. These projects include AmeriFlux, Next Generation Ecosystem Experiment (NGEE) – Tropics, Watershed Function Science Focus Area, Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE), and a DOE Early Career project using data-driven approaches to predict water quality.

In these projects, we have taken a range of approaches to support (meta)data synthesis. At one end of the spectrum, data providers apply well-defined standards or reporting formats before sharing their data, and at the other, data users apply standards after data acquisition. As these projects continue to evolve, we have gained insights from these experiences, including the advantages and disadvantages of each approach, how project history and resources led to the choice of approach, and how that choice enabled data harmonization. In this talk, we discuss the pros and cons of the various approaches, and also present flexible applications of standards to support diverse needs when dealing with complex data.

 

 

Letting the community lead the way to data integration: Data standards and documentation developed by domain experts and the ESS-DIVE repository (IN015-07)

 

Presenter: Rob Crystal-Ornelas
Presentation Type: Oral
Session Date and Time*: Wednesday, 9 December 2020; 17:54 – 17:58
Session Number and Title: IN015 – Best Practices and Realities of Research Data Repositories: Which One Should I Choose to Publish My Data? II
Session Link: https://agu.confex.com/agu/fm20/prelim.cgi/Paper/712622 

Abstract

Robert Crystal-Ornelas1, Charuleka Varadharajan1, Ben P Bond-Lamberty2, Kristin Boye3, Madison Burrus1, Shreyas Cholia1, Joan E Damerow1, Ranjeet Devarakonda4, Hesham Elbashandy1, Kim S Ely5, Amy E Goldman2, Susan L Heinz6, Valerie C Hendrix1, Christopher S. Jones7, Matthew B. Jones7, Zarine Kakalia1, Mario Melara8, Fianna O’Brien1, Stephanie Pennington9, William J Riley1, Emily Robles1, Alistair Rogers5, Makayla Shepherd8, Maegen Simmonds1, Peter Slaughter7, Terri Velliquette10, Pamela Weisenhorn11, Karen Whitenack1 and Deb Agarwal12, (1)Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2)Pacific Northwest National Laboratory, Richland, WA, United States, (3)SLAC National Accelerator Laboratory, Stanford Synchrotron Radiation Lightsource, Menlo Park, CA, United States, (4)Oak Ridge National Laboratory, Oak Ridge, TN, United States, (5)Brookhaven National Laboratory, Environmental and Climate Sciences Department, Upton, NY, United States, (6)Oak Ridge National Laboratory, Kingston, TN, United States, (7)National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, United States, (8)Lawrence Berkeley National Laboratory, Berkeley, United States, (9)Pacific Northwest National Laboratory, Joint Global Change Research Institute, College Park, MD, United States, (10)Oak Ridge National Laboratory, Oak Ridge, United States, (11)Argonne National Laboratory, Argonne, United States, (12)LBNL, Berkeley, CA, United States

 

Earth and Environmental Science data repositories are tasked with storing data that come in a wide range of formats. Many repositories, including the US Department of Energy’s (DOE) Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) repository, see data integration and synthesis as a key step in harnessing the power of the large datasets contained within the repositories. However, the lack of standardization in data contributed by users can hinder data reuse and integration.

To kickstart the generation of reporting standards, the ESS-DIVE repository funded six community partners from national labs around the US to develop seven metadata- and data-related standards. In this talk, we begin by describing how our community partners achieved consensus on standards for some of the most common data types uploaded to ESS-DIVE. One challenge community partners faced was providing robust documentation so that any data producer could adopt the standards prior to uploading their data to ESS-DIVE. Documentation also needed to be dynamic so that standards could be modified relatively easily when required.

To overcome this challenge, ESS-DIVE has begun to implement a software versioning-style framework to allow for data standards to be transparently developed and updated. When standards are expanded or updated by community consensus, our versioning framework allows a clear view of any modifications. Data uploaded to the ESS-DIVE repository that adhere to these community standards will be more interoperable and reusable, facilitating synthesis across datasets. These standardized data contributions to ESS-DIVE would then enable a deeper integrated search of the individual data files within the repository through the ESS-DIVE “fusion database”. Ultimately, by developing standards, providing clear documentation, and offering a transparent way of updating standards, ESS-DIVE provides a sustainable path toward data integration through community-driven standard development.
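
As a rough illustration of what a "software versioning-style framework" for a data standard could look like, the sketch below labels each release of a standard with a MAJOR.MINOR.PATCH number and classifies the change between two releases. The abstract does not say which versioning scheme ESS-DIVE uses; the semantic-versioning convention, the `StandardVersion` class, and the `describe_change` function are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class StandardVersion:
    """Hypothetical MAJOR.MINOR.PATCH label for one release of a data standard."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text):
        # "1.2.0" -> StandardVersion(1, 2, 0); raises ValueError on bad input.
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def __str__(self):
        return f"{self.major}.{self.minor}.{self.patch}"


def describe_change(old, new):
    """Classify the change between two releases (assumed convention, not ESS-DIVE's)."""
    if new <= old:
        raise ValueError(f"{new} is not newer than {old}")
    if new.major != old.major:
        return "breaking change"
    if new.minor != old.minor:
        return "backward-compatible addition"
    return "correction or clarification"


if __name__ == "__main__":
    old = StandardVersion.parse("1.0.0")
    new = StandardVersion.parse("1.1.0")
    print(f"{old} -> {new}: {describe_change(old, new)}")
```

Under this assumed convention, a data producer can tell from the version label alone whether files written against an earlier release of a standard still conform to the current one.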

 

Incorporating Data Management Best Practices into Scientific Workflows (IN016-07)

 

Presenter: Zarine Kakalia
Presentation Type: eLightning
Session Date and Time*: Wednesday, 9 December 2020; 20:48 – 20:51
Session Number and Title: IN016 – A Call to Action for FAIR, Reproducible, and Transparent Science: Analytical Code, Workflows, Services, Models, and Conclusions eLightning
Session Link: https://agu.confex.com/agu/fm20/prelim.cgi/Paper/684965

Abstract

Zarine Kakalia1, Charuleka Varadharajan1, Madison Burrus1, Danielle S Christianson1, Robert Crystal-Ornelas1, Joan E Damerow1, Dipankar Dwivedi1, Boris Faybishenko1, Valerie C Hendrix1, Emily Robles1, Roelof Versteeg2, Karen Whitenack1 and Deb Agarwal1, (1)Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2)Subsurface Insights, Hanover, NH, United States

 

The U.S. Department of Energy’s Watershed Function Scientific Focus Area (SFA) in the East River, Colorado generates and uses interdisciplinary data from hydrological, geochemical, geophysical, microbiological and remote sensing observations. The project has developed an end-to-end infrastructure to acquire the SFA’s multi-scale data, generate data products, and enable internal and public data access. Maintaining FAIR data throughout this pipeline is challenging due to the diversity of the data and scientific workflows. To ensure data pipelines generate integratable products and meet repository standards, the SFA Data Management Team engages with field scientists to incorporate best data management practices throughout the scientific workflow. SFA data is published through the DOE’s Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository, and thus adopts standards and metadata quality criteria required for publication through ESS-DIVE.

To overcome the challenge of acquiring critical metadata from diverse data streams, the SFA Data Management Team developed an integrated field-data workflow. Field scientists are required to use persistent location identifiers for long-term sites and register field locations prior to site creation. Scientists are encouraged to use International Geo Sample Numbers (IGSNs), which are persistent identifiers for their samples that are recommended by the ESS-DIVE repository. The SFA has completed two IGSN pilot tests with ESS-DIVE to begin incorporating sample tracking into the end-to-end data pipeline. This required extensive time and education on behalf of the field team, proving that shifting scientists’ processes to curate better data requires substantial effort. Finally, scientists are asked to provide sensor data, following practices adopted by the DOE’s AmeriFlux network. Datasets are reviewed and compiled internally, and final data products and the associated metadata are published on ESS-DIVE.

This integrated workflow makes it easier to apply data to downstream analysis, synthesis and models. We found that developing project data/metadata standards and workflows in line with repository requirements is an effective way to develop FAIR and transparent data practices throughout the field-data pipeline. 

 

Optimizing the efficiency of metadata curation in large scale data repositories (IN047-09)

 

Presenter: Emily Robles
Presentation Type: eLightning 
Session Date and Time*: Thursday, 17 December 2020; 04:24 – 04:27 Pacific
Session Number and Title: IN047 – Recent Advancements in Earth Science Data Discovery and Metadata Stewardship Practices
Session URL: https://agu.confex.com/agu/fm20/prelim.cgi/Paper/767519
 

Abstract

Emily Robles1, Charuleka Varadharajan1, Shreyas Cholia1, Valerie C Hendrix1, Joan E Damerow1, Madison Burrus1, Robert Crystal-Ornelas1, Hesham Elbashandy1, Zarine Kakalia1, Mario Melara2, Fianna O’Brien1, Makayla Shepherd2, Maegen Simmonds3, Karen Whitenack1, Matthew B. Jones4, Christopher S. Jones4, Peter Slaughter4 and Deb Agarwal5, (1)Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2)Lawrence Berkeley National Laboratory, Berkeley, United States, (3)University of California Davis, Davis, CA, United States, (4)National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, United States, (5)LBNL, Berkeley, CA, United States

The Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository stores highly diverse Earth and environmental science data generated by projects funded by the U.S. Department of Energy (DOE). A system of metadata quality standards was developed through extensive community collaboration to ensure the data submitted to ESS-DIVE remain findable, accessible, interoperable, and reusable (FAIR) for data users. However, ongoing implementation of these checks requires a metadata review process capable of scaling with the growth of the repository as increasing emphasis is placed on the importance of data archival within the environmental sciences.

To address this challenge, ESS-DIVE created a robust data package review workflow incorporating both automated and manual checks for each data package submitted for publication. A suite of automated metadata quality FAIR checks was developed by the National Center for Ecological Analysis and Synthesis (NCEAS) and tailored to fit ESS-DIVE needs through research into metadata best practices, review of journal metadata requirements, and community feedback. The results are compiled into Metadata Quality Reports, which provide instantaneous feedback to both the data contributor and ESS-DIVE reviewers on problem areas within the metadata upon submission. Reviewers then carry out manual checks focused on metadata content and complete post-review assessments that collect the length of time each review takes. Standardized feedback responses are generated by both series of checks and are used by the reviewer to collaborate 1:1 with contributors until the data package is eligible for publication.

This system has improved the quality of ESS-DIVE data while decreasing review time by ~60% from the start of implementation. The integration of automation allows our team members to focus efforts on the content-oriented manual metadata checks, which are the most commonly failed metadata requirements. Post-review assessments inform future automation efforts to continuously increase efficiency. This system of metadata review will sustain and support higher volumes of publication requests, ensuring that metadata quality standards are enforced throughout the continued growth of the ESS-DIVE repository. 
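
To make the automated half of this review workflow concrete, the sketch below runs a few metadata checks against a data package record and compiles the results into a simple report. It is only an illustration: the field names (`title`, `abstract`, `contacts`), the thresholds, and the function names are hypothetical and are not taken from the ESS-DIVE or NCEAS check suite.

```python
# Hypothetical thresholds chosen for illustration only.
MIN_TITLE_WORDS = 7
MIN_ABSTRACT_WORDS = 20


def check_title(metadata):
    words = len(metadata.get("title", "").split())
    return words >= MIN_TITLE_WORDS, f"title has {words} words (minimum {MIN_TITLE_WORDS})"


def check_abstract(metadata):
    words = len(metadata.get("abstract", "").split())
    return words >= MIN_ABSTRACT_WORDS, f"abstract has {words} words (minimum {MIN_ABSTRACT_WORDS})"


def check_contact(metadata):
    contacts = metadata.get("contacts", [])
    return len(contacts) > 0, f"{len(contacts)} contact(s) listed (at least 1 required)"


# Checks run in this order; each returns (passed, message).
CHECKS = {
    "title": check_title,
    "abstract": check_abstract,
    "contact": check_contact,
}


def quality_report(metadata):
    """Run every automated check and collect the results for reviewers."""
    report = []
    for name, check in CHECKS.items():
        passed, message = check(metadata)
        report.append({"check": name, "passed": passed, "message": message})
    return report


def failed_checks(report):
    """Names of the checks a contributor still needs to address."""
    return [entry["check"] for entry in report if not entry["passed"]]


if __name__ == "__main__":
    package = {"title": "Soil data", "abstract": "Too short.", "contacts": []}
    for entry in quality_report(package):
        status = "PASS" if entry["passed"] else "FAIL"
        print(f"{status} {entry['check']}: {entry['message']}")
```

In a workflow like the one described above, a report of this kind would be returned to the contributor on submission, leaving reviewers free to concentrate on the content-oriented manual checks.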

 

 

Increasing visibility of historical datasets through modern repository practices (IN047-10)

 

Presenter: Madison Burrus
Presentation Type: eLightning 
Session Date and Time*: Thursday, 17 December 2020; 04:27 – 04:30 Pacific
Session Number and Title: IN047 – Recent Advancements in Earth Science Data Discovery and Metadata Stewardship Practices
Session URL: https://agu.confex.com/agu/fm20/prelim.cgi/Paper/771167 

Abstract

Madison Burrus1, Fianna O’Brien1, Charuleka Varadharajan1, Valerie C Hendrix1, Shreyas Cholia1, Hesham Elbashandy1, Jannean Elliott2, Christopher S. Jones3, Matthew B. Jones3, Zarine Kakalia1, Emily Robles1, Crystal Sherline2, Peter Slaughter3, Cory Snavely4, Sara Studwell5, Karen Whitenack1 and Deb Agarwal1,6, (1)Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2)Department of Energy Oak Ridge, Oak Ridge, United States, (3)National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, United States, (4)Lawrence Berkeley National Laboratory, NERSC, Berkeley, CA, United States, (5)Department of Energy Oak Ridge, Oak Ridge, TN, United States, (6)LBNL, Berkeley, CA, United States

 

The Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) repository preserves, expands access to, and improves usability of Earth and environmental scientific data. Amongst several efforts to improve the visibility of ESS-DIVE data, we’ve adapted the “Portals” feature from the National Center for Ecological Analysis and Synthesis (NCEAS) Metacat platform, providing our repository users a space to showcase their custom data collections.

Here, ESS-DIVE demonstrates the utility of Portals for data discovery using the legacy data collection of Carbon Dioxide Information Analysis Center (CDIAC) datasets. CDIAC, a DOE climate-change data archive containing high-value fossil fuel emission and vegetation response data, ceased operations in 2017. Originally, the CDIAC data were available through web pages with limited metadata, which restricted their discoverability through web search engines. When ESS-DIVE took on the responsibility of maintaining these decades’ worth of vital climate change data, we had the opportunity to increase the discoverability of these datasets using a modern, manageable user interface.

In collaboration with the DOE’s Office of Scientific and Technical Information (OSTI), we enhanced the CDIAC metadata previously obscured from users and coupled the datasets and metadata into packages on ESS-DIVE. Then we created a portal to view all CDIAC data packages and transferred in project information from the original webpages, providing an archive-centric view of CDIAC data. Portals are a permanent feature in ESS-DIVE that any user can leverage to create custom, branded landing pages about their research topic with any related datasets published on ESS-DIVE.

As an interdisciplinary archive for earth science data, the preservation and modernization of data previously held by CDIAC was paramount for ESS-DIVE. Through metadata improvements and the CDIAC Portal on ESS-DIVE, we increased the findability and accessibility of these data.

 

Community Fund Partners

Making leaf physiology FAIR: a new standard for leaf-level gas exchange data and metadata (IN045-03) 

Presenter: Kim Ely (Brookhaven National Lab)
Presentation Type: Oral
Session Date and Time*: Wednesday, 16 December 2020; 08:38 – 08:42 Pacific
Session Number and Title: IN045 – Improving Infrastructure for Trustworthy Digital Repositories to Enable Current and Future Use of Open Data in Developed and Developing Countries II
Session URL: https://agu.confex.com/agu/fm20/meetingapp.cgi/Paper/695004
 

Abstract 

With the advancement of ecological data archiving, there is an increased awareness of the FAIR principles, a call to improve Findability, Accessibility, Interoperability and Reusability of data. A particular challenge in meeting these goals is presented by long-tail data: low-volume, diverse data types with no widely used community standards. Leaf-level gas exchange data provide mechanistic understanding of plant and ecosystem fluxes of carbon and water. These data yield important parameterizations for terrestrial biosphere models and are necessary to understand the response of plants to global change. Collection of these data is both specialist and time consuming, and individual studies generally focus on limited species or restricted geographic regions. The high value of these data is recognized, as evidenced by many publications that reuse and synthesize gas exchange data; however, there are currently no published standards in use to facilitate data re-use, making enhanced use of gas exchange data by the community challenging and somewhat ad hoc.

We have developed a standard for leaf-level gas exchange data and metadata to provide guidance to data contributors on how to store data in data repositories to maximize the value of that data and facilitate efficient data re-use. For data users, the standard will expand the capacity of data repositories to optimize data search and extraction, and will improve how ready those data are for incorporation into synthesis products. The standard encompasses metadata elements, standard vocabularies, required variables and a crosswalk across the outputs of common instruments to enable accurate data compilation. Currently the standard covers survey measurements, dark respiration, CO2 and light response curves, and parameters derived from those measurements.

The standard is being developed for the U.S. Department of Energy’s (DOE) Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository. However, development of the standard has considered global needs for these data, and subject matter experts from institutions around the world have been invited to review and contribute to the standard. We hope that this broad community engagement will lead to wide acceptance and uptake of the published standard across the leaf-level gas exchange community.

 


ESIP 2020

July 29, 2020 by Charuleka Varadharajan

The Earth Science Information Partners (ESIP) is a community-based organization with a mission to support collaborations around data topics in Earth sciences. ESS-DIVE became an ESIP member in 2019, and seeks to represent perspectives of the Environmental Systems Science community as we engage in discussions on best practices for managing, preserving, and reusing data. 

Rob Crystal-Ornelas and Joan Damerow represented ESS-DIVE at the 2020 ESIP Summer Meeting from July 14-24. This was the first virtual ESIP meeting, which allowed more people to attend and encouraged creative approaches to group breakouts, interactive demos, and collaborative documents. During sessions and plenaries, the interdisciplinary nature of the meeting was clear as earth scientists, data managers, digital librarians, computer programmers and more shared perspectives on this year’s theme: “putting data to work.”

 

New postdoc with the ESS-DIVE team, Rob Crystal-Ornelas, attending his first ESIP conference.

To recap some highlights from the conference, we compiled a short list of takeaways and helpful links, and provide a more in-depth session spotlight below.

  • Plenary by Dr. Julia Lowndes on how the Openscapes project is providing long-term support for researchers looking to adopt and normalize open science practices.
  • Research showcase poster by Dr. Kathe Todd-Brown, who synthesized a list of 1,000 unique terms commonly used in soil science research.
  • Sessions and plenary talks on vocabularies in Earth, Space, and Environmental Sciences highlighted goals, current resources, and problems with shared vocabularies. ESIP clusters are working towards harmonizing terms across vocabularies, integrating earth science terms with those of other disciplines (e.g. biological ontologies) to support interoperability and multi-disciplinary science, and developing guidelines that make it easier to choose and use relevant vocabularies.
  • Tutorial by ESIP Fellow Yuhan Rao, who walked attendees through an openly available tutorial on Machine Learning techniques using R and Python.
  • The ESIP Research Object Citation cluster has made considerable contributions to progress in data citation practices, but more work is needed to support credit for data and other research objects (e.g. Make Data Count: https://makedatacount.org/).  
  • Joan Damerow presented outcomes from our ESS-DIVE community pilot test on registering interdisciplinary samples for International Geo/General Sample Numbers (IGSNs) and standardizing sample metadata.  
  • In a session on physical samples, we developed a plan for a new ESIP cluster to work together on a variety of topics to support sample data curation, discovery, and reuse. One goal of the proposed Physical Samples Cluster will be to compile and/or develop recommendations and tools to support improvements in sample management (e.g. Middleware for Assisting the Registration of Samples (MARS): http://cirdles.org/projects/mars/) and reuse.   

 

Session spotlight: “What we wished we’d learned in grad school: A workshop to develop a mini data management training”.

 

This session, led by Dr. Yuhan Rao and PhD candidates Ellie Davis and Ben Roberts-Pierel, invited participants to co-create documents that can be used to introduce researchers to key steps in data management. In small Zoom breakout rooms, attendees worked with four other people to identify why and when we should introduce graduate students to the concept of documenting their research using metadata. We determined that an overarching reason is that scientists want their research to be used by others in the future. Our group also thought that data documentation lessons and seminars should be integrated throughout graduate education from the very start. Data management could be taught formally as part of a graduate-level Research Methods course or informally through workshops or lab group meetings.

 

The outcome of this session is a draft set of teaching materials that can be used to introduce graduate students to the data management lifecycle.

 

 

Filed Under: news, Uncategorized

Fall 2019 AGU Highlights

January 24, 2020 by Charuleka Varadharajan

The ESS-DIVE team had a great time presenting our work and connecting with our community at this year’s AGU conference. In total, members of the team had seven oral and poster presentations. See our previous post, ESS-DIVE at AGU 2019, for a list of each presenter and their abstract.

Here are some highlights!

Deb Agarwal presenting during the Best Practices and Realities of Research Data Repositories session

 

Fianna O’Brien making friends at the AGU booths
Charuleka Varadharajan and Valerie Hendrix at the ESS-DIVE booth – thank you to those who visited us!
Joan Damerow giving her talk on the use of persistent sample identifiers and the creation of metadata standards.

 

Zarine Kakalia explaining ESS-DIVE’s method for creating a standardized metadata quality review process for submitted data packages

Filed Under: news

ESS-DIVE at AGU 2019

November 23, 2019 by Charuleka Varadharajan

Come meet the ESS-DIVE team at AGU 2019 in San Francisco! We are available during our “Meet the scientist” time slot, 3:30-5 pm on Tuesday, Dec 10th at the Berkeley Lab Booth in the Exhibits Hall.

We also will be at several oral and poster presentations listed below.

Designing the ESS-DIVE Data Repository to be Trusted by the Community and FAIR (IN14B-16)

Presenter: Deb Agarwal
Presentation Type: eLightning 
Session Date and Time*: Monday, 9 December 2019; 16:00 – 18:00 
Session Number and Title: IN14B: Best Practices and Realities of Research Data Repositories eLightning 
Location: Moscone South; eLightning Theater III
Abstract
The US Department of Energy’s (DOE) Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository is still in its early implementation and growth phase. The focus of the repository has been on three areas of development: data access capabilities, standardization of data, and services to support projects providing data to the repository. Our approach is designed around user experience methods and involves significant discussion and involvement of the community in the design and development of capabilities. The priorities of the repository are continually revised and refined based on input from the community. We are following the developments of CoreTrustSeal and FAIR principles for data, and they are targets we hope to achieve in the future.

Our primary near-term goal is to build a repository that is trusted by the community and that is the preferred storage facility for data generated by the DOE Environmental System Science program. We continually strive to ensure our data are easily findable, accessible, interoperable, and reusable (FAIR). Achieving this goal requires a partnership with the data providers to gather the necessary metadata and standardized data. One challenge is that FAIR principles are designed to address the needs of the data user, and largely ignore the needs of the data provider. In this talk we present our repository and our approach to working with data providers to move their data toward FAIR principles. We also discuss the challenges we see in incentivizing the data provider to care about some aspects of FAIR data. Our ESS-DIVE team includes members of project teams that store data in the ESS-DIVE repository, and these dual perspectives give us some insight into the motivations and needs of the data provider. The motivations and priorities of data providers do not always align with the needs of the repository.

Standardizing Metadata Quality Review for an Environmental Data Repository (IN14B-09)

Presenter: Zarine Kakalia
Presentation Type: eLightning 

Session Date and Time*: Monday, 9 December 2019; 16:00 – 18:00 
Session Number and Title: IN14B: Best Practices and Realities of Research Data Repositories eLightning 
Location: Moscone South; eLightning Theater III
Abstract
The Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) is a data repository developed to support earth and environmental science projects funded by the U.S. Department of Energy (DOE), and is part of the DataONE network. One of the challenges ESS-DIVE faces is ensuring that submitted data packages have the thorough metadata necessary to find and use the dataset.

Our goal is to ensure all data packages published on ESS-DIVE have high-quality metadata that meet FAIR data principles. However, extensive metadata quality reviews can involve significant staff time and resources. Therefore, we implement a combination of automated checks to catch issues upon submission, and a manual process for in-depth content reviews requiring domain knowledge. A majority of the automated checks were developed by DataONE and the Arctic Data Center and are designed to assess the findability, accessibility, interoperability and reusability (FAIR-ness) of datasets by checking for the presence of metadata fields and word counts. We are testing this suite of DataONE metadata checks as well as additional checks needed for our community. Automated checks reduce the time needed for manual reviews and provide instant feedback to users, thus expediting the publication process. To standardize the manual review process and provide consistent feedback to dataset authors, we use a checklist form with specific requirements for each metadata element. Completed forms for each dataset enable tracking the quality of datasets before and after review, and the amount of time taken on the review process.

We have found that the combination of automated quality reports and specific guidance in the review process is an effective approach to improve metadata and reduce manual review time. In addition, data from the completed review forms will allow us to assess whether the automated checks have decreased the manual review time and improved metadata quality.
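
The automated checks described above look for the presence of metadata fields and count words. A minimal sketch of that idea follows; the field names and word-count thresholds are hypothetical assumptions made for illustration and do not reproduce the actual DataONE or ESS-DIVE check suite.

```python
# Illustrative sketch only. The required fields and minimum word counts
# are hypothetical, not the real DataONE / ESS-DIVE checks.
from dataclasses import dataclass

# Hypothetical required fields mapped to a minimum word count.
REQUIRED_FIELDS = {
    "title": 7,
    "abstract": 100,
    "methods": 1,
    "keywords": 1,
}


@dataclass
class CheckResult:
    name: str      # metadata field that was checked
    passed: bool   # True if the field is present and long enough
    message: str   # short feedback for the dataset author


def check_metadata(metadata: dict) -> list:
    """Check that each required field is present and has enough words."""
    results = []
    for name, min_words in REQUIRED_FIELDS.items():
        value = metadata.get(name)
        # Lists (e.g. keywords) are joined so words are counted uniformly.
        if isinstance(value, (list, tuple)):
            value = " ".join(str(item) for item in value)
        words = len(str(value).split()) if value else 0
        if words == 0:
            results.append(CheckResult(name, False, "missing or empty"))
        elif words < min_words:
            results.append(
                CheckResult(name, False, f"{words} words, expected at least {min_words}")
            )
        else:
            results.append(CheckResult(name, True, "ok"))
    return results


report = check_metadata({"title": "Soil data", "keywords": ["soil"]})
failures = [result.name for result in report if not result.passed]
```

A report like this can be returned to the submitter immediately, leaving the manual checklist review to focus on content that requires domain knowledge.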

Addressing Paradigm Shifts and Competing Interests in an Open Science World (IN22C-18)

Presenter: Deb Agarwal
Presentation Type: eLightning 
Session Date and Time*: Tuesday, 10 December 2019; 10:20 – 12:20 
Session Number and Title: IN22C: Open Knowledge Networks and Semantics for Geosciences: Successes and Challenges of Open Science eLightning 
Location: Moscone South; eLightning Theater III
Abstract
Open science has the potential to democratize science access in unprecedented ways. However, the change to open science shifts the costs and benefits in subtle ways for the scientists involved. Understanding these shifts will allow us to better address the needs of all parties and increase the amount of open science available. In the case of open publications, there is a shift in the costs of publishing from the reader to the author. This can affect the costs of scholarly output that were not planned into a grant and changes the publishers’ business model. In the case of data, there has traditionally been little or no academic credit for data collection. Credit is acquired from authorship of publications using that data. In an open data world, the collector of the data shifts from having exclusive publishing rights to having non-exclusive publishing rights with the hope of more citations of the data. This change requires that the publication of a dataset has similar academic credit to the publication of a paper. These aspects are already understood and are hopefully on track to be solved.

In this talk, we describe 12 years of experience working with data users and providers to move towards open data in biogeosciences. We have addressed this challenge as the US Department of Energy’s (DOE) Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository and as members of the data teams for many earth science projects including: AmeriFlux (ameriflux.lbl.gov), FLUXNET (fluxnet.org), NGEE-Tropics (ngee-tropics.lbl.gov/), and Watershed SFA (watershed.lbl.gov). Moving global, diverse communities to open science paradigms requires finding the right incentives and building trust with the community. We discuss the incentives we have used to encourage open data as well as the benefits and pitfalls encountered along the way. Often the motivations for, and barriers to moving to open science data are not as straightforward as we initially thought and can vary across countries and projects.

Community use of persistent sample identifiers and metadata standards: supporting efficient data management in the field, laboratory, and online (IN32A-05) 

Presenter: Joan Damerow
Presentation Type: Oral 
Session Date and Time: Wednesday, 11 December 2019; 10:20 – 12:20 
Presentation Length: 11:20 – 11:35 
Session Number and Title: IN32A: Communities, Tools, and Policies That Enable Integration of Earth, Space, and Environmental Science Data and Cyberinfrastructures II: Tools and Policies 
Location: Moscone West; 2018, L2
Abstract
Physical samples are foundational entities for research in earth and environmental sciences; they are not only the basis of individual studies but could also be integrated with other data to inform new and broader-scale questions. Data contributors to the Department of Energy’s (DOE) Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) repository often work in large, interdisciplinary teams and send samples to multiple facilities for analyses. This community needs an efficient system for persistent sample identification and tracking that is suitable for the field, laboratory analyses, and online publication.

We are conducting a community pilot test on the use of persistent identifiers for physical samples–specifically, International Geo Sample Numbers (IGSNs). Six projects with a variety of sample types are registering samples for IGSNs, standardizing sample collection metadata, and publishing their sample metadata in the System for Earth Sample Registration (SESAR) sample catalog and ESS-DIVE. The purpose of the test is to evaluate the experience of users and to decide on essential standardized metadata for our community. We gathered information for the pilot test through discussions with project teams and documented several components, such as the efficiency of the process (i.e. use of templates, labeling, registering samples, and updating metadata) and any apparent problems. We resolved uncertainties in use of metadata fields, and added standard terms as needed. Throughout the pilot test, we also gathered feedback on desired use cases, which include: improvements in data management, advanced search capabilities, ability to link identifiers, and ability to integrate and reuse sample data.

The pilot test results will inform community-driven standards and tools for sample identifiers, tracking, and metadata in the ESS-DIVE repository. Our overall goal is to provide practical recommendations for efficient sample data management while also preserving and maximizing the potential value of samples into the future.

Utilizing Diverse Data in Scientific Analysis and Modeling for Water Resource Management (IN51A-01)

Presenter: Charuleka Varadharajan
Presentation Type: Oral 
Session Date and Time: Friday, 13 December 2019; 08:00 – 10:00 
Presentation Length: 08:00 – 08:15 
Session Number and Title: IN51A: Data and Information Services for Interdisciplinary Research and Applications in Earth Science I 
Location: Moscone West; 2018, L2
Abstract
The Earth’s water resources are being characterized at unprecedented resolutions due to the growth of sensor networks, remote sensing, and other observational tools. However, our ability to utilize ‘water big data’ for scientific analysis and modeling is still limited for many reasons. First, water data are complex and diverse, making it challenging to integrate and compare across data types. Second, it is difficult to discover and synthesize data across providers, as data and metadata are not provided using standardized formats. Third, real world data often need substantial quality checks and cleaning for scientific use. Finally, data preparation for both mechanistic and data-driven models is not trivial and involves gap filling, transformations, and conversion into formats that can be used by the models.

Here, we present technologies developed for curation, assessment, integration, visualization, and publication of water data for research funded by the U.S. Department of Energy (DOE). The Data Management Framework of the Watershed Function Scientific Focus Area (SFA) consists of cyberinfrastructure that includes (a) a queryable database to store diverse data, (b) scripts in Jupyter notebooks to QA/QC these data, (c) a broker, BASIN-3D, to integrate diverse, distributed, multiscale data into a unified view, and (d) search and access portals to enable data exploration through interactive tools and visualizations. ESS-DIVE is a data repository for DOE-funded environmental data, and is promoting the development of data/metadata standards in partnership with its community to ensure long-term data interoperability and reusability. Data from the SFA and other DOE watershed research efforts are publicly released through ESS-DIVE. Finally, we present our experience with using publicly-available water data from various providers in Colorado and California, and discuss challenges in using data as inputs to deep learning and mechanistic models.

Several federal and state efforts are now prioritizing open water data infrastructure. Future data systems need to enable seamless discovery and access of data from different providers in standardized formats. Also needed are scientific workflows that connect data to models. These advances are needed to provide timely predictions of water availability and quality to stakeholders.

A Community-Centered Approach to Managing Environmental Data in Repositories (IN51F-0690)

Presenter: Charuleka Varadharajan
Presentation Type: Poster 
Session Date and Time: Friday, 13 December 2019; 08:00 – 12:20 
Session Number and Title: IN51F: Data Integration: Enabling the Acceleration of Science Through Connectivity, Collaboration, and Convergent Science II Posters 
Location: Moscone South, Poster Hall
Abstract
The Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) is a U.S. Department of Energy (DOE) data repository for DOE-sponsored research in earth and environmental sciences. Data stored in ESS-DIVE are highly diverse, spanning many science domains, and encompassing field, experimental, and modeling research across a variety of terrestrial and subsurface ecosystems. ESS-DIVE’s mission is to enable broad access to, and improve the usability of data stored in the archive. A key objective is to encourage data providers to contribute well-structured, high-quality data, with an intent to enable data users to easily build processing, synthesis, and analysis capabilities for those data.

ESS-DIVE’s approach from its inception has been to partner with its scientific research community to make the process of submitting and using data easier and more rewarding. We engage our user community, which comprises individual DOE projects, DOE cyberinfrastructure groups, and data users, through a variety of means. This includes face-to-face meetings during site visits to major data contributors, meetings of an advisory board consisting of the leads of major projects, conducting monthly webinars and online surveys to seek feedback on new features or priorities, and tutorials to train users. We utilize established user-experience research methods to determine user needs and priorities, and have embedded environmental scientists in the ESS-DIVE team to provide domain expertise to guide infrastructure development. We also work with the community to identify, develop and adopt consistent data and metadata standards that are most likely to be suitable for, and used by researchers submitting data to ESS-DIVE.

Here, we present the story of how ESS-DIVE has engaged its community and evolved to incorporate user needs and priorities, along with the lessons learnt through this process. The community-centered approach has so far resulted in dramatically increasing user interest in ESS-DIVE infrastructure and standards development. We believe this approach will maximize the value of ESS-DIVE datasets into the future to ultimately advance the scientific understanding and prediction of hydro-biogeochemical and ecosystem processes that occur from bedrock through soil and vegetation to the atmospheric interface.

Increasing Efficiency in Data Publication using Semi-Automated Workflow (IN51F-0705)

Presenter: Fianna O’Brien
Presentation Type: Poster
Session Date and Time: Friday, 13 December 2019; 08:00 – 12:20 
Session Number and Title: IN51F: Data Integration: Enabling the Acceleration of Science Through Connectivity, Collaboration, and Convergent Science II Posters 
Location: Moscone South, Poster Hall
Abstract
With the growing necessity for open access data, researchers are required to play the roles of both data provider and publisher. The Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data archive provides a publishing workflow to increase the accessibility of data produced by earth and environmental science projects funded by the Department of Energy. ESS-DIVE is presented with the challenge of providing a publication workflow that efficiently disseminates diverse datasets that meet FAIR standards.

ESS-DIVE provides quality assessment and data citation services to enhance dataset visibility for researchers. This multi-step, heretofore highly manual process requires considerable staff resources. ESS-DIVE has streamlined the ingest workflow by enabling communication between existing system components, integrating customer service desk software with the data archive. Publication team members can track and access data submissions, manage the documentation of the ingest process, and communicate with data providers through this centralized location. The use of automated metadata checks and the development of a manual review checklist has also dramatically improved the efficiency of the data publication process. Once a review has been satisfactorily completed, datasets are published on ESS-DIVE with a persistent and unique identifier.

While the speed of the publication process depends on the responsiveness of the data provider and the quality of the initial submission, the integration of a semi-automated workflow has dramatically improved not only the efficiency of our data publication process but also its consistency and reliability, bolstering impactful research efforts to address modern environmental challenges. We aim to continue to reduce the time and energy required of environmental scientists to contribute data to their field, and to offer review and support throughout the publishing process.

Filed Under: news

DataONE welcomes ESS-DIVE

September 22, 2018 by Charuleka Varadharajan

ESS-DIVE was announced as DataONE’s latest member node on September 18, 2018. “With the contributions from ESS-DIVE, DataONE now exposes over 1.16M data objects across its Member Nodes. By joining DataONE, ESS-DIVE reinforces its mission to preserve, expand access to, and improve usability of critical data. Deb Agarwal, a scientist in Berkeley Lab’s CRD and lead of the ESS-DIVE project, explains that becoming a DataONE Member Node will make ESS-DIVE ‘an even more powerful tool, as the library’s DOE-funded data contents will be discoverable in cross-catalogue searches.’”

Read more..

Filed Under: Homepage Carousel, Homepage Features, news
