The ESS-DIVE team is looking forward to participating in the 2023 AGU Fall Meeting in San Francisco, CA. We will be attending in person throughout the meeting and hope to meet you there. Come visit us at one of the help desks or presentations listed below!
Booths and Help Desks
AGU Open Science & Data Help Desk
Location: Moscone Center, Exhibits Hall
Team Member(s): Joan Damerow
Date and Time: Tuesday, December 12th; 10:00 – 12:00 PST
DOE Biological and Environmental Research (BER) Program Booth
Location: Moscone Center, Exhibits Hall
Team Member(s): Charu Varadharajan
Date and Time: Tuesday, December 12th; 16:00 – 17:00 PST
LBL Earth and Environmental Sciences Area (EESA) Booth
Location: Moscone Center, Exhibits Hall
Team Member(s): Charu Varadharajan, Joan Damerow, & other ESS-DIVE team members
Date and Time: Wednesday, December 13th; 16:00 – 17:00 PST
Oral, eLightning, and Poster Presentations
Open Earth and Environmental Data Advance Scientific Discovery eLightning (H12P-0874)
Presenter: Shreyas Cholia
Presentation Type: Session Chair
Session Date and Time: Monday, 11 December 2023; 16:00 – 17:30 PST
Presentation Location: Moscone Center, South, Hall D; eLightning Theater IV, Hall D – South
Session URL: https://agu.confex.com/agu/fm23/meetingapp.cgi/Session/213275
Enabling Integration Across Diverse Environmental Systems Data (IN14B-04)
Presenter: Valerie Hendrix
Presentation Type: eLightning
Session Date and Time: Monday, 11 December 2023; 16:09 – 16:12 PST
Session Number and Title: IN14B: Open Earth and Environmental Data Advance Scientific Discovery eLightning
Presentation Location: Moscone Center, South, Hall D; eLightning Theater IV, Hall D – South
Session URL: https://agu.confex.com/agu/fm23/meetingapp.cgi/Paper/1365123
Abstract
Valerie C Hendrix1, Shreyas Cholia1, Charuleka Varadharajan2, Deb Agarwal3, Joan E Damerow3, Hesham Elbashandy3, Fianna O’Brien3, Emily Robles1, Mario Melara3, Madison Burrus1, Karen Whitenack3, Catherine Wong1, Shalki Shrivastava3, Sarah Poon3, Matthew B. Jones4, Jing Tao5, Matthew Brooke5 and Rushiraj Nenuji5, (1)Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2)Lawrence Berkeley National Laboratory, Earth and Environmental Sciences Area, Berkeley, CA, United States, (3)Lawrence Berkeley National Laboratory, Berkeley, United States, (4)University of California Santa Barbara, National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, United States, (5)National Center for Ecological Analysis and Synthesis, Santa Barbara, United States
The Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) is a data repository for research sponsored by the U.S. Department of Energy’s Environmental System Science program. ESS-DIVE enables collection, storage, management, and sharing of a variety of observational, experimental, and modeling data in Earth and environmental sciences. The volume, complexity, and diversity of these interdisciplinary data present unique integration challenges. We discuss how ESS-DIVE approaches data integration across these datasets and with other data systems.
ESS-DIVE enables a systematic method for linking datasets from other recognized data providers directly in its metadata. This allows metadata to be searchable in ESS-DIVE, while referencing and linking out to externally managed data products in a standardized manner. In order to track and relate sample data across systems, we encourage the use of common standards for sample data identifiers, such as the International Generic Sample Number (IGSN). ESS-DIVE works closely with community partners to ensure adoption and consistency of these standards to promote interoperability and data integrity. Metadata in ESS-DIVE are published in a number of formats, including the JSON-LD format, which allows the data to be easily ingested and understood by external systems (e.g., Google Dataset Search, OSTI, Data.gov). ESS-DIVE offers support for project-specific features, allowing researchers to collaborate and share data within their teams efficiently. Also provided are project data portals where related data are collected, cataloged, and easily accessible. ESS-DIVE supports a secondary storage layer to serve very large, hierarchical datasets. This allows users to directly browse and access large volumes of data over the web, as well as through high-performance data transfer mechanisms like Globus, which can be used to efficiently move data between sites. Finally, ESS-DIVE is integrated with DataONE, a federation of interoperable data repositories facilitating open science and data discovery. This integration includes replicated data and metadata between ESS-DIVE and DataONE, making it easier for users to access the data they need, regardless of where they are stored.
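As a rough illustration of the JSON-LD publishing mentioned in the abstract above, the sketch below builds a minimal schema.org Dataset record in Python. The title, identifier, and URL are hypothetical placeholders, and the choice of fields is an assumption made for illustration; it is not ESS-DIVE's actual metadata schema.

```python
import json


def build_dataset_jsonld(name, description, identifier, url):
    """Return a minimal schema.org Dataset record as a JSON-LD dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "url": url,
    }


# Hypothetical values, for illustration only.
record = build_dataset_jsonld(
    name="Example soil moisture dataset",
    description="Placeholder description of an example dataset.",
    identifier="doi:10.0000/example",
    url="https://example.org/datasets/example",
)

# Serialize to text that an external harvester could parse.
jsonld_text = json.dumps(record, indent=2)
print(jsonld_text)
```

Because the record is plain JSON with a shared vocabulary, an external system can parse it with any JSON library and read the fields without repository-specific code.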
BASIN-3D: Addressing the challenges and opportunity of earth science data synthesis (IN31E-0707)
Presenter: Danielle Christianson
Presentation Type: Poster
Session Date and Time: Wednesday, December 13th; 8:30 AM – 12:50 PM PST
Session Number and Title: IN31E: Empowering Earth Science Data Use and Hydrologic Advancements: Showcasing Innovative Tools and Technologies for Broad User Communities Poster
Presentation Location: Moscone Center, South, Poster Hall A-C (Exhibition Level, South, MC)
Session URL: https://agu.confex.com/agu/fm23/meetingapp.cgi/Paper/1324860
Abstract
Danielle S Christianson, Lawrence Berkeley National Laboratory, Berkeley, United States, Valerie C Hendrix, Lawrence Berkeley National Laboratory, Berkeley, CA, United States and Charuleka Varadharajan, Lawrence Berkeley National Laboratory, Earth and Environmental Sciences Area, Berkeley, CA, United States
By some estimates, researchers spend up to 80% of their time finding, accessing, validating, and/or transforming data for analysis (EC, 2019). BASIN-3D (Broker for Assimilation, Synthesis and Integration of eNvironmental Diverse, Distributed Datasets) is data synthesis software designed to reduce this barrier by synthesizing diverse data from a variety of remote and/or local data sources on demand without the need for additional storage. BASIN-3D is unique in that it fully synthesizes formats, units, and semantics by mapping data source vocabulary and formats to a common data model. Thus, researchers use a single BASIN-3D query language to request data from various sources and receive synthesized results in common formats (e.g., HDF5, pandas DataFrame, JSON). BASIN-3D is available as a suite of Python libraries (https://github.com/BASIN-3D) for customizable integration into Python-based applications like Jupyter Notebooks and Django web frameworks.
In this presentation, we describe new capabilities and data source connections with lessons learned. For example, we overhauled the vocabulary mapping strategy to enable complex- and multi-mapping between data source terminology and that of BASIN-3D. While this feature enables a greater diversity of data to be synthesized, it also meant researchers wanted more details about the translation to validate appropriateness for their use. Additionally, we demonstrated synthesis of individual datasets from the Department of Energy’s ESS-DIVE repository whose data follow a community-developed data reporting format. This demo has shown individual research teams the improved potential to have their data reused by others if their data are formatted in a community-adopted standard. It also enabled us to understand the requirements of building generalized connections to a variety of well-structured data sources.
EC: European Commission, Directorate-General for Research and Innovation, 2019. Cost-benefit analysis for FAIR research data – Cost of not having FAIR research data, Publications Office, https://data.europa.eu/doi/10.2777/02999
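The vocabulary-mapping idea in the abstract above can be illustrated with a short Python sketch. It does not use the BASIN-3D API; the source names, variable names, and the `synthesize` helper are hypothetical and chosen only to show how source-specific terms and units might be mapped to a common data model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class TermMapping:
    """Maps one source variable to a common variable name and unit."""
    common_name: str                    # variable name in the common model
    convert: Callable[[float], float]   # source unit -> common unit


# Hypothetical vocabulary: (data source, source variable) -> common term.
# Two sources report water temperature under different names and units.
VOCAB: Dict[Tuple[str, str], TermMapping] = {
    ("source_a", "water_temp_f"): TermMapping(
        "water_temperature_degC", lambda f: (f - 32.0) * 5.0 / 9.0
    ),
    ("source_b", "temp_water_c"): TermMapping(
        "water_temperature_degC", lambda c: c
    ),
    ("source_a", "discharge_cfs"): TermMapping(
        "discharge_m3_per_s", lambda q: q * 0.0283168
    ),
}


def synthesize(source: str, variable: str, value: float) -> Tuple[str, float]:
    """Translate one source observation into the common name and unit."""
    mapping = VOCAB[(source, variable)]
    return mapping.common_name, mapping.convert(value)


name_a, value_a = synthesize("source_a", "water_temp_f", 212.0)
name_b, value_b = synthesize("source_b", "temp_water_c", 100.0)
print(name_a, value_a)
print(name_b, value_b)
```

Keeping the translation in an explicit table is one way to give researchers the mapping details they asked for, since each conversion can be inspected alongside the term it applies to.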
Need for Interdisciplinary, Collaborative Data Management on Earth Science Data Repositories (IN41C-0600)
Presenter: Madison Burrus
Presentation Type: Poster
Session Date and Time: Thursday, December 14th; 8:30 AM – 12:50 PM PST
Session Number and Title: IN41C: Fostering Wide-Open Science by Improving Collaboration, Innovation, Attribution, Identity, and Ethics in Data Repositories and Data (Re)publication I Poster
Presentation Location: Moscone Center, South, Poster Hall A-C (Exhibition Level, South, MC)
Session URL: https://agu.confex.com/agu/fm23/meetingapp.cgi/Paper/1325100
Abstract
Madison Burrus1, Fianna O’Brien2, Deb Agarwal2, Joan E Damerow2, Hesham Elbashandy2, Valerie C Hendrix1, Matthew B. Jones3, Mario Melara2, Dylan O’Ryan2, Emily Robles1, Sarah Poon2, Shalki Shrivastava2, Karen Whitenack2, Shreyas Cholia1 and Charuleka Varadharajan1, (1)Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2)Lawrence Berkeley National Laboratory, Berkeley, United States, (3)University of California Santa Barbara, National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, United States
The US Department of Energy (DOE) sponsors numerous Earth and environmental science research activities through its Environmental System Science (ESS) program. While the research data produced by this program are vast and diverse, there is a need to centralize resources to curate, archive and access interdisciplinary ESS-funded research data. The Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository was established to be the central repository for ESS data. The goal was to create a space where existing and future research projects funded by ESS could easily publish data and report on progress.
An archival challenge facing ESS-DIVE and the broader environmental science community is that journal requirements and repository infrastructure tend to emphasize data archival at the individual contributor level, yet data products resulting from interdisciplinary scientific research are inherently generated from team contributions. At ESS-DIVE, we are working to accommodate the needs of ESS community projects. Based on community input, we expanded the ESS-DIVE repository to support project collaboration by introducing capabilities that enable team-based data publication, administration, and management.
We introduced several new project-centric data management features, including unique identifiers for projects, the ability to create teams, an administrative project data manager role, and a permission model for sharing dataset access with teams. Additionally, these project-specific features were designed to complement ESS-DIVE’s existing data management services. Projects can use ESS-DIVE’s programmatic web service API to upload and manage datasets in bulk, customize data portal websites to showcase data collections, and share dataset access with individuals as needed. In combination, the tools implemented at ESS-DIVE provide ESS projects a generalized, team-based data archiving experience throughout the publication process.
Here we showcase collaborative data publication tools developed by ESS-DIVE and designed for current and future DOE research projects. We discuss challenges, the need for collaboration to improve the design and development of these tools, and future work.
Linking and Citing Related Resources for Environmental System Science Research (IN41C-0606)
Presenter: Joan Damerow
Presentation Type: Poster
Session Date and Time: Thursday, December 14th; 8:30 AM – 12:50 PM PST
Session Number and Title: IN41C: Fostering Wide-Open Science by Improving Collaboration, Innovation, Attribution, Identity, and Ethics in Data Repositories and Data (Re)publication I Poster
Presentation Location: Moscone Center, South, Poster Hall A-C (Exhibition Level, South, MC)
Session URL: https://agu.confex.com/agu/fm23/meetingapp.cgi/Paper/1294212
Abstract
Joan E Damerow1, Deb Agarwal1, Madison Burrus2, Shreyas Cholia2, Hesham Elbashandy1, Valerie C Hendrix2, Fianna O’Brien1, Emily Robles1, Mario Melara1, Dylan O’Ryan1 and Charuleka Varadharajan3, (1)Lawrence Berkeley National Laboratory, Berkeley, United States, (2)Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (3)Lawrence Berkeley National Laboratory, Earth and Environmental Sciences Area, Berkeley, CA, United States
The study of natural ecosystems requires multidisciplinary science teams to understand and model multi-scale processes. For example, studies of organic matter cycling through plants and soil involve data and analysis to represent soil biogeochemistry, microbial communities, plant structures, and ecophysiological traits of specific organisms involved. All related data should be clearly linked to represent processes at a given site or across broader scales. When published, however, they are often missing information needed to find, access, integrate, and reuse the data. The U.S. Department of Energy’s Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository archives datasets with related resources that may be distributed across numerous online sources. This presentation will outline ESS-DIVE’s current approach to linking and citations, and plans for tracking additional related identifiers.
Our goal is to enable users to easily identify scientifically useful information related to a dataset, and enable more accurate citation metrics. Here we compare existing approaches to linking related resources, including those from DataCite, Schema.org, and Ecological Metadata Language (EML). We then test use cases for applying related identifiers and connection metadata (e.g., resource type, relationship type) in ESS-DIVE datasets. We also explore how references to datasets and papers are or should be incorporated into citation and usage metrics. ESS-DIVE’s proposed approach to link related resources will make interdisciplinary Environmental System Science data more FAIR. A standard approach for linking identifiers can further facilitate automated transfer of metadata and data across online repositories and data systems. We find that automated tools and coordination are still needed across data systems and publishers to make effective linking practical, and to enable more accurate citation metrics.
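To make the idea of related identifiers with connection metadata more concrete, here is a minimal Python sketch. The field names (`relatedIdentifier`, `relatedIdentifierType`, `relationType`, `resourceTypeGeneral`) follow the DataCite Metadata Schema, but the identifiers are hypothetical placeholders and the helper functions are assumptions made for illustration, not ESS-DIVE's implementation.

```python
from collections import defaultdict

# Hypothetical related resources for one dataset.
related_identifiers = [
    {
        "relatedIdentifier": "10.0000/example-paper",
        "relatedIdentifierType": "DOI",
        "relationType": "IsCitedBy",
        "resourceTypeGeneral": "JournalArticle",
    },
    {
        "relatedIdentifier": "10.0000/example-source-data",
        "relatedIdentifierType": "DOI",
        "relationType": "IsDerivedFrom",
        "resourceTypeGeneral": "Dataset",
    },
    {
        "relatedIdentifier": "https://example.org/code",
        "relatedIdentifierType": "URL",
        "relationType": "IsSupplementedBy",
        "resourceTypeGeneral": "Software",
    },
]


def group_by_relation(links):
    """Group related identifiers by their relationship type."""
    grouped = defaultdict(list)
    for link in links:
        grouped[link["relationType"]].append(link["relatedIdentifier"])
    return dict(grouped)


def count_citations(links):
    """Count links that record a citation of this dataset."""
    return sum(1 for link in links if link["relationType"] == "IsCitedBy")


print(group_by_relation(related_identifiers))
print(count_citations(related_identifiers))
```

Recording the relationship type alongside each identifier is what lets a citation count be separated from other kinds of links, such as source data or supporting software.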
Using FAIR-Guided Metadata Requirements for Diverse Data in the ESS-DIVE Data Repository (IN44A-07)
Presenter: Emily Robles
Presentation Type: Oral
Session Date and Time: Thursday, December 14th; 17:00 – 17:10 PST
Session Number and Title: IN44A: Applying Community-Developed Principles and Guidance to Improve Open-Science Capabilities of Scientific Data Repositories and Service Providers II Oral
Presentation Location: Moscone Center, 2014 – West
Session URL: https://agu.confex.com/agu/fm23/meetingapp.cgi/Paper/1451422
Abstract
Emily Robles1, Deb Agarwal1, Madison Burrus2, Joan E Damerow1, Hesham Elbashandy1, Valerie C Hendrix2, Mario Melara1, Fianna O’Brien1, Dylan O’Ryan1, Shalki Shrivastava1, Karen Whitenack1, Shreyas Cholia2 and Charuleka Varadharajan2, (1)Lawrence Berkeley National Laboratory, Berkeley, United States, (2)Lawrence Berkeley National Laboratory, Berkeley, CA, United States
High volumes of environmental science data are needed to support ecological research, including large-scale analyses and model development. As open publication practices become more common and increase data availability, data and metadata quality remain limiting factors for discovery and reuse. These issues can begin to be addressed through metadata requirements implemented by data repositories or archives; however, the review of metadata remains a complex issue for large data repositories holding diverse data types, both because of the need to prioritize scalability and the need for community adoption. Using guidance from the FAIR principles, which state that data should be findable, accessible, interoperable, and reusable, the U.S. Department of Energy’s Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository has implemented community-agreed-upon metadata checks at both the data package and file levels to improve the usability and accessibility of published data.
Package-level metadata, such as complete and descriptive titles and methods information, provides important contextual information needed to reuse data and is reviewed by ESS-DIVE using a system of both automated and manual checks. To address the need for additional formatting guidelines in a field where data are often diverse and multidisciplinary, ESS-DIVE partnered with researchers from its science community to develop 13 reporting formats, covering a broad range of Earth science data and metadata types. Datasets that have adopted reporting format guidelines for file-level metadata now undergo an automated assessment for metadata quality requirements upon submission to ESS-DIVE. Extending metadata review to the file level will not only improve upon the information provided by package-level metadata, but also enable future expansion of search and discovery capabilities. Here we discuss the process by which ESS-DIVE used guidance from the FAIR principles, in addition to prioritizing community feedback and collaboration, during the development of dataset requirements, as well as the mechanisms of the metadata review process and future directions for utilizing file-level review.
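The two levels of review described in the abstract above can be sketched in Python as follows. This is an illustration only: the rules, the word-count threshold, and the column names are hypothetical assumptions, not ESS-DIVE's actual checks or reporting-format requirements.

```python
def check_package_metadata(metadata, min_title_words=7):
    """Return a list of problems found in package-level metadata."""
    problems = []
    title = metadata.get("title", "").strip()
    # Assumed rule: a descriptive title has at least min_title_words words.
    if len(title.split()) < min_title_words:
        problems.append(f"title has fewer than {min_title_words} words")
    # Assumed rule: a methods description must be present.
    if not metadata.get("methods", "").strip():
        problems.append("methods description is missing")
    return problems


def check_file_header(header, required_columns):
    """Return a list of required columns missing from a data file header."""
    return [
        f"missing required column: {column}"
        for column in required_columns
        if column not in header
    ]


# Hypothetical package and file, for illustration only.
package = {"title": "Soil data", "methods": ""}
header = ["Sample_ID", "Depth_cm"]
required = ["Sample_ID", "Depth_cm", "Collection_Date"]

package_problems = check_package_metadata(package)
file_problems = check_file_header(header, required)
print(package_problems)
print(file_problems)
```

Returning a list of problems rather than a single pass/fail value lets automated results be handed to a human reviewer, which fits the mix of automated and manual checks the abstract describes.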