The ESS-DIVE team is looking forward to participating in the 2024 AGU Annual Meeting in Washington, D.C. Several members of the ESS-DIVE team will attend in person throughout the meeting and hope to meet some of you there. Come visit us at one of the booths or presentations listed below!
Booths
LBL Earth and Environmental Sciences Area (EESA) Booth
Location: Walter E. Washington Convention Center, Exhibits Hall
Team Member(s): Joan Damerow, Dylan O’Ryan, & Emily Nagamoto
Date and Time: Wednesday, December 11th; 14:00 – 15:00 ET
Oral Presentations
ESS-DIVE: Building Scalable, Resilient, Interoperable Infrastructure for Data Repositories (IN11A-06)
Presenter: Charuleka Varadharajan
Presentation Type: Oral
Session Date and Time: Monday, December 9th; 09:20 – 09:30 ET
Session Number and Title: IN11A: Collaboratively Advancing Trust in Data Repositories and Integrated Infrastructure to Enable Interdisciplinary Research, Applications, and Uses of Open Data I Oral
Presentation Location: Marquis 3-4 (Marriott Marquis)
Session URL: https://agu.confex.com/agu/agu24/meetingapp.cgi/Paper/1651870
Abstract
Shreyas Cholia1, Charuleka Varadharajan1, Valerie C. Hendrix1, Joan Damerow1, Emily Robles1, Deborah Agarwal1, Madison Burrus1, Danielle Christianson1, Hesham Elbashandy1, Mario Melara1, Fianna O’Brien1, Dylan O’Ryan1, Sarah Poon1, Shalki Shrivastava1, Karen Whitenack1, Catherine Wong1, Matthew B Jones2, Jing Tao2, Matthew Brooke2, Rushiraj Nenuji2, Jeanette Clark2, (1) Lawrence Berkeley National Laboratory, (2) National Center for Ecological Analysis and Synthesis
The Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) is a repository storing observation, simulation, and derived data products generated by research funded by the Department of Energy’s Environmental System Science program. Given the breadth and volumes of DOE research data, ESS-DIVE addresses the increasing demand for robust data management systems that can handle complex environmental data spanning diverse, heterogeneous formats and large volumes.
The infrastructure is built to support a wide range of data standards and reporting formats, aligned with the FAIR data principles. ESS-DIVE employs standardized metadata formats and APIs that promote compatibility and ease of use, facilitating data integration across different platforms and research communities. Leveraging these data formats, we developed a novel Fusion Database – a service that extracts, indexes, and serves data from within standardized files, to enable advanced search beyond metadata.
We are pioneering the development of a scalable, resilient, and interoperable infrastructure to support growth in data volumes with a tiered storage model. The primary storage layer manages smaller datasets and all metadata, ensuring verification, data lineage, and redundancy. The Tier 2 storage layer enables long-term archiving of large (multi-terabyte) data for scaling and cost-effectiveness. It provides high-performance access to data directly from the underlying storage volumes through Globus and web interfaces, while the associated metadata are maintained and indexed in the primary tier.
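The tiered model described above can be illustrated with a minimal sketch. It is not ESS-DIVE's implementation: the function name, the tier labels, and especially the 1 TiB cutoff are assumptions made for illustration, since the abstract only says that Tier 2 holds multi-terabyte data and that all metadata stay in the primary tier.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the abstract only says Tier 2 holds "multi-terabyte" data.
TIER2_THRESHOLD_BYTES = 1 * 1024**4  # 1 TiB, an assumed value


@dataclass
class Placement:
    data_tier: str      # where the data files are stored
    metadata_tier: str  # where the metadata record is maintained and indexed


def place_dataset(size_bytes: int) -> Placement:
    """Pick a storage tier for a dataset's files; metadata always stay primary."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    data_tier = "tier2" if size_bytes >= TIER2_THRESHOLD_BYTES else "primary"
    return Placement(data_tier=data_tier, metadata_tier="primary")
```

Under these assumptions, a 5 GiB dataset would be placed in the primary tier and a 3 TiB dataset in Tier 2, with the metadata for both indexed in the primary tier.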
Because ESS-DIVE is a long-term archive, resilience is another cornerstone of its infrastructure. The system is architected to provide high availability through the use of reproducible, containerized Docker/Kubernetes microservices that can be deployed at multiple sites. We have automated backup and failover processes to facilitate resilient deployments. ESS-DIVE is also a member of the DataONE federation, which ensures that our primary metadata and data are automatically replicated to other DataONE sites.
We believe that our approach can serve as a useful model for data repository management, enabling a path towards scalable infrastructure for long-term data preservation.
A scalable and community-centered repository workflow for dataset review in support of FAIR data (IN12A-03)
Presenter: Joan Damerow
Presentation Type: Oral
Session Date and Time: Monday, December 9th; 10:40 – 10:50 ET
Session Number and Title: IN12A: Collaboratively Advancing Trust in Data Repositories and Integrated Infrastructure to Enable Interdisciplinary Research, Applications, and Uses of Open Data II Oral
Presentation Location: Marquis 3-4 (Marriott Marquis)
Session URL: https://agu.confex.com/agu/agu24/meetingapp.cgi/Paper/1711551
Abstract
Emily Robles1, Dylan O’Ryan1, Deborah A. Agarwal1, Kathleen Beilsmith2, Ben Bond-Lamberty3, Kristin Boye4, Madison Burrus1, Danielle S. Christianson1, Michael Crow5, Robert Crystal-Ornelas1, Joan Damerow1, Hesham Elbashandy1, Kim S. Ely1, Brieanne Forbes3, Amy E. Goldman3, Susan Heinz5, Valerie C Hendrix1, Zarine Kakalia1, Kayla Mathes8, Mario Melara1, Fianna O’Brien1, Drew Paine1, Stephanie C. Pennington3, Sarah Poon1, Lavanya Ramakrishnan1, Alistair Rogers1, Shalki Shrivastava1, Maegen Simmonds1, Terri Velliquette5, Pamela Weisenhorn2, Jessica Nicole Welch5, Karen Whitenack1, Catherine Wong1, Shreyas Cholia1, Charuleka Varadharajan1, (1) Lawrence Berkeley National Laboratory, (2) Argonne National Laboratory, (3) Pacific Northwest National Laboratory, (4) SLAC National Accelerator Laboratory, (5) Oak Ridge National Laboratory, (7) Pacific Northwest National Laboratory, (8) Integrated Life Sciences, Virginia Commonwealth University
Open data are needed to support scientific research broadly; however, the quality of data and associated metadata in repositories remains a limiting factor for discoverability and usability. Existing standards and guidelines to format data and metadata can be difficult and time-intensive to follow, are highly diverse, and often lack clear incentives for data contributors.
With the goal of lowering the barriers to generating Findable, Accessible, Interoperable, Reusable (FAIR) data and metadata, the U.S. Department of Energy’s Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository partnered with researchers to create 12 community standards and templates for a diverse range of Earth science data and metadata types. We developed both high-level (e.g., file-level metadata, CSV guidelines, sample metadata) and data-type-specific (e.g., hydrologic monitoring, soil respiration, leaf physiology) reporting formats. In particular, the file-level metadata reporting format is used to describe details of individual files within a dataset, while the CSV reporting format outlines best practices for formatting tabular data so that they are machine-readable and support automated validation and parsing.
We have developed automated quality-check tools that validate the completeness of files according to the file-level metadata and CSV reporting format requirements, and parse tabular data files with variables that have been formatted correctly. Automation enables the data publication review process to scale with repository growth. In addition, the adoption of reporting formats has enabled advanced search capabilities and provides the foundation for future data integration tools. Notably, we developed a Deep Dive API, which provides advanced search within data files across all ESS-DIVE public datasets that have been parsed and validated by our automated checks. Users can find relevant data points and variables within individual files of a dataset, facilitating more efficient data discovery for scientific workflows than dataset-level search alone. Future development of automated tools to help data contributors standardize their data and metadata will continue to lower the barriers to generating well-curated, rapidly reusable, FAIR data.
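To give a sense of what automated checks of this kind might look like, here is a minimal sketch. It does not reproduce ESS-DIVE's tools or the actual reporting format requirements: the function names and the specific rules (non-empty and unique column names, a consistent field count per row, and every file being described in the file-level metadata) are assumptions chosen for illustration.

```python
import csv
import io


def check_csv(text: str) -> list[str]:
    """Return the problems found in CSV text; an empty list means it passed."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header = [name.strip() for name in rows[0]]
    problems = []
    # Assumed rule: every column needs a name, and names must be unique.
    if any(name == "" for name in header):
        problems.append("header contains an empty column name")
    if len(set(header)) != len(header):
        problems.append("header contains duplicate column names")
    # Assumed rule: every data row has as many fields as the header.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no} has {len(row)} fields, expected {len(header)}"
            )
    return problems


def check_file_level_metadata(described: set[str], present: set[str]) -> list[str]:
    """Compare files described in the metadata with files present in the dataset."""
    problems = [f"{name} is not described in the file-level metadata"
                for name in sorted(present - described)]
    problems += [f"{name} is described but missing from the dataset"
                 for name in sorted(described - present)]
    return problems
```

Returning a list of messages rather than raising on the first failure lets a reviewer or data contributor see every issue in one pass, which is one way such checks could help a review process scale.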
Town Halls
Complex Citations: Ensuring Transparency, Reproducibility, and Credit for All Supporting Research Contributions (TH15N)
Moderators: Shelley Stall, Lesley A Wyborn, Martina Stockhause, Joan Damerow, Justin James Henry Buck, James Ayliffe
Date and Time: Monday, December 9th; 18:00 – 19:00 EST
Location: 209 A-C (Convention Center)
Townhall URL: https://agu.confex.com/agu/agu24/meetingapp.cgi/Session/229536
An ongoing challenge relevant to most research disciplines is the difficulty in citing 100+ digital objects such as datasets, software, samples, and images. Journals require authors to place citations over some set limit into supplemental information, where individual citations are not properly indexed, linked to the manuscript, or accurately tracked. Citing these research products is critical to enable transparent and reproducible research and for researchers, institutions, and project managers to trace citations, get appropriate credit, and report impact to funders.
We propose the term ‘reliquary’ to describe all the datasets, software, or other digital objects that support research findings for a specific paper. A reliquary may contain hundreds to millions of objects, often created by different groups and preserved in different repositories. We need to develop a scalable implementation strategy to enable researchers to use this type of citation and allow integration into common citation/impact metrics.
This Town Hall is relevant to all Sections of AGU and beyond, including researchers, repository managers, infrastructure builders, journal staff, and indexers. The work is related to the new “Complex Citations” working group of the Research Data Alliance.
A Unified Earth Science Data Infrastructure to Achieve a New Level of Science: Opportunities and Challenges (TH45H)
Moderators: Kerstin Kleese van Dam, Ben P Bond-Lamberty, Giri Prakash, Charuleka Varadharajan, Nicki Linn Hickmon
Date and Time: Thursday, December 12th; 18:00 – 19:00 EST
Location: 209 A-C (Convention Center)
Townhall URL: https://agu.confex.com/agu/agu24/meetingapp.cgi/Session/228926
Over the past year, scientists across the DOE National Laboratories convened a series of meetings to discuss solutions to a set of science challenges spanning the intersections of genomics, metagenomics, microbial community dynamics, hydrobiogeochemistry, atmosphere-land interactions, and earth system modeling. The team focused on development of a unified data infrastructure as part of a 10-year vision that integrates science with all-source data derived from, e.g., observations, experiments, simulations, and AI-informed information across existing and future data archives. This town hall will report on progress towards the decadal vision, provide updates on the science opportunities and early success stories, and identify barriers that need to be overcome. Panelists will emphasize the value of linking distributed resources, reproducibility, and interoperability in scientific research. Critical to the discussion is the need to break down the barriers to data access and usability for the broader interdisciplinary community. The session will explore both the opportunities that arise from these unified approaches and the challenges that persist, including data harmonization, privacy, and the scaling of technologies. By bringing together experts and stakeholders, the Town Hall will foster a collaborative environment to refine strategies and accelerate the implementation of effective, community-wide data solutions in the earth sciences.
Convened Sessions
Accelerating the Model-Experiment Cycle Using Artificial Intelligence and Advanced Technologies I Oral (IN21A)
Conveners: Charuleka Varadharajan, Pamela Weisenhorn, Paul E Bayer, Justin Jay Hnilo, Yuan-Heng Wang
Date and Time: Tuesday, December 10th; 08:30 – 10:00 EST
Location: Marquis 12-13 (Marriott Marquis)
Session URL: https://agu.confex.com/agu/agu24/meetingapp.cgi/Session/240540
Accelerating the Model-Experiment Cycle Using Artificial Intelligence and Advanced Technologies II Poster (IN23A)
Conveners: Charuleka Varadharajan, Pamela Weisenhorn, Paul E Bayer, Justin Jay Hnilo, Yuan-Heng Wang
Date and Time: Tuesday, December 10th; 13:40 – 17:30 EST
Location: Hall B-C (Poster Hall) (Convention Center)
Session URL: https://agu.confex.com/agu/agu24/meetingapp.cgi/Session/237283
Of Stations, Sites, Sensors, and Samples: Collaboration, Innovation, and Stewardship of Field-Based Observations I Oral (IN42A)
Conveners: Stephanie Mullins Wingo, Joan Damerow, Natalie Raia, Rorie Edmunds, Lesley A Wyborn
Date and Time: Thursday, December 12th; 10:20 – 11:50 EST
Location: Liberty I-K (Marriott Marquis)
Session URL: https://agu.confex.com/agu/agu24/meetingapp.cgi/Session/240616
Of Stations, Sites, Sensors, and Samples: Collaboration, Innovation, and Stewardship of Field-Based Observations II Poster (IN43B)
Conveners: Stephanie Mullins Wingo, Joan Damerow, Natalie Raia, Rorie Edmunds, Lesley A Wyborn
Date and Time: Thursday, December 12th; 13:40 – 17:30 EST
Location: Hall B-C (Poster Hall) (Convention Center)
Session URL: https://agu.confex.com/agu/agu24/meetingapp.cgi/Session/228521