NEWS

Webinar on Expanding the Discoverability of your ESS-DIVE Data through External Linking

January 20, 2022 by Dylan O'Ryan

ESS-DIVE Webinar

Monday, February 7 | 11:00-12:00 PT / 14:00-15:00 ET

View Webinar Video / Link to Webinar Slides

Enhance the discoverability of your datasets by cross-linking relevant data and associated information across repositories. Learn about ESS-DIVE’s new approach and capability to link your ESS-DIVE data to other online repositories and data systems.

This will be a short webinar, with ample time for questions and feedback. We will cover the following use cases for linking your data and metadata:

Link to individual data files or copies of your data stored elsewhere
Link to the original publication of a dataset (e.g. your project’s data archive) where metadata and data can be found
Provide feedback on additional needs for linking to external data, methods, samples, and publications.

Please encourage anyone from your project who may be interested to attend.

Webinar presented by Joan Damerow, Lead Scientist

Joan is an environmental scientist with a background in geoscience sampling, freshwater ecology, and biodiversity informatics. She runs activities for ESS-DIVE, including ESS-DIVE webinars and our annual data workshop, and is active in relevant conferences and data working groups (e.g. ESIP, RDA, AGU). Joan is interested in interdisciplinary data management and tracking, and works with DOE ESS data contributors to identify, develop, and implement practical data standards in ESS-DIVE that support FAIR principles.

Standardizing Water Quality Data with New ESS-DIVE Reporting Formats

January 2, 2022 by Dylan O'Ryan

Dylan O’Ryan is a Student Assistant with the Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository. He writes here about his experience with ESS-DIVE’s development and implementation of data reporting formats [data standards] in collaboration with six teams of US DOE Environmental System Science (ESS) researchers. Dylan first began working with ESS-DIVE as part of a Community College Internship (CCI), where he standardized existing water quality data using ESS-DIVE’s data reporting formats.

Some of ESS-DIVE’s data reporting formats, such as that for soil and water quality, are specific to research domains. Other reporting formats are generalized to a wide range of data such as Comma Separated Value (CSV) files and sample collections. These standard reporting formats are designed to make data more Findable, Accessible, Interoperable, and Reusable (FAIR) from the perspective of our ESS scientists. Data reporting formats standardize data to enable creation of better tools that allow advanced search, integration, and visualization of data within and across multiple datasets.

Kristin Boye, a Staff Scientist at SLAC National Accelerator Laboratory, developed a water/soil/sediment chemistry reporting format as part of her collaboration with ESS-DIVE. She developed this reporting format by synthesizing recommendations from other generalized reporting formats, such as CSV and FLMD (File-level Metadata), and incorporating feedback on how to format water/soil/sediment chemistry data.

Storm Drain Detectives is a water quality monitoring program in Lodi, California. For the past seven years, I’ve helped the program measure water quality parameters such as dissolved oxygen (DO), pH, temperature, and bacteria. This experience with water quality testing enabled me to better understand the datasets that I was converting for ESS-DIVE. I used the water/soil/sediment chemistry data reporting format to convert existing datasets within the Lawrence Berkeley National Laboratory (LBNL) Watershed Function Scientific Focus Area (WFSFA) project. I converted water quality datasets whose metadata was already published on ESS-DIVE, including ICP-MS, DIC/NPOC/TDN, Ammonia-N, Anion, and Isotope data.

Here is the step-by-step process that details how I converted existing datasets to the water/soil/sediment chemistry reporting format [See Image 1 for workflow diagram]:

Retrieve the water quality data file and locate the associated metadata published on ESS-DIVE.
Populate the methods file template. The methods file is where you store information on the samples’ methods of collection, analysis, storage, etc. I entered the methods information that the data provider had supplied in the associated dataset metadata. See Image 2 for an example of methods information converted to the reporting format methods file.
Populate the data file template. This data file is similar to most data files where you input sample information and measurements; however, this reporting format data file is designed to include information needed for future interpretation and reuse, such as unique sample names and methods information (collection/analysis procedures, detection limits, analysis precision), as well as the data. The data file template also allows for standardized variable names and units across the files. Standardized names and units can be included in the term list. See Image 3 for an example of a data provider’s data file converted to the reporting format data file template.
- Note: I first filled in the methods information and header rows before populating the sample data.
As part of this reporting format, you can choose to fill out an optional terminology file. The terminology file can include all terms that would benefit from additional description and definition (e.g., data flags or other codes used throughout the data and method files). We note that the terminology file is different from the required data dictionary file that is part of ESS-DIVE’s file-level metadata reporting format. In the data dictionary, you provide definitions of column or row names, and their units. The terminology file is specifically designed for terms that are not captured in the column or row names. See Image 4 for an example of a terminology file.
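To make the data file step more concrete, here is a minimal Python sketch of the kind of conversion described above: renaming a provider's columns to standardized variable names, writing missing values in one consistent way, and checking that sample names are unique. The column names, the mapping, and the missing-value code are hypothetical placeholders chosen for illustration; they are not taken from the actual reporting format templates, which should be consulted for the required fields.

```python
import csv
import io

# Hypothetical mapping from a provider's column headers to standardized
# variable names. A real mapping would come from the reporting format's term list.
COLUMN_MAP = {
    "Sample": "Sample_ID",
    "Ca (ppb)": "Calcium_ug_per_L",
    "Mg (ppb)": "Magnesium_ug_per_L",
}

# Placeholder missing-value code, assumed here for illustration only.
MISSING_VALUE = "-9999"


def convert_rows(provider_csv_text):
    """Rename columns, standardize missing values, and check sample names are unique."""
    reader = csv.DictReader(io.StringIO(provider_csv_text))
    converted = []
    seen_samples = set()
    for row in reader:
        new_row = {}
        for old_name, new_name in COLUMN_MAP.items():
            value = (row.get(old_name) or "").strip()
            # Treat empty cells and "NA" as missing and write one standard code.
            new_row[new_name] = MISSING_VALUE if value in ("", "NA") else value
        sample_id = new_row["Sample_ID"]
        if sample_id in seen_samples:
            raise ValueError(f"Duplicate sample name: {sample_id}")
        seen_samples.add(sample_id)
        converted.append(new_row)
    return converted


def to_csv_text(rows):
    """Write converted rows back out as CSV text with the standardized header."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(COLUMN_MAP.values()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up example input with one empty cell and one "NA" cell.
example = "Sample,Ca (ppb),Mg (ppb)\nSITE1-001,1520,\nSITE1-002,NA,310\n"
rows = convert_rows(example)
print(to_csv_text(rows))
```

Because the mapping lives in one dictionary, the same script could be reused across data files that share a methods file, with only the input file changing.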

The water/soil/sediment chemistry reporting format was straightforward, and I quickly caught on to the template and its requirements. Transferring datasets is easy once you understand the general structure. As I converted more data files, I became faster, to the point where I could create a methods file and a data file with over 200 samples within 30 minutes.

Here are a few more tips and tricks related to converting a multitude of datasets:

You generally only need to create one methods file for a particular measurement (e.g., ICP-MS); after that, you only need to adjust the data file to include the samples you tested.
Similarly, the data file headers and associated terms can be repeated if there are no collection or analysis procedure changes.

I found that utilizing ESS-DIVE’s reporting formats was straightforward and made the data easier to find, understand, and use in new ways. The converted datasets include unique sample names, contextual information describing the data (metadata), standardized formatting of missing values, and many more qualities that increase the usability of the data. The examples of converted water quality datasets are now being utilized by some WFSFA data providers in order to standardize their data and metadata.

Other reporting formats that may help you standardize your data and metadata include CSV, File Level Metadata (FLMD), Sample Identifiers and Metadata, and Model Data Archiving Guidelines, which are high-level reporting formats that apply across multiple domains. Domain-specific formats include the Leaf Gas Exchange reporting format, intended for leaf-level gas exchange data; the Soil Respiration reporting format, intended for soil respiration data and metadata; and the Hydrological Monitoring reporting format, designed for water parameters measured by in situ meters/probes. A couple of reporting formats are in development: 16S Amplicon Sequencing and Locations Metadata. See Image 5 for a workflow for use of ESS-DIVE’s reporting formats.

The ESS-DIVE team is available to answer questions and help those who want to use the reporting formats. Please email ess-dive-support@lbl.gov or use the “Contact Us” feature on the ESS-DIVE website.

Image 1: Workflow for Conversion of Reporting Format

Image 2: Example conversion of methods information to methods file

Image 3: Example conversion of data file to reporting format data file template

Image 5: Workflow for use of reporting formats

ESS-DIVE at AGU 2021

December 7, 2021 by lncore

The ESS-DIVE team is excited to present their work and connect with Earth and Environmental Systems Science (ESS) researchers at the upcoming American Geophysical Union (AGU) Fall Meeting 2021. The event will take place in New Orleans, LA and online everywhere on 13-17 December 2021. Several ESS-DIVE team members are presenting on relevant topics, ranging from best practices for data curation and publication to approaches to support metadata synthesis. ESS-DIVE will be involved with a total of 6 oral and eLightning presentations.

#AGU21 is the leading forum for advancing Earth and space science and leveraging this research toward solutions for societal challenges. Earth and space science researchers are gathering both in person and virtually for this annual meeting to learn and collaborate around the theme of “Science is Society.” With more than 25,000 individuals from 100+ countries expected to attend, representing the global Earth and space sciences, the event will consist of inspiring plenary talks, cutting-edge science presentations and more. Most sessions will be recorded and available to this global community of researchers, scientists, educators, students, policymakers, partners, science enthusiasts, journalists, and communicators. With in-person and worldwide online participation, attendees will have numerous opportunities to network with government regulators, scientific visionaries, and industry thought-leaders.

Madison Burrus will present Efforts to Encourage and Incentivize Data Archiving in the Environmental System Sciences during the poster session on Tuesday 14 December from 14:00 – 16:00 PT.
Joan Damerow will present How do we make interdisciplinary sample data more FAIR (Findability, Accessibility, Interoperability, and Reusability)? on Wednesday, 15 December from 12:51-12:57 PT.
Dylan O’Ryan is presenting Applying Data Reporting Formats to Open-Source Water Quality Data during the poster session on Thursday, 16 December 2021; 14:00 – 16:00 PT.
Emily Robles is presenting Bringing more Tropical data to the table through the NGEE-Tropics data archive during the poster session on Thursday, 16 December 2021; 14:00 – 16:00 PT.
Joan will also chair three sessions on Connecting Disciplines and Data in Earth and Environmental Synthesis Research: Enabling International and Interdisciplinary Data Discovery, Integration, and Reuse on Thursday, 16 December 2021; 14:00 – 16:00 (poster session), and Friday, 17 December from 7:45 – 9:00 (eLightning) and 10:45 – 12:00 PT (oral session).
Deb Agarwal will present Enabling Citations of Large Numbers of Dataset on Friday December 17 from 07:57 – 08:00 PT.
Robert Crystal-Ornelas will present Fundamentals for Collaborating on Research Projects Using GitHub on Friday, 17 December 2021 from 07:50 – 08:20 PT. He will also present Data Standards for More Reusable Data in Earth and Environmental Science during the poster session on Friday, 17 December 2021; 14:00 – 16:00 PT.
Emily Robles will present FAIR Dataset Metadata: An Analysis of Requirements across Environmental Science Data Repositories on Friday, 17 December 2021 from 12:48 –12:53 PT.
Shreyas Cholia will present Fostering Growth in the ESS-DIVE Repository on Friday December 17 from 14:03 – 14:06 PT.
Project Partners Pamela Weisenhorn and Kathleen Beilsmith will present Applying Data Standards and Reproducible Workflows To Advance Earth System Science during the poster session on Friday, 17 December 2021; 08:03 – 08:06 PT.

ESS-DIVE is enthusiastic about the opportunity to engage in this collaborative and interdisciplinary event. The interactive nature of this event will serve as a platform to share research findings, discuss use cases, and more. The team looks forward to not only sharing their knowledge, but also gaining new insights and experiences.

ESS-DIVE is funded by the Data Management program within the Earth and Environmental Systems Science Division under the DOE’s Office of Science Biological and Environmental Research program and is maintained by the Lawrence Berkeley National Laboratory.

Annual Data Workshop Offers Guidance in Environmental Data Management and Sharing

May 14, 2021 by Charuleka Varadharajan

DOE ESS researchers to learn how critical DOE Environmental System Science (ESS) data is managed, stored, and discovered.

We will host our first hands-on workshop dedicated to working with Environmental System Science (ESS) researchers. The workshop, hosted online by the ESS-DIVE team at Lawrence Berkeley National Laboratory (LBNL), takes place Monday and Tuesday, May 24-25 from 9am-2pm PST (12pm-5pm EST) and is free for all registrants.

The workshop is designed both to introduce newcomers to ESS-DIVE and to help those familiar with ESS-DIVE to sharpen their data practices. It includes discussions of ESS-DIVE’s present and future; instruction on querying, submitting, and describing ESS data; and hands-on tutorials for those both new and experienced with the repository. It is a valuable opportunity for personnel associated with projects funded by the DOE’s ESS program to learn how to archive data in ESS-DIVE and for ESS-DIVE to work with the DOE ESS researchers to make the process as easy as possible.

Environmental data, and the models and software that depend on it, have helped us gain an exponentially better understanding of the natural world in recent years. We are in the midst of a cultural and paradigm shift where open-access data increasingly provides the foundation on which scientific progress is built. In light of global challenges, it is more important than ever to have open and reliable data to make scientific breakthroughs and sound decisions. However, historically data has not been well documented, managed, stored, and reused. As such, ESS-DIVE is working with DOE ESS-funded researchers to improve the long-term efficiency of data management and to maximize the value of ESS data.

At the same time, the complexity, heterogeneity, and sheer volume of data in the ESS-DIVE repository will also grow, from terabytes of data just a few years ago to petabytes in the future. That volume places extra demands on the investigators who contribute data generated by DOE-funded projects. It also makes it more difficult for students, teachers, decision makers, and interested members of the public to discover and use critical publicly funded data.

Many ESS projects are uniquely collaborative and interdisciplinary. They involve specialists and data across a range of environmental disciplines, including hydrology, ecology, geology, geophysics, geochemistry, and microbiology. A key challenge of ESS-DIVE is serving multidisciplinary data that is often connected by a given research question or location. Many types of data are required to address critical questions such as how ecosystems process carbon or how contaminants cycle through soil and water. ESS-DIVE will discuss use cases during the workshop to ensure that our reporting formats and tools support the researchers’ science goals.

To LBNL Senior Scientist and Data Science and Technology Department Head Deb Agarwal, who is the lead Principal Investigator of ESS-DIVE, the workshop represents a chance to broaden participation in an important collective endeavor.

“Data is a valuable research output in and of itself. ESS-DIVE was established to acknowledge the importance of sound environmental research data archiving and to make that data widely and readily available to anyone interested in using it.

The Annual Data Workshop is an important opportunity for the ESS-DIVE team to introduce the repository to DOE ESS researchers and to discuss upcoming features in development. There are many ESS project teams that are new to the repository and this workshop will help them get a head start on the process of effectively managing and archiving data.

We are really excited to bring together new and already engaged users to discuss how we can work together to archive their ESS data. We look forward to working together to make sure that ESS-DIVE is serving the needs of the ESS researchers.”

Registration for the ESS-DIVE Annual Data Workshop is closed. Please check in next year! To learn more about the workshop, or about ESS-DIVE in general, please visit the workshop event page or contact ess-dive-support@lbl.gov.

Insights from the 2021 ESIP Winter Meeting

February 2, 2021 by rcrystalornelas

Earth Science Information Partners (ESIP) researchers gathered virtually for the winter 2021 meeting to learn and collaborate around the theme of “Leading Innovation in Earth Science Data Frontiers.” Inspiring plenary talks addressed building a culture of innovation and the exhilarating, but also lonely and frightening, nature of exploring new frontiers in science. There was much discussion around machine learning and AI, and work needed towards assessing and achieving AI-readiness of data across agencies and industries.

Joan Damerow and Rob Crystal-Ornelas from ESS-DIVE attended the meeting, and recap some of the highlights and resources that may be useful for DOE ESS researchers.

Joan helped organize a kickoff meeting for the new ESIP Physical Samples Curation Cluster, which will focus initial efforts on identifying high-level recommendations to journal publishers on providing FAIR sample data. We will be meeting on a regular basis to outline core/basic recommendations for sample identifiers and metadata, relevant across disciplines.
A session on Linking Knowledge in the Earth and Space Sciences discussed how knowledge systems bring together data in a meaningful way to answer useful scientific questions. At ESS-DIVE, we want to know how you would like to search, link, integrate, and reuse data within a larger network, and to ensure that your related data is effectively linked when published.
Plenary talks on Innovation and New Frontiers in ML/AI introduced real-world examples of how ML/AI can be applied to a range of research priorities, such as monitoring crops and identifying areas of food insecurity in developing countries. Speakers emphasized the continued need for humans in the loop to characterize remote sensing and other data types for ML/AI approaches. Training data is still hard to get: obtaining it is very time-intensive and expensive.

The Innovating in a Documentation Ecosystem session leveraged the experiences of the audience to identify needs to more effectively link related data within and across agencies and organizations. The ingredients for a connected system involve innovative tools and infrastructure, use of metadata standards and conventions, and persistent identifiers. We learned about one particularly useful tool, the metadata editor (mdEditor), and the important role of open APIs. Our next step is to address the challenge of coordination and work to convince researchers to invest in more connected data ecosystems that improve data management efficiency and reusability of data.
During plenary talks on Innovation in Open Search and Discovery, we heard from representatives of Schema.org, Google, and Google Dataset Search. The audience promoted the metadata element “variableMeasured” as one of the most important within schema.org for supporting discoverability of datasets. Natasha Noy introduced work exploring the content of Google Dataset Search and how people are searching for data.
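As a rough illustration of the “variableMeasured” element mentioned in the Open Search and Discovery recap, the Python sketch below builds a small schema.org Dataset description and lists each measured variable as a PropertyValue with a name and unit. The dataset name, description, variables, and the build_dataset_jsonld helper are made up for this example; this is not how ESS-DIVE or Google Dataset Search generate their metadata.

```python
import json


def build_dataset_jsonld(name, description, variables):
    """Return a schema.org Dataset description as a JSON-LD-ready dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Each measured variable is listed with its name and unit so that
        # search tools can index what the dataset actually contains.
        "variableMeasured": [
            {"@type": "PropertyValue", "name": v["name"], "unitText": v["unit"]}
            for v in variables
        ],
    }


# Hypothetical dataset used only for illustration.
example_metadata = build_dataset_jsonld(
    name="Example stream chemistry dataset",
    description="Hypothetical water quality measurements used for illustration.",
    variables=[
        {"name": "pH", "unit": "pH units"},
        {"name": "Dissolved oxygen", "unit": "mg/L"},
    ],
)
print(json.dumps(example_metadata, indent=2))
```

Markup like this is typically embedded in a dataset landing page so that crawlers can read which variables a dataset contains without opening the data files.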

We explored data publication workflows, from geoscience researchers to data repositories and journal publications, to identify problem areas. Our next step is to communicate and collaborate with journal publishers to streamline the data publication process and ensure that data is open and useful upon publication.
A session called “Jupyter Notebooks: Harnessing the full potential” introduced participants to the many ways that the web interface Jupyter can be used for open source code development. We heard about examples of scientists using the Jupyter ecosystem to do everything from creating interactive data dashboards, to publishing markdown-style books, to authoring manuscripts for peer-reviewed journals.
In the final organized session of ESIP’s 2021 winter meeting, attendees discussed the challenges and opportunities for defining AI-ready data. The session moderators highlighted the importance of AI-ready data for efficiently using novel computing technologies like exascale computing while also recognizing that the definition of AI-ready data is a work in progress. Some of the potential elements of an AI-ready dataset include data completeness, documentation (through metadata and data dictionaries), and clear end user licenses.
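Since the definition of AI-ready data is still a work in progress, the Python sketch below is only one possible way to turn the three elements named in that session into quick checks: the share of non-missing values, which columns lack a data dictionary entry, and whether a license is stated. The function name, the missing-value codes, and the example rows are assumptions made for illustration, not an agreed standard.

```python
def ai_readiness_report(rows, data_dictionary, license_text, missing_codes=("", "-9999")):
    """Summarize completeness, documentation coverage, and license presence.

    rows: list of dicts, one per record, all with the same keys.
    data_dictionary: dict mapping column name -> definition.
    missing_codes: values treated as missing (assumed for this example).
    """
    columns = list(rows[0].keys()) if rows else []
    total_cells = len(rows) * len(columns)
    filled_cells = sum(
        1
        for row in rows
        for column in columns
        if str(row[column]).strip() not in missing_codes
    )
    return {
        # Fraction of cells that hold a real value rather than a missing code.
        "completeness": filled_cells / total_cells if total_cells else 0.0,
        # Columns that have no definition in the data dictionary.
        "undocumented_columns": [c for c in columns if c not in data_dictionary],
        # True when some non-empty license statement is provided.
        "has_license": bool(license_text and license_text.strip()),
    }


# Hypothetical records and data dictionary used only for illustration.
example_rows = [
    {"Sample_ID": "A", "pH": "7.1"},
    {"Sample_ID": "B", "pH": "-9999"},
]
report = ai_readiness_report(
    example_rows,
    data_dictionary={"Sample_ID": "Unique sample name"},
    license_text="CC BY 4.0",
)
print(report)
```

A report like this does not decide whether a dataset is AI-ready; it only flags gaps that a data provider may want to address before publication.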