Dylan O’Ryan is a Student Assistant with the Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository. He writes here about his experience with ESS-DIVE’s development and implementation of data reporting formats [community data standards] in collaboration with six teams of scientists in the US DOE Environmental System Science (ESS) community. Dylan first began working with ESS-DIVE as part of a Community College Internship (CCI), where he standardized existing water quality data using ESS-DIVE’s community data reporting formats.
Some of ESS-DIVE’s data reporting formats, such as those for soil and water quality, are specific to research domains. Other reporting formats are generalized to a wide range of data, such as Comma Separated Value (CSV) files and sample collections. These standard reporting formats are designed to make data more Findable, Accessible, Interoperable, and Reusable (FAIR) from the perspective of our ESS community scientists. By standardizing data, the reporting formats enable the creation of better tools for advanced search, integration, and visualization of data within and across multiple datasets.
Kristin Boye, a Staff Scientist at SLAC National Accelerator Laboratory, developed a water/soil/sediment chemistry reporting format as part of her collaboration with ESS-DIVE. She developed this reporting format by synthesizing recommendations from other generalized reporting formats, such as CSV and FLMD (File-level Metadata), and incorporating community feedback on how to format water/soil/sediment chemistry data.
Storm Drain Detectives is a community-based water quality monitoring program in Lodi, California. For the past seven years, I’ve helped the program measure water quality parameters such as dissolved oxygen (DO), pH, temperature, and bacteria. This experience with water quality testing enabled me to better understand the datasets that I was converting for ESS-DIVE. I used the community water/soil/sediment chemistry data reporting format to convert existing datasets within the Lawrence Berkeley National Laboratory (LBNL) Watershed Function Scientific Focus Area (WFSFA) project. I converted water quality datasets whose metadata was already published on ESS-DIVE, including ICP-MS, DIC/NPOC/TDN, Ammonia-N, Anion, and Isotope data.
Here is the step-by-step process that details how I converted existing datasets to the water/soil/sediment chemistry reporting format [See Image 1 for workflow diagram]:
- Retrieve the water quality data file and locate the associated metadata published on ESS-DIVE.
- Populate the methods file template. The methods file is where you store information on the samples’ methods of collection, analysis, storage, etc. I entered the methods information that the data provider had supplied in the associated dataset metadata. See Image 2 for an example of methods information converted to the reporting format methods file.
- Populate the data file template. Like most data files, this is where you enter sample information and measurements; however, the reporting format data file is also designed to include the information needed for future interpretation and reuse, such as unique sample names and methods information (collection/analysis procedures, detection limits, analysis precision), alongside the data. The data file template also allows for standardized variable names and units across the files. Standardized names and units can be included in the term list. See Image 3 for an example of a data provider’s data file converted to the reporting format data file template.
- Note: I first filled in the methods information and header rows before populating the sample data.
- As part of this reporting format, you can choose to fill out an optional terminology file. The terminology file can include all terms that would benefit from additional description and definition (e.g., data flags or other codes used throughout the data and method files). We note that the terminology file is different from the required data dictionary file that is part of ESS-DIVE’s file-level metadata reporting format. In the data dictionary, you provide definitions of column or row names, and their units. The terminology file is specifically designed for terms that are not captured in the column or row names. See Image 4 for an example of a terminology file.
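The data file step above can be partly scripted. What follows is only a minimal sketch of the idea, not the reporting format's actual template: the column names in `COLUMN_MAP`, the `-9999` missing-value code, and the sample names are all assumptions made up for illustration, so check the reporting format documentation for the real terms and conventions.

```python
import csv
import io

# Hypothetical mapping from a provider's column names to standardized names.
# Illustrative only; not taken from the reporting format's term list.
COLUMN_MAP = {
    "Sample": "Sample_ID",
    "Date": "Collection_Date",
    "Ca": "Calcium",
}

# Assumed placeholder for missing values; confirm against the format docs.
MISSING_VALUE = "-9999"


def convert_rows(provider_rows, column_map=COLUMN_MAP, missing=MISSING_VALUE):
    """Rename columns and fill blank cells with one missing-value code."""
    converted = []
    for row in provider_rows:
        new_row = {}
        for old_name, new_name in column_map.items():
            value = (row.get(old_name) or "").strip()
            new_row[new_name] = value if value else missing
        converted.append(new_row)
    return converted


def find_duplicate_sample_ids(rows, id_column="Sample_ID"):
    """Return sample IDs that appear more than once (names should be unique)."""
    seen, duplicates = set(), []
    for row in rows:
        sample_id = row[id_column]
        if sample_id in seen and sample_id not in duplicates:
            duplicates.append(sample_id)
        seen.add(sample_id)
    return duplicates


def to_csv_text(rows):
    """Serialize converted rows to CSV text with a single header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up provider rows; the second one has a blank measurement.
provider_rows = [
    {"Sample": "S-001", "Date": "2019-05-01", "Ca": "1520"},
    {"Sample": "S-002", "Date": "2019-05-01", "Ca": ""},
]
converted = convert_rows(provider_rows)
print(to_csv_text(converted))
```

Keeping the column mapping in one place means the same mapping can be applied to every file from the same provider, and the duplicate check catches repeated sample names before the file is published.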
The water/soil/sediment chemistry reporting format was straightforward. I quickly caught on to the template and the requirements of the reporting format. Transferring datasets is easy once you understand the general structure. As I converted more data files, I became faster, to the point where I could create a methods file and a data file with over 200 samples within 30 minutes.
Here are a few more tips and tricks related to converting a multitude of datasets:
- You generally only need to create one methods file for a particular measurement (e.g., ICP-MS); after that, you only need to adjust the data file to include the samples you tested.
- Similarly, the data file headers and associated terms can be repeated if there are no collection or analysis procedure changes.
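As a rough illustration of these two tips, the sketch below writes several data files that share one header and point to a single methods entry. Linking the data rows to the methods entry through a `Method_ID` column is an assumption made for this example, as are the header names, batch file names, and sample values.

```python
import csv
import io

# Hypothetical methods entry, written once for one measurement type.
METHODS = {"Method_ID": "ICP-MS-01", "Analysis": "ICP-MS"}

# Hypothetical data file header, reused while procedures stay the same.
HEADER = ["Sample_ID", "Method_ID", "Calcium"]


def build_data_file(samples, method_id=METHODS["Method_ID"], header=HEADER):
    """Return CSV text for one batch, reusing the shared header and method."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for sample_id, value in samples:
        writer.writerow([sample_id, method_id, value])
    return buffer.getvalue()


# Made-up batches of (sample name, measurement) pairs.
batches = {
    "batch_a.csv": [("S-001", "1520"), ("S-002", "1490")],
    "batch_b.csv": [("S-003", "1610")],
}
files = {name: build_data_file(samples) for name, samples in batches.items()}
```

Only the sample rows change from batch to batch; if a collection or analysis procedure changes, that is the point to write a new methods entry and revisit the header.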
I found that utilizing ESS-DIVE’s reporting formats was straightforward and made the data easier to find, understand, and use in new ways. The converted datasets include unique sample names, contextual information describing the data (metadata), standardized formatting of missing values, and many more qualities that increase the usability of the data. The examples of converted water quality datasets are now being utilized by some WFSFA data providers in order to standardize their data and metadata.
Other reporting formats that may help you standardize your data and metadata include CSV, File Level Metadata (FLMD), Sample Identifiers and Metadata, and Model Data Archiving Guidelines, which are high-level reporting formats that apply across multiple domains. Domain-specific formats include the Leaf Gas Exchange reporting format, intended for leaf-level gas exchange data; the Soil Respiration reporting format, intended for soil respiration data and metadata; and the Hydrological Monitoring reporting format, designed for water parameters measured by in situ meters/probes. Two more reporting formats are in development: 16S Amplicon Sequencing and Locations Metadata. See Image 5 for a workflow for use of ESS-DIVE’s reporting formats.
The ESS-DIVE team is available to answer questions and help anyone who wants to use the reporting formats. Please email ess-dive-support@lbl.gov or use the “Contact Us” feature on the ESS-DIVE website.