eData Data Reporting Format

The eData Data Reporting Format is a format for reporting chemical occurrence and exposure data in the natural environment. It provides tables, controlled vocabularies, and validation functions for structuring chemical occurrence data, together with spatial, social, biological, geographical, and other metadata.

For the eData application, please see its dedicated repository.

Installation

To install this package, you will need the devtools package:

  # install.packages("devtools")  # run once if devtools is not yet installed
  devtools::install_github("NIVANorge/eDataDRF")

Usage

The format is structured as a series of tables, each collecting a different type of data. The Measurements table is the central data table. It references the other tables via foreign keys: identifying columns that are present in both the Measurements table and the table it refers to. For more information on each table, please see the Tables section below or the Articles section of the documentation site.

library(eDataDRF)

# create an empty Campaign table with the expected columns and types
campaign_table <- initialise_campaign_tibble()
campaign_table
#> # A tibble: 0 × 8
#> # ℹ 8 variables: CAMPAIGN_NAME_SHORT <chr>, CAMPAIGN_NAME <chr>,
#> #   CAMPAIGN_START_DATE <date>, CAMPAIGN_END_DATE <date>, ORGANISATION <chr>,
#> #   ENTERED_BY <chr>, ENTERED_DATE <date>, CAMPAIGN_COMMENT <chr>
# the optional Biota table is initialised the same way
biota_table <- initialise_biota_tibble()
biota_table
#> # A tibble: 0 × 14
#> # ℹ 14 variables: SAMPLE_ID <chr>, SITE_CODE <chr>, PARAMETER_NAME <chr>,
#> #   ENVIRON_COMPARTMENT <chr>, ENVIRON_COMPARTMENT_SUB <chr>,
#> #   MEASURED_CATEGORY <chr>, SAMPLING_DATE <chr>, SUBSAMPLE <chr>,
#> #   SPECIES_GROUP <chr>, SAMPLE_SPECIES <chr>, SAMPLE_TISSUE <chr>,
#> #   SAMPLE_SPECIES_LIFESTAGE <chr>, SAMPLE_SPECIES_GENDER <chr>,
#> #   BIOTA_COMMENT <chr>
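
The foreign-key relationship described above can be illustrated with a small sketch in base R. It does not use the package itself: the two data frames are hand-written stand-ins, SITE_CODE is taken from the table output above, and the COUNTRY and MEASURED_VALUE columns and all values are hypothetical and chosen only for illustration.

```r
# Hypothetical Sites rows; SITE_CODE is the identifying column.
sites <- data.frame(
  SITE_CODE = c("S1", "S2"),
  COUNTRY = c("Norway", "Sweden")  # illustrative column name
)

# Hypothetical Measurements rows; SITE_CODE acts as the foreign key.
measurements <- data.frame(
  SITE_CODE = c("S1", "S1", "S2", "S3"),
  MEASURED_VALUE = c(0.12, 0.30, 1.10, 0.05)  # illustrative column name
)

# Left join: keep every measurement and attach site metadata
# wherever the key matches a row in the Sites table.
joined <- merge(measurements, sites, by = "SITE_CODE", all.x = TRUE)

# Keys used in Measurements that have no matching row in Sites.
orphans <- setdiff(measurements$SITE_CODE, sites$SITE_CODE)
orphans
#> [1] "S3"
```

A check like the one for orphans is a simple way to spot measurements that point at a site that was never recorded. With the Tidyverse, dplyr::left_join() and dplyr::anti_join() would play the same roles as merge() and setdiff() here.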

Tables

Tables are created by tibble::tibble() calls with empty columns of specific types (e.g. character(0) for strings). This structure makes validation easier (see Validation) and lets the tables work with the extensive Tidyverse family of functions.
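
As a rough sketch of the idea, an initialiser of this kind might look like the function below. This is not the package's source code: sketch_campaign_tibble() and matches_template() are hypothetical names, and the columns are simply copied from the printed Campaign table in the Usage section. The second function shows one way typed, empty columns can support validation, by comparing a table's column names and classes against the template.

```r
library(tibble)

# Hypothetical initialiser modelled on the printed Campaign table;
# each column is an empty vector of the intended type.
sketch_campaign_tibble <- function() {
  tibble(
    CAMPAIGN_NAME_SHORT = character(0),
    CAMPAIGN_NAME = character(0),
    CAMPAIGN_START_DATE = as.Date(character(0)),
    CAMPAIGN_END_DATE = as.Date(character(0)),
    ORGANISATION = character(0),
    ENTERED_BY = character(0),
    ENTERED_DATE = as.Date(character(0)),
    CAMPAIGN_COMMENT = character(0)
  )
}

# Hypothetical check: TRUE when a table has the same column names,
# in the same order, with the same classes as the template.
matches_template <- function(user_table, template) {
  identical(names(user_table), names(template)) &&
    identical(lapply(user_table, class), lapply(template, class))
}

template <- sketch_campaign_tibble()

# Same structure as the template.
good <- sketch_campaign_tibble()

# Same column names, but a date column stored as text.
bad <- sketch_campaign_tibble()
bad$ENTERED_DATE <- character(0)

matches_template(good, template)
#> [1] TRUE
matches_template(bad, template)
#> [1] FALSE
```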

Tables are listed below:

Table Name	Purpose	Comments
Campaign	Records data about the sampling campaign and the organisation collecting the data
Reference	Records conventional publication metadata, where available
Sites	Records site coordinates, land use, country/ocean
Parameters	Records data on stressors (chemical, radiation, etc.), quality measurements
Compartments	Records information on the compartment/matrix sampled
Samples	Records which combinations of dates, sites, parameters and compartments were sampled	Not used in final analysis, but exists as an intermediate table used to create measurements
Biota	Where relevant, records biota species, tissue, life stage, and gender	Optional
Methods	Records type and descriptions of methods used for sampling, extraction, fractionation and analysis
Measurements	Records measured values, units, uncertainty, sample size, and methods associated with a given sample
CREED (quality)	Records CREED assessment criteria, relevant data, criteria scores, and limitations	Optional
CREED Scores	Records CREED usability scores calculated from CREED data above	Optional

Vocabulary

Likewise, controlled vocabulary is available as functions that return vectors, lists, or tables. In some cases, helper functions are available that wrap multiple individual functions.

Where external data sources are used to generate a vocabulary, functions may wrap (processed) data from other R packages or load raw data from external sources. Vocabulary functions are documented in the Reference section and linked to in their relevant tables.

# returns a named vector
measured_categories_vocabulary() 
#>               External               Internal                Surface 
#>       "External Media" "Internal to Organism"  "Surface of Organism"

# returns a nested list
environ_compartments_sub_vocabulary()[1] 
#> $Aquatic
#>                    Freshwater             Marine/Salt Water 
#>                  "Freshwater"           "Marine/Salt Water" 
#>   Brackish/Transitional Water                   Groundwater 
#> "Brackish/Transitional Water"                 "Groundwater" 
#>                    Wastewater          Liquid Growth Medium 
#>                  "Wastewater"        "Liquid Growth Medium" 
#>                     Rainwater                    Stormwater 
#>                   "Rainwater"                  "Stormwater" 
#>                      Leachate              Aquatic Sediment 
#>                    "Leachate"            "Aquatic Sediment" 
#>                     Porewater                        Sludge 
#>                   "Porewater"                      "Sludge" 
#>                      Snow/Ice 
#>                    "Snow/Ice"

# returns a tibble
extraction_protocols_vocabulary()[1:5, 3] 
#> # A tibble: 5 × 1
#>   Long_Name                 
#>   <chr>                     
#> 1 Not relevant              
#> 2 Not reported              
#> 3 No extraction             
#> 4 Methanol extraction       
#> 5 Dichloromethane extraction

# calls bind_rows() on four *_protocols_vocabulary() functions to return a tibble
protocol_options_vocabulary()[1:5, 3] 
#> # A tibble: 5 × 1
#>   Long_Name         
#>   <chr>             
#> 1 Not relevant      
#> 2 Not reported      
#> 3 Point sampling    
#> 4 Composite sampling
#> 5 Trawl sampling

# calls crsuggest::crs_sf, filters to most common/relevant to Norway
coordinate_systems_vocabulary(common_only = TRUE) 
#> [1] "Not relevant"          "Not reported"          "Other"                
#> [4] "WGS 84"                "ETRS89"                "WGS 84 / UTM zone 32N"
#> [7] "WGS 84 / UTM zone 33N" "WGS 84 / UTM zone 34N" "WGS 84 / UTM zone 35N"

# returns a vector
ocean_vocabulary()[1:20] 
#>  [1] "Not relevant"       "Not reported"       "Other"             
#>  [4] "Torres Strait"      "Tasman Sea"         "Solomon Sea"       
#>  [7] "Ross Sea"           "Coral Sea"          "Bismarck Sea"      
#> [10] "Bellingshausen Sea" "Bass Strait"        "Amundsen Sea"      
#> [13] "Timor Sea"          "Sunda Strait"       "Sumba Strait"      
#> [16] "Sulu Sea"           "Sulawesi Sea"       "South China Sea"   
#> [19] "Singapore Strait"   "Seram Sea"
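
One way a controlled vocabulary can be used is to check a data column for entries that fall outside it. The sketch below is base R only and does not call the package: the vocabulary is a hard-coded copy of the measured_categories_vocabulary() output printed above, invalid_terms() is a hypothetical helper, and comparing against the vector's values (rather than its names) is an assumption about how the terms are stored in the data.

```r
# Hard-coded copy of the printed measured_categories_vocabulary() output,
# so that this example runs without the package installed.
measured_categories <- c(
  External = "External Media",
  Internal = "Internal to Organism",
  Surface = "Surface of Organism"
)

# Hypothetical helper: return the distinct entries of x that are
# not values of the vocabulary (matching is case-sensitive).
invalid_terms <- function(x, vocabulary) {
  unique(x[!x %in% vocabulary])
}

# Hypothetical column values, one of them with the wrong capitalisation.
observed <- c("External Media", "Surface of Organism", "external media")

invalid_terms(observed, measured_categories)
#> [1] "external media"
```

The same pattern should carry over to the vocabularies returned as plain vectors; for the tibble-based vocabularies, the relevant column (for example Long_Name) would need to be extracted first.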

Is something missing?

We intend to update this format regularly as new resources become available and to address emerging needs. If you have suggestions or comments, please get in touch via the Issues tab or, if you’re more technically minded, make a Pull Request with proposed changes for review.