Prepare and publish a dataset
Overview
🚧 Most of this chapter is under construction.
Once your data are assembled, cleaned, and analyzed, you should strongly consider publishing them for future use and posterity. In fact, this is required for most JRN LTER researchers and strongly encouraged by most funders and journals. To prepare a dataset for publication, you should make a few decisions first.
- What data and files will your dataset contain?
- Where will you publish the dataset?
- How will you assemble the metadata and publish it?
We explore the answers to these questions below, but an overview of the process is shown in Figure 1. You, the researcher, are responsible for collecting quality data and metadata, and notifying the Jornada IM team when you are ready to publish (though often we’ll ask you…). Once you provide the data and metadata files to an IM, an iterative curation process begins, with the aim of creating a publishable dataset that meets Jornada and LTER standards. After a cycle or two of review and editing, you or the IM team can publish the dataset to the appropriate repository.
What’s in your dataset?
It can be difficult to visualize what a published dataset should contain. At a minimum, there will be one or more data files, and a collection of metadata that thoroughly describes the data (ideally in human- and machine-readable form).
Other things a published dataset can contain:
- Code
- Maps
- Document files (like field or lab protocols)
- Images
Choosing a repository
The recommended repository for most tabular ecological or environmental data produced by the Jornada Basin LTER program is EDI. The EDI repository supports rigorous, community-supported metadata standards and facilitates data re-use. It is also well-integrated into the Jornada’s information management systems. For some data types, and for non-data research products, we recommend other repositories that are specialized for a particular research domain or have especially useful features. Publishing in non-EDI repositories can have advantages if they provide features for your particular type of data, or are your research community’s accepted data source. Recommended repositories for a range of Jornada data and use cases are described below.
Environmental Data Initiative
AmeriFlux
NCBI
Zenodo
The IM team recommends EDI for most Jornada data, but if you opt to publish in a different repository, keep in mind that:
- Jornada data managers can provide some guidance, but more of the responsibility for metadata preparation and publishing will fall to you.
- The IM team still needs to know about Jornada datasets published outside EDI, so please notify us!
Despite these caveats, it is best to choose a repository where you and your collaborators know the data will be discovered and used.
Assembling the metadata
To ensure successful interpretation and re-use of the data after they are published, they must be accompanied by metadata that thoroughly describe them. For more on the importance of metadata and how to collect it, review guidance in the “Collect, manage and describe research data” chapter or at the EDI repository. Ideally, metadata should be in a format that provides rich detail and is both human- and machine-readable. A number of metadata standards can meet this objective, but the choice often comes down to data type, community support and repository capability.
At present, the Jornada IM team supports two ways to create a metadata file for your dataset: EDI’s ezEML application or Jornada metadata templates. These are described below.
ezEML
The EDI repository has created a web app called ezEML for describing research datasets and creating standardized metadata documents for publication. The ezEML app creates metadata documents in the Ecological Metadata Language, or EML, which is a dialect of XML. The tool abstracts away most of the complexity of EML and XML, and is an easy way to author well-documented datasets. There is a Jornada EML template available on the site, so the recommended process for Jornada researchers is:
- Log in to ezEML using your Google, GitHub, or ORCID account (whichever is easiest).
- Start a new EML dataset using the EML Documents > New from Template menu item.
- Navigate to and select the LTER/JRN/JRN_template_general template to open an EML template pre-populated with Jornada metadata.
- Give the dataset a unique name. You can save your metadata and then return to this document anytime.
- Follow the sequence of forms on the left, and ezEML’s prompts, to upload data files and enter metadata for your dataset. Each section of your metadata will have help available (“?” icons) and several fields will already be filled if you are using the JRN template.
- Use the Check metadata and Check data tables tools at the bottom left to check the completeness and validity of your dataset. Green badges next to metadata mean your dataset is well described and ready to share (and yellow or red can indicate missing values or errors).
- When ready, click Submit/Share Package and then Collaborate with Colleagues. DO NOT USE Submit Package to EDI or we may miss your dataset.
- On the Invite a Collaborator screen, share the dataset with a Jornada data manager by entering the email address jornada.data@nmsu.edu.
At this point, the Jornada IM Team will receive a notification and can access your dataset in ezEML to review, edit, and publish to EDI. It never hurts to send a reminder in case the data managers miss this EDI notification.
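For readers curious about what ezEML produces behind its forms, the following is a minimal sketch of an EML-style XML document built with Python's standard library. It is illustrative only: the function name `build_minimal_eml`, the package identifier, the title, and the creator name are hypothetical, the namespace URI is assumed to be the one for EML 2.2.0, and the output is far too sparse to validate against the EML schema or to publish. ezEML or the IM team generates the real file.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for EML 2.2.0. In this sketch only the root
# element carries the "eml" prefix.
EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"


def build_minimal_eml(package_id: str, title: str, surname: str) -> str:
    """Return a simplified EML-style XML string (illustrative, not schema-valid)."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(
        f"{{{EML_NS}}}eml",
        {"packageId": package_id, "system": "example"},  # hypothetical values
    )
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "surName").text = surname
    ET.indent(root)  # pretty-print; requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # All values below are placeholders, not a real Jornada dataset.
    print(build_minimal_eml("example.1.1", "Hypothetical soil moisture dataset", "Doe"))
```

A complete EML document also describes contacts, methods, coverage, and every column of every data table, which is the detail that ezEML's forms collect for you.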
Metadata templates
A metadata template is a document with a structure and cues that help you collect the essential metadata needed to describe a published dataset. We have created Jornada metadata templates in MS Word (.docx) and Excel (.xlsx) formats. These templates contain sections for all critical pieces of metadata, along with instructions on what to include and how to structure the information. The Excel version is slightly more detailed and may be useful for complex datasets. Completed templates and accompanying data files should be sent to the Jornada IM team (jornada.data@nmsu.edu).
Other metadata methods
Under construction…
The review process
Once you have submitted the dataset to the Jornada IM team, the data and metadata are securely archived and a round of review and editing begins (see Figure 1). A data manager will check the metadata for completeness and compliance with Jornada standards, and will look for mismatches between the metadata and the data. If revisions are needed or errors are found, you will receive an email requesting the changes. Changes can be submitted in ezEML, or with an updated metadata template, as appropriate.
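One common kind of mismatch between metadata and data is a column that is named differently in the data file than in the metadata. The sketch below shows one way you might check for this yourself before submitting. It is not the IM team's review tooling: the function `compare_columns` and the sample column names are hypothetical, and it assumes a CSV file whose first row holds the column names.

```python
import csv
import io


def compare_columns(csv_text: str, documented: list[str]) -> dict[str, list[str]]:
    """Compare a CSV header row against column names documented in metadata."""
    reader = csv.reader(io.StringIO(csv_text))
    # First row is assumed to be the header; an empty file yields no columns.
    header = [name.strip() for name in next(reader, [])]
    return {
        # Documented in the metadata but absent from the data file.
        "missing_from_data": [c for c in documented if c not in header],
        # Present in the data file but not described in the metadata.
        "undocumented": [c for c in header if c not in documented],
    }


if __name__ == "__main__":
    # Hypothetical data and metadata column names for illustration.
    sample = "site,date,soil_temp\nA1,2020-01-01,12.5\n"
    print(compare_columns(sample, ["site", "date", "soil_temp_c"]))
```

In the sample, `soil_temp_c` is reported as missing from the data and `soil_temp` as undocumented, which points to a renamed column that should be reconciled in either the data file or the metadata.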
While writing metadata, the Jornada metadata standards and keyword thesauri (Excel file) documents are helpful, but not required.
Publishing the dataset
Once revisions are complete and the dataset authors agree to publish the dataset, the IM team creates a final EML file and sends it, with the data, to the Environmental Data Initiative (EDI) repository for publication. A dataset citation, with a DOI, will be emailed to the dataset authors for distribution. If you are NOT publishing at EDI, the publishing process will be different, and you may be expected to handle more of it yourself.