Introduction

This page describes the teaching datasets used in our lessons and gives the reproducible workflows that create and load them. In general, each dataset starts as Jornada data files downloaded from a data repository such as EDI. We often use the tidyverse for data manipulation, and all finished datasets are written to the repository’s data/ directory.

The complete code needed to generate and load the teaching datasets is in the sections below. To use a teaching dataset, you can either run this code before you start a lesson, or download and load the pre-made dataset from the data/ directory. Suggestions for how to proceed are found at the start of each lesson.

Teaching dataset 1 - Jornada ANPP data

This dataset contains annual net primary production (NPP) data, measured in grams of biomass per square meter, from the Jornada NPP study sites. There are 15 NPP study sites, in 5 different vegetation zones, across the Jornada Basin. At each site there are 49 1x1 meter quadrats where repeated measurements of plant volume by species take place several times each year. These volume data have been converted to biomass using an allometric method, and then to net primary production using the increment in biomass from one measurement to the next.
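The increment step can be illustrated with a few lines of base R. This is only a sketch with made-up biomass values. The text above does not say how declines in biomass are treated, so counting only positive increments is an assumption made for illustration, not a description of the Jornada method.

```r
# Hypothetical biomass estimates (g/m2) at four successive measurements
biomass_g_m2 <- c(10, 25, 40, 30)

# Change in biomass from one measurement to the next
increments <- diff(biomass_g_m2)

# Assumption for illustration: only positive increments count as production
npp_g_m2 <- sum(pmax(increments, 0))
```

Here the increments are 15, 15 and -10, so the illustrative production total is 30 g/m2.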

The annual data are in this EDI dataset:

First we will need to load the tidyverse library.

library(tidyverse)

Then, load a comma-separated values (CSV) file from the EDI repository. At the EDI repository, each file in a dataset is assigned a download URL. This URL can be passed to the read_csv() function (from the readr package, which is attached with the tidyverse) to read that file into a data frame.

# Assign the address for the csv file hosted on EDI to a variable
infile <- "https://pasta.lternet.edu/package/data/eml/knb-lter-jrn/210011003/105/127124b0f04a1c71f34148e3d40a5c72"

# Read this file into a tibble with `readr::read_csv()`
anpp.annual <- read_csv(infile, na=c('NA', '', '.'))
## Rows: 465 Columns: 4
## ── Column specification ─────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): zone, site
## dbl (2): year, npp_g_m2
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
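The column-specification message above can be quieted by declaring the column types up front. The following is a minimal sketch, assuming the four columns reported above. It reads a small inline table of made-up values (wrapped in `I()`) in place of the EDI file, so that it runs without a network connection. Reading `year` as an integer is an illustrative choice and not part of the original workflow.

```r
library(readr)

# Toy CSV text standing in for the EDI file (values are made up for illustration)
toy_csv <- I("zone,site,year,npp_g_m2\nA,site1,2001,100.5\nA,site1,2002,.\nB,site2,2001,80\n")

# Declare the column types explicitly; '.' is treated as missing, as in the call above
toy <- read_csv(
  toy_csv,
  na = c("NA", "", "."),
  col_types = cols(
    zone = col_character(),
    site = col_character(),
    year = col_integer(),
    npp_g_m2 = col_double()
  )
)
```

The same `col_types` argument could be added to the `read_csv(infile, ...)` call if the guessed types ever need to be overridden.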

R tells us that we have two character columns (zone and site) and two double-precision numeric (dbl) columns: the year of measurement (year) and the estimated annual NPP values (npp_g_m2). Let's make a simple plot to look at the data. To do this we use the ggplot() function, which is part of the ggplot2 package that was attached when we loaded the tidyverse.

ggplot(anpp.annual, aes(x = year, y = npp_g_m2, col = site, group = site)) +
  geom_line() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
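With 15 sites, a per-site line plot can get crowded. One option is to average NPP within each vegetation zone before plotting. This is a sketch on a toy table with the same column names as anpp.annual. The zone and site labels and the values are hypothetical, and `zone_means` is a name introduced here for illustration.

```r
library(dplyr)

# Toy data with the same columns as anpp.annual (labels and values are made up)
toy <- tibble(
  zone = c("A", "A", "B", "B"),
  site = c("site1", "site2", "site3", "site4"),
  year = c(2001, 2001, 2001, 2001),
  npp_g_m2 = c(100, 120, 60, NA)
)

# Mean NPP per zone and year, ignoring missing values
zone_means <- toy %>%
  group_by(zone, year) %>%
  summarise(mean_npp_g_m2 = mean(npp_g_m2, na.rm = TRUE), .groups = "drop")
```

Applied to anpp.annual, the result could be passed to ggplot() with `col = zone` in place of `col = site`.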

Now, let's write this to our data/ directory so that we can load it easily next time.

write_csv(anpp.annual, '../data/td01_anpp.annual.csv')
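Loading the saved file in a later session is the reverse step. The sketch below round-trips a one-row toy table through a temporary file so that it runs anywhere. In a lesson, the path would presumably be the '../data/td01_anpp.annual.csv' file written above, and the toy values here are made up.

```r
library(readr)

# Toy table standing in for anpp.annual (values are made up)
toy <- tibble::tibble(zone = "A", site = "site1", year = 2001, npp_g_m2 = 100.5)

# Write to a temporary file, then read it back as a later session would
outfile <- tempfile(fileext = ".csv")
write_csv(toy, outfile)
reloaded <- read_csv(outfile, show_col_types = FALSE)
```

Setting `show_col_types = FALSE` suppresses the column-specification message, as the message itself suggests.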