This page contains descriptions and the reproducible workflows used
to create and load the teaching datasets used in our lessons. In general
these start as Jornada data files downloaded from a data repository like
EDI. We often use tidyverse to
do some data manipulation, and then all finished datasets are output to
the repository’s data/
directory.
The complete code needed to generate and load the teaching datasets
is in the sections below. To use each teaching dataset, you can either
run this code before you start a lesson, or you can download and load
the pre-made datasets from the data/
directory. Suggestions for how to proceed are found at the start of
each lesson.
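The two options above (regenerate, or load the pre-made file) can be combined into a single step. The following is a minimal sketch, not part of the lesson workflow: `load_or_build()` is a hypothetical helper, and the toy data frame and temporary path are stand-ins for a real teaching dataset and the data/ directory.

```r
library(readr)

# Hypothetical helper: read the cached CSV if it exists,
# otherwise build the dataset, cache it, and return it
load_or_build <- function(path, build) {
  if (file.exists(path)) {
    return(read_csv(path, show_col_types = FALSE))
  }
  dat <- build()
  write_csv(dat, path)
  dat
}

# Stand-in for a real dataset-building workflow (made-up values)
build_toy <- function() {
  data.frame(site = c("S1", "S2"), year = c(2000, 2000), npp_g_m2 = c(10.5, 8.2))
}

cache_path <- tempfile(fileext = ".csv")           # stand-in for a file under data/
first  <- load_or_build(cache_path, build_toy)     # builds and writes the file
second <- load_or_build(cache_path, build_toy)     # reads the cached file
```

The first call runs the build function; later calls skip it and read the saved file instead.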
This dataset contains annual net primary production (NPP) data, measured in grams of biomass per square meter, from the Jornada NPP study sites. There are 15 NPP study sites, in 5 different vegetation zones, across the Jornada Basin. At each site there are 49 1x1 meter quadrats where repeated measures of plant volume by species take place several times each year. These volume data have been converted to biomass using an allometric method, and then to net primary production using the increment in biomass from one measurement to the next.
The annual data are in this EDI dataset:
First we will need to load the tidyverse
library.
library(tidyverse)
Then, load a comma-separated value file from the EDI repository. At
the EDI repository, each file in a dataset is assigned a download URL.
This can be passed to the read_csv() function (from readr, one of the
packages attached by the tidyverse) to read that file into a dataframe.
# Assign the address for the csv file hosted on EDI to a variable
infile <- "https://pasta.lternet.edu/package/data/eml/knb-lter-jrn/210011003/105/127124b0f04a1c71f34148e3d40a5c72"
# Read this file into a tibble with `readr::read_csv()`, treating 'NA', '' and '.' as missing
anpp.annual <- read_csv(infile, na=c('NA', '', '.'))
## Rows: 465 Columns: 4
## ── Column specification ─────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): zone, site
## dbl (2): year, npp_g_m2
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
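The message above suggests specifying the column types. A minimal sketch of that follows, using a few made-up rows that only mimic the four-column layout (they are not Jornada data) so that it runs without a download. Passing literal text through `I()` assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical rows in the same layout as the EDI file; values are invented
csv_text <- "zone,site,year,npp_g_m2\nA,S1,2000,10.5\nA,S1,2001,.\nB,S2,2000,\n"

# Declare the expected type of each column up front
anpp_types <- cols(
  zone     = col_character(),
  site     = col_character(),
  year     = col_double(),
  npp_g_m2 = col_double()
)

# I() marks the string as literal data rather than a file path
example <- read_csv(I(csv_text), col_types = anpp_types, na = c("NA", "", "."))
```

With `col_types` supplied, read_csv() no longer prints the column specification, and the `.` and empty fields are read as missing values rather than text.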
R tells us that we have two character columns (zone and site) and two
floating point (dbl) columns: year, which contains the year of
measurement, and npp_g_m2, which contains the estimated annual NPP
values. Let's make a simple plot to look at the data. To do this we use
the ggplot() function, which is part of the ggplot2 library that was
attached when we loaded the tidyverse package.
ggplot(anpp.annual, aes(x = year, y = npp_g_m2, col = site, group = site)) +
geom_line() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
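Alongside the plot, a per-site summary table is a quick way to check how many years each site covers. This is a minimal sketch using dplyr (also attached by the tidyverse) on a hypothetical stand-in for `anpp.annual`. The values are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in with the same columns as anpp.annual
toy <- tibble(
  zone     = c("A", "A", "A", "B", "B"),
  site     = c("S1", "S1", "S2", "S3", "S3"),
  year     = c(2000, 2001, 2000, 2000, 2001),
  npp_g_m2 = c(10, 12, 8, 20, NA)
)

# One row per site: number of years present and mean NPP (ignoring missing values)
site_summary <- toy %>%
  group_by(zone, site) %>%
  summarise(
    n_years  = n_distinct(year),
    mean_npp = mean(npp_g_m2, na.rm = TRUE),
    .groups  = "drop"
  )
```

Replacing `toy` with `anpp.annual` applies the same summary to the real data.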
Now, let's write this to our data/ directory so that
we can load it easily next time.
write_csv(anpp.annual, '../data/td01_anpp.annual.csv')
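To confirm that a saved file loads as expected, it can be read straight back in. The sketch below uses a hypothetical two-row data frame and a temporary file in place of `anpp.annual` and the data/ directory.

```r
library(readr)

# Hypothetical rows standing in for anpp.annual
toy <- data.frame(
  zone     = c("A", "B"),
  site     = c("S1", "S2"),
  year     = c(2000, 2000),
  npp_g_m2 = c(10.5, NA)
)

out_file <- tempfile(fileext = ".csv")  # temporary path used here instead of ../data/
write_csv(toy, out_file)

# write_csv() writes missing values as "NA" by default,
# which read_csv() converts back to NA on load
reloaded <- read_csv(out_file, show_col_types = FALSE)
```

Because missing values are written as NA, the reloaded file does not need the `.` entry that was passed to `na` when reading the original EDI file.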