silvR: An R library for forest inventory data

_images/silvR_logo.png

silvR is a set of R functions for analysing forest inventory data.

In this document we introduce the silvR functions, and run through a set of worked examples with synthetic tree inventory data. We recommend you learn how the silvR functions work, and then try to input your own data.

Who do I talk to?

Repo maintained by Samuel Bowers (sam.bowers@ed.ac.uk).

Contents

Setup instructions

Requirements

silvR is a set of R functions written for use in Linux or Windows. None of the functions of silvR should require a powerful PC to run, though some will require an internet connection to function properly.

Functions in silvR make use of a common R libraries for data manipulation (dplyr, tidyr), ecological analysis (vegan), geospatial data manipulation (raster, rgdal, sp), and a specialised library for forest plot data analysis (BIOMASS).

Installing silvR

To install silvR, you’ll first have to install and load devtools as follows:

> install.packages('devtools')

> library('devtools')

Once you have devtools, you can install silvR as follows:

> install_bitbucket('sambowers/silvr')

To load the silvR functions, run the following:

> library('silvr')

Loading data

For the purposes of this documentation, we’ll be making use of a synthetic forest inventory dataset. The dataset contains stems from 25 forest inventory plots of 0.25 ha located in miombo woodlands.

The datasets are packaged with silvr, and are available as follows:

> data(mstems) # Stem data, one row per tree stem

> data(mplots) # Plot data, one row per plot

Take a peek at these two datasets using the head() function before continuing:

> head(mstems)

> head(mplots)

Introduction

What is silvR?

silvR is an R package. Packages are sets of functions, data and documentation that is easy to share with others. R has many thousands of packages that are publically and freely available. If you have a statistical problem, chances are someone has already solved it and published a package that you can use.

silvR has been developed to assist in the analysis of forest inventory plot data, with a focus on problems that are specific to southern African woodlands. silvR has been developed as part of the SEOSAW project (https://seosaw.github.io/), which has brought together plot inventory data from across southern Africa.

The package is still under development, so if you have any suggestions for changes or for new features, do let us know!

Required data structure

We strongly advise that you organise your data in the following standardised way. By organising forest inventory data as we describe here, you’ll be able to make best use of the functions of silvR, but also find that your data will be easier to apply other common R functions.

Functions in silvR work on two scales: individuals and plots. Example functionality that is applicable to the individual scale is:

  • Calculating stem biomass
  • Correction of species names

…and functionality that is applicable to the plot scale:

  • Calculating plot biomass
  • Characterisation of dominant species
  • Extracting data from remote sensing products

We recommend you maintain two R data.frames: one containing individual stem data, and one containing plot data. There should be a plotcode field in both data.frames, so that individual stem data can be linked to properties of the plot it was measured at.

Your individual stem data.frame should be similar to:

plotcode species_name DBH height
P01 Brachystegia spiciformis 10.5 7.5
P01 Julbernardia globiflora 22.1 10.0
P02 Diplorhynchus condylocarpon 15.4 8.0
P02 Brachystegia spiciformis 31.9 12.1
P03 Afzelia quanzensis 48.8 20.2

Your plot data.frame should be similar to:

plotcode latitude longitude area_ha
P01 -18.967 26.332 0.25
P02 -19.498 26.635 0.25
P03 -19.011 27.090 0.25

Note

At minimum your stem data.frame should contain a plotcode, species_name and DBH, and your plot data.frame should contain a plotcode, latitude, and longitude. Other fields describing individuals (e.g. height, stem location) or plots (e.g. vegetation type, meteorological data) should also be included where available.

Using bundled data

silvR comes bundled with two datasets that can be used to test out its functions. These data are for synthetic (i.e. totally fictional!) forest plot data in miombo woodlands. These are loaded with:

> data(mstems) # Stem-scale data

> data(mplots) # Plot-scale data

Cleaning your own data

At present, there is only limited functionality in silvR to check and clean input data, meaning that you will have to make sure that input data are sensible. Where there are transciprion errors in your data, silvR will not necessarily be able to identify them. Checks we can recommend are:

  • Are all DBH measurements realistic, and in the same units (cm or m)?
  • Are there an equal number of unique plot codes in your stem and plot data.frames?
  • Are plot locations all given in the same coordinate reference system (e.g. WGS 84)?
  • Are all species names sensible, and without spelling mistakes? N.B. There is a function in silvR that performs basic species name corrections.

Structural parameters

Loading silvR and sample data

First, make sure that you’ve followed the setup instructions. You’ll also need to load in the sample data included in silvR to follow along with these examples, which can be performed with:

> data(mstems) # Stem-scale data

> data(mplots) # Plot-scale data

If you’ve not yet installed silvR, return to the setup instructions before continuing.

Note

All functions and datasets in silvR come with documentation. Where in doubt, get help using the ? character. e.g. ?calculateBasalArea or ?calculateBiomass.

Calculating structural parameters

Structural parameters are calculated at the level of an individual tree (e.g. basal area, biomass), but are usually expressed at the scale of a plot. Here we’ll run through the calculation of these parameters a the scale of a stem, then show you how to summarise these at the scale of a plot.

Basal area

Basal area is the area of land that is occupied by the cross-section of a tree.

_images/basal_area.png

We can calculate basal area from a tree’s diameter at breast height (DBH), which is usually measured at 1.3 m above ground level. Assuming a circular tree bole, we can calculate basal area using the equation:

BA = π(DBH/2)^2

To calculate basal area using silvR, we can input a vector of tree DBH measurements into the following function:

> calculateBasalArea(mstems$DBH)
   [1] 0.030171856 0.633348221 0.012468981 0.089727028 0.030790750 0.296091966 0.014957123 0.026015529
   [9] 0.014957123 0.133316626 0.258769845 0.658993041 0.010207035 0.051471854 0.029559245 0.134614104
  [17] 0.024884555 0.057255526 0.009503318 0.009503318 0.012076282 0.033329156 0.244544714 0.011689866
  [25] 0.423137973 0.115811672 0.009503318 0.022698007 0.009852035 0.468084739 0.092940877 0.094024727
  [33] 0.026015529 0.185507905 0.048305129 0.053092916 0.079422604 0.084496276 0.021124069 0.012076282
  [41] 0.015393804 0.309748469 0.011689866 0.541060795 0.020106193 0.246300864 0.012468981 2.301958035
  [49] 0.021124069 0.899202357 0.446511422 0.030171856 0.087615878 0.055571632 0.031415927 0.013684778
  ...

You can save the result to the original data.frame by defining a new column as follows:

> mstems$basal_area <- calculateBasalArea(mstems$DBH)

For all of the following examples we’ll add the data to the existing mstems data.frame.

Note

By default, silvR functions such as calculateBasalArea require DBH to be given in units of cm and provides outputs in standard units (e.g. m^2 for basal area). Check documentation for other options.

Stem volume

Stem volume provides an estimate of the ammount of space that the tree bole occupies.

_images/stem_volume.png

Stem volume is calculated using the equation of [SOURCE]. Calculate it using silvR as follows:

> mstems$volume <- calculateStemVolume(mstems$DBH)

Stocking density

We can use silvR to calculate stocking density of stems greater than a minimum DBH. At the scale of a stem, this translates to values of 1 (included) and 0 (excluded).

To calculate whether a tree contributes to the stocking count in silvR:

> mstems$stocking <- calculateStocking(mstems$DBH)

To do the same, but with a minimum DBH of 10 cm:

> mstems$stocking_10 <- calculateStocking(mstems$DBH, min_DBH = 10)

Aboveground biomass

silvR is pre-programmed with a range of allometric equations that are valid for miombo woodlands. By default, silvR will use the Ryan11 model described in this paper:

Ryan, Casey M., Mathew Williams, and John Grace. “Above‐and belowground carbon stocks in a miombo woodland landscape of Mozambique.” Biotropica 43.4 (2011): 423-432.

Run this in silvR using:

> mstems$AGB_Ryan11 <- calculateBiomass(mstems$DBH)

Recall that we can view summary statistics using the R summary function:

> summary(mstems$AGB_Ryan11)
Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.003714 0.014319 0.050140 0.218266 0.183587 7.424314

We can also use other allometric models. For example:

> mstems$AGB_Mugasha14 <- calculateBiomass(mstems$DBH, model = 'Mugasha14')
> mstems$AGB_Mutakele09<- calculateBiomass(mstems$DBH, model = 'Mutakele09')

See documentation (?calculateBiomass) for all allometric models.

Note

Most allometric models only require DBH as input to calculate DBH, but be aware that some may require height or even species information. Some of these more advanced models for estimating aboveground biomass are used in silvR (Chave05 and Chave14), but we’ll return to these in the next section.

Aggregating structural parameters to plot scale

Most commonly we will want to express these structural parameters at plot scale. For example, basal area is usally expressed as the sum total of basal area of all trees, in units of m^2/ha.

We can use functions from base R to aggregate by plot:

> ba_plot <- aggregate(basal_area ~ plot_code, mstems, sum)

> ba_plot$basal_area <- ba_plot$basal_area / 0.25 # Because each plot is only 0.25 ha in size, and we want units of m^2/ha

> head(ba_plot)
  plot_code basal_area
1   plot_01   25.08409
2   plot_02   19.17778
3   plot_03   30.12648
4   plot_04   43.59910
5   plot_05   27.74461
6   plot_06   21.80926

Try to repeat this process to calculate stocking density and aboveground biomass at each plot.

Species functions

Formatting species data

Species data tend to come in the format Genus species, but most functions in silvR require separate vectors for Genus and species.

silvR has a function to split character strings in the format Genus species or Genus.species into two vectors, and enforces correct capitalisation:

> splitGenusSpecies(mstems$species_name)
          genus        species
1      Parinari curatellifolia
2    Terminalia        sericea
3       Afzelia     quanzensis
4 Erythrophleum      africanum
5    Terminalia        sericea
6  Brachystegia    spiciformis
...

These can be attached to the original data.frame using the cbind function from base R to be used as inputs to other functions:

> mstems <- cbind(mstems, splitGenusSpecies(mstems$species_name))

> head(mstems$genus) # Show the first few lines of the new field
    [1] "Parinari"      "Terminalia"
[3] "Afzelia"       "Erythrophleum"
[5] "Terminalia"    "Brachystegia"

Correcting species data

On occasion we’ll want to group species by order or family, requiring time consuming work. Getting consistent spellings in forest inventory data is a further challenge, yet necessary in order to produce accurate assessments of species distributions and biodiversity. There already exist R packages that are capable of automating such spelling corrections (e.g. BIOMASS and taxize), in silvR we have aimed to simplify some of this processing based on functions from the BIOMASS package.

The function getTaxonomicNames has multiple purposes. First, it can identify order and family given a genus:

> taxonomy <- getTaxonomicNames(mstems$genus)

> head(taxonomy)
     order           family         genus species
1 Malpighiales Chrysobalanaceae      Parinari    <NA>
2     Myrtales     Combretaceae    Terminalia    <NA>
3      Fabales         Fabaceae       Afzelia    <NA>
4      Fabales         Fabaceae Erythrophleum    <NA>
5     Myrtales     Combretaceae    Terminalia    <NA>
6      Fabales         Fabaceae  Brachystegia    <NA>

If we’re only interested in one taxonomic rank, we can specify it using the ranks input:

> mstems$family <- getTaxonomicNames(mstems$genus, ranks = c('family')) # Add to original data.frame

> head(mstems$family)
[1] "Chrysobalanaceae" "Combretaceae"     "Fabaceae"         "Fabaceae"         "Combretaceae"
[6] "Fabaceae"

getTaxonomicNames can also search through species databases and perform automated correcion of species names using the clean_input option:

> species_corrected <- getTaxonomicNames(mstems$genus, species = mstems$species, clean_input = TRUE)

> head(species_corrected)
         order           family         genus        species
1 Malpighiales Chrysobalanaceae      Parinari curatellifolia
2     Myrtales     Combretaceae    Terminalia        sericea
3      Fabales         Fabaceae       Afzelia     quanzensis
4      Fabales         Fabaceae Erythrophleum      africanum
5     Myrtales     Combretaceae    Terminalia        sericea
6      Fabales         Fabaceae  Brachystegia    spiciformis

To also show which species names have been changed, use the return_changes option which will return a nameModified field:

> species_corrected <- getTaxonomicNames(mstems$genus, species = mstems$species, clean_input = TRUE, return_changes = TRUE)

> head(species_corrected)
         order           family         genus        species nameModified
1 Malpighiales Chrysobalanaceae      Parinari curatellifolia        FALSE
2     Myrtales     Combretaceae    Terminalia        sericea        FALSE
3      Fabales         Fabaceae       Afzelia     quanzensis        FALSE
4      Fabales         Fabaceae Erythrophleum      africanum        FALSE
5     Myrtales     Combretaceae    Terminalia        sericea        FALSE
6      Fabales         Fabaceae  Brachystegia    spiciformis        FALSE

Note

We advise that you use these functions with caution and don’t rely on their outputs 100%; they can occasionally make mistakes leaving you with records of species that don’t actually exist in your inventory plots.

Loading wood density

Wood density is a useful plant trait, and is required by some allometric equations to predict tree biomass. Wood density is species specific, and can be loaded from existing databases. silvR uses the BIOMASS library to produce estimates of wood density.

At it’s simplest, this function requires a genus as input:

> mstems$wood_density <- loadWoodDensity(mstems$genus)

We can get better estimates with both genus and species:

> mstems$wood_density <- loadWoodDensity(mstems$genus, species = mstems$genus)

In some cases a species will not be present in the wood density database. For these cases, we can also input family:

> mstems$wood_density <- loadWoodDensity(mstems$genus, species = mstems$genus, family = mstems$family)

For cases where family is also not enough to retrieve a wood density, we can input the plot code for each stem which will apply the average wood density for each plot to unknown species:

> mstems$wood_density <- loadWoodDensity(mstems$genus, species = mstems$genus, family = mstems$family, plot = mstems$plotcode)

Note

By default silvR assumes a location of tropical Africa. If you’re working on data from outside Africa, take a look at at the documentation using ?loadWoodDensity and the region option.

Esimating aboveground biomass

For some more complex allometric equations, species names and tree height are used to extract wood density estimates. An example of these are the ‘Chave’ allometric equations applicable to tropical trees, described in:

  1. Chave, Jérôme, et al. “Tree allometry and improved estimation of carbon stocks and balance in tropical forests.” Oecologia 145.1 (2005): 87-99.
  2. Chave, Jérôme, et al. “Improved allometric models to estimate the aboveground biomass of tropical trees.” Global change biology 20.10 (2014): 3177-3190.

To use these equations in silvR, we’ll return to the calculateBiomass function, using additional inputs describing species:

> mstems$AGB_Chave05 <- calculateBiomass(mstems$DBH, height = mstems$height, model = 'Chave05', family = mstems$family, genus = mstems$genus, species = mstems$species)

> mstems$AGB_Chave14 <- calculateBiomass(mstems$DBH, height = mstems$height, model = 'Chave14', family = mstems$family, genus = mstems$genus, species = mstems$species)

See ?calculateBiomass for more information.

Identifying dominant species

A simple way of summarising species data is to identify the dominant species at each plot.

By default, the function identifies 5 species by stocking density at each plot:

> dominant_species <- getDominantSpecies(mstems$species_name, mstems$plot_code)

    > head(dominant_species)
              plot_code            dominant_sp_1            dominant_sp_2            dominant_sp_3
    1   plot_01   Isoberlinia angolensis Brachystegia spiciformis  Julbernardia globiflora
    2   plot_02 Brachystegia spiciformis   Isoberlinia angolensis  Erythrophleum africanum
    3   plot_03           Faurea saligna   Isoberlinia angolensis Brachystegia spiciformis
    4   plot_04   Isoberlinia angolensis Brachystegia spiciformis  Julbernardia globiflora
    5   plot_05   Isoberlinia angolensis Brachystegia spiciformis          Burkea africana
    6   plot_06  Julbernardia globiflora Brachystegia spiciformis  Erythrophleum africanum
                            dominant_sp_4               dominant_sp_5 dominant_sp_mt_1 dominant_sp_mt_2 dominant_sp_mt_3
    1 Parinari curatellifolia Diplorhynchus condylocarpon               21               13               13
    2          Faurea saligna        Anisophyllea boehmii               13                9                8
    3 Julbernardia globiflora Diplorhynchus condylocarpon               18               18               17
    4    Anisophyllea boehmii             Burkea africana               24               22               20
    5      Marquesia macroura        Brachystegia boehmii               16               15               11
    6         Burkea africana        Dombeya rotundifolia               14               10               10
      dominant_sp_mt_4 dominant_sp_mt_5 dominant_sp_pc_1 dominant_sp_pc_2 dominant_sp_pc_3 dominant_sp_pc_4
    1               13               11         13.20755         8.176101         8.176101         8.176101
    2                8                7         10.83333              7.5         6.666667         6.666667
    3               17               13         7.725322         7.725322         7.296137         7.296137
    4               17               15         9.125475         8.365019         7.604563         6.463878
    5               11                9          9.69697         9.090909         6.666667         6.666667
    6                9                9         9.090909         6.493506         6.493506         5.844156
      dominant_sp_pc_5
    1         6.918239
    2         5.833333
    3         5.579399
    4         5.703422
    5         5.454545
    6         5.844156

Note

This function uses complete species names (i.e. Genus species) in place of individual genus and species vectors.

The data.frame output by this function is complex. Columns dominant_sp_# show the dominant species for each plot (from 1 to n), dominant_sp_mt_# show the value of the comparison metric (in this case stocking density), and dominant_sp_pc_# the percentage of the comparison metric by each species.

We can combine this function with structural information, such as basal area or biomass. For example:

> dominant_species <- getDominantSpecies(mstems$species_name, mstems$plot_code, metric = calculateBasalArea(mstems$DBH))

or aboveground biomass

> dominant_species <- getDominantSpecies(mstems$species_name, mstems$plot_code, metric = calculateBiomass(mstems$DBH, model = 'Ryan11'))

Calculating biodiversity

Biodiversity describes the varaibility of species in a forest inventory plot. Measures of biodiversity vary from very simple (species richness) to more complex measures based on the distribution of individuals amongst species.. These measures of diveristy are collectively called ‘diversity indices’.

silvR is pre-programmed with a range of diversity indices (see documentation at ?calculateBiodiversity).

To calculate species richness:

> calculateBiodiversity(mstems$species_name, mstems$plot_code, index = 'richness')
plot_01 plot_02 plot_03 plot_04 plot_05 plot_06 plot_07 plot_08 plot_09 plot_10 plot_11 plot_12 plot_13
    25      26      27      27      25      25      25      25      22      20      25      26      15
plot_14 plot_15 plot_16 plot_17 plot_18 plot_19 plot_20 plot_21 plot_22 plot_23 plot_24 plot_25
    22      26      25      25      25      26      26      23      24      20      23      24

For a more complex measure of diveristy, use the ‘simpson’ or ‘shannon’ indices:

> calculateBiodiversity(mstems$species_name, mstems$plot_code, index = 'shannon')
 plot_01  plot_02  plot_03  plot_04  plot_05  plot_06  plot_07  plot_08  plot_09  plot_10  plot_11
2.930107 3.025561 3.117128 3.089926 3.009947 3.088370 3.060124 2.978967 2.887835 2.767140 3.012447
plot_12  plot_13  plot_14  plot_15  plot_16  plot_17  plot_18  plot_19  plot_20 plot_21  plot_22
2.984938 2.440725 2.917779 2.954540 3.023698 3.015096 2.959135 3.062585 3.005983 2.944163 2.998152
plot_23  plot_24  plot_25
2.815136 2.944381 2.940585

If you want to calculate diodiversity based on another metric, such as basal area, use the abundance option:

calculateBiodiversity(mstems$species_name, mstems$plot_code, index = 'shannon', abundance = calculateBasalArea(mstems$DBH))
 plot_01  plot_02  plot_03  plot_04  plot_05  plot_06  plot_07  plot_08  plot_09  plot_10  plot_11
2.645848 2.846774 3.063116 2.854390 2.710108 2.982382 2.983830 2.747758 2.654970 2.366331 2.973090
plot_12  plot_13  plot_14  plot_15  plot_16  plot_17  plot_18  plot_19  plot_20  plot_21  plot_22
2.865765 1.988800 2.796020 2.862064 2.613919 2.917068 2.728365 2.991442 2.868902 2.707835 2.750818
plot_23  plot_24  plot_25
2.435320 2.654030 2.455297

Preparing data for vegan (advanced)

If you’re interested in assessment of biodiversity and species distributions, we can recommend looking at the vegan package in R, which provides a wide variety of very useful functions. Many functions in vegan require data to be arranged in a different format to silvR.

To convert to a vegan applicable format, run:

mstems_vegan <- prepareForVegan(mstems$species_name, mstems$plot_code)

See ?prepareForVegan for further options.

Working with spatial data

Locations of forest inventory plots are usually marked by points measured with a GPS unit.

There exists a wealth of freely available data from remote sensing and modelling which can be used to better characterise your plots. With silvR and R we can take advantage of this data, and integrate forest inventory measurements with global data products.

Loading sample data

For this part of the lirbary, we’ll use a dataset that describes the plots and their locations. If not already present, load them into memory using:

> data(mplots)

Managing point data

Spatial data can be recorded in a wide range of coordinate reference systems (CRS). You’ll probably be familiar with data provided as latitude and longitude (often WGS84) or Universal Transverse Mercator (UTM), (most of Zimbabwe sits in UTM zone 35S).

It can be complex to describe a coordinate reference system in R, but there is a shortcut known as EPSG codes. The most commonly used CRSs have each been given a unique code, which silvR has been programmed to understand. For example, WGS84 has the EPSG code 4326, and UTM35S has the code 32735. You can find out what EPSG code your CRS has from the website http://www.epsg.org/.

silvR has a function to reproject point data to a different CRS given the EPSG codes to convert from and to. For example, to convert the set of points provided in the mplots dataset from UTM37S to WGS84 (lat/lon), you can run the follows:

> points_rep <- reprojectPoints(mplots$UTM_E, mplots$UTM_N, CRS_from = 32735, CRS_to = 4326)

> head(points_rep)
  x_coords  y_coords
1 26.96684 -18.54202
2 26.94692 -18.46403
3 27.12489 -18.55811
4 26.88303 -18.41498
5 26.99591 -18.74058
6 27.18346 -18.62647

Note

For advanced users The rgdal package has a specialised CRS object that can also be input into reprojectPoints. silvR has a function to load a CRS object from its epsg code. For example: CRS_WGS84 <- getCRSFromEPSG(4326). You may want to use this function if combining silvR with more advanced spatial processing with rgdal.

Extracting data from Rasters

Raster data consists of an array of cells (a.k.a pixels) arranged into rows and columns. These data can be georeferenced, such that each pixel represents a known location on the Earth. Common examples of raster data are remote sensing images or modelled meteorological data. A commonly used example of raster data is the GeoTiff format.

Combining forest inventory measurements with raster data opens a lot of opportunities. For example, we might want to determine meteorological parameters for each plot (temperature, precipitation, annual variation), land cover types (grassland, savannah, forest), or generate calibration data for creating our own remote sensing products such as biomass maps.

For the purposes of silvR, we’ve prepared a test dataset which contains a count of the number of burn scars detected by MODIS in southern Africa over the period 2001 - 2018 at 1 km resolution. Download the dataset here (2 MB), and save it somewhere memorable (e.g. Desktop).

_images/MODIS_fire.PNG

To extract pixel values for locations of forest plots, we can use the extractFromRaster function:

> mplots$fire_count <- extractFromRaster('path/to/firecount_2001_2018.tif', mplots$UTM_E, mplots$UTM_N, EPSG = 32735)

> mplots$fire_count
 [1] 0 0 0 0 0 1 0 1 1 0 2 1 4 0 5 4 1 7 0 0 0 1
[23] 0 0 1

extractFromRaster will understand a range of geospatial raster data types, including GeoTiffs, HDF files, and VRT files. Feel free to experiment with other data sources.

Extracting data from Vectors

The other commonly used geospatial data type is vector data, which is constructed of points, lines, and polygons. Commonly used vector data formats include shapefiles. Vector datasets can contain, for example, data on locations of protected areas, soil types, and political boundaries such as regions and districts.

Warning

There is not at present an extractFromVector function in silvR… but watch this space!

What next?

Applying functions to your own data

Up to this point we’ve been using synthetic forest plot data. Real life forest plot data will be harder to use: there will be formatting issues, spelling mistakes, and transcription errors to deal with. The key to getting good results using the silvR functions (or any other R library) will be to making sure that your data are in the correct format and have undergone appropriate cleaning.

When you feel ready, arrange your data into a format similar to the sample data in silvR and load it into R (using read.csv()). Try and run through the worked examples, but this time with your own data.

New functionality

New functions will be added to silvR over time. If there’s something you’d like to see added, please do let us know! Contact sam.bowers@ed.ac.uk.

Other libraries of interest

silvR is just a small part of a large ecosystem of R libraries. If you have a statistical problem, it’s very likely that there exists an R library to help with it. Libraries that have been particularly helpful to us inclde:

  1. vegan
  2. BIOMASS
  3. taxize
  4. ggplot2

How do I learn more?

Learning to program is mostly a matter of practice. There are lots of free resources available online to assist in learning R. Some we can recommend are:

  1. https://www.datacamp.com/courses/free-introduction-to-r
  2. https://www.youtube.com/playlist?list=PL6gx4Cwl9DGCzVMGCPi1kwvABu7eWv08P
  3. https://ourcodingclub.github.io/
  4. https://r4ds.had.co.nz/

Happy coding!

Working with real data

Thus far, we’ve been using synthetic forest inventory data. This is great for learning, but doesn’t simulate the difficulties that are associated with real-world, messy data.

Here we’ll look at inventory data from a 50 year fire experiment in Marondera, Zimbabwe. The experiment was designed to look at the effect of frequent fires on tree populations. The photos below show what the plots looked like after being burned every year (top; fire return interval = 1 year) and after complete protection from fire (below; fire return interval = 50 years). The experiment had a number of plots subjected to a range of different fire treatments.

For more on the experiment see:

Furley, Rees, Ryan and Saiz (2009) Savanna burning and the assessment of long-term fire experiments with particular reference to Zimbabwe. Progress in Physical Geography 32(6): 611-634

_images/marondera1.jpg _images/marondera2.jpg

We’ll use this dataset to answer the following questions:

  1. What is the relationship between fire return interval and aboveground biomass?
  2. What effect does fire have on species composition?

Dataset description

The data are provided in .csv format, containing five columns:

Parameter Description
plot_code A unique ID for each experimental plot.
species_name Species ID in the fomat ‘Genus species’.
DBH Diameter at breast height, in units of centimetres.
height Stem height, in units of metres.
FRI Fire return interval, in years. For example, a plot with an FRI treatment of 3 was burned every three years.

Download the data here.

Each plot has an area of is 60 x 36 m, which is 0.216 hectares.

Data preparation

First, load in marondera.csv using the read.csv() function.

Use the head() and summary() functions to get an understanding of the data:

  1. How many plots are there?
  2. How many experimental fire treatments?
  3. What species are present?

This is real data, so it does include errors, which will impact the quality of results if not taken care of. Try to identify if there are any incorrect records:

  1. Are all the DBH records realistic?
  2. Are the species names all correct and well-ordered?

Note

Hint 1: Any measurement of a tree with a DBH of greater than 100 cm is probably an error. Hint 2: Some of the species names are mispelled, use the unique function to find out what they are.

What can you do to improve incorrect records?

Note

Hint 1: Consider using NA values. Hint 2: Recall that silvR can perform corrections to species names.

Stem attributes

Try and determine the following:

  1. How many species are there are there across all plots?
  2. Draw a histogram (in the form hist(DBH)) to show the stem size distribution.
  3. Draw a scatter plot to determine if there’s a relationship between DBH and stem height.
  4. Use silvR to load estimates of wood density for each stem.

Plot attributes

Determine the structural attributes of each of the plots. Use silvR to calculate the following parameters for each tree:

  1. Stocking density
  2. Basal area
  3. Aboveground biomass

Note

Remember that these are best presented on a per hectare basis. Recall that each plot at Marondera is 0.216 ha.

Then use the aggregate function to produce an estimate of each of these parameters at plot scale.

Note

Are all of these plot-scale numbers reasonable? If not, you may need to perform additional data cleaning.

Use silvR to identify the dominant three species, and to calculate biodiverstiy indices for each plot.

The impact of fire return interval on vegetation structure

Summarise the plot structure parameters you just calculated by fire treatment.

  1. How does stocking density, basal area, and biomass vary with FRI?
  2. Can a miombo woodland with annual fires support any aboveground biomass?

These data are probably better expressed graphically. Try to build a boxplot of fire return interval vs the structural parameters. The syntax to build a boxplot looks like:

boxplot(biomass~FRI, data=plot_data)

Advanced: The effect of fire on species composition

  1. How does species composition vary by FRI treatment? (Hint: you can express this as the top-5 species under each treatment)
  2. Is there any relationship between biodiversity and FRI?

Warning

These questions are quite tricky. Consider using the table function to produce a summary.

_images/fire_kilwa.jpg