Scott Rinnan

Quantitative Ecology and Resource Management, University of Washington



Troubleshooting the gmap function for making maps in R

There are several wonderful tools and packages available for incorporating map features into plots in R. Some of the better-known examples are the get_map and ggmap functions from the aptly-named ggmap package (which follows the conventions of Hadley Wickham's Grammar of Graphics [1]), and the GetMap function from the RgoogleMaps package. Both packages can query and pull data from Google Maps:

library(ggmap)
rainier <- get_map(location = c(-121.760374, 46.852886), maptype = "satellite", zoom = 11)
ggmap(rainier)


You can specify the location by coordinates, or you can type in a location just like you would in a Google Maps searchbar. You can even liven things up a little with some more playful styles:

seattle <- get_map(location = "Seattle, WA", maptype = "watercolor")
ggmap(seattle)


After several months of working with the dismo package for other research purposes, I found a new (to me) function in the package that I really like. The gmap function in the dismo package is unique in that it returns the map query to you in raster format, which can be extremely useful for plotting purposes when you’re working with spatial data.

library(dismo)
wash <- gmap("Washington state", scale = 2)
plot(wash)


Since this map is in raster format, all the familiar raster and other spatial analysis functions can be used, such as crop, mask, etc. You can also make a map by giving the gmap function an extent object. Seriously, this is sooooo useful!

library(maptools)
sagegrouse <- readShapePoly("~/Google Drive/JDSM/Cent_urop_pl.shp")
grouse_habitat <- gmap(extent(sagegrouse), lonlat = TRUE, type = "satellite")
plot(grouse_habitat)
plot(sagegrouse, add = TRUE, col = "#00FF0040")


Here I’ve read in a polygon shapefile that represents the historical distribution of greater sage-grouse, and used the extent of the shapefile to get a background map. And since the map is just a raster object, we can use the regular plot function for the map, and it will automatically use the plot method for plotting rasters. Pretty slick, huh?

But wait, what’s up with all that white space that precedes all those plots?

Well, it turns out there's something wonky about the format of the raster that gets returned from the Google database. I don't know precisely what it is, but as near as I can tell, when you call plot(map), where map is a raster object created with the gmap function, the map gets plotted with the rasterImage method from the graphics package rather than the plot method for rasters in the raster package. Why does this matter? Because the rasterImage function has all sorts of stability issues, and is not even supported on some operating systems. I don't know why these maps use that method rather than the plot.raster method, but that's what they're doing, and that big white space is an artifact of it. This is a particularly frustrating issue if you are trying to write a figure to a file with a function like pdf; if you do, it will create a file with two pages, the first one blank and the second one your plot.

Even worse, the rasterImage plotting method isn’t compatible with really useful tools, like a legend. What happens if we try to add a legend to one of these plots?

sagegrouse_breadth <- raster("~/Google Drive/Breadth_research/Historical_breadth_maps/Birds/Hist_cent_urop.tif")
plot(grouse_habitat)
cols <- colorRampPalette(c("black", "darkgreen", "olivedrab3", "yellow", "red3"))
plot(sagegrouse_breadth, add = TRUE, legend = FALSE, col = cols(200))
plot(sagegrouse_breadth, legend.only = TRUE, col = cols(200))


Yeah, that’s not helpful.

In certain cases, you can force which type of method you want to use by making a call to that method directly. I illustrate that here by creating some fake time series data, and plotting that with both the generic plot method and the plot.ts method for time series.

x <- 1990:2010
y <- sample(20, 21, replace = TRUE)
plot(x, y)
plot.ts(x, y)

Unfortunately, we can't do the same with the method for plotting rasters from the raster package. Or maybe we can; I don't know. This is running up against the limits of my R knowledge.

Instead, I wrote a function that takes the gmap object and forces it to get plotted with the raster method, rather than the rasterImage method. It’s not an ideal fix, but it gets the job done.

plot_gmap <- function(ras, ...){
  cols <- ras@legend@colortable
  x <- unique(ras)
  par(mar = c(0, 0, 0, 0))
  plot(ras, col = cols[x + 1], legend = FALSE, box = FALSE, axes = FALSE, ...)
}

Using this function to plot the gmap object, the results should now be in a format that more closely resembles what we want:

plot_gmap(grouse_habitat)
cols <- colorRampPalette(c("black", "darkgreen", "olivedrab3", "yellow", "red3"))
plot(sagegrouse_breadth, add = TRUE, legend = FALSE, col = cols(200))
plot(sagegrouse_breadth, legend.only = TRUE, col = cols(200))


There we go, that's more like it! The plot_gmap function can accept any argument that you would use when plotting a raster object with the raster package, such as alpha (which controls transparency) and ext (which controls the extent), as well as other graphical parameters.
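For example, a quick sketch reusing the objects from above (the argument values here are purely illustrative):

```r
# Lighten the basemap and zoom to the shapefile's extent; extra arguments
# pass straight through to the raster plotting method
plot_gmap(grouse_habitat, alpha = 0.7, ext = extent(sagegrouse))
plot(sagegrouse, add = TRUE, col = "#00FF0040")
```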

There are still some plotting issues, to be sure. If you try to crop a gmap object, you'll notice that the color scheme gets messed up:

seattle <- gmap("Seattle, WA", type = "satellite", scale = 2)
plot_gmap(seattle)
seattle_crop <- crop(seattle, extent(seattle)/2)
plot_gmap(seattle_crop)


As near as I can tell, this is not a fault of the function I wrote, but rather a consequence of cropping rasters that use a colortable to specify colors. Colortables are a relatively new feature in the raster package, and several of the package's functions have not yet been rewritten to preserve the structure of colortables when objects get manipulated.

Long story short: gmap is a great function, but there are frustrating plotting issues that come along with it. Hopefully the plot_gmap function I wrote above helps you get around those issues. Good luck, and happy plotting!

References:

  1. Wickham, H. (2010). A layered grammar of graphics. Journal of Computational and Graphical Statistics, 19(1), 3-28.



How to construct a bias file with R for use in MaxEnt modeling

Maximum entropy classification is a popular tool for different types of ecological modeling. Using the MaxEnt software, species distribution modeling has never been easier! Input your climate data, some species occurrence records, and boom! You’ve got yourself a nice probability map of where your little critters can be found. This software is user-friendly by design, but distances the user from the complicated multinomial logistic regression that takes place under the hood. It is not necessary to understand exactly how the model selection works in order to benefit from it, but it is important that you have all the right pieces, in order to ensure a good model fit. (If you do want to understand the software better, the references at the end of the post are an excellent place to begin.)

One of the common issues modelers face is sampling bias. Let's say you have records of 50 observations of a species of marmot, all occurring on the Olympic Peninsula. What can we infer from this data? Can the species be found only on the Peninsula, or have other areas simply not been surveyed to the same extent? This is the classic conundrum with presence-only data; presence/absence data, by contrast, gives us much more information, since it explicitly tells us where the species does not occur. Unfortunately, people are (understandably) much more likely to record that they saw a creature than to record that they didn't, and most of the data in the public domain are presence-only observations. If we want to infer meaningful information from these data, we're going to need a way to account for the sampling bias inherent in them.

Much of the literature out there stresses the importance of accounting for sampling bias [1,4,5], and some papers suggest how you might go about doing it. Unfortunately, there doesn't seem to be much practical guidance on how to actually do it. When I first confronted this issue, I was incredibly confused, and the usual online coding resources seemed equally full of confusion. Having finally figured it out, I thought I would share the process. I make no claims that this is the most efficient or sleekest code, but it gets the job done.

We will account for the bias in the data by constructing a bias layer. Basically, this layer leverages occurrence data from species in a similar taxon as the one of interest. The Christmas Bird Count is an excellent example: every Christmas, thousands of citizen scientists go out and record every bird they can find, which results in millions of observations. Let’s say that Ms. Birdbrain spent her Christmas hiking around in the mountains, and saw a northern spotted owl and two barred owls, each in a different location. Essentially, we can treat the presence of the barred owl as a “pseudo-absence” of the spotted owl, because if there had been a spotted owl there, presumably Ms. Birdbrain would have recorded it as well. Thus, if we look at the collective records of all birds, we’d have a good idea of the geographical sampling bias, which we can then use to account for the bias in a single species.

This technique assumes that you have the MaxEnt software already installed on your computer. We will be creating a bias file in R, and then interfacing with MaxEnt by use of the dismo package.

library(dismo) # interface with MaxEnt
library(raster) # spatial data manipulation
library(MASS) # for 2D kernel density function
library(magrittr) # for piping functionality, i.e., %>%
library(maptools) # reading shapefiles

The GBIF database provides huge amounts of occurrence data. I downloaded a list of all native bird species in the US from the IUCN database, and then extracted occurrence records for these species using the gbif function from the dismo package. (Note: this takes a long time!)

for(i in 1:nrow(birds)){
     temp <- gbif(birds$Genus[i], birds$Species[i], geo = TRUE)  # query GBIF for this species
     temp <- na.omit(temp[ , c("lon", "lat")])
     write.csv(temp, paste0("H:/Species occurrences/birds/", birds$Genus[i], " ",
          birds$Species[i], " occurrence.csv"), row.names = FALSE)
}

We now have hundreds of csv files, each containing occurrence records for a different species of bird. Let’s merge these into one giant data frame:

occurdat <- list.files("H:/Species occurrences/Birds", pattern = ".csv$", full = T)
locations <- read.csv(occurdat[1], colClasses = "numeric")

for(i in 2:length(occurdat)){
     temp <- read.csv(occurdat[i], colClasses = "numeric")
     locations <- rbind(locations, temp)
}
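As a side note, the same merge can be written more compactly with lapply and do.call. Here's a self-contained sketch using two toy csv files in a temporary directory (the file names and values are made up for illustration):

```r
# Write two small csv files to a fresh temporary directory
dir <- file.path(tempdir(), "occ_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(lon = c(-120, -121), lat = c(46, 47)),
          file.path(dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(lon = -122, lat = 48),
          file.path(dir, "b.csv"), row.names = FALSE)

# Read every file and stack the results into one data frame in a single step
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
locations <- do.call(rbind, lapply(files, read.csv, colClasses = "numeric"))
nrow(locations)  # 3
```

This avoids the special-case handling of the first file and grows the data frame only once.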

We’re going to need these occurrences on a spatial grid, so let’s turn them into a raster. Ultimately, if we’re going to be constructing some species distribution models, we’ll need some climate data. Lots is available in the WorldClim database. We want the occurrence raster to have the same resolution as the climate data, so we’ll use the climate data as an argument in the rasterize function.

climdat <- brick("H:/Bioclim_brick.tif")
occur.ras <- rasterize(locations, climdat, 1)
plot(occur.ras)
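To make the behavior of rasterize concrete, here is a minimal offline sketch, with a toy 10 x 10 template grid standing in for the climate brick (the coordinates are made up for illustration):

```r
library(raster)

# A 10 x 10 template raster covering a small lon/lat box
template <- raster(nrows = 10, ncols = 10, xmn = -125, xmx = -115, ymn = 45, ymx = 50)

# Two point locations given as (lon, lat)
pts <- cbind(lon = c(-120.5, -118.2), lat = c(46.1, 48.7))

# Cells containing a point get the value 1; all other cells stay NA
occ <- rasterize(pts, template, field = 1)
cellStats(occ, sum)  # 2
```

The output raster inherits the template's extent and resolution, which is exactly why we pass the climate data as the second argument above.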

[Figure: rasterized bird occurrence records]

Now that we have our occurrences on a grid, we can finally make our bias layer. Let’s mask our occurrence data to the United States, so we capture the occurrences just within this country.

states <- readShapePoly("H:/Shapefiles/US_States/states.shp", proj4string = occur.ras@crs)
occur.states <- mask(occur.ras, states) %>% crop(states)

We will use the kde2d function from the MASS package, which gives us a two-dimensional kernel density estimate, based on the coordinates of the occurrence points. (Note: the output of this function is sensitive to the bandwidth selection; if in doubt, use the default.)

presences <- which(values(occur.states) == 1)
pres.locs <- coordinates(occur.states)[presences, ]

# note: kde2d's n argument is (x, y), i.e., (columns, rows)
dens <- kde2d(pres.locs[,1], pres.locs[,2], n = c(ncol(occur.states), nrow(occur.states)))
dens.ras <- raster(dens)
plot(dens.ras)
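A quick self-contained sketch of kde2d (MASS ships with R) with simulated points, showing how the n argument maps to the dimensions of the output grid:

```r
library(MASS)

# Toy point coordinates
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)

# n = c(25, 40): 25 grid points in the x direction, 40 in the y direction
dens <- kde2d(x, y, n = c(25, 40))

dim(dens$z)  # 25 40
```

The returned list has components x, y, and z, which is why raster() can convert it directly, the same way image() would consume it.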

[Figure: kernel density bias layer]

And that, dear readers, is our bias layer! Let’s save it for future use in MaxEnt:

writeRaster(dens.ras, "H:/Species Occurrences/Bird bias file.tif")

Now that that’s taken care of, we are finally free to start modeling individual species, which is comparatively simple. Let’s do it for the first species of bird we downloaded. The bias file is included in the MaxEnt model as one of the options you can specify with the args argument. (A complete list of model options can be found in the MaxEnt help file, or here.)

occurrences <- read.csv(occurdat[1])

# biasfile must point to a grid file saved on disk, not to an R object;
# depending on your MaxEnt version, you may need to save the bias layer
# in a format MaxEnt reads natively (e.g., .asc)
mod1 <- maxent(climdat, occurrences,
               args = "biasfile=H:/Species Occurrences/Bird bias file.tif")

Tah-dah! You just unbiased your occurrence data. Ms. Birdbrain would be proud.

Here are some examples of visualization of model output. These are for Ochotona princeps, the American pika.


Good luck, and happy modeling!


References:

  1. Beck, Jan, et al. “Spatial bias in the GBIF database and its effect on modeling species’ geographic distributions.” Ecological Informatics 19 (2014): 10-15.
  2. Elith, Jane, et al. “A statistical explanation of MaxEnt for ecologists.” Diversity and Distributions 17.1 (2011): 43-57.
  3. Phillips, Steven J., Robert P. Anderson, and Robert E. Schapire. “Maximum entropy modeling of species geographic distributions.” Ecological Modelling 190.3 (2006): 231-259.
  4. Phillips, Steven J., et al. “Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data.” Ecological Applications 19.1 (2009): 181-197.
  5. Syfert, Mindy M., Matthew J. Smith, and David A. Coomes. “The effects of sampling bias and model complexity on the predictive performance of MaxEnt species distribution models.” PloS one 8.2 (2013): e55158.