Why do kde estimates from ks and adehabitatHR packages in R produce different results? (with same bandwidth and grid)

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

Why do kde estimates from ks and adehabitatHR packages in R produce different results? (with same bandwidth and grid)

Brad Nissen
Hi All, 

I have a series of VHF radio-telemetry data from animals that I am hoping to be able analyze in R. I have done my research on the different packages available for this, and I believe that utilizing the package ks to select a plug-in bandwidth method for all individuals will be most appropriate, based on previous research with my study species. This is reinforced in section 4.3 of this spatial ecology book and in this paper by Millspaugh et al (2006)

I have written two R functions, initially to check that I got similar results, and then hopefully select one to analyze the home ranges of my study animals. My results from the two functions were not as similar as I hoped, and that is what brings me to this forum - to figure out why. The first function relies entirely on the ks package, by using the kde() function and a pre-defined grid expanded by a factor of the bandwidth to get the volume of the Utilization distribution (UD). The UD was then sorted by high to low density. I calculated the size of a grid cell, then calculated the cumulative sum of the grid cell volumes. I then took the number of cells in this list needed to give a cumulative volume of 0.95 or 0.50 (depending on if I am interested in the 95% or 50% kde area) and then multiplied that number of cells by the cell size to get the area. Perhaps there is a flaw in my logic here that I am not seeing, if so please alert me. 

The annotated R code for that function is here: 

kde_ks.hpi <- function(filename, percentage, gridsize, multiplier){
    data <- read.csv(file = filename)
    x <- as.data.frame(data$X)
    y <- as.data.frame(data$Y)
    loc <- cbind(x,y)
    data.h <-Hpi(loc)
    x.grid.size<- (gridsize) #Set grid size as no. of nodes in the x direction
    band.mult<- (multiplier)
    x=seq(min(loc[,1])-band.mult*sqrt(data.h[1,1]), max(loc[,1]) + band.mult *sqrt(data.h[1,1]), length.out=x.grid.size)
    y=seq(min(loc[,2])-band.mult*sqrt(data.h[2,2]), max(loc[,2])+  band.mult*sqrt(data.h[2,2]), by=(x[2]-x[1])) #sets out nodes in y axis, spaced the same as x axis
    eval.pts <-expand.grid(x,y)
    UD <-kde(loc,H=data.h,eval.points=eval.pts)
    output<- data.frame(cbind(UD$eval.points[,1],UD$eval.points[,2],UD$estimate))
    colnames(output)<- c("xcoord","ycoord","UD_ht")
    cell.area <- (x[2]-x[1])*(y[2]-y[1])
    grid.vol <- sum(cell.area*output[,3]) #check that grid UD vol is >=0.99
    vol<-output[,3]/sum(output[,3]) #standardize volume to add to 1
    output<-output[order(-output$vol),] #sort by descending volume
    cumV<-cumsum(output$vol) #calculate cumulative UD volume
    output<-data.frame(cbind(output,cumV)) #add cumulative vol to output
    sub.UD<-subset(output,output$cumV<=(percentage/100)) #subset output by percent vol
    cellcount <- nrow(sub.UD)
    area <- cellcount*cell.area
    KDE <- data.frame(c(area,grid.vol))

So as an example - 
kde_ks.hpi("C01 .csv", percentage=50,gridsize =150,multiplier =3) #I will attach a stripped down version of this data file if anyone is interested in reproducing my results. 

In this case - I found a 99% home range area of 3958 meter^2 for this individual. 

My second function is largely based on code that I found from the Spatial Ecology textbook that I referenced above. This code transforms the bandwidth and kde from the ks package with rasters to utilize functions in the adehabitatHR function to easily get UD volumes and areas. The difference between my code and the one in the book is that I define the grid using the same methods as the first function. Yet, I don't get the same results, despite using the same grid and bandwidth as I did in the previous function. Below is the annotated code for that function: 

CRS.SC <- CRS("+init=epsg:32616") #define study area projection first

kde_ade.ks.hpi <- function(filename, percentage, gridsize, multiplier){
  data <- read.csv(file = filename) #read in data
  x <- as.data.frame(data$X)
  y <- as.data.frame(data$Y)
  xy <- c(x,y)
  loc <- cbind(x,y)
  data.h <-Hpi(loc) #define bandwidth using plug-in method
  data.proj <- SpatialPointsDataFrame(xy,data, proj4string = CRS.SC)
  boundingVals <- data.proj@bbox #get the bounding values of the animal locations
  band.mult <- (multiplier) #define the value used to expand the evaluation grid
  y.expand <- band.mult*sqrt(data.h[2,2]) #set the value used to expand the grid by a function of the multiplier and the bandwidth
  x.expand <- band.mult*sqrt(data.h[1,1]) #note that this expansion is different in the x and y direction 
  deltaX <- as.integer(((boundingVals[1,2]) - (boundingVals[1,1])) + (2*x.expand)) #get the total length of the grid axis
  deltaY <- as.integer(((boundingVals[2,2]) - (boundingVals[2,1])) + (2*y.expand))
  x.grid.size <- (gridsize) #set the number of grid nodes in the X direction
  gridRes <- deltaX/x.grid.size #determine the grid resolution, (i.e the size of one side of a cell)
  y.grid.size <- deltaY/gridRes #determine the number of nodes in the Y direction using the same cell size as the x axis
  boundingVals[2,1] <- boundingVals[2,1] - y.expand #min Y - expand the in each direction grid by the previously determined value
  boundingVals[2,2] <- boundingVals[2,2] + y.expand #max Y
  boundingVals[1,1] <- boundingVals[1,1] - x.expand #min X
  boundingVals[1,2] <- boundingVals[1,2] + x.expand #max X
  gridTopo <- GridTopology((boundingVals[,1]), c(gridRes,gridRes),c(x.grid.size,y.grid.size)) #Grid Topology object is basis for sampling grid (offset, cellsize, dim)
  sampGrid <- SpatialGrid(gridTopo, proj4string = CRS.SC) #Using the Grid Topology and projection create a SpatialGrid class
  sampSP <- as(sampGrid, "SpatialPixels") #Cast over to Spatial Pixels
  sampRaster <- raster(sampGrid) #convert the SpatialGrid class to a raster
  sampRaster[] <- 1 #set all the raster values to 1 such as to make a data mask
  evalPoints <- xyFromCell(sampRaster, 1:ncell(sampRaster)) #Get the center points of the mask raster with values set to 1
  hpikde <- kde(x=loc, H=data.h, eval.points=evalPoints) #Create the KDE using the evaluation points
  hpikde.raster <- raster(sampRaster) #Create a template raster based upon the mask and then assign the values from the kde to the template
  hpikde.raster <- setValues(hpikde.raster,hpikde$estimate)
  hpikde.px <- as(hpikde.raster,"SpatialPixelsDataFrame") #Cast over to SPxDF
  hpikde.ud <- new("estUD", hpikde.px) #create new estUD using the SPxDF
  hpikde.ud@vol = FALSE #Assign values to a couple slots of the estUD
  hpikde.ud@h$meth = "Plug-in Bandwidth"
  hpikde.ud.vol <- getvolumeUD(hpikde.ud, standardize=TRUE) #Convert the UD values to volume using getvolumeUD from adehabitatHR and cast over to a raster
  hpikde.ud.vol.raster <- raster(hpikde.ud.vol)
  hpikde.vol <- getverticeshr(hpikde.ud, percent = percentage,ida = NULL, unin = "m", unout = "m2", standardize=TRUE) #Here we generate volume contours using the UD
  hpikde.vol$area #Determine UD area at that contour

So for example: 
> kde_ade.ks.hpi(filename = "C01 .csv", percentage=50,gridsize=150,multiplier=3)
[1] 3965.43 

So in this case - I found a 50% kde home range area of 3965 meter^2 for this individual. Not really that big a difference from the other value of 3958 m^2- but I would love to know where it comes from, as it the difference between the two functions can vary based on how I define the grid, multiplier, or percentage. At 99% the same data set produces a value of 67182 m^2 and 67453 m^2 for the first and second methods, respectively. In this case the second method produces a larger value both times, but when I try this with a different animal, the first method gives me larger areas. What is going on here, why is there a difference? Is one method better than another? What do you all recommend for moving forward? 

I am hoping to use these functions and alter them to produce shapefiles that I can then map in GIS, once my methodology is settled. 

Thank you very much for your time, 

Brad Nissen

--May the forest be with you!

R-sig-Geo mailing list
[hidden email]

C01 .csv (2K) Download Attachment