Quantcast

Extract polygon values from large raster dataset

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Extract polygon values from large raster dataset

Spencer Scheidt
Hello,

I am attempting to extract land cover values from the NLCD 2006 dataset (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted into a SpatialPolygons* object.  However, the NLCD raster is approximately 16 GB in size (30 m resolution), and the grid that I would like to work with contains 5520 polygons (0.5 degree resolution).

I have written a script based on previous postings that have asked a similar question and Applied Spatial Data Analysis in R by Bivand et. al, and used this approach successfully for smaller SpatialPolygons* objects with the same NLCD dataset.  

What I am wondering is if there is a quicker way to extract my polygon values, either by breaking the raster and polygons into smaller chunks, or using an entirely different approach.  I realize that the extreme sizes of my 2 objects may be the ultimate constraint, and if necessary I can reduce the resolution of my grid.

Below is my working code, which uses 2 different approaches, via rasterize (estimated to take 10+ days to run and cross tabulate) and extract (estimated to take over 30 days to run):

#load appropriate libraries
library(maps)
library(rgdal)
library(raster)
library(sp)
library(maptools)

#create bounding box
us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE)

#create grid
getClass("GridTopology")

cs<-c(0.5, 0.5)
cc<-us.box[,1] + (cs/2)
cd<-ceiling(diff(t(us.box))/cs)
us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd)
us.grid

#create spatial polygons
#projection is albers equal area conic
prj.string <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +units=m"
us.sp <- as.SpatialPolygons.GridTopology(us.grid, proj4string = CRS("+proj=longlat +ellps=WGS84"))
us.SP<-spTransform(us.sp, CRS(prj.string))

summary(us.SP)

#check to make sure ID values are appropriate - should be listed by row from top left
IDvaluesGridTopology(us.grid)
slot(us.SP, "polygons")

#load land cover data raster (16 GB)
env.data = raster('nlcd2006_landcover_4-20-11_se5.img')
projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96")

#create raster of polygons
poly.raster <- rasterize(us.SP, env.data, progress = 'window')
df <- crosstab(poly.raster, env.data, progress = 'window')

#maybe another way?
ex.test <- extract(env.data, us.SP, df=TRUE, progress = 'window')

Any help or advice on working with large datasets is much appreciated.  Thank you for your time.

Spencer

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Extract polygon values from large raster dataset

Matthew Landis
Spencer,  it's been awhile since I tried to use rasterize(), but when I
did, I found it very slow compared to alternative approaches, e.g. using
ArcGIS.  In the end, I discovered gdal_rasterize (
http://www.gdal.org/gdal_rasterize.html), which seems to be faster,
although I've never compared them head to head.  That's what I now use
on a regular basis.  Maybe it will work for you?

M

On 06/18/2012 07:07 PM, Spencer Scheidt wrote:

> Hello,
>
> I am attempting to extract land cover values from the NLCD 2006 dataset
> (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted into a
> SpatialPolygons* object.  However, the NLCD raster is approximately 16 GB in
> size (30 m resolution), and the grid that I would like to work with contains
> 5520 polygons (0.5 degree resolution).
>
> I have written a script based on previous postings that have asked a similar
> question and Applied Spatial Data Analysis in R by Bivand et. al, and used
> this approach successfully for smaller SpatialPolygons* objects with the
> same NLCD dataset.
>
> What I am wondering is if there is a quicker way to extract my polygon
> values, either by breaking the raster and polygons into smaller chunks, or
> using an entirely different approach.  I realize that the extreme sizes of
> my 2 objects may be the ultimate constraint, and if necessary I can reduce
> the resolution of my grid.
>
> Below is my working code, which uses 2 different approaches, via rasterize
> (estimated to take 10+ days to run and cross tabulate) and extract
> (estimated to take over 30 days to run):
>
> #load appropriate libraries
> library(maps)
> library(rgdal)
> library(raster)
> library(sp)
> library(maptools)
>
> #create bounding box
> us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE)
>
> #create grid
> getClass("GridTopology")
>
> cs<-c(0.5, 0.5)
> cc<-us.box[,1] + (cs/2)
> cd<-ceiling(diff(t(us.box))/cs)
> us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd)
> us.grid
>
> #create spatial polygons
> #projection is albers equal area conic
> prj.string<- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96
> +units=m"
> us.sp<- as.SpatialPolygons.GridTopology(us.grid, proj4string =
> CRS("+proj=longlat +ellps=WGS84"))
> us.SP<-spTransform(us.sp, CRS(prj.string))
>
> summary(us.SP)
>
> #check to make sure ID values are appropriate - should be listed by row from
> top left
> IDvaluesGridTopology(us.grid)
> slot(us.SP, "polygons")
>
> #load land cover data raster (16 GB)
> env.data = raster('nlcd2006_landcover_4-20-11_se5.img')
> projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23
> +lon_0=-96")
>
> #create raster of polygons
> poly.raster<- rasterize(us.SP, env.data, progress = 'window')
> df<- crosstab(poly.raster, env.data, progress = 'window')
>
> #maybe another way?
> ex.test<- extract(env.data, us.SP, df=TRUE, progress = 'window')
>
> Any help or advice on working with large datasets is much appreciated.
> Thank you for your time.
>
> Spencer
>
>
>
> --
> View this message in context: http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html
> Sent from the R-sig-geo mailing list archive at Nabble.com.
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo


--
~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthew Landis, Ph.D.
Research Scientist
ISciences, LLC
61 Main St. Suite 200
Burlington VT 05401
802.864.2999
www.isciences.com
~~~~~~~~~~~~~~~~~~~~~~~~~~

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Extract polygon values from large raster dataset

Mathieu Basille-2
You might also want to consider using PostGIS [1], which now handles
rasters by default in the recent 2.0 version, and was precisely developed
to handle large maps easily. Note that it uses GDAL for some raster
functions, so that it might not be faster than GDAL itself...

Mathieu.

[1] http://postgis.refractions.net/


Le 18/06/2012 21:54, Matthew Landis a écrit :

> Spencer, it's been awhile since I tried to use rasterize(), but when I did,
> I found it very slow compared to alternative approaches, e.g. using ArcGIS.
> In the end, I discovered gdal_rasterize (
> http://www.gdal.org/gdal_rasterize.html), which seems to be faster,
> although I've never compared them head to head. That's what I now use on a
> regular basis. Maybe it will work for you?
>
> M
>
> On 06/18/2012 07:07 PM, Spencer Scheidt wrote:
>> Hello,
>>
>> I am attempting to extract land cover values from the NLCD 2006 dataset
>> (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted into a
>> SpatialPolygons* object. However, the NLCD raster is approximately 16 GB in
>> size (30 m resolution), and the grid that I would like to work with contains
>> 5520 polygons (0.5 degree resolution).
>>
>> I have written a script based on previous postings that have asked a similar
>> question and Applied Spatial Data Analysis in R by Bivand et. al, and used
>> this approach successfully for smaller SpatialPolygons* objects with the
>> same NLCD dataset.
>>
>> What I am wondering is if there is a quicker way to extract my polygon
>> values, either by breaking the raster and polygons into smaller chunks, or
>> using an entirely different approach. I realize that the extreme sizes of
>> my 2 objects may be the ultimate constraint, and if necessary I can reduce
>> the resolution of my grid.
>>
>> Below is my working code, which uses 2 different approaches, via rasterize
>> (estimated to take 10+ days to run and cross tabulate) and extract
>> (estimated to take over 30 days to run):
>>
>> #load appropriate libraries
>> library(maps)
>> library(rgdal)
>> library(raster)
>> library(sp)
>> library(maptools)
>>
>> #create bounding box
>> us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE)
>>
>> #create grid
>> getClass("GridTopology")
>>
>> cs<-c(0.5, 0.5)
>> cc<-us.box[,1] + (cs/2)
>> cd<-ceiling(diff(t(us.box))/cs)
>> us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd)
>> us.grid
>>
>> #create spatial polygons
>> #projection is albers equal area conic
>> prj.string<- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96
>> +units=m"
>> us.sp<- as.SpatialPolygons.GridTopology(us.grid, proj4string =
>> CRS("+proj=longlat +ellps=WGS84"))
>> us.SP<-spTransform(us.sp, CRS(prj.string))
>>
>> summary(us.SP)
>>
>> #check to make sure ID values are appropriate - should be listed by row from
>> top left
>> IDvaluesGridTopology(us.grid)
>> slot(us.SP, "polygons")
>>
>> #load land cover data raster (16 GB)
>> env.data = raster('nlcd2006_landcover_4-20-11_se5.img')
>> projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23
>> +lon_0=-96")
>>
>> #create raster of polygons
>> poly.raster<- rasterize(us.SP, env.data, progress = 'window')
>> df<- crosstab(poly.raster, env.data, progress = 'window')
>>
>> #maybe another way?
>> ex.test<- extract(env.data, us.SP, df=TRUE, progress = 'window')
>>
>> Any help or advice on working with large datasets is much appreciated.
>> Thank you for your time.
>>
>> Spencer
>>
>>
>>
>> --
>> View this message in context:
>> http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html
>>
>> Sent from the R-sig-geo mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> [hidden email]
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
>

--

~$ whoami
Mathieu Basille, Post-Doc

~$ locate
Laboratoire d'Écologie Comportementale et de Conservation de la Faune
+ Centre d'Étude de la Forêt
Département de Biologie
Université Laval, Québec

~$ info
http://ase-research.org/basille

~$ fortune
``If you can't win by reason, go for volume.''
Calvin, by Bill Watterson.

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Interpretation of simultaneous and conditional autoregressive models

Mitra

Hi,i have a data set and I modeled the response, Y~X1+X2 (where Y=yield, X1=temperature, X2=Precipitation)  using SAR and CAR models.. Under both models, it seems that there is significant spatial correlation in the residuals
because the estimated value of lambda is 0.12087 and the p-value of the likelihood ratio
test is 2.22x10-16. (i.e. I reject null hypothesis concluding that there is spatial auto correlation in the residuals.)My question is: what this means in terms of precipitation and temperature? In other words, How do I explain the meaning of interpretation to non statisticians? (for some boday who has not taken spatial statistics courses??I would really appreciate any clarification from the experts.Thanks,Mitra DevkotaGraduate Research AssistantSDSU, Brookings571 242 6140


 




> Date: Mon, 18 Jun 2012 22:34:46 -0400
> From: [hidden email]
> CC: [hidden email]
> Subject: Re: [R-sig-Geo] Extract polygon values from large raster dataset
>
> You might also want to consider using PostGIS [1], which now handles
> rasters by default in the recent 2.0 version, and was precisely developed
> to handle large maps easily. Note that it uses GDAL for some raster
> functions, so that it might not be faster than GDAL itself...
>
> Mathieu.
>
> [1] http://postgis.refractions.net/
>
>
> Le 18/06/2012 21:54, Matthew Landis a écrit :
> > Spencer, it's been awhile since I tried to use rasterize(), but when I did,
> > I found it very slow compared to alternative approaches, e.g. using ArcGIS.
> > In the end, I discovered gdal_rasterize (
> > http://www.gdal.org/gdal_rasterize.html), which seems to be faster,
> > although I've never compared them head to head. That's what I now use on a
> > regular basis. Maybe it will work for you?
> >
> > M
> >
> > On 06/18/2012 07:07 PM, Spencer Scheidt wrote:
> >> Hello,
> >>
> >> I am attempting to extract land cover values from the NLCD 2006 dataset
> >> (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted into a
> >> SpatialPolygons* object. However, the NLCD raster is approximately 16 GB in
> >> size (30 m resolution), and the grid that I would like to work with contains
> >> 5520 polygons (0.5 degree resolution).
> >>
> >> I have written a script based on previous postings that have asked a similar
> >> question and Applied Spatial Data Analysis in R by Bivand et. al, and used
> >> this approach successfully for smaller SpatialPolygons* objects with the
> >> same NLCD dataset.
> >>
> >> What I am wondering is if there is a quicker way to extract my polygon
> >> values, either by breaking the raster and polygons into smaller chunks, or
> >> using an entirely different approach. I realize that the extreme sizes of
> >> my 2 objects may be the ultimate constraint, and if necessary I can reduce
> >> the resolution of my grid.
> >>
> >> Below is my working code, which uses 2 different approaches, via rasterize
> >> (estimated to take 10+ days to run and cross tabulate) and extract
> >> (estimated to take over 30 days to run):
> >>
> >> #load appropriate libraries
> >> library(maps)
> >> library(rgdal)
> >> library(raster)
> >> library(sp)
> >> library(maptools)
> >>
> >> #create bounding box
> >> us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE)
> >>
> >> #create grid
> >> getClass("GridTopology")
> >>
> >> cs<-c(0.5, 0.5)
> >> cc<-us.box[,1] + (cs/2)
> >> cd<-ceiling(diff(t(us.box))/cs)
> >> us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd)
> >> us.grid
> >>
> >> #create spatial polygons
> >> #projection is albers equal area conic
> >> prj.string<- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96
> >> +units=m"
> >> us.sp<- as.SpatialPolygons.GridTopology(us.grid, proj4string =
> >> CRS("+proj=longlat +ellps=WGS84"))
> >> us.SP<-spTransform(us.sp, CRS(prj.string))
> >>
> >> summary(us.SP)
> >>
> >> #check to make sure ID values are appropriate - should be listed by row from
> >> top left
> >> IDvaluesGridTopology(us.grid)
> >> slot(us.SP, "polygons")
> >>
> >> #load land cover data raster (16 GB)
> >> env.data = raster('nlcd2006_landcover_4-20-11_se5.img')
> >> projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23
> >> +lon_0=-96")
> >>
> >> #create raster of polygons
> >> poly.raster<- rasterize(us.SP, env.data, progress = 'window')
> >> df<- crosstab(poly.raster, env.data, progress = 'window')
> >>
> >> #maybe another way?
> >> ex.test<- extract(env.data, us.SP, df=TRUE, progress = 'window')
> >>
> >> Any help or advice on working with large datasets is much appreciated.
> >> Thank you for your time.
> >>
> >> Spencer
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html
> >>
> >> Sent from the R-sig-geo mailing list archive at Nabble.com.
> >>
> >> _______________________________________________
> >> R-sig-Geo mailing list
> >> [hidden email]
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >
> >
>
> --
>
> ~$ whoami
> Mathieu Basille, Post-Doc
>
> ~$ locate
> Laboratoire d'Écologie Comportementale et de Conservation de la Faune
> + Centre d'Étude de la Forêt
> Département de Biologie
> Université Laval, Québec
>
> ~$ info
> http://ase-research.org/basille
>
> ~$ fortune
> ``If you can't win by reason, go for volume.''
> Calvin, by Bill Watterson.
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
     
        [[alternative HTML version deleted]]


_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Extract polygon values from large raster dataset

Brad Nesom
In reply to this post by Spencer Scheidt
WOW!

I appreciate the insight. I constantly need rasterize. but have very little
working knowledge of using it.

I guess patience is the first ingredient.

brad

On Mon, Jun 18, 2012 at 6:07 PM, Spencer Scheidt <[hidden email]>wrote:

> Hello,
>
> I am attempting to extract land cover values from the NLCD 2006 dataset
> (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted
> into a
> SpatialPolygons* object.  However, the NLCD raster is approximately 16 GB
> in
> size (30 m resolution), and the grid that I would like to work with
> contains
> 5520 polygons (0.5 degree resolution).
>
> I have written a script based on previous postings that have asked a
> similar
> question and Applied Spatial Data Analysis in R by Bivand et. al, and used
> this approach successfully for smaller SpatialPolygons* objects with the
> same NLCD dataset.
>
> What I am wondering is if there is a quicker way to extract my polygon
> values, either by breaking the raster and polygons into smaller chunks, or
> using an entirely different approach.  I realize that the extreme sizes of
> my 2 objects may be the ultimate constraint, and if necessary I can reduce
> the resolution of my grid.
>
> Below is my working code, which uses 2 different approaches, via rasterize
> (estimated to take 10+ days to run and cross tabulate) and extract
> (estimated to take over 30 days to run):
>
> #load appropriate libraries
> library(maps)
> library(rgdal)
> library(raster)
> library(sp)
> library(maptools)
>
> #create bounding box
> us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE)
>
> #create grid
> getClass("GridTopology")
>
> cs<-c(0.5, 0.5)
> cc<-us.box[,1] + (cs/2)
> cd<-ceiling(diff(t(us.box))/cs)
> us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim =
> cd)
> us.grid
>
> #create spatial polygons
> #projection is albers equal area conic
> prj.string <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96
> +units=m"
> us.sp <- as.SpatialPolygons.GridTopology(us.grid, proj4string =
> CRS("+proj=longlat +ellps=WGS84"))
> us.SP<-spTransform(us.sp, CRS(prj.string))
>
> summary(us.SP)
>
> #check to make sure ID values are appropriate - should be listed by row
> from
> top left
> IDvaluesGridTopology(us.grid)
> slot(us.SP, "polygons")
>
> #load land cover data raster (16 GB)
> env.data = raster('nlcd2006_landcover_4-20-11_se5.img')
> projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23
> +lon_0=-96")
>
> #create raster of polygons
> poly.raster <- rasterize(us.SP, env.data, progress = 'window')
> df <- crosstab(poly.raster, env.data, progress = 'window')
>
> #maybe another way?
> ex.test <- extract(env.data, us.SP, df=TRUE, progress = 'window')
>
> Any help or advice on working with large datasets is much appreciated.
> Thank you for your time.
>
> Spencer
>
>
>
> --
> View this message in context:
> http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html
> Sent from the R-sig-geo mailing list archive at Nabble.com.
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
>


--
[image: Steven Wright]<http://www.goodreads.com/author/show/181771.Steven_Wright>
"I have a map of the United States... Actual size. It says, 'Scale: 1 mile
= 1 mile.' I spent last summer folding it. I hardly ever unroll it. People
ask me where I live, and I say, 'E6."
-- Steven Wright <http://www.goodreads.com/author/show/181771.Steven_Wright>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Extract polygon values from large raster dataset

Robert Hijmans
In reply to this post by Spencer Scheidt
Spencer,

That is an interesting challenge. Your approaches are fine in principle,
but you have a raster of 30 meter resolution for the conterminous US ( ~17
billion cells ) and, if I understand you well, you want to tabulate the
values by polygon, using irregular polygons (relative to the raster) of
0.5x0.5 degrees.  That is a good case study for me to improve the speed of
rasterize (which is used in both of your methods as extract will call it
for each polygon); and I might work on that this summer.

For now, I have not much help to offer, except that you can probably make
things more efficient by removing the polygons that are outside the US:

#load appropriate libraries
library(rgdal)
library(raster)
library(rgeos)

r <- raster(xmn=-124.5, xmx=-67, ymn=25, ymx=49, nrow=48, ncol=115)
r[] <- 1:ncell(r)
p <- as(r, 'SpatialPolygonsDataFrame')

# remove the cells that are outside of the USA
# that is 35% and could saves a lot of processing time

usa <- getData('GADM', country='USA', level=0)
i <- gIntersects(p, usa, byid=TRUE)  #some patience required here
pp <- p[as.vector(i), ]

#  projection (changed to the exact spec of the landcover data)
prj <- CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0
+y_0=0 +ellps=GRS80 +units=m")
us <- spTransform(pp, prj)

#load land cover data raster (16 GB)
env <- raster('nlcd2006_landcover_4-20-11_se5.img', RAT=F)

plot(env) #remakably fast!
plot(us, add=TRUE, border='red')

NAvalue(env) <- 0

rasterize/crosstab is probably faster (according to you numbers), but if
you are going the 'extract' route, you need to do some looping, otherwise
you will attempt to create a list with 10 billion values, and that will
likely not work. Something like this, which is indeed rather slow:

res <- list()
for (i in 1:nrow(us)) {
print( system.time( ex <- extract(env, us[i,]) ) )
res[i] <- table(ex[[1]])
cat(i, '\n')
flush.console()
}

# and now process res, but that should be easy

Best, Robert

On Mon, Jun 18, 2012 at 4:07 PM, Spencer Scheidt <[hidden email]>wrote:

> Hello,
>
> I am attempting to extract land cover values from the NLCD 2006 dataset
> (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted
> into a
> SpatialPolygons* object.  However, the NLCD raster is approximately 16 GB
> in
> size (30 m resolution), and the grid that I would like to work with
> contains
> 5520 polygons (0.5 degree resolution).
>
> I have written a script based on previous postings that have asked a
> similar
> question and Applied Spatial Data Analysis in R by Bivand et. al, and used
> this approach successfully for smaller SpatialPolygons* objects with the
> same NLCD dataset.
>
> What I am wondering is if there is a quicker way to extract my polygon
> values, either by breaking the raster and polygons into smaller chunks, or
> using an entirely different approach.  I realize that the extreme sizes of
> my 2 objects may be the ultimate constraint, and if necessary I can reduce
> the resolution of my grid.
>
> Below is my working code, which uses 2 different approaches, via rasterize
> (estimated to take 10+ days to run and cross tabulate) and extract
> (estimated to take over 30 days to run):
>
> #load appropriate libraries
> library(maps)
> library(rgdal)
> library(raster)
> library(sp)
> library(maptools)
>
> #create bounding box
> us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE)
>
> #create grid
> getClass("GridTopology")
>
> cs<-c(0.5, 0.5)
> cc<-us.box[,1] + (cs/2)
> cd<-ceiling(diff(t(us.box))/cs)
> us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim =
> cd)
> us.grid
>
> #create spatial polygons
> #projection is albers equal area conic
> prj.string <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96
> +units=m"
> us.sp <- as.SpatialPolygons.GridTopology(us.grid, proj4string =
> CRS("+proj=longlat +ellps=WGS84"))
> us.SP<-spTransform(us.sp, CRS(prj.string))
>
> summary(us.SP)
>
> #check to make sure ID values are appropriate - should be listed by row
> from
> top left
> IDvaluesGridTopology(us.grid)
> slot(us.SP, "polygons")
>
> #load land cover data raster (16 GB)
> env.data = raster('nlcd2006_landcover_4-20-11_se5.img')
> projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23
> +lon_0=-96")
>
> #create raster of polygons
> poly.raster <- rasterize(us.SP, env.data, progress = 'window')
> df <- crosstab(poly.raster, env.data, progress = 'window')
>
> #maybe another way?
> ex.test <- extract(env.data, us.SP, df=TRUE, progress = 'window')
>
> Any help or advice on working with large datasets is much appreciated.
> Thank you for your time.
>
> Spencer
>
>
>
> --
> View this message in context:
> http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html
> Sent from the R-sig-geo mailing list archive at Nabble.com.
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Extract polygon values from large raster dataset

Robert Hijmans
Spencer,

After some tinkering... you could reclass to get single class (land cover
type) layer, then aggregate, then extract. For me, that takes about 4 hours
per class.

> rcl <- matrix(c(11,12,1, 21,100,0), byrow=T, ncol=3)
> system.time(cls11_12 <- reclass(env, rcl, right=FALSE,
filename='class_11-12', progress='text'))
   user  system elapsed
6494.48  599.67 8864.11

# I aggregate 100 times, cells still only 3 by 3 km, much smaller than
polygons
> system.time( a <- aggregate(cls11_12, 100, fun=sum,  filename='abc',
progress='text') )
   user  system elapsed
4248.00  276.85 4565.93


> system.time(x <- extract(a, us))
   user  system elapsed
 877.60   23.07  900.74


> (8864.11 +  4565.93 + 900.74 ) / 3600
[1] 3.980772

given that there are 16 classes (11, 12, 21, 22, 23, 24, 31, 41, 42, 43,
52, 71, 81, 82, 90, 95)
you could run this in about 2.5 days, that is not that bad :), if you
actually need ALL classes, otherwise it could be much faster.

Robert

On Tue, Jun 19, 2012 at 11:31 AM, Robert J. Hijmans <[hidden email]>wrote:

> Spencer,
>
> That is an interesting challenge. Your approaches are fine in principle,
> but you have a raster of 30 meter resolution for the conterminous US ( ~17
> billion cells ) and, if I understand you well, you want to tabulate the
> values by polygon, using irregular polygons (relative to the raster) of
> 0.5x0.5 degrees.  That is a good case study for me to improve the speed of
> rasterize (which is used in both of your methods as extract will call it
> for each polygon); and I might work on that this summer.
>
> For now, I have not much help to offer, except that you can probably make
> things more efficient by removing the polygons that are outside the US:
>
> #load appropriate libraries
> library(rgdal)
> library(raster)
> library(rgeos)
>
> r <- raster(xmn=-124.5, xmx=-67, ymn=25, ymx=49, nrow=48, ncol=115)
> r[] <- 1:ncell(r)
> p <- as(r, 'SpatialPolygonsDataFrame')
>
> # remove the cells that are outside of the USA
> # that is 35% and could saves a lot of processing time
>
> usa <- getData('GADM', country='USA', level=0)
> i <- gIntersects(p, usa, byid=TRUE)  #some patience required here
> pp <- p[as.vector(i), ]
>
> #  projection (changed to the exact spec of the landcover data)
> prj <- CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0
> +y_0=0 +ellps=GRS80 +units=m")
> us <- spTransform(pp, prj)
>
> #load land cover data raster (16 GB)
> env <- raster('nlcd2006_landcover_4-20-11_se5.img', RAT=F)
>
> plot(env) #remakably fast!
> plot(us, add=TRUE, border='red')
>
> NAvalue(env) <- 0
>
> rasterize/crosstab is probably faster (according to you numbers), but if
> you are going the 'extract' route, you need to do some looping, otherwise
> you will attempt to create a list with 10 billion values, and that will
> likely not work. Something like this, which is indeed rather slow:
>
> res <- list()
> for (i in 1:nrow(us)) {
> print( system.time( ex <- extract(env, us[i,]) ) )
> res[i] <- table(ex[[1]])
>  cat(i, '\n')
> flush.console()
> }
>
> # and now process res, but that should be easy
>
> Best, Robert
>
> On Mon, Jun 18, 2012 at 4:07 PM, Spencer Scheidt <[hidden email]>wrote:
>
>> Hello,
>>
>> I am attempting to extract land cover values from the NLCD 2006 dataset
>> (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted
>> into a
>> SpatialPolygons* object.  However, the NLCD raster is approximately 16 GB
>> in
>> size (30 m resolution), and the grid that I would like to work with
>> contains
>> 5520 polygons (0.5 degree resolution).
>>
>> I have written a script based on previous postings that have asked a
>> similar
>> question and Applied Spatial Data Analysis in R by Bivand et. al, and used
>> this approach successfully for smaller SpatialPolygons* objects with the
>> same NLCD dataset.
>>
>> What I am wondering is if there is a quicker way to extract my polygon
>> values, either by breaking the raster and polygons into smaller chunks, or
>> using an entirely different approach.  I realize that the extreme sizes of
>> my 2 objects may be the ultimate constraint, and if necessary I can reduce
>> the resolution of my grid.
>>
>> Below is my working code, which uses 2 different approaches, via rasterize
>> (estimated to take 10+ days to run and cross tabulate) and extract
>> (estimated to take over 30 days to run):
>>
>> #load appropriate libraries
>> library(maps)
>> library(rgdal)
>> library(raster)
>> library(sp)
>> library(maptools)
>>
>> #create bounding box
>> us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE)
>>
>> #create grid
>> getClass("GridTopology")
>>
>> cs<-c(0.5, 0.5)
>> cc<-us.box[,1] + (cs/2)
>> cd<-ceiling(diff(t(us.box))/cs)
>> us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim =
>> cd)
>> us.grid
>>
>> #create spatial polygons
>> #projection is albers equal area conic
>> prj.string <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96
>> +units=m"
>> us.sp <- as.SpatialPolygons.GridTopology(us.grid, proj4string =
>> CRS("+proj=longlat +ellps=WGS84"))
>> us.SP<-spTransform(us.sp, CRS(prj.string))
>>
>> summary(us.SP)
>>
>> #check to make sure ID values are appropriate - should be listed by row
>> from
>> top left
>> IDvaluesGridTopology(us.grid)
>> slot(us.SP, "polygons")
>>
>> #load land cover data raster (16 GB)
>> env.data = raster('nlcd2006_landcover_4-20-11_se5.img')
>> projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23
>> +lon_0=-96")
>>
>> #create raster of polygons
>> poly.raster <- rasterize(us.SP, env.data, progress = 'window')
>> df <- crosstab(poly.raster, env.data, progress = 'window')
>>
>> #maybe another way?
>> ex.test <- extract(env.data, us.SP, df=TRUE, progress = 'window')
>>
>> Any help or advice on working with large datasets is much appreciated.
>> Thank you for your time.
>>
>> Spencer
>>
>>
>>
>> --
>> View this message in context:
>> http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html
>> Sent from the R-sig-geo mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> [hidden email]
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Loading...