|
Hello,
I am attempting to extract land cover values from the NLCD 2006 dataset (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted into a SpatialPolygons* object. However, the NLCD raster is approximately 16 GB in size (30 m resolution), and the grid that I would like to work with contains 5520 polygons (0.5 degree resolution). I have written a script based on previous postings that have asked a similar question and Applied Spatial Data Analysis in R by Bivand et. al, and used this approach successfully for smaller SpatialPolygons* objects with the same NLCD dataset. What I am wondering is if there is a quicker way to extract my polygon values, either by breaking the raster and polygons into smaller chunks, or using an entirely different approach. I realize that the extreme sizes of my 2 objects may be the ultimate constraint, and if necessary I can reduce the resolution of my grid. Below is my working code, which uses 2 different approaches, via rasterize (estimated to take 10+ days to run and cross tabulate) and extract (estimated to take over 30 days to run): #load appropriate libraries library(maps) library(rgdal) library(raster) library(sp) library(maptools) #create bounding box us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE) #create grid getClass("GridTopology") cs<-c(0.5, 0.5) cc<-us.box[,1] + (cs/2) cd<-ceiling(diff(t(us.box))/cs) us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd) us.grid #create spatial polygons #projection is albers equal area conic prj.string <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +units=m" us.sp <- as.SpatialPolygons.GridTopology(us.grid, proj4string = CRS("+proj=longlat +ellps=WGS84")) us.SP<-spTransform(us.sp, CRS(prj.string)) summary(us.SP) #check to make sure ID values are appropriate - should be listed by row from top left IDvaluesGridTopology(us.grid) slot(us.SP, "polygons") #load land cover data raster (16 GB) env.data = raster('nlcd2006_landcover_4-20-11_se5.img') projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96") #create raster of polygons poly.raster <- rasterize(us.SP, env.data, progress = 'window') df <- crosstab(poly.raster, env.data, progress = 'window') #maybe another way? ex.test <- extract(env.data, us.SP, df=TRUE, progress = 'window') Any help or advice on working with large datasets is much appreciated. Thank you for your time. Spencer |
|
Spencer, it's been awhile since I tried to use rasterize(), but when I
did, I found it very slow compared to alternative approaches, e.g. using ArcGIS. In the end, I discovered gdal_rasterize ( http://www.gdal.org/gdal_rasterize.html), which seems to be faster, although I've never compared them head to head. That's what I now use on a regular basis. Maybe it will work for you? M On 06/18/2012 07:07 PM, Spencer Scheidt wrote: > Hello, > > I am attempting to extract land cover values from the NLCD 2006 dataset > (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted into a > SpatialPolygons* object. However, the NLCD raster is approximately 16 GB in > size (30 m resolution), and the grid that I would like to work with contains > 5520 polygons (0.5 degree resolution). > > I have written a script based on previous postings that have asked a similar > question and Applied Spatial Data Analysis in R by Bivand et. al, and used > this approach successfully for smaller SpatialPolygons* objects with the > same NLCD dataset. > > What I am wondering is if there is a quicker way to extract my polygon > values, either by breaking the raster and polygons into smaller chunks, or > using an entirely different approach. I realize that the extreme sizes of > my 2 objects may be the ultimate constraint, and if necessary I can reduce > the resolution of my grid. > > Below is my working code, which uses 2 different approaches, via rasterize > (estimated to take 10+ days to run and cross tabulate) and extract > (estimated to take over 30 days to run): > > #load appropriate libraries > library(maps) > library(rgdal) > library(raster) > library(sp) > library(maptools) > > #create bounding box > us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE) > > #create grid > getClass("GridTopology") > > cs<-c(0.5, 0.5) > cc<-us.box[,1] + (cs/2) > cd<-ceiling(diff(t(us.box))/cs) > us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd) > us.grid > > #create spatial polygons > #projection is albers equal area conic > prj.string<- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 > +units=m" > us.sp<- as.SpatialPolygons.GridTopology(us.grid, proj4string = > CRS("+proj=longlat +ellps=WGS84")) > us.SP<-spTransform(us.sp, CRS(prj.string)) > > summary(us.SP) > > #check to make sure ID values are appropriate - should be listed by row from > top left > IDvaluesGridTopology(us.grid) > slot(us.SP, "polygons") > > #load land cover data raster (16 GB) > env.data = raster('nlcd2006_landcover_4-20-11_se5.img') > projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 > +lon_0=-96") > > #create raster of polygons > poly.raster<- rasterize(us.SP, env.data, progress = 'window') > df<- crosstab(poly.raster, env.data, progress = 'window') > > #maybe another way? > ex.test<- extract(env.data, us.SP, df=TRUE, progress = 'window') > > Any help or advice on working with large datasets is much appreciated. > Thank you for your time. > > Spencer > > > > -- > View this message in context: http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html > Sent from the R-sig-geo mailing list archive at Nabble.com. > > _______________________________________________ > R-sig-Geo mailing list > [hidden email] > https://stat.ethz.ch/mailman/listinfo/r-sig-geo -- ~~~~~~~~~~~~~~~~~~~~~~~~~~ Matthew Landis, Ph.D. Research Scientist ISciences, LLC 61 Main St. Suite 200 Burlington VT 05401 802.864.2999 www.isciences.com ~~~~~~~~~~~~~~~~~~~~~~~~~~ _______________________________________________ R-sig-Geo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/r-sig-geo |
|
You might also want to consider using PostGIS [1], which now handles
rasters by default in the recent 2.0 version, and was precisely developed to handle large maps easily. Note that it uses GDAL for some raster functions, so that it might not be faster than GDAL itself... Mathieu. [1] http://postgis.refractions.net/ Le 18/06/2012 21:54, Matthew Landis a écrit : > Spencer, it's been awhile since I tried to use rasterize(), but when I did, > I found it very slow compared to alternative approaches, e.g. using ArcGIS. > In the end, I discovered gdal_rasterize ( > http://www.gdal.org/gdal_rasterize.html), which seems to be faster, > although I've never compared them head to head. That's what I now use on a > regular basis. Maybe it will work for you? > > M > > On 06/18/2012 07:07 PM, Spencer Scheidt wrote: >> Hello, >> >> I am attempting to extract land cover values from the NLCD 2006 dataset >> (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted into a >> SpatialPolygons* object. However, the NLCD raster is approximately 16 GB in >> size (30 m resolution), and the grid that I would like to work with contains >> 5520 polygons (0.5 degree resolution). >> >> I have written a script based on previous postings that have asked a similar >> question and Applied Spatial Data Analysis in R by Bivand et. al, and used >> this approach successfully for smaller SpatialPolygons* objects with the >> same NLCD dataset. >> >> What I am wondering is if there is a quicker way to extract my polygon >> values, either by breaking the raster and polygons into smaller chunks, or >> using an entirely different approach. I realize that the extreme sizes of >> my 2 objects may be the ultimate constraint, and if necessary I can reduce >> the resolution of my grid. >> >> Below is my working code, which uses 2 different approaches, via rasterize >> (estimated to take 10+ days to run and cross tabulate) and extract >> (estimated to take over 30 days to run): >> >> #load appropriate libraries >> library(maps) >> library(rgdal) >> library(raster) >> library(sp) >> library(maptools) >> >> #create bounding box >> us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE) >> >> #create grid >> getClass("GridTopology") >> >> cs<-c(0.5, 0.5) >> cc<-us.box[,1] + (cs/2) >> cd<-ceiling(diff(t(us.box))/cs) >> us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd) >> us.grid >> >> #create spatial polygons >> #projection is albers equal area conic >> prj.string<- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 >> +units=m" >> us.sp<- as.SpatialPolygons.GridTopology(us.grid, proj4string = >> CRS("+proj=longlat +ellps=WGS84")) >> us.SP<-spTransform(us.sp, CRS(prj.string)) >> >> summary(us.SP) >> >> #check to make sure ID values are appropriate - should be listed by row from >> top left >> IDvaluesGridTopology(us.grid) >> slot(us.SP, "polygons") >> >> #load land cover data raster (16 GB) >> env.data = raster('nlcd2006_landcover_4-20-11_se5.img') >> projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 >> +lon_0=-96") >> >> #create raster of polygons >> poly.raster<- rasterize(us.SP, env.data, progress = 'window') >> df<- crosstab(poly.raster, env.data, progress = 'window') >> >> #maybe another way? >> ex.test<- extract(env.data, us.SP, df=TRUE, progress = 'window') >> >> Any help or advice on working with large datasets is much appreciated. >> Thank you for your time. >> >> Spencer >> >> >> >> -- >> View this message in context: >> http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html >> >> Sent from the R-sig-geo mailing list archive at Nabble.com. >> >> _______________________________________________ >> R-sig-Geo mailing list >> [hidden email] >> https://stat.ethz.ch/mailman/listinfo/r-sig-geo > > -- ~$ whoami Mathieu Basille, Post-Doc ~$ locate Laboratoire d'Écologie Comportementale et de Conservation de la Faune + Centre d'Étude de la Forêt Département de Biologie Université Laval, Québec ~$ info http://ase-research.org/basille ~$ fortune ``If you can't win by reason, go for volume.'' Calvin, by Bill Watterson. _______________________________________________ R-sig-Geo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/r-sig-geo |
|
Hi,i have a data set and I modeled the response, Y~X1+X2 (where Y=yield, X1=temperature, X2=Precipitation) using SAR and CAR models.. Under both models, it seems that there is significant spatial correlation in the residuals because the estimated value of lambda is 0.12087 and the p-value of the likelihood ratio test is 2.22x10-16. (i.e. I reject null hypothesis concluding that there is spatial auto correlation in the residuals.)My question is: what this means in terms of precipitation and temperature? In other words, How do I explain the meaning of interpretation to non statisticians? (for some boday who has not taken spatial statistics courses??I would really appreciate any clarification from the experts.Thanks,Mitra DevkotaGraduate Research AssistantSDSU, Brookings571 242 6140 > Date: Mon, 18 Jun 2012 22:34:46 -0400 > From: [hidden email] > CC: [hidden email] > Subject: Re: [R-sig-Geo] Extract polygon values from large raster dataset > > You might also want to consider using PostGIS [1], which now handles > rasters by default in the recent 2.0 version, and was precisely developed > to handle large maps easily. Note that it uses GDAL for some raster > functions, so that it might not be faster than GDAL itself... > > Mathieu. > > [1] http://postgis.refractions.net/ > > > Le 18/06/2012 21:54, Matthew Landis a écrit : > > Spencer, it's been awhile since I tried to use rasterize(), but when I did, > > I found it very slow compared to alternative approaches, e.g. using ArcGIS. > > In the end, I discovered gdal_rasterize ( > > http://www.gdal.org/gdal_rasterize.html), which seems to be faster, > > although I've never compared them head to head. That's what I now use on a > > regular basis. Maybe it will work for you? > > > > M > > > > On 06/18/2012 07:07 PM, Spencer Scheidt wrote: > >> Hello, > >> > >> I am attempting to extract land cover values from the NLCD 2006 dataset > >> (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted into a > >> SpatialPolygons* object. However, the NLCD raster is approximately 16 GB in > >> size (30 m resolution), and the grid that I would like to work with contains > >> 5520 polygons (0.5 degree resolution). > >> > >> I have written a script based on previous postings that have asked a similar > >> question and Applied Spatial Data Analysis in R by Bivand et. al, and used > >> this approach successfully for smaller SpatialPolygons* objects with the > >> same NLCD dataset. > >> > >> What I am wondering is if there is a quicker way to extract my polygon > >> values, either by breaking the raster and polygons into smaller chunks, or > >> using an entirely different approach. I realize that the extreme sizes of > >> my 2 objects may be the ultimate constraint, and if necessary I can reduce > >> the resolution of my grid. > >> > >> Below is my working code, which uses 2 different approaches, via rasterize > >> (estimated to take 10+ days to run and cross tabulate) and extract > >> (estimated to take over 30 days to run): > >> > >> #load appropriate libraries > >> library(maps) > >> library(rgdal) > >> library(raster) > >> library(sp) > >> library(maptools) > >> > >> #create bounding box > >> us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE) > >> > >> #create grid > >> getClass("GridTopology") > >> > >> cs<-c(0.5, 0.5) > >> cc<-us.box[,1] + (cs/2) > >> cd<-ceiling(diff(t(us.box))/cs) > >> us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd) > >> us.grid > >> > >> #create spatial polygons > >> #projection is albers equal area conic > >> prj.string<- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 > >> +units=m" > >> us.sp<- as.SpatialPolygons.GridTopology(us.grid, proj4string = > >> CRS("+proj=longlat +ellps=WGS84")) > >> us.SP<-spTransform(us.sp, CRS(prj.string)) > >> > >> summary(us.SP) > >> > >> #check to make sure ID values are appropriate - should be listed by row from > >> top left > >> IDvaluesGridTopology(us.grid) > >> slot(us.SP, "polygons") > >> > >> #load land cover data raster (16 GB) > >> env.data = raster('nlcd2006_landcover_4-20-11_se5.img') > >> projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 > >> +lon_0=-96") > >> > >> #create raster of polygons > >> poly.raster<- rasterize(us.SP, env.data, progress = 'window') > >> df<- crosstab(poly.raster, env.data, progress = 'window') > >> > >> #maybe another way? > >> ex.test<- extract(env.data, us.SP, df=TRUE, progress = 'window') > >> > >> Any help or advice on working with large datasets is much appreciated. > >> Thank you for your time. > >> > >> Spencer > >> > >> > >> > >> -- > >> View this message in context: > >> http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html > >> > >> Sent from the R-sig-geo mailing list archive at Nabble.com. > >> > >> _______________________________________________ > >> R-sig-Geo mailing list > >> [hidden email] > >> https://stat.ethz.ch/mailman/listinfo/r-sig-geo > > > > > > -- > > ~$ whoami > Mathieu Basille, Post-Doc > > ~$ locate > Laboratoire d'Écologie Comportementale et de Conservation de la Faune > + Centre d'Étude de la Forêt > Département de Biologie > Université Laval, Québec > > ~$ info > http://ase-research.org/basille > > ~$ fortune > ``If you can't win by reason, go for volume.'' > Calvin, by Bill Watterson. > > _______________________________________________ > R-sig-Geo mailing list > [hidden email] > https://stat.ethz.ch/mailman/listinfo/r-sig-geo _______________________________________________ R-sig-Geo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/r-sig-geo |
|
In reply to this post by Spencer Scheidt
WOW!
I appreciate the insight. I constantly need rasterize. but have very little working knowledge of using it. I guess patience is the first ingredient. brad On Mon, Jun 18, 2012 at 6:07 PM, Spencer Scheidt <[hidden email]>wrote: > Hello, > > I am attempting to extract land cover values from the NLCD 2006 dataset > (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted > into a > SpatialPolygons* object. However, the NLCD raster is approximately 16 GB > in > size (30 m resolution), and the grid that I would like to work with > contains > 5520 polygons (0.5 degree resolution). > > I have written a script based on previous postings that have asked a > similar > question and Applied Spatial Data Analysis in R by Bivand et. al, and used > this approach successfully for smaller SpatialPolygons* objects with the > same NLCD dataset. > > What I am wondering is if there is a quicker way to extract my polygon > values, either by breaking the raster and polygons into smaller chunks, or > using an entirely different approach. I realize that the extreme sizes of > my 2 objects may be the ultimate constraint, and if necessary I can reduce > the resolution of my grid. > > Below is my working code, which uses 2 different approaches, via rasterize > (estimated to take 10+ days to run and cross tabulate) and extract > (estimated to take over 30 days to run): > > #load appropriate libraries > library(maps) > library(rgdal) > library(raster) > library(sp) > library(maptools) > > #create bounding box > us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE) > > #create grid > getClass("GridTopology") > > cs<-c(0.5, 0.5) > cc<-us.box[,1] + (cs/2) > cd<-ceiling(diff(t(us.box))/cs) > us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = > cd) > us.grid > > #create spatial polygons > #projection is albers equal area conic > prj.string <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 > +units=m" > us.sp <- as.SpatialPolygons.GridTopology(us.grid, proj4string = > CRS("+proj=longlat +ellps=WGS84")) > us.SP<-spTransform(us.sp, CRS(prj.string)) > > summary(us.SP) > > #check to make sure ID values are appropriate - should be listed by row > from > top left > IDvaluesGridTopology(us.grid) > slot(us.SP, "polygons") > > #load land cover data raster (16 GB) > env.data = raster('nlcd2006_landcover_4-20-11_se5.img') > projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 > +lon_0=-96") > > #create raster of polygons > poly.raster <- rasterize(us.SP, env.data, progress = 'window') > df <- crosstab(poly.raster, env.data, progress = 'window') > > #maybe another way? > ex.test <- extract(env.data, us.SP, df=TRUE, progress = 'window') > > Any help or advice on working with large datasets is much appreciated. > Thank you for your time. > > Spencer > > > > -- > View this message in context: > http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html > Sent from the R-sig-geo mailing list archive at Nabble.com. > > _______________________________________________ > R-sig-Geo mailing list > [hidden email] > https://stat.ethz.ch/mailman/listinfo/r-sig-geo > > -- [image: Steven Wright]<http://www.goodreads.com/author/show/181771.Steven_Wright> "I have a map of the United States... Actual size. It says, 'Scale: 1 mile = 1 mile.' I spent last summer folding it. I hardly ever unroll it. People ask me where I live, and I say, 'E6." -- Steven Wright <http://www.goodreads.com/author/show/181771.Steven_Wright> [[alternative HTML version deleted]] _______________________________________________ R-sig-Geo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/r-sig-geo |
|
In reply to this post by Spencer Scheidt
Spencer,
That is an interesting challenge. Your approaches are fine in principle, but you have a raster of 30 meter resolution for the conterminous US ( ~17 billion cells ) and, if I understand you well, you want to tabulate the values by polygon, using irregular polygons (relative to the raster) of 0.5x0.5 degrees. That is a good case study for me to improve the speed of rasterize (which is used in both of your methods as extract will call it for each polygon); and I might work on that this summer. For now, I have not much help to offer, except that you can probably make things more efficient by removing the polygons that are outside the US: #load appropriate libraries library(rgdal) library(raster) library(rgeos) r <- raster(xmn=-124.5, xmx=-67, ymn=25, ymx=49, nrow=48, ncol=115) r[] <- 1:ncell(r) p <- as(r, 'SpatialPolygonsDataFrame') # remove the cells that are outside of the USA # that is 35% and could saves a lot of processing time usa <- getData('GADM', country='USA', level=0) i <- gIntersects(p, usa, byid=TRUE) #some patience required here pp <- p[as.vector(i), ] # projection (changed to the exact spec of the landcover data) prj <- CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +units=m") us <- spTransform(pp, prj) #load land cover data raster (16 GB) env <- raster('nlcd2006_landcover_4-20-11_se5.img', RAT=F) plot(env) #remakably fast! plot(us, add=TRUE, border='red') NAvalue(env) <- 0 rasterize/crosstab is probably faster (according to you numbers), but if you are going the 'extract' route, you need to do some looping, otherwise you will attempt to create a list with 10 billion values, and that will likely not work. Something like this, which is indeed rather slow: res <- list() for (i in 1:nrow(us)) { print( system.time( ex <- extract(env, us[i,]) ) ) res[i] <- table(ex[[1]]) cat(i, '\n') flush.console() } # and now process res, but that should be easy Best, Robert On Mon, Jun 18, 2012 at 4:07 PM, Spencer Scheidt <[hidden email]>wrote: > Hello, > > I am attempting to extract land cover values from the NLCD 2006 dataset > (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted > into a > SpatialPolygons* object. However, the NLCD raster is approximately 16 GB > in > size (30 m resolution), and the grid that I would like to work with > contains > 5520 polygons (0.5 degree resolution). > > I have written a script based on previous postings that have asked a > similar > question and Applied Spatial Data Analysis in R by Bivand et. al, and used > this approach successfully for smaller SpatialPolygons* objects with the > same NLCD dataset. > > What I am wondering is if there is a quicker way to extract my polygon > values, either by breaking the raster and polygons into smaller chunks, or > using an entirely different approach. I realize that the extreme sizes of > my 2 objects may be the ultimate constraint, and if necessary I can reduce > the resolution of my grid. > > Below is my working code, which uses 2 different approaches, via rasterize > (estimated to take 10+ days to run and cross tabulate) and extract > (estimated to take over 30 days to run): > > #load appropriate libraries > library(maps) > library(rgdal) > library(raster) > library(sp) > library(maptools) > > #create bounding box > us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE) > > #create grid > getClass("GridTopology") > > cs<-c(0.5, 0.5) > cc<-us.box[,1] + (cs/2) > cd<-ceiling(diff(t(us.box))/cs) > us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = > cd) > us.grid > > #create spatial polygons > #projection is albers equal area conic > prj.string <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 > +units=m" > us.sp <- as.SpatialPolygons.GridTopology(us.grid, proj4string = > CRS("+proj=longlat +ellps=WGS84")) > us.SP<-spTransform(us.sp, CRS(prj.string)) > > summary(us.SP) > > #check to make sure ID values are appropriate - should be listed by row > from > top left > IDvaluesGridTopology(us.grid) > slot(us.SP, "polygons") > > #load land cover data raster (16 GB) > env.data = raster('nlcd2006_landcover_4-20-11_se5.img') > projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 > +lon_0=-96") > > #create raster of polygons > poly.raster <- rasterize(us.SP, env.data, progress = 'window') > df <- crosstab(poly.raster, env.data, progress = 'window') > > #maybe another way? > ex.test <- extract(env.data, us.SP, df=TRUE, progress = 'window') > > Any help or advice on working with large datasets is much appreciated. > Thank you for your time. > > Spencer > > > > -- > View this message in context: > http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html > Sent from the R-sig-geo mailing list archive at Nabble.com. > > _______________________________________________ > R-sig-Geo mailing list > [hidden email] > https://stat.ethz.ch/mailman/listinfo/r-sig-geo > [[alternative HTML version deleted]] _______________________________________________ R-sig-Geo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/r-sig-geo |
|
Spencer,
After some tinkering... you could reclass to get single class (land cover type) layer, then aggregate, then extract. For me, that takes about 4 hours per class. > rcl <- matrix(c(11,12,1, 21,100,0), byrow=T, ncol=3) > system.time(cls11_12 <- reclass(env, rcl, right=FALSE, filename='class_11-12', progress='text')) user system elapsed 6494.48 599.67 8864.11 # I aggregate 100 times, cells still only 3 by 3 km, much smaller than polygons > system.time( a <- aggregate(cls11_12, 100, fun=sum, filename='abc', progress='text') ) user system elapsed 4248.00 276.85 4565.93 > system.time(x <- extract(a, us)) user system elapsed 877.60 23.07 900.74 > (8864.11 + 4565.93 + 900.74 ) / 3600 [1] 3.980772 given that there are 16 classes (11, 12, 21, 22, 23, 24, 31, 41, 42, 43, 52, 71, 81, 82, 90, 95) you could run this in about 2.5 days, that is not that bad :), if you actually need ALL classes, otherwise it could be much faster. Robert On Tue, Jun 19, 2012 at 11:31 AM, Robert J. Hijmans <[hidden email]>wrote: > Spencer, > > That is an interesting challenge. Your approaches are fine in principle, > but you have a raster of 30 meter resolution for the conterminous US ( ~17 > billion cells ) and, if I understand you well, you want to tabulate the > values by polygon, using irregular polygons (relative to the raster) of > 0.5x0.5 degrees. That is a good case study for me to improve the speed of > rasterize (which is used in both of your methods as extract will call it > for each polygon); and I might work on that this summer. > > For now, I have not much help to offer, except that you can probably make > things more efficient by removing the polygons that are outside the US: > > #load appropriate libraries > library(rgdal) > library(raster) > library(rgeos) > > r <- raster(xmn=-124.5, xmx=-67, ymn=25, ymx=49, nrow=48, ncol=115) > r[] <- 1:ncell(r) > p <- as(r, 'SpatialPolygonsDataFrame') > > # remove the cells that are outside of the USA > # that is 35% and could saves a lot of processing time > > usa <- getData('GADM', country='USA', level=0) > i <- gIntersects(p, usa, byid=TRUE) #some patience required here > pp <- p[as.vector(i), ] > > # projection (changed to the exact spec of the landcover data) > prj <- CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 > +y_0=0 +ellps=GRS80 +units=m") > us <- spTransform(pp, prj) > > #load land cover data raster (16 GB) > env <- raster('nlcd2006_landcover_4-20-11_se5.img', RAT=F) > > plot(env) #remakably fast! > plot(us, add=TRUE, border='red') > > NAvalue(env) <- 0 > > rasterize/crosstab is probably faster (according to you numbers), but if > you are going the 'extract' route, you need to do some looping, otherwise > you will attempt to create a list with 10 billion values, and that will > likely not work. Something like this, which is indeed rather slow: > > res <- list() > for (i in 1:nrow(us)) { > print( system.time( ex <- extract(env, us[i,]) ) ) > res[i] <- table(ex[[1]]) > cat(i, '\n') > flush.console() > } > > # and now process res, but that should be easy > > Best, Robert > > On Mon, Jun 18, 2012 at 4:07 PM, Spencer Scheidt <[hidden email]>wrote: > >> Hello, >> >> I am attempting to extract land cover values from the NLCD 2006 dataset >> (http://www.mrlc.gov/nlcd2006.php) using a grid that I have converted >> into a >> SpatialPolygons* object. However, the NLCD raster is approximately 16 GB >> in >> size (30 m resolution), and the grid that I would like to work with >> contains >> 5520 polygons (0.5 degree resolution). >> >> I have written a script based on previous postings that have asked a >> similar >> question and Applied Spatial Data Analysis in R by Bivand et. al, and used >> this approach successfully for smaller SpatialPolygons* objects with the >> same NLCD dataset. >> >> What I am wondering is if there is a quicker way to extract my polygon >> values, either by breaking the raster and polygons into smaller chunks, or >> using an entirely different approach. I realize that the extreme sizes of >> my 2 objects may be the ultimate constraint, and if necessary I can reduce >> the resolution of my grid. >> >> Below is my working code, which uses 2 different approaches, via rasterize >> (estimated to take 10+ days to run and cross tabulate) and extract >> (estimated to take over 30 days to run): >> >> #load appropriate libraries >> library(maps) >> library(rgdal) >> library(raster) >> library(sp) >> library(maptools) >> >> #create bounding box >> us.box<-matrix(c(-124.5,-67,25,49), nrow = 2, ncol = 2, byrow = TRUE) >> >> #create grid >> getClass("GridTopology") >> >> cs<-c(0.5, 0.5) >> cc<-us.box[,1] + (cs/2) >> cd<-ceiling(diff(t(us.box))/cs) >> us.grid<-GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = >> cd) >> us.grid >> >> #create spatial polygons >> #projection is albers equal area conic >> prj.string <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 >> +units=m" >> us.sp <- as.SpatialPolygons.GridTopology(us.grid, proj4string = >> CRS("+proj=longlat +ellps=WGS84")) >> us.SP<-spTransform(us.sp, CRS(prj.string)) >> >> summary(us.SP) >> >> #check to make sure ID values are appropriate - should be listed by row >> from >> top left >> IDvaluesGridTopology(us.grid) >> slot(us.SP, "polygons") >> >> #load land cover data raster (16 GB) >> env.data = raster('nlcd2006_landcover_4-20-11_se5.img') >> projection(env.data) = CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 >> +lon_0=-96") >> >> #create raster of polygons >> poly.raster <- rasterize(us.SP, env.data, progress = 'window') >> df <- crosstab(poly.raster, env.data, progress = 'window') >> >> #maybe another way? >> ex.test <- extract(env.data, us.SP, df=TRUE, progress = 'window') >> >> Any help or advice on working with large datasets is much appreciated. >> Thank you for your time. >> >> Spencer >> >> >> >> -- >> View this message in context: >> http://r-sig-geo.2731867.n2.nabble.com/Extract-polygon-values-from-large-raster-dataset-tp7580250.html >> Sent from the R-sig-geo mailing list archive at Nabble.com. >> >> _______________________________________________ >> R-sig-Geo mailing list >> [hidden email] >> https://stat.ethz.ch/mailman/listinfo/r-sig-geo >> > > [[alternative HTML version deleted]] _______________________________________________ R-sig-Geo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/r-sig-geo |
| Powered by Nabble | Edit this page |
