Faster way to extract raster values using multiple polygons?

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Faster way to extract raster values using multiple polygons?

R-sig-geo mailing list
Dear all,

I am trying to extract a time series from a raster brick using multiple polygons.

The raster brick is a global temperature netcdf file with 1200 layers (months) and the polygons are each of the 5567 municipalities in Brazil. I intend to extract the temperature time series for each municipality.

As a final result, I would like to have a data frame containing: -the name and coordinates of the municipality, -the date tag from the raster, and -the extracted temperature value.

My initial, VERY SLOW attempt was something like this:


library(raster)

# Create some sample raster data
idx <- seq(as.Date("1960/1/1"), as.Date("1990/12/01"), by = "month")
r <- raster(ncol=360, nrow=180)
b <- brick(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)))))
b <- setZ(b, idx)

# Import polygon data
br <- getData("GADM", country="Brazil", level=2) # about 7MB to download

# Subset the SMALLEST state - only 75 municipalities
br_sub <- br[br$NAME_1=="Sergipe" & !is.na(br$NAME_1),]
plot(br_sub)

# Now let's extract the data for each municipality
beginCluster()
e <- extract(b, br_sub, fun=mean, df=T)
endCluster()


Even using the smallest state possible, this example takes about 20 minutes to run on a dual-core 2.5GHz Macbook Pro using four threads. As a reference, there are states with over 850 municipalities. And remember, in total there are over 5500 municipalities I need to extract the data for.

I have the feeling that my code is not very optimized.

Has anybody ever dealt with this amount of data in this kind of raster operation? Is there any fastest way to achieve my expected result?
Thanks in advance,
-- Thiago V. dos Santos

PhD student
Land and Atmospheric Science

University of Minnesota

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: Faster way to extract raster values using multiple polygons?

Ben Tupper
Hi,

The resolution of your raster data (1 degree) is much more coarse than what your polygons represent.  Could you short-circuit the process by assuming that the temp at the centroid of each polygon would suitably represent the mean temperature across each polygon?  Unless you have some much bigger polygons, I can't imagine it will be very far off. If so, then you could pretty quickly extract the values for each layer in the raster at each centroid.  Perhaps like this?

cents <- coordinates(br_sub)
v <- extract(b, cents)

Is that close enough?

Cheers,
Ben

> On Apr 6, 2016, at 8:00 AM, Thiago V. dos Santos via R-sig-Geo <[hidden email]> wrote:
>
> Dear all,
>
> I am trying to extract a time series from a raster brick using multiple polygons.
>
> The raster brick is a global temperature netcdf file with 1200 layers (months) and the polygons are each of the 5567 municipalities in Brazil. I intend to extract the temperature time series for each municipality.
>
> As a final result, I would like to have a data frame containing: -the name and coordinates of the municipality, -the date tag from the raster, and -the extracted temperature value.
>
> My initial, VERY SLOW attempt was something like this:
>
>
> library(raster)
>
> # Create some sample raster data
> idx <- seq(as.Date("1960/1/1"), as.Date("1990/12/01"), by = "month")
> r <- raster(ncol=360, nrow=180)
> b <- brick(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)))))
> b <- setZ(b, idx)
>
> # Import polygon data
> br <- getData("GADM", country="Brazil", level=2) # about 7MB to download
>
> # Subset the SMALLEST state - only 75 municipalities
> br_sub <- br[br$NAME_1=="Sergipe" & !is.na(br$NAME_1),]
> plot(br_sub)
>
> # Now let's extract the data for each municipality
> beginCluster()
> e <- extract(b, br_sub, fun=mean, df=T)
> endCluster()
>
>
> Even using the smallest state possible, this example takes about 20 minutes to run on a dual-core 2.5GHz Macbook Pro using four threads. As a reference, there are states with over 850 municipalities. And remember, in total there are over 5500 municipalities I need to extract the data for.
>
> I have the feeling that my code is not very optimized.
>
> Has anybody ever dealt with this amount of data in this kind of raster operation? Is there any fastest way to achieve my expected result?
> Thanks in advance,
> -- Thiago V. dos Santos
>
> PhD student
> Land and Atmospheric Science
>
> University of Minnesota
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo



Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: Faster way to extract raster values using multiple polygons?

R-sig-geo mailing list
Hi Ben,


Thank you so much for the suggestion.

It is pretty fast, and I guess it's a good assumption for temperature. I'll have to do the same with precipitation, which is way more spatially variable than temperature. In that case, I think I'll have to wait the 10 hours it takes to run "extract" using a mean value of all cells covering a polygon.

It is a one-time processing anyway, so I should not worry too much about the waiting time.

Greetings,
-- Thiago V. dos Santos

PhD student
Land and Atmospheric Science
University of Minnesota




On Wednesday, April 6, 2016 10:48 AM, Ben Tupper <[hidden email]> wrote:
Hi,

The resolution of your raster data (1 degree) is much more coarse than what your polygons represent.  Could you short-circuit the process by assuming that the temp at the centroid of each polygon would suitably represent the mean temperature across each polygon?  Unless you have some much bigger polygons, I can't imagine it will be very far off. If so, then you could pretty quickly extract the values for each layer in the raster at each centroid.  Perhaps like this?

cents <- coordinates(br_sub)
v <- extract(b, cents)

Is that close enough?

Cheers,
Ben


> On Apr 6, 2016, at 8:00 AM, Thiago V. dos Santos via R-sig-Geo <[hidden email]> wrote:
>
> Dear all,
>
> I am trying to extract a time series from a raster brick using multiple polygons.
>
> The raster brick is a global temperature netcdf file with 1200 layers (months) and the polygons are each of the 5567 municipalities in Brazil. I intend to extract the temperature time series for each municipality.
>
> As a final result, I would like to have a data frame containing: -the name and coordinates of the municipality, -the date tag from the raster, and -the extracted temperature value.
>
> My initial, VERY SLOW attempt was something like this:
>
>
> library(raster)
>
> # Create some sample raster data
> idx <- seq(as.Date("1960/1/1"), as.Date("1990/12/01"), by = "month")
> r <- raster(ncol=360, nrow=180)
> b <- brick(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)))))
> b <- setZ(b, idx)
>
> # Import polygon data
> br <- getData("GADM", country="Brazil", level=2) # about 7MB to download
>
> # Subset the SMALLEST state - only 75 municipalities
> br_sub <- br[br$NAME_1=="Sergipe" & !is.na(br$NAME_1),]
> plot(br_sub)
>
> # Now let's extract the data for each municipality
> beginCluster()
> e <- extract(b, br_sub, fun=mean, df=T)
> endCluster()
>
>
> Even using the smallest state possible, this example takes about 20 minutes to run on a dual-core 2.5GHz Macbook Pro using four threads. As a reference, there are states with over 850 municipalities. And remember, in total there are over 5500 municipalities I need to extract the data for.
>
> I have the feeling that my code is not very optimized.
>
> Has anybody ever dealt with this amount of data in this kind of raster operation? Is there any fastest way to achieve my expected result?
> Thanks in advance,
> -- Thiago V. dos Santos
>
> PhD student
> Land and Atmospheric Science
>
> University of Minnesota
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo



Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo