Hi Ben,

Thank you so much for the suggestion.

It is pretty fast, and I guess it's a good assumption for temperature. I'll have to do the same with precipitation, which is way more spatially variable than temperature. In that case, I think I'll have to wait the 10 hours it takes to run "extract" using a mean value of all cells covering a polygon.

It is a one-time processing anyway, so I should not worry too much about the waiting time.

Greetings,

-- Thiago V. dos Santos

PhD student

Land and Atmospheric Science

University of Minnesota

On Wednesday, April 6, 2016 10:48 AM, Ben Tupper <

[hidden email]> wrote:

Hi,

The resolution of your raster data (1 degree) is much more coarse than what your polygons represent. Could you short-circuit the process by assuming that the temp at the centroid of each polygon would suitably represent the mean temperature across each polygon? Unless you have some much bigger polygons, I can't imagine it will be very far off. If so, then you could pretty quickly extract the values for each layer in the raster at each centroid. Perhaps like this?

cents <- coordinates(br_sub)

v <- extract(b, cents)

Is that close enough?

Cheers,

Ben

> On Apr 6, 2016, at 8:00 AM, Thiago V. dos Santos via R-sig-Geo <

[hidden email]> wrote:

>

> Dear all,

>

> I am trying to extract a time series from a raster brick using multiple polygons.

>

> The raster brick is a global temperature netcdf file with 1200 layers (months) and the polygons are each of the 5567 municipalities in Brazil. I intend to extract the temperature time series for each municipality.

>

> As a final result, I would like to have a data frame containing: -the name and coordinates of the municipality, -the date tag from the raster, and -the extracted temperature value.

>

> My initial, VERY SLOW attempt was something like this:

>

>

> library(raster)

>

> # Create some sample raster data

> idx <- seq(as.Date("1960/1/1"), as.Date("1990/12/01"), by = "month")

> r <- raster(ncol=360, nrow=180)

> b <- brick(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)))))

> b <- setZ(b, idx)

>

> # Import polygon data

> br <- getData("GADM", country="Brazil", level=2) # about 7MB to download

>

> # Subset the SMALLEST state - only 75 municipalities

> br_sub <- br[br$NAME_1=="Sergipe" & !is.na(br$NAME_1),]

> plot(br_sub)

>

> # Now let's extract the data for each municipality

> beginCluster()

> e <- extract(b, br_sub, fun=mean, df=T)

> endCluster()

>

>

> Even using the smallest state possible, this example takes about 20 minutes to run on a dual-core 2.5GHz Macbook Pro using four threads. As a reference, there are states with over 850 municipalities. And remember, in total there are over 5500 municipalities I need to extract the data for.

>

> I have the feeling that my code is not very optimized.

>

> Has anybody ever dealt with this amount of data in this kind of raster operation? Is there any fastest way to achieve my expected result?

> Thanks in advance,

> -- Thiago V. dos Santos

>

> PhD student

> Land and Atmospheric Science

>

> University of Minnesota

>

> _______________________________________________

> R-sig-Geo mailing list

>

[hidden email]
>

https://stat.ethz.ch/mailman/listinfo/r-sig-geoBen Tupper

Bigelow Laboratory for Ocean Sciences

60 Bigelow Drive, P.O. Box 380

East Boothbay, Maine 04544

http://www.bigelow.org_______________________________________________

R-sig-Geo mailing list

[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo