Help on slow processing of extract function


Help on slow processing of extract function

Rahul Raj
Dear All,

I am using the "extract" command to extract, for each polygon of a grid
shapefile, the fraction of each class among the raster cells falling
within that polygon:
> extract(r, v)  ### where 'v' is my 1 km polygon grid and 'r' is my
classified raster image.

My polygon grid has more than 2 million polygons, and I am running the
command on 64-bit Windows XP with a quad-core processor and 8 GB RAM, but
the extract function is taking a long time (more than 2 days) to generate
results.
Please advise me how I can speed up the processing. Are there any other
packages in R that provide faster processing for this type of extraction?

Also, how can I read one polygon at a time in R from a polygon shapefile
stored on disk?

Thanks in advance for the valuable suggestions.

With regards
Rahul Raj
Indian Institute of Remote Sensing, India.


Re: Help on slow processing of extract function

Roger Bivand
On Mon, 21 Feb 2011, Rahul Raj wrote:

> I am using the "extract" command to extract, for each polygon of a grid
> shapefile, the fraction of each class among the raster cells falling
> within that polygon. My polygon grid has more than 2 million polygons
> [...] the extract function is taking a long time (more than 2 days) to
> generate results.
> [...]
> Also, how can I read one polygon at a time in R from a polygon shapefile
> stored on disk?

Please, do try to think carefully before beginning work. What are you
actually doing? Are your polygon objects really irregular multipolygons,
or are they rectangles? Do they overlap? If I recall, you are looping over
the polygons - did you time a single polygon and say 100 polygons taken
together? You are trying to use R/raster/sp for tasks that are very
different from those they were designed for, in a very demanding way
(loops are known to be inferior to vectorisation in R).
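
For example, a quick scaling check might be (a sketch, assuming 'r' and 'v'
as in your message, with 'v' read in as a SpatialPolygonsDataFrame):

library(raster)
# time one polygon, then 100 polygons taken together
system.time(extract(r, v[1, ]))
system.time(extract(r, v[1:100, ]))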

So you run extract() on your whole raster for each polygon, which
internally crops your raster to the polygon, rasterises it, turns it into
points, and extracts the values at those rasterised points from the
raster. In large scale applications like yours, almost certainly one
should use different data and process models.
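
For one polygon, something like the following reproduces those steps (an
untested sketch, using mask() rather than an explicit conversion to points;
'i' is a polygon index):

p <- v[i, ]                        # one polygon
rc <- crop(r, p)                   # crop the raster to the polygon
m <- mask(rc, rasterize(p, rc))    # rasterise the polygon, mask the crop
vals <- na.omit(getValues(m))      # cell values inside the polygon
table(vals) / length(vals)         # class fractions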

Doing this 2 million times is not efficient, as you are experiencing. I
don't know why you thought it would be. How to make it more efficient will
depend on your data structures, and possibly on coding in R, or R and C.
extract() will work perfectly well on your raster with say 500 polygons,
but does not seem to have been written for very many polygons.

Roger


--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [hidden email]


Re: Help on slow processing of extract function

Rahul Raj
On Mon, Feb 21, 2011 at 3:18 AM, Roger Bivand <[hidden email]> wrote:

> Please, do try to think carefully before beginning work. What are you
> actually doing? Are your polygon objects really irregular multipolygons,
> or are they rectangles? Do they overlap?
> [...]
> Doing this 2 million times is not efficient, as you are experiencing. [...]
> extract() will work perfectly well on your raster with say 500 polygons,
> but does not seem to have been written for very many polygons.

Dear Roger,

I have a classified image of the whole of India (which measures 3,214 km
(1,997 mi) from north to south and 2,993 km (1,860 mi) from east to west)
at 56 m resolution. I have generated a square polygon grid (each square
polygon has an area of 1 sq km) that covers the whole of India. Now I have
to extract the fraction of each crop class from the classified image for
each square polygon in the grid.

For this purpose I thought I would use the extract command. As the
computer memory cannot handle such a large grid, I divided my grid by the
states of India (India has 28 states). I used the clip operation of ArcGIS
to clip each state-wise grid (from my large grid) using the vector
boundary of the state, so at the state boundaries the grids (which still
have 0.1 to 2 million polygons) lose the square shape of their polygons.
I have used the rgdal package to import these state-wise subset grids, and
I am running the 'extract' command in a loop so that the extraction of
class fractions (from the classified map) runs on one polygon at a time (I
did this to avoid running out of memory).

From your mail I understand that my approach is not appropriate.
I am also getting errors like "Error in gc(): caught access violation -
continue with care" when the loop over the extract function starts.

I am a beginner in R programming and joined R-sig-Geo only a few days ago.
Please help me sort out my problem of handling a large grid in R.

Please let me know if anything about my data is still unclear.

With regards
Rahul Raj
Indian Institute of Remote Sensing, India.


Re: Help on slow processing of extract function

Roger Bivand
On Mon, 21 Feb 2011, Rahul Raj wrote:

> Dear Roger,
>
> I have a classified image of the whole of India (which measures 3,214 km
> (1,997 mi) from north to south and 2,993 km (1,860 mi) from east to west)
> at 56 m resolution. I have generated a square polygon grid (each square
> polygon has an area of 1 sq km) that covers the whole of India. Now I have
> to extract the fraction of each crop class from the classified image for
> each square polygon in the grid.
> [...]

My immediate impression is that you are going to have to loop anyway, but
that you can avoid all the cropping by using the lower-level functions in
rgdal. The function GDAL.open() opens a handle to a dataset - you must
close the dataset with GDAL.close() when done. Then use the offset=c(yrow,
xcol) argument of the getRasterTable() function to step over the dataset
in appropriate chunks. Train first using a part of the map you know well
to be sure what is going on. Then just loop over your offset, and
aggregate on the values returned by getRasterTable(), which are the cell
coordinates and the land cover value of interest.

This is using the fact that your polygons are rectangles, so they can be
emulated by setting, for example, region.dim=c(20, 20) and iterating over
offset. You can tag your output data with the centroid of the 400 cells
retrieved each time. This gives you access to all the land cover classes
in each 1 km square. Because 20 x 56 m is 1120 m, you may want to choose
say 18 by 18 for region.dim=, which is 1008 m by 1008 m.
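
For example (an untested sketch: the file name is hypothetical, I assume
the third column of the data frame returned by getRasterTable() holds the
band values, and in practice you would preallocate the output):

library(rgdal)
ds <- GDAL.open("india_classified_56m.tif")
nr <- dim(ds)[1]
nc <- dim(ds)[2]
res <- list(); k <- 1
for (i in seq(0, nr - 18, by = 18)) {        # offsets are zero-based
  for (j in seq(0, nc - 18, by = 18)) {
    tab <- getRasterTable(ds, band = 1, offset = c(i, j),
      region.dim = c(18, 18))
    # class fractions in this roughly 1 km block, tagged by its centroid
    res[[k]] <- list(x = mean(tab$x), y = mean(tab$y),
      frac = table(tab[[3]]) / nrow(tab))
    k <- k + 1
  }
}
GDAL.close(ds)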

You might also decimate your input grid with output.dim= to let you train
on a smaller data set that is easier to visualise.

For the global landcover dataset:

> library(rgdal)
> dataset <- GDAL.open("GLOBCOVER_L4_200901_200912_V2.3.tif")
> dim(dataset)
[1]  55800 129600
> displayDataset(dataset, band = 1, output.dim=c(180, 360))

shows me the whole dataset, and:

> displayDataset(dataset, band = 1, offset=c(1000, 2000),
+ region.dim=c(20000, 40000), output.dim=c(180, 360))

shows northern North America. This data set is otherwise too large to handle,
but getRasterTable() and getRasterData() - which returns just the data as
a matrix - are pretty fast.
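
For example (a sketch):

> m <- getRasterData(dataset, band = 1, offset = c(1000, 2000),
+ region.dim = c(2000, 2000))

returns that region as a matrix of land cover values.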

Remember to do GDAL.close(dataset) when done.

Hope this helps,

Roger

PS. library(fortunes); fortune("Yoda")



--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [hidden email]


Re: Help on slow processing of extract function

Robert Hijmans
> I have a classified image of the whole of India (which measures 3,214 km
> (1,997 mi) from north to south and 2,993 km (1,860 mi) from east to west)
> at 56 m resolution. I have generated a square polygon grid (each square
> polygon has an area of 1 sq km) that covers the whole of India. Now I have
> to extract the fraction of each crop class from the classified image for
> each square polygon in the grid.

Presumably you have the 1 km grid also as a raster file. I think I would do this:

library(raster)
km <- raster("1km_raster_filename")
# make an "empty" RasterLayer
km <- raster(km)
# disaggregate to 50 m
m50 <- disaggregate(km, 20)

# resample land cover data to 50 m such that it lines up with the 1 km cells you want.
x <- raster("land cover data file")

m50 <- resample(x, m50, method="ngb", filename="lc_resampled.tif", progress="text")

# Now things line up, and you could reclassify and use zonal() to extract
# the fraction of each class, but that would be a bit inefficient for what
# you want (see the zonal() sketch after the loop below). So I would do
# something along these lines:

km1 <- writeStart(km, filename='km1.tif')
km2 <- writeStart(km, filename='km2.tif')

nsteps <- nrow(km)  # one output (1 km) row per step, i.e. 20 input rows
for (s in 1:nsteps) {
        r <- (s-1) * 20 + 1
        # read 20 rows of 50 m values (the third argument is the number of rows)
        v <- getValues(m50, r, 20)
        # v comes back row by row; regroup it so that each 20 x 20 square
        # block of cells forms one slice: dim 1 = column within block,
        # dim 2 = block (i.e. 1 km cell), dim 3 = row within block
        a <- array(v, dim=c(20, ncol(km), 20))
        # % of the 400 values in each block that is '1'
        vv1 <- apply(a, 2, function(x) sum(x==1)) / 4
        km1 <- writeValues(km1, vv1, s)
        # % of the 400 values in each block that is '2'
        vv2 <- apply(a, 2, function(x) sum(x==2)) / 4
        km2 <- writeValues(km2, vv2, s)
}
km1 <- writeStop(km1)
km2 <- writeStop(km2)

# Not tested, not dealing with NA values, and probably full of small
# errors, so this will take some work, but I think the general approach
# should work. See the raster vignette about writing functions for more
# details.
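
The zonal() route mentioned above might look like this (an untested sketch,
and memory-hungry with 2 million zones):

# a 50 m zones raster: each 1 km cell gets a unique zone number
zones <- disaggregate(setValues(km, 1:ncell(km)), 20)
# fraction of class '1' within each 1 km zone
z1 <- zonal(m50 == 1, zones, fun='mean')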

Hope this helps,
Robert