(no subject)

Giuseppe Amatulli
Hi,
first of all, happy new year!

I'm trying to model forest species distribution at the European level (1 km
resolution) with randomForest, running it on a cluster computer. I'm
using several predictors from different data sources. All of them are
rasters in GRASS format, so I have been using spgrass6 to import the
data into R and apply the randomForest prediction to the layers.

At the same time, from a careful reading of the help pages of the raster
package, it seems to me that its "row by row" processing copes better
with memory limitations than spgrass6. Is this the case?
If the raster package is more efficient, how can I use it to import GRASS
data? I suppose by reading the rasters under the cellhd folder:
> maps <- stack(c('LOCATION/PERMANENT/cellhd/grid1', 'LOCATION/PERMANENT/cellhd/grid2'))

One more question.
The training data for randomForest are stored in an R table. Each
observation represents the presence/absence (0 or 1) of a plant
species. I also have a presence/absence reliability item, which tells
me about the quality of the data. With this item I would like to apply
a "weight" in randomForest in order to give more importance to the
"good" data. Any ideas?
As a rough idea, I was thinking of replicating the data according to
their quality, but this would increase the amount of data too much. On
the other hand, a stratified bootstrap would result in data squeezing
and long computation.
In other words, I'm looking for a weights option like the one in lm. Any ideas?

Thanks in advance
Regards
Giuseppe Amatulli

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Re: (no subject)

Roger Bivand
On Mon, 10 Jan 2011, Giuseppe Amatulli wrote:

> At the same time, from a careful reading of the help pages of the raster
> package, it seems to me that its "row by row" processing copes better
> with memory limitations than spgrass6. Is this the case?
> If the raster package is more efficient, how can I use it to import GRASS
> data? I suppose by reading the rasters under the cellhd folder:
> > maps <- stack(c('LOCATION/PERMANENT/cellhd/grid1', 'LOCATION/PERMANENT/cellhd/grid2'))

In principle, the GRASS GDAL plugin should work in this way, but you can
also use g.region in GRASS to set the region for readRAST6() to read,
which could be in tiles or rows at your convenience. This might be easier
if the predicted tiles are to be written back to GRASS as part of the
process. It would be overkill to think of this kind of iterated region
support in raster, which uses the region.dim= and offset= features of the
GDAL interface, I think.
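
For reference, a minimal sketch of the raster route discussed here, assuming the raster and randomForest packages are installed. It does not touch GRASS at all: two small synthetic in-memory layers stand in for grid1 and grid2, and the training table, its columns and the threshold used to fabricate presences are all made up for illustration. The point is only that predict() on a stack works through the layers in blocks of rows and can write the result straight to a file.

```r
library(raster)
library(randomForest)

set.seed(42)

# Two small synthetic layers standing in for the GRASS maps grid1 and grid2
grid1 <- raster(nrows = 40, ncols = 60)
grid2 <- raster(grid1)                     # same geometry, no values yet
values(grid1) <- runif(ncell(grid1))
values(grid2) <- runif(ncell(grid2))
maps <- stack(grid1, grid2)
names(maps) <- c("grid1", "grid2")         # must match the model's predictor names

# Made-up training table (hypothetical data, not from the thread)
train <- data.frame(grid1 = runif(300), grid2 = runif(300))
train$presence <- factor(as.integer(train$grid1 + train$grid2 > 1))

rf <- randomForest(presence ~ grid1 + grid2, data = train, ntree = 100)

# How raster would split the stack into row blocks for processing
blocks <- blockSize(maps)

# Block-wise prediction written to disk in raster's native format;
# type = "prob" with index = 2 keeps the probability of class "1"
out <- predict(maps, rf, type = "prob", index = 2,
               filename = tempfile(fileext = ".grd"))
```

With real data, the stack would be built from files readable through GDAL instead of the synthetic layers above; whether the GRASS plugin is available depends on the local GDAL build.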

It would be fun to see whether SAGA could be used in the same way with
raster, as there is a SAGA GDAL driver.

Roger

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [hidden email]

Re: (no subject)

Robert Hijmans
Giuseppe,

You can also have a look at the 'dismo' package for species distribution modeling.  The vignette (under development) has a brief example with randomForest.

To apply weights in randomForest (I have not tried this and could very well be wrong), you can perhaps use regression rather than classification (in my experience randomForest regression works much better in this context) and adjust your presence/absence data based on certainty (highly certain absence = -1, highly certain presence = 1, everything else scaled in between).
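
A minimal sketch of that recoding, assuming the reliability item can be rescaled to a certainty between 0 and 1. The data are synthetic and the names (temp, prec, certainty, score) are invented for illustration; the multiplication by certainty is just one possible way to scale "everything else in between".

```r
library(randomForest)

set.seed(1)
n <- 300

# Synthetic stand-in for the training table; all names are made up
train <- data.frame(temp = rnorm(n), prec = rnorm(n))
presence  <- as.integer(train$temp + rnorm(n, sd = 0.5) > 0)  # 0 = absent, 1 = present
certainty <- runif(n)  # hypothetical reliability: 0 = no confidence, 1 = fully certain

# Map 0/1 to -1/+1, then shrink uncertain records towards 0
train$score <- (2 * presence - 1) * certainty

# A numeric response makes randomForest() fit a regression forest
rf <- randomForest(score ~ temp + prec, data = train, ntree = 200)

# Without newdata, predict() returns the out-of-bag predictions
oob <- predict(rf)
range(oob, na.rm = TRUE)
```

Predictions near 1 or -1 then indicate confident presence or absence, and values near 0 indicate either an unsuitable signal or unreliable training records, so the two cannot be told apart from the score alone.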

A perhaps better alternative would be to use the cforest function (another random forest implementation) in 'party', because this function has a weights argument.
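
A minimal sketch of the cforest route, assuming the party package is installed. The training table is synthetic and the reliability vector is a hypothetical score, with higher values meaning more trusted records; the ntree and mtry settings are arbitrary choices for the example.

```r
library(party)  # provides cforest() and cforest_unbiased()

set.seed(1)
n <- 200

# Synthetic stand-in for the training table; all names are made up
train <- data.frame(temp = rnorm(n), prec = rnorm(n))
train$presence <- factor(as.integer(train$temp + rnorm(n, sd = 0.5) > 0))

# Hypothetical reliability score per observation (non-negative case weights)
reliability <- runif(n, min = 0.2, max = 1)

# Case weights steer how often each record is drawn when growing the trees
fit <- cforest(presence ~ temp + prec, data = train,
               weights = reliability,
               controls = cforest_unbiased(ntree = 100, mtry = 1))

# Out-of-bag class predictions compared with the observations
pred <- predict(fit, OOB = TRUE)
table(observed = train$presence, predicted = pred)
```

This avoids replicating rows by hand: records with a low reliability score are simply sampled less often.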

Robert
Re: (no subject)

Rainer M Krug-6
On 01/10/2011 06:48 PM, Giuseppe Amatulli wrote:

> At the same time, from a careful reading of the help pages of the raster
> package, it seems to me that its "row by row" processing copes better
> with memory limitations than spgrass6. Is this the case?
> If the raster package is more efficient, how can I use it to import GRASS
> data? I suppose by reading the rasters under the cellhd folder:
> > maps <- stack(c('LOCATION/PERMANENT/cellhd/grid1', 'LOCATION/PERMANENT/cellhd/grid2'))

Another option would be to copy the data into a database and then use
SQL queries to select the locations to analyse. In this way, you could have
more control over the number of cells to analyse. This obviously only
works if you don't need neighbourhood information.

Cheers,

Rainer



--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Tel:        +33 - (0)9 53 10 27 44
Cell:       +27 - (0)8 39 47 90 42
Fax (SA):   +27 - (0)8 65 16 27 82
Fax (D) :   +49 - (0)3 21 21 25 22 44
Fax (FR):   +33 - (0)9 58 10 27 44
email:      [hidden email]

Skype:      RMkrug