Making GIS and R play nicely - some tips?


Andy Bunn
Hi R Spatial People:

I am an experienced R user and an experienced GIS user - however, that GIS
is Arc. In the past when I've worked with spatial data in R, I've done so by
exporting the GIS data (always grids) from Arc as ASCII files. Then I read
them into R and go through some gymnastics to dump the no data values
(-9999) while I do my analysis (say a regression tree) and then write the
ASCII back out and suck it into Arc for display. This loose coupling seems
inelegant.
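For what it's worth, the -9999 gymnastics need not be painful. A minimal base-R sketch of reading an ESRI ASCII grid and mapping NoData to NA (the file and its contents are invented here so the example is self-contained):

```r
# Write a tiny fake ESRI ASCII grid so the sketch is self-contained
f <- tempfile(fileext = ".asc")
writeLines(c("ncols 3", "nrows 2",
             "xllcorner 0", "yllcorner 0",
             "cellsize 30", "NODATA_value -9999",
             "-9999 1 2",
             "3 -9999 4"), f)

vals <- scan(f, skip = 6, quiet = TRUE)  # skip the 6 header lines
vals[vals == -9999] <- NA                # NoData -> NA before analysis
grid <- matrix(vals, nrow = 2, ncol = 3, byrow = TRUE)
```

Writing back out is the mirror image: set `vals[is.na(vals)] <- -9999`, re-emit the six header lines, and append the values.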

I'm starting on a new project where I have a time series of about 250 grids
that are roughly 1000 by 1000. There is a response variable and up to six or
so predictors. So, it's enough data that I feel it's time to learn how to do
it right. Question is, what's right?

Is the state of the art approach to use GRASS a la Bivand and Neteler's
paper (http://agec144.agecon.uiuc.edu/csiss/Rgeo/)? Is it better to dump the
grids to an imagine file (or bil) and read them with rgdal? I work mostly in
Windows (b/c of ESRI - damn them!) but switch gears onto Linux when the task
requires it. I have never used GRASS. Show me the way!

Thanks in advance, Andy




Roger Bivand
Administrator
On Wed, 2 Mar 2005, abunn wrote:

> Is the state of the art approach to use GRASS a la Bivand and Neteler's
> paper? Is it better to dump the grids to an imagine file (or bil) and
> read them with rgdal? [...]

Well, there are plenty of possibilities. The first question (abstracting
from the size of the data) is whether to go the way suggested in:

http://www.gisvet.org/Documents/GisVet04/RegularPapers/Tait.pdf

which leaves Arc in front, using R just as the compute engine, and passing
data through StatConnector. I don't think that the code used has been
published, but a very simple VBA example is at:

http://perso.univ-lr.fr/csaintje/Recherche/RArcgis/index.html

That lets the Arc user avoid knowing about R, should you later need to
package the procedure for others to use. My guess is that this hasn't been
checked on ArcGIS 9 (I'll try to check soon).

Another is, as you suggest, to use the rgdal package to move the data into
R and, once we get rgdal better checked for writing, back out too. That
puts R in front of the loosely coupled data.

A third is to use GRASS 5.4 and the R GRASS interface package, which reads
and writes GRASS raster layers; GRASS can read from and write to many
formats, and both it and the interface can run under Cygwin. This runs
R on top of GRASS, which can be accessed through system(). The state
of the art is certainly M. Neteler, H. Mitasova, 2004. Open Source GIS: A
GRASS GIS Approach. Second Edition. 424 pages, Kluwer Academic Publishers,
Boston, Dordrecht, ISBN 1-4020-8064-6 (Also published as eBook, ISBN
1-4020-8065-4), chapter 13. You might also enjoy:

http://www.ci.tuwien.ac.at/Conferences/DSC-2003/Proceedings/FurlanelloEtAl.pdf

showing GRASS, R and PostgreSQL working together.

But isn't the real challenge going to be shoehorning 250 by 1000 by 1000
by 8 into R at once - or are the data layers needed just one after the
other? If I could grasp the workflow better, the advice above could become
more focussed, I think.
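For scale, the 250 by 1000 by 1000 by 8 figure is easy to check (a sketch; it assumes double-precision storage at 8 bytes per cell):

```r
# 250 grids of 1000 x 1000 cells held as 8-byte doubles
bytes <- 250 * 1000 * 1000 * 8
round(bytes / 2^30, 2)   # about 1.86 GiB resident in one R session
```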

Best wishes,

Roger


--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no



Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway

Andy Bunn
Great. Thanks Roger and the others who replied.

> But isn't the real challenge going to be shoehorning 250 by 1000 by 1000
> by 8 into R at once - or are the data layers needed just one after the
> other? If I could grasp the workflow better, the advice above could become
> more focussed, I think.

I'm not positive on the workflow yet! That's one of the problems. I'm
finding it tough to wrap my head around the data since it is in both time
and space.

The data are not as huge as I thought at first. Each layer has 567 rows and
409 columns (231903 elements) but 137113 are no data. So, that makes the
actual amount of data to crunch much more manageable at 94790 elements per
layer.

The goal is to predict satellite reflectance as a function of climate. The
response data are monthly satellite reflectance over 19 years (So, that's
228 images). The predictor data are interpolated climate data and there
should be a minimum of three predictors and I'd like to use two more if they
are reliable. So, assuming three predictors, the data would only run to
500 MB or so.

R > object.size(rep(runif(94790), times = 19*12*3)) / 2^20
[1] 494.6622

And if I can get all five then I'd be under a gig still.

R > object.size(rep(runif(94790), times = 19*12*5)) / 2^20
[1] 824.437

That's cumbersome but possible.

I've managed to read in a test example from ERDAS .img files using rgdal
(thanks Tim Keitt):

R > library(rgdal)
R > junk <- list()
R > try1 <- GDAL.open("test.img")
R > getDriverLongName(getDriver(try1))
[1] "Erdas Imagine Images (.img)"
R > junk[[1]] <- c(getRasterData(try1))
R > GDAL.close(try1)
Closing GDAL dataset handle 00CACBF0...  destroyed ... done.
R > try2 <- GDAL.open("test2.img")
R > getDriverLongName(getDriver(try2))
[1] "Erdas Imagine Images (.img)"
R > junk[[2]] <- c(getRasterData(try2))
R > GDAL.close(try2)
Closing GDAL dataset handle 00C7F180...  destroyed ... done.
R > str(junk)
List of 2
 $ : int [1:231903] 0 0 0 0 0 0 0 0 0 0 ...
 $ : int [1:231903] 0 0 0 0 0 0 0 0 0 0 ...
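Once each layer sits in the list, the per-layer vectors can be lined up and masked together. A base-R sketch with tiny invented values (the layer names and values are made up; -9999 is the NoData code from the thread):

```r
# Two fake per-layer vectors standing in for getRasterData() output
layers <- list(resp = c(-9999, 10, 20, -9999),
               pred = c(-9999, 1, 2, 3))
m <- do.call(cbind, layers)      # cells x layers matrix
m[m == -9999] <- NA              # NoData -> NA
keep <- rowSums(is.na(m)) == 0   # cells valid in every layer
m[keep, , drop = FALSE]          # only complete cells go to the model
```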


So, to read all this in I can pipe a script to Arc to convert the grids to
images and loop through all the data. Does that sound like a reasonable
plan?

Thanks for all the help, Andy




Roger Bivand
Administrator
On Wed, 2 Mar 2005, abunn wrote:

> So, to read all this in I can pipe a script to Arc to convert the grids to
> images and loop through all the data. Does that sound like a reasonable
> plan?

What are the volumes going back?

Another itch to scratch: how much more precision in the coefficient point
estimates will you actually gain by forcing all of this data down the
throat of lm() (or whatever)? Would it be possible to reduce the load by
using subsetting in rgdal, given that you had the *.img and model-based
files lined up (or just use locations for which you have observed climate
data)? Won't the interpolation errors in the predictors feed through? I
think the Furlanello et al. paper may be very helpful (they use
randomForest on regression trees of climate/topography drivers).
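The subset-then-fit idea can be sketched with rpart, which ships with R, as a stand-in for whatever tree method ends up being used; every variable name here (refl, temp, prec, rad) and the data are invented:

```r
library(rpart)

set.seed(1)
n <- 94790                        # valid (non-NoData) cells in one layer
dat <- data.frame(
  temp = runif(n),                # fake interpolated climate predictors
  prec = runif(n),
  rad  = runif(n)
)
# Fake reflectance response with some climate signal plus noise
dat$refl <- 0.5 * dat$temp + 0.3 * dat$prec + rnorm(n, sd = 0.05)

# Fit the tree on a random subset of cells rather than every pixel
idx <- sample(n, 10000)
fit <- rpart(refl ~ temp + prec + rad, data = dat[idx, ])
inherits(fit, "rpart")            # TRUE
```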

Interesting problem.

Roger

>
>


Tim Keitt
In reply to this post by Andy Bunn
Hi Andy,

I've had pretty good luck reading large images line-by-line in rgdal.
It's kind of slow, but not as bad as one might imagine. ArcGIS went down
in flames trying to do any sort of processing on these files (10K x
10K). A better solution, though, I think will be to write space-time
points into a PostGIS database (given lots of disk space) and use
queries to subset the data as needed. I've found the database approach
incredibly useful for working with BBS + climate data (~1e7 records).
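Line-by-line reading can be sketched in base R with readBin, as a self-contained stand-in for rgdal's windowed reads (getRasterData takes offset and region.dim arguments for that); the grid dimensions and file here are invented:

```r
# Write a tiny fake flat-binary grid so the sketch is self-contained
nr <- 4; nc <- 5
grid <- matrix(as.double(1:(nr * nc)), nr, nc, byrow = TRUE)
f <- tempfile(fileext = ".bil")
con <- file(f, "wb")
writeBin(as.vector(t(grid)), con)    # row-major, 8-byte doubles
close(con)

# Read it back one row at a time instead of all at once
con <- file(f, "rb")
rows <- vector("list", nr)
for (i in seq_len(nr)) {
  rows[[i]] <- readBin(con, "double", n = nc)
  # process row i here, then let it go
}
close(con)
identical(do.call(rbind, rows), grid)   # TRUE
```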

Ciao,
THK

On Wed, 2005-03-02 at 11:26 -0500, abunn wrote:

> I've managed to read in a test example using erdas img files using rgdal
> (thanks Tim Keitt): [...]




Andy Bunn
Hey Tim:

I'm writing the database as we speak, which will take a while but will then
be done. It's going faster than I thought it would. I figured out how to
call Arc from the prompt and pass it an AML and variables from R:

        aml2pass <- paste('arc "&r make.an.image.aml', which.var, year,
                          formatC(month, width = 2, flag = "0"), '"',
                          sep = " ")
        system(aml2pass)

This means that I can write temporary files from arc, process them and clean
up without having a lot of loose ends. So, with that and rgdal, everything
is smooth. I'll leave GRASS for another day (I've been saying that for about
seven years now).
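Looping that over the whole series might look something like the sketch below; which.var, the year range, and the AML name are all placeholders, and the system() call is commented out so the sketch only builds command strings:

```r
# Build one Arc command per variable/month without launching anything
which.var <- "ndvi"                      # placeholder variable name
cmds <- character(0)
for (year in 1982:1983) {                # placeholder year range
  for (month in 1:12) {
    aml2pass <- paste('arc "&r make.an.image.aml', which.var, year,
                      formatC(month, width = 2, flag = "0"), '"',
                      sep = " ")
    # system(aml2pass)                   # uncomment to actually run Arc
    cmds <- c(cmds, aml2pass)
  }
}
length(cmds)    # 24 commands for two years of monthly grids
```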

Thanks, A




Michael Sumner
In reply to this post by Andy Bunn

>
>Is the state of the art approach to use GRASS a la Bivand and Neteler's
>paper (http://agec144.agecon.uiuc.edu/csiss/Rgeo/)? Is it better to dump the
>grids to an imagine file (or bil) and read them with rgdal? I work mostly in
>Windows (b/c of ESRI - damn them!) but switch gears onto Linux when the task
>requires it. I have never used GRASS. Show me the way!

Hello

Disclaimer - I've never used GRASS, and I haven't used ESRI much.

You might consider using Manifold GIS:  http://www.manifold.net/

Much more Windows- (and user-) friendly, IMO, and it has import and export
options for most formats, as well as generic text and binary. (Give it a
few years and Manifold will have taken over much of the GIS market.)

You could output your grids to binary files (or ASCII), write the
projection metadata to XML with some modification of my simple script, and
then script-import these into Manifold. All of this scripting could be done
within R, if desired, using RDCOMClient.

I have some R routines that export Manifold-friendly metadata in XML;
they're very rudimentary but might give you some ideas.

Passing grid (matrix) data from R to Manifold:
http://www.georeference.org/Forums/forums/thread-view.asp?tid=789&posts=9

Running R from Manifold:
http://www.georeference.org/Forums/forums/thread-view.asp?tid=650&posts=8

Running Manifold from R:
http://www.georeference.org/Forums/forums/thread-view.asp?tid=876&posts=9

Cheers, Mike.




###############################################

Michael Sumner - PhD. candidate
Maths and Physics (ACE CRC & IASOS) and Zoology (AWRU)
University of Tasmania
Private Bag 77, Hobart, Tas 7001, Australia
Phone: (03) 6226 1752

http://www.acecrc.org.au/