A package for spatial data classes: request for comments

Edzer J. Pebesma
Much like time, spatial locations are not likely to change for observed
data. Still, R has no infrastructure or knowledge about spatial locations.

Many packages that deal with spatial data in R use their own classes
to do so. At the workshop on spatial statistics software
held during DSC2003 (organized by Roger Bivand), we decided that
a base package that defines classes for spatial data would be helpful,
both as an exchange platform and (later) as a required package for
working with spatial data (compare the ts class for time series).

I started writing such a package, and opened a sourceforge.net project
for it, called r-spatial. I first worked with Barry Rowlingson's
spatial.data.frame (found in r-asp, or rasp, also on sourceforge),
which is an S3 class. Then I restarted using S4 classes, because
they are here to stay, and allow validation.

So far I have defined three classes:

+ SpatialData
    +-- SpatialDataFrame
        +-- SpatialDataFrameGrid

SpatialData is only meant as a base class; it only contains a bounding
box (2D or 3D data), and information about projection (if present),
anticipating (re)projecting facilities in Roger's proj4R package.

SpatialDataFrame extends this class; it holds a data frame, and the
information on where in the data frame the coordinates (2D or 3D) are
stored.

SpatialDataFrameGrid extends SpatialDataFrame for the case where
the data are on a regular 2D/3D grid; it contains the offset of the grid,
the cell size in (x,y,z) and the number of rows/columns/layers. One simple
way of creating objects of these classes (idea taken from Barry) is:

 > data(meuse.grid) # data frame with gridded data
 > coordinates(meuse.grid) = c("x", "y") # promote to SpatialDataFrame
 > gridded(meuse.grid) = TRUE # promote to SpatialDataFrameGrid

In the last expression, the grid topology is auto-detected and stored.
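
How such an auto-detection might work can be sketched as follows. This is a minimal illustration, not the SpatialCls code: the function name detectGridTopology and its tol argument are hypothetical, and the sketch assumes that every gap between distinct coordinate values is a whole multiple of the smallest gap (so incomplete grids are accepted).

```r
# Hypothetical helper: infer offset, cell size and number of cells per
# dimension from the coordinates of (possibly incomplete) gridded data.
detectGridTopology <- function(coords, tol = 1e-8) {
  coords <- as.matrix(coords)
  topo <- apply(coords, 2, function(v) {
    u <- sort(unique(v))
    step <- diff(u)
    if (length(step) == 0)
      stop("need at least two distinct values per dimension")
    cellsize <- min(step)
    # every gap must be a whole multiple of the cell size
    k <- step / cellsize
    if (any(abs(k - round(k)) > tol))
      stop("coordinates are not on a regular grid")
    c(offset = u[1], cellsize = cellsize,
      n = round((u[length(u)] - u[1]) / cellsize) + 1)
  })
  t(topo)  # one row per dimension
}

xy <- expand.grid(x = seq(100, 180, by = 40), y = seq(0, 60, by = 20))
xy <- xy[-5, ]  # drop one cell: the grid does not have to be complete
topo <- detectGridTopology(xy)
print(topo)
```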

Missing values for coordinates are not allowed.
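
To make the design concrete, here is a minimal S4 sketch of the first two levels of the hierarchy, including a validity check that rejects missing coordinates. The slot names (bbox, proj4string, data, coord.names) and the helper makeSpatialDataFrame are assumptions made for illustration; the real SpatialCls definitions may differ.

```r
library(methods)

# Base class: only a bounding box and (optional) projection information.
setClass("SpatialData",
         representation(bbox = "matrix", proj4string = "character"))

# Data frame plus the names of the columns that hold the coordinates.
setClass("SpatialDataFrame",
         contains = "SpatialData",
         representation(data = "data.frame", coord.names = "character"),
         validity = function(object) {
           nms <- object@coord.names
           if (!all(nms %in% names(object@data)))
             return("coordinate columns not found in data")
           if (any(is.na(object@data[nms])))
             return("missing values in coordinates are not allowed")
           TRUE
         })

# Hypothetical constructor: derives the bounding box from the coordinates.
makeSpatialDataFrame <- function(df, coord.names,
                                 proj4string = NA_character_) {
  bb <- t(sapply(df[coord.names], range))
  colnames(bb) <- c("min", "max")
  new("SpatialDataFrame", bbox = bb, proj4string = proj4string,
      data = df, coord.names = coord.names)
}

pts <- data.frame(x = c(1, 2, 4), y = c(10, 12, 11), zinc = c(5.1, 3.2, 7.7))
sdf <- makeSpatialDataFrame(pts, c("x", "y"))
print(sdf@bbox)
```

Because new() runs the validity function whenever slots are supplied, a data frame with NA in a coordinate column is rejected at construction time.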

This now works; I need to add more tests for a lot of pathological cases.

To be really useful, the class should include vector (polygon) data,
and probably line elements. For this, I need help. The simplest approach
would be to extend SpatialDataFrame to SpatialDataFramePolygon,
and to add the corresponding polygon for each row.

Questions:
- Did I overlook important things in the current proposal?
- Is there some class in a package that serves as a good
starting point for vector/polygon data?
- Which information should be stored for each polygon? (If I look
at class "Map" in maptools, there is a lot!)
- Should we anticipate exchangeability with shapefiles, and
store everything needed for them right from the start? If yes,
how do we deal with much simpler representations such as those in
package maps?
- Should we work towards two extensions of SpatialDataFrame,
the first simply with the polygons, the second with all the
information shapefiles require?
- Will there ever be a need to export R data as shapefiles?
If not, which part of the information in shapefiles may be ignored?
- Do we need another name, instead of the current SpatialCls?


You can download SpatialCls by cvs; use:

export CVS_RSH=ssh
cvs -d:pserver:anonymous@cvs.sf.net:/cvsroot/r-spatial login
# press return on the password prompt
cvs -d:pserver:anonymous@cvs.sf.net:/cvsroot/r-spatial co SpatialCls

If you are interested in becoming a co-developer, please join!
--
Edzer




Re: [R] A package for spatial data classes: request for comments

Barry Rowlingson

> To be really useful, the class should include vector (polygon) data,
> and probably line elements. For this, I need help. The simplest approach
> would be to extend SpatialDataFrame to SpatialDataFramePolygon,
> and to add the corresponding polygon for each row.

  I think you need to stop thinking about building classes at the
moment, and to think about specifying _interfaces_. What methods do we
want for spatial data? How will functions get the spatial data out of
the objects? Then we can build classes that implement these interfaces.

  For example, the rasp library uses sp.coords(foo) whenever it needs a
set of point locations for analysis. It doesn't care what 'foo' is, as
long as sp.coords(foo) returns a 2-column matrix.

  rasp provides spatial.data.frame as a class for when your data is
conveniently stored in a data frame, but each row has an associated
coordinate. Suppose your data is a varying number of measurements of
soil acidity over time at a number of locations - this would be better
stored as a list with foo[[i]] being a 2-column matrix (or data frame)
of time and acidity, and a single associated coordinate. This could then
be made into a spatial.list class. As long as there's an
sp.coords.spatial.list then you can use rasp analysis functions using
this object and it will get the coordinates.
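
  The idea can be sketched in a few lines of S3 code. This is an illustration only, not the rasp implementation: the attribute name coord.names, the layout of the spatial.list elements and the centroid() function are all assumptions.

```r
# The interface: anything that answers sp.coords() with a 2-column matrix.
sp.coords <- function(x, ...) UseMethod("sp.coords")

# One coordinate per row of a data frame; the coordinate columns are named
# in a (hypothetical) "coord.names" attribute.
sp.coords.spatial.data.frame <- function(x, ...) {
  nms <- attr(x, "coord.names")
  cbind(x[[nms[1]]], x[[nms[2]]])
}

# One coordinate per list element; each element carries its own data.
sp.coords.spatial.list <- function(x, ...) {
  do.call(rbind, lapply(unclass(x), function(site) site$coord))
}

# An "analysis" function that only relies on the interface.
centroid <- function(obj) colMeans(sp.coords(obj))

sdf <- structure(data.frame(x = c(0, 2), y = c(0, 4), pH = c(6.1, 5.8)),
                 class = c("spatial.data.frame", "data.frame"),
                 coord.names = c("x", "y"))

sl <- structure(list(
  list(coord = c(0, 0), data = data.frame(time = 1:3, pH = c(6.1, 6.0, 5.9))),
  list(coord = c(2, 4), data = data.frame(time = 1:2, pH = c(5.8, 5.7)))),
  class = "spatial.list")

print(centroid(sdf))
print(centroid(sl))
```

  centroid() never inspects the class of its argument, so a new storage layout only needs its own sp.coords method.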

  Typical methods that spatial location data will have would be things
like sp.coords() to get point locations, bbox() for bounding box, and
ummm, I can't really think of anything else at that level.

  Lat-long geographic data needs a slightly different set of interface
methods, to cope with projections, and this might implement a
'project<-' function, for example:

sp.coords(foo) # returns cbind(Lat, Long)
project(foo) <- '+proj=merc +lat0=35'

Now sp.coords(foo) returns the projected coordinates.

  Any objects with lat-long coordinates (points, polygons, lines) need
to implement the 'project<-' function, and whatever 'foo' is, it gets
projected properly. The object can decide whether to do the projection
calculations at 'project(foo)<-' time or save it until 'sp.coords(foo)'
time.
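
  A small sketch of the deferred variant, under loud assumptions: the class name lonlat.points is made up, the projection is given as the plain string "mercator" rather than a PROJ.4 string, and a hand-written spherical Mercator (Earth radius assumed to be 6371000 m) stands in for a real projection library. The columns are ordered (lon, lat) here.

```r
sp.coords <- function(x, ...) UseMethod("sp.coords")
`project<-` <- function(x, value) UseMethod("project<-")

# Only record the request; nothing is computed yet.
`project<-.lonlat.points` <- function(x, value) {
  attr(x, "projection") <- value
  x
}

# Project at sp.coords() time if a projection has been requested.
sp.coords.lonlat.points <- function(x, ...) {
  ll <- cbind(lon = x$lon, lat = x$lat)
  proj <- attr(x, "projection")
  if (is.null(proj)) return(ll)
  if (!identical(proj, "mercator")) stop("unsupported projection: ", proj)
  R <- 6371000              # assumed mean Earth radius in metres
  rad <- ll * pi / 180
  cbind(x = R * rad[, "lon"],
        y = R * log(tan(pi / 4 + rad[, "lat"] / 2)))
}

foo <- structure(list(lon = c(0, 10), lat = c(0, 45)),
                 class = "lonlat.points")
geo <- sp.coords(foo)       # geographic coordinates
project(foo) <- "mercator"
prj <- sp.coords(foo)       # projected coordinates
print(geo)
print(prj)
```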

  So, I'll wrap up this ramble with the thought that we need to consider
interfaces rather than classes, and I might just have another bash at S4
classes with this :)

Baz




Re: [R] A package for spatial data classes: request for comments

Edzer J. Pebesma
Barry Rowlingson wrote:

>
>> To be really useful, the class should include vector (polygon) data,
>> and probably line elements. For this, I need help. The simplest approach
>> would be to extend SpatialDataFrame to SpatialDataFramePolygon,
>> and to add the corresponding polygon for each row.
>
>
>  I think you need to stop thinking about building classes at the
> moment, and to think about specifying _interfaces_. What methods do we
> want for spatial data? How will functions get the spatial data out of
> the objects? Then we can build classes that implement these interfaces.

Baz, I agree that interfaces are important. Still, concentrating on
interfaces _only_ does encourage package writers to each come up with
a new set of spatial classes -- something I would like to discourage:
sharing more code makes the code more reliable.

I did build upon your idea of sp.coords, but called it coordinates, and
used references (column names/numbers) instead of actual values:

coordinates(meuse) = c("x", "y")

I prefer coordinates because at some stage the S3 method mechanism may
interpret sp.coords as an sp method for a coords class.

My question, from your perspective, would be: which methods do we need for
a generic class containing vector/polygon data?

E.g., how about:

polygons(world) # gets or sets the polygons, as a list of 2-column matrices
polygonAttributes(world) # gets or sets the polygon attributes, as a data frame
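
A minimal S4 sketch of such accessors, to make the question concrete. Everything here is hypothetical: the class name PolygonSet, the slot names polys and att, and the rule that there must be exactly one attribute row per polygon.

```r
library(methods)

setClass("PolygonSet",
         representation(polys = "list", att = "data.frame"),
         validity = function(object) {
           if (length(object@polys) != nrow(object@att))
             return("need exactly one attribute row per polygon")
           ok <- vapply(object@polys,
                        function(p) is.matrix(p) && ncol(p) == 2,
                        logical(1))
           if (!all(ok)) return("each polygon must be a 2-column matrix")
           TRUE
         })

# Getter and setter for the polygons
setGeneric("polygons", function(obj) standardGeneric("polygons"))
setGeneric("polygons<-", function(obj, value) standardGeneric("polygons<-"))
setMethod("polygons", "PolygonSet", function(obj) obj@polys)
setReplaceMethod("polygons", "PolygonSet", function(obj, value) {
  obj@polys <- value
  validObject(obj)  # re-check consistency with the attribute table
  obj
})

# Getter for the attribute table
setGeneric("polygonAttributes",
           function(obj) standardGeneric("polygonAttributes"))
setMethod("polygonAttributes", "PolygonSet", function(obj) obj@att)

square <- cbind(c(0, 1, 1, 0, 0), c(0, 0, 1, 1, 0))
world <- new("PolygonSet", polys = list(square),
             att = data.frame(name = "unit square"))
polygons(world) <- list(square * 2)  # same count, so still valid
print(polygonAttributes(world))
```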
--
Edzer




Re: [R] A package for spatial data classes: request for comments

Luc Anselin
As I mentioned to Roger Bivand a few days ago, it might be a good
idea to take a look at the spatial classes in ESRI's ArcGIS. In my
view they are not 100% ready for statistical analysis, but the vector
classes are very well structured. What is missing is an efficient
way to incorporate "topology" (contiguity structure) to provide
an easy way to construct spatial weights. In GeoDa, we build this
from scratch, using the shape files, but that's not the way it
should be (although very fast).
L.

----------------------------------------------------------------
Luc Anselin, PhD.
Faculty Excellence Professor and Director
Spatial Analysis Laboratory
Dept. Agricultural and Consumer Economics
University of Illinois at Urbana-Champaign
http://sal.agecon.uiuc.edu/




Re: [R] A package for spatial data classes: request for comments

Roger Bivand
On Mon, 3 Nov 2003, Luc Anselin wrote:

> As I mentioned to Roger Bivand a few days ago, it might be a good
> idea to take a look at the spatial classes in ESRI's ArcGIS. In my
> view they are not 100% ready for statistical analysis, but the vector
> classes are very well structured. What is missing is an efficient
> way to incorporate "topology" (contiguity structure) to provide
> an easy way to construct spatial weights. In GeoDa, we build this
> from scratch, using the shape files, but  that's not the way it
> should be (although very fast).
> L.

Yes, it's the topology/no topology that emerges as a question. It will be
worth looking at one or more of the following too: Terralib
(www.terralib.org) - work going on in Brazil is really well done and
exciting (which is why I've CC'd to Paulo Ribeiro), the GRASS 5.7
experimental vector engine, now approaching a draft alpha release but very
promising (grass.itc.it), maybe OpenGIS GML - which doesn't do topology,
and finally the newly released map* packages, which do topology using the
original Bell Labs Geographical Database model. Terralib and GRASS are
open source, the map* packages are mixed (2 OS, 1 non-commercial) and
don't include methods for ingesting other data, which needs to go through
an external tool topologising train (I think). Other users will be
grateful for a route from shapefile through topology to map().

I think Barry has an important point about methods, but which may expand
fast - as Edzer wrote - if the number of input object types also grows -
so to do methods right, we still need more consistency in the input
objects. From there on, methods can rule, perhaps.

Roger


--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no




multivariate moran & lisa for spdep

hi_ono2001
Hi.

 GeoDa supports multivariate (two-variable) Moran and LISA statistics, while
spdep only supports univariate ones.

 I have never come across multivariate Moran or LISA statistics in the
human geography literature.

 Could you tell me whether it would be possible to add these methods to
spdep?

 Regards.




RE: A package for spatial data classes: request for comments

Nicholas Lewin-Koh
In reply to this post by Edzer J. Pebesma
Hi,
I couldn't agree more with Barry Rowlingson. The initial set of basic
interfaces should be set up first, and only then should we start building
the methods. Without well-defined underlying data structures
this could turn into a nightmare.

Regarding topology, I am not sure that it is the most efficient set of
information for statistical computation. I think some sort of spatial
indexing for fast neighborhood calculation and spatial joins is important.
I could see topology being important for network modeling, where flow
through the network matters; however, that could again be tackled with
efficient contiguity calculations and a method to convert a map to a
graph data structure. A quadtree (or octree for 3D data) or perhaps an
R-tree structure would be most effective. I was thinking about the
implementation this morning, and I thought this might be a good use of
the external pointer mechanism, with the option to write the data
structure as an R object to save and restore on startup (home-grown
pickling).

my 2c

Nicholas




multivariate moran & lisa for spdep

Luc Anselin
In reply to this post by hi_ono2001
Multivariate Moran was first (as far as I know) described in an article
by Dan Wartenberg in Geographical Analysis (1985?). The implementation
of this idea in a Moran scatterplot and in LISA maps is described in
a conference proceedings paper by Anselin, Syabri and Smirnov (2002)
http://sal.agecon.uiuc.edu/users/anselin/papers.html#dynesda2
as well as in several PowerPoint presentations.
A slightly revised version is to appear in a forthcoming special issue
of Computers and Geosciences.
I believe it would be fairly straightforward to add this to spdep;
it is just a matter of time (having time, that is ...).
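
To give an idea of how little code is involved, here is a sketch of one common way to write a global bivariate Moran statistic: the cross-product of one centred variable with the spatial lag of the other, scaled so that it reduces to the usual univariate Moran's I when both variables are the same. The function name bivariateMoran is hypothetical, and this is not claimed to be the exact statistic computed by GeoDa or described in the papers cited above.

```r
# x, y: numeric vectors of equal length; W: n x n spatial weights matrix.
bivariateMoran <- function(x, y, W) {
  stopifnot(length(x) == length(y), all(dim(W) == length(x)))
  zx <- x - mean(x)
  zy <- y - mean(y)
  n <- length(x)
  # (n / S0) * sum_ij w_ij zx_i zy_j / sqrt(sum zx^2 * sum zy^2)
  (n / sum(W)) * as.numeric(t(zx) %*% W %*% zy) /
    sqrt(sum(zx^2) * sum(zy^2))
}

# Four regions on a line, binary contiguity weights
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)

x <- c(1, 2, 3, 4)
y <- c(2, 1, 4, 3)
print(bivariateMoran(x, x, W))  # univariate Moran's I of x
print(bivariateMoran(x, y, W))  # x against the spatial lag of y
```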
L.


On Tuesday, November 4, 2003, at 07:32 AM, Hisaji Ono wrote:

> Hi.
>
>  GeoDa supports multivariate (two-variable) Moran and LISA statistics,
> while spdep only supports univariate ones.
>
>  I have never come across multivariate Moran or LISA statistics in the
> human geography literature.
>
>  Could you tell me whether it would be possible to add these methods to
> spdep?
>
>  Regards.




multivariate moran & lisa for spdep

Stéphane Dray
In reply to this post by hi_ono2001
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20031104/c7d60cd1/attachment.pl>


Re: [R] A package for spatial data classes: request for comments

Edzer J. Pebesma
In reply to this post by Luc Anselin
Luc Anselin wrote:

> As I mentioned to Roger Bivand a few days ago, it might be a good
> idea to take a look at the spatial classes in ESRI's ArcGIS. In my
> view they are not 100% ready for statistical analysis, but the vector
> classes are very well structured.

Luc, are you referring to the "Geometry" section in the second paper
(arcob81post.pdf) of the incredibly hard to read poster at

http://www.esri.com/library/whitepapers/ao_lit.html
or is there some more accessible information on these classes?

> What is missing is an efficient
> way to incorporate "topology" (contiguity structure) to provide
> an easy way to construct spatial weights. In GeoDa, we build this
> from scratch, using the shape files, but  that's not the way it
> should be (although very fast).

If constructing it is very fast, why should we incorporate it in the
class definition, instead of creating it on the fly? Package gstat
builds PR bucket quadtrees for fast neighbourhood selections,
which makes the program scalable to large interpolation or
simulation jobs, but it never stores them.
We recently found out that if you want to apply
gstat to say 1e9 points (so many that you will never be able to
hold them in RAM), even then quadtree _building_ does take so
little time (minutes, maybe) that it does not reward storing it.
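
To illustrate the build-on-the-fly idea, here is a pure-R sketch of a PR bucket quadtree with a fixed-radius search. It is not gstat's implementation; the function names, the bucket size of 4 and the depth limit of 20 are assumptions chosen for the example.

```r
# Split a cell into four quadrants until it holds at most `bucket` points.
buildQuadtree <- function(xy, idx = seq_len(nrow(xy)),
                          xlim = range(xy[, 1]), ylim = range(xy[, 2]),
                          bucket = 4, depth = 0, max.depth = 20) {
  if (length(idx) <= bucket || depth >= max.depth)
    return(list(xlim = xlim, ylim = ylim, idx = idx))  # leaf node
  xm <- mean(xlim)
  ym <- mean(ylim)
  west <- xy[idx, 1] < xm
  south <- xy[idx, 2] < ym
  child <- function(sel, xl, yl)
    buildQuadtree(xy, idx[sel], xl, yl, bucket, depth + 1, max.depth)
  list(xlim = xlim, ylim = ylim, children = list(
    child(west & south, c(xlim[1], xm), c(ylim[1], ym)),
    child(!west & south, c(xm, xlim[2]), c(ylim[1], ym)),
    child(west & !south, c(xlim[1], xm), c(ym, ylim[2])),
    child(!west & !south, c(xm, xlim[2]), c(ym, ylim[2]))))
}

# Row indices of all points within distance r of (x0, y0). Cells whose
# rectangle lies farther away than r are skipped without visiting points.
withinRadius <- function(node, xy, x0, y0, r) {
  dx <- max(node$xlim[1] - x0, 0, x0 - node$xlim[2])
  dy <- max(node$ylim[1] - y0, 0, y0 - node$ylim[2])
  if (dx^2 + dy^2 > r^2) return(integer(0))
  if (is.null(node$children)) {
    i <- node$idx
    return(i[(xy[i, 1] - x0)^2 + (xy[i, 2] - y0)^2 <= r^2])
  }
  unlist(lapply(node$children, withinRadius,
                xy = xy, x0 = x0, y0 = y0, r = r))
}

set.seed(42)
pts <- cbind(runif(500), runif(500))
tree <- buildQuadtree(pts)  # built on the fly, never stored
hits <- sort(withinRadius(tree, pts, 0.5, 0.5, 0.1))
print(length(hits))
```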

When constructing topology from prohibitively large spatial
databases in R, another route to investigate would be PostgreSQL/PostGIS;
it can deal with tree search indexes, and I think it uses the GEOS
geometry toolkit.
--
Edzer




Re: [R] A package for spatial data classes: request for comments

Luc Anselin
There is better documentation, I'll look for a digital form
or maybe somebody else has found it already.
The construction of topology is a tradeoff. It is (can be)
fast, so it can be done on the fly. This is actually what
ArcGIS 8.2 and later do: topology is no longer "stored".
I have found that in spatial regression when the same
weights object is used over and over (e.g., 3000+ US
counties, or multiples of 100 000 in real estate transactions),
it makes sense to store it and to somehow
relate it back to the original coverage. This doesn't
necessarily mean it has to be part of the class design
for that object, but the connection should be there.
Also, this is a one to many relationship, in that multiple
weights can be constructed for the same coverage.
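
A minimal sketch of that one-to-many arrangement, assuming a plain named list of weights matrices whose row and column names are the region identifiers of the coverage. The identifiers, the centroid coordinates and the function distanceBand are made up for the example.

```r
# Region centroids of a (hypothetical) coverage, keyed by id.
centroids <- data.frame(id = c("a", "b", "c", "d"),
                        x = c(0, 1, 0, 3), y = c(0, 0, 1, 3))
d <- as.matrix(dist(centroids[c("x", "y")]))
dimnames(d) <- list(centroids$id, centroids$id)

# Row-standardised distance-band weights; regions without neighbours
# keep an all-zero row.
distanceBand <- function(d, threshold) {
  w <- (d > 0 & d <= threshold) * 1
  rs <- rowSums(w)
  w[rs > 0, ] <- w[rs > 0, , drop = FALSE] / rs[rs > 0]
  w
}

# Several weights objects for the same coverage, each linked back to it
# through the shared region ids in its dimnames.
weights <- list(band1.5 = distanceBand(d, 1.5),
                band5 = distanceBand(d, 5))
print(weights$band1.5)
```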
L.


On Thursday, November 6, 2003, at 02:59 AM, Edzer J. Pebesma wrote:

> Luc Anselin wrote:
>
>> As I mentioned to Roger Bivand a few days ago, it might be a good
>> idea to take a look at the spatial classes in ESRI's ArcGIS. In my
>> view they are not 100% ready for statistical analysis, but the vector
>> classes are very well structured.
>
> Luc, are you referring to the "Geometry" section in the second paper
> (arcob81post.pdf) of the incredibly hard to read poster at
>
> http://www.esri.com/library/whitepapers/ao_lit.html
> or is there some more accessible information on these classes?
>
>> What is missing is an efficient
>> way to incorporate "topology" (contiguity structure) to provide
>> an easy way to construct spatial weights. In GeoDa, we build this
>> from scratch, using the shape files, but  that's not the way it
>> should be (although very fast).
>
> If constructing it is very fast, why should we incorporate it in the
> class definition, instead of creating it on the fly? Package gstat
> builds PR bucket quadtrees for fast neighbourhood selections,
> which makes the program scalable to large interpolation or
> simulation jobs, but it never stores them.
> We recently found out that if you want to apply
> gstat to say 1e9 points (so many that you will never be able to
> hold them in RAM), even then quadtree _building_ does take so
> little time (minutes, maybe) that it does not reward storing it.
>
> When constructing topology from prohibitively large spatial
> databases in R, another route to investigate would be PostgreSQL/PostGIS;
> it can deal with tree search indexes, and I think it uses the GEOS
> geometry toolkit.
> --
> Edzer
>