How does sf do rgeos::gUnaryUnion or maptools::unionSpatialPolygons?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

How does sf do rgeos::gUnaryUnion or maptools::unionSpatialPolygons?

Roger Bivand
Administrator
While https://github.com/edzer/sfr/wiki/migrating is very helpful,
rgeos::gUnaryUnion is not the typical use case of sf::st_union. The
typical use case, from ASDAR 1st edition, chapter 5 and the maptools
"combine_maptools" vignette (the shapefiles are shipped with maptools), is
grouping features that should belong to the same statistical entity:

library(maptools)
vignette("combine_maptools")

###################################################
### chunk number 16:
###################################################

library(sf)
nc90 <- st_read(system.file("shapes/co37_d90.shp", package = "maptools"))
st_crs(nc90) <- "+proj=longlat +datum=NAD27"
table(table(paste(nc90$ST, nc90$CO, sep="")))

The point is that two counties are represented by multiple features of
"POLYGON" objects, rather than single "MULTIPOLYGON" objects, in the input
shapefile. I've tried doing:

ids <- factor(paste(nc90$ST, nc90$CO, sep=""))
nc90a <- st_cast(nc90, to="MULTIPOLYGON", ids=as.integer(ids))

but:

> dim(nc90a)
[1] 104   9

all turned into "MULTIPOLYGON", but not grouped, even though I think ids=
are as they should be:

> table(table(as.integer(ids)))

  1  2  4
98  1  1

This may be to avoid dropping data.frame rows. It looks as though I can
get there using:

nc90a <- st_cast(st_geometry(nc90), to="MULTIPOLYGON", ids=as.integer(ids))

> length(nc90a)
[1] 100

but all are "MULTIPOLYGON", not a mixture of "POLYGON" and "MULTIPOLYGON"
features, and I've no idea which is which - that is how the order of nc90a
relates to the counties of nc90. How do I associate the features in
nc90a with their county ids? Where do i find the ids I gave to st_cast in
nc90a?

What am I missing? I'm writing here rather than raising an issue on GH
because others may want to know too, and Edzer needs others to reply for
him if they know the answer.

Puzzled,

Roger

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway
Reply | Threaded
Open this post in threaded view
|

Re: How does sf do rgeos::gUnaryUnion or maptools::unionSpatialPolygons?

edzer


On 09/05/17 15:32, Roger Bivand wrote:

> While https://github.com/edzer/sfr/wiki/migrating is very helpful,
> rgeos::gUnaryUnion is not the typical use case of sf::st_union. The
> typical use case, from ASDAR 1st edition, chapter 5 and the maptools
> "combine_maptools" vignette (the shapefiles are shipped with maptools),
> is grouping features that should belong to the same statistical entity:
>
> library(maptools)
> vignette("combine_maptools")
>
> ###################################################
> ### chunk number 16:
> ###################################################
>
> library(sf)
> nc90 <- st_read(system.file("shapes/co37_d90.shp", package = "maptools"))
> st_crs(nc90) <- "+proj=longlat +datum=NAD27"
> table(table(paste(nc90$ST, nc90$CO, sep="")))
>
> The point is that two counties are represented by multiple features of
> "POLYGON" objects, rather than single "MULTIPOLYGON" objects, in the
> input shapefile. I've tried doing:
>
> ids <- factor(paste(nc90$ST, nc90$CO, sep=""))
> nc90a <- st_cast(nc90, to="MULTIPOLYGON", ids=as.integer(ids))
>
> but:
>
>> dim(nc90a)
> [1] 104   9
>
> all turned into "MULTIPOLYGON", but not grouped, even though I think
> ids= are as they should be:
>
>> table(table(as.integer(ids)))
>
>  1  2  4
> 98  1  1
this is at least documented: nc90 is of class `sf`, and st_cast.sf
has no ids argument; ... is ignored. If it would merge, it'd need
some guidance what to do with non-geometry feature attributes.

>
> This may be to avoid dropping data.frame rows. It looks as though I can
> get there using:
>
> nc90a <- st_cast(st_geometry(nc90), to="MULTIPOLYGON", ids=as.integer(ids))
>
>> length(nc90a)
> [1] 100
>
> but all are "MULTIPOLYGON", not a mixture of "POLYGON" and
> "MULTIPOLYGON" features, and I've no idea which is which - that is how
> the order of nc90a relates to the counties of nc90. How do I associate
> the features in nc90a with their county ids? Where do i find the ids I
> gave to st_cast in nc90a?
st_cast uses base::split, which converts ids to a factor,
and returns a list with the order of the levels of that factor
(alphabetically?). Neither convenient, nor documented. I guess more
intuitive would be to squeeze, but keep original order, right?

I'd suggest to use instead

nc90a = aggregate(nc90, list(ids = ids), head, n = 1)

which returns a GEOMETRY sf, with 2 MULTIPOLYGON and 98 POLYGON. You
could st_cast that to MULTIPOLYGON.

Alternatively:

library(dplyr)
bind_cols(nc90, ids=ids) %>%
        group_by(ids) %>%
        summarise(ST=head(ST,1), CO=head(CO,1), do_union=FALSE)

converts straight into MULTIPOLYGON.

I'll update the sp -> sf migration wiki.

>
> What am I missing? I'm writing here rather than raising an issue on GH
> because others may want to know too, and Edzer needs others to reply for
> him if they know the answer.
>
> Puzzled,
>
> Roger
>

--
Edzer Pebesma
Institute for Geoinformatics  (ifgi),  University of Münster
Heisenbergstraße 2, 48149 Münster, Germany; +49 251 83 33081
Journal of Statistical Software:   http://www.jstatsoft.org/
Computers & Geosciences:   http://elsevier.com/locate/cageo/


_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

signature.asc (484 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: How does sf do rgeos::gUnaryUnion or maptools::unionSpatialPolygons?

Roger Bivand
Administrator
On Tue, 9 May 2017, Edzer Pebesma wrote:

>
>
> On 09/05/17 15:32, Roger Bivand wrote:
>> While https://github.com/edzer/sfr/wiki/migrating is very helpful,
>> rgeos::gUnaryUnion is not the typical use case of sf::st_union. The
>> typical use case, from ASDAR 1st edition, chapter 5 and the maptools
>> "combine_maptools" vignette (the shapefiles are shipped with maptools),
>> is grouping features that should belong to the same statistical entity:
>>
>> library(maptools)
>> vignette("combine_maptools")
>>
>> ###################################################
>> ### chunk number 16:
>> ###################################################
>>
>> library(sf)
>> nc90 <- st_read(system.file("shapes/co37_d90.shp", package = "maptools"))
>> st_crs(nc90) <- "+proj=longlat +datum=NAD27"
>> table(table(paste(nc90$ST, nc90$CO, sep="")))
>>
>> The point is that two counties are represented by multiple features of
>> "POLYGON" objects, rather than single "MULTIPOLYGON" objects, in the
>> input shapefile. I've tried doing:
>>
>> ids <- factor(paste(nc90$ST, nc90$CO, sep=""))
>> nc90a <- st_cast(nc90, to="MULTIPOLYGON", ids=as.integer(ids))
>>
>> but:
>>
>>> dim(nc90a)
>> [1] 104   9
>>
>> all turned into "MULTIPOLYGON", but not grouped, even though I think
>> ids= are as they should be:
>>
>>> table(table(as.integer(ids)))
>>
>>  1  2  4
>> 98  1  1
>
> this is at least documented: nc90 is of class `sf`, and st_cast.sf
> has no ids argument; ... is ignored. If it would merge, it'd need
> some guidance what to do with non-geometry feature attributes.
>

OK, thanks! I was hoping one or other of the many contributors to sf would
jump in, given the amount of time you commit to moving sf forward!

>>
>> This may be to avoid dropping data.frame rows. It looks as though I can
>> get there using:
>>
>> nc90a <- st_cast(st_geometry(nc90), to="MULTIPOLYGON", ids=as.integer(ids))
>>
>>> length(nc90a)
>> [1] 100
>>
>> but all are "MULTIPOLYGON", not a mixture of "POLYGON" and
>> "MULTIPOLYGON" features, and I've no idea which is which - that is how
>> the order of nc90a relates to the counties of nc90. How do I associate
>> the features in nc90a with their county ids? Where do i find the ids I
>> gave to st_cast in nc90a?
>
> st_cast uses base::split, which converts ids to a factor,
> and returns a list with the order of the levels of that factor
> (alphabetically?). Neither convenient, nor documented. I guess more
> intuitive would be to squeeze, but keep original order, right?
>
> I'd suggest to use instead
>
> nc90a = aggregate(nc90, list(ids = ids), head, n = 1)
>
> which returns a GEOMETRY sf, with 2 MULTIPOLYGON and 98 POLYGON. You
> could st_cast that to MULTIPOLYGON.

This works and keeps the output as c("sf", "data.frame"):

ids <- factor(paste(nc90$ST, nc90$CO, sep=""))
t0 <- aggregate(nc90, list(ids = ids), head, n = 1)
nc90a <- t0[, c("ids", attr(t0, "sf_column"))]
rm(t0)
# n is an argument to head saying take the first value in each ids-group

>
> Alternatively:
>
> library(dplyr)
> bind_cols(nc90, ids=ids) %>%
> group_by(ids) %>%
> summarise(ST=head(ST,1), CO=head(CO,1), do_union=FALSE)

This works but returns an object of: c("sf", "tbl_df", "tbl", "data.frame"):

library(dplyr)
nc90b <- summarise(group_by(bind_cols(nc90, data.frame(ids=ids)), ids),
   do_union=FALSE)

or equivalently (for released dplyr):

bind_cols(nc90, data.frame(ids=ids)) %>%
         group_by(ids) %>%
         summarise(do_union=FALSE) -> nc90b

This fails in:

all.equal(nc90b, nc90a, check.attributes=FALSE)
# Error in equal_data_frame(target, current, ignore_col_order =
# ignore_col_order,  :
#  Can't join on 'geometry' x 'geometry' because of incompatible types
# (sfc_GEOMETRY, sfc / sfc_GEOMETRY, sfc)

but:

> all.equal(nc90a, nc90b, check.attributes=FALSE)
[1] TRUE

so being c("tbl_df", "tbl") before "data.frame" makes a difference.

Since I always try to check intermediate results, I only found the missing
data.frame() in bind_cols() by stepping through - is that implicit in the
development version of dplyr (or sf)?

Sometime the maptools vignette will migrate, when I get so far ...

Thanks again,

Roger

>
> converts straight into MULTIPOLYGON.
>
> I'll update the sp -> sf migration wiki.
>
>>
>> What am I missing? I'm writing here rather than raising an issue on GH
>> because others may want to know too, and Edzer needs others to reply for
>> him if they know the answer.
>>
>> Puzzled,
>>
>> Roger
>>
>
>

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway