Is a simple feature -friendly version of spdep being developed?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Is a simple feature -friendly version of spdep being developed?

Tiernan Martin
Is anyone thinking about creating an adaptation of the `spdep` package that
expects sf-class inputs and works well in a pipeline?

I understand that there is skepticism about the wisdom of adopting the
“tidyverse” principles throughout the R package ecosystem, and I share the
concern that an over-reliance on any single paradigm could reduce the
resilience and diversity of the system as a whole.

That said, I believe that the enthusiastic adoption of the `sf` package and
the package's connections with widely-used tidyverse packages like `dplyr`
and `ggplot2` may result in increased demand for sf-friendly spatial
analysis tools. As an amateur who recently started using R as my primary
GIS tool, it seems like the tidyverse's preference for dataframes, S3
objects, list columns, and pipeline workflows would be well-suited to the
field of spatial analysis. Are there some fundamental reasons why the
`spdep` tools cannot (or should not) be adapted to the tidyverse "dialect"?

Let me put the question in the context of an actual analysis: in February
2017, the pop culture infovis website The Pudding (https://pudding.cool/)
published an analysis of regional preferences for Oscar-nominated films in
the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days ago,
the author posted a tutorial explaining the method of “regional smoothing”
used to create the article’s choropleths (
https://pudding.cool/process/regional_smoothing/).

The method relies on several `spdep` functions (
https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R).
In the code below, I provide reprex with a smaller dataset included in the
`sf` package:

library(sf)
library(spdep)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
Carolina counties
nc_shp <- as(nc,'Spatial')

coords <- coordinates(nc_shp)
IDs<-row.names(as(nc_shp, "data.frame"))

knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find the
nearest neighbors for each county
knn5 <- include.self(knn5)

localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G scores
localGvalues <- round(localGvalues,3)

nc_shp@data$LOCAL_G <- as.numeric(localGvalues)

p1 <- spplot(nc_shp, c('NWBIR74'))
p2 <- spplot(nc_shp, c('LOCAL_G'))
plot(p1, split=c(1,1,2,2), more=TRUE)
plot(p2, split=c(1,2,2,2), more=TRUE)

Here’s what I imagine that would look like in a tidyverse pipeline (please
note that this code is for illustrative purposes and will not run):

library(tidyverse)
library(purrr)
library(sf)
library(sfdep) # this package doesn't exist (yet)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))

nc_g <-
  nc %>%
  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5, include.self =
TRUE)),  # find the nearest neighbors for each county
         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style = 'B')),  #
make a list of the neighbors using the binary method
         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
zero.policy = TRUE),  # calculate the G scores
         LOCAL_G = round(LOCAL_G,3))

We can see that the (hypothetical) tidyverse version reduces the amount of
intermediate objects and wraps the creation of the G scores into a single
code chunk with clear steps.

I'd be grateful to hear from the users and developers of the `spdep` and
`sf` packages about this topic!

Tiernan Martin

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Is a simple feature -friendly version of spdep being developed?

Roger Bivand
Administrator
On Fri, 12 May 2017, Tiernan Martin wrote:

> Is anyone thinking about creating an adaptation of the `spdep` package that
> expects sf-class inputs and works well in a pipeline?

I assume that "this is not a pipeline" is a joke for insiders (in the
pipe)?

No, there is your issue on the sfr github repository that is relevant for
contiguous neighbours, but not beyond that:

https://github.com/edzer/sfr/issues/234

An sf is a data.frame, and as such should "just work", like "Spatial"
objects have, in formula/data settings. The problem is (of course) the
weights matrix.

Should it be a list column (each row has a list nesting two lists, first
indices, second non-zero weights), or remain separate as it has been for
20 years, or become a column-oriented representation (Matrix package) - a
nested list like a list column or a listw obeject is row-oriented. I had
started thinking about using a sparse column-oriented representation, but
have not implemented functions accepting them instead of listw objects.

I am very cautious about creating classes for data that were data.frame,
then sf, and then have the weights built-in. In the simple case it would
work, but all you have to do is re-order the rows and the link between the
neighbour ids and row order breaks down; the same applies to subsetting.

The problems to solve first are related the workflows, and easiest to look
at in the univariate case (Moran, Geary, join-count, Mantel, G, ...) for
global and local tests. I think that all or almost all of the NSE verbs
will cause chaos (mutate, select, group) once weights have been created.
If there is a way to bind the neighour IDs to the input sf object rows, a
list column might be possible, but we have to permit multiple such columns
(Queen, Rook, ...), and ensure that subsetting and row-reordering keep the
graph relations intact (and their modifications, like row-standardisation)

If you've seen any examples of how tidy relates to graphs (adjacency lists
are like nb objects), we could learn from that.

How should we go about this technically? spdep is on R-Forge and is happy
there. Its present functionality has to be maintained as it stands, it has
too many reverse dependencies to break. Should it be re-written from
scratch (with more sparse matrices internally for computation)?

Your example creates weights on the fly for a local G* map, but has n=100,
not say n=90000 (LA census blocks). Using the sf_ geos based methods does
not use STRtrees, which cut the time needed to find contigious neighbours
from days to seconds. We ought to pre-compute weights, but this messes up
the flow of data, because a second lot of data (the weights) have to enter
the pipe, and be matched with the row-ids.

We'd need proof of concept with realistically sized data sets (not yet NY
taxis, but maybe later ...). spdep started as spweights, sptests and
spdep, and the first two got folded into the third when stable. If weights
are the first thing to go for, sfweights is where to go first (and port
the STRtree envelope intersections for contiguities). It could define the
new classes, and tests and modelling would use them.

Thanks for starting this discussion.

Roger

>
> I understand that there is skepticism about the wisdom of adopting the
> “tidyverse” principles throughout the R package ecosystem, and I share the
> concern that an over-reliance on any single paradigm could reduce the
> resilience and diversity of the system as a whole.
>
> That said, I believe that the enthusiastic adoption of the `sf` package and
> the package's connections with widely-used tidyverse packages like `dplyr`
> and `ggplot2` may result in increased demand for sf-friendly spatial
> analysis tools. As an amateur who recently started using R as my primary
> GIS tool, it seems like the tidyverse's preference for dataframes, S3
> objects, list columns, and pipeline workflows would be well-suited to the
> field of spatial analysis. Are there some fundamental reasons why the
> `spdep` tools cannot (or should not) be adapted to the tidyverse "dialect"?
>
> Let me put the question in the context of an actual analysis: in February
> 2017, the pop culture infovis website The Pudding (https://pudding.cool/)
> published an analysis of regional preferences for Oscar-nominated films in
> the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days ago,
> the author posted a tutorial explaining the method of “regional smoothing”
> used to create the article’s choropleths (
> https://pudding.cool/process/regional_smoothing/).
>
> The method relies on several `spdep` functions (
> https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R).
> In the code below, I provide reprex with a smaller dataset included in the
> `sf` package:
>
> library(sf)
> library(spdep)
>
> nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
> Carolina counties
> nc_shp <- as(nc,'Spatial')
>
> coords <- coordinates(nc_shp)
> IDs<-row.names(as(nc_shp, "data.frame"))
>
> knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find the
> nearest neighbors for each county
> knn5 <- include.self(knn5)
>
> localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
> nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G scores
> localGvalues <- round(localGvalues,3)
>
> nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
>
> p1 <- spplot(nc_shp, c('NWBIR74'))
> p2 <- spplot(nc_shp, c('LOCAL_G'))
> plot(p1, split=c(1,1,2,2), more=TRUE)
> plot(p2, split=c(1,2,2,2), more=TRUE)
>
> Here’s what I imagine that would look like in a tidyverse pipeline (please
> note that this code is for illustrative purposes and will not run):
>
> library(tidyverse)
> library(purrr)
> library(sf)
> library(sfdep) # this package doesn't exist (yet)
>
> nc <- st_read(system.file("shape/nc.shp", package = "sf"))
>
> nc_g <-
>  nc %>%
>  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5, include.self =
> TRUE)),  # find the nearest neighbors for each county
>         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style = 'B')),  #
> make a list of the neighbors using the binary method
>         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
> zero.policy = TRUE),  # calculate the G scores
>         LOCAL_G = round(LOCAL_G,3))
>
> We can see that the (hypothetical) tidyverse version reduces the amount of
> intermediate objects and wraps the creation of the G scores into a single
> code chunk with clear steps.
>
> I'd be grateful to hear from the users and developers of the `spdep` and
> `sf` packages about this topic!
>
> Tiernan Martin
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Is a simple feature -friendly version of spdep being developed?

Tiernan Martin
You’re right about one thing: those of us introduced to the pipe in our
formative years find it hard to escape (everything is better here inside
the pipe - right?) ;)

Thank you for your very informative response. Before reading it, I had a
vague inkling that matrices were part of the challenge of adapting these
tools to work with simple features, but I also thought perhaps all this
issue needed was a little nudge from a user like me. Your response made it
clear that there is a series of technical decisions that need to be made,
and in order to do that I imagine some exploratory testing is in order. Of
course, it is possible that the data.frame + list-col foundation is just
not well-suited to the needs of spatial weighting/testing tools, but I
suspect that there is some valuable insight to be gained from exploring
this possibility further. And as I stated in my first post, going forward I
expect more users will be interested in seeing this sort of integration
happen.

> How should we go about this technically? spdep is on R-Forge and is happy
> there. Its present functionality has to be maintained as it stands, it
has
> too many reverse dependencies to break. Should it be re-written from
> scratch (with more sparse matrices internally for computation)?
> ...
> We'd need proof of concept with realistically sized data sets (not yet NY
> taxis, but maybe later ...). spdep started as spweights, sptests and
> spdep, and the first two got folded into the third when stable. If
weights
> are the first thing to go for, sfweights is where to go first (and port
> the STRtree envelope intersections for contiguities). It could define the
> new classes, and tests and modelling would use them.

Sounds to me like this package would need to be built from the ground up,
possibly following a similar path to the `sp` development process as you
mentioned. Maybe that presents a challenge with regards to resources (i.e.,
funding, time, etc.). Perhaps project this is a candidate for future
proposals to the R Consortium or other funding sources. I hope that this
post is useful in documenting user interest in seeing the `sf` protocol
integrated into other spatial tools and workflows, and I look forward to
hearing others' thoughts on the matter.

Thanks again,

Tiernan





On Sat, May 13, 2017 at 1:39 PM Roger Bivand <[hidden email]> wrote:

> On Fri, 12 May 2017, Tiernan Martin wrote:
>
> > Is anyone thinking about creating an adaptation of the `spdep` package
> that
> > expects sf-class inputs and works well in a pipeline?
>
> I assume that "this is not a pipeline" is a joke for insiders (in the
> pipe)?
>
> No, there is your issue on the sfr github repository that is relevant for
> contiguous neighbours, but not beyond that:
>
> https://github.com/edzer/sfr/issues/234
>
> An sf is a data.frame, and as such should "just work", like "Spatial"
> objects have, in formula/data settings. The problem is (of course) the
> weights matrix.
>
> Should it be a list column (each row has a list nesting two lists, first
> indices, second non-zero weights), or remain separate as it has been for
> 20 years, or become a column-oriented representation (Matrix package) - a
> nested list like a list column or a listw obeject is row-oriented. I had
> started thinking about using a sparse column-oriented representation, but
> have not implemented functions accepting them instead of listw objects.
>
> I am very cautious about creating classes for data that were data.frame,
> then sf, and then have the weights built-in. In the simple case it would
> work, but all you have to do is re-order the rows and the link between the
> neighbour ids and row order breaks down; the same applies to subsetting.
>
> The problems to solve first are related the workflows, and easiest to look
> at in the univariate case (Moran, Geary, join-count, Mantel, G, ...) for
> global and local tests. I think that all or almost all of the NSE verbs
> will cause chaos (mutate, select, group) once weights have been created.
> If there is a way to bind the neighour IDs to the input sf object rows, a
> list column might be possible, but we have to permit multiple such columns
> (Queen, Rook, ...), and ensure that subsetting and row-reordering keep the
> graph relations intact (and their modifications, like row-standardisation)
>
> If you've seen any examples of how tidy relates to graphs (adjacency lists
> are like nb objects), we could learn from that.
>
> How should we go about this technically? spdep is on R-Forge and is happy
> there. Its present functionality has to be maintained as it stands, it has
> too many reverse dependencies to break. Should it be re-written from
> scratch (with more sparse matrices internally for computation)?
>
> Your example creates weights on the fly for a local G* map, but has n=100,
> not say n=90000 (LA census blocks). Using the sf_ geos based methods does
> not use STRtrees, which cut the time needed to find contigious neighbours
> from days to seconds. We ought to pre-compute weights, but this messes up
> the flow of data, because a second lot of data (the weights) have to enter
> the pipe, and be matched with the row-ids.
>
> We'd need proof of concept with realistically sized data sets (not yet NY
> taxis, but maybe later ...). spdep started as spweights, sptests and
> spdep, and the first two got folded into the third when stable. If weights
> are the first thing to go for, sfweights is where to go first (and port
> the STRtree envelope intersections for contiguities). It could define the
> new classes, and tests and modelling would use them.
>
> Thanks for starting this discussion.
>
> Roger
>
> >
> > I understand that there is skepticism about the wisdom of adopting the
> > “tidyverse” principles throughout the R package ecosystem, and I share
> the
> > concern that an over-reliance on any single paradigm could reduce the
> > resilience and diversity of the system as a whole.
> >
> > That said, I believe that the enthusiastic adoption of the `sf` package
> and
> > the package's connections with widely-used tidyverse packages like
> `dplyr`
> > and `ggplot2` may result in increased demand for sf-friendly spatial
> > analysis tools. As an amateur who recently started using R as my primary
> > GIS tool, it seems like the tidyverse's preference for dataframes, S3
> > objects, list columns, and pipeline workflows would be well-suited to the
> > field of spatial analysis. Are there some fundamental reasons why the
> > `spdep` tools cannot (or should not) be adapted to the tidyverse
> "dialect"?
> >
> > Let me put the question in the context of an actual analysis: in February
> > 2017, the pop culture infovis website The Pudding (https://pudding.cool/
> )
> > published an analysis of regional preferences for Oscar-nominated films
> in
> > the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days ago,
> > the author posted a tutorial explaining the method of “regional
> smoothing”
> > used to create the article’s choropleths (
> > https://pudding.cool/process/regional_smoothing/).
> >
> > The method relies on several `spdep` functions (
> >
> https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R
> ).
> > In the code below, I provide reprex with a smaller dataset included in
> the
> > `sf` package:
> >
> > library(sf)
> > library(spdep)
> >
> > nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
> > Carolina counties
> > nc_shp <- as(nc,'Spatial')
> >
> > coords <- coordinates(nc_shp)
> > IDs<-row.names(as(nc_shp, "data.frame"))
> >
> > knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find the
> > nearest neighbors for each county
> > knn5 <- include.self(knn5)
> >
> > localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
> > nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G scores
> > localGvalues <- round(localGvalues,3)
> >
> > nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
> >
> > p1 <- spplot(nc_shp, c('NWBIR74'))
> > p2 <- spplot(nc_shp, c('LOCAL_G'))
> > plot(p1, split=c(1,1,2,2), more=TRUE)
> > plot(p2, split=c(1,2,2,2), more=TRUE)
> >
> > Here’s what I imagine that would look like in a tidyverse pipeline
> (please
> > note that this code is for illustrative purposes and will not run):
> >
> > library(tidyverse)
> > library(purrr)
> > library(sf)
> > library(sfdep) # this package doesn't exist (yet)
> >
> > nc <- st_read(system.file("shape/nc.shp", package = "sf"))
> >
> > nc_g <-
> >  nc %>%
> >  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5, include.self
> =
> > TRUE)),  # find the nearest neighbors for each county
> >         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style = 'B')),  #
> > make a list of the neighbors using the binary method
> >         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
> > zero.policy = TRUE),  # calculate the G scores
> >         LOCAL_G = round(LOCAL_G,3))
> >
> > We can see that the (hypothetical) tidyverse version reduces the amount
> of
> > intermediate objects and wraps the creation of the G scores into a single
> > code chunk with clear steps.
> >
> > I'd be grateful to hear from the users and developers of the `spdep` and
> > `sf` packages about this topic!
> >
> > Tiernan Martin
> >
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-Geo mailing list
> > [hidden email]
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; e-mail:
> [hidden email]
> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
> http://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Is a simple feature -friendly version of spdep being developed?

Roger Bivand
Administrator
On Sun, 14 May 2017, Tiernan Martin wrote:

> You’re right about one thing: those of us introduced to the pipe in our
> formative years find it hard to escape (everything is better here inside
> the pipe - right?) ;)

Pipes can be great if they don't involve extra inputs at a later stage in
the pipe, but as:

https://github.com/thomasp85/tidygraph

suggests, it isn't very easy, and may be forced.

>
> Thank you for your very informative response. Before reading it, I had a
> vague inkling that matrices were part of the challenge of adapting these
> tools to work with simple features, but I also thought perhaps all this
> issue needed was a little nudge from a user like me. Your response made it
> clear that there is a series of technical decisions that need to be made,
> and in order to do that I imagine some exploratory testing is in order. Of
> course, it is possible that the data.frame + list-col foundation is just
> not well-suited to the needs of spatial weighting/testing tools, but I
> suspect that there is some valuable insight to be gained from exploring
> this possibility further. And as I stated in my first post, going forward I
> expect more users will be interested in seeing this sort of integration
> happen.
I think that there are two tasks - one to create neighbour objects from sf
objects, then another to see whether the costs of putting the spatial
weights into a data.frame (or even more challenging, a tibble) to test for
spatial autocorrelation or model spatial dependence. Should the weights be
squirreled away inside the object flowing down the pipe?

sf_object %>% add_weights(...) -> sf_object_with_weights
moran.test(sf_object_with_weights, ...)
geary.test(sf_object_with_weights, ...)

or

sf_object %>% add_weights(...) %>% moran.test(sf_object_with_weights, ...)
-> output

all in one (output is a htest object). It gets more complex in:

lm.morantest(lm(formula, sf_object), sf_object_with_weights)

as we need the weights and the lm output object.

>
>> How should we go about this technically? spdep is on R-Forge and is happy
>> there. Its present functionality has to be maintained as it stands, it
> has
>> too many reverse dependencies to break. Should it be re-written from
>> scratch (with more sparse matrices internally for computation)?
>> ...
>> We'd need proof of concept with realistically sized data sets (not yet NY
>> taxis, but maybe later ...). spdep started as spweights, sptests and
>> spdep, and the first two got folded into the third when stable. If
> weights
>> are the first thing to go for, sfweights is where to go first (and port
>> the STRtree envelope intersections for contiguities). It could define the
>> new classes, and tests and modelling would use them.
>
> Sounds to me like this package would need to be built from the ground up,
> possibly following a similar path to the `sp` development process as you
> mentioned. Maybe that presents a challenge with regards to resources (i.e.,
> funding, time, etc.). Perhaps project this is a candidate for future
> proposals to the R Consortium or other funding sources. I hope that this
> post is useful in documenting user interest in seeing the `sf` protocol
> integrated into other spatial tools and workflows, and I look forward to
> hearing others' thoughts on the matter.
Making spatial weights from sf objects is just expanding spdep, and is
feasible. There is no point looking for funding, all it needs is knowledge
of how things have been done in spdep. Contiguity, knn, distance,
graph-based neighbours are all feasible.

Extending this to adding weights to sf objects may be going too far. It
seems slightly forced to - say - attach a sparse matrix to an sf geometry
column as an attribute, or to add a neighbour nested list column when
writing subsetting protection is very far from obvious. Like time series
data (not time stamped observations, but real ordered time series),
spatial data have implicit order that is tidy in a different way than tidy
(graph dependencies between observation rows). Maybe mapped features with
attached attribute data are "tidier" than a table, because they would
preserve their relative positions?

Roger

>
> Thanks again,
>
> Tiernan
>
>
>
>
>
> On Sat, May 13, 2017 at 1:39 PM Roger Bivand <[hidden email]> wrote:
>
>> On Fri, 12 May 2017, Tiernan Martin wrote:
>>
>>> Is anyone thinking about creating an adaptation of the `spdep` package
>> that
>>> expects sf-class inputs and works well in a pipeline?
>>
>> I assume that "this is not a pipeline" is a joke for insiders (in the
>> pipe)?
>>
>> No, there is your issue on the sfr github repository that is relevant for
>> contiguous neighbours, but not beyond that:
>>
>> https://github.com/edzer/sfr/issues/234
>>
>> An sf is a data.frame, and as such should "just work", like "Spatial"
>> objects have, in formula/data settings. The problem is (of course) the
>> weights matrix.
>>
>> Should it be a list column (each row has a list nesting two lists, first
>> indices, second non-zero weights), or remain separate as it has been for
>> 20 years, or become a column-oriented representation (Matrix package) - a
>> nested list like a list column or a listw obeject is row-oriented. I had
>> started thinking about using a sparse column-oriented representation, but
>> have not implemented functions accepting them instead of listw objects.
>>
>> I am very cautious about creating classes for data that were data.frame,
>> then sf, and then have the weights built-in. In the simple case it would
>> work, but all you have to do is re-order the rows and the link between the
>> neighbour ids and row order breaks down; the same applies to subsetting.
>>
>> The problems to solve first are related the workflows, and easiest to look
>> at in the univariate case (Moran, Geary, join-count, Mantel, G, ...) for
>> global and local tests. I think that all or almost all of the NSE verbs
>> will cause chaos (mutate, select, group) once weights have been created.
>> If there is a way to bind the neighour IDs to the input sf object rows, a
>> list column might be possible, but we have to permit multiple such columns
>> (Queen, Rook, ...), and ensure that subsetting and row-reordering keep the
>> graph relations intact (and their modifications, like row-standardisation)
>>
>> If you've seen any examples of how tidy relates to graphs (adjacency lists
>> are like nb objects), we could learn from that.
>>
>> How should we go about this technically? spdep is on R-Forge and is happy
>> there. Its present functionality has to be maintained as it stands, it has
>> too many reverse dependencies to break. Should it be re-written from
>> scratch (with more sparse matrices internally for computation)?
>>
>> Your example creates weights on the fly for a local G* map, but has n=100,
>> not say n=90000 (LA census blocks). Using the sf_ geos based methods does
>> not use STRtrees, which cut the time needed to find contigious neighbours
>> from days to seconds. We ought to pre-compute weights, but this messes up
>> the flow of data, because a second lot of data (the weights) have to enter
>> the pipe, and be matched with the row-ids.
>>
>> We'd need proof of concept with realistically sized data sets (not yet NY
>> taxis, but maybe later ...). spdep started as spweights, sptests and
>> spdep, and the first two got folded into the third when stable. If weights
>> are the first thing to go for, sfweights is where to go first (and port
>> the STRtree envelope intersections for contiguities). It could define the
>> new classes, and tests and modelling would use them.
>>
>> Thanks for starting this discussion.
>>
>> Roger
>>
>>>
>>> I understand that there is skepticism about the wisdom of adopting the
>>> “tidyverse” principles throughout the R package ecosystem, and I share
>> the
>>> concern that an over-reliance on any single paradigm could reduce the
>>> resilience and diversity of the system as a whole.
>>>
>>> That said, I believe that the enthusiastic adoption of the `sf` package
>> and
>>> the package's connections with widely-used tidyverse packages like
>> `dplyr`
>>> and `ggplot2` may result in increased demand for sf-friendly spatial
>>> analysis tools. As an amateur who recently started using R as my primary
>>> GIS tool, it seems like the tidyverse's preference for dataframes, S3
>>> objects, list columns, and pipeline workflows would be well-suited to the
>>> field of spatial analysis. Are there some fundamental reasons why the
>>> `spdep` tools cannot (or should not) be adapted to the tidyverse
>> "dialect"?
>>>
>>> Let me put the question in the context of an actual analysis: in February
>>> 2017, the pop culture infovis website The Pudding (https://pudding.cool/
>> )
>>> published an analysis of regional preferences for Oscar-nominated films
>> in
>>> the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days ago,
>>> the author posted a tutorial explaining the method of “regional
>> smoothing”
>>> used to create the article’s choropleths (
>>> https://pudding.cool/process/regional_smoothing/).
>>>
>>> The method relies on several `spdep` functions (
>>>
>> https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R
>> ).
>>> In the code below, I provide reprex with a smaller dataset included in
>> the
>>> `sf` package:
>>>
>>> library(sf)
>>> library(spdep)
>>>
>>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
>>> Carolina counties
>>> nc_shp <- as(nc,'Spatial')
>>>
>>> coords <- coordinates(nc_shp)
>>> IDs<-row.names(as(nc_shp, "data.frame"))
>>>
>>> knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find the
>>> nearest neighbors for each county
>>> knn5 <- include.self(knn5)
>>>
>>> localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
>>> nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G scores
>>> localGvalues <- round(localGvalues,3)
>>>
>>> nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
>>>
>>> p1 <- spplot(nc_shp, c('NWBIR74'))
>>> p2 <- spplot(nc_shp, c('LOCAL_G'))
>>> plot(p1, split=c(1,1,2,2), more=TRUE)
>>> plot(p2, split=c(1,2,2,2), more=TRUE)
>>>
>>> Here’s what I imagine that would look like in a tidyverse pipeline
>> (please
>>> note that this code is for illustrative purposes and will not run):
>>>
>>> library(tidyverse)
>>> library(purrr)
>>> library(sf)
>>> library(sfdep) # this package doesn't exist (yet)
>>>
>>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))
>>>
>>> nc_g <-
>>>  nc %>%
>>>  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5, include.self
>> =
>>> TRUE)),  # find the nearest neighbors for each county
>>>         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style = 'B')),  #
>>> make a list of the neighbors using the binary method
>>>         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
>>> zero.policy = TRUE),  # calculate the G scores
>>>         LOCAL_G = round(LOCAL_G,3))
>>>
>>> We can see that the (hypothetical) tidyverse version reduces the amount
>> of
>>> intermediate objects and wraps the creation of the G scores into a single
>>> code chunk with clear steps.
>>>
>>> I'd be grateful to hear from the users and developers of the `spdep` and
>>> `sf` packages about this topic!
>>>
>>> Tiernan Martin
>>>
>>>       [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> [hidden email]
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; e-mail:
>> [hidden email]
>> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
>> http://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Is a simple feature -friendly version of spdep being developed?

Tiernan Martin
> Should the weights be
> squirreled away inside the object flowing down the pipe?

I'd propose that the that functions be written in such a way that the
neighbors, weights, and test results can be stored in list cols within the
data.frame.

This way multiple tests (and their inputs) can be stored in a single,
easily comprehensible, rectangular data object.

Here's some more pretend code to illustrate that concept with regard to sf:

  nc %>%
    group_by(NAME) %>%
    nest %>%
    mutate(
           geometry = map(.x = data, ~.x[['geometry']]),
           neighbors = map(.x = data, .f = my_nb), # creates a list col of
the neighbors
           weights = map(.x = data, .f = add_weights), # creates a weights
object (class=wt, is this a sparse matrix in spdep?)
           test_moran = map2(.x = data, .y =  weights, .f = my_moran), #
creates list of Moran's I and sample kurtosis of x
           test_geary = map2(.x = data, .y = weights, .f = my_geary) #
creates list of Geary's C and sample kurtosis of x
               )

#> # A tibble: 100 x 7
#>           NAME              data         geometry  neighbors  weights
test_moran test_geary
#>        <fctr>            <list>           <list>     <list>   <list>
<list>     <list>
#>  1        Ashe <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
<list [2]> <list [2]>
#>  2   Alleghany <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
<list [2]> <list [2]>
#>  3       Surry <tibble [1 x 14]> <simple_feature> <list [4]> <S3: wt>
<list [2]> <list [2]>
#>  4   Currituck <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
<list [2]> <list [2]>
#>  5 Northampton <tibble [1 x 14]> <simple_feature> <list [3]> <S3: wt>
<list [2]> <list [2]>
#>  6    Hertford <tibble [1 x 14]> <simple_feature> <list [6]> <S3: wt>
<list [2]> <list [2]>
#>  7      Camden <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
<list [2]> <list [2]>
#>  8       Gates <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
<list [2]> <list [2]>
#>  9      Warren <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
<list [2]> <list [2]>
#> 10      Stokes <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
<list [2]> <list [2]>
#> # ... with 90 more rows

 Examples of this nested data.frame / list col approach can be found here:
http://r4ds.had.co.nz/many-models.html#nested-data

> Extending this to adding weights to sf objects may be going too far. It
> seems slightly forced to - say - attach a sparse matrix to an sf geometry
> column as an attribute, or to add a neighbour nested list column when
> writing subsetting protection is very far from obvious. Like time series
> data (not time stamped observations, but real ordered time series),
> spatial data have implicit order that is tidy in a different way than tidy
> (graph dependencies between observation rows). Maybe mapped features with
> attached attribute data are "tidier" than a table, because they would
> preserve their relative positions?

This seems like the crux of the problem. Are you saying that the orthogonal
"tidyness" of the tidyverse is fundamentally different from the underlying
relationships of spatial data, and therefore there's a limit to the
usefulness of tidyverse approach to spatial analysis?

On Mon, May 15, 2017 at 5:17 AM Roger Bivand <[hidden email]> wrote:

> On Sun, 14 May 2017, Tiernan Martin wrote:
>
> > You’re right about one thing: those of us introduced to the pipe in our
> > formative years find it hard to escape (everything is better here inside
> > the pipe - right?) ;)
>
> Pipes can be great if they don't involve extra inputs at a later stage in
> the pipe, but as:
>
> https://github.com/thomasp85/tidygraph
>
> suggests, it isn't very easy, and may be forced.
>
> >
> > Thank you for your very informative response. Before reading it, I had a
> > vague inkling that matrices were part of the challenge of adapting these
> > tools to work with simple features, but I also thought perhaps all this
> > issue needed was a little nudge from a user like me. Your response made
> it
> > clear that there is a series of technical decisions that need to be made,
> > and in order to do that I imagine some exploratory testing is in order.
> Of
> > course, it is possible that the data.frame + list-col foundation is just
> > not well-suited to the needs of spatial weighting/testing tools, but I
> > suspect that there is some valuable insight to be gained from exploring
> > this possibility further. And as I stated in my first post, going
> forward I
> > expect more users will be interested in seeing this sort of integration
> > happen.
>
> I think that there are two tasks - one to create neighbour objects from sf
> objects, then another to see whether the costs of putting the spatial
> weights into a data.frame (or even more challenging, a tibble) to test for
> spatial autocorrelation or model spatial dependence. Should the weights be
> squirreled away inside the object flowing down the pipe?
>
> sf_object %>% add_weights(...) -> sf_object_with_weights
> moran.test(sf_object_with_weights, ...)
> geary.test(sf_object_with_weights, ...)
>
> or
>
> sf_object %>% add_weights(...) %>% moran.test(sf_object_with_weights, ...)
> -> output
>
> all in one (output is a htest object). It gets more complex in:
>
> lm.morantest(lm(formula, sf_object), sf_object_with_weights)
>
> as we need the weights and the lm output object.
>
> >
> >> How should we go about this technically? spdep is on R-Forge and is
> happy
> >> there. Its present functionality has to be maintained as it stands, it
> > has
> >> too many reverse dependencies to break. Should it be re-written from
> >> scratch (with more sparse matrices internally for computation)?
> >> ...
> >> We'd need proof of concept with realistically sized data sets (not yet
> NY
> >> taxis, but maybe later ...). spdep started as spweights, sptests and
> >> spdep, and the first two got folded into the third when stable. If
> > weights
> >> are the first thing to go for, sfweights is where to go first (and port
> >> the STRtree envelope intersections for contiguities). It could define
> the
> >> new classes, and tests and modelling would use them.
> >
> > Sounds to me like this package would need to be built from the ground up,
> > possibly following a similar path to the `sp` development process as you
> > mentioned. Maybe that presents a challenge with regards to resources
> (i.e.,
> > funding, time, etc.). Perhaps project this is a candidate for future
> > proposals to the R Consortium or other funding sources. I hope that this
> > post is useful in documenting user interest in seeing the `sf` protocol
> > integrated into other spatial tools and workflows, and I look forward to
> > hearing others' thoughts on the matter.
>
> Making spatial weights from sf objects is just expanding spdep, and is
> feasible. There is no point looking for funding, all it needs is knowledge
> of how things have been done in spdep. Contiguity, knn, distance,
> graph-based neighbours are all feasible.
>
> Extending this to adding weights to sf objects may be going too far. It
> seems slightly forced to - say - attach a sparse matrix to an sf geometry
> column as an attribute, or to add a neighbour nested list column when
> writing subsetting protection is very far from obvious. Like time series
> data (not time stamped observations, but real ordered time series),
> spatial data have implicit order that is tidy in a different way than tidy
> (graph dependencies between observation rows). Maybe mapped features with
> attached attribute data are "tidier" than a table, because they would
> preserve their relative positions?
>
> Roger
>
> >
> > Thanks again,
> >
> > Tiernan
> >
> >
> >
> >
> >
> > On Sat, May 13, 2017 at 1:39 PM Roger Bivand <[hidden email]>
> wrote:
> >
> >> On Fri, 12 May 2017, Tiernan Martin wrote:
> >>
> >>> Is anyone thinking about creating an adaptation of the `spdep` package
> >> that
> >>> expects sf-class inputs and works well in a pipeline?
> >>
> >> I assume that "this is not a pipeline" is a joke for insiders (in the
> >> pipe)?
> >>
> >> No, there is your issue on the sfr github repository that is relevant
> for
> >> contiguous neighbours, but not beyond that:
> >>
> >> https://github.com/edzer/sfr/issues/234
> >>
> >> An sf is a data.frame, and as such should "just work", like "Spatial"
> >> objects have, in formula/data settings. The problem is (of course) the
> >> weights matrix.
> >>
> >> Should it be a list column (each row has a list nesting two lists, first
> >> indices, second non-zero weights), or remain separate as it has been for
> >> 20 years, or become a column-oriented representation (Matrix package) -
> a
> >> nested list like a list column or a listw obeject is row-oriented. I had
> >> started thinking about using a sparse column-oriented representation,
> but
> >> have not implemented functions accepting them instead of listw objects.
> >>
> >> I am very cautious about creating classes for data that were data.frame,
> >> then sf, and then have the weights built-in. In the simple case it would
> >> work, but all you have to do is re-order the rows and the link between
> the
> >> neighbour ids and row order breaks down; the same applies to subsetting.
> >>
> >> The problems to solve first are related the workflows, and easiest to
> look
> >> at in the univariate case (Moran, Geary, join-count, Mantel, G, ...) for
> >> global and local tests. I think that all or almost all of the NSE verbs
> >> will cause chaos (mutate, select, group) once weights have been created.
> >> If there is a way to bind the neighour IDs to the input sf object rows,
> a
> >> list column might be possible, but we have to permit multiple such
> columns
> >> (Queen, Rook, ...), and ensure that subsetting and row-reordering keep
> the
> >> graph relations intact (and their modifications, like
> row-standardisation)
> >>
> >> If you've seen any examples of how tidy relates to graphs (adjacency
> lists
> >> are like nb objects), we could learn from that.
> >>
> >> How should we go about this technically? spdep is on R-Forge and is
> happy
> >> there. Its present functionality has to be maintained as it stands, it
> has
> >> too many reverse dependencies to break. Should it be re-written from
> >> scratch (with more sparse matrices internally for computation)?
> >>
> >> Your example creates weights on the fly for a local G* map, but has
> n=100,
> >> not say n=90000 (LA census blocks). Using the sf_ geos based methods
> does
> >> not use STRtrees, which cut the time needed to find contigious
> neighbours
> >> from days to seconds. We ought to pre-compute weights, but this messes
> up
> >> the flow of data, because a second lot of data (the weights) have to
> enter
> >> the pipe, and be matched with the row-ids.
> >>
> >> We'd need proof of concept with realistically sized data sets (not yet
> NY
> >> taxis, but maybe later ...). spdep started as spweights, sptests and
> >> spdep, and the first two got folded into the third when stable. If
> weights
> >> are the first thing to go for, sfweights is where to go first (and port
> >> the STRtree envelope intersections for contiguities). It could define
> the
> >> new classes, and tests and modelling would use them.
> >>
> >> Thanks for starting this discussion.
> >>
> >> Roger
> >>
> >>>
> >>> I understand that there is skepticism about the wisdom of adopting the
> >>> “tidyverse” principles throughout the R package ecosystem, and I share
> >> the
> >>> concern that an over-reliance on any single paradigm could reduce the
> >>> resilience and diversity of the system as a whole.
> >>>
> >>> That said, I believe that the enthusiastic adoption of the `sf` package
> >> and
> >>> the package's connections with widely-used tidyverse packages like
> >> `dplyr`
> >>> and `ggplot2` may result in increased demand for sf-friendly spatial
> >>> analysis tools. As an amateur who recently started using R as my
> primary
> >>> GIS tool, it seems like the tidyverse's preference for dataframes, S3
> >>> objects, list columns, and pipeline workflows would be well-suited to
> the
> >>> field of spatial analysis. Are there some fundamental reasons why the
> >>> `spdep` tools cannot (or should not) be adapted to the tidyverse
> >> "dialect"?
> >>>
> >>> Let me put the question in the context of an actual analysis: in
> February
> >>> 2017, the pop culture infovis website The Pudding (
> https://pudding.cool/
> >> )
> >>> published an analysis of regional preferences for Oscar-nominated films
> >> in
> >>> the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days
> ago,
> >>> the author posted a tutorial explaining the method of “regional
> >> smoothing”
> >>> used to create the article’s choropleths (
> >>> https://pudding.cool/process/regional_smoothing/).
> >>>
> >>> The method relies on several `spdep` functions (
> >>>
> >>
> https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R
> >> ).
> >>> In the code below, I provide reprex with a smaller dataset included in
> >> the
> >>> `sf` package:
> >>>
> >>> library(sf)
> >>> library(spdep)
> >>>
> >>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
> >>> Carolina counties
> >>> nc_shp <- as(nc,'Spatial')
> >>>
> >>> coords <- coordinates(nc_shp)
> >>> IDs<-row.names(as(nc_shp, "data.frame"))
> >>>
> >>> knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find the
> >>> nearest neighbors for each county
> >>> knn5 <- include.self(knn5)
> >>>
> >>> localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
> >>> nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G
> scores
> >>> localGvalues <- round(localGvalues,3)
> >>>
> >>> nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
> >>>
> >>> p1 <- spplot(nc_shp, c('NWBIR74'))
> >>> p2 <- spplot(nc_shp, c('LOCAL_G'))
> >>> plot(p1, split=c(1,1,2,2), more=TRUE)
> >>> plot(p2, split=c(1,2,2,2), more=TRUE)
> >>>
> >>> Here’s what I imagine that would look like in a tidyverse pipeline
> >> (please
> >>> note that this code is for illustrative purposes and will not run):
> >>>
> >>> library(tidyverse)
> >>> library(purrr)
> >>> library(sf)
> >>> library(sfdep) # this package doesn't exist (yet)
> >>>
> >>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))
> >>>
> >>> nc_g <-
> >>>  nc %>%
> >>>  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5,
> include.self
> >> =
> >>> TRUE)),  # find the nearest neighbors for each county
> >>>         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style =
> 'B')),  #
> >>> make a list of the neighbors using the binary method
> >>>         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
> >>> zero.policy = TRUE),  # calculate the G scores
> >>>         LOCAL_G = round(LOCAL_G,3))
> >>>
> >>> We can see that the (hypothetical) tidyverse version reduces the amount
> >> of
> >>> intermediate objects and wraps the creation of the G scores into a
> single
> >>> code chunk with clear steps.
> >>>
> >>> I'd be grateful to hear from the users and developers of the `spdep`
> and
> >>> `sf` packages about this topic!
> >>>
> >>> Tiernan Martin
> >>>
> >>>       [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> R-sig-Geo mailing list
> >>> [hidden email]
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>
> >> --
> >> Roger Bivand
> >> Department of Economics, Norwegian School of Economics,
> >> Helleveien 30, N-5045 Bergen, Norway.
> >> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>
> <+47%2055%2095%2093%2055>; e-mail:
> >> [hidden email]
> >> Editor-in-Chief of The R Journal,
> https://journal.r-project.org/index.html
> >> http://orcid.org/0000-0003-2392-6140
> >> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> >
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; e-mail:
> [hidden email]
> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
> http://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Is a simple feature -friendly version of spdep being developed?

Roger Bivand
Administrator
On Tue, 16 May 2017, Tiernan Martin wrote:

>> Should the weights be
>> squirreled away inside the object flowing down the pipe?
>
> I'd propose that the that functions be written in such a way that the
> neighbors, weights, and test results can be stored in list cols within the
> data.frame.
>
> This way multiple tests (and their inputs) can be stored in a single,
> easily comprehensible, rectangular data object.
>
> Here's some more pretend code to illustrate that concept with regard to sf:
>
>  nc %>%
>    group_by(NAME) %>%
>    nest %>%
>    mutate(
>           geometry = map(.x = data, ~.x[['geometry']]),
>           neighbors = map(.x = data, .f = my_nb), # creates a list col of
> the neighbors
>           weights = map(.x = data, .f = add_weights), # creates a weights
> object (class=wt, is this a sparse matrix in spdep?)
>           test_moran = map2(.x = data, .y =  weights, .f = my_moran), #
> creates list of Moran's I and sample kurtosis of x
>           test_geary = map2(.x = data, .y = weights, .f = my_geary) #
> creates list of Geary's C and sample kurtosis of x
>               )
>
> #> # A tibble: 100 x 7
> #>           NAME              data         geometry  neighbors  weights
> test_moran test_geary
> #>        <fctr>            <list>           <list>     <list>   <list>
> <list>     <list>
> #>  1        Ashe <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> <list [2]> <list [2]>
> #>  2   Alleghany <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
> <list [2]> <list [2]>
> #>  3       Surry <tibble [1 x 14]> <simple_feature> <list [4]> <S3: wt>
> <list [2]> <list [2]>
> #>  4   Currituck <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> <list [2]> <list [2]>
> #>  5 Northampton <tibble [1 x 14]> <simple_feature> <list [3]> <S3: wt>
> <list [2]> <list [2]>
> #>  6    Hertford <tibble [1 x 14]> <simple_feature> <list [6]> <S3: wt>
> <list [2]> <list [2]>
> #>  7      Camden <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
> <list [2]> <list [2]>
> #>  8       Gates <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> <list [2]> <list [2]>
> #>  9      Warren <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
> <list [2]> <list [2]>
> #> 10      Stokes <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> <list [2]> <list [2]>
> #> # ... with 90 more rows
>
> Examples of this nested data.frame / list col approach can be found here:
> http://r4ds.had.co.nz/many-models.html#nested-data
Thanks, interesting but really scary. You get numeric output for local
Moran's I and local Geary, but what will happen next?

All of these tests assume a correctly specified mean model. Without this
assumption being met (absence of trends, absence of global spatial
autocorrelation in the local test case, absence of missing right-hand-side
variables and absence of incorrect functional forms for these, ...), a map
pretending to invite "inference" of "hotspots" etc. - collections of
observations with similar values of the local statistic, will most often
be highly misleading. This isn't only about whether it is possible to do
technically, but also about not making it simpler than reason would
suggest. With a correct mean model, there may not be any (local) spatial
autocorrelation left. See ?localmoran.sad and ?localmoran.exact, and their
references, and McMillen (2003) (the McSpatial author). I won't even
mention the need to adjust p.values for multiple comparisons ...

>
>> Extending this to adding weights to sf objects may be going too far. It
>> seems slightly forced to - say - attach a sparse matrix to an sf geometry
>> column as an attribute, or to add a neighbour nested list column when
>> writing subsetting protection is very far from obvious. Like time series
>> data (not time stamped observations, but real ordered time series),
>> spatial data have implicit order that is tidy in a different way than tidy
>> (graph dependencies between observation rows). Maybe mapped features with
>> attached attribute data are "tidier" than a table, because they would
>> preserve their relative positions?
>
> This seems like the crux of the problem. Are you saying that the orthogonal
> "tidyness" of the tidyverse is fundamentally different from the underlying
> relationships of spatial data, and therefore there's a limit to the
> usefulness of tidyverse approach to spatial analysis?
Right, this is central and open. Spatial objects can be tidy in the
tidyverse sense - sf shows this can be done. The open question is whether
this extends to the analysis of the data. From looking around, it seems
that time series and graphs suffer from the same kinds of questions, and
it will be interesting to see what stars turns up for spatio-temporal
data:

https://github.com/edzer/stars

I see that stars wisely does not include trip/trajectory data structures,
but analysis might involve reading the array-based representation of
spatio-temporal environmental drivers for an ensemble of buffers around
trajectories (say animal movement), then modelling.

Maybe: "everything should be as tidy as it can be but not tidier"?

Roger

>
> On Mon, May 15, 2017 at 5:17 AM Roger Bivand <[hidden email]> wrote:
>
>> On Sun, 14 May 2017, Tiernan Martin wrote:
>>
>>> You’re right about one thing: those of us introduced to the pipe in our
>>> formative years find it hard to escape (everything is better here inside
>>> the pipe - right?) ;)
>>
>> Pipes can be great if they don't involve extra inputs at a later stage in
>> the pipe, but as:
>>
>> https://github.com/thomasp85/tidygraph
>>
>> suggests, it isn't very easy, and may be forced.
>>
>>>
>>> Thank you for your very informative response. Before reading it, I had a
>>> vague inkling that matrices were part of the challenge of adapting these
>>> tools to work with simple features, but I also thought perhaps all this
>>> issue needed was a little nudge from a user like me. Your response made
>> it
>>> clear that there is a series of technical decisions that need to be made,
>>> and in order to do that I imagine some exploratory testing is in order.
>> Of
>>> course, it is possible that the data.frame + list-col foundation is just
>>> not well-suited to the needs of spatial weighting/testing tools, but I
>>> suspect that there is some valuable insight to be gained from exploring
>>> this possibility further. And as I stated in my first post, going
>> forward I
>>> expect more users will be interested in seeing this sort of integration
>>> happen.
>>
>> I think that there are two tasks - one to create neighbour objects from sf
>> objects, then another to see whether the costs of putting the spatial
>> weights into a data.frame (or even more challenging, a tibble) to test for
>> spatial autocorrelation or model spatial dependence. Should the weights be
>> squirreled away inside the object flowing down the pipe?
>>
>> sf_object %>% add_weights(...) -> sf_object_with_weights
>> moran.test(sf_object_with_weights, ...)
>> geary.test(sf_object_with_weights, ...)
>>
>> or
>>
>> sf_object %>% add_weights(...) %>% moran.test(sf_object_with_weights, ...)
>> -> output
>>
>> all in one (output is a htest object). It gets more complex in:
>>
>> lm.morantest(lm(formula, sf_object), sf_object_with_weights)
>>
>> as we need the weights and the lm output object.
>>
>>>
>>>> How should we go about this technically? spdep is on R-Forge and is
>> happy
>>>> there. Its present functionality has to be maintained as it stands, it
>>> has
>>>> too many reverse dependencies to break. Should it be re-written from
>>>> scratch (with more sparse matrices internally for computation)?
>>>> ...
>>>> We'd need proof of concept with realistically sized data sets (not yet
>> NY
>>>> taxis, but maybe later ...). spdep started as spweights, sptests and
>>>> spdep, and the first two got folded into the third when stable. If
>>> weights
>>>> are the first thing to go for, sfweights is where to go first (and port
>>>> the STRtree envelope intersections for contiguities). It could define
>> the
>>>> new classes, and tests and modelling would use them.
>>>
>>> Sounds to me like this package would need to be built from the ground up,
>>> possibly following a similar path to the `sp` development process as you
>>> mentioned. Maybe that presents a challenge with regards to resources
>> (i.e.,
>>> funding, time, etc.). Perhaps project this is a candidate for future
>>> proposals to the R Consortium or other funding sources. I hope that this
>>> post is useful in documenting user interest in seeing the `sf` protocol
>>> integrated into other spatial tools and workflows, and I look forward to
>>> hearing others' thoughts on the matter.
>>
>> Making spatial weights from sf objects is just expanding spdep, and is
>> feasible. There is no point looking for funding, all it needs is knowledge
>> of how things have been done in spdep. Contiguity, knn, distance,
>> graph-based neighbours are all feasible.
>>
>> Extending this to adding weights to sf objects may be going too far. It
>> seems slightly forced to - say - attach a sparse matrix to an sf geometry
>> column as an attribute, or to add a neighbour nested list column when
>> writing subsetting protection is very far from obvious. Like time series
>> data (not time stamped observations, but real ordered time series),
>> spatial data have implicit order that is tidy in a different way than tidy
>> (graph dependencies between observation rows). Maybe mapped features with
>> attached attribute data are "tidier" than a table, because they would
>> preserve their relative positions?
>>
>> Roger
>>
>>>
>>> Thanks again,
>>>
>>> Tiernan
>>>
>>>
>>>
>>>
>>>
>>> On Sat, May 13, 2017 at 1:39 PM Roger Bivand <[hidden email]>
>> wrote:
>>>
>>>> On Fri, 12 May 2017, Tiernan Martin wrote:
>>>>
>>>>> Is anyone thinking about creating an adaptation of the `spdep` package
>>>> that
>>>>> expects sf-class inputs and works well in a pipeline?
>>>>
>>>> I assume that "this is not a pipeline" is a joke for insiders (in the
>>>> pipe)?
>>>>
>>>> No, there is your issue on the sfr github repository that is relevant
>> for
>>>> contiguous neighbours, but not beyond that:
>>>>
>>>> https://github.com/edzer/sfr/issues/234
>>>>
>>>> An sf is a data.frame, and as such should "just work", like "Spatial"
>>>> objects have, in formula/data settings. The problem is (of course) the
>>>> weights matrix.
>>>>
>>>> Should it be a list column (each row has a list nesting two lists, first
>>>> indices, second non-zero weights), or remain separate as it has been for
>>>> 20 years, or become a column-oriented representation (Matrix package) -
>> a
>>>> nested list like a list column or a listw obeject is row-oriented. I had
>>>> started thinking about using a sparse column-oriented representation,
>> but
>>>> have not implemented functions accepting them instead of listw objects.
>>>>
>>>> I am very cautious about creating classes for data that were data.frame,
>>>> then sf, and then have the weights built-in. In the simple case it would
>>>> work, but all you have to do is re-order the rows and the link between
>> the
>>>> neighbour ids and row order breaks down; the same applies to subsetting.
>>>>
>>>> The problems to solve first are related the workflows, and easiest to
>> look
>>>> at in the univariate case (Moran, Geary, join-count, Mantel, G, ...) for
>>>> global and local tests. I think that all or almost all of the NSE verbs
>>>> will cause chaos (mutate, select, group) once weights have been created.
>>>> If there is a way to bind the neighour IDs to the input sf object rows,
>> a
>>>> list column might be possible, but we have to permit multiple such
>> columns
>>>> (Queen, Rook, ...), and ensure that subsetting and row-reordering keep
>> the
>>>> graph relations intact (and their modifications, like
>> row-standardisation)
>>>>
>>>> If you've seen any examples of how tidy relates to graphs (adjacency
>> lists
>>>> are like nb objects), we could learn from that.
>>>>
>>>> How should we go about this technically? spdep is on R-Forge and is
>> happy
>>>> there. Its present functionality has to be maintained as it stands, it
>> has
>>>> too many reverse dependencies to break. Should it be re-written from
>>>> scratch (with more sparse matrices internally for computation)?
>>>>
>>>> Your example creates weights on the fly for a local G* map, but has
>> n=100,
>>>> not say n=90000 (LA census blocks). Using the sf_ geos based methods
>> does
>>>> not use STRtrees, which cut the time needed to find contigious
>> neighbours
>>>> from days to seconds. We ought to pre-compute weights, but this messes
>> up
>>>> the flow of data, because a second lot of data (the weights) have to
>> enter
>>>> the pipe, and be matched with the row-ids.
>>>>
>>>> We'd need proof of concept with realistically sized data sets (not yet
>> NY
>>>> taxis, but maybe later ...). spdep started as spweights, sptests and
>>>> spdep, and the first two got folded into the third when stable. If
>> weights
>>>> are the first thing to go for, sfweights is where to go first (and port
>>>> the STRtree envelope intersections for contiguities). It could define
>> the
>>>> new classes, and tests and modelling would use them.
>>>>
>>>> Thanks for starting this discussion.
>>>>
>>>> Roger
>>>>
>>>>>
>>>>> I understand that there is skepticism about the wisdom of adopting the
>>>>> “tidyverse” principles throughout the R package ecosystem, and I share
>>>> the
>>>>> concern that an over-reliance on any single paradigm could reduce the
>>>>> resilience and diversity of the system as a whole.
>>>>>
>>>>> That said, I believe that the enthusiastic adoption of the `sf` package
>>>> and
>>>>> the package's connections with widely-used tidyverse packages like
>>>> `dplyr`
>>>>> and `ggplot2` may result in increased demand for sf-friendly spatial
>>>>> analysis tools. As an amateur who recently started using R as my
>> primary
>>>>> GIS tool, it seems like the tidyverse's preference for dataframes, S3
>>>>> objects, list columns, and pipeline workflows would be well-suited to
>> the
>>>>> field of spatial analysis. Are there some fundamental reasons why the
>>>>> `spdep` tools cannot (or should not) be adapted to the tidyverse
>>>> "dialect"?
>>>>>
>>>>> Let me put the question in the context of an actual analysis: in
>> February
>>>>> 2017, the pop culture infovis website The Pudding (
>> https://pudding.cool/
>>>> )
>>>>> published an analysis of regional preferences for Oscar-nominated films
>>>> in
>>>>> the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days
>> ago,
>>>>> the author posted a tutorial explaining the method of “regional
>>>> smoothing”
>>>>> used to create the article’s choropleths (
>>>>> https://pudding.cool/process/regional_smoothing/).
>>>>>
>>>>> The method relies on several `spdep` functions (
>>>>>
>>>>
>> https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R
>>>> ).
>>>>> In the code below, I provide reprex with a smaller dataset included in
>>>> the
>>>>> `sf` package:
>>>>>
>>>>> library(sf)
>>>>> library(spdep)
>>>>>
>>>>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
>>>>> Carolina counties
>>>>> nc_shp <- as(nc,'Spatial')
>>>>>
>>>>> coords <- coordinates(nc_shp)
>>>>> IDs<-row.names(as(nc_shp, "data.frame"))
>>>>>
>>>>> knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find the
>>>>> nearest neighbors for each county
>>>>> knn5 <- include.self(knn5)
>>>>>
>>>>> localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
>>>>> nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G
>> scores
>>>>> localGvalues <- round(localGvalues,3)
>>>>>
>>>>> nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
>>>>>
>>>>> p1 <- spplot(nc_shp, c('NWBIR74'))
>>>>> p2 <- spplot(nc_shp, c('LOCAL_G'))
>>>>> plot(p1, split=c(1,1,2,2), more=TRUE)
>>>>> plot(p2, split=c(1,2,2,2), more=TRUE)
>>>>>
>>>>> Here’s what I imagine that would look like in a tidyverse pipeline
>>>> (please
>>>>> note that this code is for illustrative purposes and will not run):
>>>>>
>>>>> library(tidyverse)
>>>>> library(purrr)
>>>>> library(sf)
>>>>> library(sfdep) # this package doesn't exist (yet)
>>>>>
>>>>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))
>>>>>
>>>>> nc_g <-
>>>>>  nc %>%
>>>>>  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5,
>> include.self
>>>> =
>>>>> TRUE)),  # find the nearest neighbors for each county
>>>>>         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style =
>> 'B')),  #
>>>>> make a list of the neighbors using the binary method
>>>>>         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
>>>>> zero.policy = TRUE),  # calculate the G scores
>>>>>         LOCAL_G = round(LOCAL_G,3))
>>>>>
>>>>> We can see that the (hypothetical) tidyverse version reduces the amount
>>>> of
>>>>> intermediate objects and wraps the creation of the G scores into a
>> single
>>>>> code chunk with clear steps.
>>>>>
>>>>> I'd be grateful to hear from the users and developers of the `spdep`
>> and
>>>>> `sf` packages about this topic!
>>>>>
>>>>> Tiernan Martin
>>>>>
>>>>>       [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> [hidden email]
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>> --
>>>> Roger Bivand
>>>> Department of Economics, Norwegian School of Economics,
>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>
>> <+47%2055%2095%2093%2055>; e-mail:
>>>> [hidden email]
>>>> Editor-in-Chief of The R Journal,
>> https://journal.r-project.org/index.html
>>>> http://orcid.org/0000-0003-2392-6140
>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; e-mail:
>> [hidden email]
>> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
>> http://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Is a simple feature -friendly version of spdep being developed?

Michael Sumner-2
Hello, I'm following this conversation with great interest. There's a lot
here for me
to work with, it's an active pursuit and I don't have much to offer yet but
I have a couple of inline responses below.  I've learnt a lot about spdep
from this discussion, something I've meant to explore for a long time
but my work has never had the same modelling focus).

On Tue, 16 May 2017 at 18:25 Roger Bivand <[hidden email]> wrote:

> On Tue, 16 May 2017, Tiernan Martin wrote:
>
> >> Should the weights be
> >> squirreled away inside the object flowing down the pipe?
> >
> > I'd propose that the that functions be written in such a way that the
> > neighbors, weights, and test results can be stored in list cols within
> the
> > data.frame.
> >
> > This way multiple tests (and their inputs) can be stored in a single,
> > easily comprehensible, rectangular data object.
> >
> > Here's some more pretend code to illustrate that concept with regard to
> sf:
> >
> >  nc %>%
> >    group_by(NAME) %>%
> >    nest %>%
> >    mutate(
> >           geometry = map(.x = data, ~.x[['geometry']]),
> >           neighbors = map(.x = data, .f = my_nb), # creates a list col of
> > the neighbors
> >           weights = map(.x = data, .f = add_weights), # creates a weights
> > object (class=wt, is this a sparse matrix in spdep?)
> >           test_moran = map2(.x = data, .y =  weights, .f = my_moran), #
> > creates list of Moran's I and sample kurtosis of x
> >           test_geary = map2(.x = data, .y = weights, .f = my_geary) #
> > creates list of Geary's C and sample kurtosis of x
> >               )
> >
> > #> # A tibble: 100 x 7
> > #>           NAME              data         geometry  neighbors  weights
> > test_moran test_geary
> > #>        <fctr>            <list>           <list>     <list>   <list>
> > <list>     <list>
> > #>  1        Ashe <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  2   Alleghany <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  3       Surry <tibble [1 x 14]> <simple_feature> <list [4]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  4   Currituck <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  5 Northampton <tibble [1 x 14]> <simple_feature> <list [3]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  6    Hertford <tibble [1 x 14]> <simple_feature> <list [6]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  7      Camden <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  8       Gates <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  9      Warren <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
> > <list [2]> <list [2]>
> > #> 10      Stokes <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> > <list [2]> <list [2]>
> > #> # ... with 90 more rows
> >
> > Examples of this nested data.frame / list col approach can be found here:
> > http://r4ds.had.co.nz/many-models.html#nested-data
>
> Thanks, interesting but really scary. You get numeric output for local
> Moran's I and local Geary, but what will happen next?
>
> All of these tests assume a correctly specified mean model. Without this
> assumption being met (absence of trends, absence of global spatial
> autocorrelation in the local test case, absence of missing right-hand-side
> variables and absence of incorrect functional forms for these, ...), a map
> pretending to invite "inference" of "hotspots" etc. - collections of
> observations with similar values of the local statistic, will most often
> be highly misleading. This isn't only about whether it is possible to do
> technically, but also about not making it simpler than reason would
> suggest. With a correct mean model, there may not be any (local) spatial
> autocorrelation left. See ?localmoran.sad and ?localmoran.exact, and their
> references, and McMillen (2003) (the McSpatial author). I won't even
> mention the need to adjust p.values for multiple comparisons ...
>
> >
> >> Extending this to adding weights to sf objects may be going too far. It
> >> seems slightly forced to - say - attach a sparse matrix to an sf
> geometry
> >> column as an attribute, or to add a neighbour nested list column when
> >> writing subsetting protection is very far from obvious. Like time series
> >> data (not time stamped observations, but real ordered time series),
> >> spatial data have implicit order that is tidy in a different way than
> tidy
> >> (graph dependencies between observation rows). Maybe mapped features
> with
> >> attached attribute data are "tidier" than a table, because they would
> >> preserve their relative positions?
> >
> > This seems like the crux of the problem. Are you saying that the
> orthogonal
> > "tidyness" of the tidyverse is fundamentally different from the
> underlying
> > relationships of spatial data, and therefore there's a limit to the
> > usefulness of tidyverse approach to spatial analysis?
>
> Right, this is central and open. Spatial objects can be tidy in the
> tidyverse sense - sf shows this can be done. The open question is whether
> this extends to the analysis of the data.



I think the structures are still wide open, unless you think the goals of sf
encompass everything. The simple features standard is one corner of the
available ways to structure spatial data. It pointedly ignores relations
between objects, and heavily relies on 2D-specific optimizations. All
the topology is done "on the fly" and is treated as expendable. I personally
want structures that treat simple features and similar formats as
expendable,
but that maintain the topology as a first-class citizen. I don't say this
because I think simple features is not important, it's just not my
main focus. I want time, depth, temperature, salinity, anything and
everything
storable on features, parts, edges, vertices and nodes. Lots of tools
do this or aspects of it, and none of them can be shoe-horned into simple
features, and
they don't have a unifying framework. (It's called "mesh structures", or
the simplicial
complex, or "a database" in other contexts, but none of those are really
enough).

 (I'm dismayed by GIS ideas when they are spoken about as if it's the end
of the story. Sf is really great, but it's now more than the standard and
it's kind
of unique in the available tools for the standard itself - but most of them
are pretty specialized and different to each other, probably unavoidably).

ggplot2 and tidygraph still have a long way to go, but in there I believe
is a language not just of graphics and of converting to and from igraph,
but of flexible ways of specifying how spatial data are constructed,
analysed
and interacted with.
Ggplot builds geometries, but it has no output other than a plot, and the
object that can
modify that plot. Select, group-by, arrange, filter, nest, and normalization
into multiple tables are all part of the powerful ways in which raw data
can be given structure. "Spatial" is just one small part of that structure,
and
unfortunately the 'geographic spatial', the 2D optimizations baked in
by decades old GIS practice seems to have the most sway in discussions.

This community could find a way to bake-in the geometries from ggplot
constructions as sf objects, i.e. convert gg-plot into sf, not into a plot.
I think that's a
digestable project that would really provide some valuable insights.


> From looking around, it seems
> that time series and graphs suffer from the same kinds of questions, and
> it will be interesting to see what stars turns up for spatio-temporal
> data:
>
> https://github.com/edzer/stars
>
> I see that stars wisely does not include trip/trajectory data structures,
> but analysis might involve reading the array-based representation of
> spatio-temporal environmental drivers for an ensemble of buffers around
> trajectories (say animal movement), then modelling.
>
>
Interesting that you mention this, I agree it's a crux area in modern
spatial,
tracking has been neglected for years but it's starting to be looked at.

 I don't see this as terribly difficult, in fact we do this routinely
for all kinds of tracking data both as-measured and modelled. The
difficulty is having
access to the data as well as the motivation in the same place. We have
in-house tools
that sit on large collections of remotely sensed time series data:

https://github.com/AustralianAntarcticDataCentre/raadsync

When you have both the motivation (lots of animal tracking and voyage data)
and
the data on hand it's relatively easy (we have a ready audience at in
Southern Ocean
ecosystems research here).  First step is to write a read-data tool as a
function
of time. Then the problem of matching tracking data to the right space-time
window is easily broken down into the intervals between the time
series of environmental data. At that level, you can employ interpolation
between the layers,
aggregation to compromise resolution with coverage, and even
bring user-defined calculations to the smallish windows to calculate
derived products (rugosity, slope, distance to thresholds etc.)

In our modelled track outputs with tripEstimation/SGAT and bsam[1] we can
use the more probabilistic
estimates of location as more nuanced "query buckets" against the
same environmental data, it's the same issue with matching time with
all the different options you might want. What is hard for us is to share
the tools,
because first you need to arrange access to quite a lot of data, before
you get to see the magic in action.

I do think that stars provides a great opportunity though, and my thoughts
along
those lines are here:

https://github.com/mdsumner/stars.use.cases/blob/master/Use-cases.rmd#animal-tracking-mcmc-estimation


[1] https://CRAN.R-project.org/package=bsam



> Maybe: "everything should be as tidy as it can be but not tidier"?
>
>
I don't think we've even started with the housekeeping here. Some early
thoughts are here:

http://rpubs.com/cyclemumner/sc-rationale

I'm interested to help find ways to nest the spdep idioms in "tidy" ways,
but I'm also not sure
how valuable that is yet. When you convert the structures to primitives the
edge and vertex relations
are inherent already, and it's a mater of standard database
join/select/filter idioms
to link things up:

http://rpubs.com/cyclemumner/neighbours01

http://rpubs.com/cyclemumner/neighbours02

(those examples use https://github.com/mdsumner/scsf for the PRIMITIVE
model
but it's really too raw to use yet). This is all fine and dandy, but then
you still have a list-bag
- i.e. database  - of tables with no strong idioms for treating them as a
single object -
that's the real loftier goal  of tidygraph, it's certainly a clear goal of
the tidyverse to be a
general master of data).

Is this a serious option worth pursuing for spdep? I don't know, but I'm
pursuing it for my own
reasons as detailed above. I don't think modernizing spdep would be that
difficult.  At the least the limitations and
obstacles faced when using it  should be described.

I'm happy to help and I might have a go at it at some point, and I'd
encourage
anyone to get in and try it out.With rlang and the imminent release of the
greatly
improved dplyr it's a good time to start.

Cheers, Mike.



> Roger
>
> >
> > On Mon, May 15, 2017 at 5:17 AM Roger Bivand <[hidden email]>
> wrote:
> >
> >> On Sun, 14 May 2017, Tiernan Martin wrote:
> >>
> >>> You’re right about one thing: those of us introduced to the pipe in our
> >>> formative years find it hard to escape (everything is better here
> inside
> >>> the pipe - right?) ;)
> >>
> >> Pipes can be great if they don't involve extra inputs at a later stage
> in
> >> the pipe, but as:
> >>
> >> https://github.com/thomasp85/tidygraph
> >>
> >> suggests, it isn't very easy, and may be forced.
> >>
> >>>
> >>> Thank you for your very informative response. Before reading it, I had
> a
> >>> vague inkling that matrices were part of the challenge of adapting
> these
> >>> tools to work with simple features, but I also thought perhaps all this
> >>> issue needed was a little nudge from a user like me. Your response made
> >> it
> >>> clear that there is a series of technical decisions that need to be
> made,
> >>> and in order to do that I imagine some exploratory testing is in order.
> >> Of
> >>> course, it is possible that the data.frame + list-col foundation is
> just
> >>> not well-suited to the needs of spatial weighting/testing tools, but I
> >>> suspect that there is some valuable insight to be gained from exploring
> >>> this possibility further. And as I stated in my first post, going
> >> forward I
> >>> expect more users will be interested in seeing this sort of integration
> >>> happen.
> >>
> >> I think that there are two tasks - one to create neighbour objects from
> sf
> >> objects, then another to see whether the costs of putting the spatial
> >> weights into a data.frame (or even more challenging, a tibble) to test
> for
> >> spatial autocorrelation or model spatial dependence. Should the weights
> be
> >> squirreled away inside the object flowing down the pipe?
> >>
> >> sf_object %>% add_weights(...) -> sf_object_with_weights
> >> moran.test(sf_object_with_weights, ...)
> >> geary.test(sf_object_with_weights, ...)
> >>
> >> or
> >>
> >> sf_object %>% add_weights(...) %>% moran.test(sf_object_with_weights,
> ...)
> >> -> output
> >>
> >> all in one (output is a htest object). It gets more complex in:
> >>
> >> lm.morantest(lm(formula, sf_object), sf_object_with_weights)
> >>
> >> as we need the weights and the lm output object.
> >>
> >>>
> >>>> How should we go about this technically? spdep is on R-Forge and is
> >> happy
> >>>> there. Its present functionality has to be maintained as it stands, it
> >>> has
> >>>> too many reverse dependencies to break. Should it be re-written from
> >>>> scratch (with more sparse matrices internally for computation)?
> >>>> ...
> >>>> We'd need proof of concept with realistically sized data sets (not yet
> >> NY
> >>>> taxis, but maybe later ...). spdep started as spweights, sptests and
> >>>> spdep, and the first two got folded into the third when stable. If
> >>> weights
> >>>> are the first thing to go for, sfweights is where to go first (and
> port
> >>>> the STRtree envelope intersections for contiguities). It could define
> >> the
> >>>> new classes, and tests and modelling would use them.
> >>>
> >>> Sounds to me like this package would need to be built from the ground
> up,
> >>> possibly following a similar path to the `sp` development process as
> you
> >>> mentioned. Maybe that presents a challenge with regards to resources
> >> (i.e.,
> >>> funding, time, etc.). Perhaps project this is a candidate for future
> >>> proposals to the R Consortium or other funding sources. I hope that
> this
> >>> post is useful in documenting user interest in seeing the `sf` protocol
> >>> integrated into other spatial tools and workflows, and I look forward
> to
> >>> hearing others' thoughts on the matter.
> >>
> >> Making spatial weights from sf objects is just expanding spdep, and is
> >> feasible. There is no point looking for funding, all it needs is
> knowledge
> >> of how things have been done in spdep. Contiguity, knn, distance,
> >> graph-based neighbours are all feasible.
> >>
> >> Extending this to adding weights to sf objects may be going too far. It
> >> seems slightly forced to - say - attach a sparse matrix to an sf
> geometry
> >> column as an attribute, or to add a neighbour nested list column when
> >> writing subsetting protection is very far from obvious. Like time series
> >> data (not time stamped observations, but real ordered time series),
> >> spatial data have implicit order that is tidy in a different way than
> tidy
> >> (graph dependencies between observation rows). Maybe mapped features
> with
> >> attached attribute data are "tidier" than a table, because they would
> >> preserve their relative positions?
> >>
> >> Roger
> >>
> >>>
> >>> Thanks again,
> >>>
> >>> Tiernan
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Sat, May 13, 2017 at 1:39 PM Roger Bivand <[hidden email]>
> >> wrote:
> >>>
> >>>> On Fri, 12 May 2017, Tiernan Martin wrote:
> >>>>
> >>>>> Is anyone thinking about creating an adaptation of the `spdep`
> package
> >>>> that
> >>>>> expects sf-class inputs and works well in a pipeline?
> >>>>
> >>>> I assume that "this is not a pipeline" is a joke for insiders (in the
> >>>> pipe)?
> >>>>
> >>>> No, there is your issue on the sfr github repository that is relevant
> >> for
> >>>> contiguous neighbours, but not beyond that:
> >>>>
> >>>> https://github.com/edzer/sfr/issues/234
> >>>>
> >>>> An sf is a data.frame, and as such should "just work", like "Spatial"
> >>>> objects have, in formula/data settings. The problem is (of course) the
> >>>> weights matrix.
> >>>>
> >>>> Should it be a list column (each row has a list nesting two lists,
> first
> >>>> indices, second non-zero weights), or remain separate as it has been
> for
> >>>> 20 years, or become a column-oriented representation (Matrix package)
> -
> >> a
> >>>> nested list like a list column or a listw obeject is row-oriented. I
> had
> >>>> started thinking about using a sparse column-oriented representation,
> >> but
> >>>> have not implemented functions accepting them instead of listw
> objects.
> >>>>
> >>>> I am very cautious about creating classes for data that were
> data.frame,
> >>>> then sf, and then have the weights built-in. In the simple case it
> would
> >>>> work, but all you have to do is re-order the rows and the link between
> >> the
> >>>> neighbour ids and row order breaks down; the same applies to
> subsetting.
> >>>>
> >>>> The problems to solve first are related the workflows, and easiest to
> >> look
> >>>> at in the univariate case (Moran, Geary, join-count, Mantel, G, ...)
> for
> >>>> global and local tests. I think that all or almost all of the NSE
> verbs
> >>>> will cause chaos (mutate, select, group) once weights have been
> created.
> >>>> If there is a way to bind the neighour IDs to the input sf object
> rows,
> >> a
> >>>> list column might be possible, but we have to permit multiple such
> >> columns
> >>>> (Queen, Rook, ...), and ensure that subsetting and row-reordering keep
> >> the
> >>>> graph relations intact (and their modifications, like
> >> row-standardisation)
> >>>>
> >>>> If you've seen any examples of how tidy relates to graphs (adjacency
> >> lists
> >>>> are like nb objects), we could learn from that.
> >>>>
> >>>> How should we go about this technically? spdep is on R-Forge and is
> >> happy
> >>>> there. Its present functionality has to be maintained as it stands, it
> >> has
> >>>> too many reverse dependencies to break. Should it be re-written from
> >>>> scratch (with more sparse matrices internally for computation)?
> >>>>
> >>>> Your example creates weights on the fly for a local G* map, but has
> >> n=100,
> >>>> not say n=90000 (LA census blocks). Using the sf_ geos based methods
> >> does
> >>>> not use STRtrees, which cut the time needed to find contigious
> >> neighbours
> >>>> from days to seconds. We ought to pre-compute weights, but this messes
> >> up
> >>>> the flow of data, because a second lot of data (the weights) have to
> >> enter
> >>>> the pipe, and be matched with the row-ids.
> >>>>
> >>>> We'd need proof of concept with realistically sized data sets (not yet
> >> NY
> >>>> taxis, but maybe later ...). spdep started as spweights, sptests and
> >>>> spdep, and the first two got folded into the third when stable. If
> >> weights
> >>>> are the first thing to go for, sfweights is where to go first (and
> port
> >>>> the STRtree envelope intersections for contiguities). It could define
> >> the
> >>>> new classes, and tests and modelling would use them.
> >>>>
> >>>> Thanks for starting this discussion.
> >>>>
> >>>> Roger
> >>>>
> >>>>>
> >>>>> I understand that there is skepticism about the wisdom of adopting
> the
> >>>>> “tidyverse” principles throughout the R package ecosystem, and I
> share
> >>>> the
> >>>>> concern that an over-reliance on any single paradigm could reduce the
> >>>>> resilience and diversity of the system as a whole.
> >>>>>
> >>>>> That said, I believe that the enthusiastic adoption of the `sf`
> package
> >>>> and
> >>>>> the package's connections with widely-used tidyverse packages like
> >>>> `dplyr`
> >>>>> and `ggplot2` may result in increased demand for sf-friendly spatial
> >>>>> analysis tools. As an amateur who recently started using R as my
> >> primary
> >>>>> GIS tool, it seems like the tidyverse's preference for dataframes, S3
> >>>>> objects, list columns, and pipeline workflows would be well-suited to
> >> the
> >>>>> field of spatial analysis. Are there some fundamental reasons why the
> >>>>> `spdep` tools cannot (or should not) be adapted to the tidyverse
> >>>> "dialect"?
> >>>>>
> >>>>> Let me put the question in the context of an actual analysis: in
> >> February
> >>>>> 2017, the pop culture infovis website The Pudding (
> >> https://pudding.cool/
> >>>> )
> >>>>> published an analysis of regional preferences for Oscar-nominated
> films
> >>>> in
> >>>>> the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days
> >> ago,
> >>>>> the author posted a tutorial explaining the method of “regional
> >>>> smoothing”
> >>>>> used to create the article’s choropleths (
> >>>>> https://pudding.cool/process/regional_smoothing/).
> >>>>>
> >>>>> The method relies on several `spdep` functions (
> >>>>>
> >>>>
> >>
> https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R
> >>>> ).
> >>>>> In the code below, I provide reprex with a smaller dataset included
> in
> >>>> the
> >>>>> `sf` package:
> >>>>>
> >>>>> library(sf)
> >>>>> library(spdep)
> >>>>>
> >>>>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
> >>>>> Carolina counties
> >>>>> nc_shp <- as(nc,'Spatial')
> >>>>>
> >>>>> coords <- coordinates(nc_shp)
> >>>>> IDs<-row.names(as(nc_shp, "data.frame"))
> >>>>>
> >>>>> knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find
> the
> >>>>> nearest neighbors for each county
> >>>>> knn5 <- include.self(knn5)
> >>>>>
> >>>>> localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
> >>>>> nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G
> >> scores
> >>>>> localGvalues <- round(localGvalues,3)
> >>>>>
> >>>>> nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
> >>>>>
> >>>>> p1 <- spplot(nc_shp, c('NWBIR74'))
> >>>>> p2 <- spplot(nc_shp, c('LOCAL_G'))
> >>>>> plot(p1, split=c(1,1,2,2), more=TRUE)
> >>>>> plot(p2, split=c(1,2,2,2), more=TRUE)
> >>>>>
> >>>>> Here’s what I imagine that would look like in a tidyverse pipeline
> >>>> (please
> >>>>> note that this code is for illustrative purposes and will not run):
> >>>>>
> >>>>> library(tidyverse)
> >>>>> library(purrr)
> >>>>> library(sf)
> >>>>> library(sfdep) # this package doesn't exist (yet)
> >>>>>
> >>>>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))
> >>>>>
> >>>>> nc_g <-
> >>>>>  nc %>%
> >>>>>  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5,
> >> include.self
> >>>> =
> >>>>> TRUE)),  # find the nearest neighbors for each county
> >>>>>         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style =
> >> 'B')),  #
> >>>>> make a list of the neighbors using the binary method
> >>>>>         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
> >>>>> zero.policy = TRUE),  # calculate the G scores
> >>>>>         LOCAL_G = round(LOCAL_G,3))
> >>>>>
> >>>>> We can see that the (hypothetical) tidyverse version reduces the
> amount
> >>>> of
> >>>>> intermediate objects and wraps the creation of the G scores into a
> >> single
> >>>>> code chunk with clear steps.
> >>>>>
> >>>>> I'd be grateful to hear from the users and developers of the `spdep`
> >> and
> >>>>> `sf` packages about this topic!
> >>>>>
> >>>>> Tiernan Martin
> >>>>>
> >>>>>       [[alternative HTML version deleted]]
> >>>>>
> >>>>> _______________________________________________
> >>>>> R-sig-Geo mailing list
> >>>>> [hidden email]
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>>>
> >>>> --
> >>>> Roger Bivand
> >>>> Department of Economics, Norwegian School of Economics,
> >>>> Helleveien 30, N-5045 Bergen, Norway.
> >>>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>
> <+47%2055%2095%2093%2055>
> >> <+47%2055%2095%2093%2055>; e-mail:
> >>>> [hidden email]
> >>>> Editor-in-Chief of The R Journal,
> >> https://journal.r-project.org/index.html
> >>>> http://orcid.org/0000-0003-2392-6140
> >>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> >>>
> >>
> >> --
> >> Roger Bivand
> >> Department of Economics, Norwegian School of Economics,
> >> Helleveien 30, N-5045 Bergen, Norway.
> >> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>
> <+47%2055%2095%2093%2055>; e-mail:
> >> [hidden email]
> >> Editor-in-Chief of The R Journal,
> https://journal.r-project.org/index.html
> >> http://orcid.org/0000-0003-2392-6140
> >> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> >
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; e-mail:
> [hidden email]
> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
> http://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

--
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Is a simple feature -friendly version of spdep being developed?

edzer
Interesting discussion, thanks for bringing this up here. Some ideas:

I agree that there's more than one answer to the question what is tidy
in the context of analysing spatial data. Compared to the S4 classes in
sp, with the simplicity of S3 and data.frame's, there's also a certain
risk/loss, two examples came up the last few days in [1] and [2].

I don't think that the risks of adding neighbourhood lists or sparse
weights matrices to sf/data.frame objects are really large - right now
I'd guess (untried!) it is also possible to create nb lists, then subset
data and this list or sparse matrix, and end up with a meaningless mess
relatively unnoticed (some luck needed).

It might be a good idea to give these lists a certain class (right now
`st_intersects` or `st_rook` return an unclassed list), and modify the
`filter.sf` method such that it will warn when these objects are
subsetted - in which case they would loose their meaning.

I'm not exactly sure how to pack a sparse weights matrix in a
list-column, but that can't be hard; somehow I feel that sparse matrices
from Matrix would not fit in, since they're not lists - convert on the fly?

[1] https://github.com/edzer/sfr/issues/343
[2] https://github.com/edzer/sfr/issues/340

On 16/05/17 14:27, Michael Sumner wrote:

> Hello, I'm following this conversation with great interest. There's a lot
> here for me
> to work with, it's an active pursuit and I don't have much to offer yet but
> I have a couple of inline responses below.  I've learnt a lot about spdep
> from this discussion, something I've meant to explore for a long time
> but my work has never had the same modelling focus).
>
> On Tue, 16 May 2017 at 18:25 Roger Bivand <[hidden email]> wrote:
>
>> On Tue, 16 May 2017, Tiernan Martin wrote:
>>
>>>> Should the weights be
>>>> squirreled away inside the object flowing down the pipe?
>>>
>>> I'd propose that the that functions be written in such a way that the
>>> neighbors, weights, and test results can be stored in list cols within
>> the
>>> data.frame.
>>>
>>> This way multiple tests (and their inputs) can be stored in a single,
>>> easily comprehensible, rectangular data object.
>>>
>>> Here's some more pretend code to illustrate that concept with regard to
>> sf:
>>>
>>>  nc %>%
>>>    group_by(NAME) %>%
>>>    nest %>%
>>>    mutate(
>>>           geometry = map(.x = data, ~.x[['geometry']]),
>>>           neighbors = map(.x = data, .f = my_nb), # creates a list col of
>>> the neighbors
>>>           weights = map(.x = data, .f = add_weights), # creates a weights
>>> object (class=wt, is this a sparse matrix in spdep?)
>>>           test_moran = map2(.x = data, .y =  weights, .f = my_moran), #
>>> creates list of Moran's I and sample kurtosis of x
>>>           test_geary = map2(.x = data, .y = weights, .f = my_geary) #
>>> creates list of Geary's C and sample kurtosis of x
>>>               )
>>>
>>> #> # A tibble: 100 x 7
>>> #>           NAME              data         geometry  neighbors  weights
>>> test_moran test_geary
>>> #>        <fctr>            <list>           <list>     <list>   <list>
>>> <list>     <list>
>>> #>  1        Ashe <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
>>> <list [2]> <list [2]>
>>> #>  2   Alleghany <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
>>> <list [2]> <list [2]>
>>> #>  3       Surry <tibble [1 x 14]> <simple_feature> <list [4]> <S3: wt>
>>> <list [2]> <list [2]>
>>> #>  4   Currituck <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
>>> <list [2]> <list [2]>
>>> #>  5 Northampton <tibble [1 x 14]> <simple_feature> <list [3]> <S3: wt>
>>> <list [2]> <list [2]>
>>> #>  6    Hertford <tibble [1 x 14]> <simple_feature> <list [6]> <S3: wt>
>>> <list [2]> <list [2]>
>>> #>  7      Camden <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
>>> <list [2]> <list [2]>
>>> #>  8       Gates <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
>>> <list [2]> <list [2]>
>>> #>  9      Warren <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
>>> <list [2]> <list [2]>
>>> #> 10      Stokes <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
>>> <list [2]> <list [2]>
>>> #> # ... with 90 more rows
>>>
>>> Examples of this nested data.frame / list col approach can be found here:
>>> http://r4ds.had.co.nz/many-models.html#nested-data
>>
>> Thanks, interesting but really scary. You get numeric output for local
>> Moran's I and local Geary, but what will happen next?
>>
>> All of these tests assume a correctly specified mean model. Without this
>> assumption being met (absence of trends, absence of global spatial
>> autocorrelation in the local test case, absence of missing right-hand-side
>> variables and absence of incorrect functional forms for these, ...), a map
>> pretending to invite "inference" of "hotspots" etc. - collections of
>> observations with similar values of the local statistic, will most often
>> be highly misleading. This isn't only about whether it is possible to do
>> technically, but also about not making it simpler than reason would
>> suggest. With a correct mean model, there may not be any (local) spatial
>> autocorrelation left. See ?localmoran.sad and ?localmoran.exact, and their
>> references, and McMillen (2003) (the McSpatial author). I won't even
>> mention the need to adjust p.values for multiple comparisons ...
>>
>>>
>>>> Extending this to adding weights to sf objects may be going too far. It
>>>> seems slightly forced to - say - attach a sparse matrix to an sf
>> geometry
>>>> column as an attribute, or to add a neighbour nested list column when
>>>> writing subsetting protection is very far from obvious. Like time series
>>>> data (not time stamped observations, but real ordered time series),
>>>> spatial data have implicit order that is tidy in a different way than
>> tidy
>>>> (graph dependencies between observation rows). Maybe mapped features
>> with
>>>> attached attribute data are "tidier" than a table, because they would
>>>> preserve their relative positions?
>>>
>>> This seems like the crux of the problem. Are you saying that the
>> orthogonal
>>> "tidyness" of the tidyverse is fundamentally different from the
>> underlying
>>> relationships of spatial data, and therefore there's a limit to the
>>> usefulness of tidyverse approach to spatial analysis?
>>
>> Right, this is central and open. Spatial objects can be tidy in the
>> tidyverse sense - sf shows this can be done. The open question is whether
>> this extends to the analysis of the data.
>
>
>
> I think the structures are still wide open, unless you think the goals of sf
> encompass everything. The simple features standard is one corner of the
> available ways to structure spatial data. It pointedly ignores relations
> between objects, and heavily relies on 2D-specific optimizations. All
> the topology is done "on the fly" and is treated as expendable. I personally
> want structures that treat simple features and similar formats as
> expendable,
> but that maintain the topology as a first-class citizen. I don't say this
> because I think simple features is not important, it's just not my
> main focus. I want time, depth, temperature, salinity, anything and
> everything
> storable on features, parts, edges, vertices and nodes. Lots of tools
> do this or aspects of it, and none of them can be shoe-horned into simple
> features, and
> they don't have a unifying framework. (It's called "mesh structures", or
> the simplicial
> complex, or "a database" in other contexts, but none of those are really
> enough).
>
>  (I'm dismayed by GIS ideas when they are spoken about as if it's the end
> of the story. Sf is really great, but it's now more than the standard and
> it's kind
> of unique in the available tools for the standard itself - but most of them
> are pretty specialized and different to each other, probably unavoidably).
>
> ggplot2 and tidygraph still have a long way to go, but in there I believe
> is a language not just of graphics and of converting to and from igraph,
> but of flexible ways of specifying how spatial data are constructed,
> analysed
> and interacted with.
> Ggplot builds geometries, but it has no output other than a plot, and the
> object that can
> modify that plot. Select, group-by, arrange, filter, nest, and normalization
> into multiple tables are all part of the powerful ways in which raw data
> can be given structure. "Spatial" is just one small part of that structure,
> and
> unfortunately the 'geographic spatial', the 2D optimizations baked in
> by decades old GIS practice seems to have the most sway in discussions.
>
> This community could find a way to bake-in the geometries from ggplot
> constructions as sf objects, i.e. convert gg-plot into sf, not into a plot.
> I think that's a
> digestable project that would really provide some valuable insights.
>
>
>> From looking around, it seems
>> that time series and graphs suffer from the same kinds of questions, and
>> it will be interesting to see what stars turns up for spatio-temporal
>> data:
>>
>> https://github.com/edzer/stars
>>
>> I see that stars wisely does not include trip/trajectory data structures,
>> but analysis might involve reading the array-based representation of
>> spatio-temporal environmental drivers for an ensemble of buffers around
>> trajectories (say animal movement), then modelling.
>>
>>
> Interesting that you mention this, I agree it's a crux area in modern
> spatial,
> tracking has been neglected for years but it's starting to be looked at.
>
>  I don't see this as terribly difficult, in fact we do this routinely
> for all kinds of tracking data both as-measured and modelled. The
> difficulty is having
> access to the data as well as the motivation in the same place. We have
> in-house tools
> that sit on large collections of remotely sensed time series data:
>
> https://github.com/AustralianAntarcticDataCentre/raadsync
>
> When you have both the motivation (lots of animal tracking and voyage data)
> and
> the data on hand it's relatively easy (we have a ready audience at in
> Southern Ocean
> ecosystems research here).  First step is to write a read-data tool as a
> function
> of time. Then the problem of matching tracking data to the right space-time
> window is easily broken down into the intervals between the time
> series of environmental data. At that level, you can employ interpolation
> between the layers,
> aggregation to compromise resolution with coverage, and even
> bring user-defined calculations to the smallish windows to calculate
> derived products (rugosity, slope, distance to thresholds etc.)
>
> In our modelled track outputs with tripEstimation/SGAT and bsam[1] we can
> use the more probabilistic
> estimates of location as more nuanced "query buckets" against the
> same environmental data, it's the same issue with matching time with
> all the different options you might want. What is hard for us is to share
> the tools,
> because first you need to arrange access to quite a lot of data, before
> you get to see the magic in action.
>
> I do think that stars provides a great opportunity though, and my thoughts
> along
> those lines are here:
>
> https://github.com/mdsumner/stars.use.cases/blob/master/Use-cases.rmd#animal-tracking-mcmc-estimation
>
>
> [1] https://CRAN.R-project.org/package=bsam
>
>
>
>> Maybe: "everything should be as tidy as it can be but not tidier"?
>>
>>
> I don't think we've even started with the housekeeping here. Some early
> thoughts are here:
>
> http://rpubs.com/cyclemumner/sc-rationale
>
> I'm interested to help find ways to nest the spdep idioms in "tidy" ways,
> but I'm also not sure
> how valuable that is yet. When you convert the structures to primitives the
> edge and vertex relations
> are inherent already, and it's a mater of standard database
> join/select/filter idioms
> to link things up:
>
> http://rpubs.com/cyclemumner/neighbours01
>
> http://rpubs.com/cyclemumner/neighbours02
>
> (those examples use https://github.com/mdsumner/scsf for the PRIMITIVE
> model
> but it's really too raw to use yet). This is all fine and dandy, but then
> you still have a list-bag
> - i.e. database  - of tables with no strong idioms for treating them as a
> single object -
> that's the real loftier goal  of tidygraph, it's certainly a clear goal of
> the tidyverse to be a
> general master of data).
>
> Is this a serious option worth pursuing for spdep? I don't know, but I'm
> pursuing it for my own
> reasons as detailed above. I don't think modernizing spdep would be that
> difficult.  At the least the limitations and
> obstacles faced when using it  should be described.
>
> I'm happy to help and I might have a go at it at some point, and I'd
> encourage
> anyone to get in and try it out.With rlang and the imminent release of the
> greatly
> improved dplyr it's a good time to start.
>
> Cheers, Mike.
>
>
>
>> Roger
>>
>>>
>>> On Mon, May 15, 2017 at 5:17 AM Roger Bivand <[hidden email]>
>> wrote:
>>>
>>>> On Sun, 14 May 2017, Tiernan Martin wrote:
>>>>
>>>>> You’re right about one thing: those of us introduced to the pipe in our
>>>>> formative years find it hard to escape (everything is better here
>> inside
>>>>> the pipe - right?) ;)
>>>>
>>>> Pipes can be great if they don't involve extra inputs at a later stage
>> in
>>>> the pipe, but as:
>>>>
>>>> https://github.com/thomasp85/tidygraph
>>>>
>>>> suggests, it isn't very easy, and may be forced.
>>>>
>>>>>
>>>>> Thank you for your very informative response. Before reading it, I had
>> a
>>>>> vague inkling that matrices were part of the challenge of adapting
>> these
>>>>> tools to work with simple features, but I also thought perhaps all this
>>>>> issue needed was a little nudge from a user like me. Your response made
>>>> it
>>>>> clear that there is a series of technical decisions that need to be
>> made,
>>>>> and in order to do that I imagine some exploratory testing is in order.
>>>> Of
>>>>> course, it is possible that the data.frame + list-col foundation is
>> just
>>>>> not well-suited to the needs of spatial weighting/testing tools, but I
>>>>> suspect that there is some valuable insight to be gained from exploring
>>>>> this possibility further. And as I stated in my first post, going
>>>> forward I
>>>>> expect more users will be interested in seeing this sort of integration
>>>>> happen.
>>>>
>>>> I think that there are two tasks - one to create neighbour objects from
>> sf
>>>> objects, then another to see whether the costs of putting the spatial
>>>> weights into a data.frame (or even more challenging, a tibble) to test
>> for
>>>> spatial autocorrelation or model spatial dependence. Should the weights
>> be
>>>> squirreled away inside the object flowing down the pipe?
>>>>
>>>> sf_object %>% add_weights(...) -> sf_object_with_weights
>>>> moran.test(sf_object_with_weights, ...)
>>>> geary.test(sf_object_with_weights, ...)
>>>>
>>>> or
>>>>
>>>> sf_object %>% add_weights(...) %>% moran.test(sf_object_with_weights,
>> ...)
>>>> -> output
>>>>
>>>> all in one (output is a htest object). It gets more complex in:
>>>>
>>>> lm.morantest(lm(formula, sf_object), sf_object_with_weights)
>>>>
>>>> as we need the weights and the lm output object.
>>>>
>>>>>
>>>>>> How should we go about this technically? spdep is on R-Forge and is
>>>> happy
>>>>>> there. Its present functionality has to be maintained as it stands, it
>>>>> has
>>>>>> too many reverse dependencies to break. Should it be re-written from
>>>>>> scratch (with more sparse matrices internally for computation)?
>>>>>> ...
>>>>>> We'd need proof of concept with realistically sized data sets (not yet
>>>> NY
>>>>>> taxis, but maybe later ...). spdep started as spweights, sptests and
>>>>>> spdep, and the first two got folded into the third when stable. If
>>>>> weights
>>>>>> are the first thing to go for, sfweights is where to go first (and
>> port
>>>>>> the STRtree envelope intersections for contiguities). It could define
>>>> the
>>>>>> new classes, and tests and modelling would use them.
>>>>>
>>>>> Sounds to me like this package would need to be built from the ground
>> up,
>>>>> possibly following a similar path to the `sp` development process as
>> you
>>>>> mentioned. Maybe that presents a challenge with regards to resources
>>>> (i.e.,
>>>>> funding, time, etc.). Perhaps project this is a candidate for future
>>>>> proposals to the R Consortium or other funding sources. I hope that
>> this
>>>>> post is useful in documenting user interest in seeing the `sf` protocol
>>>>> integrated into other spatial tools and workflows, and I look forward
>> to
>>>>> hearing others' thoughts on the matter.
>>>>
>>>> Making spatial weights from sf objects is just expanding spdep, and is
>>>> feasible. There is no point looking for funding, all it needs is
>> knowledge
>>>> of how things have been done in spdep. Contiguity, knn, distance,
>>>> graph-based neighbours are all feasible.
>>>>
>>>> Extending this to adding weights to sf objects may be going too far. It
>>>> seems slightly forced to - say - attach a sparse matrix to an sf
>> geometry
>>>> column as an attribute, or to add a neighbour nested list column when
>>>> writing subsetting protection is very far from obvious. Like time series
>>>> data (not time stamped observations, but real ordered time series),
>>>> spatial data have implicit order that is tidy in a different way than
>> tidy
>>>> (graph dependencies between observation rows). Maybe mapped features
>> with
>>>> attached attribute data are "tidier" than a table, because they would
>>>> preserve their relative positions?
>>>>
>>>> Roger
>>>>
>>>>>
>>>>> Thanks again,
>>>>>
>>>>> Tiernan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sat, May 13, 2017 at 1:39 PM Roger Bivand <[hidden email]>
>>>> wrote:
>>>>>
>>>>>> On Fri, 12 May 2017, Tiernan Martin wrote:
>>>>>>
>>>>>>> Is anyone thinking about creating an adaptation of the `spdep`
>> package
>>>>>> that
>>>>>>> expects sf-class inputs and works well in a pipeline?
>>>>>>
>>>>>> I assume that "this is not a pipeline" is a joke for insiders (in the
>>>>>> pipe)?
>>>>>>
>>>>>> No, there is your issue on the sfr github repository that is relevant
>>>> for
>>>>>> contiguous neighbours, but not beyond that:
>>>>>>
>>>>>> https://github.com/edzer/sfr/issues/234
>>>>>>
>>>>>> An sf is a data.frame, and as such should "just work", like "Spatial"
>>>>>> objects have, in formula/data settings. The problem is (of course) the
>>>>>> weights matrix.
>>>>>>
>>>>>> Should it be a list column (each row has a list nesting two lists,
>> first
>>>>>> indices, second non-zero weights), or remain separate as it has been
>> for
>>>>>> 20 years, or become a column-oriented representation (Matrix package)
>> -
>>>> a
>>>>>> nested list like a list column or a listw obeject is row-oriented. I
>> had
>>>>>> started thinking about using a sparse column-oriented representation,
>>>> but
>>>>>> have not implemented functions accepting them instead of listw
>> objects.
>>>>>>
>>>>>> I am very cautious about creating classes for data that were
>> data.frame,
>>>>>> then sf, and then have the weights built-in. In the simple case it
>> would
>>>>>> work, but all you have to do is re-order the rows and the link between
>>>> the
>>>>>> neighbour ids and row order breaks down; the same applies to
>> subsetting.
>>>>>>
>>>>>> The problems to solve first are related the workflows, and easiest to
>>>> look
>>>>>> at in the univariate case (Moran, Geary, join-count, Mantel, G, ...)
>> for
>>>>>> global and local tests. I think that all or almost all of the NSE
>> verbs
>>>>>> will cause chaos (mutate, select, group) once weights have been
>> created.
>>>>>> If there is a way to bind the neighour IDs to the input sf object
>> rows,
>>>> a
>>>>>> list column might be possible, but we have to permit multiple such
>>>> columns
>>>>>> (Queen, Rook, ...), and ensure that subsetting and row-reordering keep
>>>> the
>>>>>> graph relations intact (and their modifications, like
>>>> row-standardisation)
>>>>>>
>>>>>> If you've seen any examples of how tidy relates to graphs (adjacency
>>>> lists
>>>>>> are like nb objects), we could learn from that.
>>>>>>
>>>>>> How should we go about this technically? spdep is on R-Forge and is
>>>> happy
>>>>>> there. Its present functionality has to be maintained as it stands, it
>>>> has
>>>>>> too many reverse dependencies to break. Should it be re-written from
>>>>>> scratch (with more sparse matrices internally for computation)?
>>>>>>
>>>>>> Your example creates weights on the fly for a local G* map, but has
>>>> n=100,
>>>>>> not say n=90000 (LA census blocks). Using the sf_ geos based methods
>>>> does
>>>>>> not use STRtrees, which cut the time needed to find contigious
>>>> neighbours
>>>>>> from days to seconds. We ought to pre-compute weights, but this messes
>>>> up
>>>>>> the flow of data, because a second lot of data (the weights) have to
>>>> enter
>>>>>> the pipe, and be matched with the row-ids.
>>>>>>
>>>>>> We'd need proof of concept with realistically sized data sets (not yet
>>>> NY
>>>>>> taxis, but maybe later ...). spdep started as spweights, sptests and
>>>>>> spdep, and the first two got folded into the third when stable. If
>>>> weights
>>>>>> are the first thing to go for, sfweights is where to go first (and
>> port
>>>>>> the STRtree envelope intersections for contiguities). It could define
>>>> the
>>>>>> new classes, and tests and modelling would use them.
>>>>>>
>>>>>> Thanks for starting this discussion.
>>>>>>
>>>>>> Roger
>>>>>>
>>>>>>>
>>>>>>> I understand that there is skepticism about the wisdom of adopting
>> the
>>>>>>> “tidyverse” principles throughout the R package ecosystem, and I
>> share
>>>>>> the
>>>>>>> concern that an over-reliance on any single paradigm could reduce the
>>>>>>> resilience and diversity of the system as a whole.
>>>>>>>
>>>>>>> That said, I believe that the enthusiastic adoption of the `sf`
>> package
>>>>>> and
>>>>>>> the package's connections with widely-used tidyverse packages like
>>>>>> `dplyr`
>>>>>>> and `ggplot2` may result in increased demand for sf-friendly spatial
>>>>>>> analysis tools. As an amateur who recently started using R as my
>>>> primary
>>>>>>> GIS tool, it seems like the tidyverse's preference for dataframes, S3
>>>>>>> objects, list columns, and pipeline workflows would be well-suited to
>>>> the
>>>>>>> field of spatial analysis. Are there some fundamental reasons why the
>>>>>>> `spdep` tools cannot (or should not) be adapted to the tidyverse
>>>>>> "dialect"?
>>>>>>>
>>>>>>> Let me put the question in the context of an actual analysis: in
>>>> February
>>>>>>> 2017, the pop culture infovis website The Pudding (
>>>> https://pudding.cool/
>>>>>> )
>>>>>>> published an analysis of regional preferences for Oscar-nominated
>> films
>>>>>> in
>>>>>>> the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days
>>>> ago,
>>>>>>> the author posted a tutorial explaining the method of “regional
>>>>>> smoothing”
>>>>>>> used to create the article’s choropleths (
>>>>>>> https://pudding.cool/process/regional_smoothing/).
>>>>>>>
>>>>>>> The method relies on several `spdep` functions (
>>>>>>>
>>>>>>
>>>>
>> https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R
>>>>>> ).
>>>>>>> In the code below, I provide reprex with a smaller dataset included
>> in
>>>>>> the
>>>>>>> `sf` package:
>>>>>>>
>>>>>>> library(sf)
>>>>>>> library(spdep)
>>>>>>>
>>>>>>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
>>>>>>> Carolina counties
>>>>>>> nc_shp <- as(nc,'Spatial')
>>>>>>>
>>>>>>> coords <- coordinates(nc_shp)
>>>>>>> IDs<-row.names(as(nc_shp, "data.frame"))
>>>>>>>
>>>>>>> knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find
>> the
>>>>>>> nearest neighbors for each county
>>>>>>> knn5 <- include.self(knn5)
>>>>>>>
>>>>>>> localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
>>>>>>> nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G
>>>> scores
>>>>>>> localGvalues <- round(localGvalues,3)
>>>>>>>
>>>>>>> nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
>>>>>>>
>>>>>>> p1 <- spplot(nc_shp, c('NWBIR74'))
>>>>>>> p2 <- spplot(nc_shp, c('LOCAL_G'))
>>>>>>> plot(p1, split=c(1,1,2,2), more=TRUE)
>>>>>>> plot(p2, split=c(1,2,2,2), more=TRUE)
>>>>>>>
>>>>>>> Here’s what I imagine that would look like in a tidyverse pipeline
>>>>>> (please
>>>>>>> note that this code is for illustrative purposes and will not run):
>>>>>>>
>>>>>>> library(tidyverse)
>>>>>>> library(purrr)
>>>>>>> library(sf)
>>>>>>> library(sfdep) # this package doesn't exist (yet)
>>>>>>>
>>>>>>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))
>>>>>>>
>>>>>>> nc_g <-
>>>>>>>  nc %>%
>>>>>>>  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5,
>>>> include.self
>>>>>> =
>>>>>>> TRUE)),  # find the nearest neighbors for each county
>>>>>>>         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style =
>>>> 'B')),  #
>>>>>>> make a list of the neighbors using the binary method
>>>>>>>         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
>>>>>>> zero.policy = TRUE),  # calculate the G scores
>>>>>>>         LOCAL_G = round(LOCAL_G,3))
>>>>>>>
>>>>>>> We can see that the (hypothetical) tidyverse version reduces the
>> amount
>>>>>> of
>>>>>>> intermediate objects and wraps the creation of the G scores into a
>>>> single
>>>>>>> code chunk with clear steps.
>>>>>>>
>>>>>>> I'd be grateful to hear from the users and developers of the `spdep`
>>>> and
>>>>>>> `sf` packages about this topic!
>>>>>>>
>>>>>>> Tiernan Martin
>>>>>>>
>>>>>>>       [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> R-sig-Geo mailing list
>>>>>>> [hidden email]
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>
>>>>>> --
>>>>>> Roger Bivand
>>>>>> Department of Economics, Norwegian School of Economics,
>>>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>>>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>
>> <+47%2055%2095%2093%2055>
>>>> <+47%2055%2095%2093%2055>; e-mail:
>>>>>> [hidden email]
>>>>>> Editor-in-Chief of The R Journal,
>>>> https://journal.r-project.org/index.html
>>>>>> http://orcid.org/0000-0003-2392-6140
>>>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>>>>
>>>>
>>>> --
>>>> Roger Bivand
>>>> Department of Economics, Norwegian School of Economics,
>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>
>> <+47%2055%2095%2093%2055>; e-mail:
>>>> [hidden email]
>>>> Editor-in-Chief of The R Journal,
>> https://journal.r-project.org/index.html
>>>> http://orcid.org/0000-0003-2392-6140
>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; e-mail:
>> [hidden email]
>> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
>> http://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>> _______________________________________________
>> R-sig-Geo mailing list
>> [hidden email]
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
--
Edzer Pebesma
Institute for Geoinformatics  (ifgi),  University of Münster
Heisenbergstraße 2, 48149 Münster, Germany; +49 251 83 33081
Journal of Statistical Software:   http://www.jstatsoft.org/
Computers & Geosciences:   http://elsevier.com/locate/cageo/


_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

signature.asc (484 bytes) Download Attachment
Loading...