Aggregating points based on distance


Andy Bunn-4
I would like to create averages of all the variables in a SpatialPointsDataFrame when points are within a specified distance of each other. I have a method for doing this but it seems like a silly way to approach the problem. Any ideas for doing this using modern syntax (especially of the tidy variety) would be appreciated.
   

To start, I have a SpatialPointsDataFrame with several variables measured for each point. I'd like to get an average value for each variable for points within a specified distance. E.g., getting average cadmium values from the meuse data for points within 100 m of each other:
   
    library(sf)
    library(sp)
    data(meuse)
    pts <- st_as_sf(meuse, coords = c("x", "y"), remove=FALSE)
    pts100 <- st_is_within_distance(pts, dist = 100)
    # can use sapply to get mean of a variable. E.g., cadmium
    sapply(pts100, function(x){ mean(pts$cadmium[x]) })

The sapply approach above works one variable at a time. So I could, if I wanted, calculate the mean for each variable, generate a centroid for each point's neighborhood, and then build a SpatialPointsDataFrame from the unique values. E.g., for the first few variables:
   
    res <- data.frame(id=1:length(pts100),
                      x=NA, y=NA,
                      cadmium=NA, copper=NA, lead=NA)
    res$x <- sapply(pts100, function(p){ mean(pts$x[p]) })
    res$y <- sapply(pts100, function(p){ mean(pts$y[p]) })
    res$cadmium <- sapply(pts100, function(p){ mean(pts$cadmium[p]) })
    res$copper <- sapply(pts100, function(p){ mean(pts$copper[p]) })
    res$lead <- sapply(pts100, function(p){ mean(pts$lead[p]) })
    # keep the first row for each repeated mean; duplicated() alone would keep only the repeats
    res2 <- res[!duplicated(res$cadmium),]
    coordinates(res2) <- c("x","y")
    bubble(res2,"cadmium")
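
A caveat on the deduplication step: dropping repeats by comparing the cadmium means treats two different groups as the same whenever their means happen to tie. A minimal sketch of one alternative, assuming a neighbour list shaped like the output of st_is_within_distance, is to key each row on its set of neighbour indices instead. The toy list `nb` and the values below are made up for illustration, and `key`, `keep_by_value`, and `keep_by_group` are hypothetical names.

```r
# Toy neighbour list in the shape st_is_within_distance() returns
# (made-up indices: points 1-2 and 3-4 are mutual neighbours, 5 is alone).
nb <- list(c(1L, 2L), c(1L, 2L), c(3L, 4L), c(3L, 4L), 5L)
cadmium <- c(10, 12, 11, 11, 1)  # made-up values; both pairs average to 11

# Mean of the variable over each point's neighbourhood.
grp_mean <- vapply(nb, function(i) mean(cadmium[i]), numeric(1))

# Deduplicating on the mean merges distinct groups whose means tie.
keep_by_value <- !duplicated(grp_mean)

# Deduplicating on the neighbour set keeps one row per distinct group.
key <- vapply(nb, function(i) paste(sort(i), collapse = ","), character(1))
keep_by_group <- !duplicated(key)

sum(keep_by_value)
sum(keep_by_group)
```

Note that this only collapses points whose neighbour sets are identical; points with overlapping but different neighbourhoods still get their own rows.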
   
   
This works but seems cumbersome; there must be a more efficient way.
   
   
Thanks for any help, Andy
   
   

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Aggregating points based on distance

Barry Rowlingson-3
On Wed, Mar 13, 2019 at 6:14 PM Andy Bunn <[hidden email]> wrote:

> I would like to create averages of all the variables in a
> SpatialPointsDataFrame when points are within a specified distance of each
> other. I have a method for doing this but it seems like a silly way to
> approach the problem. Any ideas for doing this using modern syntax
> (especially of the tidy variety) would be appreciated.
>
>
> To start, I have a SpatialPointsDataFrame with several variables measured
> for each point. I'd like to get an average value for each variable for
> points within a specified distance. E.g., getting average cadmium values
> from the meuse data for points within 100 m of each other:
>
>     library(sf)
>     library(sp)
>     data(meuse)
>     pts <- st_as_sf(meuse, coords = c("x", "y"), remove=FALSE)
>     pts100 <- st_is_within_distance(pts, dist = 100)
>     # can use sapply to get mean of a variable. E.g., cadmium
>     sapply(pts100, function(x){ mean(pts$cadmium[x]) })
>
>
If this is the method you call "silly" then I don't see anything silly at
all here, only efficient, well-written use of base R constructs. The problem
with "modern" syntax is that it's subject to rapid change and often slower
than using base R, which has had years to stabilise and optimise.

If you want to iterate this over variables then nest your sapplys:

items = c("cadmium", "copper","lead")
sapply(items, function(item){
 sapply(pts100, function(x){ mean(pts[[item]][x]) })
})

gets you:

         cadmium    copper      lead
  [1,] 10.150000  83.00000 288.00000
  [2,] 10.150000  83.00000 288.00000
  [3,]  6.500000  68.00000 199.00000
  [4,]  2.600000  81.00000 116.00000
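
The same nesting can be pointed at every numeric column rather than a hand-written list of names. What follows is only a sketch under stated assumptions: the data frame `pts_df` holds made-up toy values standing in for the meuse points, the neighbour list `nb` is built with base R `dist()` so the example runs without sf, and `num_cols` and `nb_means` are hypothetical names. With sf, `nb` would instead come from st_is_within_distance.

```r
# Toy data standing in for the meuse points (all values are made up).
pts_df <- data.frame(
  x = c(0, 50, 300, 320, 1000),
  y = c(0, 0, 0, 0, 0),
  cadmium = c(10, 12, 2, 4, 1),
  copper = c(80, 90, 20, 30, 10)
)

# Neighbour list in the shape st_is_within_distance() returns:
# one integer vector of row indices per point, each including the point itself.
d <- as.matrix(dist(pts_df[, c("x", "y")]))
nb <- lapply(seq_len(nrow(d)), function(i) unname(which(d[i, ] <= 100)))

# Select every numeric column, then average each one over each neighbourhood.
num_cols <- names(pts_df)[vapply(pts_df, is.numeric, logical(1))]
nb_means <- vapply(
  num_cols,
  function(v) vapply(nb, function(i) mean(pts_df[[v]][i]), numeric(1)),
  numeric(nrow(pts_df))
)
nb_means
```

Using vapply instead of sapply fixes the shape of the result up front, so a neighbourhood that returns something unexpected fails loudly rather than silently changing the output type.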


Barry



Re: Aggregating points based on distance

Andy Bunn-4
Ha! That is a great take. Thanks Barry.


Re: [FORGED] Re: Aggregating points based on distance

Rolf Turner
In reply to this post by Barry Rowlingson-3

On 14/03/19 7:33 AM, Barry Rowlingson wrote:

<SNIP>

> The problem with "modern" syntax is that it's subject to rapid change
> and often slower than using base R, which has had years to stabilise
> and optimise.

<SNIP>

Fortune nomination.

cheers,

Rolf

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


Re: [FORGED] Re: Aggregating points based on distance

Dexter Locke
FWIW: I agree with Rolf's nomination.

+1

-Dexter
http://dexterlocke.com




Re: [FORGED] Re: Aggregating points based on distance

Vijay Lulla
+1 from me too for Rolf's nomination!
