skater - spdep runtime - geographic territories

Salo V
Hi Everyone,

I am trying to use the skater() function from the spdep package to partition
a graph. My goal is to create contiguous territories for the entire USA at
the ZIP Code level.

The function takes a very long time to run even on ~15% of my areas, and I
am looking to run it for all 30,000 ZIP Codes in the USA.

The skater() documentation gives an example of parallel processing, but it
doesn't seem to speed things up. I have a Windows laptop with 2 physical and
4 logical cores. In the code below I have tried nc = 1, nc = 2 and nc = 4,
all with very similar run times.

Has anyone been able to run skater() on a large number of areas in a
reasonable amount of time? I would really appreciate any guidance; perhaps I
am missing some steps.

Here is the example from the documentation, which I am also running:

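library(spdep)      # skater(), nbcosts(), mstree() and the cores/cluster options
library(parallel)   # detectCores(), makeCluster(), stopCluster()

# bh.nb, dpad and lcosts are created earlier in the ?skater example
nc <- detectCores(logical=FALSE)
# set nc to 1L here
# (as written, the next line caps nc at 1, so the example runs on one core)
if (nc > 1L) nc <- 1L
coresOpt <- get.coresOption()
invisible(set.coresOption(nc))
if(!get.mcOption()) {
  # no-op, "snow" parallel calculation not available
  cl <- makeCluster(get.coresOption())
  set.ClusterOption(cl)
}
### calculating costs
system.time(plcosts <- nbcosts(bh.nb, dpad))
all.equal(lcosts, plcosts, check.attributes=FALSE)
### making listw
pnb.w <- nb2listw(bh.nb, plcosts, style="B")
### find a minimum spanning tree (starting from node 5)
pmst.bh <- mstree(pnb.w, 5)
### three groups (ncuts = 2) with no restriction
system.time(pres1 <- skater(pmst.bh[,1:2], dpad, 2))
if(!get.mcOption()) {
  set.ClusterOption(NULL)
  stopCluster(cl)
}
# restore the cores option saved above
invisible(set.coresOption(coresOpt))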


Much appreciated!


_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: skater - spdep runtime - geographic territories

Elias T Krainski
Hi Salo,

I implemented it several years ago and it is not optimal in some ways. I
will update it in the near future to use a heuristic that avoids the
exhaustive search it currently performs. For now, you can get a significant
runtime reduction by using an alternative function to compute the ssw (the
sum of dissimilarities between each observation and its group mean), because
the default computation uses a lot of memory and performs poorly on big
datasets.

The attached code illustrates this. Using ssdfun() I saw a reduction factor
of around 4 for n of about 2,000, and an additional reduction factor of 1.6
from using two (physical) cores. These are the results I got on my laptop:

side     n  t1  t2  t3  t4
  15   225   1   1   1   1
  20   400   1   1   1   1
  25   625   4   3   3   2
  30   900  10   5   6   4
  35  1225  21   8  13   5
  40  1600  39  12  23   8
  45  2025  86  24  50  15
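The reduction factors quoted above can be recomputed from the table. This sketch assumes a column layout that the email does not state explicitly: t1 = default ssw, t2 = ssdfun(), t3 = default on two cores, t4 = ssdfun() on two cores. That reading is consistent with the factors of about 4 and 1.6 reported for n = 2025.

```r
# Timings copied from the table; n = side^2 areas
timings <- data.frame(
  side = c(15, 20, 25, 30, 35, 40, 45),
  t1 = c(1, 1, 4, 10, 21, 39, 86),
  t2 = c(1, 1, 3, 5, 8, 12, 24),
  t3 = c(1, 1, 3, 6, 13, 23, 50),
  t4 = c(1, 1, 2, 4, 5, 8, 15)
)
timings$n <- timings$side^2

# Assumed columns: t1 default ssw, t2 ssdfun(), t3 default on 2 cores,
# t4 ssdfun() on 2 cores
timings$ssd_gain  <- timings$t1 / timings$t2  # speed-up from ssdfun()
timings$core_gain <- timings$t2 / timings$t4  # extra speed-up from 2 cores
print(round(timings[, c("n", "ssd_gain", "core_gain")], 2))
```

Read this way, the ssdfun() gain grows with n (about 1.3 at n = 625, 3.6 at n = 2025), while the two-core gain stays between roughly 1.25 and 1.6. The small-n ratios are noisy because the timings are rounded to whole units.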

best regards,

Elias

On 6/11/19 5:21 PM, Salo V wrote:

> Hi Everyone,
>
> I am trying to run the skater function for graph partitions, part of the
> spdep package. My goal is to create contiguous territories for the entire
> USA at the ZIP Code level.
>
> The function takes a very long time to run even for ~15% of my total areas.
> I am looking to run this for the 30,000 ZIP Codes in the USA.
>
> The skater function documentation gives an example of parallel processing,
> but it doesn’t seem to be speeding things up. I have a windows laptop with
> 2 physical cores and 4 logical cores. In the below code, I have already
> tried to set nc = 1, nc=2 and nc=4 all with very similar results in time.
>
> Has anyone been able to run the skater function for a large amount of areas
> in a reasonable amount of time? Would really appreciate any guidance on
> this, perhaps I am missing steps.
>
>
>
> Here is the example from the documentation and which I am also running.
>
> *library*(parallel)
>
> nc <- detectCores(logical=FALSE)
>
> # set nc to 1L here
>
> *if* (nc > 1L) nc <- 1L
>
> coresOpt <- get.coresOption()
>
> invisible(set.coresOption(nc))
>
> *if*(!get.mcOption()) {
>
> # no-op, "snow" parallel calculation not available
>
>    cl <- makeCluster(get.coresOption())
>
>    set.ClusterOption(cl)
>
> }
>
> ### calculating costs
>
> system.time(plcosts <- nbcosts(bh.nb, dpad))
>
> all.equal(lcosts, plcosts, check.attributes=FALSE)
>
> ### making listw
>
> pnb.w <- nb2listw(bh.nb, plcosts, style="B")
>
> ### find a minimum spanning tree
>
> pmst.bh <- mstree(pnb.w,5)
>
> ### three groups with no restriction
>
> system.time(pres1 <- skater(pmst.bh[,1:2], dpad, 2))
>
> *if*(!get.mcOption()) {
>
>    set.ClusterOption(NULL)
>
>    stopCluster(cl)
>
> }
>
>
> much appreciated!
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Attachment: elapsed-time-ssdfun.R (2K)

Re: skater - spdep runtime - geographic territories

Elias T Krainski
In reply to this post by Salo V
Hi Salo,

In the file attached to my previous email, ssdfun() has to be replaced by
the following in order to give the same results as the default option in
skater():

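# sum of Euclidean distances between each selected row d[i, ] and the
# centroid (column means) of those rows
ssdfun <- function(d, i)
     sum(sqrt(colSums((t(d[i, , drop=FALSE]) -
                       colMeans(d[i, , drop=FALSE]))^2)))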

So my recommendation is to use skater(..., method = ssdfun).
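A quick way to check that the corrected ssdfun() computes the same quantity as a dist()-based calculation is to compare the two on random data. ssd_dist() below is a hypothetical reference written only for this check (it is not spdep's internal code): it builds the full distance structure on the selected rows plus their centroid, which is the memory-hungry pattern ssdfun() avoids.

```r
# ssdfun() as given above: sum of Euclidean distances from each selected
# row to the centroid (column means) of those rows
ssdfun <- function(d, i)
  sum(sqrt(colSums((t(d[i, , drop = FALSE]) -
                    colMeans(d[i, , drop = FALSE]))^2)))

# Hypothetical reference via dist(): the first nrow(x) entries of the
# dist vector are the distances from the centroid (row 1) to each row
ssd_dist <- function(d, i) {
  x <- as.matrix(d[i, , drop = FALSE])
  dd <- dist(rbind(colMeans(x), x))   # (n + 1) * n / 2 distances in memory
  sum(dd[seq_len(nrow(x))])
}

set.seed(1)
d <- data.frame(matrix(rnorm(400), ncol = 4))   # 100 areas, 4 variables
i <- sample(nrow(d), 60)                        # one candidate group
print(all.equal(ssdfun(d, i), ssd_dist(d, i)))

# With spdep and the ?skater example objects loaded, the call would be:
# skater(pmst.bh[,1:2], dpad, 2, method = ssdfun)
```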

Best regards,

Elias
