Re: gstat now uses Lapack / failing cokriging

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: gstat now uses Lapack / failing cokriging

sytbru
On 10/20/2015 05:58 AM, Edzer Pebesma wrote:

> gstat 1.1-0, now on CRAN, no longer comes with its own functions for
> matrix factorization and solving systems of equations [1], but now
> directly uses Lapack (dpotrf and dtrsm) through R's own lapack interface
> and R_ext/Lapack.h header files.

> For global kriging at one location from 10,000 observations, as in

> library(sp)
> library(gstat)
> set.seed(1331)
> n = 10000
> pts = SpatialPoints(cbind(x = runif(n), y = runif(n)))
> pts$z = runif(n)
> k <- krige(z~1, pts, pts[1,], vgm(1, "Exp", 1))

> I see a speed increase from 120 (gstat 1.0-26) to 46 seconds; using
> openblas on a 4 core laptop brings this down to 15 seconds - I expect
> sth similar with MKL/RevoR.

> For local kriging on large data sets with smaller neighborhoods and many
> locations, I wouldn't expect large improvements; for global kriging of
> large data sets to many prediction locations, krige0 may be faster when
> you use openblas or MKL - as long as things fit in memory.

>  I'd be happy to hear experiences (positive and negative), or otherwise
> reactions or questions.

Hi Edzer,

With the previous version of gstat the following code for doing cokriging using a linear model of coregionalization worked fine.

library(sp)
library(gstat)

# some data
x <- c(215, 330, 410, 470, 545)
y <- c(230, 310, 330, 340, 365)
fc <- c(0.211, 0.251, 0.281, 0.262, 0.242)
por <- c(0.438, 0.457, 0.419, 0.430, 0.468)
Allier <- data.frame(x, y, fc, por)
coordinates(Allier) = ~x+y

# gstat object for co-kriging using linear model of co-regionalization
g <- gstat(id=c("fc"), formula=fc~1, data=Allier,
           model=vgm(0.00247, "Sph", 480, 0.00166))
g <- gstat(g, id="por", formula=por~1, data=Allier,
           model=vgm(0.00239, "Sph", 480, 0.00118))
g <- gstat(g, id=c("fc", "por"),
           model=vgm(0.00151, "Sph", 480, -0.00124))

# predict at single point
point.krig <- predict(g, SpatialPoints(data.frame(x=450, y=350)))

However, using the recently downloaded version of gstat, I get:

Linear Model of Coregionalization found. Good.
[using ordinary cokriging]
Warning message:
In predict.gstat(g, SpatialPoints(data.frame(x = 450, y = 350))) :
  Warning: Covariance matrix (nearly) singular at location [450,350,0]: skipping...

Yet, the covariance matrix is not nearly singular. This appears to be caused by recent changes to the library.

Regards,

Sytze de Bruin
Wageningen University
Laboratory of Geo-Information Science and Remote Sensing
Telephone: +31 317 481830
Mobile: +31 6 13898626

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: gstat now uses Lapack / failing cokriging

edzer


On 16/11/15 12:54, Bruin, Sytze de wrote:

> On 10/20/2015 05:58 AM, Edzer Pebesma wrote:
>
>> gstat 1.1-0, now on CRAN, no longer comes with its own functions for
>> matrix factorization and solving systems of equations [1], but now
>> directly uses Lapack (dpotrf and dtrsm) through R's own lapack interface
>> and R_ext/Lapack.h header files.
>
>> For global kriging at one location from 10,000 observations, as in
>
>> library(sp)
>> library(gstat)
>> set.seed(1331)
>> n = 10000
>> pts = SpatialPoints(cbind(x = runif(n), y = runif(n)))
>> pts$z = runif(n)
>> k <- krige(z~1, pts, pts[1,], vgm(1, "Exp", 1))
>
>> I see a speed increase from 120 (gstat 1.0-26) to 46 seconds; using
>> openblas on a 4 core laptop brings this down to 15 seconds - I expect
>> sth similar with MKL/RevoR.
>
>> For local kriging on large data sets with smaller neighborhoods and many
>> locations, I wouldn't expect large improvements; for global kriging of
>> large data sets to many prediction locations, krige0 may be faster when
>> you use openblas or MKL - as long as things fit in memory.
>
>>  I'd be happy to hear experiences (positive and negative), or otherwise
>> reactions or questions.
>
> Hi Edzer,
>
> With the previous version of gstat the following code for doing cokriging using a linear model of coregionalization worked fine.
>
> library(sp)
> library(gstat)
>
> # some data
> x <- c(215, 330, 410, 470, 545)
> y <- c(230, 310, 330, 340, 365)
> fc <- c(0.211, 0.251, 0.281, 0.262, 0.242)
> por <- c(0.438, 0.457, 0.419, 0.430, 0.468)
> Allier <- data.frame(x, y, fc, por)
> coordinates(Allier) = ~x+y
>
> # gstat object for co-kriging using linear model of co-regionalization
> g <- gstat(id=c("fc"), formula=fc~1, data=Allier,
>            model=vgm(0.00247, "Sph", 480, 0.00166))
> g <- gstat(g, id="por", formula=por~1, data=Allier,
>            model=vgm(0.00239, "Sph", 480, 0.00118))
> g <- gstat(g, id=c("fc", "por"),
>            model=vgm(0.00151, "Sph", 480, -0.00124))
>
> # predict at single point
> point.krig <- predict(g, SpatialPoints(data.frame(x=450, y=350)))
>
> However, using the recently downloaded version of gstat, I get:
>
> Linear Model of Coregionalization found. Good.
> [using ordinary cokriging]
> Warning message:
> In predict.gstat(g, SpatialPoints(data.frame(x = 450, y = 350))) :
>   Warning: Covariance matrix (nearly) singular at location [450,350,0]: skipping...
>
> Yet, the covariance matrix is not nearly singular. This appears to be caused by recent changes to the library.
Hi Sytze, thanks for the clear test case. The warning message is wrong
but the problem remains: the covariance matrix is not singular, but
non-positive definite:

> r
             [,1]        [,2]        [,3]        [,4]        [,5]
 [1,] 0.004130000 0.001419390 0.000895995 0.000565582 0.000224073
 [2,] 0.001419390 0.004130000 0.001839760 0.001397620 0.000879083
 [3,] 0.000895995 0.001839760 0.004130000 0.002003000 0.001423810
 [4,] 0.000565582 0.001397620 0.002003000 0.004130000 0.001865300
 [5,] 0.000224073 0.000879083 0.001423810 0.001865300 0.004130000
 [6,] 0.001510000 0.002107720 0.001787750 0.001585760 0.001376980
 [7,] 0.002107720 0.001510000 0.002364710 0.002094420 0.001777420
 [8,] 0.001787750 0.002364710 0.001510000 0.002464510 0.002110430
 [9,] 0.001585760 0.002094420 0.002464510 0.001510000 0.002380320
[10,] 0.001376980 0.001777420 0.002110430 0.002380320 0.001510000
             [,6]        [,7]        [,8]        [,9]       [,10]
 [1,] 0.001510000 0.002107720 0.001787750 0.001585760 0.001376980
 [2,] 0.002107720 0.001510000 0.002364710 0.002094420 0.001777420
 [3,] 0.001787750 0.002364710 0.001510000 0.002464510 0.002110430
 [4,] 0.001585760 0.002094420 0.002464510 0.001510000 0.002380320
 [5,] 0.001376980 0.001777420 0.002110430 0.002380320 0.001510000
 [6,] 0.003570000 0.001373420 0.000866975 0.000547264 0.000216816
 [7,] 0.001373420 0.003570000 0.001780170 0.001352350 0.000850611
 [8,] 0.000866975 0.001780170 0.003570000 0.001938130 0.001377690
 [9,] 0.000547264 0.001352350 0.001938130 0.003570000 0.001804880
[10,] 0.000216816 0.000850611 0.001377690 0.001804880 0.003570000
> chol(r)
Error in chol.default(r) :
  the leading minor of order 9 is not positive definite

As explained in https://en.wikipedia.org/wiki/Cholesky_decomposition ,
Choleski differs from LDL', which gstat used to do before 1.1-0. As Tim
Peterson reminded me off-list, lapack also has LDL'; will look into it.

Changing the nugget of the cross variogram to -0.001 makes the matrix
positive definite, for your toy case.

>
> Regards,
>
> Sytze de Bruin
> Wageningen University
> Laboratory of Geo-Information Science and Remote Sensing
> Telephone: +31 317 481830
> Mobile: +31 6 13898626
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
--
Edzer Pebesma
Institute for Geoinformatics (ifgi),  University of Münster,
Heisenbergstraße 2, 48149 Münster, Germany; +49 251 83 33081
Journal of Statistical Software:   http://www.jstatsoft.org/
Computers & Geosciences:   http://elsevier.com/locate/cageo/
Spatial Statistics Society http://www.spatialstatistics.info


_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

signature.asc (501 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: gstat now uses Lapack / failing cokriging

sytbru
In reply to this post by sytbru
Hi Edzer, thanks for your prompt reply. I tried reproducing your matrix r but got a different result, i.e. a valid covariance matrix. The cross-covariances are different. Using the same example as above, my code is as follows:

library(sp)
library(gstat)

# some data
x <- c(215, 330, 410, 470, 545)
y <- c(230, 310, 330, 340, 365)
fc <- c(0.211, 0.251, 0.281, 0.262, 0.242)
por <- c(0.438, 0.457, 0.419, 0.430, 0.468)
Allier <- data.frame(x, y, fc, por)
coordinates(Allier) = ~x+y

# gstat object for co-kriging using linear model of co-regionalization
g <- gstat(id=c("fc"), formula=fc~1, data=Allier,
            model=vgm(0.00247, "Sph", 480, 0.00166))
g <- gstat(g, id="por", formula=por~1, data=Allier,
            model=vgm(0.00239, "Sph", 480, 0.00118))
g <- gstat(g, id=c("fc", "por"),
            model=vgm(0.00151, "Sph", 480, -0.00124))


dists <- spDists(Allier)
r11 <- variogramLine(g$model$fc, dist_vector = dists, covariance = T)
r12 <- variogramLine(g$model$fc.por, dist_vector = dists, covariance = T)
r22 <- variogramLine(g$model$por, dist_vector = dists, covariance = T)
r <- cbind(r11, r12)
r <- rbind(r, cbind(r12, r22))

> r
              [,1]         [,2]         [,3]         [,4]         [,5]         [,6]         [,7]
 [1,] 0.0041300000 0.0014193874 0.0008959951 0.0005655821 0.0002240733 0.0002700000 0.0008677227
 [2,] 0.0014193874 0.0041300000 0.0018397575 0.0013976206 0.0008790828 0.0008677227 0.0002700000
 [3,] 0.0008959951 0.0018397575 0.0041300000 0.0020030001 0.0014238096 0.0005477541 0.0011247100
 [4,] 0.0005655821 0.0013976206 0.0020030001 0.0041300000 0.0018652970 0.0003457607 0.0008544158
 [5,] 0.0002240733 0.0008790828 0.0014238096 0.0018652970 0.0041300000 0.0001369841 0.0005374150
 [6,] 0.0002700000 0.0008677227 0.0005477541 0.0003457607 0.0001369841 0.0035700000 0.0013734154
 [7,] 0.0008677227 0.0002700000 0.0011247100 0.0008544158 0.0005374150 0.0013734154 0.0035700000
 [8,] 0.0005477541 0.0011247100 0.0002700000 0.0012245061 0.0008704261 0.0008669750 0.0017801702
 [9,] 0.0003457607 0.0008544158 0.0012245061 0.0002700000 0.0011403233 0.0005472637 0.0013523535
[10,] 0.0001369841 0.0005374150 0.0008704261 0.0011403233 0.0002700000 0.0002168159 0.0008506105
              [,8]         [,9]        [,10]
 [1,] 0.0005477541 0.0003457607 0.0001369841
 [2,] 0.0011247100 0.0008544158 0.0005374150
 [3,] 0.0002700000 0.0012245061 0.0008704261
 [4,] 0.0012245061 0.0002700000 0.0011403233
 [5,] 0.0008704261 0.0011403233 0.0002700000
 [6,] 0.0008669750 0.0005472637 0.0002168159
 [7,] 0.0017801702 0.0013523535 0.0008506105
 [8,] 0.0035700000 0.0019381256 0.0013776943
 [9,] 0.0019381256 0.0035700000 0.0018048825
[10,] 0.0013776943 0.0018048825 0.0035700000

> eigen(r)$values
 [1] 0.0124609506 0.0055040132 0.0046425761 0.0035980843 0.0031064016 0.0028989486 0.0028042439
 [8] 0.0018107335 0.0010254131 0.0006486352

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: gstat now uses Lapack / failing cokriging

edzer

On 17/11/15 18:14, Bruin, Sytze de wrote:

> Hi Edzer, thanks for your prompt reply. I tried reproducing your matrix r but got a different result, i.e. a valid covariance matrix. The cross-covariances are different. Using the same example as above, my code is as follows:
>
> library(sp)
> library(gstat)
>
> # some data
> x <- c(215, 330, 410, 470, 545)
> y <- c(230, 310, 330, 340, 365)
> fc <- c(0.211, 0.251, 0.281, 0.262, 0.242)
> por <- c(0.438, 0.457, 0.419, 0.430, 0.468)
> Allier <- data.frame(x, y, fc, por)
> coordinates(Allier) = ~x+y
>
> # gstat object for co-kriging using linear model of co-regionalization
> g <- gstat(id=c("fc"), formula=fc~1, data=Allier,
>             model=vgm(0.00247, "Sph", 480, 0.00166))
> g <- gstat(g, id="por", formula=por~1, data=Allier,
>             model=vgm(0.00239, "Sph", 480, 0.00118))
> g <- gstat(g, id=c("fc", "por"),
>             model=vgm(0.00151, "Sph", 480, -0.00124))
>
>
> dists <- spDists(Allier)
> r11 <- variogramLine(g$model$fc, dist_vector = dists, covariance = T)
> r12 <- variogramLine(g$model$fc.por, dist_vector = dists, covariance = T)
> r22 <- variogramLine(g$model$por, dist_vector = dists, covariance = T)
> r <- cbind(r11, r12)
> r <- rbind(r, cbind(r12, r22))
>
>> r
>               [,1]         [,2]         [,3]         [,4]         [,5]         [,6]         [,7]
>  [1,] 0.0041300000 0.0014193874 0.0008959951 0.0005655821 0.0002240733 0.0002700000 0.0008677227
>  [2,] 0.0014193874 0.0041300000 0.0018397575 0.0013976206 0.0008790828 0.0008677227 0.0002700000
>  [3,] 0.0008959951 0.0018397575 0.0041300000 0.0020030001 0.0014238096 0.0005477541 0.0011247100
>  [4,] 0.0005655821 0.0013976206 0.0020030001 0.0041300000 0.0018652970 0.0003457607 0.0008544158
>  [5,] 0.0002240733 0.0008790828 0.0014238096 0.0018652970 0.0041300000 0.0001369841 0.0005374150
>  [6,] 0.0002700000 0.0008677227 0.0005477541 0.0003457607 0.0001369841 0.0035700000 0.0013734154
>  [7,] 0.0008677227 0.0002700000 0.0011247100 0.0008544158 0.0005374150 0.0013734154 0.0035700000
>  [8,] 0.0005477541 0.0011247100 0.0002700000 0.0012245061 0.0008704261 0.0008669750 0.0017801702
>  [9,] 0.0003457607 0.0008544158 0.0012245061 0.0002700000 0.0011403233 0.0005472637 0.0013523535
> [10,] 0.0001369841 0.0005374150 0.0008704261 0.0011403233 0.0002700000 0.0002168159 0.0008506105
>               [,8]         [,9]        [,10]
>  [1,] 0.0005477541 0.0003457607 0.0001369841
>  [2,] 0.0011247100 0.0008544158 0.0005374150
>  [3,] 0.0002700000 0.0012245061 0.0008704261
>  [4,] 0.0012245061 0.0002700000 0.0011403233
>  [5,] 0.0008704261 0.0011403233 0.0002700000
>  [6,] 0.0008669750 0.0005472637 0.0002168159
>  [7,] 0.0017801702 0.0013523535 0.0008506105
>  [8,] 0.0035700000 0.0019381256 0.0013776943
>  [9,] 0.0019381256 0.0035700000 0.0018048825
> [10,] 0.0013776943 0.0018048825 0.0035700000
>
>> eigen(r)$values
>  [1] 0.0124609506 0.0055040132 0.0046425761 0.0035980843 0.0031064016 0.0028989486 0.0028042439
>  [8] 0.0018107335 0.0010254131 0.0006486352
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
Building upon your earlier example,

predict(g, SpatialPoints(data.frame(x=450, y=350)), debug = 32)

gives you the generalized covariance matrix that is used for the
cokriging, which I looked at. gstat computes generalized covariances as
C(0)-gamma(h) with C(0) = max(0, sum of the positive sill values),
instead of the sill of all sill values. If one of the sill components is
negative, this matters.

I looked in the Ver Hoef & Cressie 1993 paper, but couldn't find out
which one is right. Maybe Gerard can also take a look at it; the fix
would be trivial.

--
Edzer Pebesma
Institute for Geoinformatics (ifgi),  University of Münster,
Heisenbergstraße 2, 48149 Münster, Germany; +49 251 83 33081
Journal of Statistical Software:   http://www.jstatsoft.org/
Computers & Geosciences:   http://elsevier.com/locate/cageo/
Spatial Statistics Society http://www.spatialstatistics.info


_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

signature.asc (501 bytes) Download Attachment