inference of local Gi* using permutations

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

inference of local Gi* using permutations

Jay Wang
Hello,

I am currently using the localG () in spdep package, I was wondering if we
can have a conditional permutation-based inference to get the P value for
every Gi*. I saw that a Mote Carlo simulation is used in Moran.MC(), and I
borrowed the following codes from this function and tried to see if I can
do a permutation for localG():

pvals<-matrix(0, nrow = V, ncol = 1)
for (i in 1:V){
  rankresi<-rank(res[i, ])
  xranki <- rankresi[length(res[i, ])]
  diffi <- nsim - xranki
  diffi <- ifelse(diffi > 0, diffi, 0)
  pvali <- punif((diffi + 1)/(nsim + 1))
  pvals[i,]<-pvali
}

After running these codes with several different datasets, I found that all
the negative Gi*s have very high P values say 0.999 with 999 permutations,
meaning that there are no significant cold spots. Where is the problem? How
can we do conditional permutation-based inference for localG() with R
spdep? I understand the critics of permutation-based inference for local
indicators, but I just want to explore this. Thank you!

Best

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: inference of local Gi* using permutations

Roger Bivand
Administrator
On Wed, 6 Mar 2019, Jay Wang wrote:

> Hello,
>
> I am currently using the localG () in spdep package, I was wondering if we
> can have a conditional permutation-based inference to get the P value for
> every Gi*. I saw that a Mote Carlo simulation is used in Moran.MC(), and I
> borrowed the following codes from this function and tried to see if I can
> do a permutation for localG():
>
> pvals<-matrix(0, nrow = V, ncol = 1)
> for (i in 1:V){
>  rankresi<-rank(res[i, ])
>  xranki <- rankresi[length(res[i, ])]
>  diffi <- nsim - xranki
>  diffi <- ifelse(diffi > 0, diffi, 0)
>  pvali <- punif((diffi + 1)/(nsim + 1))
>  pvals[i,]<-pvali
> }

Monte Carlo or Hope type tests or permutation bootstraps work like
analytical permutation in the global case, and are redundant for that
reason. Monte Carlo (conditional permutation) in the local case requires
that there is no global autocorrelation or other mis-specifications.

Your code snippet is not an example, and I think is not how permutation is
actually done. In the local case in ArcGIS, GeoDa, etc., you fix the value
of interest at i, randomly re-assign all the values to the other
observations, and recompute the local indicator at i, doing this say 499
times. Then you move on to the next i, and repeat. In your snippet, we
cannot see where res is coming from. Is this an n by nsim matrix from nsim
runs? Might you be needing to compare the sample values with the observed
value? If you are outputting Z-values of G_i anyway, you may need to step
back to the actual G_i (here from spdep):

attr(localG(..., return_internals=TRUE), "internals")[,1]

Then

mns <- apply(res, 1, mean)
sds <- apply(res, 1, sd)
permZ <- (obs_localG - mns)/sds

are z values that can be considered as drawn from the normal. However, I
advise against inference unless the assumptions are met, and p-values must
be adjusted anyway.

Roger

>
> After running these codes with several different datasets, I found that all
> the negative Gi*s have very high P values say 0.999 with 999 permutations,
> meaning that there are no significant cold spots. Where is the problem? How
> can we do conditional permutation-based inference for localG() with R
> spdep? I understand the critics of permutation-based inference for local
> indicators, but I just want to explore this. Thank you!
>
> Best
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway
Reply | Threaded
Open this post in threaded view
|

Re: inference of local Gi* using permutations

Jay Wang
Thank you for the detailed explanation. Sorry for missing the explanation
of res. It is an n by nsim+1 matrix where the n columns are the permutation
results using sample() and the last column is the actual z values of G_i*s.
The following line of code was modified to do the permutations: (i in
1:nsim) res[i] <- moran(sample(x), listw, n, S0, zero.policy)$I. This could
be wrong since the sample() does not fix the value of interest at i. Are
there any functions in R can do this in a correct way? How do we calculate
the p-values after we get the n by nsim matrix? Thank you!


On Thu, Mar 7, 2019 at 8:05 AM Roger Bivand <[hidden email]> wrote:

> On Wed, 6 Mar 2019, Jay Wang wrote:
>
> > Hello,
> >
> > I am currently using the localG () in spdep package, I was wondering if
> we
> > can have a conditional permutation-based inference to get the P value for
> > every Gi*. I saw that a Mote Carlo simulation is used in Moran.MC(), and
> I
> > borrowed the following codes from this function and tried to see if I can
> > do a permutation for localG():
> >
> > pvals<-matrix(0, nrow = V, ncol = 1)
> > for (i in 1:V){
> >  rankresi<-rank(res[i, ])
> >  xranki <- rankresi[length(res[i, ])]
> >  diffi <- nsim - xranki
> >  diffi <- ifelse(diffi > 0, diffi, 0)
> >  pvali <- punif((diffi + 1)/(nsim + 1))
> >  pvals[i,]<-pvali
> > }
>
> Monte Carlo or Hope type tests or permutation bootstraps work like
> analytical permutation in the global case, and are redundant for that
> reason. Monte Carlo (conditional permutation) in the local case requires
> that there is no global autocorrelation or other mis-specifications.
>
> Your code snippet is not an example, and I think is not how permutation is
> actually done. In the local case in ArcGIS, GeoDa, etc., you fix the value
> of interest at i, randomly re-assign all the values to the other
> observations, and recompute the local indicator at i, doing this say 499
> times. Then you move on to the next i, and repeat. In your snippet, we
> cannot see where res is coming from. Is this an n by nsim matrix from nsim
> runs? Might you be needing to compare the sample values with the observed
> value? If you are outputting Z-values of G_i anyway, you may need to step
> back to the actual G_i (here from spdep):
>
> attr(localG(..., return_internals=TRUE), "internals")[,1]
>
> Then
>
> mns <- apply(res, 1, mean)
> sds <- apply(res, 1, sd)
> permZ <- (obs_localG - mns)/sds
>
> are z values that can be considered as drawn from the normal. However, I
> advise against inference unless the assumptions are met, and p-values must
> be adjusted anyway.
>
> Roger
>
> >
> > After running these codes with several different datasets, I found that
> all
> > the negative Gi*s have very high P values say 0.999 with 999
> permutations,
> > meaning that there are no significant cold spots. Where is the problem?
> How
> > can we do conditional permutation-based inference for localG() with R
> > spdep? I understand the critics of permutation-based inference for local
> > indicators, but I just want to explore this. Thank you!
> >
> > Best
> >
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-Geo mailing list
> > [hidden email]
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: [hidden email]
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: inference of local Gi* using permutations

Roger Bivand
Administrator
On Thu, 7 Mar 2019, Jay Wang wrote:

> Thank you for the detailed explanation. Sorry for missing the explanation
> of res. It is an n by nsim+1 matrix where the n columns are the permutation
> results using sample() and the last column is the actual z values of G_i*s.
> The following line of code was modified to do the permutations: (i in
> 1:nsim) res[i] <- moran(sample(x), listw, n, S0, zero.policy)$I. This could
> be wrong since the sample() does not fix the value of interest at i. Are
> there any functions in R can do this in a correct way? How do we calculate
> the p-values after we get the n by nsim matrix? Thank you!

This is an exercise based on the built-in Getis-Ord dataset:

library(spdep)
data(getisord, package="spData")
xycoords <- cbind(xyz$x, xyz$y)
nb30 <- dnearneigh(xycoords, 0, 30)
lw <- nb2listw(nb30, style="B")
G30 <- localG(xyz$val, lw, return_internals=TRUE)
G_obs <- attr(G30, "internals")[,1]
val <- xyz$val
loc_G <- function(x, lw) {
   x_star <- sum(x)
   lx <- lag.listw(lw, x)
   G <- lx/(x_star-c(x))
   G
}
nsim <- 499L
set.seed(1)
res <- matrix(0, nrow=length(val), ncol=nsim)
thisx <- numeric(length(val))
idx <- 1:length(val)
for (j in 1:nsim) {
   for (i in seq(along=val)) {
     thisx[i] <- val[i]
     thisx[!idx %in% i] <- sample(val[-i])
     res[i, j] <- loc_G(thisx, lw)[i]
   }
}
mns <- apply(res, 1, mean)
sds <- apply(res, 1, sd)
G_Zi <- (G_obs - mns)/sds
plot(density(G30), ylim=c(0, 0.25))
lines(density(G_Zi), lty=2, col=2)

As you can see, the analytical z-values of the match the permutations of G
values closely. We're doing a permutation bootstrap on the G values, not
the z-values of G. Next, try pysal (here old pysal < 2):

library(reticulate)
pysal <- import("pysal")
np <- import("numpy")
write.nb.gal(nb30, "nb30.gal")
w <- pysal$open("nb30.gal")$read()
y <- np$array(val)
w$transformation = "B"
lg_30_499 <- pysal$esda$getisord$G_Local(y, w,
   transform="B", permutations=nsim)
lines(density(lg_30_499$z_sim), lty=3, col=3)
all.equal(G_obs, c(lg_30_499$Gs))
all.equal(c(G30), c(lg_30_499$Zs))

The G_Local() function provides both analytical and permutation z-values;
the G values and the analytical z-values agree exactly with spdep, the
permutations are not using the same random number generator or seed, but
are broadly similar. If you read pysal/esda/getisord.py, you'll see that
some indexing is being done, avoiding the double loop. The code is fairly
hard to read, but in optimising the permutation may cut corners; the
self.z_sim is the same code as the simpler R code above.

So while for the local Moran, permutation is demonstrably misleading in
the presence of mis-specification, for local G permutation simply
reproduces the analytical z-values. If you use these to look up normal
p-values, then adjust them for the false discovery rate, usually nothing
is left significant anyway.

Roger


>
>
> On Thu, Mar 7, 2019 at 8:05 AM Roger Bivand <[hidden email]> wrote:
>
>> On Wed, 6 Mar 2019, Jay Wang wrote:
>>
>>> Hello,
>>>
>>> I am currently using the localG () in spdep package, I was wondering if
>> we
>>> can have a conditional permutation-based inference to get the P value for
>>> every Gi*. I saw that a Mote Carlo simulation is used in Moran.MC(), and
>> I
>>> borrowed the following codes from this function and tried to see if I can
>>> do a permutation for localG():
>>>
>>> pvals<-matrix(0, nrow = V, ncol = 1)
>>> for (i in 1:V){
>>>  rankresi<-rank(res[i, ])
>>>  xranki <- rankresi[length(res[i, ])]
>>>  diffi <- nsim - xranki
>>>  diffi <- ifelse(diffi > 0, diffi, 0)
>>>  pvali <- punif((diffi + 1)/(nsim + 1))
>>>  pvals[i,]<-pvali
>>> }
>>
>> Monte Carlo or Hope type tests or permutation bootstraps work like
>> analytical permutation in the global case, and are redundant for that
>> reason. Monte Carlo (conditional permutation) in the local case requires
>> that there is no global autocorrelation or other mis-specifications.
>>
>> Your code snippet is not an example, and I think is not how permutation is
>> actually done. In the local case in ArcGIS, GeoDa, etc., you fix the value
>> of interest at i, randomly re-assign all the values to the other
>> observations, and recompute the local indicator at i, doing this say 499
>> times. Then you move on to the next i, and repeat. In your snippet, we
>> cannot see where res is coming from. Is this an n by nsim matrix from nsim
>> runs? Might you be needing to compare the sample values with the observed
>> value? If you are outputting Z-values of G_i anyway, you may need to step
>> back to the actual G_i (here from spdep):
>>
>> attr(localG(..., return_internals=TRUE), "internals")[,1]
>>
>> Then
>>
>> mns <- apply(res, 1, mean)
>> sds <- apply(res, 1, sd)
>> permZ <- (obs_localG - mns)/sds
>>
>> are z values that can be considered as drawn from the normal. However, I
>> advise against inference unless the assumptions are met, and p-values must
>> be adjusted anyway.
>>
>> Roger
>>
>>>
>>> After running these codes with several different datasets, I found that
>> all
>>> the negative Gi*s have very high P values say 0.999 with 999
>> permutations,
>>> meaning that there are no significant cold spots. Where is the problem?
>> How
>>> can we do conditional permutation-based inference for localG() with R
>>> spdep? I understand the critics of permutation-based inference for local
>>> indicators, but I just want to explore this. Thank you!
>>>
>>> Best
>>>
>>>       [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> [hidden email]
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; e-mail: [hidden email]
>> https://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>
>

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway