

Dear Roger and list members,
I have a ols regression and want to remove spatial
autocorrelation (SAC) from the residuals, in order to avoid its
potential effects of SAC on the hypothesis tests (and the
reviewers/editor). I have generated spatial eigenvectors with
SpatialFiltering(), and added the generated vectors to the
regression. Surprisingly, SAC appears to become more
pronounced. I also tried ME(), but many more vectors are
produced and SAC is also not removed. Isn't including the
vectors from SpatialFiltering() supposed to reduce SAC?
Can you please enlighten me as to what's going on, what I am
doing wrong, or what I should try?
Thanks in advance for you time.
Peter
The data are here:
https://drive.google.com/file/d/1Z3FIGIAbYvXqGETn0EWLjTOY8MYpZi_S/view?usp=sharing
The analysis goes like this:
library(spatialreg)
library(spdep)
library(tidyverse)
library(car)
library(ncf)
data < read_csv("for_RAR.csv")
set.seed(12345)
x < data$ses.mntd
y < log(data$RAR)
ols_for_RAR < lm(y ~ x)
## qqplot() and shapiro.test() show residuals are nicely
distributed
# x is significant and Rsquare about 0.2, demonstrated here
car::Anova(ols_for_RAR,type="III")
summary(ols_for_RAR)
# the following appears to make an acceptable neighbor network
c1<c(data$LONG)
c2<c(data$LAT)
cbindForests<cbind(c1,c2)
# a value of 0.7, below, is sufficient to join all the points.
# qualitatively the results aren't affected, as far as I see by
setting this higher
# However, the number of eigenvectors generated by
SpatialFiltering() varies a lot
nbnear4 < dnearneigh(cbindForests, 0, 0.7)
plot(nbnear4, cbindForests, col = "red", pch = 20)
# SAC appears significant at short distances (<10km), which is
what I want to remove
cor_for < correlog(c1, c2, residuals(ols_for_RAR), increment =
1, resamp = 1000, latlon=TRUE, na.rm = TRUE)
plot(cor_for$correlation[1:20],type="s")
# pvalues
print(cor_for$p[which(cor_for$p < 0.05)])
# Moran's I values
cor_for$correlation[which(cor_for$p < 0.05)]
# Generate optimized spatial eigenvectors using SpatialFiltering()
and use them
# Several vectors are generated depending on values in
dnearneigh()
spfilt_mntd_RAR< spatialreg::SpatialFiltering(y ~ x,
nb=nbnear4,style = "W", tol=0.0001, ExactEV = TRUE)
new_mod < lm(y ~ x + fitted(spfilt_mntd_RAR))
car::Anova(new_mod, type="III")
summary(new_mod)
# Plot Moran's I at distances under 20km
cor_for_1c < correlog(c1, c2, residuals(new_mod), increment =
1, resamp = 1000, latlon=TRUE, na.rm = TRUE)
plot(cor_for_1c$correlation[1:20],type="s")
# Extract significant values of Moran's I
cor_for_1c$p[which(cor_for_1c$p < 0.05)]
cor_for_1c$correlation[which(cor_for_1c$p < 0.05)]
# The result is that Moran's I is significant at additional short
distances

_+_+_+_+_+_+_+_+_
Peter
B. Pearman
Ikerbasque Research Professor
Laboratory
for Computational Ecology and Evolution
Departamento
de Biología
Vegetal y Ecología
Facultad de Ciencias y Tecnología
Ap.
644
Universidad del País Vasco/ Euskal
Herriko Unibertsitatea
Barrio
Sarriena s/n
48940 Leioa, Bizkaia
SPAIN
Tel.
+34
94 601 8030
Fax
+34 94 601 3500
www.ehu.eus/es/web/bgppermp
When
you
believe in things that you don’t understand
Then
you
suffer

Stevie Wonder
Download
my public encryption key here: https://pgp.mit.edu/
and
please
use it when you send me email.
_______________________________________________
RsigGeo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/rsiggeo


Have you considered the "corSpher" and "corSpatial" functions in
the nlme package?
Spencer Graves
On 20200722 05:09, Peter B. Pearman wrote:
>
> Dear Roger and list members,
>
> I have a ols regression and want to remove spatial autocorrelation
> (SAC) from the residuals, in order to avoid its potential effects of
> SAC on the hypothesis tests (and the reviewers/editor). I have
> generated spatial eigenvectors with SpatialFiltering(), and added the
> generated vectors to the regression. Surprisingly, SAC appears to
> become more pronounced. I also tried ME(), but many more vectors are
> produced and SAC is also not removed. Isn't including the vectors
> from SpatialFiltering() supposed to reduce SAC?
>
> Can you please enlighten me as to what's going on, what I am doing
> wrong, or what I should try?
>
> Thanks in advance for you time.
>
> Peter
>
> The data are here:
>
> https://drive.google.com/file/d/1Z3FIGIAbYvXqGETn0EWLjTOY8MYpZi_S/view?usp=sharing>
> The analysis goes like this:
>
> library(spatialreg)
> library(spdep)
> library(tidyverse)
> library(car)
> library(ncf)
>
>
> data < read_csv("for_RAR.csv")
> set.seed(12345)
> x < data$ses.mntd
> y < log(data$RAR)
> ols_for_RAR < lm(y ~ x)
> ## qqplot() and shapiro.test() show residuals are nicely distributed
> # x is significant and Rsquare about 0.2, demonstrated here
> car::Anova(ols_for_RAR,type="III")
> summary(ols_for_RAR)
>
> # the following appears to make an acceptable neighbor network
> c1<c(data$LONG)
> c2<c(data$LAT)
> cbindForests<cbind(c1,c2)
> # a value of 0.7, below, is sufficient to join all the points.
> # qualitatively the results aren't affected, as far as I see by
> setting this higher
> # However, the number of eigenvectors generated by SpatialFiltering()
> varies a lot
> nbnear4 < dnearneigh(cbindForests, 0, 0.7)
> plot(nbnear4, cbindForests, col = "red", pch = 20)
>
> # SAC appears significant at short distances (<10km), which is what I
> want to remove
> cor_for < correlog(c1, c2, residuals(ols_for_RAR), increment = 1,
> resamp = 1000, latlon=TRUE, na.rm = TRUE)
> plot(cor_for$correlation[1:20],type="s")
> # pvalues
> print(cor_for$p[which(cor_for$p < 0.05)])
> # Moran's I values
> cor_for$correlation[which(cor_for$p < 0.05)]
>
> # Generate optimized spatial eigenvectors using SpatialFiltering() and
> use them
> # Several vectors are generated depending on values in dnearneigh()
> spfilt_mntd_RAR< spatialreg::SpatialFiltering(y ~ x, nb=nbnear4,style
> = "W", tol=0.0001, ExactEV = TRUE)
> new_mod < lm(y ~ x + fitted(spfilt_mntd_RAR))
> car::Anova(new_mod, type="III")
> summary(new_mod)
>
> # Plot Moran's I at distances under 20km
> cor_for_1c < correlog(c1, c2, residuals(new_mod), increment = 1,
> resamp = 1000, latlon=TRUE, na.rm = TRUE)
> plot(cor_for_1c$correlation[1:20],type="s")
>
> # Extract significant values of Moran's I
> cor_for_1c$p[which(cor_for_1c$p < 0.05)]
> cor_for_1c$correlation[which(cor_for_1c$p < 0.05)]
>
> # The result is that Moran's I is significant at additional short
> distances
> 
>
> _+_+_+_+_+_+_+_+_
>
> Peter B. Pearman
> Ikerbasque Research Professor
>
> Laboratory for Computational Ecology and Evolution
> Departamento de Biología Vegetal y Ecología
> Facultad de Ciencias y Tecnología
> Ap. 644
> Universidad del País Vasco/ Euskal Herriko Unibertsitatea
>
> Barrio Sarriena s/n
> 48940 Leioa, Bizkaia
>
> SPAIN
>
> Tel. +34 94 601 8030
>
> Fax +34 94 601 3500
>
> www.ehu.eus/es/web/bgppermp < http://www.ehu.es/peter.pearman>
>
> When you believe in things that you don’t understand
>
> Then you suffer
>
>  Stevie Wonder
>
>
> Download my public encryption key here: https://pgp.mit.edu/>
> and please use it when you send me email.
>
>
> _______________________________________________
> RsigGeo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/rsiggeo [[alternative HTML version deleted]]
_______________________________________________
RsigGeo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/rsiggeo


It is worth sending this question to [hidden email].
Em qua., 22 de jul. de 2020 às 11:45, Spencer Graves <
[hidden email]> escreveu:
> Have you considered the "corSpher" and "corSpatial" functions in
> the nlme package?
>
>
> Spencer Graves
>
>
> On 20200722 05:09, Peter B. Pearman wrote:
> >
> > Dear Roger and list members,
> >
> > I have a ols regression and want to remove spatial autocorrelation
> > (SAC) from the residuals, in order to avoid its potential effects of
> > SAC on the hypothesis tests (and the reviewers/editor). I have
> > generated spatial eigenvectors with SpatialFiltering(), and added the
> > generated vectors to the regression. Surprisingly, SAC appears to
> > become more pronounced. I also tried ME(), but many more vectors are
> > produced and SAC is also not removed. Isn't including the vectors
> > from SpatialFiltering() supposed to reduce SAC?
> >
> > Can you please enlighten me as to what's going on, what I am doing
> > wrong, or what I should try?
> >
> > Thanks in advance for you time.
> >
> > Peter
> >
> > The data are here:
> >
> >
> https://drive.google.com/file/d/1Z3FIGIAbYvXqGETn0EWLjTOY8MYpZi_S/view?usp=sharing> >
> > The analysis goes like this:
> >
> > library(spatialreg)
> > library(spdep)
> > library(tidyverse)
> > library(car)
> > library(ncf)
> >
> >
> > data < read_csv("for_RAR.csv")
> > set.seed(12345)
> > x < data$ses.mntd
> > y < log(data$RAR)
> > ols_for_RAR < lm(y ~ x)
> > ## qqplot() and shapiro.test() show residuals are nicely distributed
> > # x is significant and Rsquare about 0.2, demonstrated here
> > car::Anova(ols_for_RAR,type="III")
> > summary(ols_for_RAR)
> >
> > # the following appears to make an acceptable neighbor network
> > c1<c(data$LONG)
> > c2<c(data$LAT)
> > cbindForests<cbind(c1,c2)
> > # a value of 0.7, below, is sufficient to join all the points.
> > # qualitatively the results aren't affected, as far as I see by
> > setting this higher
> > # However, the number of eigenvectors generated by SpatialFiltering()
> > varies a lot
> > nbnear4 < dnearneigh(cbindForests, 0, 0.7)
> > plot(nbnear4, cbindForests, col = "red", pch = 20)
> >
> > # SAC appears significant at short distances (<10km), which is what I
> > want to remove
> > cor_for < correlog(c1, c2, residuals(ols_for_RAR), increment = 1,
> > resamp = 1000, latlon=TRUE, na.rm = TRUE)
> > plot(cor_for$correlation[1:20],type="s")
> > # pvalues
> > print(cor_for$p[which(cor_for$p < 0.05)])
> > # Moran's I values
> > cor_for$correlation[which(cor_for$p < 0.05)]
> >
> > # Generate optimized spatial eigenvectors using SpatialFiltering() and
> > use them
> > # Several vectors are generated depending on values in dnearneigh()
> > spfilt_mntd_RAR< spatialreg::SpatialFiltering(y ~ x, nb=nbnear4,style
> > = "W", tol=0.0001, ExactEV = TRUE)
> > new_mod < lm(y ~ x + fitted(spfilt_mntd_RAR))
> > car::Anova(new_mod, type="III")
> > summary(new_mod)
> >
> > # Plot Moran's I at distances under 20km
> > cor_for_1c < correlog(c1, c2, residuals(new_mod), increment = 1,
> > resamp = 1000, latlon=TRUE, na.rm = TRUE)
> > plot(cor_for_1c$correlation[1:20],type="s")
> >
> > # Extract significant values of Moran's I
> > cor_for_1c$p[which(cor_for_1c$p < 0.05)]
> > cor_for_1c$correlation[which(cor_for_1c$p < 0.05)]
> >
> > # The result is that Moran's I is significant at additional short
> > distances
> > 
> >
> > _+_+_+_+_+_+_+_+_
> >
> > Peter B. Pearman
> > Ikerbasque Research Professor
> >
> > Laboratory for Computational Ecology and Evolution
> > Departamento de Biología Vegetal y Ecología
> > Facultad de Ciencias y Tecnología
> > Ap. 644
> > Universidad del País Vasco/ Euskal Herriko Unibertsitatea
> >
> > Barrio Sarriena s/n
> > 48940 Leioa, Bizkaia
> >
> > SPAIN
> >
> > Tel. +34 94 601 8030
> >
> > Fax +34 94 601 3500
> >
> > www.ehu.eus/es/web/bgppermp < http://www.ehu.es/peter.pearman>
> >
> > When you believe in things that you don’t understand
> >
> > Then you suffer
> >
> >  Stevie Wonder
> >
> >
> > Download my public encryption key here: https://pgp.mit.edu/> >
> > and please use it when you send me email.
> >
> >
> > _______________________________________________
> > RsigGeo mailing list
> > [hidden email]
> > https://stat.ethz.ch/mailman/listinfo/rsiggeo>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> RsigGeo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/rsiggeo>
[[alternative HTML version deleted]]
_______________________________________________
RsigGeo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/rsiggeo

Administrator

On Wed, 22 Jul 2020, Peter B. Pearman wrote:
>
> Dear Roger and list members,
>
> I have a ols regression and want to remove spatial autocorrelation (SAC)
> from the residuals, in order to avoid its potential effects of SAC on
> the hypothesis tests (and the reviewers/editor). I have generated
> spatial eigenvectors with SpatialFiltering(), and added the generated
> vectors to the regression. Surprisingly, SAC appears to become more
> pronounced. I also tried ME(), but many more vectors are produced and
> SAC is also not removed. Isn't including the vectors from
> SpatialFiltering() supposed to reduce SAC?
>
> Can you please enlighten me as to what's going on, what I am doing
> wrong, or what I should try?
You are using point support (your forests are at points not areas, so that
the observations are not contiguous). The advice to consider this as a
mixed model problem may be relevant. It is certainly the case that
ncf::correlog() should not be used for regression residuals.
Further, your data are in a band 1.54.3E, 43.043.4N, so using 0.7
degrees in dnearneigh() was a risky choice, and squashes the data. If you
coerce to an sf object and apply an appropriate CRS, you get many fewer
neighbours on average.
library(sf)
data < st_as_sf(data, coords=c("LONG", "LAT"))
(nbnear4 < dnearneigh(data, 0, 0.7))
st_crs(data) < 4326
(nbnear4a < dnearneigh(data, 0, 27))
# or project to a relevant planar UTM 31N spec.
data_utm31 < st_transform(data, st_crs(32631))
(nbd < dnearneigh(data_utm31, 0, 27))
and so on. SpatialFiltering() and ME() test against global residual
autocorrelation in finding candidate eigenvectors, and running
lm.morantest() on the fitted model shows that there is no global
autocorrelation as intended (because positive and negative local
autocorrelation cancels out). I think that the underlying problems are
mixing approaches that are not asking the same questions, and in too
little care in choosing the weights.
Hope this clarifies,
Roger
>
> Thanks in advance for you time.
>
> Peter
>
> The data are here:
>
> https://drive.google.com/file/d/1Z3FIGIAbYvXqGETn0EWLjTOY8MYpZi_S/view?usp=> sharing
>
> The analysis goes like this:
>
> library(spatialreg)
> library(spdep)
> library(tidyverse)
> library(car)
> library(ncf)
>
>
> data < read_csv("for_RAR.csv")
> set.seed(12345)
> x < data$ses.mntd
> y < log(data$RAR)
> ols_for_RAR < lm(y ~ x)
> ## qqplot() and shapiro.test() show residuals are nicely distributed
> # x is significant and Rsquare about 0.2, demonstrated here
> car::Anova(ols_for_RAR,type="III")
> summary(ols_for_RAR)
>
> # the following appears to make an acceptable neighbor network
> c1<c(data$LONG)
> c2<c(data$LAT)
> cbindForests<cbind(c1,c2)
> # a value of 0.7, below, is sufficient to join all the points.
> # qualitatively the results aren't affected, as far as I see by setting this
> higher
> # However, the number of eigenvectors generated by SpatialFiltering() varies
> a lot
> nbnear4 < dnearneigh(cbindForests, 0, 0.7)
> plot(nbnear4, cbindForests, col = "red", pch = 20)
>
> # SAC appears significant at short distances (<10km), which is what I want
> to remove
> cor_for < correlog(c1, c2, residuals(ols_for_RAR), increment = 1, resamp =
> 1000, latlon=TRUE, na.rm = TRUE)
> plot(cor_for$correlation[1:20],type="s")
> # pvalues
> print(cor_for$p[which(cor_for$p < 0.05)])
> # Moran's I values
> cor_for$correlation[which(cor_for$p < 0.05)]
>
> # Generate optimized spatial eigenvectors using SpatialFiltering() and use
> them
> # Several vectors are generated depending on values in dnearneigh()
> spfilt_mntd_RAR< spatialreg::SpatialFiltering(y ~ x, nb=nbnear4,style =
> "W", tol=0.0001, ExactEV = TRUE)
> new_mod < lm(y ~ x + fitted(spfilt_mntd_RAR))
> car::Anova(new_mod, type="III")
> summary(new_mod)
>
> # Plot Moran's I at distances under 20km
> cor_for_1c < correlog(c1, c2, residuals(new_mod), increment = 1, resamp =
> 1000, latlon=TRUE, na.rm = TRUE)
> plot(cor_for_1c$correlation[1:20],type="s")
>
> # Extract significant values of Moran's I
> cor_for_1c$p[which(cor_for_1c$p < 0.05)]
> cor_for_1c$correlation[which(cor_for_1c$p < 0.05)]
>
> # The result is that Moran's I is significant at additional short distances
> 
>
> _+_+_+_+_+_+_+_+_
>
> Peter B. Pearman
> Ikerbasque Research Professor
>
> Laboratory for Computational Ecology and Evolution
> Departamento de Biología Vegetal y Ecología
> Facultad de Ciencias y Tecnología
> Ap. 644
> Universidad del País Vasco/ Euskal Herriko Unibertsitatea
>
> Barrio Sarriena s/n
> 48940 Leioa, Bizkaia
>
> SPAIN
>
> Tel. +34 94 601 8030
>
> Fax +34 94 601 3500
>
> www.ehu.eus/es/web/bgppermp
>
>
>
> When you believe in things that you don’t understand
>
> Then you suffer
>
>  Stevie Wonder
>
>
> Download my public encryption key here: https://pgp.mit.edu/>
> and please use it when you send me email.
>
>
>
>
>

Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N5045 Bergen, Norway.
voice: +47 55 95 93 55; email: [hidden email]
https://orcid.org/0000000323926140https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en_______________________________________________
RsigGeo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/rsiggeo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N5045 Bergen, Norway


Dear Roger,
Why ncf::correlog() should not be used for regression residuals in this
case?
Could you clarify this to me, please?
Thank you
Best,
Vinicius
Em qui., 23 de jul. de 2020 às 13:52, Roger Bivand < [hidden email]>
escreveu:
> On Wed, 22 Jul 2020, Peter B. Pearman wrote:
>
> >
> > Dear Roger and list members,
> >
> > I have a ols regression and want to remove spatial autocorrelation (SAC)
> > from the residuals, in order to avoid its potential effects of SAC on
> > the hypothesis tests (and the reviewers/editor). I have generated
> > spatial eigenvectors with SpatialFiltering(), and added the generated
> > vectors to the regression. Surprisingly, SAC appears to become more
> > pronounced. I also tried ME(), but many more vectors are produced and
> > SAC is also not removed. Isn't including the vectors from
> > SpatialFiltering() supposed to reduce SAC?
> >
> > Can you please enlighten me as to what's going on, what I am doing
> > wrong, or what I should try?
>
> You are using point support (your forests are at points not areas, so that
> the observations are not contiguous). The advice to consider this as a
> mixed model problem may be relevant. It is certainly the case that
> ncf::correlog() should not be used for regression residuals.
>
> Further, your data are in a band 1.54.3E, 43.043.4N, so using 0.7
> degrees in dnearneigh() was a risky choice, and squashes the data. If you
> coerce to an sf object and apply an appropriate CRS, you get many fewer
> neighbours on average.
>
> library(sf)
> data < st_as_sf(data, coords=c("LONG", "LAT"))
> (nbnear4 < dnearneigh(data, 0, 0.7))
> st_crs(data) < 4326
> (nbnear4a < dnearneigh(data, 0, 27))
> # or project to a relevant planar UTM 31N spec.
> data_utm31 < st_transform(data, st_crs(32631))
> (nbd < dnearneigh(data_utm31, 0, 27))
>
> and so on. SpatialFiltering() and ME() test against global residual
> autocorrelation in finding candidate eigenvectors, and running
> lm.morantest() on the fitted model shows that there is no global
> autocorrelation as intended (because positive and negative local
> autocorrelation cancels out). I think that the underlying problems are
> mixing approaches that are not asking the same questions, and in too
> little care in choosing the weights.
>
> Hope this clarifies,
>
> Roger
>
> >
> > Thanks in advance for you time.
> >
> > Peter
> >
> > The data are here:
> >
> >
> https://drive.google.com/file/d/1Z3FIGIAbYvXqGETn0EWLjTOY8MYpZi_S/view?usp=> > sharing
> >
> > The analysis goes like this:
> >
> > library(spatialreg)
> > library(spdep)
> > library(tidyverse)
> > library(car)
> > library(ncf)
> >
> >
> > data < read_csv("for_RAR.csv")
> > set.seed(12345)
> > x < data$ses.mntd
> > y < log(data$RAR)
> > ols_for_RAR < lm(y ~ x)
> > ## qqplot() and shapiro.test() show residuals are nicely distributed
> > # x is significant and Rsquare about 0.2, demonstrated here
> > car::Anova(ols_for_RAR,type="III")
> > summary(ols_for_RAR)
> >
> > # the following appears to make an acceptable neighbor network
> > c1<c(data$LONG)
> > c2<c(data$LAT)
> > cbindForests<cbind(c1,c2)
> > # a value of 0.7, below, is sufficient to join all the points.
> > # qualitatively the results aren't affected, as far as I see by setting
> this
> > higher
> > # However, the number of eigenvectors generated by SpatialFiltering()
> varies
> > a lot
> > nbnear4 < dnearneigh(cbindForests, 0, 0.7)
> > plot(nbnear4, cbindForests, col = "red", pch = 20)
> >
> > # SAC appears significant at short distances (<10km), which is what I
> want
> > to remove
> > cor_for < correlog(c1, c2, residuals(ols_for_RAR), increment = 1,
> resamp =
> > 1000, latlon=TRUE, na.rm = TRUE)
> > plot(cor_for$correlation[1:20],type="s")
> > # pvalues
> > print(cor_for$p[which(cor_for$p < 0.05)])
> > # Moran's I values
> > cor_for$correlation[which(cor_for$p < 0.05)]
> >
> > # Generate optimized spatial eigenvectors using SpatialFiltering() and
> use
> > them
> > # Several vectors are generated depending on values in dnearneigh()
> > spfilt_mntd_RAR< spatialreg::SpatialFiltering(y ~ x, nb=nbnear4,style =
> > "W", tol=0.0001, ExactEV = TRUE)
> > new_mod < lm(y ~ x + fitted(spfilt_mntd_RAR))
> > car::Anova(new_mod, type="III")
> > summary(new_mod)
> >
> > # Plot Moran's I at distances under 20km
> > cor_for_1c < correlog(c1, c2, residuals(new_mod), increment = 1, resamp
> =
> > 1000, latlon=TRUE, na.rm = TRUE)
> > plot(cor_for_1c$correlation[1:20],type="s")
> >
> > # Extract significant values of Moran's I
> > cor_for_1c$p[which(cor_for_1c$p < 0.05)]
> > cor_for_1c$correlation[which(cor_for_1c$p < 0.05)]
> >
> > # The result is that Moran's I is significant at additional short
> distances
> > 
> >
> > _+_+_+_+_+_+_+_+_
> >
> > Peter B. Pearman
> > Ikerbasque Research Professor
> >
> > Laboratory for Computational Ecology and Evolution
> > Departamento de Biología Vegetal y Ecología
> > Facultad de Ciencias y Tecnología
> > Ap. 644
> > Universidad del País Vasco/ Euskal Herriko Unibertsitatea
> >
> > Barrio Sarriena s/n
> > 48940 Leioa, Bizkaia
> >
> > SPAIN
> >
> > Tel. +34 94 601 8030
> >
> > Fax +34 94 601 3500
> >
> > www.ehu.eus/es/web/bgppermp
> >
> >
> >
> > When you believe in things that you don’t understand
> >
> > Then you suffer
> >
> >  Stevie Wonder
> >
> >
> > Download my public encryption key here: https://pgp.mit.edu/> >
> > and please use it when you send me email.
> >
> >
> >
> >
> >
>
> 
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N5045 Bergen, Norway.
> voice: +47 55 95 93 55; email: [hidden email]
> https://orcid.org/0000000323926140> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en> _______________________________________________
> RsigGeo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/rsiggeo>
[[alternative HTML version deleted]]
_______________________________________________
RsigGeo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/rsiggeo

Administrator

On Thu, 23 Jul 2020, Vinicius Maia wrote:
> Dear Roger,
>
> Why ncf::correlog() should not be used for regression residuals in this
> case?
>
> Could you clarify this to me, please?
I do not have my copy of Cliff & Ord (1973) to hand. The value of Moran's
I is the same, but the inferences will be affected by changes in the
expectation and variance (under Normality):
library(spdep)
columbus < st_read(system.file("shapes/columbus.shp",
package="spData")[1])
col.gal.nb < read.gal(system.file("weights/columbus.gal",
package="spData")[1])
OLS < lm(CRIME ~ INC + HOVAL, data=columbus)
lm.morantest(OLS, listw=nb2listw(col.gal.nb))
moran.test(residuals(OLS), listw=nb2listw(col.gal.nb),
randomisation=FALSE)
This is because standard regression assumptions affect how we view the
regression residuals, details in Cliff & Ord. See for example the code in
moran.test() for E(I) (var(I) is similar):
EI < (1)/wc$n1
where wc$n1 is n  1 from spweights.constants(). In the lm.morantest()
case:
XtXinv < chol2inv(model$qr$qr[p1, p1, drop = FALSE])
X < model.matrix(terms(model), model.frame(model))
if (length(nacoefs) > 0L)
X < X[, nacoefs]
if (!is.null(wts < weights(model))) {
X < drop(t(sapply(1:length(wts), function(i) sqrt(wts[i]) *
X[i, ])))
}
Z < lag.listw(listw.U, X, zero.policy = zero.policy)
C1 < t(X) %*% Z
trA < (sum(diag(XtXinv %*% C1)))
EI < ((N * trA)/((N  p) * S0))
and S_0 is the sum of weights (N if row standardised), p is the number
columns of X, and A = (X'X)^{1} X'WX. For the distance band case, you
also need to adjust N for the count of pairs of neighbours within that
band; in moran.test() you see n < as.double(length(which(cards > 0))) by
default, where cards is a vector of neighbour counts; in lm.morantest()
the equivalent by default is
N < as.double(length(which(card(listw$neighbours) > 0L)))
My presumption is that ncf::correlog() is for input variables, not
regression residuals, as there is no argument to pass though the fitted
model object from which to extract the model matrix or (X'X)^{1}.
Further, resampling regression residuals may be unsafe for similar
reasons; the draws should not be from the residuals, I think.
Recall that the code is shown by typing the function name, and this best
documents what is going on.
Again, refer to Cliff & Ord 1973 for the full development.
Roger
>
> Thank you
>
> Best,
>
> Vinicius
>
> Em qui., 23 de jul. de 2020 às 13:52, Roger Bivand < [hidden email]>
> escreveu:
>
>> On Wed, 22 Jul 2020, Peter B. Pearman wrote:
>>
>>>
>>> Dear Roger and list members,
>>>
>>> I have a ols regression and want to remove spatial autocorrelation (SAC)
>>> from the residuals, in order to avoid its potential effects of SAC on
>>> the hypothesis tests (and the reviewers/editor). I have generated
>>> spatial eigenvectors with SpatialFiltering(), and added the generated
>>> vectors to the regression. Surprisingly, SAC appears to become more
>>> pronounced. I also tried ME(), but many more vectors are produced and
>>> SAC is also not removed. Isn't including the vectors from
>>> SpatialFiltering() supposed to reduce SAC?
>>>
>>> Can you please enlighten me as to what's going on, what I am doing
>>> wrong, or what I should try?
>>
>> You are using point support (your forests are at points not areas, so that
>> the observations are not contiguous). The advice to consider this as a
>> mixed model problem may be relevant. It is certainly the case that
>> ncf::correlog() should not be used for regression residuals.
>>
>> Further, your data are in a band 1.54.3E, 43.043.4N, so using 0.7
>> degrees in dnearneigh() was a risky choice, and squashes the data. If you
>> coerce to an sf object and apply an appropriate CRS, you get many fewer
>> neighbours on average.
>>
>> library(sf)
>> data < st_as_sf(data, coords=c("LONG", "LAT"))
>> (nbnear4 < dnearneigh(data, 0, 0.7))
>> st_crs(data) < 4326
>> (nbnear4a < dnearneigh(data, 0, 27))
>> # or project to a relevant planar UTM 31N spec.
>> data_utm31 < st_transform(data, st_crs(32631))
>> (nbd < dnearneigh(data_utm31, 0, 27))
>>
>> and so on. SpatialFiltering() and ME() test against global residual
>> autocorrelation in finding candidate eigenvectors, and running
>> lm.morantest() on the fitted model shows that there is no global
>> autocorrelation as intended (because positive and negative local
>> autocorrelation cancels out). I think that the underlying problems are
>> mixing approaches that are not asking the same questions, and in too
>> little care in choosing the weights.
>>
>> Hope this clarifies,
>>
>> Roger
>>
>>>
>>> Thanks in advance for you time.
>>>
>>> Peter
>>>
>>> The data are here:
>>>
>>>
>> https://drive.google.com/file/d/1Z3FIGIAbYvXqGETn0EWLjTOY8MYpZi_S/view?usp=>>> sharing
>>>
>>> The analysis goes like this:
>>>
>>> library(spatialreg)
>>> library(spdep)
>>> library(tidyverse)
>>> library(car)
>>> library(ncf)
>>>
>>>
>>> data < read_csv("for_RAR.csv")
>>> set.seed(12345)
>>> x < data$ses.mntd
>>> y < log(data$RAR)
>>> ols_for_RAR < lm(y ~ x)
>>> ## qqplot() and shapiro.test() show residuals are nicely distributed
>>> # x is significant and Rsquare about 0.2, demonstrated here
>>> car::Anova(ols_for_RAR,type="III")
>>> summary(ols_for_RAR)
>>>
>>> # the following appears to make an acceptable neighbor network
>>> c1<c(data$LONG)
>>> c2<c(data$LAT)
>>> cbindForests<cbind(c1,c2)
>>> # a value of 0.7, below, is sufficient to join all the points.
>>> # qualitatively the results aren't affected, as far as I see by setting
>> this
>>> higher
>>> # However, the number of eigenvectors generated by SpatialFiltering()
>> varies
>>> a lot
>>> nbnear4 < dnearneigh(cbindForests, 0, 0.7)
>>> plot(nbnear4, cbindForests, col = "red", pch = 20)
>>>
>>> # SAC appears significant at short distances (<10km), which is what I
>> want
>>> to remove
>>> cor_for < correlog(c1, c2, residuals(ols_for_RAR), increment = 1,
>> resamp =
>>> 1000, latlon=TRUE, na.rm = TRUE)
>>> plot(cor_for$correlation[1:20],type="s")
>>> # pvalues
>>> print(cor_for$p[which(cor_for$p < 0.05)])
>>> # Moran's I values
>>> cor_for$correlation[which(cor_for$p < 0.05)]
>>>
>>> # Generate optimized spatial eigenvectors using SpatialFiltering() and
>> use
>>> them
>>> # Several vectors are generated depending on values in dnearneigh()
>>> spfilt_mntd_RAR< spatialreg::SpatialFiltering(y ~ x, nb=nbnear4,style =
>>> "W", tol=0.0001, ExactEV = TRUE)
>>> new_mod < lm(y ~ x + fitted(spfilt_mntd_RAR))
>>> car::Anova(new_mod, type="III")
>>> summary(new_mod)
>>>
>>> # Plot Moran's I at distances under 20km
>>> cor_for_1c < correlog(c1, c2, residuals(new_mod), increment = 1, resamp
>> =
>>> 1000, latlon=TRUE, na.rm = TRUE)
>>> plot(cor_for_1c$correlation[1:20],type="s")
>>>
>>> # Extract significant values of Moran's I
>>> cor_for_1c$p[which(cor_for_1c$p < 0.05)]
>>> cor_for_1c$correlation[which(cor_for_1c$p < 0.05)]
>>>
>>> # The result is that Moran's I is significant at additional short
>> distances
>>> 
>>>
>>> _+_+_+_+_+_+_+_+_
>>>
>>> Peter B. Pearman
>>> Ikerbasque Research Professor
>>>
>>> Laboratory for Computational Ecology and Evolution
>>> Departamento de Biología Vegetal y Ecología
>>> Facultad de Ciencias y Tecnología
>>> Ap. 644
>>> Universidad del País Vasco/ Euskal Herriko Unibertsitatea
>>>
>>> Barrio Sarriena s/n
>>> 48940 Leioa, Bizkaia
>>>
>>> SPAIN
>>>
>>> Tel. +34 94 601 8030
>>>
>>> Fax +34 94 601 3500
>>>
>>> www.ehu.eus/es/web/bgppermp
>>>
>>>
>>>
>>> When you believe in things that you don’t understand
>>>
>>> Then you suffer
>>>
>>>  Stevie Wonder
>>>
>>>
>>> Download my public encryption key here: https://pgp.mit.edu/>>>
>>> and please use it when you send me email.
>>>
>>>
>>>
>>>
>>>
>>
>> 
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N5045 Bergen, Norway.
>> voice: +47 55 95 93 55; email: [hidden email]
>> https://orcid.org/0000000323926140>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en>> _______________________________________________
>> RsigGeo mailing list
>> [hidden email]
>> https://stat.ethz.ch/mailman/listinfo/rsiggeo>>
>

Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N5045 Bergen, Norway.
voice: +47 55 95 93 55; email: [hidden email]
https://orcid.org/0000000323926140https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en_______________________________________________
RsigGeo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/rsiggeo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N5045 Bergen, Norway


Hi Roger,
Thank you for this clarification, it will certainly be useful for me and
others.
Best,
Vinicius
Em sáb., 25 de jul. de 2020 às 08:41, Roger Bivand < [hidden email]>
escreveu:
> On Thu, 23 Jul 2020, Vinicius Maia wrote:
>
> > Dear Roger,
> >
> > Why ncf::correlog() should not be used for regression residuals in this
> > case?
> >
> > Could you clarify this to me, please?
>
> I do not have my copy of Cliff & Ord (1973) to hand. The value of Moran's
> I is the same, but the inferences will be affected by changes in the
> expectation and variance (under Normality):
>
> library(spdep)
> columbus < st_read(system.file("shapes/columbus.shp",
> package="spData")[1])
> col.gal.nb < read.gal(system.file("weights/columbus.gal",
> package="spData")[1])
> OLS < lm(CRIME ~ INC + HOVAL, data=columbus)
> lm.morantest(OLS, listw=nb2listw(col.gal.nb))
> moran.test(residuals(OLS), listw=nb2listw(col.gal.nb),
> randomisation=FALSE)
>
> This is because standard regression assumptions affect how we view the
> regression residuals, details in Cliff & Ord. See for example the code in
> moran.test() for E(I) (var(I) is similar):
>
> EI < (1)/wc$n1
>
> where wc$n1 is n  1 from spweights.constants(). In the lm.morantest()
> case:
>
> XtXinv < chol2inv(model$qr$qr[p1, p1, drop = FALSE])
> X < model.matrix(terms(model), model.frame(model))
> if (length(nacoefs) > 0L)
> X < X[, nacoefs]
> if (!is.null(wts < weights(model))) {
> X < drop(t(sapply(1:length(wts), function(i) sqrt(wts[i]) *
> X[i, ])))
> }
> Z < lag.listw(listw.U, X, zero.policy = zero.policy)
> C1 < t(X) %*% Z
> trA < (sum(diag(XtXinv %*% C1)))
> EI < ((N * trA)/((N  p) * S0))
>
> and S_0 is the sum of weights (N if row standardised), p is the number
> columns of X, and A = (X'X)^{1} X'WX. For the distance band case, you
> also need to adjust N for the count of pairs of neighbours within that
> band; in moran.test() you see n < as.double(length(which(cards > 0))) by
> default, where cards is a vector of neighbour counts; in lm.morantest()
> the equivalent by default is
> N < as.double(length(which(card(listw$neighbours) > 0L)))
>
> My presumption is that ncf::correlog() is for input variables, not
> regression residuals, as there is no argument to pass though the fitted
> model object from which to extract the model matrix or (X'X)^{1}.
> Further, resampling regression residuals may be unsafe for similar
> reasons; the draws should not be from the residuals, I think.
>
> Recall that the code is shown by typing the function name, and this best
> documents what is going on.
>
> Again, refer to Cliff & Ord 1973 for the full development.
>
> Roger
>
>
> >
> > Thank you
> >
> > Best,
> >
> > Vinicius
> >
> > Em qui., 23 de jul. de 2020 às 13:52, Roger Bivand < [hidden email]>
> > escreveu:
> >
> >> On Wed, 22 Jul 2020, Peter B. Pearman wrote:
> >>
> >>>
> >>> Dear Roger and list members,
> >>>
> >>> I have a ols regression and want to remove spatial autocorrelation
> (SAC)
> >>> from the residuals, in order to avoid its potential effects of SAC on
> >>> the hypothesis tests (and the reviewers/editor). I have generated
> >>> spatial eigenvectors with SpatialFiltering(), and added the generated
> >>> vectors to the regression. Surprisingly, SAC appears to become more
> >>> pronounced. I also tried ME(), but many more vectors are produced and
> >>> SAC is also not removed. Isn't including the vectors from
> >>> SpatialFiltering() supposed to reduce SAC?
> >>>
> >>> Can you please enlighten me as to what's going on, what I am doing
> >>> wrong, or what I should try?
> >>
> >> You are using point support (your forests are at points not areas, so
> that
> >> the observations are not contiguous). The advice to consider this as a
> >> mixed model problem may be relevant. It is certainly the case that
> >> ncf::correlog() should not be used for regression residuals.
> >>
> >> Further, your data are in a band 1.54.3E, 43.043.4N, so using 0.7
> >> degrees in dnearneigh() was a risky choice, and squashes the data. If
> you
> >> coerce to an sf object and apply an appropriate CRS, you get many fewer
> >> neighbours on average.
> >>
> >> library(sf)
> >> data < st_as_sf(data, coords=c("LONG", "LAT"))
> >> (nbnear4 < dnearneigh(data, 0, 0.7))
> >> st_crs(data) < 4326
> >> (nbnear4a < dnearneigh(data, 0, 27))
> >> # or project to a relevant planar UTM 31N spec.
> >> data_utm31 < st_transform(data, st_crs(32631))
> >> (nbd < dnearneigh(data_utm31, 0, 27))
> >>
> >> and so on. SpatialFiltering() and ME() test against global residual
> >> autocorrelation in finding candidate eigenvectors, and running
> >> lm.morantest() on the fitted model shows that there is no global
> >> autocorrelation as intended (because positive and negative local
> >> autocorrelation cancels out). I think that the underlying problems are
> >> mixing approaches that are not asking the same questions, and in too
> >> little care in choosing the weights.
> >>
> >> Hope this clarifies,
> >>
> >> Roger
> >>
> >>>
> >>> Thanks in advance for you time.
> >>>
> >>> Peter
> >>>
> >>> The data are here:
> >>>
> >>>
> >>
> https://drive.google.com/file/d/1Z3FIGIAbYvXqGETn0EWLjTOY8MYpZi_S/view?usp=> >>> sharing
> >>>
> >>> The analysis goes like this:
> >>>
> >>> library(spatialreg)
> >>> library(spdep)
> >>> library(tidyverse)
> >>> library(car)
> >>> library(ncf)
> >>>
> >>>
> >>> data < read_csv("for_RAR.csv")
> >>> set.seed(12345)
> >>> x < data$ses.mntd
> >>> y < log(data$RAR)
> >>> ols_for_RAR < lm(y ~ x)
> >>> ## qqplot() and shapiro.test() show residuals are nicely distributed
> >>> # x is significant and Rsquare about 0.2, demonstrated here
> >>> car::Anova(ols_for_RAR,type="III")
> >>> summary(ols_for_RAR)
> >>>
> >>> # the following appears to make an acceptable neighbor network
> >>> c1<c(data$LONG)
> >>> c2<c(data$LAT)
> >>> cbindForests<cbind(c1,c2)
> >>> # a value of 0.7, below, is sufficient to join all the points.
> >>> # qualitatively the results aren't affected, as far as I see by setting
> >> this
> >>> higher
> >>> # However, the number of eigenvectors generated by SpatialFiltering()
> >> varies
> >>> a lot
> >>> nbnear4 < dnearneigh(cbindForests, 0, 0.7)
> >>> plot(nbnear4, cbindForests, col = "red", pch = 20)
> >>>
> >>> # SAC appears significant at short distances (<10km), which is what I
> >> want
> >>> to remove
> >>> cor_for < correlog(c1, c2, residuals(ols_for_RAR), increment = 1,
> >> resamp =
> >>> 1000, latlon=TRUE, na.rm = TRUE)
> >>> plot(cor_for$correlation[1:20],type="s")
> >>> # pvalues
> >>> print(cor_for$p[which(cor_for$p < 0.05)])
> >>> # Moran's I values
> >>> cor_for$correlation[which(cor_for$p < 0.05)]
> >>>
> >>> # Generate optimized spatial eigenvectors using SpatialFiltering() and
> >> use
> >>> them
> >>> # Several vectors are generated depending on values in dnearneigh()
> >>> spfilt_mntd_RAR< spatialreg::SpatialFiltering(y ~ x, nb=nbnear4,style
> =
> >>> "W", tol=0.0001, ExactEV = TRUE)
> >>> new_mod < lm(y ~ x + fitted(spfilt_mntd_RAR))
> >>> car::Anova(new_mod, type="III")
> >>> summary(new_mod)
> >>>
> >>> # Plot Moran's I at distances under 20km
> >>> cor_for_1c < correlog(c1, c2, residuals(new_mod), increment = 1,
> resamp
> >> =
> >>> 1000, latlon=TRUE, na.rm = TRUE)
> >>> plot(cor_for_1c$correlation[1:20],type="s")
> >>>
> >>> # Extract significant values of Moran's I
> >>> cor_for_1c$p[which(cor_for_1c$p < 0.05)]
> >>> cor_for_1c$correlation[which(cor_for_1c$p < 0.05)]
> >>>
> >>> # The result is that Moran's I is significant at additional short
> >> distances
> >>> 
> >>>
> >>> _+_+_+_+_+_+_+_+_
> >>>
> >>> Peter B. Pearman
> >>> Ikerbasque Research Professor
> >>>
> >>> Laboratory for Computational Ecology and Evolution
> >>> Departamento de Biología Vegetal y Ecología
> >>> Facultad de Ciencias y Tecnología
> >>> Ap. 644
> >>> Universidad del País Vasco/ Euskal Herriko Unibertsitatea
> >>>
> >>> Barrio Sarriena s/n
> >>> 48940 Leioa, Bizkaia
> >>>
> >>> SPAIN
> >>>
> >>> Tel. +34 94 601 8030
> >>>
> >>> Fax +34 94 601 3500
> >>>
> >>> www.ehu.eus/es/web/bgppermp
> >>>
> >>>
> >>>
> >>> When you believe in things that you don’t understand
> >>>
> >>> Then you suffer
> >>>
> >>>  Stevie Wonder
> >>>
> >>>
> >>> Download my public encryption key here: https://pgp.mit.edu/> >>>
> >>> and please use it when you send me email.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >> 
> >> Roger Bivand
> >> Department of Economics, Norwegian School of Economics,
> >> Helleveien 30, N5045 Bergen, Norway.
> >> voice: +47 55 95 93 55; email: [hidden email]
> >> https://orcid.org/0000000323926140> >> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en> >> _______________________________________________
> >> RsigGeo mailing list
> >> [hidden email]
> >> https://stat.ethz.ch/mailman/listinfo/rsiggeo> >>
> >
>
> 
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N5045 Bergen, Norway.
> voice: +47 55 95 93 55; email: [hidden email]
> https://orcid.org/0000000323926140> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en [[alternative HTML version deleted]]
_______________________________________________
RsigGeo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/rsiggeo

