Thank you, Roger, for your help. A quick follow-up:

What do you mean when you say "Use one of the approaches described in the tutorial and you may be OK, but you should not trust the outcome of Moran's I on residuals without using an appropriate variant."? In other words, what is an appropriate variant in this context?

Elizabeth

_______________________________________
From: Roger Bivand <[hidden email]>
Sent: Tuesday, August 20, 2019 4:43 PM
To: Elizabeth Webb
Cc: [hidden email]
Subject: Re: [R-sig-Geo] spatial autocorrelation in GAM residuals for large data set

On Tue, 20 Aug 2019, Elizabeth Webb wrote:

I do hope that you read
https://cran.r-project.org/web/packages/ape/vignettes/MoranI.pdf
first, because the approach used in ape has been revised.

The main problem is that ape uses a square matrix by default, and it is
uncertain whether sparse matrices are accepted. This means that completely
unneeded computations are carried out: dense matrices should never be
used unless there is a convincing scientific argument (see
https://edzer.github.io/UseR2019/part2.html#exercise-review-1
for a discussion of why distances are wasteful when edge counts on a
graph do the same thing sparsely).

Use one of the approaches described in the tutorial and you may be OK, but
you should not trust the outcome of Moran's I on residuals without using
an appropriate variant. If you can represent your GAM as a linear model
with, say, spline terms, you can use Moran's I for regression residuals.
Take care that the average number of neighbours is very small (6-10), and
large numbers of observations should not be a problem.
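
[Illustration, not part of the original message.] The point about sparse
storage can be sketched roughly as follows. This is a minimal Python
example under stated assumptions: a regular 100 x 100 grid with rook
neighbours (about 4 per cell, rather than the 6-10 mentioned above),
row-standardised weights, and hypothetical helper names grid_neighbours
and morans_i. It computes only the plain Moran's I statistic; it does not
implement the regression-residual variant, for which spdep's
lm.morantest() would be the natural starting point in R.

```python
import random


def grid_neighbours(nrow, ncol):
    """Rook-contiguity neighbour lists for an nrow x ncol grid.

    Cell (r, c) has index r * ncol + c and at most 4 neighbours,
    so storage grows linearly with the number of cells.
    """
    nbrs = []
    for r in range(nrow):
        for c in range(ncol):
            cell = []
            if r > 0:
                cell.append((r - 1) * ncol + c)
            if r < nrow - 1:
                cell.append((r + 1) * ncol + c)
            if c > 0:
                cell.append(r * ncol + c - 1)
            if c < ncol - 1:
                cell.append(r * ncol + c + 1)
            nbrs.append(cell)
    return nbrs


def morans_i(values, nbrs):
    """Plain Moran's I with row-standardised weights held as neighbour lists."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    denom = sum(d * d for d in z)
    num = 0.0
    s0 = 0.0  # sum of all weights; each non-empty row contributes 1
    for i, js in enumerate(nbrs):
        if not js:
            continue  # observations without neighbours carry no weight
        num += z[i] * sum(z[j] for j in js) / len(js)
        s0 += 1.0
    return (n / s0) * num / denom


nrow, ncol = 100, 100
n = nrow * ncol
nbrs = grid_neighbours(nrow, ncol)

n_links = sum(len(js) for js in nbrs)  # weights actually stored
dense_entries = n * n                  # what a full n x n matrix would hold

# Three synthetic surfaces: alternating, smooth trend, and pure noise.
checker = [(r + c) % 2 for r in range(nrow) for c in range(ncol)]
gradient = [r + c for r in range(nrow) for c in range(ncol)]
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

i_checker = morans_i(checker, nbrs)
i_gradient = morans_i(gradient, nbrs)
i_noise = morans_i(noise, nbrs)

print("stored weights:", n_links, "versus dense entries:", dense_entries)
print("checkerboard I:", i_checker)
print("gradient I:", i_gradient)
print("noise I:", i_noise)
```

With neighbour lists the work is proportional to the number of links
rather than to n squared, which is why the average number of neighbours
matters more than the number of observations.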

A larger problem is that Moran's I (also for residuals) responds to
mis-specifications other than spatial autocorrelation, in particular
missing variables and spatial processes with a different scale from the
units of observation chosen.

> So, two questions: (1) Is there a memory efficient way to check for
> spatial autocorrelation using Moran's I in large data sets? or (2) Is
> there another way to check for spatial autocorrelation (besides Moran's
> I) that won't have such memory problems?

1) Yes, see above, do not use dense matrices.

2) Consider a higher-level MRF term in your GAM for aggregates of your
observations if such aggregation makes sense for your data.

Hope this clarifies,

Roger

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo