Inference of local Gi*


Inference of local Gi*

Anaïs Ladoy
Dear list members,

I'm currently working on a point dataset, on which I want to conduct
a hot spot analysis using the local Gi* statistic (Getis-Ord).

I'm trying to find a way to assess its significance, and I see two
options:

1) Compare the local Gi values obtained from spdep::localG to a normal
distribution. Here I have several questions:
a) In my first case study (BMI value of 15 000 participants in a cohort
study), the distribution of local Gi is far from normal (it is bimodal
with a mode around very negative values and a mode around 0). However,
I do need a normal distribution of Gi in order to compare it with a
normal distribution, right? Or am I missing something here? What should
I do in this case?
b) In my second case study (years of life lost for 30 000 individuals),
the distribution of Gi returned by spdep::localG is approximately
normal, but the standard deviation is far from 1. As I understand it,
the Gi values returned by spdep::localG are already standardized using
an analytical mean and variance. Should I compare these to a normal
distribution, or should I take the raw G values (using
return_internals=TRUE) and standardize them with the observed mean and
variance of Gi? Is it a problem that my observed variance does not
match the analytical variance?

2) Compute permutations.
However, this is not implemented in R for localG. I tried using PySAL,
but the input file is big and the weights file is huge, and my
computer crashes. Any thoughts on how to solve this issue?
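
For reference, the standardized Gi* that question 1 refers to can be written out directly. The sketch below is only an illustration, assuming binary weights and the usual Ord and Getis (1995) form of the statistic. The function name gi_star_z and its arguments are hypothetical, and nothing here is claimed to be the code path that spdep::localG follows internally.

```python
import math

def gi_star_z(x, neighbours_i, i):
    """Standardized Gi* for site i, assuming binary weights.

    x            : list of observed values
    neighbours_i : indices of the neighbours of site i (excluding i);
                   Gi* (the "star" variant) also counts site i itself.
    """
    n = len(x)
    mean = sum(x) / n
    # Global standard deviation with divisor n.
    s = math.sqrt(sum(v * v for v in x) / n - mean ** 2)
    members = set(neighbours_i) | {i}
    w_sum = len(members)      # sum of the binary weights w_ij
    w_sq_sum = len(members)   # sum of squared weights (same for 0/1 weights)
    local_sum = sum(x[j] for j in members)
    # Analytical expectation and standard deviation of the local sum.
    expected = mean * w_sum
    sd = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return (local_sum - expected) / sd

# Toy example: eight sites on a line, low values then high values.
x = [1, 1, 1, 1, 10, 10, 10, 10]
z_high = gi_star_z(x, [4, 6], 5)   # site inside the high-value run
z_low = gi_star_z(x, [0, 2], 1)    # site inside the low-value run
```

The mean and variance used here are global quantities computed once for the whole dataset, so the z-values across all sites are not expected to have an empirical standard deviation of exactly 1.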

Thank you for any feedback.
Kind regards,
Anaïs

--
Anaïs Ladoy
PhD student, Laboratory of Geographic Information Systems, Swiss
Federal Institute of Technology in Lausanne (EPFL), Switzerland.

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Inference of local Gi*

Jose Ramon Martinez Batlle
Dear Anaïs,

I am sure more experienced members will give you a better answer, but until
then I will try to help.

1) If I understood correctly, the spatial objects have 15 000 and 30 000
points in the two case studies, respectively. If so, I am afraid that nb
objects for such large datasets will affect system performance in
subsequent tasks. The best I can suggest is to try some sort of spatial
binning if possible (e.g. hexbins), while keeping the modifiable areal
unit problem in mind.

2) The spdep::localG help page states that "For inference, a Bonferroni-type
test is suggested in the references, where tables of critical values may be
found". The source mentioned is freely accessible and can be found here:

Ord, J. K. and Getis, A. 1995 Local spatial autocorrelation statistics:
distributional issues and an application. Geographical Analysis, 27, 286–306
https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1538-4632.1995.tb00912.x

Standard measures (critical values) for selected percentiles and numbers of
entities are included in Table 3 of the cited reference. Since the values
returned by localG are Z-values, you can check whether each one exceeds
the chosen critical value and thus infer significant local spatial
association for that entity.
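
As a rough illustration of this comparison, the sketch below computes a plain Bonferroni two-sided critical value from the normal distribution and flags the z-values that exceed it. This is an assumption made for illustration: the tabulated values in the cited reference may be derived slightly differently, and the function names are hypothetical.

```python
from statistics import NormalDist

def bonferroni_critical_z(n, alpha=0.05):
    """Two-sided critical |z| when n local tests share an overall level alpha."""
    return NormalDist().inv_cdf(1 - alpha / (2 * n))

def exceeds_critical(z_values, alpha=0.05):
    """Flag each z-value whose absolute value exceeds the Bonferroni cut-off."""
    crit = bonferroni_critical_z(len(z_values), alpha)
    return [abs(z) > crit for z in z_values]

# The cut-off grows with the number of entities being tested.
crit_100 = bonferroni_critical_z(100)
crit_15000 = bonferroni_critical_z(15000)
```

With tens of thousands of points the cut-off becomes considerably stricter than the familiar 1.96, which is why many apparent hot spots do not survive the correction.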

Kind regards.
José

On Fri, 24 Apr 2020 at 14:00, Anaïs Ladoy (<[hidden email]>)
wrote:

> [...]


--
*José Ramón Martínez Batlle*
*Researcher/Professor, Universidad Autónoma de Santo Domingo (UASD)*
Email: [hidden email]
Website: http://geografiafisica.org


Re: Inference of local Gi*

Roger Bivand
On Sat, 25 Apr 2020, Jose Ramon Martinez Batlle wrote:

> [...]

Thanks, José, you are quite correct that false discovery rate problems are
among the main reasons why so-called "hot-spot" analyses may be very
misleading, in appearing to give an inferential basis for apparent map
pattern.

In our survey paper with David Wong referenced on ?localG,
https://doi.org/10.1007/s11749-018-0599-x, we show that the analytical and
bootstrap-based inferences are similar - the normality is related not to
the underlying variable seen globally, but to the local behaviour of the
statistic. For this reason, bootstrap permutation implementations are not
included in spdep, though the code is available if need be. Please
indicate, either here or in a github issue on
https://github.com/r-spatial/spdep/issues/, whether users would like this
code included for comparative purposes.
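
For readers who want to check this similarity on their own data, a conditional permutation test for one site can be sketched as follows. This is not the spdep code mentioned above. It is a minimal illustration assuming binary weights, a hypothetical function name, and a folded pseudo p-value. It needs only the neighbour list of the site being tested, not a dense weights matrix.

```python
import random

def conditional_permutation_p(x, neighbours_i, i, nsim=999, seed=1):
    """Folded pseudo p-value for the Gi* local sum at site i.

    x            : list of observed values
    neighbours_i : indices of the neighbours of site i (excluding i)
    """
    rng = random.Random(seed)
    k = len(neighbours_i)
    # Values available for reshuffling: everything except the fixed x[i].
    others = [v for j, v in enumerate(x) if j != i]
    # The global mean and variance do not change under permutation, so
    # ranking the local sum is equivalent to ranking Gi* itself.
    observed = x[i] + sum(x[j] for j in neighbours_i)
    simulated = [x[i] + sum(rng.sample(others, k)) for _ in range(nsim)]
    at_least = sum(s >= observed for s in simulated)
    at_most = sum(s <= observed for s in simulated)
    return (min(at_least, at_most) + 1) / (nsim + 1)

# Toy data: values 0..99 on a line, so high values cluster at the end.
x = list(range(100))
p_cluster = conditional_permutation_p(x, [94, 95, 96, 98, 99], 97)
p_mixed = conditional_permutation_p(x, [0, 25, 75, 99], 50)
```

The smallest attainable pseudo p-value is 1 / (nsim + 1), so the number of simulations limits how far into the tail the permutation approach can reach, which matters once multiple-comparison corrections are applied.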

Further, the LOSH statistic, which is a measure of local spatial
heteroscedasticity building on local G, provides a little insight into the
problems raised for so-called "hot-spot" analyses by variability across
the study area in the behaviour of the variable of interest. If, for
example, the variable of interest is influenced by a background variable
with a spatial pattern, we will probably find "hot-spots" which look like
the omitted background variable on a map.

While local G cannot take residuals of a linear model, local Moran's I can
do so. For local G, we do not have exact case-by-case standard deviates;
we do have these for local Moran's I, as discussed in the article with
David Wong, and they typically reduce the counts of apparently
significant local statistics strongly, even before adjusting p-values for
FDR. Finally, only some local measures can adjust for global
autocorrelation - unadjusted local measures also respond to the presence
of global autocorrelation.
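
The p-value adjustment mentioned here can be sketched as below. The thread does not say which FDR procedure is meant; the Benjamini-Hochberg step-up adjustment is assumed as a common choice, and the function names are hypothetical. In R, p.adjust() with method "BH" performs the same adjustment.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided normal p-value for a standard deviate z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Small worked example with four raw p-values.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Sites would then be flagged only where the adjusted p-value falls below the chosen level, which typically leaves far fewer apparent hot spots than unadjusted z-values.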

On balance, judicious choice of class intervals in mapping a variable of
interest may prove more helpful than trying to present wobbly inferences
from ESDA.

Hope this isn't too discouraging,

Roger


--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

Re: Inference of local Gi*

Anaïs Ladoy
Dear José and Roger,

Thank you very much for your answers! Your detailed explanations are
really helpful, and I will take your recommendations into account as I
continue my research.

Kind regards,
Anaïs

On Mon, 2020-04-27 at 11:04 +0200, Roger Bivand wrote:

> [...]

Re: Inference of local Gi*

Jose Ramon Martinez Batlle
In reply to this post by Roger Bivand
Thanks, Roger, for your feedback and clarification.

Best regards.


On Mon, 27 Apr 2020 at 5:04, Roger Bivand (<[hidden email]>)
wrote:

> [...]