credible interval for empirical Bayesian estimates of rates

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

credible interval for empirical Bayesian estimates of rates

Dexter Locke
Dear esteemed list,

I'm using spdep::EBest with family = 'binomial' for counts of events within
polygons that have an 'at risk' population. The resultant "estmm" is
'shrunk' compared to the raw rate (both given by EBest and calculated "by
hand" rate. All good there.

Using GeoDa version 1.14.0 24 August 2019 produces identical results for
its Empirical Bayesian rate. This was confirmed by plotting the EBest
output against GeoDa's rate and finding a perfect correlation along the 1
to 1 line. All good there.

Two questions:
1. How can credible intervals around these smoothed rate estimates be
calculated?
2. The spdep documentation calls this a binomial family, but the identical
results are obtained from GeoDa calls this "Poisson-Gamma" model here:
https://geodacenter.github.io/workbook/3b_rates/lab3b.html#fnref11 , so
what is actually being calculated? This question may help me answer the
first question..

Possibly the answers are addressed in the literature cited which I cannot
access right now at home without institutional library access.

Thanks for your consideration,
Dexter
http://dexterlocke.com/

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: credible interval for empirical Bayesian estimates of rates

Roger Bivand
Administrator
On Fri, 24 Apr 2020, Dexter Locke wrote:

> Dear esteemed list,
>
> I'm using spdep::EBest with family = 'binomial' for counts of events within
> polygons that have an 'at risk' population. The resultant "estmm" is
> 'shrunk' compared to the raw rate (both given by EBest and calculated "by
> hand" rate. All good there.
>
> Using GeoDa version 1.14.0 24 August 2019 produces identical results for
> its Empirical Bayesian rate. This was confirmed by plotting the EBest
> output against GeoDa's rate and finding a perfect correlation along the 1
> to 1 line. All good there.

Please provide a reproducible example, as this may help with answers.

>
> Two questions:
> 1. How can credible intervals around these smoothed rate estimates be
> calculated?
> 2. The spdep documentation calls this a binomial family, but the identical
> results are obtained from GeoDa calls this "Poisson-Gamma" model here:
> https://geodacenter.github.io/workbook/3b_rates/lab3b.html#fnref11 , so
> what is actually being calculated? This question may help me answer the
> first question..

No, the default family is "poisson", with "binomial" available for
non-rare conditions following Martuzzi, implemented by Olaf
Berke, ?spdep::EBest.

The code in spdep is easily accessible, so can be read directly. Please
also compare with code for the EB Moran test, and with analogous code in
the DCluster package, empbaysmooth(). Cf. ASDAR 2nd ed., ch. 10, section
10.2, pp. 322-328. The epitools::pois.exact() function is used for CIs.
For code and data see https://asdar-book.org/bundles2ed/dismap_bundle.zip.

>
> Possibly the answers are addressed in the literature cited which I cannot
> access right now at home without institutional library access.
>

Most institutions do have proxy or VPN access, but the code will be as
useful. In PySAL, the code would also guide you, but even though GeoDa is
open source, the C++ is fairly dense.

Hope this helps,

Roger

> Thanks for your consideration,
> Dexter
> http://dexterlocke.com/
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway
Reply | Threaded
Open this post in threaded view
|

Re: credible interval for empirical Bayesian estimates of rates

Dexter Locke
Thanks Roger and list.

I didn't think a repex was needed because a question was: why does
spdep::EBest(counts, population, family = 'binomial') give the same
results at GeoDa's, while EBest(.. binomial) is "binomial" while GeoDa
calls that "Poisson-Gamma". GeoDa can't give use a repex (GUI) and think
this is a question about terminology. The same results were achieved with
the packages while naming the model differently - why?

Yes ?spdep::EBest directed me to the literature I'm struggling to access.
And Yes, I've been looking at the raw code and understand how the estmm is
generated.

I've been using the epitools::pois.exact() and spdep::EBest. I can compare
the point estimates from pois.exact to those provided by EBest, but I'd
like to graph side by side their credible / confidence intervals.

Its this last part on the credible intervals I'm interested in. How to get
credible intervals around estmm? This is my main question.

ASDAR is a reference I'm using all the time. Thanks for that gem.

DCluster::empbaysmooth also does not provide a credible interval, either.

-Dexter
http://dexterlocke.com/



On Fri, Apr 24, 2020 at 10:23 AM Roger Bivand <[hidden email]> wrote:

> On Fri, 24 Apr 2020, Dexter Locke wrote:
>
> > Dear esteemed list,
> >
> > I'm using spdep::EBest with family = 'binomial' for counts of events
> within
> > polygons that have an 'at risk' population. The resultant "estmm" is
> > 'shrunk' compared to the raw rate (both given by EBest and calculated "by
> > hand" rate. All good there.
> >
> > Using GeoDa version 1.14.0 24 August 2019 produces identical results for
> > its Empirical Bayesian rate. This was confirmed by plotting the EBest
> > output against GeoDa's rate and finding a perfect correlation along the 1
> > to 1 line. All good there.
>
> Please provide a reproducible example, as this may help with answers.
>
> >
> > Two questions:
> > 1. How can credible intervals around these smoothed rate estimates be
> > calculated?
> > 2. The spdep documentation calls this a binomial family, but the
> identical
> > results are obtained from GeoDa calls this "Poisson-Gamma" model here:
> > https://geodacenter.github.io/workbook/3b_rates/lab3b.html#fnref11 , so
> > what is actually being calculated? This question may help me answer the
> > first question..
>
> No, the default family is "poisson", with "binomial" available for
> non-rare conditions following Martuzzi, implemented by Olaf
> Berke, ?spdep::EBest.
>
> The code in spdep is easily accessible, so can be read directly. Please
> also compare with code for the EB Moran test, and with analogous code in
> the DCluster package, empbaysmooth(). Cf. ASDAR 2nd ed., ch. 10, section
> 10.2, pp. 322-328. The epitools::pois.exact() function is used for CIs.
> For code and data see https://asdar-book.org/bundles2ed/dismap_bundle.zip.
>
> >
> > Possibly the answers are addressed in the literature cited which I cannot
> > access right now at home without institutional library access.
> >
>
> Most institutions do have proxy or VPN access, but the code will be as
> useful. In PySAL, the code would also guide you, but even though GeoDa is
> open source, the C++ is fairly dense.
>
> Hope this helps,
>
> Roger
>
> > Thanks for your consideration,
> > Dexter
> > http://dexterlocke.com/
> >
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-Geo mailing list
> > [hidden email]
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: [hidden email]
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: credible interval for empirical Bayesian estimates of rates

Roger Bivand
Administrator
On Fri, 24 Apr 2020, Dexter Locke wrote:

> Thanks Roger and list.
>
> I didn't think a repex was needed because a question was: why does
> spdep::EBest(counts, population, family = 'binomial') give the same
> results at GeoDa's, while EBest(.. binomial) is "binomial" while GeoDa
> calls that "Poisson-Gamma". GeoDa can't give use a repex (GUI) and think
> this is a question about terminology. The same results were achieved with
> the packages while naming the model differently - why?

Reproducible example:

auckland <- st_read(system.file("shapes/auckland.shp",
   package="spData")[1], quiet=TRUE)
res <- EBest(auckland$M77_85, 9*auckland$Und5_81)
res0 <- EBest(auckland$M77_85, 9*auckland$Und5_81, family="binomial")

> summary(res$estmm)
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.001487 0.002237 0.002549 0.002648 0.002968 0.004531
> summary(res0$estmm)
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.001484 0.002235 0.002549 0.002648 0.002968 0.004536

After calculating Und5_81 * 9 as a new column, running GeoDa -> Map ->
Rates-Caluculated Map -> Empirical Bayes and exporting a shapefile, in R I
see:

> summary(auck$R_EBS)
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.001487 0.002237 0.002549 0.002648 0.002968 0.004531

which is the same as the Poisson, and:

> all.equal(res$estmm, auck$R_EBS)
[1] TRUE

GeoDa is providing the same EB Poisson as EBest, isn't it? If yours
differ, are both implementations seeing the same data?

>
> Yes ?spdep::EBest directed me to the literature I'm struggling to access.
> And Yes, I've been looking at the raw code and understand how the estmm is
> generated.
>
> I've been using the epitools::pois.exact() and spdep::EBest. I can compare
> the point estimates from pois.exact to those provided by EBest, but I'd
> like to graph side by side their credible / confidence intervals.
>
> Its this last part on the credible intervals I'm interested in. How to get
> credible intervals around estmm? This is my main question.

EBest() was written to implement Bailey & Gatrell's textbook, which did
not provide CI, and just used the Marshall Auckland dataset. If you'd like
to implement them, I'd welcome a contribution.

>
> ASDAR is a reference I'm using all the time. Thanks for that gem.
>
> DCluster::empbaysmooth also does not provide a credible interval, either.
>

As you see from ch. 10, CI are described for the epitools implementation.
My feeling is that the literature moved away from simple EB rates towards
IID RE and spatially structured RE, with relevant covariates, say like the
classic Scottish Lip Cancer dataset, and currently CARBayes is a solid
package among many others. Simply using base population becomes too
unsatisfactory. PHE uses funnel plots which do have CI of a kind, to draw
attention from small base populations:
https://nhsrcommunity.com/blog/introduction-to-funnel-plots/ which can be
used to adjust class intervals for mapping.

Roger

> -Dexter
> http://dexterlocke.com/
>
>
>
> On Fri, Apr 24, 2020 at 10:23 AM Roger Bivand <[hidden email]> wrote:
>
>> On Fri, 24 Apr 2020, Dexter Locke wrote:
>>
>>> Dear esteemed list,
>>>
>>> I'm using spdep::EBest with family = 'binomial' for counts of events
>> within
>>> polygons that have an 'at risk' population. The resultant "estmm" is
>>> 'shrunk' compared to the raw rate (both given by EBest and calculated "by
>>> hand" rate. All good there.
>>>
>>> Using GeoDa version 1.14.0 24 August 2019 produces identical results for
>>> its Empirical Bayesian rate. This was confirmed by plotting the EBest
>>> output against GeoDa's rate and finding a perfect correlation along the 1
>>> to 1 line. All good there.
>>
>> Please provide a reproducible example, as this may help with answers.
>>
>>>
>>> Two questions:
>>> 1. How can credible intervals around these smoothed rate estimates be
>>> calculated?
>>> 2. The spdep documentation calls this a binomial family, but the
>> identical
>>> results are obtained from GeoDa calls this "Poisson-Gamma" model here:
>>> https://geodacenter.github.io/workbook/3b_rates/lab3b.html#fnref11 , so
>>> what is actually being calculated? This question may help me answer the
>>> first question..
>>
>> No, the default family is "poisson", with "binomial" available for
>> non-rare conditions following Martuzzi, implemented by Olaf
>> Berke, ?spdep::EBest.
>>
>> The code in spdep is easily accessible, so can be read directly. Please
>> also compare with code for the EB Moran test, and with analogous code in
>> the DCluster package, empbaysmooth(). Cf. ASDAR 2nd ed., ch. 10, section
>> 10.2, pp. 322-328. The epitools::pois.exact() function is used for CIs.
>> For code and data see https://asdar-book.org/bundles2ed/dismap_bundle.zip.
>>
>>>
>>> Possibly the answers are addressed in the literature cited which I cannot
>>> access right now at home without institutional library access.
>>>
>>
>> Most institutions do have proxy or VPN access, but the code will be as
>> useful. In PySAL, the code would also guide you, but even though GeoDa is
>> open source, the C++ is fairly dense.
>>
>> Hope this helps,
>>
>> Roger
>>
>>> Thanks for your consideration,
>>> Dexter
>>> http://dexterlocke.com/
>>>
>>>       [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> [hidden email]
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; e-mail: [hidden email]
>> https://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>
>

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway
Reply | Threaded
Open this post in threaded view
|

Re: credible interval for empirical Bayesian estimates of rates

Dexter Locke
Wonderful thank you.

GeoDA and the spdep::EBest are providing the same estimates for my
datasets. Thanks for the reprex example with the GUI via ">". That is a
helpful convention I'll use in the future..

Ok great, there is not yet a CI implementation. If I can manage one I'd be
delighted to share and contribute it.

Julie Silge: https://juliasilge.com/blog/bayesian-blues/

and David Robinson
http://varianceexplained.org/r/credible_intervals_baseball/

have some suggestions that are similar to each other's (no surprise there,
they work together a lot).

Thanks for the suggestions about alternative model specifications and
including covariates.

This has been very helpful, I appreciate this list immensely,
Dexter



On Fri, Apr 24, 2020 at 3:05 PM Roger Bivand <[hidden email]> wrote:

> On Fri, 24 Apr 2020, Dexter Locke wrote:
>
> > Thanks Roger and list.
> >
> > I didn't think a repex was needed because a question was: why does
> > spdep::EBest(counts, population, family = 'binomial') give the same
> > results at GeoDa's, while EBest(.. binomial) is "binomial" while GeoDa
> > calls that "Poisson-Gamma". GeoDa can't give use a repex (GUI) and think
> > this is a question about terminology. The same results were achieved with
> > the packages while naming the model differently - why?
>
> Reproducible example:
>
> auckland <- st_read(system.file("shapes/auckland.shp",
>    package="spData")[1], quiet=TRUE)
> res <- EBest(auckland$M77_85, 9*auckland$Und5_81)
> res0 <- EBest(auckland$M77_85, 9*auckland$Und5_81, family="binomial")
>
> > summary(res$estmm)
>      Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
> 0.001487 0.002237 0.002549 0.002648 0.002968 0.004531
> > summary(res0$estmm)
>      Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
> 0.001484 0.002235 0.002549 0.002648 0.002968 0.004536
>
> After calculating Und5_81 * 9 as a new column, running GeoDa -> Map ->
> Rates-Caluculated Map -> Empirical Bayes and exporting a shapefile, in R I
> see:
>
> > summary(auck$R_EBS)
>      Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
> 0.001487 0.002237 0.002549 0.002648 0.002968 0.004531
>
> which is the same as the Poisson, and:
>
> > all.equal(res$estmm, auck$R_EBS)
> [1] TRUE
>
> GeoDa is providing the same EB Poisson as EBest, isn't it? If yours
> differ, are both implementations seeing the same data?
>
> >
> > Yes ?spdep::EBest directed me to the literature I'm struggling to access.
> > And Yes, I've been looking at the raw code and understand how the estmm
> is
> > generated.
> >
> > I've been using the epitools::pois.exact() and spdep::EBest. I can
> compare
> > the point estimates from pois.exact to those provided by EBest, but I'd
> > like to graph side by side their credible / confidence intervals.
> >
> > Its this last part on the credible intervals I'm interested in. How to
> get
> > credible intervals around estmm? This is my main question.
>
> EBest() was written to implement Bailey & Gatrell's textbook, which did
> not provide CI, and just used the Marshall Auckland dataset. If you'd like
> to implement them, I'd welcome a contribution.
>
> >
> > ASDAR is a reference I'm using all the time. Thanks for that gem.
> >
> > DCluster::empbaysmooth also does not provide a credible interval, either.
> >
>
> As you see from ch. 10, CI are described for the epitools implementation.
> My feeling is that the literature moved away from simple EB rates towards
> IID RE and spatially structured RE, with relevant covariates, say like the
> classic Scottish Lip Cancer dataset, and currently CARBayes is a solid
> package among many others. Simply using base population becomes too
> unsatisfactory. PHE uses funnel plots which do have CI of a kind, to draw
> attention from small base populations:
> https://nhsrcommunity.com/blog/introduction-to-funnel-plots/ which can be
> used to adjust class intervals for mapping.
>
> Roger
>
> > -Dexter
> > http://dexterlocke.com/
> >
> >
> >
> > On Fri, Apr 24, 2020 at 10:23 AM Roger Bivand <[hidden email]>
> wrote:
> >
> >> On Fri, 24 Apr 2020, Dexter Locke wrote:
> >>
> >>> Dear esteemed list,
> >>>
> >>> I'm using spdep::EBest with family = 'binomial' for counts of events
> >> within
> >>> polygons that have an 'at risk' population. The resultant "estmm" is
> >>> 'shrunk' compared to the raw rate (both given by EBest and calculated
> "by
> >>> hand" rate. All good there.
> >>>
> >>> Using GeoDa version 1.14.0 24 August 2019 produces identical results
> for
> >>> its Empirical Bayesian rate. This was confirmed by plotting the EBest
> >>> output against GeoDa's rate and finding a perfect correlation along
> the 1
> >>> to 1 line. All good there.
> >>
> >> Please provide a reproducible example, as this may help with answers.
> >>
> >>>
> >>> Two questions:
> >>> 1. How can credible intervals around these smoothed rate estimates be
> >>> calculated?
> >>> 2. The spdep documentation calls this a binomial family, but the
> >> identical
> >>> results are obtained from GeoDa calls this "Poisson-Gamma" model here:
> >>> https://geodacenter.github.io/workbook/3b_rates/lab3b.html#fnref11 ,
> so
> >>> what is actually being calculated? This question may help me answer the
> >>> first question..
> >>
> >> No, the default family is "poisson", with "binomial" available for
> >> non-rare conditions following Martuzzi, implemented by Olaf
> >> Berke, ?spdep::EBest.
> >>
> >> The code in spdep is easily accessible, so can be read directly. Please
> >> also compare with code for the EB Moran test, and with analogous code in
> >> the DCluster package, empbaysmooth(). Cf. ASDAR 2nd ed., ch. 10, section
> >> 10.2, pp. 322-328. The epitools::pois.exact() function is used for CIs.
> >> For code and data see
> https://asdar-book.org/bundles2ed/dismap_bundle.zip.
> >>
> >>>
> >>> Possibly the answers are addressed in the literature cited which I
> cannot
> >>> access right now at home without institutional library access.
> >>>
> >>
> >> Most institutions do have proxy or VPN access, but the code will be as
> >> useful. In PySAL, the code would also guide you, but even though GeoDa
> is
> >> open source, the C++ is fairly dense.
> >>
> >> Hope this helps,
> >>
> >> Roger
> >>
> >>> Thanks for your consideration,
> >>> Dexter
> >>> http://dexterlocke.com/
> >>>
> >>>       [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> R-sig-Geo mailing list
> >>> [hidden email]
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>>
> >>
> >> --
> >> Roger Bivand
> >> Department of Economics, Norwegian School of Economics,
> >> Helleveien 30, N-5045 Bergen, Norway.
> >> voice: +47 55 95 93 55; e-mail: [hidden email]
> >> https://orcid.org/0000-0003-2392-6140
> >> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> >>
> >
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: [hidden email]
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo