Hello,
I am running a negative binomial model (MASS) on count data collected on a grid. The dataset is large, ~4,000 points, with many predictors. Being counts, there are a lot of zeroes. All data are collected on a grid with 20 points, with high spatial autocorrelation.

I would like to filter out the spatial autocorrelation. My question is: since I have very limited spatial information (only 20 distinct spatial locations), is it possible to simplify ME() so that I don't have to run it on the whole dataset? When I try to run ME() on a 100-point subset of the data, I get "error in glm.fit: NA/NaN/Inf in 'x'". When I run it on a single instance of the grid, I "get away" with a warning ("algorithm did not converge").

Here's a fake dataset. It was grinding for a while but not throwing errors (like my original data would). Regardless, it demonstrates the repeated sampling at the same points and the large number of zeroes. Any advice would be most welcome.

library(spdep)
library(MASS)

df <- data.frame(Loc = as.factor(rep(1:20, each = 5)),
                 Lat = rnorm(100, 30, 0.1),
                 Lon = rnorm(1000, 75, 1),
                 x = rnegbin(100, 1, 1))
coordinates(df) <- ~Lon + Lat
proj4string(df) <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
nb <- dnearneigh(x = coordinates(df), d1 = 0, d2 = 200, longlat = TRUE)
dists <- nbdists(nb, coordinates(df), longlat = TRUE)
glist <- lapply(dists, function(x) 1/x)
lw <- nb2listw(nb, glist, style = "W")
me <- ME(x ~ 1, data = df, family = "quasipoisson", listw = lw, alpha = 0.5)
On Wed, 10 Jan 2018, Sima Usvyatsov wrote:
> Here's a fake dataset. It was grinding for a while but not throwing errors
> (like my original data would). Regardless, it demonstrates the repeated
> sampling at the same points and the large number of zeroes.

The data set has 1000 values in Lon, so is probably bigger than you intended, and when 100 is used it is not autocorrelated. You seem to have a hierarchical model, with repeated measurements at the locations, so a multilevel treatment of some kind may be sensible. If you want to stay with ME-based spatial filtering, maybe look at the literature on spatial panels (repeated measurements are in time) with ME/SF, and on network autocorrelation (dyadic relationships with autocorrelation among origins and/or destinations). Both these cases use Kronecker products on the selected eigenvectors, I think.

Alternatively, use a standard GLMM with a grouped iid random effect and/or a spatially structured random effect at the 20-location level. If the groups are repeated observations in time, you should model the whole (non-)separable space-time process.

Hope this helps,

Roger
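[Editor's note: as an illustration of the "grouped iid random effect" route suggested above, a minimal sketch using MASS::glmmPQL on simulated data in the same 20-locations-by-5-repeats shape as the thread's fake dataset. The intercept-only formula and the simulated response are placeholders, not a vetted specification.]

```r
library(MASS)   # glmmPQL, rnegbin
library(nlme)   # random-effects machinery used by glmmPQL

set.seed(1)
# Fake data in the corrected shape: 20 locations, 5 repeated counts each
df <- data.frame(Loc = as.factor(rep(1:20, each = 5)),
                 x   = rnegbin(100, mu = 1, theta = 1))

# Quasi-Poisson GLMM with an iid random intercept per location,
# absorbing the repeated measurements at the 20 grid points
fit <- glmmPQL(x ~ 1, random = ~ 1 | Loc,
               family = quasipoisson, data = df, verbose = FALSE)
summary(fit)$tTable
```

A spatially structured random effect at the location level would replace the iid intercept with a correlation structure among the 20 sites, as discussed later in the thread.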
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
Thank you so much for your response.
Yes, I managed to muck up the fake data in 2 (!) ways: the 1,000 lons and the fact that the lon/lats weren't repeated. Here's the correct structure.

df <- data.frame(Loc = as.factor(rep(1:20, each = 5)),
                 Lat = rep(rnorm(20, 30, 0.1), each = 5),
                 Lon = rep(rnorm(20, 75, 1), each = 5),
                 x = rnegbin(100, 1, 1),
                 Stratum = rep(1:5, each = 20))

In the meantime, I (somewhat) resolved the issue with the glmmPQL fixed effects; my fault, of course.

My current model is set up as follows:

mod1 <- glmmPQL(Count ~ Stratum + SiteInStratum + ...other predictors,
                random = ~ 1 | RoundStart,
                family = quasipoisson,
                correlation = corExp(form = ~ site.Easting + site.Northing + RoundStart))

where RoundStart is the date/time of starting each count. I'm assuming that by "using a standard GLMM" you were thinking of MASS's glmmPQL()? Does this look correct for specifying the full space/time dependence? The variogram and pacf look very decent, but one can never have too many checks...
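[Editor's note: a quick base-R sanity check of the corrected structure, confirming that coordinates now repeat within locations, unlike the first fake dataset. This is a sketch against the simulated data above, not part of the original analysis.]

```r
library(MASS)  # rnegbin

set.seed(42)
# Corrected fake data: one coordinate pair per location, repeated 5 times
df <- data.frame(Loc = as.factor(rep(1:20, each = 5)),
                 Lat = rep(rnorm(20, 30, 0.1), each = 5),
                 Lon = rep(rnorm(20, 75, 1), each = 5),
                 x   = rnegbin(100, mu = 1, theta = 1))

# Each location should contribute exactly one distinct coordinate pair
n_coords <- nrow(unique(df[, c("Lon", "Lat")]))
stopifnot(n_coords == 20, nrow(df) == 100)

# Count data with many zeroes, as in the real dataset
table(df$x == 0)
```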
On Thu, 11 Jan 2018, Sima Usvyatsov wrote:
> Does this look correct for specifying the full space/time dependence? The
> variogram and pacf look very decent, but one can never have too many
> checks...

I can't judge whether it really makes sense, but I think this is much more robust. I'd explore some other GLMMs too, to see what they contribute.

Roger
What other GLMM options do I have, under the restrictions of count data with lots of zeroes and spatial autocorrelation? I was under the impression that glmmPQL was my only choice?
On Thu, 11 Jan 2018, Sima Usvyatsov wrote:
> What other GLMM options do I have, under the restrictions of count data
> with lots of zeroes and spatial autocorrelation? I was under the impression
> that glmmPQL was my only choice?

See for example:

https://www.fromthebottomoftheheap.net/2017/05/04/compare-mgcv-with-glmmTMB/

https://rjournal.github.io/archive/2017/RJ-2017-066/index.html (forthcoming)

for surveys and discussions; maybe your current choice is the best bet for negbin.

Roger
I had no idea this was coming up; can't believe it didn't come up in any of my searches. Looks super useful, and I'm sure I'll be using it sooner than later, whether on this analysis or not. The worked example is ridiculously useful; so glad to have it.

If I'm reading the help page correctly (the autocorrelation bits, struc(terms | group)), to replicate the spatial autocorrelation within Round, would I add this?

exp(site.Easting + site.Northing | RoundStart)

So something like this:

mod1 <- glmmTMB(Count ~ Stratum + SiteInStratum + ...other predictors +
                # random variable
                (1 | RoundStart) +
                # autocorrelation
                exp(site.Easting + site.Northing | RoundStart),
                family = nbinom2, # or nbinom1; I guess decide based on residuals?
                correlation = corExp(form = ~ site.Easting + site.Northing + RoundStart))

Does that look like the twin of the glmmPQL specification I had (other than family)? I can also email sig-ME if it's a better forum for this.
On Thu, 11 Jan 2018, Sima Usvyatsov wrote:
> I had no idea this was coming up - can't believe it didn't come up in any
> of my searches. Looks super useful, and I'm sure I'll be using it sooner
> rather than later, whether on this analysis or not. The worked example is
> ridiculously useful - so glad to have it.
>
> If I'm reading the help page correctly (the autocorrelation bits,
> struc(terms | group)) - to replicate the spatial autocorrelation within
> Round, would I add this?

Not sure - please report back if you find an equivalent to the glmmPQL
approach.

Roger

> exp(site.Easting + site.Northing | RoundStart)
>
> So something like this:
>
> mod1 <- glmmTMB(Count ~ Stratum + SiteInStratum + ...other predictors +
>                   # random variable
>                   (1 | RoundStart) +
>                   # autocorrelation
>                   exp(site.Easting + site.Northing | RoundStart),
>                 family = nbinom2, # or nbinom1 - I guess decide based on residuals?
>                 correlation = corExp(form = ~ site.Easting + site.Northing + RoundStart))
>
> Does that look like the twin of the glmmPQL specification I had (other
> than family)? I can also email sig-ME if it's a better forum for this.
>
> On Thu, Jan 11, 2018 at 8:33 AM, Roger Bivand <[hidden email]> wrote:
>
>> On Thu, 11 Jan 2018, Sima Usvyatsov wrote:
>>
>>> What other GLMM options do I have, under the restrictions of count data
>>> with lots of zeroes and spatial autocorrelation? I was under the
>>> impression that glmmPQL was my only choice?
>>
>> See for example:
>>
>> https://www.fromthebottomoftheheap.net/2017/05/04/compare-mgcv-with-glmmTMB/
>>
>> https://rjournal.github.io/archive/2017/RJ-2017-066/index.html
>> (forthcoming)
>>
>> for surveys and discussions; maybe your current choice is the best bet
>> for negbin.
>>
>> Roger
>>
>>> On Thu, Jan 11, 2018 at 8:18 AM, Roger Bivand <[hidden email]> wrote:
>>>
>>>> On Thu, 11 Jan 2018, Sima Usvyatsov wrote:
>>>>
>>>>> Thank you so much for your response.
>>>>>
>>>>> Yes, I managed to muck up the fake data in 2 (!) ways - the 1,000
>>>>> lons and the fact that the lon/lats weren't repeated. Here's the
>>>>> correct structure.
>>>>>
>>>>> df <- data.frame(Loc = as.factor(rep(1:20, each = 5)),
>>>>>                  Lat = rep(rnorm(20, 30, 0.1), each = 5),
>>>>>                  Lon = rep(rnorm(20, 75, 1), each = 5),
>>>>>                  x = rnegbin(100, 1, 1),
>>>>>                  Stratum = rep(1:5, each = 20))
>>>>>
>>>>> In the meantime, I (somewhat) resolved the issue with the glmmPQL
>>>>> fixed effects - my fault, of course.
>>>>>
>>>>> My current model is set up as follows:
>>>>>
>>>>> mod1 <- glmmPQL(Count ~ Stratum + SiteInStratum + ...other predictors,
>>>>>                 random = ~ 1 | RoundStart,
>>>>>                 family = quasipoisson,
>>>>>                 correlation = corExp(form = ~ site.Easting + site.Northing + RoundStart))
>>>>>
>>>>> where RoundStart is the date/time of starting each count. I'm
>>>>> assuming that by "using a standard GLMM" you were thinking of MASS's
>>>>> glmmPQL()? Does this look correct for specifying the full space/time
>>>>> dependence? The variogram and pacf look very decent, but one can
>>>>> never have too many checks...
>>>>
>>>> I can't judge whether it really makes sense, but I think this is much
>>>> more robust. I'd explore some other GLMMs too, to see what they
>>>> contribute.
>>>>
>>>> Roger

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
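[Editor's note, not part of the thread: a hedged sketch of the glmmTMB specification discussed above. In glmmTMB, the spatial covariance structures (exp(), gau(), mat()) take coordinates packed into a numFactor(), not raw Easting + Northing terms, per the package's covariance-structures vignette. The data below are simulated, and the names (Count, Round, East, North) are placeholders mirroring the thread, not the poster's actual variables.]

```r
## Sketch: negative binomial GLMM with an iid round effect and an
## exponential spatial covariance within rounds, using glmmTMB.
library(glmmTMB)
library(MASS)   # rnegbin()

set.seed(1)
n_loc <- 20; n_round <- 5
## 20 fixed grid locations, each visited in 5 sampling rounds
d <- data.frame(
  Loc   = factor(rep(1:n_loc, times = n_round)),
  Round = factor(rep(1:n_round, each = n_loc)),
  East  = rep(runif(n_loc, 0, 10), times = n_round),
  North = rep(runif(n_loc, 0, 10), times = n_round),
  Count = rnegbin(n_loc * n_round, mu = 1, theta = 1)  # zero-heavy counts
)

## glmmTMB encodes coordinates for spatial covariances via numFactor():
## a factor whose levels carry the (East, North) positions
d$pos <- numFactor(d$East, d$North)

mod <- glmmTMB(
  Count ~ 1 +
    (1 | Round) +           # iid random effect per sampling round
    exp(pos + 0 | Round),   # exponential spatial covariance within round
  family = nbinom2,
  data = d
)
summary(mod)
```

On real data of ~4,000 points such fits can be slow, and the nbinom1 vs. nbinom2 choice can be compared via residual checks or AIC, as suggested in the thread.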