Moran's I with Objects of Different Length

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Moran's I with Objects of Different Length

cchiseni
I am trying to perform moran's I test on residuals from by probit model
using k-nearest neighbour weights, however i run into an error which i
cant seem to find the solution anywhere online the error is

> moran.test(residuals.glm(svyprobitest),knear2weight)
Error in moran.test(residuals.glm(svyprobitest), knear2weight) :
  objects of different length


The codes I use to come up to this are below

> library(foreign)
> dhsanalysis2= read.dta("DHSSpatialreg11.dta")
##removing NAs in PSU
> dhsanalysis3=subset(dhsanalysis2, !is.na(psu))
##generating a survey design
> mydesighdhs= svydesign(ids =~psu, data =dhsanalysis3, weight = ~wgt,
strata = ~v023,nest=TRUE)
##Defining coordinate colums
> coordinates(dhsanalysis3)= c("longnum","latnum")
#Defining Projection
> proj4string(dhsanalysis3) <- CRS("+init=epsg:4326")
##binding longiude and latitude
> lon2<- dhsanalysis3$longnum
> lat2<- dhsanalysis3$latnum
> coords<- cbind(lon2,lat2)
##Creating spatial weights based on the nearest neighbour
> knear2= knearneigh(coords,k=2,longlat=T)
Warning message:
In knearneigh(coords, k = 2, longlat = T) :
  knearneigh: identical points found ### I get a warning message on
identical points found, is this a problem and how would i deal with this
> knear2.nb= knn2nb(knear2)
> knear2weight= nb2listw(knear2.nb, style="W",zero.policy=T)

##declaring categorical variables as factor variables

> dhsanalysis3$hivpositive.f <- factor(dhsanalysis3$hivpositive)
> dhsanalysis3$protestant.f <- factor(dhsanalysis3$protestant2)
> dhsanalysis3$Married.f <- factor(dhsanalysis3$Married)
> dhsanalysis3$female.f <- factor(dhsanalysis3$female)
> dhsanalysis3$urban1.f <- factor(dhsanalysis3$urban1)
> dhsanalysis3$river10kmdum.f <- factor(dhsanalysis3$river10kmdum)
> dhsanalysis3$Explorer50kmdum.f <- factor(dhsanalysis3$Explorer50kmdum)
> dhsanalysis3$Rail50kmdum.f <- factor(dhsanalysis3$Rail50kmdum)
> dhsanalysis3$Province1.f <- factor(dhsanalysis3$Province1)
> dhsanalysis3$WealthIndex.f <- factor(dhsanalysis3$WealthIndex)
> dhsanalysis3$occupation2.f <- factor(dhsanalysis3$occupation2)
> dhsanalysis3$highested.f <- factor(dhsanalysis3$highested)
> protestant.f= dhsanalysis3$protestant.f
> hivpositive.f=dhsanalysi3$hivpositive.f
> Married.f=dhsanalysis3$Married.f
> female.f=dhsanalysis3$female.f
> urban1.f=dhsanalysis3$urban1.f
> river10kmdum.f=dhsanalysis3$river10kmdum.f
> Explorer50kmdum.f=dhsanalysis2$Explorer50kmdum.f
> Province1.f=dhsanalysis3$Province1.f
> WealthIndex.f=dhsanalysis3$WealthIndex.f
> occupation2.f=dhsanalysis3$occupation2.f
> highested.f=dhsanalysis3$highested.f
> Age=dhsanalysis3$Age
> Age2=dhsanalysis3$Age2
> HIVKnowledge=dhsanalysis3$HIVKnowledge
> churchkm=dhsanalysis3$churchkm
> lnhospitalkm=dhsanalysis3$lnhospitalkm
> lnElevationMean=dhsanalysis3$lnElevationMean

> ###VARIABLES IN SURVEY DESIGN
> mydesighdhs$hivpositive.f <- factor(dhsanalysis3$hivpositive)
> mydesighdhs$protestant.f <- factor(dhsanalysis3$protestant2)
> mydesighdhs$Married.f <- factor(dhsanalysis3$Married)
> mydesighdhs$female.f <- factor(dhsanalysis3$female)
> mydesighdhs$urban1.f <- factor(dhsanalysis3$urban1)
> mydesighdhs$river10kmdum.f <- factor(dhsanalysis3$river10kmdum)
> mydesighdhs$Explorer50kmdum.f <- factor(dhsanalysis3$Explorer50kmdum)
> mydesighdhs$Rail50kmdum.f <- factor(dhsanalysis3$Rail50kmdum)
> mydesighdhs$Province1.f <- factor(dhsanalysis3$Province1)
> mydesighdhs$WealthIndex.f <- factor(dhsanalysis3$WealthIndex)
> mydesighdhs$occupation2.f <- factor(dhsanalysis3$occupation2)
> mydesighdhs$highested.f <- factor(dhsanalysis3$highested)
> mydesighdhs$Age=Age
> mydesighdhs$Age2=Age2
> mydesighdhs$HIVKnowledge=HIVKnowledge
> mydesighdhs$churchkm=churchkm
> mydesighdhs$lnhospitalkm=lnhospitalkm
> mydesighdhs$lnElevationMean=lnElevationMean

> svyprobitest= svyglm(hivpositive.f~ churchkm +lnhospitalkm +protestant.f+
Age+
Age2+Married.f+female.f+urban1.f+river10kmdum.f+Explorer50kmdum.f+Rail50kmdum.f+lnElevationMean+Province1.f
+ WealthIndex.f + HIVKnowledge+ occupation2.f+
highested.f,design=mydesighdhs,family = quasibinomial(link =
"probit"),data=dhsanalysis3)
Error in .subset2(x, i, exact = exact) : subscript out of bounds

## Iran my probit model without implementing the survey design in the model
just to see whether Moran test is working
> svyprobitest2= glm(hivpositive.f~ churchkm +lnhospitalkm +protestant.f+
Age+
Age2+Married.f+female.f+urban1.f+river10kmdum.f+Explorer50kmdum.f+Rail50kmdum.f+lnElevationMean+Province1.f
+ WealthIndex.f + HIVKnowledge+ occupation2.f+ highested.f,family =
quasibinomial(link = "probit"),data=dhsanalysis3)
#Moran test
> moran.test(residuals.glm(svyprobitest),knear2weight)
Error in moran.test(residuals.glm(svyprobitest), knear2weight) :
  objects of different length


Kind Regards,

Michael Chanda Chiseni

Phd Candidate

Department of Economic History

Lund University

Visiting address: Alfa 1, Scheelevägen 15 B, 22363 Lund



*Africa is not poor, it is poorly managed (Ellen Johnson-Sirleaf ). *

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: Moran's I with Objects of Different Length

Roger Bivand
Administrator
On Tue, 11 Aug 2020, Chanda Chiseni wrote:

> I am trying to perform moran's I test on residuals from by probit model
> using k-nearest neighbour weights, however i run into an error which i
> cant seem to find the solution anywhere online the error is
>
>> moran.test(residuals.glm(svyprobitest),knear2weight)
> Error in moran.test(residuals.glm(svyprobitest), knear2weight) :
>  objects of different length
>
>
> The codes I use to come up to this are below
>
>> library(foreign)
>> dhsanalysis2= read.dta("DHSSpatialreg11.dta")
> ##removing NAs in PSU
>> dhsanalysis3=subset(dhsanalysis2, !is.na(psu))
> ##generating a survey design
>> mydesighdhs= svydesign(ids =~psu, data =dhsanalysis3, weight = ~wgt,
> strata = ~v023,nest=TRUE)
> ##Defining coordinate colums
>> coordinates(dhsanalysis3)= c("longnum","latnum")
> #Defining Projection
>> proj4string(dhsanalysis3) <- CRS("+init=epsg:4326")
> ##binding longiude and latitude
>> lon2<- dhsanalysis3$longnum
>> lat2<- dhsanalysis3$latnum
>> coords<- cbind(lon2,lat2)
> ##Creating spatial weights based on the nearest neighbour
>> knear2= knearneigh(coords,k=2,longlat=T)
> Warning message:
> In knearneigh(coords, k = 2, longlat = T) :
>  knearneigh: identical points found ### I get a warning message on
> identical points found, is this a problem and how would i deal with this
It usually indicates muddled thinking. You have more than one observation
at the same point, which could be an unintended duplicate (copy of the
same observation), or it could indicate that any spatial process has
multiple values at that point. In geostatistics, one would jitter the
points to introduce distance. In your case, this probably isn't households
living at one address point. All the observations at the same point will
get the same sets of neighbours. If there are many co-located
observations, k in k-nearest may be too small to include them all.

>> knear2.nb= knn2nb(knear2)
>> knear2weight= nb2listw(knear2.nb, style="W",zero.policy=T)
>
> ##declaring categorical variables as factor variables
>> dhsanalysis3$hivpositive.f <- factor(dhsanalysis3$hivpositive)
>> dhsanalysis3$protestant.f <- factor(dhsanalysis3$protestant2)
>> dhsanalysis3$Married.f <- factor(dhsanalysis3$Married)
>> dhsanalysis3$female.f <- factor(dhsanalysis3$female)
>> dhsanalysis3$urban1.f <- factor(dhsanalysis3$urban1)
>> dhsanalysis3$river10kmdum.f <- factor(dhsanalysis3$river10kmdum)
>> dhsanalysis3$Explorer50kmdum.f <- factor(dhsanalysis3$Explorer50kmdum)
>> dhsanalysis3$Rail50kmdum.f <- factor(dhsanalysis3$Rail50kmdum)
>> dhsanalysis3$Province1.f <- factor(dhsanalysis3$Province1)
>> dhsanalysis3$WealthIndex.f <- factor(dhsanalysis3$WealthIndex)
>> dhsanalysis3$occupation2.f <- factor(dhsanalysis3$occupation2)
>> dhsanalysis3$highested.f <- factor(dhsanalysis3$highested)
>> protestant.f= dhsanalysis3$protestant.f
>> hivpositive.f=dhsanalysi3$hivpositive.f
>> Married.f=dhsanalysis3$Married.f
>> female.f=dhsanalysis3$female.f
>> urban1.f=dhsanalysis3$urban1.f
>> river10kmdum.f=dhsanalysis3$river10kmdum.f
>> Explorer50kmdum.f=dhsanalysis2$Explorer50kmdum.f
>> Province1.f=dhsanalysis3$Province1.f
>> WealthIndex.f=dhsanalysis3$WealthIndex.f
>> occupation2.f=dhsanalysis3$occupation2.f
>> highested.f=dhsanalysis3$highested.f
>> Age=dhsanalysis3$Age
>> Age2=dhsanalysis3$Age2
>> HIVKnowledge=dhsanalysis3$HIVKnowledge
>> churchkm=dhsanalysis3$churchkm
>> lnhospitalkm=dhsanalysis3$lnhospitalkm
>> lnElevationMean=dhsanalysis3$lnElevationMean
>
>> ###VARIABLES IN SURVEY DESIGN
>> mydesighdhs$hivpositive.f <- factor(dhsanalysis3$hivpositive)
>> mydesighdhs$protestant.f <- factor(dhsanalysis3$protestant2)
>> mydesighdhs$Married.f <- factor(dhsanalysis3$Married)
>> mydesighdhs$female.f <- factor(dhsanalysis3$female)
>> mydesighdhs$urban1.f <- factor(dhsanalysis3$urban1)
>> mydesighdhs$river10kmdum.f <- factor(dhsanalysis3$river10kmdum)
>> mydesighdhs$Explorer50kmdum.f <- factor(dhsanalysis3$Explorer50kmdum)
>> mydesighdhs$Rail50kmdum.f <- factor(dhsanalysis3$Rail50kmdum)
>> mydesighdhs$Province1.f <- factor(dhsanalysis3$Province1)
>> mydesighdhs$WealthIndex.f <- factor(dhsanalysis3$WealthIndex)
>> mydesighdhs$occupation2.f <- factor(dhsanalysis3$occupation2)
>> mydesighdhs$highested.f <- factor(dhsanalysis3$highested)
>> mydesighdhs$Age=Age
>> mydesighdhs$Age2=Age2
>> mydesighdhs$HIVKnowledge=HIVKnowledge
>> mydesighdhs$churchkm=churchkm
>> mydesighdhs$lnhospitalkm=lnhospitalkm
>> mydesighdhs$lnElevationMean=lnElevationMean
>
>> svyprobitest= svyglm(hivpositive.f~ churchkm +lnhospitalkm +protestant.f+
> Age+
> Age2+Married.f+female.f+urban1.f+river10kmdum.f+Explorer50kmdum.f+Rail50kmdum.f+lnElevationMean+Province1.f
> + WealthIndex.f + HIVKnowledge+ occupation2.f+
> highested.f,design=mydesighdhs,family = quasibinomial(link =
> "probit"),data=dhsanalysis3)
> Error in .subset2(x, i, exact = exact) : subscript out of bounds
So you've an error anyway.

>
> ## Iran my probit model without implementing the survey design in the model
> just to see whether Moran test is working
>> svyprobitest2= glm(hivpositive.f~ churchkm +lnhospitalkm +protestant.f+
> Age+
> Age2+Married.f+female.f+urban1.f+river10kmdum.f+Explorer50kmdum.f+Rail50kmdum.f+lnElevationMean+Province1.f
> + WealthIndex.f + HIVKnowledge+ occupation2.f+ highested.f,family =
> quasibinomial(link = "probit"),data=dhsanalysis3)
> #Moran test
>> moran.test(residuals.glm(svyprobitest),knear2weight)
> Error in moran.test(residuals.glm(svyprobitest), knear2weight) :
>  objects of different length
>
Testing residuals using moran.test() is usually wrong, as the expectation
and variance of the statistic are based on a null model (intercept only),
not a model with covariates.

What are length(residuals.glm(svyprobitest)) and length(knear2.nb)? Did
glm drop observations with missing values? If so, you should subset both
the data submitted to glm() and the neighbour object so that they match.

Hope this helps,

Roger

>
> Kind Regards,
>
> Michael Chanda Chiseni
>
> Phd Candidate
>
> Department of Economic History
>
> Lund University
>
> Visiting address: Alfa 1, Scheelevägen 15 B, 22363 Lund
>
>
>
> *Africa is not poor, it is poorly managed (Ellen Johnson-Sirleaf ). *
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway
Reply | Threaded
Open this post in threaded view
|

Re: Moran's I with Objects of Different Length

cchiseni
Hi Roger

Thank you for your response.

I am using DHS individual level data , they cluster individuals into about
25-30 people and assign them coordinates depending on where they were
surveyed. So about 30 individuals have similar coordinates. Yes I do have
more than one observation at one point as you have stated, this could be
the source of my problem indeed, My question is how could i go about
creating nearest neighbors and weights by clustering individuals that share
the same point? I sought for a solution online but could not find any.
Sorry I am a novice at spatial econometrics, this may be obvious to some ,
so instead of testing for spatial correlation in my residuals, I should in
essence test for spatial autocorrelation in my outcome variable, for
instance in my analysis I am using a dummy variable of whether an
individual is hiv positive or negative. And is there a specific way of
using a moran.test() on a dummy variable?


Kind Regards,

Michael Chanda Chiseni

Phd Candidate

Department of Economic History

Lund University

Visiting address: Alfa 1, Scheelevägen 15 B, 22363 Lund



*Africa is not poor, it is poorly managed (Ellen Johnson-Sirleaf ). *






On Wed, Aug 12, 2020 at 12:55 PM Roger Bivand <[hidden email]> wrote:

> On Tue, 11 Aug 2020, Chanda Chiseni wrote:
>
> > I am trying to perform moran's I test on residuals from by probit model
> > using k-nearest neighbour weights, however i run into an error which i
> > cant seem to find the solution anywhere online the error is
> >
> >> moran.test(residuals.glm(svyprobitest),knear2weight)
> > Error in moran.test(residuals.glm(svyprobitest), knear2weight) :
> >  objects of different length
> >
> >
> > The codes I use to come up to this are below
> >
> >> library(foreign)
> >> dhsanalysis2= read.dta("DHSSpatialreg11.dta")
> > ##removing NAs in PSU
> >> dhsanalysis3=subset(dhsanalysis2, !is.na(psu))
> > ##generating a survey design
> >> mydesighdhs= svydesign(ids =~psu, data =dhsanalysis3, weight = ~wgt,
> > strata = ~v023,nest=TRUE)
> > ##Defining coordinate colums
> >> coordinates(dhsanalysis3)= c("longnum","latnum")
> > #Defining Projection
> >> proj4string(dhsanalysis3) <- CRS("+init=epsg:4326")
> > ##binding longiude and latitude
> >> lon2<- dhsanalysis3$longnum
> >> lat2<- dhsanalysis3$latnum
> >> coords<- cbind(lon2,lat2)
> > ##Creating spatial weights based on the nearest neighbour
> >> knear2= knearneigh(coords,k=2,longlat=T)
> > Warning message:
> > In knearneigh(coords, k = 2, longlat = T) :
> >  knearneigh: identical points found ### I get a warning message on
> > identical points found, is this a problem and how would i deal with this
>
> It usually indicates muddled thinking. You have more than one observation
> at the same point, which could be an unintended duplicate (copy of the
> same observation), or it could indicate that any spatial process has
> multiple values at that point. In geostatistics, one would jitter the
> points to introduce distance. In your case, this probably isn't households
> living at one address point. All the observations at the same point will
> get the same sets of neighbours. If there are many co-located
> observations, k in k-nearest may be too small to include them all.
>
> >> knear2.nb= knn2nb(knear2)
> >> knear2weight= nb2listw(knear2.nb, style="W",zero.policy=T)
> >
> > ##declaring categorical variables as factor variables
> >> dhsanalysis3$hivpositive.f <- factor(dhsanalysis3$hivpositive)
> >> dhsanalysis3$protestant.f <- factor(dhsanalysis3$protestant2)
> >> dhsanalysis3$Married.f <- factor(dhsanalysis3$Married)
> >> dhsanalysis3$female.f <- factor(dhsanalysis3$female)
> >> dhsanalysis3$urban1.f <- factor(dhsanalysis3$urban1)
> >> dhsanalysis3$river10kmdum.f <- factor(dhsanalysis3$river10kmdum)
> >> dhsanalysis3$Explorer50kmdum.f <- factor(dhsanalysis3$Explorer50kmdum)
> >> dhsanalysis3$Rail50kmdum.f <- factor(dhsanalysis3$Rail50kmdum)
> >> dhsanalysis3$Province1.f <- factor(dhsanalysis3$Province1)
> >> dhsanalysis3$WealthIndex.f <- factor(dhsanalysis3$WealthIndex)
> >> dhsanalysis3$occupation2.f <- factor(dhsanalysis3$occupation2)
> >> dhsanalysis3$highested.f <- factor(dhsanalysis3$highested)
> >> protestant.f= dhsanalysis3$protestant.f
> >> hivpositive.f=dhsanalysi3$hivpositive.f
> >> Married.f=dhsanalysis3$Married.f
> >> female.f=dhsanalysis3$female.f
> >> urban1.f=dhsanalysis3$urban1.f
> >> river10kmdum.f=dhsanalysis3$river10kmdum.f
> >> Explorer50kmdum.f=dhsanalysis2$Explorer50kmdum.f
> >> Province1.f=dhsanalysis3$Province1.f
> >> WealthIndex.f=dhsanalysis3$WealthIndex.f
> >> occupation2.f=dhsanalysis3$occupation2.f
> >> highested.f=dhsanalysis3$highested.f
> >> Age=dhsanalysis3$Age
> >> Age2=dhsanalysis3$Age2
> >> HIVKnowledge=dhsanalysis3$HIVKnowledge
> >> churchkm=dhsanalysis3$churchkm
> >> lnhospitalkm=dhsanalysis3$lnhospitalkm
> >> lnElevationMean=dhsanalysis3$lnElevationMean
> >
> >> ###VARIABLES IN SURVEY DESIGN
> >> mydesighdhs$hivpositive.f <- factor(dhsanalysis3$hivpositive)
> >> mydesighdhs$protestant.f <- factor(dhsanalysis3$protestant2)
> >> mydesighdhs$Married.f <- factor(dhsanalysis3$Married)
> >> mydesighdhs$female.f <- factor(dhsanalysis3$female)
> >> mydesighdhs$urban1.f <- factor(dhsanalysis3$urban1)
> >> mydesighdhs$river10kmdum.f <- factor(dhsanalysis3$river10kmdum)
> >> mydesighdhs$Explorer50kmdum.f <- factor(dhsanalysis3$Explorer50kmdum)
> >> mydesighdhs$Rail50kmdum.f <- factor(dhsanalysis3$Rail50kmdum)
> >> mydesighdhs$Province1.f <- factor(dhsanalysis3$Province1)
> >> mydesighdhs$WealthIndex.f <- factor(dhsanalysis3$WealthIndex)
> >> mydesighdhs$occupation2.f <- factor(dhsanalysis3$occupation2)
> >> mydesighdhs$highested.f <- factor(dhsanalysis3$highested)
> >> mydesighdhs$Age=Age
> >> mydesighdhs$Age2=Age2
> >> mydesighdhs$HIVKnowledge=HIVKnowledge
> >> mydesighdhs$churchkm=churchkm
> >> mydesighdhs$lnhospitalkm=lnhospitalkm
> >> mydesighdhs$lnElevationMean=lnElevationMean
> >
> >> svyprobitest= svyglm(hivpositive.f~ churchkm +lnhospitalkm
> +protestant.f+
> > Age+
> >
> Age2+Married.f+female.f+urban1.f+river10kmdum.f+Explorer50kmdum.f+Rail50kmdum.f+lnElevationMean+Province1.f
> > + WealthIndex.f + HIVKnowledge+ occupation2.f+
> > highested.f,design=mydesighdhs,family = quasibinomial(link =
> > "probit"),data=dhsanalysis3)
> > Error in .subset2(x, i, exact = exact) : subscript out of bounds
>
> So you've an error anyway.
>
> >
> > ## Iran my probit model without implementing the survey design in the
> model
> > just to see whether Moran test is working
> >> svyprobitest2= glm(hivpositive.f~ churchkm +lnhospitalkm +protestant.f+
> > Age+
> >
> Age2+Married.f+female.f+urban1.f+river10kmdum.f+Explorer50kmdum.f+Rail50kmdum.f+lnElevationMean+Province1.f
> > + WealthIndex.f + HIVKnowledge+ occupation2.f+ highested.f,family =
> > quasibinomial(link = "probit"),data=dhsanalysis3)
> > #Moran test
> >> moran.test(residuals.glm(svyprobitest),knear2weight)
> > Error in moran.test(residuals.glm(svyprobitest), knear2weight) :
> >  objects of different length
> >
>
> Testing residuals using moran.test() is usually wrong, as the expectation
> and variance of the statistic are based on a null model (intercept only),
> not a model with covariates.
>
> What are length(residuals.glm(svyprobitest)) and length(knear2.nb)? Did
> glm drop observations with missing values? If so, you should subset both
> the data submitted to glm() and the neighbour object so that they match.
>
> Hope this helps,
>
> Roger
>
> >
> > Kind Regards,
> >
> > Michael Chanda Chiseni
> >
> > Phd Candidate
> >
> > Department of Economic History
> >
> > Lund University
> >
> > Visiting address: Alfa 1, Scheelevägen 15 B, 22363 Lund
> >
> >
> >
> > *Africa is not poor, it is poorly managed (Ellen Johnson-Sirleaf ). *
> >
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-Geo mailing list
> > [hidden email]
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: [hidden email]
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: Moran's I with Objects of Different Length

Roger Bivand
Administrator
On Wed, 12 Aug 2020, Chanda Chiseni wrote:

> Hi Roger
>
> Thank you for your response.
>
> I am using DHS individual level data , they cluster individuals into about
> 25-30 people and assign them coordinates depending on where they were
> surveyed. So about 30 individuals have similar coordinates. Yes I do have
> more than one observation at one point as you have stated, this could be
> the source of my problem indeed, My question is how could i go about
> creating nearest neighbors and weights by clustering individuals that share
> the same point? I sought for a solution online but could not find any.
Searching online is seldom informative, there is a poor signal-to-noise
ratio.

See for example the HSAR package or other multi-level approaches, review
in https://doi.org/10.1016/j.spasta.2017.01.002 and in references in HSAR.
You have a lower layer of observations in an upper layer of known survey
locations, so a multi-level approach respects that structure, I think.
Probably an upper level MRF in mgcv::gam() or try the hgml package, or
INLA/BayesX for Bayesian approaches.

In these you fit a model with spatially structured random effects at the
upper level (the weights probably need to be symmetric), which then show
whether including this term improves the fit.

Hope this helps,

Roger

> Sorry I am a novice at spatial econometrics, this may be obvious to some ,
> so instead of testing for spatial correlation in my residuals, I should in
> essence test for spatial autocorrelation in my outcome variable, for
> instance in my analysis I am using a dummy variable of whether an
> individual is hiv positive or negative. And is there a specific way of
> using a moran.test() on a dummy variable?
>
>
> Kind Regards,
>
> Michael Chanda Chiseni
>
> Phd Candidate
>
> Department of Economic History
>
> Lund University
>
> Visiting address: Alfa 1, Scheelevägen 15 B, 22363 Lund
>
>
>
> *Africa is not poor, it is poorly managed (Ellen Johnson-Sirleaf ). *
>
>
>
>
>
>
> On Wed, Aug 12, 2020 at 12:55 PM Roger Bivand <[hidden email]> wrote:
>
>> On Tue, 11 Aug 2020, Chanda Chiseni wrote:
>>
>>> I am trying to perform moran's I test on residuals from by probit model
>>> using k-nearest neighbour weights, however i run into an error which i
>>> cant seem to find the solution anywhere online the error is
>>>
>>>> moran.test(residuals.glm(svyprobitest),knear2weight)
>>> Error in moran.test(residuals.glm(svyprobitest), knear2weight) :
>>>  objects of different length
>>>
>>>
>>> The codes I use to come up to this are below
>>>
>>>> library(foreign)
>>>> dhsanalysis2= read.dta("DHSSpatialreg11.dta")
>>> ##removing NAs in PSU
>>>> dhsanalysis3=subset(dhsanalysis2, !is.na(psu))
>>> ##generating a survey design
>>>> mydesighdhs= svydesign(ids =~psu, data =dhsanalysis3, weight = ~wgt,
>>> strata = ~v023,nest=TRUE)
>>> ##Defining coordinate colums
>>>> coordinates(dhsanalysis3)= c("longnum","latnum")
>>> #Defining Projection
>>>> proj4string(dhsanalysis3) <- CRS("+init=epsg:4326")
>>> ##binding longiude and latitude
>>>> lon2<- dhsanalysis3$longnum
>>>> lat2<- dhsanalysis3$latnum
>>>> coords<- cbind(lon2,lat2)
>>> ##Creating spatial weights based on the nearest neighbour
>>>> knear2= knearneigh(coords,k=2,longlat=T)
>>> Warning message:
>>> In knearneigh(coords, k = 2, longlat = T) :
>>>  knearneigh: identical points found ### I get a warning message on
>>> identical points found, is this a problem and how would i deal with this
>>
>> It usually indicates muddled thinking. You have more than one observation
>> at the same point, which could be an unintended duplicate (copy of the
>> same observation), or it could indicate that any spatial process has
>> multiple values at that point. In geostatistics, one would jitter the
>> points to introduce distance. In your case, this probably isn't households
>> living at one address point. All the observations at the same point will
>> get the same sets of neighbours. If there are many co-located
>> observations, k in k-nearest may be too small to include them all.
>>
>>>> knear2.nb= knn2nb(knear2)
>>>> knear2weight= nb2listw(knear2.nb, style="W",zero.policy=T)
>>>
>>> ##declaring categorical variables as factor variables
>>>> dhsanalysis3$hivpositive.f <- factor(dhsanalysis3$hivpositive)
>>>> dhsanalysis3$protestant.f <- factor(dhsanalysis3$protestant2)
>>>> dhsanalysis3$Married.f <- factor(dhsanalysis3$Married)
>>>> dhsanalysis3$female.f <- factor(dhsanalysis3$female)
>>>> dhsanalysis3$urban1.f <- factor(dhsanalysis3$urban1)
>>>> dhsanalysis3$river10kmdum.f <- factor(dhsanalysis3$river10kmdum)
>>>> dhsanalysis3$Explorer50kmdum.f <- factor(dhsanalysis3$Explorer50kmdum)
>>>> dhsanalysis3$Rail50kmdum.f <- factor(dhsanalysis3$Rail50kmdum)
>>>> dhsanalysis3$Province1.f <- factor(dhsanalysis3$Province1)
>>>> dhsanalysis3$WealthIndex.f <- factor(dhsanalysis3$WealthIndex)
>>>> dhsanalysis3$occupation2.f <- factor(dhsanalysis3$occupation2)
>>>> dhsanalysis3$highested.f <- factor(dhsanalysis3$highested)
>>>> protestant.f= dhsanalysis3$protestant.f
>>>> hivpositive.f=dhsanalysi3$hivpositive.f
>>>> Married.f=dhsanalysis3$Married.f
>>>> female.f=dhsanalysis3$female.f
>>>> urban1.f=dhsanalysis3$urban1.f
>>>> river10kmdum.f=dhsanalysis3$river10kmdum.f
>>>> Explorer50kmdum.f=dhsanalysis2$Explorer50kmdum.f
>>>> Province1.f=dhsanalysis3$Province1.f
>>>> WealthIndex.f=dhsanalysis3$WealthIndex.f
>>>> occupation2.f=dhsanalysis3$occupation2.f
>>>> highested.f=dhsanalysis3$highested.f
>>>> Age=dhsanalysis3$Age
>>>> Age2=dhsanalysis3$Age2
>>>> HIVKnowledge=dhsanalysis3$HIVKnowledge
>>>> churchkm=dhsanalysis3$churchkm
>>>> lnhospitalkm=dhsanalysis3$lnhospitalkm
>>>> lnElevationMean=dhsanalysis3$lnElevationMean
>>>
>>>> ###VARIABLES IN SURVEY DESIGN
>>>> mydesighdhs$hivpositive.f <- factor(dhsanalysis3$hivpositive)
>>>> mydesighdhs$protestant.f <- factor(dhsanalysis3$protestant2)
>>>> mydesighdhs$Married.f <- factor(dhsanalysis3$Married)
>>>> mydesighdhs$female.f <- factor(dhsanalysis3$female)
>>>> mydesighdhs$urban1.f <- factor(dhsanalysis3$urban1)
>>>> mydesighdhs$river10kmdum.f <- factor(dhsanalysis3$river10kmdum)
>>>> mydesighdhs$Explorer50kmdum.f <- factor(dhsanalysis3$Explorer50kmdum)
>>>> mydesighdhs$Rail50kmdum.f <- factor(dhsanalysis3$Rail50kmdum)
>>>> mydesighdhs$Province1.f <- factor(dhsanalysis3$Province1)
>>>> mydesighdhs$WealthIndex.f <- factor(dhsanalysis3$WealthIndex)
>>>> mydesighdhs$occupation2.f <- factor(dhsanalysis3$occupation2)
>>>> mydesighdhs$highested.f <- factor(dhsanalysis3$highested)
>>>> mydesighdhs$Age=Age
>>>> mydesighdhs$Age2=Age2
>>>> mydesighdhs$HIVKnowledge=HIVKnowledge
>>>> mydesighdhs$churchkm=churchkm
>>>> mydesighdhs$lnhospitalkm=lnhospitalkm
>>>> mydesighdhs$lnElevationMean=lnElevationMean
>>>
>>>> svyprobitest= svyglm(hivpositive.f~ churchkm +lnhospitalkm
>> +protestant.f+
>>> Age+
>>>
>> Age2+Married.f+female.f+urban1.f+river10kmdum.f+Explorer50kmdum.f+Rail50kmdum.f+lnElevationMean+Province1.f
>>> + WealthIndex.f + HIVKnowledge+ occupation2.f+
>>> highested.f,design=mydesighdhs,family = quasibinomial(link =
>>> "probit"),data=dhsanalysis3)
>>> Error in .subset2(x, i, exact = exact) : subscript out of bounds
>>
>> So you've an error anyway.
>>
>>>
>>> ## Iran my probit model without implementing the survey design in the
>> model
>>> just to see whether Moran test is working
>>>> svyprobitest2= glm(hivpositive.f~ churchkm +lnhospitalkm +protestant.f+
>>> Age+
>>>
>> Age2+Married.f+female.f+urban1.f+river10kmdum.f+Explorer50kmdum.f+Rail50kmdum.f+lnElevationMean+Province1.f
>>> + WealthIndex.f + HIVKnowledge+ occupation2.f+ highested.f,family =
>>> quasibinomial(link = "probit"),data=dhsanalysis3)
>>> #Moran test
>>>> moran.test(residuals.glm(svyprobitest),knear2weight)
>>> Error in moran.test(residuals.glm(svyprobitest), knear2weight) :
>>>  objects of different length
>>>
>>
>> Testing residuals using moran.test() is usually wrong, as the expectation
>> and variance of the statistic are based on a null model (intercept only),
>> not a model with covariates.
>>
>> What are length(residuals.glm(svyprobitest)) and length(knear2.nb)? Did
>> glm drop observations with missing values? If so, you should subset both
>> the data submitted to glm() and the neighbour object so that they match.
>>
>> Hope this helps,
>>
>> Roger
>>
>>>
>>> Kind Regards,
>>>
>>> Michael Chanda Chiseni
>>>
>>> Phd Candidate
>>>
>>> Department of Economic History
>>>
>>> Lund University
>>>
>>> Visiting address: Alfa 1, Scheelevägen 15 B, 22363 Lund
>>>
>>>
>>>
>>> *Africa is not poor, it is poorly managed (Ellen Johnson-Sirleaf ). *
>>>
>>>       [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> [hidden email]
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; e-mail: [hidden email]
>> https://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway