Hi all,
I would like to ask whether some you conducted bivariate spatial correlation in R. I know the bivariate Moran's I is not implemented in the spdep library. I left a question on SO but also wanted to hear if anyone if the mainlist have come across this. https://stackoverflow.com/questions/45177590/mapofbivariatespatialcorrelationinrbivariatelisa I also know Roger Bivand has implemented the L index proposed by Lee (2001) in spdep, but I'm not I'm not sure whether the L local correlation coefficients can be interpreted the same way as the local Moran's I coefficients. I couldn't find any reference commenting on this issue. I would very much appreciate your thoughts this. best, Rafael HM Pereira http://urbandemographics.blogspot.com [[alternative HTML version deleted]] _______________________________________________ RsigGeo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/rsiggeo 
Administrator

On Mon, 24 Jul 2017, Rafael Pereira wrote:
> Hi all, > > I would like to ask whether some you conducted bivariate spatial > correlation in R. > > I know the bivariate Moran's I is not implemented in the spdep library. > I left a question on SO but also wanted to hear if anyone if the mainlist > have come across this. > https://stackoverflow.com/questions/45177590/mapofbivariatespatialcorrelationinrbivariatelisa > > I also know Roger Bivand has implemented the L index proposed by Lee (2001) > in spdep, but I'm not I'm not sure whether the L local correlation > coefficients can be interpreted the same way as the local Moran's I > coefficients. I couldn't find any reference commenting on this issue. I > would very much appreciate your thoughts this. example makes fundamental mistakes. The code in spdep by Virgilio GómezRubio is for uni and bivariate L, and produces point values of local L. This isn't the main problem, which is rather that you are not taking account of the underlying population counts, nor shrinking any estimates of significance to accommodate population sizes. Population sizes vary from 0 to 11858, with the lower quartile at 3164 and upper 5698: plot(ecdf(oregon.tract$pop2000)). Should you be comparing rates in stead? These are also compositional variables (sum to pop2000, or 1 if rates) with the other missing components. You would probably be better served by tools examining spatial segregation, such as for example the seg package. The 0 count populations cause problems for an unofficial alternative, the black/white ratio: oregon.tract1 < oregon.tract[oregon.tract$white > 0,] oregon.tract1$rat < oregon.tract1$black/oregon.tract1$white nb < poly2nb(oregon.tract1) lw < nb2listw(nb) which should still be adjusted by weighting: lm0 < lm(rat ~ 1, weights=pop2000, data=oregon.tract1) I'm not advising this, but running localmoran.sad on this model output yields SAD pvalues < 0.05 after FDR correction only in contiguous tracts on the Washington state line in Portland between the Columbia and Williamette rivers. So do look at the variables you are using before rushing into things. Hope this clarifies, Roger > > best, > > Rafael HM Pereira > http://urbandemographics.blogspot.com > > [[alternative HTML version deleted]] > > _______________________________________________ > RsigGeo mailing list > [hidden email] > https://stat.ethz.ch/mailman/listinfo/rsiggeo > Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N5045 Bergen, Norway. voice: +47 55 95 93 55; email: [hidden email] EditorinChief of The R Journal, https://journal.rproject.org/index.html http://orcid.org/0000000323926140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en _______________________________________________ RsigGeo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/rsiggeo
Roger Bivand
Department of Economics Norwegian School of Economics Helleveien 30 N5045 Bergen, Norway 
Roger,
This example was provided only for the sake or making the code easily reproducible for others and I'm more interested in how the bivariate Moran could be implemented in R, but your comments are very much welcomed and I've made changes to the question. My actual case study looks at bivariate spatial correlation between (a) average household income per capita and (b) proportion of jobs in the city that are accessible under 60 minutes by transit. I don't think I could use rates in this case but I will normalize the variables using scale(data$variable). best, Rafael H M Pereira On Mon, Jul 24, 2017 at 7:56 PM, Roger Bivand <[hidden email]> wrote: > On Mon, 24 Jul 2017, Rafael Pereira wrote: > > Hi all, >> >> I would like to ask whether some you conducted bivariate spatial >> correlation in R. >> >> I know the bivariate Moran's I is not implemented in the spdep library. >> I left a question on SO but also wanted to hear if anyone if the mainlist >> have come across this. >> https://stackoverflow.com/questions/45177590/mapofbivariat >> espatialcorrelationinrbivariatelisa >> >> I also know Roger Bivand has implemented the L index proposed by Lee >> (2001) >> in spdep, but I'm not I'm not sure whether the L local correlation >> coefficients can be interpreted the same way as the local Moran's I >> coefficients. I couldn't find any reference commenting on this issue. I >> would very much appreciate your thoughts this. >> > > In the SO question, and in the followup, your presumably throwaway > example makes fundamental mistakes. The code in spdep by Virgilio > GómezRubio is for uni and bivariate L, and produces point values of local > L. This isn't the main problem, which is rather that you are not taking > account of the underlying population counts, nor shrinking any estimates of > significance to accommodate population sizes. Population sizes vary from 0 > to 11858, with the lower quartile at 3164 and upper 5698: > plot(ecdf(oregon.tract$pop2000)). Should you be comparing rates in stead? > These are also compositional variables (sum to pop2000, or 1 if rates) with > the other missing components. You would probably be better served by tools > examining spatial segregation, such as for example the seg package. > > The 0 count populations cause problems for an unofficial alternative, the > black/white ratio: > > oregon.tract1 < oregon.tract[oregon.tract$white > 0,] > oregon.tract1$rat < oregon.tract1$black/oregon.tract1$white > nb < poly2nb(oregon.tract1) > lw < nb2listw(nb) > > which should still be adjusted by weighting: > > lm0 < lm(rat ~ 1, weights=pop2000, data=oregon.tract1) > > I'm not advising this, but running localmoran.sad on this model output > yields SAD pvalues < 0.05 after FDR correction only in contiguous tracts > on the Washington state line in Portland between the Columbia and > Williamette rivers. So do look at the variables you are using before > rushing into things. > > Hope this clarifies, > > Roger > > >> best, >> >> Rafael HM Pereira >> http://urbandemographics.blogspot.com >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> RsigGeo mailing list >> [hidden email] >> https://stat.ethz.ch/mailman/listinfo/rsiggeo >> >> >  > Roger Bivand > Department of Economics, Norwegian School of Economics, > Helleveien 30, N5045 Bergen, Norway. > voice: +47 55 95 93 55; email: [hidden email] > EditorinChief of The R Journal, https://journal.rproject.org/index.html > http://orcid.org/0000000323926140 > https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en [[alternative HTML version deleted]] _______________________________________________ RsigGeo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/rsiggeo 
Administrator

On Wed, 26 Jul 2017, Rafael Pereira wrote:
> Roger, > > This example was provided only for the sake or making the code easily > reproducible for others and I'm more interested in how the bivariate Moran > could be implemented in R, but your comments are very much welcomed and > I've made changes to the question. > > My actual case study looks at bivariate spatial correlation between (a) > average household income per capita and (b) proportion of jobs in the city > that are accessible under 60 minutes by transit. I don't think I could use > rates in this case but I will normalize the variables using > scale(data$variable). subset, or using a builtin data set. My guess is that you do not need bivariate spatial correlation at all, but rather a spatial regression. The "causal" variable would then the the proportion of jobs accessible within 60 minutes by transit, though this is extremely blunt, and lots of other covariates (demography, etc.) impact average household income per capita (per block/tract?). Since there are many missing variables in your specification, any spatial correlation would be most closely associated with them (demography, housing costs, education, etc.), and the choice of units of measurement would dominate the outcome. This is also why bivariate spatial correlation is seldom a good idea, I believe. It can be done, but most likely shouldn't, unless it can be motivated properly. By the way, the weighted and FDRcorrected SAD local Moran's I pvalues of the black/white ratio for Oregon (your toy example) did deliver the goods  if you zoom in in mapview::mapview, you can see that it detects a rate hotspot between the rivers. Roger > > best, > > Rafael H M Pereira > > On Mon, Jul 24, 2017 at 7:56 PM, Roger Bivand <[hidden email]> wrote: > >> On Mon, 24 Jul 2017, Rafael Pereira wrote: >> >> Hi all, >>> >>> I would like to ask whether some you conducted bivariate spatial >>> correlation in R. >>> >>> I know the bivariate Moran's I is not implemented in the spdep library. >>> I left a question on SO but also wanted to hear if anyone if the mainlist >>> have come across this. >>> https://stackoverflow.com/questions/45177590/mapofbivariat >>> espatialcorrelationinrbivariatelisa >>> >>> I also know Roger Bivand has implemented the L index proposed by Lee >>> (2001) >>> in spdep, but I'm not I'm not sure whether the L local correlation >>> coefficients can be interpreted the same way as the local Moran's I >>> coefficients. I couldn't find any reference commenting on this issue. I >>> would very much appreciate your thoughts this. >>> >> >> In the SO question, and in the followup, your presumably throwaway >> example makes fundamental mistakes. The code in spdep by Virgilio >> GómezRubio is for uni and bivariate L, and produces point values of local >> L. This isn't the main problem, which is rather that you are not taking >> account of the underlying population counts, nor shrinking any estimates of >> significance to accommodate population sizes. Population sizes vary from 0 >> to 11858, with the lower quartile at 3164 and upper 5698: >> plot(ecdf(oregon.tract$pop2000)). Should you be comparing rates in stead? >> These are also compositional variables (sum to pop2000, or 1 if rates) with >> the other missing components. You would probably be better served by tools >> examining spatial segregation, such as for example the seg package. >> >> The 0 count populations cause problems for an unofficial alternative, the >> black/white ratio: >> >> oregon.tract1 < oregon.tract[oregon.tract$white > 0,] >> oregon.tract1$rat < oregon.tract1$black/oregon.tract1$white >> nb < poly2nb(oregon.tract1) >> lw < nb2listw(nb) >> >> which should still be adjusted by weighting: >> >> lm0 < lm(rat ~ 1, weights=pop2000, data=oregon.tract1) >> >> I'm not advising this, but running localmoran.sad on this model output >> yields SAD pvalues < 0.05 after FDR correction only in contiguous tracts >> on the Washington state line in Portland between the Columbia and >> Williamette rivers. So do look at the variables you are using before >> rushing into things. >> >> Hope this clarifies, >> >> Roger >> >> >>> best, >>> >>> Rafael HM Pereira >>> http://urbandemographics.blogspot.com >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> RsigGeo mailing list >>> [hidden email] >>> https://stat.ethz.ch/mailman/listinfo/rsiggeo >>> >>> >>  >> Roger Bivand >> Department of Economics, Norwegian School of Economics, >> Helleveien 30, N5045 Bergen, Norway. >> voice: +47 55 95 93 55; email: [hidden email] >> EditorinChief of The R Journal, https://journal.rproject.org/index.html >> http://orcid.org/0000000323926140 >> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en > > [[alternative HTML version deleted]] > > _______________________________________________ > RsigGeo mailing list > [hidden email] > https://stat.ethz.ch/mailman/listinfo/rsiggeo Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N5045 Bergen, Norway. voice: +47 55 95 93 55; email: [hidden email] EditorinChief of The R Journal, https://journal.rproject.org/index.html http://orcid.org/0000000323926140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en _______________________________________________ RsigGeo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/rsiggeo
Roger Bivand
Department of Economics Norwegian School of Economics Helleveien 30 N5045 Bergen, Norway 
Hi all,
here is a reproducible example to calculate in R bivariate Moran's I and LISA clusters. This example is based on a this answer provided in SO* and it uses a toy model of my data. The R script and the shape file with the data are available on this link. https://gist.github.com/rafapereirabr/5348193abf779625f5e8c5090776a228 What this example does is to estimate the spatial association between household income per capita and the gains in accessibility to jobs. The aim is to analyze who benefits the recent changes in the transport system in terms of access to jobs. So the idea is not to find causal relationships, but spatial association between areas of high/low income who had high/low gains in accessibility. The variables in the data show info on the proportion of jobs accessible in both years 2014 and 2017 (access2014, access2017) and the difference between the two years in percentage points (diffaccess). Roger, I know you have shown to be a bit sceptical about this application of bivariate Moran's I. Do you still think a spatial regression would be more appropriate? Also, I would be glad to hear if others have comments on the code. This function is not implemented in any package so it would be great to have some feedback. Rafael H M Pereira urbandemographics.blogspot.com * https://stackoverflow.com/questions/45177590/mapofbivariatespatialcorrelationinrbivariatelisa On Wed, Jul 26, 2017 at 11:07 AM, Roger Bivand <[hidden email]> wrote: > On Wed, 26 Jul 2017, Rafael Pereira wrote: > > Roger, >> >> This example was provided only for the sake or making the code easily >> reproducible for others and I'm more interested in how the bivariate >> Moran >> could be implemented in R, but your comments are very much welcomed and >> I've made changes to the question. >> >> My actual case study looks at bivariate spatial correlation between (a) >> average household income per capita and (b) proportion of jobs in the city >> that are accessible under 60 minutes by transit. I don't think I could use >> rates in this case but I will normalize the variables using >> scale(data$variable). >> > > Please provide a reproducible example, either with a link to a data > subset, or using a builtin data set. My guess is that you do not need > bivariate spatial correlation at all, but rather a spatial regression. > > The "causal" variable would then the the proportion of jobs accessible > within 60 minutes by transit, though this is extremely blunt, and lots of > other covariates (demography, etc.) impact average household income per > capita (per block/tract?). Since there are many missing variables in your > specification, any spatial correlation would be most closely associated > with them (demography, housing costs, education, etc.), and the choice of > units of measurement would dominate the outcome. > > This is also why bivariate spatial correlation is seldom a good idea, I > believe. It can be done, but most likely shouldn't, unless it can be > motivated properly. > > By the way, the weighted and FDRcorrected SAD local Moran's I pvalues of > the black/white ratio for Oregon (your toy example) did deliver the goods  > if you zoom in in mapview::mapview, you can see that it detects a rate > hotspot between the rivers. > > Roger > > > >> best, >> >> Rafael H M Pereira >> >> On Mon, Jul 24, 2017 at 7:56 PM, Roger Bivand <[hidden email]> >> wrote: >> >> On Mon, 24 Jul 2017, Rafael Pereira wrote: >>> >>> Hi all, >>> >>>> >>>> I would like to ask whether some you conducted bivariate spatial >>>> correlation in R. >>>> >>>> I know the bivariate Moran's I is not implemented in the spdep library. >>>> I left a question on SO but also wanted to hear if anyone if the >>>> mainlist >>>> have come across this. >>>> https://stackoverflow.com/questions/45177590/mapofbivariat >>>> espatialcorrelationinrbivariatelisa >>>> >>>> I also know Roger Bivand has implemented the L index proposed by Lee >>>> (2001) >>>> in spdep, but I'm not I'm not sure whether the L local correlation >>>> coefficients can be interpreted the same way as the local Moran's I >>>> coefficients. I couldn't find any reference commenting on this issue. I >>>> would very much appreciate your thoughts this. >>>> >>>> >>> In the SO question, and in the followup, your presumably throwaway >>> example makes fundamental mistakes. The code in spdep by Virgilio >>> GómezRubio is for uni and bivariate L, and produces point values of >>> local >>> L. This isn't the main problem, which is rather that you are not taking >>> account of the underlying population counts, nor shrinking any estimates >>> of >>> significance to accommodate population sizes. Population sizes vary from >>> 0 >>> to 11858, with the lower quartile at 3164 and upper 5698: >>> plot(ecdf(oregon.tract$pop2000)). Should you be comparing rates in >>> stead? >>> These are also compositional variables (sum to pop2000, or 1 if rates) >>> with >>> the other missing components. You would probably be better served by >>> tools >>> examining spatial segregation, such as for example the seg package. >>> >>> The 0 count populations cause problems for an unofficial alternative, the >>> black/white ratio: >>> >>> oregon.tract1 < oregon.tract[oregon.tract$white > 0,] >>> oregon.tract1$rat < oregon.tract1$black/oregon.tract1$white >>> nb < poly2nb(oregon.tract1) >>> lw < nb2listw(nb) >>> >>> which should still be adjusted by weighting: >>> >>> lm0 < lm(rat ~ 1, weights=pop2000, data=oregon.tract1) >>> >>> I'm not advising this, but running localmoran.sad on this model output >>> yields SAD pvalues < 0.05 after FDR correction only in contiguous tracts >>> on the Washington state line in Portland between the Columbia and >>> Williamette rivers. So do look at the variables you are using before >>> rushing into things. >>> >>> Hope this clarifies, >>> >>> Roger >>> >>> >>> best, >>>> >>>> Rafael HM Pereira >>>> http://urbandemographics.blogspot.com >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> RsigGeo mailing list >>>> [hidden email] >>>> https://stat.ethz.ch/mailman/listinfo/rsiggeo >>>> >>>> >>>>  >>> Roger Bivand >>> Department of Economics, Norwegian School of Economics, >>> Helleveien 30, N5045 Bergen, Norway. >>> voice: +47 55 95 93 55; email: [hidden email] >>> EditorinChief of The R Journal, https://journal.rproject.org/ >>> index.html >>> http://orcid.org/0000000323926140 >>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en >>> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> RsigGeo mailing list >> [hidden email] >> https://stat.ethz.ch/mailman/listinfo/rsiggeo >> > >  > Roger Bivand > Department of Economics, Norwegian School of Economics, > Helleveien 30, N5045 Bergen, Norway. > voice: +47 55 95 93 55; email: [hidden email] > EditorinChief of The R Journal, https://journal.rproject.org/index.html > http://orcid.org/0000000323926140 > https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en > [[alternative HTML version deleted]] _______________________________________________ RsigGeo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/rsiggeo 
Administrator

Thanks, I'll get back when able, offline now. What are the units of observation, and are aggregate household incomes observed only once?
Roger Roger Bivand Norwegian School of Economics Bergen, Norway Fra: Rafael Pereira Sendt: søndag 30. juli, 00.39 Emne: Re: [RsigGeo] bivariate spatial correlation in R Kopi: Rogério Barbosa, [hidden email] Hi all, here is a reproducible example to calculate in R bivariate Moran's I and LISA clusters. This example is based on a this answer provided in SO* and it uses a toy model of my data. The R script and the shape file with the data are available on this link. https://gist.github.com/rafapereirabr/5348193abf779625f5e8c5090776a228 What this example does is to estimate the spatial association between household income per capita and the gains in accessibility to jobs. The aim is to analyze who benefits the recent changes in the transport system in terms of access to jobs. So the idea is not to find causal relationships, but spatial association between areas of high/low income who had high/low gains in accessibility. The variables in the data show info on the proportion of jobs accessible in both years 2014 and 2017 (access2014, access2017) and the difference between the two years in percentage points (diffaccess). Roger, I know you have shown to be a bit sceptical about this application of bivariate Moran's I. Do you still think a spatial regression would be more appropriate? Also, I would be glad to hear if others have comments on the code. This function is not implemented in any package so it would be great to have some feedback. Rafael H M Pereira urbandemographics.blogspot.com * https://stackoverflow.com/questions/45177590/mapofbivariatespatialcorrelationinrbivariatelisa On Wed, Jul 26, 2017 at 11:07 AM, Roger Bivand wrote: > On Wed, 26 Jul 2017, Rafael Pereira wrote: > > Roger, >> >> This example was provided only for the sake or making the code easily >> reproducible for others and I'm more interested in how the bivariate >> Moran >> could be implemented in R, but your comments are very much welcomed and >> I've made changes to the question. >> >> My actual case study looks at bivariate spatial correlation between (a) >> average household income per capita and (b) proportion of jobs in the city >> that are accessible under 60 minutes by transit. I don't think I could use >> rates in this case but I will normalize the variables using >> scale(data$variable). >> > > Please provide a reproducible example, either with a link to a data > subset, or using a builtin data set. My guess is that you do not need > bivariate spatial correlation at all, but rather a spatial regression. > > The "causal" variable would then the the proportion of jobs accessible > within 60 minutes by transit, though this is extremely blunt, and lots of > other covariates (demography, etc.) impact average household income per > capita (per block/tract?). Since there are many missing variables in your > specification, any spatial correlation would be most closely associated > with them (demography, housing costs, education, etc.), and the choice of > units of measurement would dominate the outcome. > > This is also why bivariate spatial correlation is seldom a good idea, I > believe. It can be done, but most likely shouldn't, unless it can be > motivated properly. > > By the way, the weighted and FDRcorrected SAD local Moran's I pvalues of > the black/white ratio for Oregon (your toy example) did deliver the goods  > if you zoom in in mapview::mapview, you can see that it detects a rate > hotspot between the rivers. > > Roger > > > >> best, >> >> Rafael H M Pereira >> >> On Mon, Jul 24, 2017 at 7:56 PM, Roger Bivand >> wrote: >> >> On Mon, 24 Jul 2017, Rafael Pereira wrote: >>> >>> Hi all, >>> >>>> >>>> I would like to ask whether some you conducted bivariate spatial >>>> correlation in R. >>>> >>>> I know the bivariate Moran's I is not implemented in the spdep library. >>>> I left a question on SO but also wanted to hear if anyone if the >>>> mainlist >>>> have come across this. >>>> https://stackoverflow.com/questions/45177590/mapofbivariat >>>> espatialcorrelationinrbivariatelisa >>>> >>>> I also know Roger Bivand has implemented the L index proposed by Lee >>>> (2001) >>>> in spdep, but I'm not I'm not sure whether the L local correlation >>>> coefficients can be interpreted the same way as the local Moran's I >>>> coefficients. I couldn't find any reference commenting on this issue. I >>>> would very much appreciate your thoughts this. >>>> >>>> >>> In the SO question, and in the followup, your presumably throwaway >>> example makes fundamental mistakes. The code in spdep by Virgilio >>> GómezRubio is for uni and bivariate L, and produces point values of >>> local >>> L. This isn't the main problem, which is rather that you are not taking >>> account of the underlying population counts, nor shrinking any estimates >>> of >>> significance to accommodate population sizes. Population sizes vary from >>> 0 >>> to 11858, with the lower quartile at 3164 and upper 5698: >>> plot(ecdf(oregon.tract$pop2000)). Should you be comparing rates in >>> stead? >>> These are also compositional variables (sum to pop2000, or 1 if rates) >>> with >>> the other missing components. You would probably be better served by >>> tools >>> examining spatial segregation, such as for example the seg package. >>> >>> The 0 count populations cause problems for an unofficial alternative, the >>> black/white ratio: >>> >>> oregon.tract1 0,] >>> oregon.tract1$rat >> nb >> lw >> >>> which should still be adjusted by weighting: >>> >>> lm0 >> >>> I'm not advising this, but running localmoran.sad on this model output >>> yields SAD pvalues < 0.05 after FDR correction only in contiguous tracts >>> on the Washington state line in Portland between the Columbia and >>> Williamette rivers. So do look at the variables you are using before >>> rushing into things. >>> >>> Hope this clarifies, >>> >>> Roger >>> >>> >>> best, >>>> >>>> Rafael HM Pereira >>>> http://urbandemographics.blogspot.com >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> RsigGeo mailing list >>>> [hidden email] >>>> https://stat.ethz.ch/mailman/listinfo/rsiggeo >>>> >>>> >>>>  >>> Roger Bivand >>> Department of Economics, Norwegian School of Economics, >>> Helleveien 30, N5045 Bergen, Norway. >>> voice: +47 55 95 93 55; email: [hidden email] >>> EditorinChief of The R Journal, https://journal.rproject.org/ >>> index.html >>> http://orcid.org/0000000323926140 >>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en >>> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> RsigGeo mailing list >> [hidden email] >> https://stat.ethz.ch/mailman/listinfo/rsiggeo >> > >  > Roger Bivand > Department of Economics, Norwegian School of Economics, > Helleveien 30, N5045 Bergen, Norway. > voice: +47 55 95 93 55; email: [hidden email] > EditorinChief of The R Journal, https://journal.rproject.org/index.html > http://orcid.org/0000000323926140 > https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en > [[alternative HTML version deleted]] _______________________________________________ RsigGeo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/rsiggeo [[alternative HTML version deleted]] _______________________________________________ RsigGeo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/rsiggeo
Roger Bivand
Department of Economics Norwegian School of Economics Helleveien 30 N5045 Bergen, Norway 
Roger,
Population and income data are single point in time and come from the 2010 Census. Accessibility variables in 2014 and 2017 show the proportion of jobs accessible by public transport under 60 minutes. The variable diffaccess shows the difference between these two. It's in percentage points (access2017  access2014) best, Rafael H M Pereira urbandemographics.blogspot.com On Sun, Jul 30, 2017 at 7:41 AM, Roger Bivand <[hidden email]> wrote: > Thanks, I'll get back when able, offline now. What are the units of > observation, and are aggregate household incomes observed only once? > > Roger > > Roger Bivand > Norwegian School of Economics > Bergen, Norway > > > > Fra: Rafael Pereira > Sendt: søndag 30. juli, 00.39 > Emne: Re: [RsigGeo] bivariate spatial correlation in R > Kopi: Rogério Barbosa, [hidden email] > > > Hi all, here is a reproducible example to calculate in R bivariate Moran's > I and LISA clusters. This example is based on a this answer provided in SO* > and it uses a toy model of my data. The R script and the shape file with > the data are available on this link. https://gist.github.com/ > rafapereirabr/5348193abf779625f5e8c5090776a228 What this example does is > to estimate the spatial association between household income per capita and > the gains in accessibility to jobs. The aim is to analyze who benefits the > recent changes in the transport system in terms of access to jobs. So the > idea is not to find causal relationships, but spatial association between > areas of high/low income who had high/low gains in accessibility. The > variables in the data show info on the proportion of jobs accessible in > both years 2014 and 2017 (access2014, access2017) and the difference > between the two years in percentage points (diffaccess). Roger, I know you > have shown to be a bit sceptical about this application of bivariate > Moran's I. Do you still think a spatial regression would be more > appropriate? Also, I would be glad to hear if others have comments on the > code. This function is not implemented in any package so it would be great > to have some feedback. Rafael H M Pereira urbandemographics.blogspot.com > * https://stackoverflow.com/questions/45177590/mapof > bivariatespatialcorrelationinrbivariatelisa On Wed, Jul 26, 2017 at > 11:07 AM, Roger Bivand wrote: > On Wed, 26 Jul 2017, Rafael Pereira wrote: > > > Roger, >> >> This example was provided only for the sake or making the > code easily >> reproducible for others and I'm more interested in how the > bivariate >> Moran >> could be implemented in R, but your comments are > very much welcomed and >> I've made changes to the question. >> >> My > actual case study looks at bivariate spatial correlation between (a) >> > average household income per capita and (b) proportion of jobs in the city > >> that are accessible under 60 minutes by transit. I don't think I could > use >> rates in this case but I will normalize the variables using >> > scale(data$variable). >> > > Please provide a reproducible example, either > with a link to a data > subset, or using a builtin data set. My guess is > that you do not need > bivariate spatial correlation at all, but rather a > spatial regression. > > The "causal" variable would then the the proportion > of jobs accessible > within 60 minutes by transit, though this is extremely > blunt, and lots of > other covariates (demography, etc.) impact average > household income per > capita (per block/tract?). Since there are many > missing variables in your > specification, any spatial correlation would be > most closely associated > with them (demography, housing costs, education, > etc.), and the choice of > units of measurement would dominate the outcome. > > > This is also why bivariate spatial correlation is seldom a good idea, > I > believe. It can be done, but most likely shouldn't, unless it can be > > motivated properly. > > By the way, the weighted and FDRcorrected SAD > local Moran's I pvalues of > the black/white ratio for Oregon (your toy > example) did deliver the goods  > if you zoom in in mapview::mapview, you > can see that it detects a rate > hotspot between the rivers. > > Roger > > > > >> best, >> >> Rafael H M Pereira >> >> On Mon, Jul 24, 2017 at 7:56 PM, > Roger Bivand >> wrote: >> >> On Mon, 24 Jul 2017, Rafael Pereira wrote: >>> > >>> Hi all, >>> >>>> >>>> I would like to ask whether some you conducted > bivariate spatial >>>> correlation in R. >>>> >>>> I know the bivariate > Moran's I is not implemented in the spdep library. >>>> I left a question > on SO but also wanted to hear if anyone if the >>>> mainlist >>>> have come > across this. >>>> https://stackoverflow.com/questions/45177590/mapof > bivariat >>>> espatialcorrelationinrbivariatelisa >>>> >>>> I also > know Roger Bivand has implemented the L index proposed by Lee >>>> (2001) > >>>> in spdep, but I'm not I'm not sure whether the L local correlation > >>>> coefficients can be interpreted the same way as the local Moran's I > >>>> coefficients. I couldn't find any reference commenting on this issue. > I >>>> would very much appreciate your thoughts this. >>>> >>>> >>> In the > SO question, and in the followup, your presumably throwaway >>> example > makes fundamental mistakes. The code in spdep by Virgilio >>> GómezRubio > is for uni and bivariate L, and produces point values of >>> local >>> L. > This isn't the main problem, which is rather that you are not taking >>> > account of the underlying population counts, nor shrinking any estimates > >>> of >>> significance to accommodate population sizes. Population sizes > vary from >>> 0 >>> to 11858, with the lower quartile at 3164 and upper > 5698: >>> plot(ecdf(oregon.tract$pop2000)). Should you be comparing rates > in >>> stead? >>> These are also compositional variables (sum to pop2000, > or 1 if rates) >>> with >>> the other missing components. You would > probably be better served by >>> tools >>> examining spatial segregation, > such as for example the seg package. >>> >>> The 0 count populations cause > problems for an unofficial alternative, the >>> black/white ratio: >>> >>> > oregon.tract1 0,] >>> oregon.tract1$rat >> nb >> lw >> >>> which should > still be adjusted by weighting: >>> >>> lm0 >> >>> I'm not advising this, > but running localmoran.sad on this model output >>> yields SAD pvalues < > 0.05 after FDR correction only in contiguous tracts >>> on the Washington > state line in Portland between the Columbia and >>> Williamette rivers. So > do look at the variables you are using before >>> rushing into things. >>> > >>> Hope this clarifies, >>> >>> Roger >>> >>> >>> best, >>>> >>>> Rafael > HM Pereira >>>> http://urbandemographics.blogspot.com >>>> >>>> > [[alternative HTML version deleted]] >>>> >>>> > _______________________________________________ >>>> RsigGeo mailing > list >>>> [hidden email] >>>> https://stat.ethz.ch/mailman/ > listinfo/rsiggeo >>>> >>>> >>>>  >>> Roger Bivand >>> Department of > Economics, Norwegian School of Economics, >>> Helleveien 30, N5045 Bergen, > Norway. >>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; email: > [hidden email] >>> EditorinChief of The R Journal, > https://journal.rproject.org/ >>> index.html >>> > http://orcid.org/0000000323926140 >>> https://scholar.google.no/ > citations?user=AWeghB0AAAAJ&hl=en >>> >> >> [[alternative HTML version > deleted]] >> >> _______________________________________________ >> > RsigGeo mailing list >> [hidden email] >> > https://stat.ethz.ch/mailman/listinfo/rsiggeo >> > >  > Roger Bivand > > Department of Economics, Norwegian School of Economics, > Helleveien 30, > N5045 Bergen, Norway. > voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; > email: [hidden email] > EditorinChief of The R Journal, > https://journal.rproject.org/index.html > http://orcid.org/00000003 > 23926140 > https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en > > [[alternative HTML version deleted]] _______________________________________________ > RsigGeo mailing list [hidden email] > https://stat.ethz.ch/mailman/listinfo/rsiggeo > > [[alternative HTML version deleted]] _______________________________________________ RsigGeo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/rsiggeo 
Administrator

Rafael,
I'm sorry, but there is no way you can logically "analyze who benefits the recent changes in the transport system in terms of access to jobs" from the data you have. Even if you had aggregate household income data for 2014 and 2017 (not for 2010 only), you would not know whether wealthier families had not dispaced poorer families as accessibility improved. You need individual data, either survey or register, preferably panel, to show that changes in accessibility change the economic welfare of households controlling for movement of households. The timestamps on the data make any attempt to do this very risky; the real findings from a hypothetical sureveybased panel might be completely different, especially if poorer households were displaced (also indirectly, through rising house prices driven by improved accessibility). Gauging the welfare effects of transport investments is very hard to instrument. The closest I could get was to map deciles of the change in access (more negatives than positives) and compare the aspatial income distributions: library(spdep) library(rgdal) map < readOGR(dsn=".", layer="test_map") library(classInt) cI < classIntervals(map$diffaccess, n=10, style="quantile") library(RColorBrewer) ybrpal < brewer.pal(6, "YlOrBr") fC < findColours(cI, ybrpal) qtm(map, fill="diffaccess", fill.breaks=cI$brks, format="Europe2") map$faccess < factor(findInterval(map$diffaccess, cI$brks, all.inside=TRUE), labels=names(attr(fC, "table"))) qtm(map, fill="diffaccess", fill.breaks=cI$brks, format="Europe2") acc_income < split(map$income, map$faccess) do.call("rbind", lapply(acc_income, summary)) dens < lapply(acc_income, density) plot(1, ylab="", xlab="", type="n", xlim=c(2000, 11000), ylim=c(0, 0.002)) for (i in seq(along=dens)) lines(dens[[i]], col=i) legend("topright", legend=names(dens), col=1:length(dens), lty=1, bty="n") These density curves really do not suggest any clear relationship, other than that some areas with increased accessibility had higher incomes in 2010. You can examine the reverse relationship  were aggregate areas that were more wealthy in 2010 able to attract more changes to accessibility? The answer seems to be yes, they were able to do this: nb < poly2nb(map) lw < nb2listw(nb, style = "W", zero.policy = T) lm.morantest(lm(diffaccess ~ I(income/1000), map), lw) # SLX model summary(lmSLX(diffaccess ~ I(income/1000), map, lw)) lm.morantest(lmSLX(diffaccess ~ I(income/1000), map, lw), lw) # Spatial Durbin error model  SDEM obj < errorsarlm(diffaccess ~ I(income/1000), map, lw, etype="emixed") summary(impacts(obj)) summary(impacts(lmSLX(diffaccess ~ I(income/1000), map, lw))) LR.sarlm(lmSLX(diffaccess ~ I(income/1000), map, lw), obj) It would be possible to run lm.morantest.sad() on the output of the SDEM model taking global spatial autocorrelation into account. If you need that, follow up in this thread. Bivariate Moran's I should not be used in this case, but could be used in other cases  use in change over time is troubling because randomisation will not be a good guide as t=1 and t=2 are subject to temporal as well as spatial autocorrelation, so you cannot use permutation bootstrap to find a usable measure of significance. Hope this clarifies, and thanks for the code. Roger On Sun, 30 Jul 2017, Rafael Pereira wrote: > Roger, > > Population and income data are single point in time and come from the 2010 > Census. > > Accessibility variables in 2014 and 2017 show the proportion of jobs > accessible by public transport under 60 minutes. The variable diffaccess > shows the difference between these two. It's in percentage points > (access2017  access2014) > > best, > > Rafael H M Pereira > urbandemographics.blogspot.com > > On Sun, Jul 30, 2017 at 7:41 AM, Roger Bivand <[hidden email]> wrote: > >> Thanks, I'll get back when able, offline now. What are the units of >> observation, and are aggregate household incomes observed only once? >> >> Roger >> >> Roger Bivand >> Norwegian School of Economics >> Bergen, Norway >> >> >> >> Fra: Rafael Pereira >> Sendt: søndag 30. juli, 00.39 >> Emne: Re: [RsigGeo] bivariate spatial correlation in R >> Kopi: Rogério Barbosa, [hidden email] >> >> >> Hi all, here is a reproducible example to calculate in R bivariate Moran's >> I and LISA clusters. This example is based on a this answer provided in SO* >> and it uses a toy model of my data. The R script and the shape file with >> the data are available on this link. https://gist.github.com/ >> rafapereirabr/5348193abf779625f5e8c5090776a228 What this example does is >> to estimate the spatial association between household income per capita and >> the gains in accessibility to jobs. The aim is to analyze who benefits the >> recent changes in the transport system in terms of access to jobs. So the >> idea is not to find causal relationships, but spatial association between >> areas of high/low income who had high/low gains in accessibility. The >> variables in the data show info on the proportion of jobs accessible in >> both years 2014 and 2017 (access2014, access2017) and the difference >> between the two years in percentage points (diffaccess). Roger, I know you >> have shown to be a bit sceptical about this application of bivariate >> Moran's I. Do you still think a spatial regression would be more >> appropriate? Also, I would be glad to hear if others have comments on the >> code. This function is not implemented in any package so it would be great >> to have some feedback. Rafael H M Pereira urbandemographics.blogspot.com >> * https://stackoverflow.com/questions/45177590/mapof >> bivariatespatialcorrelationinrbivariatelisa On Wed, Jul 26, 2017 at >> 11:07 AM, Roger Bivand wrote: > On Wed, 26 Jul 2017, Rafael Pereira wrote: >>>> Roger, >> >> This example was provided only for the sake or making the >> code easily >> reproducible for others and I'm more interested in how the >> bivariate >> Moran >> could be implemented in R, but your comments are >> very much welcomed and >> I've made changes to the question. >> >> My >> actual case study looks at bivariate spatial correlation between (a) >> >> average household income per capita and (b) proportion of jobs in the city >>>> that are accessible under 60 minutes by transit. I don't think I could >> use >> rates in this case but I will normalize the variables using >> >> scale(data$variable). >> > > Please provide a reproducible example, either >> with a link to a data > subset, or using a builtin data set. My guess is >> that you do not need > bivariate spatial correlation at all, but rather a >> spatial regression. > > The "causal" variable would then the the proportion >> of jobs accessible > within 60 minutes by transit, though this is extremely >> blunt, and lots of > other covariates (demography, etc.) impact average >> household income per > capita (per block/tract?). Since there are many >> missing variables in your > specification, any spatial correlation would be >> most closely associated > with them (demography, housing costs, education, >> etc.), and the choice of > units of measurement would dominate the outcome. >>>> This is also why bivariate spatial correlation is seldom a good idea, >> I > believe. It can be done, but most likely shouldn't, unless it can be > >> motivated properly. > > By the way, the weighted and FDRcorrected SAD >> local Moran's I pvalues of > the black/white ratio for Oregon (your toy >> example) did deliver the goods  > if you zoom in in mapview::mapview, you >> can see that it detects a rate > hotspot between the rivers. > > Roger > > >>>>> best, >> >> Rafael H M Pereira >> >> On Mon, Jul 24, 2017 at 7:56 PM, >> Roger Bivand >> wrote: >> >> On Mon, 24 Jul 2017, Rafael Pereira wrote: >>> >>>>> Hi all, >>> >>>> >>>> I would like to ask whether some you conducted >> bivariate spatial >>>> correlation in R. >>>> >>>> I know the bivariate >> Moran's I is not implemented in the spdep library. >>>> I left a question >> on SO but also wanted to hear if anyone if the >>>> mainlist >>>> have come >> across this. >>>> https://stackoverflow.com/questions/45177590/mapof >> bivariat >>>> espatialcorrelationinrbivariatelisa >>>> >>>> I also >> know Roger Bivand has implemented the L index proposed by Lee >>>> (2001) >>>>>> in spdep, but I'm not I'm not sure whether the L local correlation >>>>>> coefficients can be interpreted the same way as the local Moran's I >>>>>> coefficients. I couldn't find any reference commenting on this issue. >> I >>>> would very much appreciate your thoughts this. >>>> >>>> >>> In the >> SO question, and in the followup, your presumably throwaway >>> example >> makes fundamental mistakes. The code in spdep by Virgilio >>> GómezRubio >> is for uni and bivariate L, and produces point values of >>> local >>> L. >> This isn't the main problem, which is rather that you are not taking >>> >> account of the underlying population counts, nor shrinking any estimates >>>>> of >>> significance to accommodate population sizes. Population sizes >> vary from >>> 0 >>> to 11858, with the lower quartile at 3164 and upper >> 5698: >>> plot(ecdf(oregon.tract$pop2000)). Should you be comparing rates >> in >>> stead? >>> These are also compositional variables (sum to pop2000, >> or 1 if rates) >>> with >>> the other missing components. You would >> probably be better served by >>> tools >>> examining spatial segregation, >> such as for example the seg package. >>> >>> The 0 count populations cause >> problems for an unofficial alternative, the >>> black/white ratio: >>> >>> >> oregon.tract1 0,] >>> oregon.tract1$rat >> nb >> lw >> >>> which should >> still be adjusted by weighting: >>> >>> lm0 >> >>> I'm not advising this, >> but running localmoran.sad on this model output >>> yields SAD pvalues < >> 0.05 after FDR correction only in contiguous tracts >>> on the Washington >> state line in Portland between the Columbia and >>> Williamette rivers. So >> do look at the variables you are using before >>> rushing into things. >>> >>>>> Hope this clarifies, >>> >>> Roger >>> >>> >>> best, >>>> >>>> Rafael >> HM Pereira >>>> http://urbandemographics.blogspot.com >>>> >>>> >> [[alternative HTML version deleted]] >>>> >>>> >> _______________________________________________ >>>> RsigGeo mailing >> list >>>> [hidden email] >>>> https://stat.ethz.ch/mailman/ >> listinfo/rsiggeo >>>> >>>> >>>>  >>> Roger Bivand >>> Department of >> Economics, Norwegian School of Economics, >>> Helleveien 30, N5045 Bergen, >> Norway. >>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; email: >> [hidden email] >>> EditorinChief of The R Journal, >> https://journal.rproject.org/ >>> index.html >>> >> http://orcid.org/0000000323926140 >>> https://scholar.google.no/ >> citations?user=AWeghB0AAAAJ&hl=en >>> >> >> [[alternative HTML version >> deleted]] >> >> _______________________________________________ >> >> RsigGeo mailing list >> [hidden email] >> >> https://stat.ethz.ch/mailman/listinfo/rsiggeo >> > >  > Roger Bivand >>> Department of Economics, Norwegian School of Economics, > Helleveien 30, >> N5045 Bergen, Norway. > voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; >> email: [hidden email] > EditorinChief of The R Journal, >> https://journal.rproject.org/index.html > http://orcid.org/00000003 >> 23926140 > https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en > >> [[alternative HTML version deleted]] _______________________________________________ >> RsigGeo mailing list [hidden email] >> https://stat.ethz.ch/mailman/listinfo/rsiggeo >> >> > Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N5045 Bergen, Norway. voice: +47 55 95 93 55; email: [hidden email] EditorinChief of The R Journal, https://journal.rproject.org/index.html http://orcid.org/0000000323926140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en _______________________________________________ RsigGeo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/rsiggeo
Roger Bivand
Department of Economics Norwegian School of Economics Helleveien 30 N5045 Bergen, Norway 
Roger,
Thank you for your response. I recognize the data is not ideal and the analysis has limitations because of the lack of information on population displacements that might have occurred over the years. Nonetheless, there are plenty of data + literature showing how the spatial distribution of income classes and land use patterns is fairly stable over time in this city, particularly for a short timescales like in this analysis. Having said this, I believe these two questions (1) what socioeconomic classes have gained more accessibility? and (2) “were wealthier areas in 2010 able to attract more changes to accessibility?” in the end ask the same thing but with different phrasings, though your phrasing (2) is more precise/correct. On the more technical discussion, I see your point that spatial AND temporal correlation in my data would make permutation bootstrap inappropriate to generate significance levels, thus making bivariate Moran’s I biased. Thank you very much for the clarifications! This has been very helpful and I will have a closer look at which spatial regression models are more appropriate for my analysis. On a side note, do you think the function to calculate bivariate Moran’s I is correct? And could it be incorporated in the next version of spdep? If so, please give credit to Rogério Barbosa, the researcher who proposed the code in Stack Overflow. best, Rafael HM Pereira http://urbandemographics.blogspot.com On Mon, Jul 31, 2017 at 10:52 PM, Roger Bivand <[hidden email]> wrote: > Rafael, > > I'm sorry, but there is no way you can logically "analyze who benefits the > recent changes in the transport system in terms of access to jobs" from the > data you have. > > Even if you had aggregate household income data for 2014 and 2017 (not for > 2010 only), you would not know whether wealthier families had not dispaced > poorer families as accessibility improved. You need individual data, either > survey or register, preferably panel, to show that changes in accessibility > change the economic welfare of households controlling for movement of > households. The timestamps on the data make any attempt to do this very > risky; the real findings from a hypothetical sureveybased panel might be > completely different, especially if poorer households were displaced (also > indirectly, through rising house prices driven by improved accessibility). > Gauging the welfare effects of transport investments is very hard to > instrument. > > The closest I could get was to map deciles of the change in access (more > negatives than positives) and compare the aspatial income distributions: > > library(spdep) > library(rgdal) > map < readOGR(dsn=".", layer="test_map") > library(classInt) > cI < classIntervals(map$diffaccess, n=10, style="quantile") > library(RColorBrewer) > ybrpal < brewer.pal(6, "YlOrBr") > fC < findColours(cI, ybrpal) > qtm(map, fill="diffaccess", fill.breaks=cI$brks, format="Europe2") > map$faccess < factor(findInterval(map$diffaccess, cI$brks, > all.inside=TRUE), labels=names(attr(fC, "table"))) > qtm(map, fill="diffaccess", fill.breaks=cI$brks, format="Europe2") > acc_income < split(map$income, map$faccess) > do.call("rbind", lapply(acc_income, summary)) > dens < lapply(acc_income, density) > plot(1, ylab="", xlab="", type="n", xlim=c(2000, 11000), ylim=c(0, > 0.002)) > for (i in seq(along=dens)) lines(dens[[i]], col=i) > legend("topright", legend=names(dens), col=1:length(dens), lty=1, bty="n") > > These density curves really do not suggest any clear relationship, other > than that some areas with increased accessibility had higher incomes in > 2010. > > You can examine the reverse relationship  were aggregate areas that were > more wealthy in 2010 able to attract more changes to accessibility? The > answer seems to be yes, they were able to do this: > > nb < poly2nb(map) > lw < nb2listw(nb, style = "W", zero.policy = T) > lm.morantest(lm(diffaccess ~ I(income/1000), map), lw) > # SLX model > summary(lmSLX(diffaccess ~ I(income/1000), map, lw)) > lm.morantest(lmSLX(diffaccess ~ I(income/1000), map, lw), lw) > # Spatial Durbin error model  SDEM > obj < errorsarlm(diffaccess ~ I(income/1000), map, lw, etype="emixed") > summary(impacts(obj)) > summary(impacts(lmSLX(diffaccess ~ I(income/1000), map, lw))) > LR.sarlm(lmSLX(diffaccess ~ I(income/1000), map, lw), obj) > > It would be possible to run lm.morantest.sad() on the output of the SDEM > model taking global spatial autocorrelation into account. If you need that, > follow up in this thread. > > Bivariate Moran's I should not be used in this case, but could be used in > other cases  use in change over time is troubling because randomisation > will not be a good guide as t=1 and t=2 are subject to temporal as well as > spatial autocorrelation, so you cannot use permutation bootstrap to find a > usable measure of significance. > > Hope this clarifies, and thanks for the code. > > Roger > > On Sun, 30 Jul 2017, Rafael Pereira wrote: > > Roger, >> >> Population and income data are single point in time and come from the 2010 >> Census. >> >> Accessibility variables in 2014 and 2017 show the proportion of jobs >> accessible by public transport under 60 minutes. The variable diffaccess >> shows the difference between these two. It's in percentage points >> (access2017  access2014) >> >> best, >> >> Rafael H M Pereira >> urbandemographics.blogspot.com >> >> On Sun, Jul 30, 2017 at 7:41 AM, Roger Bivand <[hidden email]> >> wrote: >> >> Thanks, I'll get back when able, offline now. What are the units of >>> observation, and are aggregate household incomes observed only once? >>> >>> Roger >>> >>> Roger Bivand >>> Norwegian School of Economics >>> Bergen, Norway >>> >>> >>> >>> Fra: Rafael Pereira >>> Sendt: søndag 30. juli, 00.39 >>> Emne: Re: [RsigGeo] bivariate spatial correlation in R >>> Kopi: Rogério Barbosa, [hidden email] >>> >>> >>> Hi all, here is a reproducible example to calculate in R bivariate >>> Moran's >>> I and LISA clusters. This example is based on a this answer provided in >>> SO* >>> and it uses a toy model of my data. The R script and the shape file with >>> the data are available on this link. https://gist.github.com/ >>> rafapereirabr/5348193abf779625f5e8c5090776a228 What this example does is >>> to estimate the spatial association between household income per capita >>> and >>> the gains in accessibility to jobs. The aim is to analyze who benefits >>> the >>> recent changes in the transport system in terms of access to jobs. So the >>> idea is not to find causal relationships, but spatial association between >>> areas of high/low income who had high/low gains in accessibility. The >>> variables in the data show info on the proportion of jobs accessible in >>> both years 2014 and 2017 (access2014, access2017) and the difference >>> between the two years in percentage points (diffaccess). Roger, I know >>> you >>> have shown to be a bit sceptical about this application of bivariate >>> Moran's I. Do you still think a spatial regression would be more >>> appropriate? Also, I would be glad to hear if others have comments on the >>> code. This function is not implemented in any package so it would be >>> great >>> to have some feedback. Rafael H M Pereira urbandemographics.blogspot.com >>> * https://stackoverflow.com/questions/45177590/mapof >>> bivariatespatialcorrelationinrbivariatelisa On Wed, Jul 26, 2017 >>> at >>> 11:07 AM, Roger Bivand wrote: > On Wed, 26 Jul 2017, Rafael Pereira >>> wrote: >>> >>>> Roger, >> >> This example was provided only for the sake or making the >>>>> >>>> code easily >> reproducible for others and I'm more interested in how >>> the >>> bivariate >> Moran >> could be implemented in R, but your comments are >>> very much welcomed and >> I've made changes to the question. >> >> My >>> actual case study looks at bivariate spatial correlation between (a) >> >>> average household income per capita and (b) proportion of jobs in the >>> city >>> >>>> that are accessible under 60 minutes by transit. I don't think I could >>>>> >>>> use >> rates in this case but I will normalize the variables using >> >>> scale(data$variable). >> > > Please provide a reproducible example, >>> either >>> with a link to a data > subset, or using a builtin data set. My guess is >>> that you do not need > bivariate spatial correlation at all, but rather >>> a >>> spatial regression. > > The "causal" variable would then the the >>> proportion >>> of jobs accessible > within 60 minutes by transit, though this is >>> extremely >>> blunt, and lots of > other covariates (demography, etc.) impact average >>> household income per > capita (per block/tract?). Since there are many >>> missing variables in your > specification, any spatial correlation would >>> be >>> most closely associated > with them (demography, housing costs, >>> education, >>> etc.), and the choice of > units of measurement would dominate the >>> outcome. >>> >>>> This is also why bivariate spatial correlation is seldom a good idea, >>>>> >>>> I > believe. It can be done, but most likely shouldn't, unless it can >>> be > >>> motivated properly. > > By the way, the weighted and FDRcorrected SAD >>> local Moran's I pvalues of > the black/white ratio for Oregon (your toy >>> example) did deliver the goods  > if you zoom in in mapview::mapview, >>> you >>> can see that it detects a rate > hotspot between the rivers. > > Roger > >>> > >>> >>>> best, >> >> Rafael H M Pereira >> >> On Mon, Jul 24, 2017 at 7:56 PM, >>>>>> >>>>> Roger Bivand >> wrote: >> >> On Mon, 24 Jul 2017, Rafael Pereira >>> wrote: >>> >>> >>>> Hi all, >>> >>>> >>>> I would like to ask whether some you conducted >>>>>> >>>>> bivariate spatial >>>> correlation in R. >>>> >>>> I know the >>> bivariate >>> Moran's I is not implemented in the spdep library. >>>> I left a question >>> on SO but also wanted to hear if anyone if the >>>> mainlist >>>> have >>> come >>> across this. >>>> https://stackoverflow.com/questions/45177590/mapof >>> bivariat >>>> espatialcorrelationinrbivariatelisa >>>> >>>> I also >>> know Roger Bivand has implemented the L index proposed by Lee >>>> (2001) >>> >>>> in spdep, but I'm not I'm not sure whether the L local correlation >>>>>>> coefficients can be interpreted the same way as the local Moran's I >>>>>>> coefficients. I couldn't find any reference commenting on this issue. >>>>>>> >>>>>> I >>>> would very much appreciate your thoughts this. >>>> >>>> >>> >>> In the >>> SO question, and in the followup, your presumably throwaway >>> example >>> makes fundamental mistakes. The code in spdep by Virgilio >>> GómezRubio >>> is for uni and bivariate L, and produces point values of >>> local >>> >>> L. >>> This isn't the main problem, which is rather that you are not taking >>> >>> account of the underlying population counts, nor shrinking any estimates >>> >>>> of >>> significance to accommodate population sizes. Population sizes >>>>>> >>>>> vary from >>> 0 >>> to 11858, with the lower quartile at 3164 and upper >>> 5698: >>> plot(ecdf(oregon.tract$pop2000)). Should you be comparing >>> rates >>> in >>> stead? >>> These are also compositional variables (sum to pop2000, >>> or 1 if rates) >>> with >>> the other missing components. You would >>> probably be better served by >>> tools >>> examining spatial segregation, >>> such as for example the seg package. >>> >>> The 0 count populations >>> cause >>> problems for an unofficial alternative, the >>> black/white ratio: >>> >>> >>> >>> oregon.tract1 0,] >>> oregon.tract1$rat >> nb >> lw >> >>> which should >>> still be adjusted by weighting: >>> >>> lm0 >> >>> I'm not advising this, >>> but running localmoran.sad on this model output >>> yields SAD pvalues < >>> 0.05 after FDR correction only in contiguous tracts >>> on the Washington >>> state line in Portland between the Columbia and >>> Williamette rivers. >>> So >>> do look at the variables you are using before >>> rushing into things. >>> >>> >>> >>>> Hope this clarifies, >>> >>> Roger >>> >>> >>> best, >>>> >>>> Rafael >>>>>> >>>>> HM Pereira >>>> http://urbandemographics.blogspot.com >>>> >>>> >>> [[alternative HTML version deleted]] >>>> >>>> >>> _______________________________________________ >>>> RsigGeo mailing >>> list >>>> [hidden email] >>>> https://stat.ethz.ch/mailman/ >>> listinfo/rsiggeo >>>> >>>> >>>>  >>> Roger Bivand >>> Department of >>> Economics, Norwegian School of Economics, >>> Helleveien 30, N5045 >>> Bergen, >>> Norway. >>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; email: >>> [hidden email] >>> EditorinChief of The R Journal, >>> https://journal.rproject.org/ >>> index.html >>> >>> http://orcid.org/0000000323926140 >>> https://scholar.google.no/ >>> citations?user=AWeghB0AAAAJ&hl=en >>> >> >> [[alternative HTML version >>> deleted]] >> >> _______________________________________________ >> >>> RsigGeo mailing list >> [hidden email] >> >>> https://stat.ethz.ch/mailman/listinfo/rsiggeo >> > >  > Roger Bivand >>> >>>> Department of Economics, Norwegian School of Economics, > Helleveien 30, >>>> >>> N5045 Bergen, Norway. > voice: +47 55 95 93 55 >>> <+47%2055%2095%2093%2055>; >>> email: [hidden email] > EditorinChief of The R Journal, >>> https://journal.rproject.org/index.html > http://orcid.org/00000003 >>> 23926140 > https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en >>> > >>> [[alternative HTML version deleted]] ______________________________ >>> _________________ >>> RsigGeo mailing list [hidden email] >>> https://stat.ethz.ch/mailman/listinfo/rsiggeo >>> >>> >>> >> >  > Roger Bivand > Department of Economics, Norwegian School of Economics, > Helleveien 30, N5045 Bergen, Norway. > voice: +47 55 95 93 55; email: [hidden email] > EditorinChief of The R Journal, https://journal.rproject.org/index.html > http://orcid.org/0000000323926140 > https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en > [[alternative HTML version deleted]] _______________________________________________ RsigGeo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/rsiggeo 
Administrator

On Wed, 2 Aug 2017, Rafael Pereira wrote:
> Roger, > > > Thank you for your response. > > > I recognize the data is not ideal and the analysis has limitations because > of the lack of information on population displacements that might have > occurred over the years. Nonetheless, there are plenty of data + literature > showing how the spatial distribution of income classes and land use > patterns is fairly stable over time in this city, particularly for a short > timescales like in this analysis. Having said this, I believe these two > questions (1) what socioeconomic classes have gained more accessibility? > and (2) “were wealthier areas in 2010 able to attract more changes to > accessibility?” in the end ask the same thing but with different phrasings, > though your phrasing (2) is more precise/correct. > > > > On the more technical discussion, I see your point that spatial AND > temporal correlation in my data would make permutation bootstrap > inappropriate to generate significance levels, thus making bivariate > Moran’s I biased. Thank you very much for the clarifications! This has been > very helpful and I will have a closer look at which spatial regression > models are more appropriate for my analysis. > > > On a side note, do you think the function to calculate bivariate Moran’s I > is correct? And could it be incorporated in the next version of spdep? If > so, please give credit to Rogério Barbosa, the researcher who proposed the > code in Stack Overflow. documenting code and fixing bugs found later). I don't see any tests, documentation or accommodation of what spdep expects for edge cases  the function as it stands would need a lot of work to protect users from obvious blunders. There are no references to literature, nor to proven test cases which this implementation should match. We have an implementation of Lee (2001), but this is not the same, right? Which article gives the formal statistical development of the bivariate local Moran's I? Do we know that conditional simulation (permutation bootstrap) is valid in some cases, if so which? Is there a development of parametric bootstrap? Roger > > best, > > Rafael HM Pereira > http://urbandemographics.blogspot.com > > > On Mon, Jul 31, 2017 at 10:52 PM, Roger Bivand <[hidden email]> wrote: > >> Rafael, >> >> I'm sorry, but there is no way you can logically "analyze who benefits the >> recent changes in the transport system in terms of access to jobs" from the >> data you have. >> >> Even if you had aggregate household income data for 2014 and 2017 (not for >> 2010 only), you would not know whether wealthier families had not dispaced >> poorer families as accessibility improved. You need individual data, either >> survey or register, preferably panel, to show that changes in accessibility >> change the economic welfare of households controlling for movement of >> households. The timestamps on the data make any attempt to do this very >> risky; the real findings from a hypothetical sureveybased panel might be >> completely different, especially if poorer households were displaced (also >> indirectly, through rising house prices driven by improved accessibility). >> Gauging the welfare effects of transport investments is very hard to >> instrument. >> >> The closest I could get was to map deciles of the change in access (more >> negatives than positives) and compare the aspatial income distributions: >> >> library(spdep) >> library(rgdal) >> map < readOGR(dsn=".", layer="test_map") >> library(classInt) >> cI < classIntervals(map$diffaccess, n=10, style="quantile") >> library(RColorBrewer) >> ybrpal < brewer.pal(6, "YlOrBr") >> fC < findColours(cI, ybrpal) >> qtm(map, fill="diffaccess", fill.breaks=cI$brks, format="Europe2") >> map$faccess < factor(findInterval(map$diffaccess, cI$brks, >> all.inside=TRUE), labels=names(attr(fC, "table"))) >> qtm(map, fill="diffaccess", fill.breaks=cI$brks, format="Europe2") >> acc_income < split(map$income, map$faccess) >> do.call("rbind", lapply(acc_income, summary)) >> dens < lapply(acc_income, density) >> plot(1, ylab="", xlab="", type="n", xlim=c(2000, 11000), ylim=c(0, >> 0.002)) >> for (i in seq(along=dens)) lines(dens[[i]], col=i) >> legend("topright", legend=names(dens), col=1:length(dens), lty=1, bty="n") >> >> These density curves really do not suggest any clear relationship, other >> than that some areas with increased accessibility had higher incomes in >> 2010. >> >> You can examine the reverse relationship  were aggregate areas that were >> more wealthy in 2010 able to attract more changes to accessibility? The >> answer seems to be yes, they were able to do this: >> >> nb < poly2nb(map) >> lw < nb2listw(nb, style = "W", zero.policy = T) >> lm.morantest(lm(diffaccess ~ I(income/1000), map), lw) >> # SLX model >> summary(lmSLX(diffaccess ~ I(income/1000), map, lw)) >> lm.morantest(lmSLX(diffaccess ~ I(income/1000), map, lw), lw) >> # Spatial Durbin error model  SDEM >> obj < errorsarlm(diffaccess ~ I(income/1000), map, lw, etype="emixed") >> summary(impacts(obj)) >> summary(impacts(lmSLX(diffaccess ~ I(income/1000), map, lw))) >> LR.sarlm(lmSLX(diffaccess ~ I(income/1000), map, lw), obj) >> >> It would be possible to run lm.morantest.sad() on the output of the SDEM >> model taking global spatial autocorrelation into account. If you need that, >> follow up in this thread. >> >> Bivariate Moran's I should not be used in this case, but could be used in >> other cases  use in change over time is troubling because randomisation >> will not be a good guide as t=1 and t=2 are subject to temporal as well as >> spatial autocorrelation, so you cannot use permutation bootstrap to find a >> usable measure of significance. >> >> Hope this clarifies, and thanks for the code. >> >> Roger >> >> On Sun, 30 Jul 2017, Rafael Pereira wrote: >> >> Roger, >>> >>> Population and income data are single point in time and come from the 2010 >>> Census. >>> >>> Accessibility variables in 2014 and 2017 show the proportion of jobs >>> accessible by public transport under 60 minutes. The variable diffaccess >>> shows the difference between these two. It's in percentage points >>> (access2017  access2014) >>> >>> best, >>> >>> Rafael H M Pereira >>> urbandemographics.blogspot.com >>> >>> On Sun, Jul 30, 2017 at 7:41 AM, Roger Bivand <[hidden email]> >>> wrote: >>> >>> Thanks, I'll get back when able, offline now. What are the units of >>>> observation, and are aggregate household incomes observed only once? >>>> >>>> Roger >>>> >>>> Roger Bivand >>>> Norwegian School of Economics >>>> Bergen, Norway >>>> >>>> >>>> >>>> Fra: Rafael Pereira >>>> Sendt: søndag 30. juli, 00.39 >>>> Emne: Re: [RsigGeo] bivariate spatial correlation in R >>>> Kopi: Rogério Barbosa, [hidden email] >>>> >>>> >>>> Hi all, here is a reproducible example to calculate in R bivariate >>>> Moran's >>>> I and LISA clusters. This example is based on a this answer provided in >>>> SO* >>>> and it uses a toy model of my data. The R script and the shape file with >>>> the data are available on this link. https://gist.github.com/ >>>> rafapereirabr/5348193abf779625f5e8c5090776a228 What this example does is >>>> to estimate the spatial association between household income per capita >>>> and >>>> the gains in accessibility to jobs. The aim is to analyze who benefits >>>> the >>>> recent changes in the transport system in terms of access to jobs. So the >>>> idea is not to find causal relationships, but spatial association between >>>> areas of high/low income who had high/low gains in accessibility. The >>>> variables in the data show info on the proportion of jobs accessible in >>>> both years 2014 and 2017 (access2014, access2017) and the difference >>>> between the two years in percentage points (diffaccess). Roger, I know >>>> you >>>> have shown to be a bit sceptical about this application of bivariate >>>> Moran's I. Do you still think a spatial regression would be more >>>> appropriate? Also, I would be glad to hear if others have comments on the >>>> code. This function is not implemented in any package so it would be >>>> great >>>> to have some feedback. Rafael H M Pereira urbandemographics.blogspot.com >>>> * https://stackoverflow.com/questions/45177590/mapof >>>> bivariatespatialcorrelationinrbivariatelisa On Wed, Jul 26, 2017 >>>> at >>>> 11:07 AM, Roger Bivand wrote: > On Wed, 26 Jul 2017, Rafael Pereira >>>> wrote: >>>> >>>>> Roger, >> >> This example was provided only for the sake or making the >>>>>> >>>>> code easily >> reproducible for others and I'm more interested in how >>>> the >>>> bivariate >> Moran >> could be implemented in R, but your comments are >>>> very much welcomed and >> I've made changes to the question. >> >> My >>>> actual case study looks at bivariate spatial correlation between (a) >> >>>> average household income per capita and (b) proportion of jobs in the >>>> city >>>> >>>>> that are accessible under 60 minutes by transit. I don't think I could >>>>>> >>>>> use >> rates in this case but I will normalize the variables using >> >>>> scale(data$variable). >> > > Please provide a reproducible example, >>>> either >>>> with a link to a data > subset, or using a builtin data set. My guess is >>>> that you do not need > bivariate spatial correlation at all, but rather >>>> a >>>> spatial regression. > > The "causal" variable would then the the >>>> proportion >>>> of jobs accessible > within 60 minutes by transit, though this is >>>> extremely >>>> blunt, and lots of > other covariates (demography, etc.) impact average >>>> household income per > capita (per block/tract?). Since there are many >>>> missing variables in your > specification, any spatial correlation would >>>> be >>>> most closely associated > with them (demography, housing costs, >>>> education, >>>> etc.), and the choice of > units of measurement would dominate the >>>> outcome. >>>> >>>>> This is also why bivariate spatial correlation is seldom a good idea, >>>>>> >>>>> I > believe. It can be done, but most likely shouldn't, unless it can >>>> be > >>>> motivated properly. > > By the way, the weighted and FDRcorrected SAD >>>> local Moran's I pvalues of > the black/white ratio for Oregon (your toy >>>> example) did deliver the goods  > if you zoom in in mapview::mapview, >>>> you >>>> can see that it detects a rate > hotspot between the rivers. > > Roger > >>>>> >>>> >>>>> best, >> >> Rafael H M Pereira >> >> On Mon, Jul 24, 2017 at 7:56 PM, >>>>>>> >>>>>> Roger Bivand >> wrote: >> >> On Mon, 24 Jul 2017, Rafael Pereira >>>> wrote: >>> >>>> >>>>> Hi all, >>> >>>> >>>> I would like to ask whether some you conducted >>>>>>> >>>>>> bivariate spatial >>>> correlation in R. >>>> >>>> I know the >>>> bivariate >>>> Moran's I is not implemented in the spdep library. >>>> I left a question >>>> on SO but also wanted to hear if anyone if the >>>> mainlist >>>> have >>>> come >>>> across this. >>>> https://stackoverflow.com/questions/45177590/mapof >>>> bivariat >>>> espatialcorrelationinrbivariatelisa >>>> >>>> I also >>>> know Roger Bivand has implemented the L index proposed by Lee >>>> (2001) >>>> >>>>> in spdep, but I'm not I'm not sure whether the L local correlation >>>>>>>> coefficients can be interpreted the same way as the local Moran's I >>>>>>>> coefficients. I couldn't find any reference commenting on this issue. >>>>>>>> >>>>>>> I >>>> would very much appreciate your thoughts this. >>>> >>>> >>> >>>> In the >>>> SO question, and in the followup, your presumably throwaway >>> example >>>> makes fundamental mistakes. The code in spdep by Virgilio >>> GómezRubio >>>> is for uni and bivariate L, and produces point values of >>> local >>> >>>> L. >>>> This isn't the main problem, which is rather that you are not taking >>> >>>> account of the underlying population counts, nor shrinking any estimates >>>> >>>>> of >>> significance to accommodate population sizes. Population sizes >>>>>>> >>>>>> vary from >>> 0 >>> to 11858, with the lower quartile at 3164 and upper >>>> 5698: >>> plot(ecdf(oregon.tract$pop2000)). Should you be comparing >>>> rates >>>> in >>> stead? >>> These are also compositional variables (sum to pop2000, >>>> or 1 if rates) >>> with >>> the other missing components. You would >>>> probably be better served by >>> tools >>> examining spatial segregation, >>>> such as for example the seg package. >>> >>> The 0 count populations >>>> cause >>>> problems for an unofficial alternative, the >>> black/white ratio: >>> >>>>>>> >>>> oregon.tract1 0,] >>> oregon.tract1$rat >> nb >> lw >> >>> which should >>>> still be adjusted by weighting: >>> >>> lm0 >> >>> I'm not advising this, >>>> but running localmoran.sad on this model output >>> yields SAD pvalues < >>>> 0.05 after FDR correction only in contiguous tracts >>> on the Washington >>>> state line in Portland between the Columbia and >>> Williamette rivers. >>>> So >>>> do look at the variables you are using before >>> rushing into things. >>>>>>> >>>> >>>>> Hope this clarifies, >>> >>> Roger >>> >>> >>> best, >>>> >>>> Rafael >>>>>>> >>>>>> HM Pereira >>>> http://urbandemographics.blogspot.com >>>> >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> >>>> _______________________________________________ >>>> RsigGeo mailing >>>> list >>>> [hidden email] >>>> https://stat.ethz.ch/mailman/ >>>> listinfo/rsiggeo >>>> >>>> >>>>  >>> Roger Bivand >>> Department of >>>> Economics, Norwegian School of Economics, >>> Helleveien 30, N5045 >>>> Bergen, >>>> Norway. >>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; email: >>>> [hidden email] >>> EditorinChief of The R Journal, >>>> https://journal.rproject.org/ >>> index.html >>> >>>> http://orcid.org/0000000323926140 >>> https://scholar.google.no/ >>>> citations?user=AWeghB0AAAAJ&hl=en >>> >> >> [[alternative HTML version >>>> deleted]] >> >> _______________________________________________ >> >>>> RsigGeo mailing list >> [hidden email] >> >>>> https://stat.ethz.ch/mailman/listinfo/rsiggeo >> > >  > Roger Bivand >>>> >>>>> Department of Economics, Norwegian School of Economics, > Helleveien 30, >>>>> >>>> N5045 Bergen, Norway. > voice: +47 55 95 93 55 >>>> <+47%2055%2095%2093%2055>; >>>> email: [hidden email] > EditorinChief of The R Journal, >>>> https://journal.rproject.org/index.html > http://orcid.org/00000003 >>>> 23926140 > https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en >>>>> >>>> [[alternative HTML version deleted]] ______________________________ >>>> _________________ >>>> RsigGeo mailing list [hidden email] >>>> https://stat.ethz.ch/mailman/listinfo/rsiggeo >>>> >>>> >>>> >>> >>  >> Roger Bivand >> Department of Economics, Norwegian School of Economics, >> Helleveien 30, N5045 Bergen, Norway. >> voice: +47 55 95 93 55; email: [hidden email] >> EditorinChief of The R Journal, https://journal.rproject.org/index.html >> http://orcid.org/0000000323926140 >> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en >> > Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N5045 Bergen, Norway. voice: +47 55 95 93 55; email: [hidden email] EditorinChief of The R Journal, https://journal.rproject.org/index.html http://orcid.org/0000000323926140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en _______________________________________________ RsigGeo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/rsiggeo
Roger Bivand
Department of Economics Norwegian School of Economics Helleveien 30 N5045 Bergen, Norway 
Hi Roger,
you're correct in pointing out that the code would much more work to be incorporated in the spdep library. This is something that Rogerio and I are considering to do in the next few months as both of us find more time in our agendas. Thank you again for the discussion. The amount of work and attention you put into this community is exceptional for someone as busy as you are and this is much appreciated! best, Rafa On Thu, Aug 3, 2017 at 10:13 AM, Roger Bivand <[hidden email]> wrote: > On Wed, 2 Aug 2017, Rafael Pereira wrote: > > Roger, >> >> >> Thank you for your response. >> >> >> I recognize the data is not ideal and the analysis has limitations because >> of the lack of information on population displacements that might have >> occurred over the years. Nonetheless, there are plenty of data + >> literature >> showing how the spatial distribution of income classes and land use >> patterns is fairly stable over time in this city, particularly for a short >> timescales like in this analysis. Having said this, I believe these two >> questions (1) what socioeconomic classes have gained more accessibility? >> and (2) “were wealthier areas in 2010 able to attract more changes to >> accessibility?” in the end ask the same thing but with different >> phrasings, >> though your phrasing (2) is more precise/correct. >> >> >> >> On the more technical discussion, I see your point that spatial AND >> temporal correlation in my data would make permutation bootstrap >> inappropriate to generate significance levels, thus making bivariate >> Moran’s I biased. Thank you very much for the clarifications! This has >> been >> very helpful and I will have a closer look at which spatial regression >> models are more appropriate for my analysis. >> >> >> On a side note, do you think the function to calculate bivariate Moran’s I >> is correct? And could it be incorporated in the next version of spdep? If >> so, please give credit to Rogério Barbosa, the researcher who proposed the >> code in Stack Overflow. >> > > Perhaps, but SO is usually ephemeral (nobody takes responsibility for > documenting code and fixing bugs found later). I don't see any tests, > documentation or accommodation of what spdep expects for edge cases  the > function as it stands would need a lot of work to protect users from > obvious blunders. There are no references to literature, nor to proven test > cases which this implementation should match. We have an implementation of > Lee (2001), but this is not the same, right? Which article gives the formal > statistical development of the bivariate local Moran's I? Do we know that > conditional simulation (permutation bootstrap) is valid in some cases, if > so which? Is there a development of parametric bootstrap? > > > Roger > > >> best, >> >> Rafael HM Pereira >> http://urbandemographics.blogspot.com >> >> >> On Mon, Jul 31, 2017 at 10:52 PM, Roger Bivand <[hidden email]> >> wrote: >> >> Rafael, >>> >>> I'm sorry, but there is no way you can logically "analyze who benefits >>> the >>> recent changes in the transport system in terms of access to jobs" from >>> the >>> data you have. >>> >>> Even if you had aggregate household income data for 2014 and 2017 (not >>> for >>> 2010 only), you would not know whether wealthier families had not >>> dispaced >>> poorer families as accessibility improved. You need individual data, >>> either >>> survey or register, preferably panel, to show that changes in >>> accessibility >>> change the economic welfare of households controlling for movement of >>> households. The timestamps on the data make any attempt to do this very >>> risky; the real findings from a hypothetical sureveybased panel might be >>> completely different, especially if poorer households were displaced >>> (also >>> indirectly, through rising house prices driven by improved >>> accessibility). >>> Gauging the welfare effects of transport investments is very hard to >>> instrument. >>> >>> The closest I could get was to map deciles of the change in access (more >>> negatives than positives) and compare the aspatial income distributions: >>> >>> library(spdep) >>> library(rgdal) >>> map < readOGR(dsn=".", layer="test_map") >>> library(classInt) >>> cI < classIntervals(map$diffaccess, n=10, style="quantile") >>> library(RColorBrewer) >>> ybrpal < brewer.pal(6, "YlOrBr") >>> fC < findColours(cI, ybrpal) >>> qtm(map, fill="diffaccess", fill.breaks=cI$brks, format="Europe2") >>> map$faccess < factor(findInterval(map$diffaccess, cI$brks, >>> all.inside=TRUE), labels=names(attr(fC, "table"))) >>> qtm(map, fill="diffaccess", fill.breaks=cI$brks, format="Europe2") >>> acc_income < split(map$income, map$faccess) >>> do.call("rbind", lapply(acc_income, summary)) >>> dens < lapply(acc_income, density) >>> plot(1, ylab="", xlab="", type="n", xlim=c(2000, 11000), ylim=c(0, >>> 0.002)) >>> for (i in seq(along=dens)) lines(dens[[i]], col=i) >>> legend("topright", legend=names(dens), col=1:length(dens), lty=1, >>> bty="n") >>> >>> These density curves really do not suggest any clear relationship, other >>> than that some areas with increased accessibility had higher incomes in >>> 2010. >>> >>> You can examine the reverse relationship  were aggregate areas that were >>> more wealthy in 2010 able to attract more changes to accessibility? The >>> answer seems to be yes, they were able to do this: >>> >>> nb < poly2nb(map) >>> lw < nb2listw(nb, style = "W", zero.policy = T) >>> lm.morantest(lm(diffaccess ~ I(income/1000), map), lw) >>> # SLX model >>> summary(lmSLX(diffaccess ~ I(income/1000), map, lw)) >>> lm.morantest(lmSLX(diffaccess ~ I(income/1000), map, lw), lw) >>> # Spatial Durbin error model  SDEM >>> obj < errorsarlm(diffaccess ~ I(income/1000), map, lw, etype="emixed") >>> summary(impacts(obj)) >>> summary(impacts(lmSLX(diffaccess ~ I(income/1000), map, lw))) >>> LR.sarlm(lmSLX(diffaccess ~ I(income/1000), map, lw), obj) >>> >>> It would be possible to run lm.morantest.sad() on the output of the SDEM >>> model taking global spatial autocorrelation into account. If you need >>> that, >>> follow up in this thread. >>> >>> Bivariate Moran's I should not be used in this case, but could be used in >>> other cases  use in change over time is troubling because randomisation >>> will not be a good guide as t=1 and t=2 are subject to temporal as well >>> as >>> spatial autocorrelation, so you cannot use permutation bootstrap to find >>> a >>> usable measure of significance. >>> >>> Hope this clarifies, and thanks for the code. >>> >>> Roger >>> >>> On Sun, 30 Jul 2017, Rafael Pereira wrote: >>> >>> Roger, >>> >>>> >>>> Population and income data are single point in time and come from the >>>> 2010 >>>> Census. >>>> >>>> Accessibility variables in 2014 and 2017 show the proportion of jobs >>>> accessible by public transport under 60 minutes. The variable diffaccess >>>> shows the difference between these two. It's in percentage points >>>> (access2017  access2014) >>>> >>>> best, >>>> >>>> Rafael H M Pereira >>>> urbandemographics.blogspot.com >>>> >>>> On Sun, Jul 30, 2017 at 7:41 AM, Roger Bivand <[hidden email]> >>>> wrote: >>>> >>>> Thanks, I'll get back when able, offline now. What are the units of >>>> >>>>> observation, and are aggregate household incomes observed only once? >>>>> >>>>> Roger >>>>> >>>>> Roger Bivand >>>>> Norwegian School of Economics >>>>> Bergen, Norway >>>>> >>>>> >>>>> >>>>> Fra: Rafael Pereira >>>>> Sendt: søndag 30. juli, 00.39 >>>>> Emne: Re: [RsigGeo] bivariate spatial correlation in R >>>>> Kopi: Rogério Barbosa, [hidden email] >>>>> >>>>> >>>>> Hi all, here is a reproducible example to calculate in R bivariate >>>>> Moran's >>>>> I and LISA clusters. This example is based on a this answer provided in >>>>> SO* >>>>> and it uses a toy model of my data. The R script and the shape file >>>>> with >>>>> the data are available on this link. https://gist.github.com/ >>>>> rafapereirabr/5348193abf779625f5e8c5090776a228 What this example does >>>>> is >>>>> to estimate the spatial association between household income per capita >>>>> and >>>>> the gains in accessibility to jobs. The aim is to analyze who benefits >>>>> the >>>>> recent changes in the transport system in terms of access to jobs. So >>>>> the >>>>> idea is not to find causal relationships, but spatial association >>>>> between >>>>> areas of high/low income who had high/low gains in accessibility. The >>>>> variables in the data show info on the proportion of jobs accessible in >>>>> both years 2014 and 2017 (access2014, access2017) and the difference >>>>> between the two years in percentage points (diffaccess). Roger, I know >>>>> you >>>>> have shown to be a bit sceptical about this application of bivariate >>>>> Moran's I. Do you still think a spatial regression would be more >>>>> appropriate? Also, I would be glad to hear if others have comments on >>>>> the >>>>> code. This function is not implemented in any package so it would be >>>>> great >>>>> to have some feedback. Rafael H M Pereira >>>>> urbandemographics.blogspot.com >>>>> * https://stackoverflow.com/questions/45177590/mapof >>>>> bivariatespatialcorrelationinrbivariatelisa On Wed, Jul 26, 2017 >>>>> at >>>>> 11:07 AM, Roger Bivand wrote: > On Wed, 26 Jul 2017, Rafael Pereira >>>>> wrote: >>>>> >>>>> Roger, >> >> This example was provided only for the sake or making the >>>>>> >>>>>>> >>>>>>> code easily >> reproducible for others and I'm more interested in how >>>>>> >>>>> the >>>>> bivariate >> Moran >> could be implemented in R, but your comments are >>>>> very much welcomed and >> I've made changes to the question. >> >> My >>>>> actual case study looks at bivariate spatial correlation between (a) >>>>> >> >>>>> average household income per capita and (b) proportion of jobs in the >>>>> city >>>>> >>>>> that are accessible under 60 minutes by transit. I don't think I could >>>>>> >>>>>>> >>>>>>> use >> rates in this case but I will normalize the variables using >> >>>>>> >>>>> scale(data$variable). >> > > Please provide a reproducible example, >>>>> either >>>>> with a link to a data > subset, or using a builtin data set. My guess >>>>> is >>>>> that you do not need > bivariate spatial correlation at all, but >>>>> rather >>>>> a >>>>> spatial regression. > > The "causal" variable would then the the >>>>> proportion >>>>> of jobs accessible > within 60 minutes by transit, though this is >>>>> extremely >>>>> blunt, and lots of > other covariates (demography, etc.) impact average >>>>> household income per > capita (per block/tract?). Since there are many >>>>> missing variables in your > specification, any spatial correlation >>>>> would >>>>> be >>>>> most closely associated > with them (demography, housing costs, >>>>> education, >>>>> etc.), and the choice of > units of measurement would dominate the >>>>> outcome. >>>>> >>>>> This is also why bivariate spatial correlation is seldom a good idea, >>>>>> >>>>>>> >>>>>>> I > believe. It can be done, but most likely shouldn't, unless it can >>>>>> >>>>> be > >>>>> motivated properly. > > By the way, the weighted and FDRcorrected SAD >>>>> local Moran's I pvalues of > the black/white ratio for Oregon (your >>>>> toy >>>>> example) did deliver the goods  > if you zoom in in mapview::mapview, >>>>> you >>>>> can see that it detects a rate > hotspot between the rivers. > > Roger >>>>> > >>>>> >>>>>> >>>>>> >>>>> best, >> >> Rafael H M Pereira >> >> On Mon, Jul 24, 2017 at 7:56 PM, >>>>>> >>>>>>> >>>>>>>> Roger Bivand >> wrote: >> >> On Mon, 24 Jul 2017, Rafael Pereira >>>>>>> >>>>>> wrote: >>> >>>>> >>>>> Hi all, >>> >>>> >>>> I would like to ask whether some you conducted >>>>>> >>>>>>> >>>>>>>> bivariate spatial >>>> correlation in R. >>>> >>>> I know the >>>>>>> >>>>>> bivariate >>>>> Moran's I is not implemented in the spdep library. >>>> I left a >>>>> question >>>>> on SO but also wanted to hear if anyone if the >>>> mainlist >>>> have >>>>> come >>>>> across this. >>>> https://stackoverflow.com/questions/45177590/mapof >>>>> bivariat >>>> espatialcorrelationinrbivariatelisa >>>> >>>> I >>>>> also >>>>> know Roger Bivand has implemented the L index proposed by Lee >>>> >>>>> (2001) >>>>> >>>>> in spdep, but I'm not I'm not sure whether the L local correlation >>>>>> >>>>>>> coefficients can be interpreted the same way as the local Moran's I >>>>>>>>> coefficients. I couldn't find any reference commenting on this >>>>>>>>> issue. >>>>>>>>> >>>>>>>>> I >>>> would very much appreciate your thoughts this. >>>> >>>> >>> >>>>>>>> >>>>>>> In the >>>>> SO question, and in the followup, your presumably throwaway >>> >>>>> example >>>>> makes fundamental mistakes. The code in spdep by Virgilio >>> >>>>> GómezRubio >>>>> is for uni and bivariate L, and produces point values of >>> local >>> >>>>> L. >>>>> This isn't the main problem, which is rather that you are not taking >>>>> >>> >>>>> account of the underlying population counts, nor shrinking any >>>>> estimates >>>>> >>>>> of >>> significance to accommodate population sizes. Population sizes >>>>>> >>>>>>> >>>>>>>> vary from >>> 0 >>> to 11858, with the lower quartile at 3164 and >>>>>>> upper >>>>>>> >>>>>> 5698: >>> plot(ecdf(oregon.tract$pop2000)). Should you be comparing >>>>> rates >>>>> in >>> stead? >>> These are also compositional variables (sum to >>>>> pop2000, >>>>> or 1 if rates) >>> with >>> the other missing components. You would >>>>> probably be better served by >>> tools >>> examining spatial >>>>> segregation, >>>>> such as for example the seg package. >>> >>> The 0 count populations >>>>> cause >>>>> problems for an unofficial alternative, the >>> black/white ratio: >>> >>>>> >>>>>> >>>>>>>> oregon.tract1 0,] >>> oregon.tract1$rat >> nb >> lw >> >>> which >>>>> should >>>>> still be adjusted by weighting: >>> >>> lm0 >> >>> I'm not advising >>>>> this, >>>>> but running localmoran.sad on this model output >>> yields SAD >>>>> pvalues < >>>>> 0.05 after FDR correction only in contiguous tracts >>> on the >>>>> Washington >>>>> state line in Portland between the Columbia and >>> Williamette rivers. >>>>> So >>>>> do look at the variables you are using before >>> rushing into things. >>>>> >>>>>> >>>>>>>> >>>>> Hope this clarifies, >>> >>> Roger >>> >>> >>> best, >>>> >>>> Rafael >>>>>> >>>>>>> >>>>>>>> HM Pereira >>>> http://urbandemographics.blogspot.com >>>> >>>> >>>>>>> >>>>>> [[alternative HTML version deleted]] >>>> >>>> >>>>> _______________________________________________ >>>> RsigGeo mailing >>>>> list >>>> [hidden email] >>>> https://stat.ethz.ch/mailman/ >>>>> listinfo/rsiggeo >>>> >>>> >>>>  >>> Roger Bivand >>> Department of >>>>> Economics, Norwegian School of Economics, >>> Helleveien 30, N5045 >>>>> Bergen, >>>>> Norway. >>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; email: >>>>> [hidden email] >>> EditorinChief of The R Journal, >>>>> https://journal.rproject.org/ >>> index.html >>> >>>>> http://orcid.org/0000000323926140 >>> https://scholar.google.no/ >>>>> citations?user=AWeghB0AAAAJ&hl=en >>> >> >> [[alternative HTML version >>>>> deleted]] >> >> _______________________________________________ >> >>>>> RsigGeo mailing list >> [hidden email] >> >>>>> https://stat.ethz.ch/mailman/listinfo/rsiggeo >> > >  > Roger >>>>> Bivand >>>>> >>>>> Department of Economics, Norwegian School of Economics, > Helleveien >>>>>> 30, >>>>>> >>>>>> N5045 Bergen, Norway. > voice: +47 55 95 93 55 >>>>> <+47%2055%2095%2093%2055>; >>>>> email: [hidden email] > EditorinChief of The R Journal, >>>>> https://journal.rproject.org/index.html > http://orcid.org/00000003 >>>>> 23926140 > https://scholar.google.no/cita >>>>> tions?user=AWeghB0AAAAJ&hl=en >>>>> >>>>>> >>>>>> [[alternative HTML version deleted]] ______________________________ >>>>> _________________ >>>>> RsigGeo mailing list [hidden email] >>>>> https://stat.ethz.ch/mailman/listinfo/rsiggeo >>>>> >>>>> >>>>> >>>>> >>>>  >>> Roger Bivand >>> Department of Economics, Norwegian School of Economics, >>> Helleveien 30, N5045 Bergen, Norway. >>> voice: +47 55 95 93 55; email: [hidden email] >>> EditorinChief of The R Journal, https://journal.rproject.org/ >>> index.html >>> http://orcid.org/0000000323926140 >>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en >>> >>> >> >  > Roger Bivand > Department of Economics, Norwegian School of Economics, > Helleveien 30, N5045 Bergen, Norway. > voice: +47 55 95 93 55; email: [hidden email] > EditorinChief of The R Journal, https://journal.rproject.org/index.html > http://orcid.org/0000000323926140 > https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en > [[alternative HTML version deleted]] _______________________________________________ RsigGeo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/rsiggeo 
Free forum by Nabble  Edit this page 