Dear List
As I was working on a project, I realized that when I use dnearneigh from spdep, two (or more) points that have exactly the same coordinates are not considered neighbours and thus are not linked (even when the lower bound is set to 0 or even to -1). See below for an example. (However, this does not happen if the parameter longlat is set to FALSE.)

Does the function behave the same way for you? Am I missing something? Is this an expected behavior? And if so, is there a way to change that?

In the example below, points 1 and 2 are not connected to each other, i.e. they are not neighbours (as you can see, since they both have only one link, to point 3), even though they have exactly the same coordinates (and are thus less than 25 km apart), while point 3 is connected to both points 1 and 2. If I want to assess autocorrelation using, for instance, joincount.test, this is then an issue...

> library(data.table)
> library(spdep)
> pointstable <- data.table(XCoord=c(13.667029,13.667029,13.667028), YCoord=c(42.772396,42.772396,42.772396))
> print(pointstable)
      XCoord    YCoord
1: 13.667029 42.772396
2: 13.667029 42.772396
3: 13.667028 42.772396
> coords <- cbind(pointstable$XCoord, pointstable$YCoord)
> nbLocal <- dnearneigh(coords, d1=0, d2=25, longlat = TRUE)
> nbLocal <- dnearneigh(coords, d1=-1, d2=25, longlat = TRUE) # both lines produce the same output
> summary(nbLocal)
Neighbour list object:
Number of regions: 3
Number of nonzero links: 4
Percentage nonzero weights: 44.44444
Average number of links: 1.333333
Link number distribution:

1 2
2 1
2 least connected regions:
1 2 with 1 link
1 most connected region:
3 with 2 links

Thanks
Maël

[[alternative HTML version deleted]]

_______________________________________________
RsigGeo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/rsiggeo
Do not post HTML-mail, only plain text. Your example is not reproducible because you used HTML-mail.
Please read the help file: the bounds are described as being between lower (greater than) and upper (less than or equal to) bounds. Since the distance between identical points is strictly zero, they are not neighbours, because the distance must be > d1 and <= d2. If d1 is < 0, it is reset to 0, as it is assumed that a negative lower bound is a user error (and it would break the underlying compiled code).

In any case, no reasonable cross-sectional spatial process has duplicated point (nugget) observations in situations in which spatial weights would be used (spatio-temporal panels will have, but then time differs).

Hope this clarifies,

Roger
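The bounds rule described above can be illustrated without spdep. The following is a minimal base-R sketch, not the package's implementation: the helper name links_within and its lower argument are hypothetical, and planar (Euclidean) distances are assumed, so that the coincident points are exactly zero apart.

```r
# Coordinates from the example: points 1 and 2 coincide, point 3 is slightly offset.
coords <- cbind(c(13.667029, 13.667029, 13.667028),
                c(42.772396, 42.772396, 42.772396))

# Hypothetical helper (not part of spdep): count directed links i -> j, i != j,
# whose planar distance lies between d1 and d2.
# lower = "GT" requires distance > d1; lower = "GE" requires distance >= d1.
links_within <- function(coords, d1, d2, lower = c("GT", "GE")) {
  lower <- match.arg(lower)
  d <- as.matrix(dist(coords))            # full symmetric distance matrix
  ok_lower <- if (lower == "GT") d > d1 else d >= d1
  ok <- ok_lower & d <= d2                # upper bound is always inclusive here
  diag(ok) <- FALSE                       # a point is never its own neighbour
  sum(ok)
}

n_gt <- links_within(coords, d1 = 0, d2 = 25, lower = "GT")
n_ge <- links_within(coords, d1 = 0, d2 = 25, lower = "GE")
print(c(GT = n_gt, GE = n_ge))
```

With a strict lower bound of 0 the coincident pair is dropped and only the 4 directed links involving point 3 remain; with an inclusive lower bound all 6 directed links are kept.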
Dear Roger,
Thank you for your answer (and sorry for the HTML posting).

The issue persists if I specify "GE" for the lower bound, but only when the parameter longlat is set to TRUE (see example below).

Regarding the nature of my data, it is a series of records of Jews arrested during the Holocaust in Italy. Those are point data, and some people have been arrested at the same place and at the same time (hence my problem). I am trying to assess spatial autocorrelation for a binary attribute (whether they survived the Holocaust or not), and I plan to use a Joincount method, for which I need a spatial weight matrix. Is using Joincount on such a dataset wrong?

Best

Code:

library(data.table)
library(spdep)
pointstable <- data.table(XCoord=c(13.667029,13.667029,13.667028), YCoord=c(42.772396,42.772396,42.772396))
print(pointstable)
coords <- cbind(pointstable$XCoord, pointstable$YCoord)
nbLocal <- dnearneigh(coords, d1=0, d2=25, longlat = TRUE, bound = c("GE", "LE"))
summary(nbLocal)
nbLocal <- dnearneigh(coords, d1=0, d2=25, longlat = FALSE, bound = c("GE", "LE"))
summary(nbLocal)

Output:

> print(pointstable)
     XCoord  YCoord
1: 13.66703 42.7724
2: 13.66703 42.7724
3: 13.66703 42.7724

> nbLocal <- dnearneigh(coords, d1=0, d2=25, longlat = TRUE, bound = c("GE", "LE"))
> summary(nbLocal)
Neighbour list object:
Number of regions: 3
Number of nonzero links: 4
Percentage nonzero weights: 44.44444
Average number of links: 1.333333
Link number distribution:

1 2
2 1
2 least connected regions:
1 2 with 1 link
1 most connected region:
3 with 2 links

> nbLocal <- dnearneigh(coords, d1=0, d2=25, longlat = FALSE, bound = c("GE", "LE"))
> summary(nbLocal)
Neighbour list object:
Number of regions: 3
Number of nonzero links: 6
Percentage nonzero weights: 66.66667
Average number of links: 2
Link number distribution:

2
3
3 least connected regions:
1 2 3 with 2 links
3 most connected regions:
1 2 3 with 2 links
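Before building neighbour lists on data like these, it can help to see which records share a location. The following is a small base-R sketch using the coordinates from the example; the variable name coincident is only illustrative.

```r
coords <- cbind(c(13.667029, 13.667029, 13.667028),
                c(42.772396, 42.772396, 42.772396))

# TRUE for every row whose coordinates also occur in another row.
# duplicated() alone would miss the first member of each group, so the
# check is repeated from the other end and the two results are combined.
coincident <- duplicated(coords) | duplicated(coords, fromLast = TRUE)

# Row indices of points that share their location with at least one other point.
print(which(coincident))
```

Here rows 1 and 2 are flagged, which are exactly the two points that were not linked to each other in the longlat = TRUE output.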
On Wed, 12 Apr 2017, Maël Le Noc wrote:
> Dear Roger,
> Thank you for your answer (and sorry for the HTML posting).
>
> The issue persists if I specify "GE" for the lower bound, but only when
> the parameter longlat is set to TRUE (see example below).

Thanks, very useful. The Great Circle distance measure returned Not-a-Number for zero distance, because of an unprotected division by zero. I've committed a patched source version to R-Forge. Look on:

https://rforge.rproject.org/R/?group_id=182

later today for a version with today's date and Rev: 693; it should show up mid to late evening CEST. Please say whether this performs as expected.

> Regarding the nature of my data, it is a series of records of Jews
> arrested during the Holocaust in Italy. Those are point data, and some
> people have been arrested at the same place and at the same time (hence
> my problem). I am trying to assess spatial autocorrelation for a binary
> attribute (whether they survived the Holocaust or not), and I plan to
> use a Joincount method, for which I need a spatial weight matrix. Is
> using Joincount on such a dataset wrong?

Joincount should be OK, but if you have covariates you could try to remove the mean model first and only then see whether there is a spatially structured random effect, for example with hglm, R2BayesX, INLA, or similar. For hglm see for example:

https://journal.rproject.org/archive/2015/RJ2015017/index.html

The data you most likely do not have (addresses with residents at risk of arrest but not arrested) would also help, giving you a risk of arrest measure by address. There is also a spatial probit literature that might be relevant; if you have timestamps, you will likely find that operational factors play in, with arrests in a small area at the same time.
Hope this helps,

Roger

Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; email: [hidden email]
Editor-in-Chief of The R Journal, https://journal.rproject.org/index.html
http://orcid.org/0000000323926140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
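Why a Not-a-Number distance drops a pair even with an inclusive lower bound can be shown in a few lines of base R. This only illustrates NaN arithmetic and is not the spdep Great Circle code: the functions ratio_unprotected, ratio_protected and in_bounds are hypothetical stand-ins, assuming a distance formula whose numerator and denominator both vanish for coincident points.

```r
# Hypothetical stand-in for a formula that divides two terms which are
# both zero when the points coincide (NOT the actual spdep code).
ratio_unprotected <- function(num, den) num / den

# Guarded variant: treat the 0/0 case as a distance of zero.
ratio_protected <- function(num, den) {
  if (num == 0 && den == 0) return(0)
  num / den
}

d_bad  <- ratio_unprotected(0, 0)   # 0/0 evaluates to NaN in R
d_good <- ratio_protected(0, 0)     # 0

# Bounds test with an inclusive ("GE") lower and inclusive ("LE") upper bound.
in_bounds <- function(d, d1 = 0, d2 = 25) d >= d1 & d <= d2

print(in_bounds(d_bad))    # NA: the comparison is undecided, so no link is made
print(in_bounds(d_good))   # TRUE
```

Any comparison involving NaN yields NA rather than TRUE, so a pair with a NaN distance can never satisfy the bounds, whichever lower bound is chosen. Guarding the zero-distance case restores the expected link.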
Thank you Roger,
Your patched version now works as expected! And thank you for your suggestions; I am currently looking into all of that.

Best,
Maël
