Hi all,
Currently, I am working with U.S. voter data. Below, I included a brief example of the structure of the data with some reproducible code. My data set consists of roughly 233,000 (233k) entries, each specifying a voter and their particular latitude/longitude pair. I have been using the spdep package with the hope of creating a CAR model. To begin the analysis, we need to find all first order neighbors of every point in the data.

While spdep has fantastic commands for finding k nearest neighbors (knearneigh), and a useful command for finding lag of order 3 or more (nblag), I have yet to find a method which is suitable for our purposes (lag = 1, or lag = 2). Additionally, I looked into altering the nblag command to accommodate maxlag = 1 or maxlag = 2, but the command relies on an nb format, which is problematic as we are looking for the underlying neighborhood structure.

A good deal of work has been done with polygons, or with data which is already in “nb” format, but after reading the literature, it seems that polygons are not appropriate, nor are distance-based neighbor techniques, due to density fluctuations over the area of interest.

Below is some reproducible code I wrote. I would like to note that I am currently working in R 1.1.453 on a MacBook.
# Create a data frame of 10 voters, picked at random
voter.1 = c(1, 75.52187, 40.62320)
voter.2 = c(2, 75.56373, 40.55216)
voter.3 = c(3, 75.39587, 40.55416)
voter.4 = c(4, 75.42248, 40.64326)
voter.5 = c(5, 75.56654, 40.54948)
voter.6 = c(6, 75.56257, 40.67375)
voter.7 = c(7, 75.51888, 40.59715)
voter.8 = c(8, 75.59879, 40.60014)
voter.9 = c(9, 75.59879, 40.60014)
voter.10 = c(10, 75.50877, 40.53129)

# Bind the vectors together
voter.subset = rbind(voter.1, voter.2, voter.3, voter.4, voter.5,
                     voter.6, voter.7, voter.8, voter.9, voter.10)

# Rename the columns
colnames(voter.subset) = c("Voter.ID", "Longitude", "Latitude")

# Change the class from a matrix to a data frame
voter.subset = as.data.frame(voter.subset)

# Load in the required packages
library(spdep)
library(sp)

# Set the coordinates
coordinates(voter.subset) = c("Longitude", "Latitude")
coords = coordinates(voter.subset)

# Jitter to ensure no duplicate points
coords = jitter(coords, factor = 1)

# Find the first nearest neighbor of each point
one.nn = knearneigh(coords, k = 1)

# Convert the first nearest neighbor to format "nb"
one.nn_nb = knn2nb(one.nn, sym = FALSE)

Thank you in advance for any help you may offer, and for taking the time to read this. I have consulted Applied Spatial Data Analysis with R (Bivand, Pebesma, Gomez-Rubio), as well as other Sig-Geo threads, the spdep documentation, and the nb vignette (Bivand, April 3, 2018) from earlier this year.

Warmest,
Ben
--
Benjamin Lieberman
Muhlenberg College 2019
Mobile: 301.299.8928

[[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Dear Benjamin,
I'm not sure how you define "first order neighbors" for a point. The first thing that comes to my mind is to use their corresponding Voronoi polygons and define neighborhood from there. Following your code:

v <- dismo::voronoi(coords)
par(mfrow = c(1, 2), xaxt = "n", yaxt = "n", mgp = c(0, 0, 0))
plot(coords, type = "n", xlab = NA, ylab = NA)
plot(v, add = TRUE)
text(x = coords[, 1], y = coords[, 2], labels = voter.subset$Voter.ID)
plot(coords, type = "n", xlab = NA, ylab = NA)
plot(poly2nb(v), coords, add = TRUE, col = "gray")

ƒacu.
On Fri, 13 Jul 2018, Facundo Muñoz wrote:
> Dear Benjamin,
>
> I'm not sure how you define "first order neighbors" for a point. The
> first thing that comes to my mind is to use their corresponding voronoi
> polygons and define neighborhood from there. Following your code:

Thanks, the main source of confusion is that "first order neighbors" are not defined. A k=1 neighbour could be (as below), as could k=6, or voronoi neighbours, or sphere of influence etc. So reading vignette("nb") would be a starting point.

Also note that voronoi and other graph-based neighbours should only use planar coordinates - including dismo::voronoi, which uses deldir::deldir() - just like spdep::tri2nb(). Triangulation can lead to spurious neighbours on the convex hull.

> On 07/12/2018 09:00 PM, Benjamin Lieberman wrote:
>> Currently, I am working with U.S. voter data. Below, I included a brief
>> example of the structure of the data with some reproducible code. My
>> data set consists of roughly 233,000 (233k) entries, each specifying a
>> voter and their particular latitude/longitude pair.

Using individual voter data is highly dangerous, and must in every case be subject to the strictest privacy rules. Voter data does not in essence have position - the only valid voting data that has position is of the voting station/precinct, and those data are aggregated to preserve anonymity.

Why does position and voter data not have position? Which location should you use - residence, workplace, what? What are these locations proxying? Nothing valid can be drawn from "just voter data" - you can get conclusions from carefully constructed stratified exit polls, but there the key gender/age/ethnicity/social class/etc. confounders are handled by design.

Why should voting decisions be influenced by proximity (they are not)? The missing element here is looking carefully at relevant covariates at more aggregated levels (in the US typically zoning controlling social class positional segregation, etc.).

>> Below is some reproducible code I wrote. I would like to note that I am
>> currently working in R 1.1.453 on a MacBook.

You mean RStudio, there is no such version of R.

>> # Create a data frame of 10 voters, picked at random
>> voter.1 = c(1, 75.52187, 40.62320)

These are in geographical coordinates.

>> # Jitter to ensure no duplicate points
>> coords = jitter(coords, factor = 1)

jitter does not respect geographical coordinates (decimal degree metric).

>> # Find the first nearest neighbor of each point
>> one.nn = knearneigh(coords, k=1)

See the help page (hint: longlat=TRUE to use Great Circle distances, much slower than planar).

>> [[alternative HTML version deleted]]

Plain text only, please.

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; email: [hidden email]
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
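The remarks above about jitter and longlat=TRUE both come down to one degree of longitude not covering the same ground distance as one degree of latitude. A minimal base-R sketch of that effect follows; the helper name haversine_km and the spherical Earth radius of 6371 km are assumptions made for illustration, not part of spdep or of the original posts.

```r
# Great-circle (haversine) distance in kilometres between lon/lat points
# given in decimal degrees. Assumes a spherical Earth of radius 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Voter 1 from the example in this thread (coordinates as posted)
lon <- 75.52187
lat <- 40.62320

# Move 0.01 degrees east, then 0.01 degrees north
step_east_km  <- haversine_km(lon, lat, lon + 0.01, lat)
step_north_km <- haversine_km(lon, lat, lon, lat + 0.01)

# At this latitude the eastward step is shorter by about cos(latitude),
# so a planar distance computed on raw degrees stretches east-west gaps
ratio <- step_east_km / step_north_km
print(c(east_km = step_east_km, north_km = step_north_km, ratio = ratio))
```

Because of this, nearest-neighbour searches and jittering on raw decimal degrees treat east-west and north-south offsets as equal when they are not; either great-circle distances or projected coordinates avoid the distortion.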
Roger and Facu,
Thank you very much for the help. In terms of the data, I only provided the ID and Lat/Long pairs because they were the only covariates which were necessary. The data set we are using was purchased and contains voter registration information, voter history, and census tract information, after some geocoding took place. The locations are the residents' houses, in this instance.

I have rerun the knn with longlat = T, but I am still hung up on the idea of the first order neighbors. I have reread the vignette and section 5 discusses High-Order Neighbors, but there isn’t any mention of first or second order neighbors, as you mentioned above (“first order neighbors are not defined”). One of the pieces of literature I found said that polygons are problematic to work with, as are tessellations, for precisely the reason you mentioned, non-planarity. For this reason, I am hung up on the idea of how to find all first order neighbors for a point, especially as the number of first order neighbors varies from point to point, and as such knearneigh would not be appropriate here.

If this is something that does not seem feasible, maybe another tactic is necessary.

Again, thank you all for the help.

Warmest,
Benjamin Lieberman
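One possible reading of "first order neighbors" with a neighbour count that varies from point to point is a graph-based definition. The sketch below uses a Gabriel graph from spdep, then nblag() with maxlag = 2 to get second order neighbours from it. It is only one candidate definition, not the one the thread settles on. The helper to_planar_km is hypothetical: it is a crude local equirectangular projection assumed here so that the graph is built on planar coordinates, and a proper projected CRS would be preferable for real data. Exact duplicates are dropped rather than jittered, which is also an assumption; co-located voters would need separate handling.

```r
library(spdep)

# Longitude/latitude pairs from the example in this thread (as posted)
lonlat <- rbind(
  c(75.52187, 40.62320), c(75.56373, 40.55216), c(75.39587, 40.55416),
  c(75.42248, 40.64326), c(75.56654, 40.54948), c(75.56257, 40.67375),
  c(75.51888, 40.59715), c(75.59879, 40.60014), c(75.59879, 40.60014),
  c(75.50877, 40.53129)
)

# Drop exact duplicates (voters 8 and 9 share a location) instead of jittering
lonlat <- lonlat[!duplicated(lonlat), , drop = FALSE]

# Hypothetical helper: local equirectangular projection to kilometres,
# only reasonable for a small study area (assumes a 6371 km spherical Earth)
to_planar_km <- function(lonlat, radius_km = 6371) {
  to_rad <- pi / 180
  lat0 <- mean(lonlat[, 2]) * to_rad
  cbind(x = radius_km * lonlat[, 1] * to_rad * cos(lat0),
        y = radius_km * lonlat[, 2] * to_rad)
}
xy <- to_planar_km(lonlat)

# Gabriel graph neighbours: the number of neighbours adapts to local density
gab_nb <- graph2nb(gabrielneigh(xy), sym = TRUE)
print(card(gab_nb))  # neighbour count per point

# nblag() accepts any nb object; element 1 is the input (first order),
# element 2 holds the second order neighbours
lags <- nblag(gab_nb, maxlag = 2)
first_order  <- lags[[1]]
second_order <- lags[[2]]
```

Other graph-based constructions in spdep, such as relativeneigh() or tri2nb(), could be swapped in for gabrielneigh() with the same downstream steps; which one is appropriate depends on the model of interaction being assumed.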
All,

I would like to note that, as the data is proprietary and for obvious privacy reasons, the lat/long pairs were randomly generated and were not taken directly from the data.

Benjamin Lieberman
On Fri, 13 Jul 2018, Benjamin Lieberman wrote:
> Roger anf Facu, > > Thank you very much for the help. In terms of the data, I only provided > the ID and Lat/Long pairs because they were the only covariates which > were necessary. The data set we are using was purchased and contains > voter registration information, voter history, and census tract > information, after some geocoding took place. The locations are the > residents houses, in this instance. > > I have rerun the knn with longlat = T, but I am still hung up on the > idea of the first order neighbors. I have reread the vignette and > section 5 discusses HighOrder Neighbors, but there isn’t any mention of > first or second order neighbors, as you mentioned above (“first order > neighbors are not defined”). One of the pieces of literature I found > said that polygons are problematic to work with, as are tesslations for > precisely the reason you mentioned, nonplanarity. For this reason, I am > hung up on the idea of how to find all first order neighbors for a > point, especially as the number of first order neighbors varies from > point to point, and such knearneigh would not be appropriate here. methods (or knn). You still have not defined "first order neighbors". That is your call alone. If you believe that voter behaviour is like a contagious disease, define contagion, and from that "first order neighbours". If you are simply accounting for missing background covariates that have a larger spatial footprint rather than votervoter interaction, it probably doesn't matter much. What is the implied model here  that voters behave by observing the behaviour of their proximate neighbours (giving similar behaviour for near neighbours) or that voters are patched/segregated by residence, and near neighbours behave similarly not because of information spillovers between voters, but because the voters are subject to aggregate social/economic conditions? Roger > > If this is something that does not seem feasible, maybe another tactic > is necessary. 
>
> Again, thank you all for the help.
>
> Warmest
> --
> Benjamin Lieberman
>
>> On Jul 13, 2018, at 6:11 AM, Roger Bivand <[hidden email]> wrote:
>>
>> On Fri, 13 Jul 2018, Facundo Muñoz wrote:
>>
>>> Dear Benjamin,
>>>
>>> I'm not sure how you define "first order neighbors" for a point. The
>>> first thing that comes to my mind is to use their corresponding voronoi
>>> polygons and define neighborhood from there. Following your code:
>>
>> Thanks, the main source of confusion is that "first order neighbors" are
>> not defined. A k=1 neighbour could be (as below), as could k=6, or
>> voronoi neighbours, or sphere of influence etc. So reading
>> vignette("nb") would be a starting point.
>>
>> Also note that voronoi and other graph-based neighbours should only use
>> planar coordinates - including dismo::voronoi, which uses
>> deldir::deldir() - just like spdep::tri2nb(). Triangulation can lead to
>> spurious neighbours on the convex hull.
>>
>>> v <- dismo::voronoi(coords)
>>> par(mfrow = c(1, 2), xaxt = "n", yaxt = "n", mgp = c(0, 0, 0))
>>> plot(coords, type = "n", xlab = NA, ylab = NA)
>>> plot(v, add = TRUE)
>>> text(x = coords[, 1], y = coords[, 2], labels = voter.subset$Voter.ID)
>>> plot(coords, type = "n", xlab = NA, ylab = NA)
>>> plot(poly2nb(v), coords, add = TRUE, col = "gray")
>>>
>>> ƒacu.
>>>
>>> On 07/12/2018 09:00 PM, Benjamin Lieberman wrote:
>>>> Hi all,
>>>>
>>>> Currently, I am working with U.S. voter data. Below, I included a
>>>> brief example of the structure of the data with some reproducible
>>>> code. My data set consists of roughly 233,000 (233k) entries, each
>>>> specifying a voter and their particular latitude/longitude pair.
>>
>> Using individual voter data is highly dangerous, and must in every case
>> be subject to the strictest privacy rules. Voter data does not in
>> essence have position - the only valid voting data that has position is
>> of the voting station/precinct, and those data are aggregated to
>> preserve anonymity.
>>
>> Why does position and voter data not have position? Which location
>> should you use - residence, workplace, what? What are these locations
>> proxying? Nothing valid can be drawn from "just voter data" - you can
>> get conclusions from carefully constructed stratified exit polls, but
>> there the key gender/age/ethnicity/social class/etc. confounders are
>> handled by design. Why should voting decisions be influenced by
>> proximity (they are not)? The missing element here is looking carefully
>> at relevant covariates at more aggregated levels (in the US typically
>> zoning controlling social class positional segregation, etc.).
>>
>>>> I have been using the spdep package with the hope of creating a CAR
>>>> model. To begin the analysis, we need to find all first order
>>>> neighbors of every point in the data.
>>>>
>>>> While spdep has fantastic commands for finding k nearest neighbors
>>>> (knearneigh), and a useful command for finding lag of order 3 or more
>>>> (nblag), I have yet to find a method which is suitable for our
>>>> purposes (lag = 1, or lag = 2). Additionally, I looked into altering
>>>> the nblag command to accommodate maxlag = 1 or maxlag = 2, but the
>>>> command relies on an nb format, which is problematic as we are
>>>> looking for the underlying neighborhood structure.
>>>>
>>>> There has been numerous work done with polygons, or data which
>>>> already is in "nb" format, but after reading the literature, it seems
>>>> that polygons are not appropriate, nor are distance based neighbor
>>>> techniques, due to density fluctuations over the area of interest.
>>>>
>>>> Below is some reproducible code I wrote. I would like to note that I
>>>> am currently working in R 1.1.453 on a MacBook.
>>
>> You mean RStudio; there is no such version of R.
>>
>>>> # Create a data frame of 10 voters, picked at random
>>>> voter.1 = c(1, 75.52187, 40.62320)
>>>> voter.2 = c(2, 75.56373, 40.55216)
>>>> voter.3 = c(3, 75.39587, 40.55416)
>>>> voter.4 = c(4, 75.42248, 40.64326)
>>>> voter.5 = c(5, 75.56654, 40.54948)
>>>> voter.6 = c(6, 75.56257, 40.67375)
>>>> voter.7 = c(7, 75.51888, 40.59715)
>>>> voter.8 = c(8, 75.59879, 40.60014)
>>>> voter.9 = c(9, 75.59879, 40.60014)
>>>> voter.10 = c(10, 75.50877, 40.53129)
>>
>> These are in geographical coordinates.
>>
>>>> # Bind the vectors together
>>>> voter.subset = rbind(voter.1, voter.2, voter.3, voter.4, voter.5,
>>>>                      voter.6, voter.7, voter.8, voter.9, voter.10)
>>>>
>>>> # Rename the columns
>>>> colnames(voter.subset) = c("Voter.ID", "Longitude", "Latitude")
>>>>
>>>> # Change the class from a matrix to a data frame
>>>> voter.subset = as.data.frame(voter.subset)
>>>>
>>>> # Load in the required packages
>>>> library(spdep)
>>>> library(sp)
>>>>
>>>> # Set the coordinates
>>>> coordinates(voter.subset) = c("Longitude", "Latitude")
>>>> coords = coordinates(voter.subset)
>>>>
>>>> # Jitter to ensure no duplicate points
>>>> coords = jitter(coords, factor = 1)
>>
>> jitter does not respect geographical coordinates (decimal degree
>> metric).
>>
>>>> # Find the first nearest neighbor of each point
>>>> one.nn = knearneigh(coords, k=1)
>>
>> See the help page (hint: longlat=TRUE to use Great Circle distances,
>> much slower than planar).
>>
>>>> # Convert the first nearest neighbor to format "nb"
>>>> one.nn_nb = knn2nb(one.nn, sym = F)
>>>>
>>>> Thank you in advance for any help you may offer, and for taking the
>>>> time to read this. I have consulted Applied Spatial Data Analysis
>>>> with R (Bivand, Pebesma, Gomez-Rubio), as well as other R-sig-Geo
>>>> threads, the spdep documentation, and the nb vignette (Bivand, April
>>>> 3, 2018) from earlier this year.
>>>>
>>>> Warmest,
>>>> Ben
>>>>
>>>> [[alternative HTML version deleted]]
>>
>> Plain text only, please.
Roger Bivand
Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway
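To make Roger's point concrete, that "first order neighbours" depends entirely on the definition chosen, here is a minimal sketch comparing three candidate definitions on the toy points. It assumes the longitudes should be negative (the minus signs appear to have been lost in the archive), drops the duplicated voter 9, and uses an arbitrary k = 4 and an arbitrary 10 km distance band; none of these choices comes from the thread.

```r
library(spdep)

# Toy coordinates from the original post; negative longitudes are an
# assumption, and the duplicate of voter 8 (voter 9) is left out.
coords <- cbind(
  lon = c(-75.52187, -75.56373, -75.39587, -75.42248, -75.56654,
          -75.56257, -75.51888, -75.59879, -75.50877),
  lat = c(40.62320, 40.55216, 40.55416, 40.64326, 40.54948,
          40.67375, 40.59715, 40.60014, 40.53129)
)

# Definition 1: the single nearest neighbour, made symmetric
nb_k1 <- knn2nb(knearneigh(coords, k = 1, longlat = TRUE), sym = TRUE)

# Definition 2: the four nearest neighbours (k = 4 is arbitrary here)
nb_k4 <- knn2nb(knearneigh(coords, k = 4, longlat = TRUE), sym = TRUE)

# Definition 3: everyone within 10 km (great-circle distance, in km)
nb_d10 <- dnearneigh(coords, d1 = 0, d2 = 10, longlat = TRUE)

# Neighbour counts per point differ across the three definitions
neighbour_counts <- cbind(k1 = card(nb_k1), k4 = card(nb_k4), d10 = card(nb_d10))
neighbour_counts
```

Each of these objects is a valid "first order" neighbour list for spdep; which one is appropriate depends on the behavioural model, not on the software.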

On Fri, 13 Jul 2018, Benjamin Lieberman wrote:
> All
>
> I would like to note that as the data is proprietary, and for obvious
> privacy concerns, the lat/long pairs were randomly generated, and were
> not taken directly from the data.

Thanks for the clarification. Note that if the data are a sample, that is, not a complete listing for one or more study areas, you don't know who the first order neighbour (the most proximate other voter) is, because that individual may not be in the sample. Your fallback then is to treat the data as aggregates, unless you rule out local sampling variability.

Roger
Roger Bivand
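As an illustration of the fallback Roger mentions, treating the data as aggregates, here is a minimal base-R sketch that collapses individual records to tract-level counts and turnout. The data frame, its column names (`tract`, `voted`) and its values are hypothetical; the thread only says that the purchased data contain census tract information.

```r
# Hypothetical individual-level records: one row per voter
voters <- data.frame(
  tract = c("A", "A", "A", "B", "B", "C"),
  voted = c(1, 0, 1, 1, 1, 0)
)

# Aggregate to one row per tract: number of sampled voters and turnout share
n_voters <- tapply(voters$voted, voters$tract, length)
turnout  <- tapply(voters$voted, voters$tract, mean)

tract_summary <- data.frame(
  tract   = names(n_voters),
  n       = as.vector(n_voters),
  turnout = as.vector(turnout)
)
tract_summary
```

With tract-level rows, neighbours can then be defined from tract polygons (for example with spdep::poly2nb on a tract boundary layer), which sidesteps the question of which individual voters are missing from the sample.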
Roger
Thank you so much for the help. In our case, first order neighbors are all neighbors who are adjacent to a voter. Second order neighbors are then all neighbors who are adjacent to the first order neighbors. I hope this clarifies what I have been referring to.

I will try the method you suggested, thank you.

Best,
Ben
--
Benjamin Lieberman
Muhlenberg College 2019
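Once some first-order definition has been fixed, "neighbours of neighbours" can be derived from it with spdep::nblag(), which accepts maxlag = 2. The sketch below is only an illustration: the symmetric k = 2 nearest-neighbour rule stands in for whatever adjacency definition the study settles on, and the negative longitudes are an assumption.

```r
library(spdep)

# Toy coordinates (negative longitudes assumed; duplicate point dropped)
coords <- cbind(
  lon = c(-75.52187, -75.56373, -75.39587, -75.42248, -75.56654,
          -75.56257, -75.51888, -75.59879, -75.50877),
  lat = c(40.62320, 40.55216, 40.55416, 40.64326, 40.54948,
          40.67375, 40.59715, 40.60014, 40.53129)
)

# Placeholder first-order definition: symmetric 2 nearest neighbours
nb_first <- knn2nb(knearneigh(coords, k = 2, longlat = TRUE), sym = TRUE)

# nblag() returns a list: element 1 is the first-order list,
# element 2 holds the second-order neighbours
lags <- nblag(nb_first, maxlag = 2)
nb_second <- lags[[2]]

# First- and second-order neighbours combined into one list
nb_upto2 <- nblag_cumul(lags)

cbind(first = card(nb_first), second = card(nb_second), upto2 = card(nb_upto2))
```

The nb object that nblag() needs is just the encoded first-order structure, so the open question remains how that first-order list is defined, not how to lag it.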

On Fri, 13 Jul 2018, Benjamin Lieberman wrote:
> Roger
>
> Thank you so much for the help. In our case, first order neighbors are
> all neighbors who are adjacent to a voter. Second order neighbors are
> then all neighbors who are adjacent to the first order neighbors. Hope
> that this could clarify what I have been referencing this time.

So you need to define what you mean by adjacent for the purposes of your study. This depends on knowing the underlying behavioural patterns affecting interaction.

Roger

> I will try the method you suggested, thank you.
>
> Best,
> Ben
>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> RsigGeo mailing list >>>>>>> [hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>> >>>>>>> https://stat.ethz.ch/mailman/listinfo/rsiggeo <https://stat.ethz.ch/mailman/listinfo/rsiggeo> <https://stat.ethz.ch/mailman/listinfo/rsiggeo <https://stat.ethz.ch/mailman/listinfo/rsiggeo>> >>>>>> >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> _______________________________________________ >>>>>> RsigGeo mailing list >>>>>> [hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>> >>>>>> https://stat.ethz.ch/mailman/listinfo/rsiggeo <https://stat.ethz.ch/mailman/listinfo/rsiggeo> <https://stat.ethz.ch/mailman/listinfo/rsiggeo <https://stat.ethz.ch/mailman/listinfo/rsiggeo>> >>>>>> >>>>> >>>>>  >>>>> Roger Bivand >>>>> Department of Economics, Norwegian School of Economics, >>>>> Helleveien 30, N5045 Bergen, Norway. >>>>> voice: +47 55 95 93 55; email: [hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>> >>>>> http://orcid.org/0000000323926140 <http://orcid.org/0000000323926140> <http://orcid.org/0000000323926140 <http://orcid.org/0000000323926140>> >>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en_______________________________________________ <https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en_______________________________________________><https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en_______________________________________________ <https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en_______________________________________________>> >>>>> RsigGeo mailing list >>>>> [hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>> >>>>> https://stat.ethz.ch/mailman/listinfo/rsiggeo <https://stat.ethz.ch/mailman/listinfo/rsiggeo> <https://stat.ethz.ch/mailman/listinfo/rsiggeo 
<https://stat.ethz.ch/mailman/listinfo/rsiggeo>> >>> >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> RsigGeo mailing list >>> [hidden email] <mailto:[hidden email]> >>> https://stat.ethz.ch/mailman/listinfo/rsiggeo <https://stat.ethz.ch/mailman/listinfo/rsiggeo> >>> >> >>  >> Roger Bivand >> Department of Economics, Norwegian School of Economics, >> Helleveien 30, N5045 Bergen, Norway. >> voice: +47 55 95 93 55; email: [hidden email] <mailto:[hidden email]> >> http://orcid.org/0000000323926140 <http://orcid.org/0000000323926140> >> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en <https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en> > > [[alternative HTML version deleted]] > > _______________________________________________ > RsigGeo mailing list > [hidden email] > https://stat.ethz.ch/mailman/listinfo/rsiggeo > Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N5045 Bergen, Norway. voice: +47 55 95 93 55; email: [hidden email] http://orcid.org/0000000323926140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en _______________________________________________ RsigGeo mailing list [hidden email] https://stat.ethz.ch/mailman/listinfo/rsiggeo
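A note on the lag = 1 / lag = 2 question raised in the thread. The sketch below is only an illustration, not the method anyone in the thread settled on. It takes k-nearest neighbours as the (arbitrary) definition of "first order", uses longlat = TRUE as suggested above, and then derives second-order neighbours with nblag(). As far as I can tell from the spdep help page, nblag() accepts maxlag = 2 and returns the input list as its first element. The choice k = 3 and the use of unique() to drop the duplicated point (instead of jitter) are my assumptions for the example; the coordinates are copied unchanged from the original post.

```r
library(spdep)

# Coordinates copied from the example in the thread (longitude first, then latitude)
coords <- cbind(
  Longitude = c(75.52187, 75.56373, 75.39587, 75.42248, 75.56654,
                75.56257, 75.51888, 75.59879, 75.59879, 75.50877),
  Latitude  = c(40.62320, 40.55216, 40.55416, 40.64326, 40.54948,
                40.67375, 40.59715, 40.60014, 40.60014, 40.53129)
)

# Assumption for illustration: drop exact duplicate locations rather than
# jittering, since jitter() does not respect the decimal degree metric
coords <- unique(coords)

# Assumption: "first order" is taken to mean the k nearest neighbours;
# k = 3 is an arbitrary choice, not something prescribed in the thread
k <- 3

# longlat = TRUE makes knearneigh() use Great Circle distances
knn <- knearneigh(coords, k = k, longlat = TRUE)

# sym = TRUE adds the reverse links so the neighbour list is symmetric
nb_first <- knn2nb(knn, sym = TRUE)

# nblag() returns a list: element 1 is the input, element 2 holds the
# neighbours of neighbours that are not already first-order neighbours
lags <- nblag(nb_first, maxlag = 2)
nb_second <- lags[[2]]

# Number of neighbours per point at each order
print(card(nb_first))
print(card(nb_second))
```

With sym = TRUE the number of first-order neighbours varies from point to point (each point has at least k), which may or may not match what is wanted here. Swapping in another definition, such as one of the graph-based neighbours on planar coordinates mentioned above, only changes how nb_first is built; the nblag() step stays the same.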