Creating Spatial Weight Matrices with Large Data

Creating Spatial Weight Matrices with Large Data

cchiseni
I am currently working with census data covering about 758 000
individuals. I am trying to create a spatial weight matrix using the X-Y
coordinates of their places of birth. However, I am running into problems
when I try to create the nb-type weights matrix using poly2nb(): R takes
a very long time and eventually crashes. I have increased R's memory
size to about 80000, but this still does not work.

Is there a way I can get around this problem? If anyone has any ideas on
how I can create a spatial weight matrix for such a large data set, please
help.

Kind Regards,


Michael Chanda Chiseni

Phd Candidate

Department of Economic History

Lund University

Visiting address: Alfa 1, Scheelevägen 15 B, 22363 Lund



*Africa is not poor, it is poorly managed (Ellen Johnson-Sirleaf ). *

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Creating Spatial Weight Matrices with Large Data

Roger Bivand
Administrator
On Mon, 2 Dec 2019, Chanda Chiseni wrote:

> I am currently working with census data covering about 758 000
> individuals. I am trying to create a spatial weight matrix using the X-Y
> coordinates of their places of birth. However, I am running into problems
> when I try to create the nb-type weights matrix using poly2nb(): R takes
> a very long time and eventually crashes. I have increased R's memory
> size to about 80000, but this still does not work.

Please provide the (shortened) code used. poly2nb() is used for polygons,
not points. If you were using distances between points, you may have used
a distance threshold such that many observations have many neighbours.
Also ask yourself whether this is not a multi-level problem, in that
spatial interactions perhaps occur between aggregates of observations, not
the observations themselves.

>
> Is there a way I can get around this problem? If anyone has any ideas on
> how I can create a spatial weight matrix for such a large data set, please
> help.

An nb object and a listw object are just lists of length n, so a neighbour
object with 800K observations and 4 neighbours each takes only about 13MB,
and the listw about 38MB. What you can use them for may be another problem,
and much of the data may actually simply be noise, not signal.

Roger
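
A rough sketch of why the list representation matters, added here for illustration and not part of the original reply. It uses base R only; `toy_nb` is a hypothetical stand-in for an nb object (a plain list of integer vectors, not something created by spdep), and it does not attempt to reproduce the 13MB and 38MB figures quoted above.

```r
# Figures assumed from the thread: about 758,000 observations, k = 4.
n <- 758000
k <- 4

# A dense n x n weights matrix of doubles would need 8 bytes per cell.
dense_bytes <- n^2 * 8
dense_tb <- dense_bytes / 1e12   # roughly 4.6 terabytes

# A neighbour list stores only the k neighbour ids of each observation.
# Toy stand-in for an nb object, built on a small sample for speed.
set.seed(1)
n_small <- 1000
toy_nb <- lapply(seq_len(n_small), function(i) {
  sample(setdiff(seq_len(n_small), i), k)   # k ids, never i itself
})

# Storage grows with n * k, not with n^2.
print(format(object.size(toy_nb), units = "Kb"))
```

The point of the comparison is only the scaling: a dense matrix grows with the square of n, while the list grows linearly in n for a fixed number of neighbours.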

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

Re: Creating Spatial Weight Matrices with Large Data

cchiseni
Hi Roger,

Thank you for your very helpful feedback. I was indeed treating my point
data as polygons and did not impose a distance threshold. Essentially, as
you stated, many observations had many neighbours. I have since tried to
use k-nearest neighbours and imposed a restriction of k=4. However, this is
still taking a long time.

# Load spdep, which provides knearneigh()
library(spdep)

# Increase the memory limit (Windows only; size is in MB)
memory.limit(size = 80000)

# Define the data
censusdata <- CensusFinal_Analysis_R1

# Create a matrix of coordinates
sp_point <- cbind(censusdata$X, censusdata$Y)
colnames(sp_point) <- c("Long", "Lat")
head(sp_point)

# Find the k = 4 nearest neighbours
censusdata.4nn <- knearneigh(sp_point, k = 4, longlat = TRUE)

I get stuck at the stage where I try to create the k-nearest neighbours;
the operation is quite slow. Am I still doing something wrong?


Kind Regards,

Michael Chanda Chiseni


Re: Creating Spatial Weight Matrices with Large Data

Roger Bivand
Administrator
On Tue, 3 Dec 2019, Chanda Chiseni wrote:

> ## Create the K nearest neighbour
> censusdata.4nn = knearneigh(sp_point,k=4,longlat = TRUE)
Don't use geographical coordinates. Project first; k-nearest neighbours
then uses RANN, which is fast (Euclidean rather than Great Circle
distances).

Roger
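
A minimal sketch of the advice above, added for illustration and not part of the original reply. It assumes the sf package is used for the projection step (the thread does not say which tool to use), that the input coordinates are WGS84 longitude/latitude (EPSG:4326), and that EPSG:32735 (UTM zone 35S) is a suitable projected CRS. That CRS is an arbitrary choice for the simulated points below; real data need a projected CRS appropriate to the study region. The `pts` data frame is simulated and stands in for the census coordinates.

```r
library(sf)
library(spdep)

# Simulated longitude/latitude points (hypothetical stand-in for the census data)
set.seed(1)
n <- 1000
pts <- data.frame(X = runif(n, 25, 29), Y = runif(n, -15, -11))

# Declare the geographical CRS (assumed WGS84), then project to planar metres
pts_sf <- st_as_sf(pts, coords = c("X", "Y"), crs = 4326)
pts_proj <- st_transform(pts_sf, 32735)   # assumed CRS, see lead-in

# Extract the projected coordinates as a plain two-column matrix
xy <- st_coordinates(pts_proj)

# k-nearest neighbours on planar coordinates; longlat is left at its
# default, so Euclidean distances are used
knn <- knearneigh(xy, k = 4)

# Convert to an nb object and then to row-standardised spatial weights
nb <- knn2nb(knn)
lw <- nb2listw(nb, style = "W")

print(summary(card(nb)))   # every observation has exactly 4 neighbours
```

If many individuals share the same place of birth, the coordinates will contain duplicate points, which may be a reason to work with aggregates of observations, as suggested earlier in the thread.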
