Count occurrences less memory expensive than superimpose function in several spatial objects

R-sig-geo mailing list
Dear r-sig-geo Members,

I'd like to read several shapefiles, count the occurrences at each
coordinate, and create a final shapefile containing only the points that
reach a threshold number of occurrences. I tried converting the shapefiles
to ppp objects (because part of my data set is in shapefiles and part is in
ppp objects) and applying the superimpose function, without success. In my
synthetic example:

#Packages
library(spatstat)
library(dplyr)
library(sp)
library(rgdal)
library(raster)


#Point process example
data(ants)
ants.df<-as.data.frame(ants) #Convert to data frame

# Sample 75% of the original dataset, repeat this 9 times and create a
# shapefile in each loop

for(i in 1:9){
  s.ants.df<-sample_frac(ants.df, 0.75)
  s.ants<-ppp(x=s.ants.df[,1],y=s.ants.df[,2],window=ants$window) #Create new ppp object
  sample.pts<-cbind(s.ants$x,s.ants$y)
  pts.sampling = SpatialPoints(sample.pts)
  UTMcoor.df <- SpatialPointsDataFrame(pts.sampling,
    data.frame(id=1:length(pts.sampling)))
  writeOGR(UTMcoor.df, ".",paste0('sample.shape',i), driver="ESRI Shapefile",overwrite=TRUE)
}

#Read all the 9 shapefiles created
all_shape <- list.files(pattern="\\.shp$", full.names=TRUE)
all_shape_list <- lapply(all_shape, shapefile)

#Convert shapefiles to spatstat ppp objects
target <- vector("list", length(all_shape_list))
for(i in 1:length(all_shape_list)){
  target[[i]] <- ppp(x=all_shape_list[[i]]@coords[,1],
                     y=all_shape_list[[i]]@coords[,2],window=ants$window)
}

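#Join all ppp objects and count the multiplicity of each point
target_sub<-do.call(superimpose,target)
res<-multiplicity(target_sub)

#Occurrences in the same coordinate > 5
#data.frame() with named columns; as.data.frame() would treat the 2nd and 3rd
#arguments as row.names and optional
res.xy<-data.frame(x=target_sub$x,y=target_sub$y,res=res)
res_F<-res.xy[res.xy$res>5,]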

#Final shapefile
final.pts<-cbind(res_F[,1],res_F[,2])
pts.final = SpatialPoints(final.pts)
UTMcoor.df <- SpatialPointsDataFrame(pts.final,
data.frame(id=1:length(pts.final)))
UTMcoor.df2 <-remove.duplicates(UTMcoor.df)
writeOGR(UTMcoor.df2, ".", paste0('final.ants'), driver="ESRI Shapefile",overwrite=TRUE)


This approach works very well in this synthetic example! But in my real
data set I have 99 shapefiles with 10^7 coordinates, and when I call
do.call(superimpose,target) my machine with 32 GB of RAM runs out of memory
and crashes.

Does anyone have an idea for how I can create a new shapefile using the
occurrence criterion described above, but in a way that uses less memory
than superimposing all the objects?

Thanks in advance,
Alexandre

--
Alexandre dos Santos
Geotechnologies and Spatial Statistics applied to Forest Entomology
Instituto Federal de Mato Grosso (IFMT) - Campus Caceres
Caixa Postal 244 (PO Box)
Avenida dos Ramires, s/n - Vila Real
Caceres - MT - CEP 78201-380 (ZIP code)
Phone: (+55) 65 99686-6970 / (+55) 65 3221-2674
Lattes CV: http://lattes.cnpq.br/1360403201088680
OrcID: orcid.org/0000-0001-8232-6722
ResearchGate: www.researchgate.net/profile/Alexandre_Santos10
Publons: https://publons.com/researcher/3085587/alexandre-dos-santos/
--



_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Count occurrences less memory expensive than superimpose function in several spatial objects

Vijay Lulla
Hi Alexandre,
As far as I can tell (mostly from reading the docs; I have no prior
experience of using multiplicity or superimpose myself), they are just
counting how many times each x,y coordinate pair occurs. So you can do this
with the group-by semantics of either tidyverse or SQL to generate the
res.xy data.frame. Below is an example of generating res.xy with data.table
instead (I'm not as familiar with tidyverse):

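library(data.table)

target_sub1 <- rbindlist(lapply(target, as.data.table)) # stack all point sets
res1 <- target_sub1[, .(res=.N), by=.(x,y)]             # count per coordinate pair
res.xy1 = res1[target_sub1, on=c("x","y")]              # attach the count to every point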

all.equal(res.xy, res.xy1, check.attributes=FALSE) # should return TRUE

If you're using SQL then you just join the raw table with the grouped table
and you should get the table coordinates and occurrences. And, considering
the number of coordinates you have I recommend either data.table or SQL to
generate the final output.
HTH,
Vijay.
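
The group-by idea above can be sketched on toy data. This is only a minimal illustration, assuming the coordinates are already stacked in one table with columns named x and y; the values, the names `pts`, `counts` and `frequent`, and the threshold of 2 are made up for the example and do not come from the thread. Since the thresholded result is wanted once per coordinate, filtering the grouped table directly may be enough, which would avoid joining the counts back onto every raw row:

```r
library(data.table)

# Toy stand-in for the stacked coordinates of all point sets (illustrative values).
pts <- data.table(
  x = c(1, 1, 1, 2, 2, 3),
  y = c(5, 5, 5, 6, 6, 7)
)

# One row per distinct (x, y) pair, with the number of times it occurs.
counts <- pts[, .(res = .N), by = .(x, y)]

# Keep only the coordinates that occur more than `threshold` times.
threshold <- 2
frequent <- counts[res > threshold]
print(frequent)
```

Because `counts` already holds one row per coordinate, a result filtered this way should not need a separate de-duplication step.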


On Wed, Aug 19, 2020 at 4:22 PM ASANTOS via R-sig-Geo <
[hidden email]> wrote:

> [...]


--
Vijay Lulla, PhD
ORCID <https://orcid.org/0000-0002-0823-2522> | Homepage
<http://vlulla.github.io> | Google Scholar
<https://scholar.google.com/citations?user=VjhJWOgAAAAJ&hl=en> | Github
<https://github.com/vlulla>


Re: Count occurrences less memory expensive than superimpose function in several spatial objects [SOLVED]

R-sig-geo mailing list
Thanks Vijay,

It works now, and I no longer need to convert my shapefile objects to ppp
in order to use the superimpose function. The solution is now:

#Packages
library(spatstat)
library(dplyr)
library(sp)
library(rgdal)
library(raster)
library(data.table)

#Point process example
data(ants)
ants.df<-as.data.frame(ants) #Convert to data frame

# Sample 75% of the original dataset, repeat this 9 times and create a
# shapefile in each loop
for(i in 1:9){
  s.ants.df<-sample_frac(ants.df, 0.75)
  s.ants<-ppp(x=s.ants.df[,1],y=s.ants.df[,2],window=ants$window) #Create new ppp object
  sample.pts<-cbind(s.ants$x,s.ants$y)
  pts.sampling = SpatialPoints(sample.pts)
  UTMcoor.df <- SpatialPointsDataFrame(pts.sampling, data.frame(id=1:length(pts.sampling)))
  writeOGR(UTMcoor.df, ".",paste0('sample.shape',i), driver="ESRI Shapefile",overwrite=TRUE)
}

#Read all the 9 shapefiles created
all_shape <- list.files(pattern="\\.shp$", full.names=TRUE)
all_shape_list <- lapply(all_shape, shapefile)

#Stack the coordinates and count the occurrences of each coordinate pair
target_sub1 <- rbindlist(lapply(all_shape_list, as.data.table))
res1 <- target_sub1[, .(res=.N), by=.(coords.x1,coords.x2)]
res.xy1 = res1[target_sub1, on=c("coords.x1","coords.x2")]
all.equal(res.xy1, res.xy1, check.attributes=FALSE) # trivially TRUE: compares res.xy1 with itself

#Occurrences in the same coordinate > 5
res_F<-res.xy1[res.xy1$res>5,]
res_F<-as.data.frame(res_F)

#Final shapefile
final.pts<-cbind(res_F[,1],res_F[,2])
pts.final = SpatialPoints(final.pts)
UTMcoor.df <- SpatialPointsDataFrame(pts.final, data.frame(id=1:length(pts.final)))
UTMcoor.df2 <-remove.duplicates(UTMcoor.df)
writeOGR(UTMcoor.df2, ".", paste0('final.ants'), driver="ESRI Shapefile", overwrite=TRUE)
#<END>

Alexandre
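
The solution above still stacks every row with rbindlist before counting. If that stacked table were itself too large, one possible variation is to fold the point sets into a running table of counts one at a time. The sketch below is an assumption-laden illustration, not something tested on the real data: `count_incrementally` is a hypothetical helper, the columns are assumed to be named x and y, and the toy point sets stand in for shapefiles that would be read from disk inside the loop in the real workflow.

```r
library(data.table)

# Hypothetical helper: fold point sets into a running table of counts one at a
# time, so only the current set and the distinct coordinates seen so far are
# held together in memory.
count_incrementally <- function(point_sets) {
  running <- data.table(x = numeric(), y = numeric(), res = integer())
  for (pts in point_sets) {
    # Count occurrences within the current set.
    part <- as.data.table(pts)[, .(res = .N), by = .(x, y)]
    # Append to the running totals and sum the counts per coordinate.
    running <- rbindlist(list(running, part))[, .(res = sum(res)), by = .(x, y)]
  }
  running
}

# Toy input: three small point sets (illustrative values only).
sets <- list(
  data.frame(x = c(1, 2), y = c(5, 6)),
  data.frame(x = c(1, 3), y = c(5, 7)),
  data.frame(x = c(1, 2), y = c(5, 6))
)
totals <- count_incrementally(sets)
print(totals[res > 2])
```

Whether this helps in practice would depend on how many distinct coordinates there are: the running table grows with the number of distinct pairs rather than with the total number of rows.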

On 19/08/2020 20:49, Vijay Lulla wrote:

> [...]
