Issues with a GDB file in R?

Issues with a GDB file in R?

Aguirre Perez, Roman
Hi everyone,

I’ve been struggling with an ESRI Geodatabase file (clc12_Version_18_5.gdb), which contains a layer of land cover classes, available at

https://land.copernicus.eu/pan-european/corine-land-cover/clc-2012?tab=download

The file is 2.61 GB once unzipped.

My ultimate task is to overlay it with the latest NUTS3 administrative boundaries shapefile (2013), available at

http://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units

in order to compute the area covered by each class within each NUTS3 region.

Despite the availability of user-friendly software for this task, I haven’t been able to do it - GRASS didn’t load the file (the log reports problems with the polygons), QGIS shows a warning about a specific object, and ArcGIS froze. I guess it’s because the PC I used doesn’t have enough capacity. Unfortunately, I don’t have access to a more powerful one.

Anyway, I decided to try with R - after all, I’ll perform my analysis with it. So I started exploring this GDB with rgdal:

ogrInfo(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")

Source: "/Users/Roman/Desktop/clc12gdb/clc12_Version_18_5.gdb",
layer: "clc12_Version_18_5"
Driver: OpenFileGDB;
number of rows: 2370829
Feature type: wkbPolygon with 3 dimensions
Extent: (-2693292 -3086662) - (10037210 5440568)
Null geometry IDs: 2156240
CRS: +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs
Number of fields: 6
  name          type  length  typeName
1 code_12          4       3    String
2 ID               4      18    String
3 Remark           4      20    String
4 Area_Ha          2       0      Real
5 Shape_Length     2       0      Real
6 Shape_Area       2       0      Real

Then I tried to load it by typing

clc<-readOGR(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")

After 3-5 minutes, the following text appears:

OGR data source with driver: OpenFileGDB
Source: "/Users/Roman/Desktop/clc12shp/clc12_Version_18_5.gdb", layer: "clc12_Version_18_5"
with 2370829 features
It has 6 fields

Unfortunately, after trying 5 times (each attempt took around 8 hours) I couldn’t get anything but the following messages:

Warning messages:
1: In readOGR(dsn = "clc12_Version_18_5.gdb", layer = "clc12_Version_18_5") :
Dropping null geometries: 2156240
2: In readOGR(dsn = "clc12_Version_18_5.gdb", layer = "clc12_Version_18_5") :
Z-dimension discarded

Could anyone advise me how to tackle this? I would also appreciate suggestions on how to work with geodatabases - it’s the first time I’ve worked with this kind of file, so I don’t even know their structure.

By the way, these are my computer and software specifications:

MacBook Pro 2012
Processor 2.6 GHz Intel Core i7
Memory 16 GB DDR3
R 3.5.0
RStudio 1.1.447


Best,
Roman.



Re: Issues with a GDB file in R?

Shaun Walbridge
Hello Roman,

A couple of suggested options: you can try using the arcgisbinding package [1] to pull the data directly into R via ArcGIS, since you mentioned you have ArcGIS available (presumably on Windows or in a VM). It will access the data directly and let you create sp and sf objects from geodatabases with little work. A basic workflow looks like this:

d <- arc.open("path/file.gdb/layer_name")
df <- arc.select(d) # here, you can filter columns and attributes, see [2]
# create an sf object
df.sf <- arc.data2sf(df)
# alternatively, create an sp object
df.sp <- arc.data2sp(df)

You can also write the results back to a geodatabase, or to any other format that ArcGIS understands. Another option is the SpatiaLite version of the data at the URL you posted. You should be able to access this from R directly, provided your rgdal installation is built to read SpatiaLite databases. If it is, try the same process you mentioned using rgdal, but point it at the SpatiaLite database instead. You could also use command-line OGR to convert the data; it has options such as WHERE-clause filtering that aren't directly available via readOGR.
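
For illustration, a minimal sketch of that SpatiaLite route (not from the original message): it assumes rgdal was built with SQLite/SpatiaLite support and guesses the file and layer names, so check them with ogrListLayers() first.

library(rgdal)
ogrListLayers("clc12_Version_18_5.sqlite")      # list the layers; the file name here is a guess
clc <- readOGR(dsn = "clc12_Version_18_5.sqlite",
               layer = "clc12_version_18_5")    # hypothetical layer name - use the one listed above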

If neither of these options work, let me know and I can convert the data for you into a format of your preference.

Cheers,
Shaun

1. https://github.com/R-ArcGIS/r-bridge-install
2. https://rdrr.io/github/R-ArcGIS/r-bridge/man/arc.select.html




Re: Issues with a GDB file in R?

Cotton Rockwood
In reply to this post by Aguirre Perez, Roman
Hi Roman -
As far as I know, 'readOGR' does not read tables in file geodatabases (see https://gis.stackexchange.com/questions/184013/read-a-table-from-an-esri-file-geodatabase-gdb-using-r). I'm guessing this is the problem you are running into with R. In addition to Shaun's suggestions, you might also want to try the 'sf' package, since it can read file geodatabase tables and is usually significantly faster for reads (and other spatial operations) than 'sp' and 'readOGR'. I'm not sure whether it will address the geometry issues suggested by the GRASS, QGIS and ArcGIS errors. The relevant 'sf' function is 'st_read', and its syntax is very similar to 'readOGR'. You will need to make sure you have the newest version: 0.6-1.
-Cotton
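
As a minimal illustration of that suggestion (assuming sf 0.6-1 or later is installed; the path and layer name are taken from the ogrInfo output earlier in the thread):

library(sf)
clc_sf <- st_read(dsn = "clc12_Version_18_5.gdb", layer = "clc12_Version_18_5")
# st_read() reports the driver, feature count and fields, much like readOGR(),
# and returns an sf data frame with a geometry column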


Re: Issues with a GDB file in R?

Bacou, Melanie

Cotton,
Why use the ESRI format? It will be much easier to download the GeoTIFF rasters and use raster::extract() to conduct your analysis.
--Mel.
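
For illustration, a rough sketch of that raster route - the file names, the 100 m resolution and the NUTS_ID field are assumptions, not taken from this thread, and extract() over the full European raster will be slow:

library(raster)
library(rgdal)

clc   <- raster("g100_clc12_V18_5.tif")                  # CORINE 2012 GeoTIFF (assumed file name)
nuts3 <- readOGR(dsn = ".", layer = "NUTS_RG_01M_2013")  # NUTS3 boundaries (assumed layer name)
nuts3 <- spTransform(nuts3, crs(clc))                    # reproject to the raster's LAEA CRS

vals <- extract(clc, nuts3)                              # list of cell values per NUTS3 polygon
# hectares of each CLC class per region: cell count times cell area (m^2) / 10,000
area_by_class <- lapply(vals, function(v) table(v) * prod(res(clc)) / 1e4)
names(area_by_class) <- nuts3$NUTS_ID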



Re: Issues with a GDB file in R?

Aguirre Perez, Roman
In reply to this post by Shaun Walbridge
Hello Shaun,

Thanks a lot for replying and providing me with alternative options.


Unfortunately, I can't try the first option anymore, as I no longer have access to ArcGIS - I was using a trial version on a university PC which isn't available now - but I'll try it later. I also failed to install the sf package - I'll dig into it a bit more. Nevertheless, I've already downloaded the SpatiaLite version, so I'll try the second option and let you know how it goes.


Regards,
Roman.  


Re: Issues with a GDB file in R?

Roger Bivand
On Thu, 3 May 2018, Aguirre Perez, Roman wrote:

> Hello Shaun,
>
> Thanks a lot for replying and providing me alternative options.
>
>
> Unfortunately, I can't try both options anymore as I ran out of ArcGIS
> due to I was using a university PC which is not available now (I
> installed an ArcGIS trial version there), but I'll try it later. I also
> failed on installing the sf package - I'll dig a bit more on it.
> Nevertheless, I already downloaded the SpatialLite version so I'll try
> the second option and I'll let you know how it goes.
I think Melanie got it right - the apparent extra detail and precision
you'd get from vector-vector overlay is illusory, so going with geoTiff
should get you there (you can check to see whether 100m resolution differs
from 250m). The only reason to choose vector-vector would be that the
Corine vector contains categories not represented in the raster version.
Using raster also steps around the polygon deficiencies in the GDB (and
probably SQLite) representations.

Roger

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

Re: Issues with a GDB file in R?

Barbour, Russell
In reply to this post by Cotton Rockwood
Please unsubscribe me from this list

Russell “Skip” Barbour Ph.D.
Associate  Director,
Interdisciplinary Research Methods Core
Center for Interdisciplinary Research on AIDS
Yale SCHOOL OF PUBLIC HEALTH
135 College Street, Suite 200, New Haven, CT 06510
Tel:203-764-4332 email:[hidden email]
http://cira.yale.edu/people/russell-barbour-phd 

“"You just think lovely wonderful thoughts," Peter explained, "and they lift you up in the air.”
Peter Pan teaching Wendy how to fly


Re: Issues with a GDB file in R?

Aguirre Perez, Roman
In reply to this post by Roger Bivand
Hi again,

First of all, thanks a lot for your comments. Working out how to perform this task with such a large amount of data is becoming quite interesting.


Here is an update...

Cotton, I could read the geodatabase using the commands I shared. However, my R session crashed after overlaying it with a small set of polygons. I guess it was because of the size of the sp object (around 14.5 GB). It's also worth mentioning that it was just a "multipolygon" (is that the correct word?) formed by 2370829 polygons. I haven't succeeded in installing the sf package, but I will keep trying.

Melanie, I've already downloaded and read the GeoTIFF version. At first sight, it seems this object doesn't have enough features to perform the overlay. I might have a biased idea of how raster objects work - I'm too used to the shapefile representation, which sounds quite related to Roger's idea. So I will start exploring it to gain a bit more understanding of it.

Roger, I certainly need to know which category is associated with each CLC polygon in order to compute the area covered by each of these classes within each NUTS3 region. Therefore, I still need to use a vector-vector overlay, right?

I really appreciate any feedback, as well as details I should take into account to understand more about how to work with this kind of data. I will also keep you up to date on how it goes if you like.


Best,
Roman.


Re: Issues with a GDB file in R?

Roger Bivand
On Thu, 3 May 2018, Aguirre Perez, Roman wrote:

> Hi again,
>
> first of all, thanks a lot for your comments. It's becoming quite
> interesting how to perform this task with such amount of data.
>
>
> Here is an update...
>
> Cotton, I could read the geodatabase by using the commands I shared.
> However, my R session crashed after overlaying it with a small set of
> polygons. I guess it was because the size of the sp object (around 14.5
> GB). It's also worth mentioning that it was just a "multipolygon" (is
> that the correct word?) formed by 2370829 polygons. I haven’t succeeded
> on installing the sf package, but I will keep trying it.
>
> Melanie, I already downloaded and read the geoTiff version. At first
> sight, it seems that this object doesn’t have enough features as to
> perform the overlying. I might have a biased idea of how raster objects
> work - I'm too stick to the representation of shapefiles which sounds
> quite related with Roger's idea. So I will start to explore it in order
> to gain a bit more understanding on it.
>
> Roger, I certainly need to know which category is associated with each
> CLC polygon in order to compute the area covered by each of these
> classes within each NUTS3 region. Therefore, I still need to use a
> vector-vector overlay, right?
On the contrary. The geoTiffs are coded as described in the documentation.
Your output is simply the count of raster cells by NUTS3 for each of the
categories. The geoTiff is in ETRS_1989_LAEA, so is projected; it includes
a lot of sea and no data because of the French overseas territories.

You could possibly use PostGIS to do the intersections, or use GRASS for
raster-vector counts (say through rgrass7). In R, you would want to add a
NUTS3 ID band to the land cover raster, then aggregate by NUTS3 ID.
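
A minimal sketch of that in-R route using the raster package (file, layer and field names are assumptions, and the full 100 m raster may be too large to handle comfortably in 16 GB of RAM):

library(raster)
library(rgdal)

clc   <- raster("g100_clc12_V18_5.tif")                  # CORINE 2012 GeoTIFF (assumed file name)
nuts3 <- readOGR(dsn = ".", layer = "NUTS_RG_01M_2013")  # NUTS3 boundaries (assumed layer name)
nuts3 <- spTransform(nuts3, crs(clc))

nuts3$region_id <- seq_len(nrow(nuts3))                  # simple numeric ID per region
nuts3_r <- rasterize(nuts3, clc, field = "region_id")    # the "NUTS3 ID band" on the CLC grid
ct <- crosstab(clc, nuts3_r)                             # cell counts: CLC class x NUTS3 region
ha <- ct * prod(res(clc)) / 1e4                          # convert cell counts to hectares
# match region_id back to the NUTS3 codes via nuts3@data afterwards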

I would suggest using GRASS as the most obvious route: read the raster, read the NUTS3 boundaries into a separate location, project to LAEA, then rasterise the NUTS3 regions (v.to.rast) and run r.cross. You get around 32K output categories - the 48 Corine categories times the count of NUTS3 regions, minus the nulls (there aren't many glaciers in most regions). You'll then need to match the Corine codes and the NUTS3 codes back - see the category labels shown by r.category.

I'll try to provide code tomorrow.

Roger

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: Issues with a GDB file in R?

Roger Bivand
Administrator
On Thu, 3 May 2018, Roger Bivand wrote:

> On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
>
>>  Hi again,
>>
>>  first of all, thanks a lot for your comments. It's becoming quite
>>  interesting how to perform this task with such amount of data.
>>
>>
>>  Here is an update...
>>
>>  Cotton, I could read the geodatabase by using the commands I shared.
>>  However, my R session crashed after overlaying it with a small set of
>>  polygons. I guess it was because the size of the sp object (around 14.5
>>  GB). It's also worth mentioning that it was just a "multipolygon" (is that
>>  the correct word?) formed by 2370829 polygons. I haven’t succeeded on
>>  installing the sf package, but I will keep trying it.
>>
>>  Melanie, I already downloaded and read the geoTiff version. At first
>>  sight, it seems that this object doesn’t have enough features as to
>>  perform the overlying. I might have a biased idea of how raster objects
>>  work - I'm too stick to the representation of shapefiles which sounds
>>  quite related with Roger's idea. So I will start to explore it in order to
>>  gain a bit more understanding on it.
>>
>>  Roger, I certainly need to know which category is associated with each CLC
>>  polygon in order to compute the area covered by each of these classes
>>  within each NUTS3 region. Therefore, I still need to use a vector-vector
>>  overlay, right?
>
> On the contrary. The geoTiffs are coded as described in the documentation.
> Your output is simply the count of raster cells by NUTS3 for each of the
> categories. The geoTiff is in ETRS_1989_LAEA, so is projected; it includes a
> lot of sea and no data because of the French overseas territories.
>
> You could possibly use PostGIS to do the intersections, or use GRASS for
> raster-vector counts (say through rgrass7). In R, you would want to add a
> NUTS3 ID band to the land cover raster, then aggregate by NUTS3 ID.
>
> I would suggest using GRASS as the most obvious route, reading the raster,
> reading the NUTS3 boundaries into a separate location, projecting to LAEA,
> then rasterising the NUTS3 regions (v.to.rast) and running r.cross. You get
> 32K output categories, the 48 corine categories times the count of NUTS3
> regions minus the nulls (there aren't many glaciers in most regions). You'll
> then need to match back the Corine codes and the NUTS3 codes - see in the
> category label file shown by r.category.
>
> I'll try to provide code tomorrow.
Combining R and GRASS seems to work, but no guarantees.

library(sp)
library(rgdal)
Gi <- GDALinfo("g250_clc12_V18_5.tif")
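# build a skeleton SpatialGrid from the GeoTIFF metadata; it is used below to define the GRASS region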
makeSG <- function(x) {
   stopifnot(class(x) == "GDALobj")
   p4 <- attr(x, "projection")
   gt <- GridTopology(c(x[4]+(x[6]/2), x[5]+(x[7]/2)), c(x[6], x[7]),
     c(x[2], x[1]))
   SpatialGrid(gt, CRS(p4))
}
SG <- makeSG(Gi)
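# read the NUTS3 boundaries (long/lat shapefile) and project them to the raster's LAEA CRS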
nuts_ll <- readOGR("NUTS_RG_01M_2013_4326_LEVL_3.shp")
nuts_laea <- spTransform(nuts_ll, CRS(attr(Gi, "projection")))
library(rgrass7)
td <- tempdir()
iG <- initGRASS("/home/rsb/topics/grass/g740/grass-7.4.0", td, SG)
# your GRASS will be where you installed it
writeVECT(nuts_laea, "nuts", v.in.ogr_flags="o")
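# import the Corine raster, rasterise the NUTS3 polygons onto the same grid, and cross the two rasters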
execGRASS("r.in.gdal", input="g250_clc12_V18_5.tif", output="corine",
   flags="o")
execGRASS("v.to.rast", input="nuts", output="nuts3", use="cat",
   label_column="FID")
execGRASS("r.cross", input="nuts3,corine", output="cross_nuts3")
r_stats0 <- execGRASS("r.stats", input="cross_nuts3", flags="a",
   intern=TRUE)
r_stats1 <- gsub("\\*", "NA", r_stats0)
con_stats <- textConnection(r_stats1)
stats <- read.table(con_stats, header=FALSE, col.names=c("cross_cat",
   "area"), colClasses=c("integer", "numeric"))
close(con_stats)
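# r.category lists the NUTS3 FID and Corine code labels for each cross category; tidy them before splitting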
r_cats0 <- execGRASS("r.category", map="cross_nuts3", intern=TRUE)
r_cats1 <- gsub(";", "", r_cats0)
r_cats2 <- gsub("\t", " ", r_cats1)
r_cats3 <- gsub("no data", "no_data", r_cats2)
r_cats4 <- gsub("category ", "", r_cats3)
r_cats4[1] <- paste0(r_cats4[1], "NA NA")
r_cats_split <- strsplit(r_cats4, " ")
cats <- data.frame(cross_cat=as.integer(sapply(r_cats_split, "[", 1)),
   nuts=sapply(r_cats_split, "[", 2),
   corine=as.integer(sapply(r_cats_split, "[", 3)))
catstats <- merge(cats, stats, by="cross_cat", all=TRUE)
agg_areas <- tapply(catstats$area, list(catstats$nuts, catstats$corine),
   sum)
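# attach readable Corine class names from the GeoTIFF's value attribute table (.vat.dbf)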
library(foreign)
corine_labels <- read.dbf("g250_clc12_V18_5.tif.vat.dbf", as.is=TRUE)
o <- match(colnames(agg_areas), as.character(corine_labels$Value))
colnames(agg_areas) <- corine_labels$LABEL3[o]
agg_areas_df <- as.data.frame(agg_areas)
agg_areas_df1 <- agg_areas_df[-which(!(row.names(agg_areas_df) %in%
   as.character(nuts_ll$FID))),] # drop the "NA" and "no_data" rows

This should be ready to merge with the NUTS3 boundaries, if needed.

agg_areas_df1$FID <- row.names(agg_areas_df1)
nuts_corine <- merge(nuts_laea, agg_areas_df1, by="FID")

For the vector parts you could use sf and the provisional rgrass7sf on
GitHub, but that wouldn't yet let you construct a skeleton SpatialGrid to
define the GRASS location. Using GRASS for the heavy lifting (the raster
is 51000 by 35000 cells), and avoiding vector overlay, this doesn't need
much memory (GRASS handles rasters row by row). The GRASS temporary
location only takes 130 MB of disk space. You could go for the 100m raster
resolution, but I doubt that the outcome would vary much - anyone like to
try?
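
The only change for the 100m version should be the input raster file name
(assuming the 100m GeoTIFF is named g100_clc12_V18_5.tif - not checked),
plus the matching .vat.dbf for the labels:

Gi <- GDALinfo("g100_clc12_V18_5.tif") # file name assumed - check the 100m download
execGRASS("r.in.gdal", input="g100_clc12_V18_5.tif", output="corine",
   flags="o")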

If the sub-polygons by NUTS and corine categories are actually needed, the
output of r.cross could be passed to r.to.vect:

execGRASS("r.to.vect", input="cross_nuts3", output="cross_nuts3",
   type="area")

but this is more demanding in memory terms.

Interesting case, and it does show that combining GIS and R delivers the
goods - SAGA would probably work equivalently.

Roger

>
> Roger
>
>>
>>  I really appreciate any feedback in advance as well as details that I
>>  should take into account to understand more about how to work with this
>>  kind of data. I will also keep you up to date on how it goes if you like.
>>
>>
>>  Best,
>>  Roman.
>>
>>  On 03/05/2018, 12:26, "Roger Bivand" <[hidden email]> wrote:
>>
>>     On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
>>
>>    >  Hello Shaun,
>>    >
>>    >  Thanks a lot for replying and providing me alternative options.
>>    >
>>    >
>>    >  Unfortunately, I can't try both options anymore as I ran out of
>>    >  ArcGIS
>>    >  due to I was using a university PC which is not available now (I
>>    >  installed an ArcGIS trial version there), but I'll try it later. I
>>    >  also
>>    >  failed on installing the sf package - I'll dig a bit more on it.
>>    >  Nevertheless, I already downloaded the SpatialLite version so I'll
>>    >  try
>>    >  the second option and I'll let you know how it goes.
>>
>>     I think Melanie got it right - the apparent extra detail and precision
>>     you'd get from vector-vector overlay is illusory, so going with geoTiff
>>     should get you there (you can check to see whether 100m resolution
>>     differs
>>     from 250m). The only reason to choose vector-vector would be that the
>>     Corine vector contains categories not represented in the raster
>>     version.
>>     Using raster also steps around the polygon deficiencies in the GDB (and
>>     probably SQLite) representations.
>>
>>     Roger
>>
>>    >
>>    >
>>    >  Regards,
>>    >  Roman.
>>    >
>>    >  On 02/05/2018, 18:53, "Shaun Walbridge" <[hidden email]> wrote:
>>    >
>>    >     Hello Roman,
>>    >
>>    >     A couple of suggested options: You can try and use the
>>    >     arcgisbinding package [1] to pull the data directly into R via
>>    >     ArcGIS, as you mentioned you have ArcGIS available [presumably on
>>    >     Windows or in a VM]. It will access the data directly, and let you
>>    >     create sp and sf objects out of Geodatabases with little work. A
>>    >     basic workflow looks like this:
>>    >
>>    >     d <- arc.open("path/file.gdb/layer_name")
>>    >     df <- arc.select(d) # here, you can filter columns and attributes,
>>    >     see [2]
>>    >     # create an sf object
>>    >     df.sf <- arc.data2sf(df)
>>    >     # alternatively, create an sp object
>>    >     df.sp <- arc.data2sp(df)
>>    >
>>    >     You can also write the results back to a geodatabase, or any other
>>    >     format that ArcGIS understands. Another option is trying the
>>    >     SpatiaLite version of the data at the URL you posted. You should
>>    >     be able to access this using R directly, provided your rgdal
>>    >     installation is correctly built to read spatialite databases. If
>>    >     it is, try the same process you mentioned using rgdal, but point
>>    >     it at the SpatialLite database instead. You could also use
>>    >     command-line OGR to convert the data, that has a few options like
>>    >     where clause filtering that aren't directly available via readOGR.
>>    >
>>    >     If neither of these options work, let me know and I can convert
>>    >     the data for you into a format of your preference.
>>    >
>>    >     Cheers,
>>    >     Shaun
>>    >
>>    >     1.
>>    >     https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_R-2DArcGIS_r-2Dbridge-2Dinstall&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=yckcjR6k3nADiVsNiAhGwcZB--0A8DQgvLSJ27upmyk&m=dkecE8PWOn5sSrtSXqX2R-VwJuFH_In4BvpC5da4LJ0&s=nRO2IG9dlrdKYGMSp3rCf7nwJqRFQoIWnY5zSFrPOQM&e=
>>    >     2.
>>    >     https://urldefense.proofpoint.com/v2/url?u=https-3A__rdrr.io_github_R-2DArcGIS_r-2Dbridge_man_arc.select.html&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=yckcjR6k3nADiVsNiAhGwcZB--0A8DQgvLSJ27upmyk&m=dkecE8PWOn5sSrtSXqX2R-VwJuFH_In4BvpC5da4LJ0&s=89N-cLe5N3dhZ6MTsM2OAYwmDImBH88ZrVaA5Hos0n4&e=
>>    >
>>    >
>>    >
>>    >     On 5/2/18, 1:34 PM, "R-sig-Geo on behalf of Aguirre Perez, Roman"
>>    >     <[hidden email] on behalf of [hidden email]>
>>    >     wrote:
>>    >
>>    >         Hi everyone,
>>    >
>>    >         I’ve been struggling with a ESRI Geodatabase file
>>    >         (clc12_Version_18_5.gdb) which is a layer of land cover
>>    >         classes available on
>>    >
>>    >         https://urldefense.proofpoint.com/v2/url?u=https-3A__land.copernicus.eu_pan-2Deuropean_corine-2Dland-2Dcover_clc-2D2012-3Ftab-3Ddownload&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=YFaRLkcUCdDkLrpTbNOUV9J1CwYBCTMwgm5tdQkRSm4&m=qIGI8rXMP_JTDhSFz8NjjQMAyNAnpnUGFRBxqfH4bPU&s=9LaO4HiB5C5ewiuN_IeSQJSgq2tl5_-oMECPChtZa_U&e=
>>    >
>>    >         whose size is 2.61 GB once unzipped.
>>    >
>>    >         My ultimate task is to overlay it with the last NUTS3
>>    >         administrative boundaries shapefile (2013) available on
>>    >
>>    >         https://urldefense.proofpoint.com/v2/url?u=http-3A__ec.europa.eu_eurostat_web_gisco_geodata_reference-2Ddata_administrative-2Dunits-2Dstatistical-2Dunits&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=YFaRLkcUCdDkLrpTbNOUV9J1CwYBCTMwgm5tdQkRSm4&m=qIGI8rXMP_JTDhSFz8NjjQMAyNAnpnUGFRBxqfH4bPU&s=VExYSoR8FY_vhZaPNJpoOx0ZCZNMU9hMTtlGJZj2joU&e=
>>    >
>>    >         in order to compute the area covered by each class within each
>>    >         NUTS3 region.
>>    >
>>    >         Despite the ease of friendly software for performing this
>>    >         task, haven’t been capable of doing it - GRASS didn’t load the
>>    >         file as the log reports problems with the polygons, QGIS shows
>>    >         a warning regarding a specific object and ArcGIS got frozen. I
>>    >         guess it’s because the PC I used doesn’t have enough capacity.
>>    >         Unfortunately, I don’t have access to a more powerful one.
>>    >
>>    >         Anyway, I decided to try with R -after all, I’ll perform my
>>    >         analysis with it. So I started exploring this GDB with rgdal:
>>    >
>>    >         ogrInfo(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")
>>    >
>>    >         Source:
>>    >         "/Users/Roman/Desktop/clc12gdb/clc12_Version_18_5.gdb",
>>    >         layer: "clc12_Version_18_5"
>>    >         Driver: OpenFileGDB;
>>    >         number of rows: 2370829
>>    >         Feature type: wkbPolygon with 3 dimensions
>>    >         Extent: (-2693292 -3086662) - (10037210 5440568)
>>    >         Null geometry IDs: 2156240
>>    >         CRS: +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000
>>    >         +ellps=GRS80 +units=m +no_defs
>>    >         Number of fields: 6
>>    >         name                      type  length  typeName
>>    >         1 code_12              4       3           String
>>    >         2 ID                         4       18         String
>>    >         3 Remark               4       20         String
>>    >         4 Area_Ha             2       0           Real
>>    >         5 Shape_Length   2       0           Real
>>    >         6 Shape_Area       2       0           Real
>>    >
>>    >         Then I tried to load it typing
>>    >
>>    >         clc<-readOGR(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")
>>    >
>>    >         After 3-5 minutes, it appears the following text:
>>    >
>>    >         OGR data source with driver: OpenFileGDB
>>    >         Source:
>>    >         "/Users/Roman/Desktop/clc12shp/clc12_Version_18_5.gdb", layer:
>>    >         "clc12_Version_18_5"
>>    >         with 2370829 features
>>    >         It has 6 fields
>>    >
>>    >         Unfortunately, after trying 5 times (each one took around 8
>>    >         hours) I couldn’t get anything but the following messages:
>>    >
>>    >         Warning messages:
>>    >         1: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
>>    >         "clc12_Version_18_5") :
>>    >         Dropping null geometries: 2156240
>>    >         2: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
>>    >         "clc12_Version_18_5") :
>>    >         Z-dimension discarded
>>    >
>>    >         Could anyone advise me how to tackle it? I also would
>>    >         appreciate suggestions on how to work with geodatabases - it’s
>>    >         my first time I work with these kind of files so I don’t even
>>    >         know their structure.
>>    >
>>    >         By the way, these are my computer and software specifications:
>>    >
>>    >         MacBook Pro 2012
>>    >         Processor 2.6 GHz Intel Core i7
>>    >         Memory 16 GB DDR3
>>    >         R 3.5.0
>>    >         RStudio 1.1.447
>>    >
>>    >
>>    >         Best,
>>    >         Roman.
>>    >
>>    >
>>    >
>>    >
>>    >  _______________________________________________
>>    >  R-sig-Geo mailing list
>>    >  [hidden email]
>>    >  https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>>     --
>>     Roger Bivand
>>     Department of Economics, Norwegian School of Economics,
>>     Helleveien 30, N-5045 Bergen, Norway.
>>     voice: +47 55 95 93 55; e-mail: [hidden email]
>>     http://orcid.org/0000-0003-2392-6140
>>     https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>
>>
>
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: Issues with a GDB file in R?

Roger Bivand
Administrator
On Fri, 4 May 2018, Roger Bivand wrote:

> On Thu, 3 May 2018, Roger Bivand wrote:
>
>>  On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
>>
>>>   Hi again,
>>>
>>>   first of all, thanks a lot for your comments. It's becoming quite
>>>   interesting how to perform this task with such amount of data.
>>>
>>>
>>>   Here is an update...
>>>
>>>   Cotton, I could read the geodatabase by using the commands I shared.
>>>   However, my R session crashed after overlaying it with a small set of
>>>   polygons. I guess it was because the size of the sp object (around 14.5
>>>   GB). It's also worth mentioning that it was just a "multipolygon" (is
>>>   that
>>>   the correct word?) formed by 2370829 polygons. I haven’t succeeded on
>>>   installing the sf package, but I will keep trying it.
>>>
>>>   Melanie, I already downloaded and read the geoTiff version. At first
>>>   sight, it seems that this object doesn’t have enough features as to
>>>   perform the overlying. I might have a biased idea of how raster objects
>>>   work - I'm too stick to the representation of shapefiles which sounds
>>>   quite related with Roger's idea. So I will start to explore it in order
>>>   to
>>>   gain a bit more understanding on it.
>>>
>>>   Roger, I certainly need to know which category is associated with each
>>>   CLC
>>>   polygon in order to compute the area covered by each of these classes
>>>   within each NUTS3 region. Therefore, I still need to use a vector-vector
>>>   overlay, right?
>>
>>  On the contrary. The geoTiffs are coded as described in the documentation.
>>  Your output is simply the count of raster cells by NUTS3 for each of the
>>  categories. The geoTiff is in ETRS_1989_LAEA, so is projected; it includes
>>  a lot of sea and no data because of the French overseas territories.
>>
>>  You could possibly use PostGIS to do the intersections, or use GRASS for
>>  raster-vector counts (say through rgrass7). In R, you would want to add a
>>  NUTS3 ID band to the land cover raster, then aggregate by NUTS3 ID.
>>
>>  I would suggest using GRASS as the most obvious route, reading the raster,
>>  reading the NUTS3 boundaries into a separate location, projecting to LAEA,
>>  then rasterising the NUTS3 regions (v.to.rast) and running r.cross. You
>>  get 32K output categories, the 48 corine categories times the count of
>>  NUTS3 regions minus the nulls (there aren't many glaciers in most
>>  regions). You'll then need to match back the Corine codes and the NUTS3
>>  codes - see in the category label file shown by r.category.
>>
>>  I'll try to provide code tomorrow.
>
> Combining R and GRASS seems to work, but no guarantees.
>
> library(sp)
> library(rgdal)
> Gi <- GDALinfo("g250_clc12_V18_5.tif")
> makeSG <- function(x) {
>   stopifnot(class(x) == "GDALobj")
>   p4 <- attr(x, "projection")
>   gt <- GridTopology(c(x[4]+(x[6]/2), x[5]+(x[7]/2)), c(x[6], x[7]),
>     c(x[2], x[1]))
>  SpatialGrid(gt, CRS(p4))
> }
> SG <- makeSG(Gi)
> nuts_ll <- readOGR("NUTS_RG_01M_2013_4326_LEVL_3.shp")
> nuts_laea <- spTransform(nuts_ll, CRS(attr(Gi, "projection")))
> library(rgrass7)
> td <- tempdir()
> iG <- initGRASS("/home/rsb/topics/grass/g740/grass-7.4.0", td, SG)
> # your GRASS will be where you installed it
> writeVECT(nuts_laea, "nuts", v.in.ogr_flags="o")
> execGRASS("r.in.gdal", input="g250_clc12_V18_5.tif", output="corine",
>  flags="o")
> execGRASS("v.to.rast", input="nuts", output="nuts3", use="cat",
>  label_column="FID")
> execGRASS("r.cross", input="nuts3,corine", output="cross_nuts3")
> r_stats0 <- execGRASS("r.stats", input="cross_nuts3", flags="a",
>  intern=TRUE)
> r_stats1 <- gsub("\\*", "NA", r_stats0)
> con_stats <- textConnection(r_stats1)
> stats <- read.table(con_stats, header=FALSE, col.names=c("cross_cat",
>  "area"), colClasses=c("integer", "numeric"))
> close(con_stats)
> r_cats0 <- execGRASS("r.category", map="cross_nuts3", intern=TRUE)
> r_cats1 <- gsub(";", "", r_cats0)
> r_cats2 <- gsub("\t", " ", r_cats1)
> r_cats3 <- gsub("no data", "no_data", r_cats2)
> r_cats4 <- gsub("category ", "", r_cats3)
> r_cats4[1] <- paste0(r_cats4[1], "NA NA")
> r_cats_split <- strsplit(r_cats4, " ")
> cats <- data.frame(cross_cat=as.integer(sapply(r_cats_split, "[", 1)),
>   nuts=sapply(r_cats_split, "[", 2),
>   corine=as.integer(sapply(r_cats_split, "[", 3)))
> catstats <- merge(cats, stats, by="cross_cat", all=TRUE)
> agg_areas <- tapply(catstats$area, list(catstats$nuts, catstats$corine),
>  sum)
> library(foreign)
> corine_labels <- read.dbf("g250_clc12_V18_5.tif.vat.dbf", as.is=TRUE)
> o <- match(colnames(agg_areas), as.character(corine_labels$Value))
> colnames(agg_areas) <- corine_labels$LABEL3[o]
> agg_areas_df <- as.data.frame(agg_areas)
> agg_areas_df1 <- agg_areas_df[-which(!(row.names(agg_areas_df) %in%
>   as.character(nuts_ll$FID))),] # dropping "NA"      "no_data"
>
> This should be ready to merge with the NUTS3 boundaries, if needed.
>
> agg_areas_df1$FID <- row.names(agg_areas_df1)
> nuts_corine <- merge(nuts_laea, agg_areas_df1, by="FID")
>
And to write out as GPKG:

names(nuts_corine)[1] <- "FID_"
writeOGR(nuts_corine, "nuts_corine.gpkg", layer="corine", driver="GPKG")

It seems that at least this driver treats "FID" as a reserved field name.
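
A quick (untested) check that the layer reads back:

ogrInfo("nuts_corine.gpkg", "corine")
nuts_check <- readOGR("nuts_corine.gpkg", layer="corine")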

> For the vector parts you could use sf and the provisional rgrass7sf on
> github, but that wouldn't yet let you construct a skeleton SpatialGrid to
> define the GRASS location. Using GRASS for the heavy lifting (the raster is
> 51000 by 35000), and avoiding vector for overlay, this doesn't need much
> memory (GRASS handles rasters by row). The GRASS temporary location only
> takes 130MB of disk space. You could go for the 100m raster resolution, but I
> doubt that the outcome would vary much - anyone like to try?
>
> If the sub-polygons by NUTS and corine categories are actually needed, the
> output of r.cross could be passed to r.to.vect:
>
> execGRASS("r.to.vect", input="cross_nuts3", output="cross_nuts3",
>   type="area")
>
> but this is more demanding in memory terms.
Reading into R is not possible, object too large.
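
If the sub-polygon vector is really needed as a file, an untested
alternative would be to export it from GRASS directly rather than reading
it into R:

execGRASS("v.out.ogr", input="cross_nuts3", type="area",
   output="cross_nuts3.gpkg", format="GPKG") # output file name is just an example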

Roger

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: Issues with a GDB file in R?

Aguirre Perez, Roman
On 05/05/2018, 12:03, "Roger Bivand" <[hidden email]> wrote:

    On Fri, 4 May 2018, Roger Bivand wrote:
   
    > On Thu, 3 May 2018, Roger Bivand wrote:
    >
    >>  On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
    >>
    >>>   Hi again,
    >>>
    >>>   first of all, thanks a lot for your comments. It's becoming quite
    >>>   interesting how to perform this task with such amount of data.
    >>>
    >>>
    >>>   Here is an update...
    >>>
    >>>   Cotton, I could read the geodatabase by using the commands I shared.
    >>>   However, my R session crashed after overlaying it with a small set of
    >>>   polygons. I guess it was because the size of the sp object (around 14.5
    >>>   GB). It's also worth mentioning that it was just a "multipolygon" (is
    >>>   that
    >>>   the correct word?) formed by 2370829 polygons. I haven’t succeeded on
    >>>   installing the sf package, but I will keep trying it.
    >>>
    >>>   Melanie, I already downloaded and read the geoTiff version. At first
    >>>   sight, it seems that this object doesn’t have enough features as to
    >>>   perform the overlying. I might have a biased idea of how raster objects
    >>>   work - I'm too stick to the representation of shapefiles which sounds
    >>>   quite related with Roger's idea. So I will start to explore it in order
    >>>   to
    >>>   gain a bit more understanding on it.
    >>>
    >>>   Roger, I certainly need to know which category is associated with each
    >>>   CLC
    >>>   polygon in order to compute the area covered by each of these classes
    >>>   within each NUTS3 region. Therefore, I still need to use a vector-vector
    >>>   overlay, right?
    >>
    >>  On the contrary. The geoTiffs are coded as described in the documentation.
    >>  Your output is simply the count of raster cells by NUTS3 for each of the
    >>  categories. The geoTiff is in ETRS_1989_LAEA, so is projected; it includes
    >>  a lot of sea and no data because of the French overseas territories.
    >>
    >>  You could possibly use PostGIS to do the intersections, or use GRASS for
    >>  raster-vector counts (say through rgrass7). In R, you would want to add a
    >>  NUTS3 ID band to the land cover raster, then aggregate by NUTS3 ID.
    >>
    >>  I would suggest using GRASS as the most obvious route, reading the raster,
    >>  reading the NUTS3 boundaries into a separate location, projecting to LAEA,
    >>  then rasterising the NUTS3 regions (v.to.rast) and running r.cross. You
    >>  get 32K output categories, the 48 corine categories times the count of
    >>  NUTS3 regions minus the nulls (there aren't many glaciers in most
    >>  regions). You'll then need to match back the Corine codes and the NUTS3
    >>  codes - see in the category label file shown by r.category.
    >>
    >>  I'll try to provide code tomorrow.
    >
    > Combining R and GRASS seems to work, but no guarantees.
    >
    > library(sp)
    > library(rgdal)
    > Gi <- GDALinfo("g250_clc12_V18_5.tif")
    > makeSG <- function(x) {
    >   stopifnot(class(x) == "GDALobj")
    >   p4 <- attr(x, "projection")
    >   gt <- GridTopology(c(x[4]+(x[6]/2), x[5]+(x[7]/2)), c(x[6], x[7]),
    >     c(x[2], x[1]))
    >  SpatialGrid(gt, CRS(p4))
    > }
    > SG <- makeSG(Gi)
    > nuts_ll <- readOGR("NUTS_RG_01M_2013_4326_LEVL_3.shp")
    > nuts_laea <- spTransform(nuts_ll, CRS(attr(Gi, "projection")))
    > library(rgrass7)
    > td <- tempdir()
    > iG <- initGRASS("/home/rsb/topics/grass/g740/grass-7.4.0", td, SG)
    > # your GRASS will be where you installed it
    > writeVECT(nuts_laea, "nuts", v.in.ogr_flags="o")
    > execGRASS("r.in.gdal", input="g250_clc12_V18_5.tif", output="corine",
    >  flags="o")
    > execGRASS("v.to.rast", input="nuts", output="nuts3", use="cat",
    >  label_column="FID")
    > execGRASS("r.cross", input="nuts3,corine", output="cross_nuts3")
    > r_stats0 <- execGRASS("r.stats", input="cross_nuts3", flags="a",
    >  intern=TRUE)
    > r_stats1 <- gsub("\\*", "NA", r_stats0)
    > con_stats <- textConnection(r_stats1)
    > stats <- read.table(con_stats, header=FALSE, col.names=c("cross_cat",
    >  "area"), colClasses=c("integer", "numeric"))
    > close(con_stats)
    > r_cats0 <- execGRASS("r.category", map="cross_nuts3", intern=TRUE)
    > r_cats1 <- gsub(";", "", r_cats0)
    > r_cats2 <- gsub("\t", " ", r_cats1)
    > r_cats3 <- gsub("no data", "no_data", r_cats2)
    > r_cats4 <- gsub("category ", "", r_cats3)
    > r_cats4[1] <- paste0(r_cats4[1], "NA NA")
    > r_cats_split <- strsplit(r_cats4, " ")
    > cats <- data.frame(cross_cat=as.integer(sapply(r_cats_split, "[", 1)),
    >   nuts=sapply(r_cats_split, "[", 2),
    >   corine=as.integer(sapply(r_cats_split, "[", 3)))
    > catstats <- merge(cats, stats, by="cross_cat", all=TRUE)
    > agg_areas <- tapply(catstats$area, list(catstats$nuts, catstats$corine),
    >  sum)
    > library(foreign)
    > corine_labels <- read.dbf("g250_clc12_V18_5.tif.vat.dbf", as.is=TRUE)
    > o <- match(colnames(agg_areas), as.character(corine_labels$Value))
    > colnames(agg_areas) <- corine_labels$LABEL3[o]
    > agg_areas_df <- as.data.frame(agg_areas)
    > agg_areas_df1 <- agg_areas_df[-which(!(row.names(agg_areas_df) %in%
    >   as.character(nuts_ll$FID))),] # dropping "NA"      "no_data"
    >
    > This should be ready to merge with the NUTS3 boundaries, if needed.
    >
    > agg_areas_df1$FID <- row.names(agg_areas_df1)
    > nuts_corine <- merge(nuts_laea, agg_areas_df1, by="FID")
    >
   
    And to write out as GPKG:
   
    names(nuts_corine)[1] <- "FID_"
    writeOGR(nuts_corine, "nuts_corine.gpkg", layer="corine", driver="GPKG")
   
    It seems that at least this driver treats "FID" as a reserved field name.
   
    > For the vector parts you could use sf and the provisional rgrass7sf on
    > github, but that wouldn't yet let you construct a skeleton SpatialGrid to
    > define the GRASS location. Using GRASS for the heavy lifting (the raster is
    > 51000 by 35000), and avoiding vector for overlay, this doesn't need much
    > memory (GRASS handles rasters by row). The GRASS temporary location only
    > takes 130MB of disk space. You could go for the 100m raster resolution, but I
    > doubt that the outcome would vary much - anyone like to try?
    >
    > If the sub-polygons by NUTS and corine categories are actually needed, the
    > output of r.cross could be passed to r.to.vect:
    >
    > execGRASS("r.to.vect", input="cross_nuts3", output="cross_nuts3",
    >   type="area")
    >
    > but this is more demanding in memory terms.

Hi Roger,

thanks a lot again for helping me with this issue.


Unfortunately, I haven't managed to call GRASS from within R, nor to install sf. My guess is that these issues arise because I'm running the newest versions of both macOS High Sierra and R 3.5.0, which are not yet stable.

    Reading into R is not possible, object too large.

Bearing in mind the previous issues plus this last comment, I decided to use a PC with another OS (Ubuntu), aiming to solve them and to work with a bit more memory (32 GB).
   
    Roger

In the meantime, I would really appreciate your guidance on gaining background in this subject. Could you please suggest resources to better understand the kinds of objects involved in spatial statistics? I should mention that what I know about the topic is summarised in your book, Applied Spatial Data Analysis with R.


Best,
Roman.    


_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Reply | Threaded
Open this post in threaded view
|

Re: Issues with a GDB file in R?

Roger Bivand
Administrator
On Sun, 6 May 2018, Aguirre Perez, Roman wrote:

> On 05/05/2018, 12:03, "Roger Bivand" <[hidden email]> wrote:
>
>    On Fri, 4 May 2018, Roger Bivand wrote:
>
>    > On Thu, 3 May 2018, Roger Bivand wrote:
>    >
>    >>  On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
>    >>
>    >>>   Hi again,
>    >>>
>    >>>   first of all, thanks a lot for your comments. It's becoming quite
>    >>>   interesting how to perform this task with such amount of data.
>    >>>
>    >>>
>    >>>   Here is an update...
>    >>>
>    >>>   Cotton, I could read the geodatabase by using the commands I shared.
>    >>>   However, my R session crashed after overlaying it with a small set of
>    >>>   polygons. I guess it was because the size of the sp object (around 14.5
>    >>>   GB). It's also worth mentioning that it was just a "multipolygon" (is
>    >>>   that
>    >>>   the correct word?) formed by 2370829 polygons. I haven’t succeeded on
>    >>>   installing the sf package, but I will keep trying it.
>    >>>
>    >>>   Melanie, I already downloaded and read the geoTiff version. At first
>    >>>   sight, it seems that this object doesn’t have enough features as to
>    >>>   perform the overlying. I might have a biased idea of how raster objects
>    >>>   work - I'm too stick to the representation of shapefiles which sounds
>    >>>   quite related with Roger's idea. So I will start to explore it in order
>    >>>   to
>    >>>   gain a bit more understanding on it.
>    >>>
>    >>>   Roger, I certainly need to know which category is associated with each
>    >>>   CLC
>    >>>   polygon in order to compute the area covered by each of these classes
>    >>>   within each NUTS3 region. Therefore, I still need to use a vector-vector
>    >>>   overlay, right?
>    >>
>    >>  On the contrary. The geoTiffs are coded as described in the documentation.
>    >>  Your output is simply the count of raster cells by NUTS3 for each of the
>    >>  categories. The geoTiff is in ETRS_1989_LAEA, so is projected; it includes
>    >>  a lot of sea and no data because of the French overseas territories.
>    >>
>    >>  You could possibly use PostGIS to do the intersections, or use GRASS for
>    >>  raster-vector counts (say through rgrass7). In R, you would want to add a
>    >>  NUTS3 ID band to the land cover raster, then aggregate by NUTS3 ID.
>    >>
>    >>  I would suggest using GRASS as the most obvious route, reading the raster,
>    >>  reading the NUTS3 boundaries into a separate location, projecting to LAEA,
>    >>  then rasterising the NUTS3 regions (v.to.rast) and running r.cross. You
>    >>  get 32K output categories, the 48 corine categories times the count of
>    >>  NUTS3 regions minus the nulls (there aren't many glaciers in most
>    >>  regions). You'll then need to match back the Corine codes and the NUTS3
>    >>  codes - see in the category label file shown by r.category.
>    >>
>    >>  I'll try to provide code tomorrow.
>    >
>    > Combining R and GRASS seems to work, but no guarantees.
>    >
>    > library(sp)
>    > library(rgdal)
>    > Gi <- GDALinfo("g250_clc12_V18_5.tif")
>    > makeSG <- function(x) {
>    >   stopifnot(class(x) == "GDALobj")
>    >   p4 <- attr(x, "projection")
>    >   gt <- GridTopology(c(x[4]+(x[6]/2), x[5]+(x[7]/2)), c(x[6], x[7]),
>    >     c(x[2], x[1]))
>    >  SpatialGrid(gt, CRS(p4))
>    > }
>    > SG <- makeSG(Gi)
>    > nuts_ll <- readOGR("NUTS_RG_01M_2013_4326_LEVL_3.shp")
>    > nuts_laea <- spTransform(nuts_ll, CRS(attr(Gi, "projection")))
>    > library(rgrass7)
>    > td <- tempdir()
>    > iG <- initGRASS("/home/rsb/topics/grass/g740/grass-7.4.0", td, SG)
>    > # your GRASS will be where you installed it
>    > writeVECT(nuts_laea, "nuts", v.in.ogr_flags="o")
>    > execGRASS("r.in.gdal", input="g250_clc12_V18_5.tif", output="corine",
>    >  flags="o")
>    > execGRASS("v.to.rast", input="nuts", output="nuts3", use="cat",
>    >  label_column="FID")
>    > execGRASS("r.cross", input="nuts3,corine", output="cross_nuts3")
>    > r_stats0 <- execGRASS("r.stats", input="cross_nuts3", flags="a",
>    >  intern=TRUE)
>    > r_stats1 <- gsub("\\*", "NA", r_stats0)
>    > con_stats <- textConnection(r_stats1)
>    > stats <- read.table(con_stats, header=FALSE, col.names=c("cross_cat",
>    >  "area"), colClasses=c("integer", "numeric"))
>    > close(con_stats)
>    > r_cats0 <- execGRASS("r.category", map="cross_nuts3", intern=TRUE)
>    > r_cats1 <- gsub(";", "", r_cats0)
>    > r_cats2 <- gsub("\t", " ", r_cats1)
>    > r_cats3 <- gsub("no data", "no_data", r_cats2)
>    > r_cats4 <- gsub("category ", "", r_cats3)
>    > r_cats4[1] <- paste0(r_cats4[1], "NA NA")
>    > r_cats_split <- strsplit(r_cats4, " ")
>    > cats <- data.frame(cross_cat=as.integer(sapply(r_cats_split, "[", 1)),
>    >   nuts=sapply(r_cats_split, "[", 2),
>    >   corine=as.integer(sapply(r_cats_split, "[", 3)))
>    > catstats <- merge(cats, stats, by="cross_cat", all=TRUE)
>    > agg_areas <- tapply(catstats$area, list(catstats$nuts, catstats$corine),
>    >  sum)
>    > library(foreign)
>    > corine_labels <- read.dbf("g250_clc12_V18_5.tif.vat.dbf", as.is=TRUE)
>    > o <- match(colnames(agg_areas), as.character(corine_labels$Value))
>    > colnames(agg_areas) <- corine_labels$LABEL3[o]
>    > agg_areas_df <- as.data.frame(agg_areas)
>    > agg_areas_df1 <- agg_areas_df[-which(!(row.names(agg_areas_df) %in%
>    >   as.character(nuts_ll$FID))),] # dropping "NA"      "no_data"
>    >
>    > This should be ready to merge with the NUTS3 boundaries, if needed.
>    >
>    > agg_areas_df1$FID <- row.names(agg_areas_df1)
>    > nuts_corine <- merge(nuts_laea, agg_areas_df1, by="FID")
>    >
>
>    And to write out as GPKG:
>
>    names(nuts_corine)[1] <- "FID_"
>    writeOGR(nuts_corine, "nuts_corine.gpkg", layer="corine", driver="GPKG")
>
>    It seems that at least this driver treats "FID" as a reserved field name.
>
>    > For the vector parts you could use sf and the provisional rgrass7sf on
>    > github, but that wouldn't yet let you construct a skeleton SpatialGrid to
>    > define the GRASS location. Using GRASS for the heavy lifting (the raster is
>    > 51000 by 35000), and avoiding vector for overlay, this doesn't need much
>    > memory (GRASS handles rasters by row). The GRASS temporary location only
>    > takes 130MB of disk space. You could go for the 100m raster resolution, but I
>    > doubt that the outcome would vary much - anyone like to try?
>    >
>    > If the sub-polygons by NUTS and corine categories are actually needed, the
>    > output of r.cross could be passed to r.to.vect:
>    >
>    > execGRASS("r.to.vect", input="cross_nuts3", output="cross_nuts3",
>    >   type="area")
>    >
>    > but this is more demanding in memory terms.
>
> Hi Roger,
>
> thanks a lot again for helping me with this issue.
>
>
> Unfortunately, I haven't managed to call GRASS within R, neither install
> sf. My guess is that those issues are because I'm using both newest
> versions OS X High Sierra and R 3.5.0 which are not stable yet.
You do not need sf. You do need GRASS, and, yes, OSX causes a lot of
problems because it is no longer attentive to the needs of numerical
analysts (certification of installs mandated at cost to FOSS communities;
gaping empty root password; only interested in pay-for software; never
contributes to FOSS AFAIK). Apple does not contribute to R.

>
>    Reading into R is not possible, object too large.
>
> Bearing in mind the previous issues plus this last comment, I decided to
> use a PC with another OS (Ubuntu) aiming at solving them and working
> with a bit more space (32 GB).

Wrong comment. The vector corine object is impossible to handle anywhere,
and probably should never have been constructed. I really doubt that it is
more "accurate" than the 100m raster, and doubt that the area sums by
NUTS3 region differ much between 250m and 100m.

You need to go back to the way in which Corine was derived, and to your
actual needs (which you have not described). If what you need are areas of
Corine classes by NUTS3 region, the code I provided gives you this. If
what you need is the full intersection of Corine classes and all NUTS3
regions, then you cannot visualize them (without zooming - rendering is
raster anyway) nor can you analyze them in any meaningful way, so you'd
need to revisit your research question. An intersection of a subset of
NUTS3 regions yielding raster patches or vector entities is possible and
could be visualized, but you seem to want the full extent of the NUTS3
regions.

If you need the intersection between one/few NUTS3 region(s) and Corine,
you can still use the raster approach, but I don't know PostGIS well
enough to advise (forthcoming R Journal article;
https://cran.r-project.org/package=rpostgis). You can read the NUTS3
boundaries into R, but probably neither of the Corine vector files (not
just memory: you mentioned earlier that they suffered from topological
issues which would prevent intersection without intensive cleaning).
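
Untested, but one way to keep that case in GRASS as well would be to clip
to the region of interest before vectorising - a sketch only, where the
attribute column name and NUTS code are placeholders:

# sketch only (untested): subset one NUTS3 region inside GRASS and
# vectorise the cross product there
execGRASS("v.extract", input="nuts", output="nuts_one",
  where="NUTS_ID = 'UKD31'")
execGRASS("r.mask", vector="nuts_one")
execGRASS("r.cross", input="nuts3,corine", output="cross_one")
execGRASS("r.to.vect", input="cross_one", output="cross_one", type="area")
execGRASS("r.mask", flags="r")   # remove the mask again afterwards
cross_one <- readVECT("cross_one")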

Using GRASS, you get the best of both worlds: script in R and run the
analysis at scale in GRASS using memory-conserving implementations.

Roger

>
>    Roger
>
> In the meantime, I really would appreciate you guidance for gaining
> background on this subject. Could you please suggest me resources to
> better understand the features of objects involved in Spatial
> Statistics? I should mention that the knowledge I have about it is
> resumed in your book Applied Spatial Data Analysis with R.
>
>
> Best,
> Roman.
>
>    >
>    > Interesting case, and it does show that combining GIS and R delivers the
>    > goods - SAGA would probably work equivalently.
>    >
>    > Roger
>    >
>    >>
>    >>  Roger
>    >>
>    >>>
>    >>>   I really appreciate any feedback in advance as well as details that I
>    >>>   should take into account to understand more about how to work with this
>    >>>   kind of data. I will also keep you up to date on how it goes if you
>    >>>   like.
>    >>>
>    >>>
>    >>>   Best,
>    >>>   Roman.
>    >>>
>    >>>   On 03/05/2018, 12:26, "Roger Bivand" <[hidden email]> wrote:
>    >>>
>    >>>      On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
>    >>>
>    >>>    >   Hello Shaun,
>    >>>    >
>    >>>    >   Thanks a lot for replying and providing me alternative options.
>    >>>    >
>    >>>    >
>    >>>    >   Unfortunately, I can't try both options anymore as I ran out of
>    >>>    >   ArcGIS
>    >>>    >   due to I was using a university PC which is not available now (I
>    >>>    >   installed an ArcGIS trial version there), but I'll try it later. I
>    >>>    >   also
>    >>>    >   failed on installing the sf package - I'll dig a bit more on it.
>    >>>    >   Nevertheless, I already downloaded the SpatialLite version so I'll
>    >>>    >   try
>    >>>    >   the second option and I'll let you know how it goes.
>    >>>
>    >>>      I think Melanie got it right - the apparent extra detail and
>    >>>      precision
>    >>>      you'd get from vector-vector overlay is illusory, so going with
>    >>>      geoTiff
>    >>>      should get you there (you can check to see whether 100m resolution
>    >>>      differs
>    >>>      from 250m). The only reason to choose vector-vector would be that the
>    >>>      Corine vector contains categories not represented in the raster
>    >>>      version.
>    >>>      Using raster also steps around the polygon deficiencies in the GDB
>    >>>      (and
>    >>>      probably SQLite) representations.
>    >>>
>    >>>      Roger
>    >>>
>    >>>    >
>    >>>    >
>    >>>    >   Regards,
>    >>>    >   Roman.
>    >>>    >
>    >>>    >   On 02/05/2018, 18:53, "Shaun Walbridge" <[hidden email]>
>    >>>    >   wrote:
>    >>>    >
>    >>>    >      Hello Roman,
>    >>>    >
>    >>>    >      A couple of suggested options: You can try and use the
>    >>>    >      arcgisbinding package [1] to pull the data directly into R via
>    >>>    >      ArcGIS, as you mentioned you have ArcGIS available [presumably
>    >>>    >      on
>    >>>    >      Windows or in a VM]. It will access the data directly, and let
>    >>>    >      you
>    >>>    >      create sp and sf objects out of Geodatabases with little work. A
>    >>>    >      basic workflow looks like this:
>    >>>    >
>    >>>    >      d <- arc.open("path/file.gdb/layer_name")
>    >>>    >      df <- arc.select(d) # here, you can filter columns and
>    >>>    >      attributes,
>    >>>    >      see [2]
>    >>>    >      # create an sf object
>    >>>    >      df.sf <- arc.data2sf(df)
>    >>>    >      # alternatively, create an sp object
>    >>>    >      df.sp <- arc.data2sp(df)
>    >>>    >
>    >>>    >      You can also write the results back to a geodatabase, or any
>    >>>    >      other
>    >>>    >      format that ArcGIS understands. Another option is trying the
>    >>>    >      SpatiaLite version of the data at the URL you posted. You should
>    >>>    >      be able to access this using R directly, provided your rgdal
>    >>>    >      installation is correctly built to read spatialite databases. If
>    >>>    >      it is, try the same process you mentioned using rgdal, but point
>    >>>    >      it at the SpatialLite database instead. You could also use
>    >>>    >      command-line OGR to convert the data, that has a few options
>    >>>    >      like
>    >>>    >      where clause filtering that aren't directly available via
>    >>>    >      readOGR.
>    >>>    >
>    >>>    >      If neither of these options work, let me know and I can convert
>    >>>    >      the data for you into a format of your preference.
>    >>>    >
>    >>>    >      Cheers,
>    >>>    >      Shaun
>    >>>    >
>    >>>    >      1.
>    >>>    >      https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_R-2DArcGIS_r-2Dbridge-2Dinstall&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=yckcjR6k3nADiVsNiAhGwcZB--0A8DQgvLSJ27upmyk&m=dkecE8PWOn5sSrtSXqX2R-VwJuFH_In4BvpC5da4LJ0&s=nRO2IG9dlrdKYGMSp3rCf7nwJqRFQoIWnY5zSFrPOQM&e=
>    >>>    >      2.
>    >>>    >      https://urldefense.proofpoint.com/v2/url?u=https-3A__rdrr.io_github_R-2DArcGIS_r-2Dbridge_man_arc.select.html&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=yckcjR6k3nADiVsNiAhGwcZB--0A8DQgvLSJ27upmyk&m=dkecE8PWOn5sSrtSXqX2R-VwJuFH_In4BvpC5da4LJ0&s=89N-cLe5N3dhZ6MTsM2OAYwmDImBH88ZrVaA5Hos0n4&e=
>    >>>    >
>    >>>    >
>    >>>    >
>    >>>    >      On 5/2/18, 1:34 PM, "R-sig-Geo on behalf of Aguirre Perez,
>    >>>    >      Roman"
>    >>>    >      <[hidden email] on behalf of
>    >>>    >      [hidden email]>
>    >>>    >      wrote:
>    >>>    >
>    >>>    >          Hi everyone,
>    >>>    >
>    >>>    >          I’ve been struggling with a ESRI Geodatabase file
>    >>>    >          (clc12_Version_18_5.gdb) which is a layer of land cover
>    >>>    >          classes available on
>    >>>    >
>    >>>    >          https://urldefense.proofpoint.com/v2/url?u=https-3A__land.copernicus.eu_pan-2Deuropean_corine-2Dland-2Dcover_clc-2D2012-3Ftab-3Ddownload&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=YFaRLkcUCdDkLrpTbNOUV9J1CwYBCTMwgm5tdQkRSm4&m=qIGI8rXMP_JTDhSFz8NjjQMAyNAnpnUGFRBxqfH4bPU&s=9LaO4HiB5C5ewiuN_IeSQJSgq2tl5_-oMECPChtZa_U&e=
>    >>>    >
>    >>>    >          whose size is 2.61 GB once unzipped.
>    >>>    >
>    >>>    >          My ultimate task is to overlay it with the last NUTS3
>    >>>    >          administrative boundaries shapefile (2013) available on
>    >>>    >
>    >>>    >          https://urldefense.proofpoint.com/v2/url?u=http-3A__ec.europa.eu_eurostat_web_gisco_geodata_reference-2Ddata_administrative-2Dunits-2Dstatistical-2Dunits&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=YFaRLkcUCdDkLrpTbNOUV9J1CwYBCTMwgm5tdQkRSm4&m=qIGI8rXMP_JTDhSFz8NjjQMAyNAnpnUGFRBxqfH4bPU&s=VExYSoR8FY_vhZaPNJpoOx0ZCZNMU9hMTtlGJZj2joU&e=
>    >>>    >
>    >>>    >          in order to compute the area covered by each class within
>    >>>    >          each
>    >>>    >          NUTS3 region.
>    >>>    >
>    >>>    >          Despite the ease of friendly software for performing this
>    >>>    >          task, haven’t been capable of doing it - GRASS didn’t load
>    >>>    >          the
>    >>>    >          file as the log reports problems with the polygons, QGIS
>    >>>    >          shows
>    >>>    >          a warning regarding a specific object and ArcGIS got frozen.
>    >>>    >          I
>    >>>    >          guess it’s because the PC I used doesn’t have enough
>    >>>    >          capacity.
>    >>>    >          Unfortunately, I don’t have access to a more powerful one.
>    >>>    >
>    >>>    >          Anyway, I decided to try with R -after all, I’ll perform my
>    >>>    >          analysis with it. So I started exploring this GDB with
>    >>>    >          rgdal:
>    >>>    >
>    >>>    >          ogrInfo(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")
>    >>>    >
>    >>>    >          Source:
>    >>>    >          "/Users/Roman/Desktop/clc12gdb/clc12_Version_18_5.gdb",
>    >>>    >          layer: "clc12_Version_18_5"
>    >>>    >          Driver: OpenFileGDB;
>    >>>    >          number of rows: 2370829
>    >>>    >          Feature type: wkbPolygon with 3 dimensions
>    >>>    >          Extent: (-2693292 -3086662) - (10037210 5440568)
>    >>>    >          Null geometry IDs: 2156240
>    >>>    >          CRS: +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000
>    >>>    >          +y_0=3210000
>    >>>    >          +ellps=GRS80 +units=m +no_defs
>    >>>    >          Number of fields: 6
>    >>>    >          name                      type  length  typeName
>    >>>    >          1 code_12              4       3           String
>    >>>    >          2 ID                         4       18         String
>    >>>    >          3 Remark               4       20         String
>    >>>    >          4 Area_Ha             2       0           Real
>    >>>    >          5 Shape_Length   2       0           Real
>    >>>    >          6 Shape_Area       2       0           Real
>    >>>    >
>    >>>    >          Then I tried to load it typing
>    >>>    >
>    >>>    >          clc<-readOGR(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")
>    >>>    >
>    >>>    >          After 3-5 minutes, it appears the following text:
>    >>>    >
>    >>>    >          OGR data source with driver: OpenFileGDB
>    >>>    >          Source:
>    >>>    >          "/Users/Roman/Desktop/clc12shp/clc12_Version_18_5.gdb",
>    >>>    >          layer:
>    >>>    >          "clc12_Version_18_5"
>    >>>    >          with 2370829 features
>    >>>    >          It has 6 fields
>    >>>    >
>    >>>    >          Unfortunately, after trying 5 times (each one took around 8
>    >>>    >          hours) I couldn’t get anything but the following messages:
>    >>>    >
>    >>>    >          Warning messages:
>    >>>    >          1: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
>    >>>    >          "clc12_Version_18_5") :
>    >>>    >          Dropping null geometries: 2156240
>    >>>    >          2: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
>    >>>    >          "clc12_Version_18_5") :
>    >>>    >          Z-dimension discarded
>    >>>    >
>    >>>    >          Could anyone advise me how to tackle it? I also would
>    >>>    >          appreciate suggestions on how to work with geodatabases -
>    >>>    >          it’s
>    >>>    >          my first time I work with these kind of files so I don’t
>    >>>    >          even
>    >>>    >          know their structure.
>    >>>    >
>    >>>    >          By the way, these are my computer and software
>    >>>    >          specifications:
>    >>>    >
>    >>>    >          MacBook Pro 2012
>    >>>    >          Processor 2.6 GHz Intel Core i7
>    >>>    >          Memory 16 GB DDR3
>    >>>    >          R 3.5.0
>    >>>    >          RStudio 1.1.447
>    >>>    >
>    >>>    >
>    >>>    >          Best,
>    >>>    >          Roman.
>    >>>    >
>    >>>    >
>    >>>    >
>    >>>    >
>    >>>    >   _______________________________________________
>    >>>    >   R-sig-Geo mailing list
>    >>>    >   [hidden email]
>    >>>    >   https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>    >>>
>    >>>      --
>    >>>      Roger Bivand
>    >>>      Department of Economics, Norwegian School of Economics,
>    >>>      Helleveien 30, N-5045 Bergen, Norway.
>    >>>      voice: +47 55 95 93 55; e-mail: [hidden email]
>    >>>      http://orcid.org/0000-0003-2392-6140
>    >>>      https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>    >>>
>    >>>
>    >>
>    >>
>    >
>    >
>
>    --
>    Roger Bivand
>    Department of Economics, Norwegian School of Economics,
>    Helleveien 30, N-5045 Bergen, Norway.
>    voice: +47 55 95 93 55; e-mail: [hidden email]
>    http://orcid.org/0000-0003-2392-6140
>    https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway

Re: Issues with a GDB file in R?

Aguirre Perez, Roman


On 07/05/2018, 08:36, "Roger Bivand" <[hidden email]> wrote:

    On Sun, 6 May 2018, Aguirre Perez, Roman wrote:
   
    > On 05/05/2018, 12:03, "Roger Bivand" <[hidden email]> wrote:
    >
    >    On Fri, 4 May 2018, Roger Bivand wrote:
    >
    >    > On Thu, 3 May 2018, Roger Bivand wrote:
    >    >
    >    >>  On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
    >    >>
    >    >>>   Hi again,
    >    >>>
    >    >>>   first of all, thanks a lot for your comments. It's becoming quite
    >    >>>   interesting how to perform this task with such amount of data.
    >    >>>
    >    >>>
    >    >>>   Here is an update...
    >    >>>
    >    >>>   Cotton, I could read the geodatabase by using the commands I shared.
    >    >>>   However, my R session crashed after overlaying it with a small set of
    >    >>>   polygons. I guess it was because the size of the sp object (around 14.5
    >    >>>   GB). It's also worth mentioning that it was just a "multipolygon" (is
    >    >>>   that
    >    >>>   the correct word?) formed by 2370829 polygons. I haven’t succeeded on
    >    >>>   installing the sf package, but I will keep trying it.
    >    >>>
    >    >>>   Melanie, I already downloaded and read the geoTiff version. At first
    >    >>>   sight, it seems that this object doesn’t have enough features as to
    >    >>>   perform the overlying. I might have a biased idea of how raster objects
    >    >>>   work - I'm too stick to the representation of shapefiles which sounds
    >    >>>   quite related with Roger's idea. So I will start to explore it in order
    >    >>>   to
    >    >>>   gain a bit more understanding on it.
    >    >>>
    >    >>>   Roger, I certainly need to know which category is associated with each
    >    >>>   CLC
    >    >>>   polygon in order to compute the area covered by each of these classes
    >    >>>   within each NUTS3 region. Therefore, I still need to use a vector-vector
    >    >>>   overlay, right?
    >    >>
    >    >>  On the contrary. The geoTiffs are coded as described in the documentation.
    >    >>  Your output is simply the count of raster cells by NUTS3 for each of the
    >    >>  categories. The geoTiff is in ETRS_1989_LAEA, so is projected; it includes
    >    >>  a lot of sea and no data because of the French overseas territories.
    >    >>
    >    >>  You could possibly use PostGIS to do the intersections, or use GRASS for
    >    >>  raster-vector counts (say through rgrass7). In R, you would want to add a
    >    >>  NUTS3 ID band to the land cover raster, then aggregate by NUTS3 ID.
    >    >>
    >    >>  I would suggest using GRASS as the most obvious route, reading the raster,
    >    >>  reading the NUTS3 boundaries into a separate location, projecting to LAEA,
    >    >>  then rasterising the NUTS3 regions (v.to.rast) and running r.cross. You
    >    >>  get 32K output categories, the 48 corine categories times the count of
    >    >>  NUTS3 regions minus the nulls (there aren't many glaciers in most
    >    >>  regions). You'll then need to match back the Corine codes and the NUTS3
    >    >>  codes - see in the category label file shown by r.category.
    >    >>
    >    >>  I'll try to provide code tomorrow.
    >    >
    >    > Combining R and GRASS seems to work, but no guarantees.
    >    >
    >    > library(sp)
    >    > library(rgdal)
    >    > Gi <- GDALinfo("g250_clc12_V18_5.tif")
    >    > makeSG <- function(x) {
    >    >   stopifnot(class(x) == "GDALobj")
    >    >   p4 <- attr(x, "projection")
    >    >   gt <- GridTopology(c(x[4]+(x[6]/2), x[5]+(x[7]/2)), c(x[6], x[7]),
    >    >     c(x[2], x[1]))
    >    >  SpatialGrid(gt, CRS(p4))
    >    > }
    >    > SG <- makeSG(Gi)
    >    > nuts_ll <- readOGR("NUTS_RG_01M_2013_4326_LEVL_3.shp")
    >    > nuts_laea <- spTransform(nuts_ll, CRS(attr(Gi, "projection")))
    >    > library(rgrass7)
    >    > td <- tempdir()
    >    > iG <- initGRASS("/home/rsb/topics/grass/g740/grass-7.4.0", td, SG)
    >    > # your GRASS will be where you installed it
    >    > writeVECT(nuts_laea, "nuts", v.in.ogr_flags="o")
    >    > execGRASS("r.in.gdal", input="g250_clc12_V18_5.tif", output="corine",
    >    >  flags="o")
    >    > execGRASS("v.to.rast", input="nuts", output="nuts3", use="cat",
    >    >  label_column="FID")
    >    > execGRASS("r.cross", input="nuts3,corine", output="cross_nuts3")
    >    > r_stats0 <- execGRASS("r.stats", input="cross_nuts3", flags="a",
    >    >  intern=TRUE)
    >    > r_stats1 <- gsub("\\*", "NA", r_stats0)
    >    > con_stats <- textConnection(r_stats1)
    >    > stats <- read.table(con_stats, header=FALSE, col.names=c("cross_cat",
    >    >  "area"), colClasses=c("integer", "numeric"))
    >    > close(con_stats)
    >    > r_cats0 <- execGRASS("r.category", map="cross_nuts3", intern=TRUE)
    >    > r_cats1 <- gsub(";", "", r_cats0)
    >    > r_cats2 <- gsub("\t", " ", r_cats1)
    >    > r_cats3 <- gsub("no data", "no_data", r_cats2)
    >    > r_cats4 <- gsub("category ", "", r_cats3)
    >    > r_cats4[1] <- paste0(r_cats4[1], "NA NA")
    >    > r_cats_split <- strsplit(r_cats4, " ")
    >    > cats <- data.frame(cross_cat=as.integer(sapply(r_cats_split, "[", 1)),
    >    >   nuts=sapply(r_cats_split, "[", 2),
    >    >   corine=as.integer(sapply(r_cats_split, "[", 3)))
    >    > catstats <- merge(cats, stats, by="cross_cat", all=TRUE)
    >    > agg_areas <- tapply(catstats$area, list(catstats$nuts, catstats$corine),
    >    >  sum)
    >    > library(foreign)
    >    > corine_labels <- read.dbf("g250_clc12_V18_5.tif.vat.dbf", as.is=TRUE)
    >    > o <- match(colnames(agg_areas), as.character(corine_labels$Value))
    >    > colnames(agg_areas) <- corine_labels$LABEL3[o]
    >    > agg_areas_df <- as.data.frame(agg_areas)
    >    > agg_areas_df1 <- agg_areas_df[-which(!(row.names(agg_areas_df) %in%
    >    >   as.character(nuts_ll$FID))),] # dropping "NA"      "no_data"
    >    >
    >    > This should be ready to merge with the NUTS3 boundaries, if needed.
    >    >
    >    > agg_areas_df1$FID <- row.names(agg_areas_df1)
    >    > nuts_corine <- merge(nuts_laea, agg_areas_df1, by="FID")
    >    >
    >
    >    And to write out as GPKG:
    >
    >    names(nuts_corine)[1] <- "FID_"
    >    writeOGR(nuts_corine, "nuts_corine.gpkg", layer="corine", driver="GPKG")
    >
    >    It seems that at least this driver treats "FID" as a reserved field name.
    >
    >    > For the vector parts you could use sf and the provisional rgrass7sf on
    >    > github, but that wouldn't yet let you construct a skeleton SpatialGrid to
    >    > define the GRASS location. Using GRASS for the heavy lifting (the raster is
    >    > 51000 by 35000), and avoiding vector for overlay, this doesn't need much
    >    > memory (GRASS handles rasters by row). The GRASS temporary location only
    >    > takes 130MB of disk space. You could go for the 100m raster resolution, but I
    >    > doubt that the outcome would vary much - anyone like to try?
    >    >
    >    > If the sub-polygons by NUTS and corine categories are actually needed, the
    >    > output of r.cross could be passed to r.to.vect:
    >    >
    >    > execGRASS("r.to.vect", input="cross_nuts3", output="cross_nuts3",
    >    >   type="area")
    >    >
    >    > but this is more demanding in memory terms.
    >
    > Hi Roger,
    >
    > thanks a lot again for helping me with this issue.
    >
    >
    > Unfortunately, I haven't managed to call GRASS within R, neither install
    > sf. My guess is that those issues are because I'm using both newest
    > versions OS X High Sierra and R 3.5.0 which are not stable yet.
   
    You do not need sf. You do need GRASS, and, yes, OSX causes a lot of
    problems because it is no longer attentive to the needs of numerical
    analysts (certification of installs mandated at cost to FOSS communities;
    gaping empty root password; only interested in pay-for software; never
    contributes to FOSS AFAIK). Apple does not contribute to R.

Ah, I got it!
   
    >
    >    Reading into R is not possible, object too large.
    >
    > Bearing in mind the previous issues plus this last comment, I decided to
    > use a PC with another OS (Ubuntu) aiming at solving them and working
    > with a bit more space (32 GB).
   
    Wrong comment. The vector corine object is impossible to handle anywhere,
    and probably should never have been constructed. I really doubt that it is
    more "accurate" than the 100m raster, and doubt that the areas sums by
    NUTS3 region differ much between 250m and 100m.


   
    You need to go back to the way in which Corine was derived, and to your
    actual needs (which you have not described). If what you need are areas of
    Corine classes by NUTS3 region, the code I provided gives you this. If
    what you need is the full intersection of Corine classes and all NUTS3
    regions, then you cannot visualize them (without zooming - rendering is
    raster anyway) nor can you analyze them in any meaningful way, so you'd
    need to revisit you research question. An intersection of a subset of
    NUTS3 regions yielding raster patches or vector entities is possible and
    could be visualized, but you seem to want the full extent of the NUTS3
    regions.

My ultimate goal is to compute population within a buffer by downscaling the NUTS3 population to the CLC resolution using algorithms like in Gallego (2010): A population density grid of the European Union. One of them is based on a regression model, so, yes, I need to compute the areas of CLC classes within each NUTS3 region. Thanks a lot again for the code. Regarding visualization, I'm planning to convey results at NUTS3 level.
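
Just to make that step concrete, this is roughly what I have in mind - only
a sketch, using the NUTS3-by-class area matrix agg_areas from your code
(areas in square metres) and a made-up data frame nuts_pop with columns FID
and POP for total NUTS3 population:

# illustrative only: Gallego-style per-class density coefficients
keep <- rowSums(agg_areas, na.rm=TRUE) > 0
A <- agg_areas[keep, ]
A[is.na(A)] <- 0
shares <- A / rowSums(A)                     # land-cover shares per region
dens <- nuts_pop$POP[match(rownames(A), nuts_pop$FID)] /
  (rowSums(A) / 1e6)                         # regional density, persons/km^2
fit <- lm(dens ~ shares - 1)                 # no intercept: shares sum to 1
coef(fit)                                    # per-class density weights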
   
    If you need the intersection between one/few NUTS3 region(s) and Corine,
    you can still use the raster approach, but I don't know PostGIS well
    enough to advise (forthcoming R Journal article;
    https://cran.r-project.org/package=rpostgis). You can read the NUTS3
    boundaries into R, but probably neither of the Corine vector files (not
    just memory, you mentioned earlier that they suffered from topological
    issues which would prevent intersection without intensive cleaning.

At some point I might need to implement it at a lower level - within a country, for instance - so I will keep it in mind. I will also have a look at PostGIS.
   
    Using GRASS, you get the best of both worlds, script in R and run the
    analysis at scale in GRASS using memory-conserving implementations.

Hot tip!

Roman.
   
    Roger
   
    >
    >    Roger
    >
    > In the meantime, I really would appreciate you guidance for gaining
    > background on this subject. Could you please suggest me resources to
    > better understand the features of objects involved in Spatial
    > Statistics? I should mention that the knowledge I have about it is
    > resumed in your book Applied Spatial Data Analysis with R.
    >
    >
    > Best,
    > Roman.
    >
    >    >
    >    > Interesting case, and it does show that combining GIS and R delivers the
    >    > goods - SAGA would probably work equivalently.
    >    >
    >    > Roger
    >    >
    >    >>
    >    >>  Roger
    >    >>
    >    >>>
    >    >>>   I really appreciate any feedback in advance as well as details that I
    >    >>>   should take into account to understand more about how to work with this
    >    >>>   kind of data. I will also keep you up to date on how it goes if you
    >    >>>   like.
    >    >>>
    >    >>>
    >    >>>   Best,
    >    >>>   Roman.
    >    >>>
    >    >>>   On 03/05/2018, 12:26, "Roger Bivand" <[hidden email]> wrote:
    >    >>>
    >    >>>      On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
    >    >>>
    >    >>>    >   Hello Shaun,
    >    >>>    >
    >    >>>    >   Thanks a lot for replying and providing me alternative options.
    >    >>>    >
    >    >>>    >
    >    >>>    >   Unfortunately, I can't try both options anymore as I ran out of
    >    >>>    >   ArcGIS
    >    >>>    >   due to I was using a university PC which is not available now (I
    >    >>>    >   installed an ArcGIS trial version there), but I'll try it later. I
    >    >>>    >   also
    >    >>>    >   failed on installing the sf package - I'll dig a bit more on it.
    >    >>>    >   Nevertheless, I already downloaded the SpatialLite version so I'll
    >    >>>    >   try
    >    >>>    >   the second option and I'll let you know how it goes.
    >    >>>
    >    >>>      I think Melanie got it right - the apparent extra detail and
    >    >>>      precision
    >    >>>      you'd get from vector-vector overlay is illusory, so going with
    >    >>>      geoTiff
    >    >>>      should get you there (you can check to see whether 100m resolution
    >    >>>      differs
    >    >>>      from 250m). The only reason to choose vector-vector would be that the
    >    >>>      Corine vector contains categories not represented in the raster
    >    >>>      version.
    >    >>>      Using raster also steps around the polygon deficiencies in the GDB
    >    >>>      (and
    >    >>>      probably SQLite) representations.
    >    >>>
    >    >>>      Roger
    >    >>>
    >    >>>    >
    >    >>>    >
    >    >>>    >   Regards,
    >    >>>    >   Roman.
    >    >>>    >
    >    >>>    >   On 02/05/2018, 18:53, "Shaun Walbridge" <[hidden email]>
    >    >>>    >   wrote:
    >    >>>    >
    >    >>>    >      Hello Roman,
    >    >>>    >
    >    >>>    >      A couple of suggested options: You can try and use the
    >    >>>    >      arcgisbinding package [1] to pull the data directly into R via
    >    >>>    >      ArcGIS, as you mentioned you have ArcGIS available [presumably
    >    >>>    >      on
    >    >>>    >      Windows or in a VM]. It will access the data directly, and let
    >    >>>    >      you
    >    >>>    >      create sp and sf objects out of Geodatabases with little work. A
    >    >>>    >      basic workflow looks like this:
    >    >>>    >
    >    >>>    >      d <- arc.open("path/file.gdb/layer_name")
    >    >>>    >      df <- arc.select(d) # here, you can filter columns and
    >    >>>    >      attributes,
    >    >>>    >      see [2]
    >    >>>    >      # create an sf object
    >    >>>    >      df.sf <- arc.data2sf(df)
    >    >>>    >      # alternatively, create an sp object
    >    >>>    >      df.sp <- arc.data2sp(df)
    >    >>>    >
    >    >>>    >      You can also write the results back to a geodatabase, or any
    >    >>>    >      other
    >    >>>    >      format that ArcGIS understands. Another option is trying the
    >    >>>    >      SpatiaLite version of the data at the URL you posted. You should
    >    >>>    >      be able to access this using R directly, provided your rgdal
    >    >>>    >      installation is correctly built to read spatialite databases. If
    >    >>>    >      it is, try the same process you mentioned using rgdal, but point
    >    >>>    >      it at the SpatialLite database instead. You could also use
    >    >>>    >      command-line OGR to convert the data, that has a few options
    >    >>>    >      like
    >    >>>    >      where clause filtering that aren't directly available via
    >    >>>    >      readOGR.
    >    >>>    >
    >    >>>    >      If neither of these options work, let me know and I can convert
    >    >>>    >      the data for you into a format of your preference.
    >    >>>    >
    >    >>>    >      Cheers,
    >    >>>    >      Shaun
    >    >>>    >
    >    >>>    >      1.
    >    >>>    >      https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_R-2DArcGIS_r-2Dbridge-2Dinstall&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=yckcjR6k3nADiVsNiAhGwcZB--0A8DQgvLSJ27upmyk&m=dkecE8PWOn5sSrtSXqX2R-VwJuFH_In4BvpC5da4LJ0&s=nRO2IG9dlrdKYGMSp3rCf7nwJqRFQoIWnY5zSFrPOQM&e=
    >    >>>    >      2.
    >    >>>    >      https://urldefense.proofpoint.com/v2/url?u=https-3A__rdrr.io_github_R-2DArcGIS_r-2Dbridge_man_arc.select.html&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=yckcjR6k3nADiVsNiAhGwcZB--0A8DQgvLSJ27upmyk&m=dkecE8PWOn5sSrtSXqX2R-VwJuFH_In4BvpC5da4LJ0&s=89N-cLe5N3dhZ6MTsM2OAYwmDImBH88ZrVaA5Hos0n4&e=
    >    >>>    >
    >    >>>    >
    >    >>>    >
    >    >>>    >      On 5/2/18, 1:34 PM, "R-sig-Geo on behalf of Aguirre Perez,
    >    >>>    >      Roman"
    >    >>>    >      <[hidden email] on behalf of
    >    >>>    >      [hidden email]>
    >    >>>    >      wrote:
    >    >>>    >
    >    >>>    >          Hi everyone,
    >    >>>    >
    >    >>>    >          I’ve been struggling with a ESRI Geodatabase file
    >    >>>    >          (clc12_Version_18_5.gdb) which is a layer of land cover
    >    >>>    >          classes available on
    >    >>>    >
    >    >>>    >          https://urldefense.proofpoint.com/v2/url?u=https-3A__land.copernicus.eu_pan-2Deuropean_corine-2Dland-2Dcover_clc-2D2012-3Ftab-3Ddownload&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=YFaRLkcUCdDkLrpTbNOUV9J1CwYBCTMwgm5tdQkRSm4&m=qIGI8rXMP_JTDhSFz8NjjQMAyNAnpnUGFRBxqfH4bPU&s=9LaO4HiB5C5ewiuN_IeSQJSgq2tl5_-oMECPChtZa_U&e=
    >    >>>    >
    >    >>>    >          whose size is 2.61 GB once unzipped.
    >    >>>    >
    >    >>>    >          My ultimate task is to overlay it with the last NUTS3
    >    >>>    >          administrative boundaries shapefile (2013) available on
    >    >>>    >
    >    >>>    >          https://urldefense.proofpoint.com/v2/url?u=http-3A__ec.europa.eu_eurostat_web_gisco_geodata_reference-2Ddata_administrative-2Dunits-2Dstatistical-2Dunits&d=DwIGaQ&c=n6-cguzQvX_tUIrZOS_4Og&r=YFaRLkcUCdDkLrpTbNOUV9J1CwYBCTMwgm5tdQkRSm4&m=qIGI8rXMP_JTDhSFz8NjjQMAyNAnpnUGFRBxqfH4bPU&s=VExYSoR8FY_vhZaPNJpoOx0ZCZNMU9hMTtlGJZj2joU&e=
    >    >>>    >
    >    >>>    >          in order to compute the area covered by each class within
    >    >>>    >          each
    >    >>>    >          NUTS3 region.
    >    >>>    >
    >    >>>    >          Despite the ease of friendly software for performing this
    >    >>>    >          task, haven’t been capable of doing it - GRASS didn’t load
    >    >>>    >          the
    >    >>>    >          file as the log reports problems with the polygons, QGIS
    >    >>>    >          shows
    >    >>>    >          a warning regarding a specific object and ArcGIS got frozen.
    >    >>>    >          I
    >    >>>    >          guess it’s because the PC I used doesn’t have enough
    >    >>>    >          capacity.
    >    >>>    >          Unfortunately, I don’t have access to a more powerful one.
    >    >>>    >
    >    >>>    >          Anyway, I decided to try with R -after all, I’ll perform my
    >    >>>    >          analysis with it. So I started exploring this GDB with
    >    >>>    >          rgdal:
    >    >>>    >
    >    >>>    >          ogrInfo(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")
    >    >>>    >
    >    >>>    >          Source:
    >    >>>    >          "/Users/Roman/Desktop/clc12gdb/clc12_Version_18_5.gdb",
    >    >>>    >          layer: "clc12_Version_18_5"
    >    >>>    >          Driver: OpenFileGDB;
    >    >>>    >          number of rows: 2370829
    >    >>>    >          Feature type: wkbPolygon with 3 dimensions
    >    >>>    >          Extent: (-2693292 -3086662) - (10037210 5440568)
    >    >>>    >          Null geometry IDs: 2156240
    >    >>>    >          CRS: +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000
    >    >>>    >          +y_0=3210000
    >    >>>    >          +ellps=GRS80 +units=m +no_defs
    >    >>>    >          Number of fields: 6
    >    >>>    >          name                      type  length  typeName
    >    >>>    >          1 code_12              4       3           String
    >    >>>    >          2 ID                         4       18         String
    >    >>>    >          3 Remark               4       20         String
    >    >>>    >          4 Area_Ha             2       0           Real
    >    >>>    >          5 Shape_Length   2       0           Real
    >    >>>    >          6 Shape_Area       2       0           Real
    >    >>>    >
    >    >>>    >          Then I tried to load it typing
    >    >>>    >
    >    >>>    >          clc<-readOGR(dsn="clc12_Version_18_5.gdb",layer="clc12_Version_18_5")
    >    >>>    >
    >    >>>    >          After 3-5 minutes, it appears the following text:
    >    >>>    >
    >    >>>    >          OGR data source with driver: OpenFileGDB
    >    >>>    >          Source:
    >    >>>    >          "/Users/Roman/Desktop/clc12shp/clc12_Version_18_5.gdb",
    >    >>>    >          layer:
    >    >>>    >          "clc12_Version_18_5"
    >    >>>    >          with 2370829 features
    >    >>>    >          It has 6 fields
    >    >>>    >
    >    >>>    >          Unfortunately, after trying 5 times (each one took around 8
    >    >>>    >          hours) I couldn’t get anything but the following messages:
    >    >>>    >
    >    >>>    >          Warning messages:
    >    >>>    >          1: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
    >    >>>    >          "clc12_Version_18_5") :
    >    >>>    >          Dropping null geometries: 2156240
    >    >>>    >          2: In readOGR(dsn = "clc12_Version_18_5.gdb", layer =
    >    >>>    >          "clc12_Version_18_5") :
    >    >>>    >          Z-dimension discarded
    >    >>>    >
    >    >>>    >          Could anyone advise me how to tackle it? I also would
    >    >>>    >          appreciate suggestions on how to work with geodatabases -
    >    >>>    >          it’s
    >    >>>    >          my first time I work with these kind of files so I don’t
    >    >>>    >          even
    >    >>>    >          know their structure.
    >    >>>    >
    >    >>>    >          By the way, these are my computer and software
    >    >>>    >          specifications:
    >    >>>    >
    >    >>>    >          MacBook Pro 2012
    >    >>>    >          Processor 2.6 GHz Intel Core i7
    >    >>>    >          Memory 16 GB DDR3
    >    >>>    >          R 3.5.0
    >    >>>    >          RStudio 1.1.447
    >    >>>    >
    >    >>>    >
    >    >>>    >          Best,
    >    >>>    >          Roman.
    >    >>>    >
    >    >>>    >
    >    >>>    >
    >    >>>    >
    >    >>>    >   _______________________________________________
    >    >>>    >   R-sig-Geo mailing list
    >    >>>    >   [hidden email]
    >    >>>    >   https://stat.ethz.ch/mailman/listinfo/r-sig-geo
    >    >>>
    >    >>>      --
    >    >>>      Roger Bivand
    >    >>>      Department of Economics, Norwegian School of Economics,
    >    >>>      Helleveien 30, N-5045 Bergen, Norway.
    >    >>>      voice: +47 55 95 93 55; e-mail: [hidden email]
    >    >>>      http://orcid.org/0000-0003-2392-6140
    >    >>>      https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
    >    >>>
    >    >>>
    >    >>
    >    >>
    >    >
    >    >
    >
    >    --
    >    Roger Bivand
    >    Department of Economics, Norwegian School of Economics,
    >    Helleveien 30, N-5045 Bergen, Norway.
    >    voice: +47 55 95 93 55; e-mail: [hidden email]
    >    http://orcid.org/0000-0003-2392-6140
    >    https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
    >
    >
   
    --
    Roger Bivand
    Department of Economics, Norwegian School of Economics,
    Helleveien 30, N-5045 Bergen, Norway.
    voice: +47 55 95 93 55; e-mail: [hidden email]
    http://orcid.org/0000-0003-2392-6140
    https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Issues with a GDB file in R?

Roger Bivand
Administrator
On Mon, 7 May 2018, Aguirre Perez, Roman wrote:

>
>
> On 07/05/2018, 08:36, "Roger Bivand" <[hidden email]> wrote:
>
>    On Sun, 6 May 2018, Aguirre Perez, Roman wrote:
>
>    > On 05/05/2018, 12:03, "Roger Bivand" <[hidden email]> wrote:
>    >
>    >    On Fri, 4 May 2018, Roger Bivand wrote:
>    >
>    >    > On Thu, 3 May 2018, Roger Bivand wrote:
>    >    >
>    >    >>  On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
>    >    >>
>    >    >>>   Hi again,
>    >    >>>
>    >    >>>   first of all, thanks a lot for your comments. It's becoming quite
>    >    >>>   interesting how to perform this task with such amount of data.
>    >    >>>
>    >    >>>
>    >    >>>   Here is an update...
>    >    >>>
>    >    >>>   Cotton, I could read the geodatabase by using the commands I shared.
>    >    >>>   However, my R session crashed after overlaying it with a small set of
>    >    >>>   polygons. I guess it was because the size of the sp object (around 14.5
>    >    >>>   GB). It's also worth mentioning that it was just a "multipolygon" (is
>    >    >>>   that
>    >    >>>   the correct word?) formed by 2370829 polygons. I haven’t succeeded on
>    >    >>>   installing the sf package, but I will keep trying it.
>    >    >>>
>    >    >>>   Melanie, I already downloaded and read the geoTiff version. At first
>    >    >>>   sight, it seems that this object doesn’t have enough features as to
>    >    >>>   perform the overlying. I might have a biased idea of how raster objects
>    >    >>>   work - I'm too stick to the representation of shapefiles which sounds
>    >    >>>   quite related with Roger's idea. So I will start to explore it in order
>    >    >>>   to
>    >    >>>   gain a bit more understanding on it.
>    >    >>>
>    >    >>>   Roger, I certainly need to know which category is associated with each
>    >    >>>   CLC
>    >    >>>   polygon in order to compute the area covered by each of these classes
>    >    >>>   within each NUTS3 region. Therefore, I still need to use a vector-vector
>    >    >>>   overlay, right?
>    >    >>
>    >    >>  On the contrary. The geoTiffs are coded as described in the documentation.
>    >    >>  Your output is simply the count of raster cells by NUTS3 for each of the
>    >    >>  categories. The geoTiff is in ETRS_1989_LAEA, so is projected; it includes
>    >    >>  a lot of sea and no data because of the French overseas territories.
>    >    >>
>    >    >>  You could possibly use PostGIS to do the intersections, or use GRASS for
>    >    >>  raster-vector counts (say through rgrass7). In R, you would want to add a
>    >    >>  NUTS3 ID band to the land cover raster, then aggregate by NUTS3 ID.
>    >    >>
>    >    >>  I would suggest using GRASS as the most obvious route, reading the raster,
>    >    >>  reading the NUTS3 boundaries into a separate location, projecting to LAEA,
>    >    >>  then rasterising the NUTS3 regions (v.to.rast) and running r.cross. You
>    >    >>  get 32K output categories, the 48 corine categories times the count of
>    >    >>  NUTS3 regions minus the nulls (there aren't many glaciers in most
>    >    >>  regions). You'll then need to match back the Corine codes and the NUTS3
>    >    >>  codes - see in the category label file shown by r.category.
>    >    >>
>    >    >>  I'll try to provide code tomorrow.
>    >    >
>    >    > Combining R and GRASS seems to work, but no guarantees.
>    >    >
>    >    > library(sp)
>    >    > library(rgdal)
>    >    > Gi <- GDALinfo("g250_clc12_V18_5.tif")
>    >    > makeSG <- function(x) {
>    >    >   stopifnot(class(x) == "GDALobj")
>    >    >   p4 <- attr(x, "projection")
>    >    >   gt <- GridTopology(c(x[4]+(x[6]/2), x[5]+(x[7]/2)), c(x[6], x[7]),
>    >    >     c(x[2], x[1]))
>    >    >  SpatialGrid(gt, CRS(p4))
>    >    > }
>    >    > SG <- makeSG(Gi)
>    >    > nuts_ll <- readOGR("NUTS_RG_01M_2013_4326_LEVL_3.shp")
>    >    > nuts_laea <- spTransform(nuts_ll, CRS(attr(Gi, "projection")))
>    >    > library(rgrass7)
>    >    > td <- tempdir()
>    >    > iG <- initGRASS("/home/rsb/topics/grass/g740/grass-7.4.0", td, SG)
>    >    > # your GRASS will be where you installed it
>    >    > writeVECT(nuts_laea, "nuts", v.in.ogr_flags="o")
>    >    > execGRASS("r.in.gdal", input="g250_clc12_V18_5.tif", output="corine",
>    >    >  flags="o")
>    >    > execGRASS("v.to.rast", input="nuts", output="nuts3", use="cat",
>    >    >  label_column="FID")
>    >    > execGRASS("r.cross", input="nuts3,corine", output="cross_nuts3")
>    >    > r_stats0 <- execGRASS("r.stats", input="cross_nuts3", flags="a",
>    >    >  intern=TRUE)
>    >    > r_stats1 <- gsub("\\*", "NA", r_stats0)
>    >    > con_stats <- textConnection(r_stats1)
>    >    > stats <- read.table(con_stats, header=FALSE, col.names=c("cross_cat",
>    >    >  "area"), colClasses=c("integer", "numeric"))
>    >    > close(con_stats)
>    >    > r_cats0 <- execGRASS("r.category", map="cross_nuts3", intern=TRUE)
>    >    > r_cats1 <- gsub(";", "", r_cats0)
>    >    > r_cats2 <- gsub("\t", " ", r_cats1)
>    >    > r_cats3 <- gsub("no data", "no_data", r_cats2)
>    >    > r_cats4 <- gsub("category ", "", r_cats3)
>    >    > r_cats4[1] <- paste0(r_cats4[1], "NA NA")
>    >    > r_cats_split <- strsplit(r_cats4, " ")
>    >    > cats <- data.frame(cross_cat=as.integer(sapply(r_cats_split, "[", 1)),
>    >    >   nuts=sapply(r_cats_split, "[", 2),
>    >    >   corine=as.integer(sapply(r_cats_split, "[", 3)))
>    >    > catstats <- merge(cats, stats, by="cross_cat", all=TRUE)
>    >    > agg_areas <- tapply(catstats$area, list(catstats$nuts, catstats$corine),
>    >    >  sum)
>    >    > library(foreign)
>    >    > corine_labels <- read.dbf("g250_clc12_V18_5.tif.vat.dbf", as.is=TRUE)
>    >    > o <- match(colnames(agg_areas), as.character(corine_labels$Value))
>    >    > colnames(agg_areas) <- corine_labels$LABEL3[o]
>    >    > agg_areas_df <- as.data.frame(agg_areas)
>    >    > agg_areas_df1 <- agg_areas_df[-which(!(row.names(agg_areas_df) %in%
>    >    >   as.character(nuts_ll$FID))),] # dropping "NA"      "no_data"
>    >    >
>    >    > This should be ready to merge with the NUTS3 boundaries, if needed.
>    >    >
>    >    > agg_areas_df1$FID <- row.names(agg_areas_df1)
>    >    > nuts_corine <- merge(nuts_laea, agg_areas_df1, by="FID")
>    >    >
>    >
>    >    And to write out as GPKG:
>    >
>    >    names(nuts_corine)[1] <- "FID_"
>    >    writeOGR(nuts_corine, "nuts_corine.gpkg", layer="corine", driver="GPKG")
>    >
>    >    It seems that at least this driver treats "FID" as a reserved field name.
>    >
>    >    > For the vector parts you could use sf and the provisional rgrass7sf on
>    >    > github, but that wouldn't yet let you construct a skeleton SpatialGrid to
>    >    > define the GRASS location. Using GRASS for the heavy lifting (the raster is
>    >    > 51000 by 35000), and avoiding vector for overlay, this doesn't need much
>    >    > memory (GRASS handles rasters by row). The GRASS temporary location only
>    >    > takes 130MB of disk space. You could go for the 100m raster resolution, but I
>    >    > doubt that the outcome would vary much - anyone like to try?
>    >    >
>    >    > If the sub-polygons by NUTS and corine categories are actually needed, the
>    >    > output of r.cross could be passed to r.to.vect:
>    >    >
>    >    > execGRASS("r.to.vect", input="cross_nuts3", output="cross_nuts3",
>    >    >   type="area")
>    >    >
>    >    > but this is more demanding in memory terms.
>    >
>    > Hi Roger,
>    >
>    > thanks a lot again for helping me with this issue.
>    >
>    >
>    > Unfortunately, I haven't managed to call GRASS within R, neither install
>    > sf. My guess is that those issues are because I'm using both newest
>    > versions OS X High Sierra and R 3.5.0 which are not stable yet.
>
>    You do not need sf. You do need GRASS, and, yes, OSX causes a lot of
>    problems because it is no longer attentive to the needs of numerical
>    analysts (certification of installs mandated at cost to FOSS communities;
>    gaping empty root password; only interested in pay-for software; never
>    contributes to FOSS AFAIK). Apple does not contribute to R.
>
> Ah, I got it!
>
>    >
>    >    Reading into R is not possible, object too large.
>    >
>    > Bearing in mind the previous issues plus this last comment, I decided to
>    > use a PC with another OS (Ubuntu) aiming at solving them and working
>    > with a bit more space (32 GB).
>
>    Wrong comment. The vector corine object is impossible to handle anywhere,
>    and probably should never have been constructed. I really doubt that it is
>    more "accurate" than the 100m raster, and doubt that the areas sums by
>    NUTS3 region differ much between 250m and 100m.
>
>
>
>    You need to go back to the way in which Corine was derived, and to your
>    actual needs (which you have not described). If what you need are areas of
>    Corine classes by NUTS3 region, the code I provided gives you this. If
>    what you need is the full intersection of Corine classes and all NUTS3
>    regions, then you cannot visualize them (without zooming - rendering is
>    raster anyway) nor can you analyze them in any meaningful way, so you'd
>    need to revisit you research question. An intersection of a subset of
>    NUTS3 regions yielding raster patches or vector entities is possible and
>    could be visualized, but you seem to want the full extent of the NUTS3
>    regions.
>
> My ultimate goal is to compute population within a buffer by downscaling
> the NUTS3 population on the CLC resolution using algorithms like in
> Gallego (2010): A population density grid of the European Union. One of
> them is based on a regression model so , yes, I need to compute the
> areas of CLC classes within each NUTS3 region. Thanks a lot again for
> the code. Regarding visualization, I'm planning to convey results at
> NUTS3 level.
Then staying with raster will be sensible: just interpolate the population
counts to the raster resolution you want (wouldn't the Gallego 1ha grid be
very like the Corine 100m resolution?). If you avoid vector for everything
apart from rasterising the NUTS3 boundaries, you should be OK. See
https://doi.org/10.1371/journal.pone.0174993 for a US dasymetric example -
and see http://sil.uc.edu/webapps/socscape_usa/ for the online tool.
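
The allocation arithmetic itself can then stay in R on the small summary
tables - a sketch only, where w is a hypothetical named vector of per-class
density weights (for example from such a regression) and nuts_pop a data
frame of FID and POP, neither of which appears in the code above:

# sketch only: spread NUTS3 totals over the NUTS3 x Corine combinations
# already held in catstats; w and nuts_pop are hypothetical inputs
catstats$nuts <- as.character(catstats$nuts)
catstats$score <- catstats$area * w[as.character(catstats$corine)]
tot <- tapply(catstats$score, catstats$nuts, sum, na.rm=TRUE)
catstats$pop <- nuts_pop$POP[match(catstats$nuts, nuts_pop$FID)] *
  catstats$score / tot[catstats$nuts]
# catstats$pop / catstats$area is a density that can be joined back onto
# the grid through the class and NUTS3 rasters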

Roger

>
>    If you need the intersection between one/few NUTS3 region(s) and Corine,
>    you can still use the raster approach, but I don't know PostGIS well
>    enough to advise (forthcoming R Journal article;
>    https://cran.r-project.org/package=rpostgis). You can read the NUTS3
>    boundaries into R, but probably neither of the Corine vector files (not
>    just memory, you mentioned earlier that they suffered from topological
>    issues which would prevent intersection without intensive cleaning.
>
> At some point I might need to implement it on a lower level - within a
> country for instance - so I will keep it on the lookout. I also will
> have a look on PostGIS.
>
>    Using GRASS, you get the best of both worlds, script in R and run the
>    analysis at scale in GRASS using memory-conserving implementations.
>
> Hot tip!
>
> Roman.
>
>    Roger
>
>    >
>    >    Roger
>    >
>    > In the meantime, I really would appreciate you guidance for gaining
>    > background on this subject. Could you please suggest me resources to
>    > better understand the features of objects involved in Spatial
>    > Statistics? I should mention that the knowledge I have about it is
>    > resumed in your book Applied Spatial Data Analysis with R.
>    >
>    >
>    > Best,
>    > Roman.
>    >
>    >    >
>    >    > Interesting case, and it does show that combining GIS and R delivers the
>    >    > goods - SAGA would probably work equivalently.
>    >    >
>    >    > Roger
>    >    >
>    >    >>
>    >    >>  Roger
>    >    >>
>    >    >>>
>    >    >>>   I really appreciate any feedback in advance as well as details that I
>    >    >>>   should take into account to understand more about how to work with this
>    >    >>>   kind of data. I will also keep you up to date on how it goes if you
>    >    >>>   like.
>    >    >>>
>    >    >>>
>    >    >>>   Best,
>    >    >>>   Roman.
>    >    >>>
>    >    >>>   On 03/05/2018, 12:26, "Roger Bivand" <[hidden email]> wrote:
>    >    >>>
>    >    >>>      On Thu, 3 May 2018, Aguirre Perez, Roman wrote:
>    >    >>>
>    >    >>>    >   Hello Shaun,
>    >    >>>    >
>    >    >>>    >   Thanks a lot for replying and providing me alternative options.
>    >    >>>    >
>    >    >>>    >
>    >    >>>    >   Unfortunately, I can't try both options anymore as I ran out of
>    >    >>>    >   ArcGIS
>    >    >>>    >   due to I was using a university PC which is not available now (I
>    >    >>>    >   installed an ArcGIS trial version there), but I'll try it later. I
>    >    >>>    >   also
>    >    >>>    >   failed on installing the sf package - I'll dig a bit more on it.
>    >    >>>    >   Nevertheless, I already downloaded the SpatialLite version so I'll
>    >    >>>    >   try
>    >    >>>    >   the second option and I'll let you know how it goes.
>    >    >>>
>    >    >>>      I think Melanie got it right - the apparent extra detail and precision
>    >    >>>      you'd get from vector-vector overlay is illusory, so going with geoTiff
>    >    >>>      should get you there (you can check to see whether 100m resolution
>    >    >>>      differs from 250m). The only reason to choose vector-vector would be
>    >    >>>      that the Corine vector contains categories not represented in the
>    >    >>>      raster version. Using raster also steps around the polygon deficiencies
>    >    >>>      in the GDB (and probably SQLite) representations.
>    >    >>>
>    >    >>>      Roger
>    >    >>>
>    >    >>>    >
>    >    >>>    >
>    >    >>>    >   Regards,
>    >    >>>    >   Roman.
>    >    >>>    >
>    >    >>>    >   On 02/05/2018, 18:53, "Shaun Walbridge" <[hidden email]>
>    >    >>>    >   wrote:
>    >    >>>    >
>    >    >>>    >      Hello Roman,
>    >    >>>    >
>    >    >>>    >      A couple of suggested options: You can try and use the
>    >    >>>    >      arcgisbinding package [1] to pull the data directly into R via
>    >    >>>    >      ArcGIS, as you mentioned you have ArcGIS available [presumably
>    >    >>>    >      on
>    >    >>>    >      Windows or in a VM]. It will access the data directly, and let
>    >    >>>    >      you
>    >    >>>    >      create sp and sf objects out of Geodatabases with little work. A
>    >    >>>    >      basic workflow looks like this:
>    >    >>>    >
>    >    >>>    >      d <- arc.open("path/file.gdb/layer_name")
>    >    >>>    >      df <- arc.select(d) # here, you can filter columns and attributes, see [2]
>    >    >>>    >      # create an sf object
>    >    >>>    >      df.sf <- arc.data2sf(df)
>    >    >>>    >      # alternatively, create an sp object
>    >    >>>    >      df.sp <- arc.data2sp(df)
>    >    >>>    >
>    >    >>>    >      You can also write the results back to a geodatabase, or any other
>    >    >>>    >      format that ArcGIS understands. Another option is trying the
>    >    >>>    >      SpatiaLite version of the data at the URL you posted. You should be
>    >    >>>    >      able to access this using R directly, provided your rgdal
>    >    >>>    >      installation is correctly built to read SpatiaLite databases. If it
>    >    >>>    >      is, try the same process you mentioned using rgdal, but point it at
>    >    >>>    >      the SpatiaLite database instead. You could also use command-line OGR
>    >    >>>    >      to convert the data, which has a few options, like WHERE-clause
>    >    >>>    >      filtering, that aren't directly available via readOGR.
>    >    >>>    >
>    >    >>>    >      If neither of these options work, let me know and I can convert
>    >    >>>    >      the data for you into a format of your preference.
>    >    >>>    >
>    >    >>>    >      Cheers,
>    >    >>>    >      Shaun
>    >    >>>    >
>    >    >>>    >      1. https://github.com/R-ArcGIS/r-bridge-install
>    >    >>>    >      2. https://rdrr.io/github/R-ArcGIS/r-bridge/man/arc.select.html
>    >    >>>    >
>    >    >>>    >
>    >    >>>    >
>    >    >>>
>    >    >>>
>    >    >>>
>    >    >>
>    >    >>
>    >    >
>    >    >
>    >
>    >
>    >
>
>
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo