readOGR workaround for Japanese UTF-8 geojson

readOGR workaround for Japanese UTF-8 geojson

Alan Engel
I am working on a project, https://github.com/AlanInTsukuba/jpucd, that
involves extracting shapefiles and property data from Japanese geojson
files. When reading with readOGR(ibarakipath1, encoding="UTF-8",
use_iconv=TRUE), I find that subsets of the resulting object cannot be
written with writeOGR without losing the text fields that contain Japanese
text. I found the following workaround, but wonder if there is a better way
to do this.


Environment: RGui, Windows10



# Load Ibaraki shapefiles, extract the TX subset, write to geojson

library(jpucd)
library(rgdal)  # provides readOGR() and writeOGR()

shppath <- system.file("extdata", package = "jpucd")

ibarakipath1 <- file.path(shppath, "JPGen2005CTgenlCY2000P08Ibaraki.geojson")



#^ JPGen2005CTgenlCY2000P08Ibaraki.geojson is a UTF-8 encoded geojson file
#^     having Japanese names in property fields. To be able to
#^     read these fields, they need to be converted (to Shift-JIS?).
#^     Reading with use_iconv=TRUE does this: it loads so that the Japanese
#^     fields are readable, but writeOGR doesn't write them.
#^ This can also be done by use_iconv=FALSE and setting
#^     the encoding of the Japanese columns using Encoding(x) <- "UTF-8",
#^     which is what the following commands do.

ibaraki <- readOGR(ibarakipath1, encoding = "UTF-8", use_iconv = FALSE)

head(ibaraki@data)



#^ Apply Encoding(x) <- "UTF-8" to every character column

for (name in names(ibaraki@data)[sapply(ibaraki@data, is.character)]) {
  Encoding(ibaraki@data[[name]]) <- "UTF-8"
}



#^ Get TX subset

tx_cities <- c("つくば市", "守谷町", "伊奈町", "谷和原村")
tx2000 <- ibaraki[ibaraki@data$CITY_NAME %in% tx_cities, ]

head(tx2000@data)



#^ Write it.

dsn <- "TsukubaExpressCensusDistricts2000.geojson"

writeOGR(tx2000, dsn, layer = "TsukubaExpressCensusDistricts2000",
         driver = "GeoJSON", dataset_options = NULL,
         layer_options = NULL, verbose = FALSE, check_exists = NULL,
         overwrite_layer = FALSE, delete_dsn = FALSE, morphToESRI = NULL,
         encoding = "UTF-8")
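
As an aside on the Encoding() step above, here is a minimal sketch in base R of what that assignment does: it changes only the declared encoding mark on a string, not its bytes. The data frame df and its columns are made up for illustration and are not taken from the jpucd data.

```r
# "\u3064\u304f\u3070\u5e02" spells つくば市; \u escapes keep this file ASCII-only
x <- "\u3064\u304f\u3070\u5e02"

# Hypothetical stand-in for ibaraki@data: one text column, one integer column
df <- data.frame(CITY_NAME = x, CODE = 1L, stringsAsFactors = FALSE)

# Drop the encoding mark, imitating text that arrives without one
Encoding(df$CITY_NAME) <- "unknown"
mark_before <- Encoding(df$CITY_NAME)

# Re-declare every character column as UTF-8, as in the workaround
for (name in names(df)[sapply(df, is.character)]) {
  Encoding(df[[name]]) <- "UTF-8"
}
mark_after <- Encoding(df$CITY_NAME)

# The bytes never changed; only the mark did
same_bytes <- identical(charToRaw(df$CITY_NAME), charToRaw(x))
```

Selecting columns with names(df)[sapply(df, is.character)] also works when there is only one character column, where indexing with df[, ...] would drop to a plain vector and lose the column names.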



Thank you.

Alan

https://alanintsukuba.github.io/


_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Re: readOGR workaround for Japanese UTF-8 geojson

Roger Bivand
On Mon, 29 Jun 2020, Alan Engel wrote:

> I am working on a project https://github.com/AlanInTsukuba/jpucd that
> involves extracting shapefiles and property data from Japanese geojson
> files. When reading with readOGR(ibarakipath1 , encoding="UTF-8",
> use_iconv=TRUE), I find that the subsets of cannot be written with
> writeOGR without losing text fields that are in Japanese text. I found
> the following workaround but wonder if there is a better way to do this.
>

Firstly, the ESRI shapefile driver should only be used for reading legacy
files with known text encodings. Shapefiles use DBF files to store attribute
data, and DBF should no longer be used in new work. DBF files have
restrictions on field name length, imprecision in storing numerical data,
and big problems in storing any text that is not ASCII (see
https://cran.r-project.org/web/packages/rgdal/vignettes/OGR_shape_encoding.pdf).

All new projects must use more modern formats, preferably GeoPackage (GPKG,
http://www.geopackage.org/spec/), which resolves all of the problems
mentioned.
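
A minimal sketch of writing Japanese attribute text to GeoPackage and reading it back, assuming the sf package is installed. The two points, their coordinates, and the layer name "districts" are invented for illustration only; they are not from the jpucd data.

```r
library(sf)

# Two made-up point features; the names spell つくば市 and 守谷町
pts <- st_sf(
  CITY_NAME = c("\u3064\u304f\u3070\u5e02", "\u5b88\u8c37\u753a"),
  geometry  = st_sfc(st_point(c(140.10, 36.08)),
                     st_point(c(139.98, 35.95)),
                     crs = 4326)
)

# Write to a fresh temporary GeoPackage, then read the layer back
gpkg <- tempfile(fileext = ".gpkg")
st_write(pts, gpkg, layer = "districts", quiet = TRUE)
back <- st_read(gpkg, layer = "districts", quiet = TRUE)

# Compare the text after normalising both sides to UTF-8
names_survive <- identical(enc2utf8(as.character(back$CITY_NAME)),
                           enc2utf8(pts$CITY_NAME))
```

Whether names_survive is TRUE on a given Windows setup is worth checking directly, since it depends on the R version and locale in use.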

If your project is using R on Windows, you also need to be aware of
https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html,
which explains that R on Windows is moving towards UTF-8 in order to reduce
internal and cross-platform encoding problems.
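
To see how the current session stands with respect to UTF-8, a small base R check along these lines may help. It is only a sketch for inspecting the session and does not change any settings.

```r
# Locale information for the running session
info <- l10n_info()

# TRUE when the session's native encoding is UTF-8
utf8_native <- info[["UTF-8"]]

# The character-type locale string, e.g. a Japanese code page on Windows
ctype <- Sys.getlocale("LC_CTYPE")

cat("Native UTF-8 session:", utf8_native, "\n")
cat("LC_CTYPE:", ctype, "\n")
```

When utf8_native is FALSE, strings that are not explicitly marked as UTF-8 are interpreted in the native code page, which is where steps such as Encoding(x) <- "UTF-8" come into play.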

Finally, you should be starting new work using the sf workflow, not
sp/rgdal. sp/rgdal are being maintained only to support their reverse
dependencies; this applies especially to spatial vector data, for which sf
provides full support.
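
As a rough illustration of the sf workflow for the read, subset, and write steps in the question, the sketch below builds a tiny stand-in GeoJSON file so that it runs without the jpucd package. The two features and their coordinates are made up; only the CITY_NAME field name and the city names つくば市 and 守谷町 come from the original post (水戸市 is added as a non-matching example).

```r
library(sf)

# Stand-in GeoJSON: つくば市 (a TX city) and 水戸市 (not a TX city)
geojson <- paste0(
  '{"type":"FeatureCollection","features":[',
  '{"type":"Feature","properties":{"CITY_NAME":"\u3064\u304f\u3070\u5e02"},',
  '"geometry":{"type":"Point","coordinates":[140.10,36.08]}},',
  '{"type":"Feature","properties":{"CITY_NAME":"\u6c34\u6238\u5e02"},',
  '"geometry":{"type":"Point","coordinates":[140.47,36.37]}}]}'
)
src <- tempfile(fileext = ".geojson")
writeLines(geojson, src, useBytes = TRUE)  # write the UTF-8 bytes unchanged

# Read, subset on the Japanese city names, and write the subset
ibaraki   <- st_read(src, quiet = TRUE)
tx_cities <- c("\u3064\u304f\u3070\u5e02", "\u5b88\u8c37\u753a")
tx2000    <- ibaraki[ibaraki$CITY_NAME %in% tx_cities, ]

out <- tempfile(fileext = ".geojson")
st_write(tx2000, out, quiet = TRUE)

# Read the result back to confirm the text field is still there
check <- st_read(out, quiet = TRUE)
```

With the real data, src would be the path to JPGen2005CTgenlCY2000P08Ibaraki.geojson and tx_cities would hold all four city names from the original post.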

Roger

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en