Merging shapefiles and csv

HallS
Hi all,

I'm struggling to know how this will come across as my data is confidential.

Basically I have a shapefile (.shp) and a csv file which contain the same regions, i.e. a column which has the same information.  Using this link: https://sites.google.com/site/eospansite/alobotips/spatial_r_tips/rshp_xls
I managed to get quite far, but once I got to the writeOGR command, I got the error
 Error in writeOGR(RSANHS, dsn = "C:/Users/Laptop/Documents/Rworkspace/",  :
  number of objects mismatch

> shape1@data <- merge(shape1@data,csv,by.x="RSA",
+                           by.y="RSA", all.x=T, sort=F)
>
> ###Checking it
> dim(shape@data)
[1] 1745    2
> dim(shape1@data)
[1] 1747    5

This shows a discrepancy of two rows between the original shapefile and the new merged one.  When I looked at the merged file in full, there were a number of NA rows at the bottom where there was no data corresponding to the shapefile.  I tried shape1@data <- na.exclude(shape1@data), and also na.omit, and this did reduce the number of rows to 1690, but the problem persists.
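One possible reading of the numbers above, sketched with toy data: with all.x=T, merge() returns more rows than the left table only when a key occurs more than once in the right table, and na.omit() then removes every row containing an NA, which can leave fewer rows than polygons (1690 against 1745). Either way the attribute table no longer has one row per polygon. The data frames `polys` and `tab` below are hypothetical stand-ins for shape1@data and the csv; whether duplicated keys are the actual cause here is an assumption.

```r
# Toy stand-ins (hypothetical data): 'polys' plays the role of shape1@data,
# 'tab' plays the role of the csv. Region "B" appears twice in 'tab'.
polys <- data.frame(RSA = c("A", "B", "C"), area = c(10, 20, 30),
                    stringsAsFactors = FALSE)
tab   <- data.frame(RSA = c("A", "B", "B"), value = c(1.5, 2.5, 3.5),
                    stringsAsFactors = FALSE)

merged <- merge(polys, tab, by = "RSA", all.x = TRUE, sort = FALSE)

nrow(polys)   # one row per polygon
nrow(merged)  # larger: "B" matched twice, "C" is kept with NA

# Keys that occur more than once in the table inflate the merge result
dup_keys <- unique(tab$RSA[duplicated(tab$RSA)])

# Keys present in the polygons but absent from the table become NA rows
missing_keys <- setdiff(polys$RSA, tab$RSA)
```

Running the same two checks (duplicated() and setdiff()) on the real key columns should show which regions account for the extra and the NA rows.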

Sorry if this is a really unhelpful question, I'm not sure how to do it when data is confidential.

Re: Merging shapefiles and csv

Lyndon Estes-2
I am not sure about the mismatch issue, but I think merging the
data slot of a SpatialPolygonsDataFrame with a data frame produces
undesirable results.

I wrote a function a while back that does the merge in such a way that
the problems are avoided, and perhaps this might help.  I think there
are other, more recent, and undoubtedly better solutions (in fact I
recall seeing a very recent thread about this, but not sure where)
than this one that you could find.

joinAttributeTable <- function(x, y, xcol, ycol) {
  # Merges a data frame into a SpatialPolygonsDataFrame, keeping the correct
  # order. Code from suggestions at:
  # https://stat.ethz.ch/pipermail/r-sig-geo/2008-January/003064.html
  # Args:
  #   x: SpatialPolygonsDataFrame
  #   y: data.frame to merge
  #   xcol: Merge column name in x
  #   ycol: Merge column name in y
  # Returns: SpatialPolygonsDataFrame with merged attribute table

  # Column containing original row order for later sorting
  x$sort_id <- seq_len(nrow(as(x, "data.frame")))

  x.dat <- as(x, "data.frame")  # Create new data.frame object
  x.dat2 <- merge(x.dat, y, by.x = xcol, by.y = ycol)  # Merge
  x.dat2.ord <- x.dat2[order(x.dat2$sort_id), ]  # Reorder back to original
  # Make new set of polygons, dropping those which aren't in merge
  x2 <- x[x$sort_id %in% x.dat2$sort_id, ]
  x2.dat <- as(x2, "data.frame")  # Convert the subset x2 to a data.frame
  # Reassign row.names from original data.frame
  row.names(x.dat2.ord) <- row.names(x2.dat)
  x2@data <- x.dat2.ord  # Assign the new data.frame to the polygons object
  return(x2)
}

Hope it helps.

Best, Lyndon


_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Merging shapefiles and csv

Rafael Wüest
Hi there

have a look at

?sp::merge

Should do what you need.

HTH, Rafael

--
Rafael Wüest
[hidden email]
http://www.rowueest.net

Re: Merging shapefiles and csv

HallS
Hi,

Thank you Lyndon and Rafael for your thoughts.  After the sp::merge comment I followed the code below, but again it failed at the writeOGR step, this time with:

Error in writeOGR(spatial.data, dsn = "C:/Users/Laptop/Documents/Rworkspace/Shape",  :   Creating Name field failed

This could be because of the "new_layer" argument... does this have to be named anything in particular?  Does it have to match the file name, etc.?  Lyndon, I'll try yours next, but I must admit it's confused me a little.

Joining New Data to an Existing sp Object
# use to read in some vector data
library(rgdal)

# read something in, rows are identified by a column called 'id'
spatial.data <- readOGR(...)

# read in some tabular data, rows are identified by a column called 'id'
new_table <- read.csv(...)

# 'join' the new data with merge()
# all.x=TRUE is used to ensure we have the same number of rows after the join
# in case that the new table has fewer
merged <- merge(x=spatial.data@data, y=new_table, by.x='id', by.y='id', all.x=TRUE)

# generate a vector that represents the original ordering of rows in the sp object
correct.ordering <- match(spatial.data@data$id, merged$id)

# overwrite the original dataframe with the new merged dataframe, in the correct order
spatial.data@data <- merged[correct.ordering, ]

# check the ordering of the merged data, with the original spatial data
cbind(spatial.data@data$id, merged$id[correct.ordering])

Correctly Write 'NA' Values to Shapefile [bug in writeOGR()]
# libraries we need
require(rgdal)
require(foreign)

# pass 1: write the shapefile
writeOGR(spatial.data, dsn='new_folder', driver='ESRI Shapefile', layer='new_layer')

# re-make the DBF:
write.dbf(spatial.data@data, file='new_folder/new_layer.dbf')


Re: Merging shapefiles and csv

Rolando Valdez
Hi,

I have used this procedure:

spatial.data <- readOGR(….)
stat.data <- read.csv(….)
spatial.data@data=data.frame(stat.data)

This will merge your statistical data to your spatial data. Make sure you have the same order in both sides of your data.

Hope this helps. Regards.

Rolando Valdez

Re: Merging shapefiles and csv

HallS
In reply to this post by HallS
Hi all,

OK, thanks for all your help.  I believe I have now successfully merged the two: when I run str(merged_file) it shows all of the correct variables, and when I run plot(merged_file), I still get the outline of the correct shape.

I must confess writeOGR still doesn't work.  I thought it was required in order to draw a choropleth map, but I realise now this is not the case, and I believe it can be done with just the merged file.

Thanks again

Re: Merging shapefiles and csv

Lyndon Estes-2
Hi Sam,

One thing you might check is the column names in your csv file. If you
can change them to something simpler, without periods in them, etc,
that might get rid of the problem.  There are some threads on this
problem as well, I believe.
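As a rough sketch of the renaming suggested above: the DBF attribute table of a shapefile limits field names to 10 characters, so long or punctuated csv headers are a plausible (though unconfirmed) source of the "Creating Name field failed" error. The helper `sanitize_field_names` below is hypothetical, written for illustration only; it replaces anything other than letters, digits and underscores, truncates to 10 characters, and resolves any clashes that truncation creates.

```r
# Hypothetical helper: make column names safe for a shapefile's DBF table.
sanitize_field_names <- function(nm, max_len = 10) {
  # Replace periods, spaces and other punctuation with underscores
  nm <- gsub("[^A-Za-z0-9_]", "_", nm)
  # DBF field names are limited to 10 characters
  nm <- substr(nm, 1, max_len)
  # Truncation can create duplicates; give each one a numeric suffix
  for (i in which(duplicated(nm))) {
    k <- 1
    repeat {
      suffix <- as.character(k)
      cand <- paste0(substr(nm[i], 1, max_len - nchar(suffix)), suffix)
      if (!cand %in% nm[-i]) break
      k <- k + 1
    }
    nm[i] <- cand
  }
  nm
}

# Example headers (made up for illustration)
old_names <- c("Region.Name", "Region.Name.2", "pop 2014")
new_names <- sanitize_field_names(old_names)
```

The result could then be assigned with names(x) <- sanitize_field_names(names(x)) on the data frame before writing, where x stands for whichever table is being written.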

Cheers, Lyndon

Re: Merging shapefiles and csv

Roger Bivand
Administrator
In reply to this post by Rolando Valdez
On Thu, 31 Jul 2014, Rolando Valdez wrote:

> Hi,
>
> I have used this proceeding:
>
> spatial.data <- readOGR(….)
> stat.data <- read.csv(….)
> spatial.data@data=data.frame(stat.data)
>
> This will merge your statistical data to your spatial data. Make sure
> you have the same order in both sides of your data.

No, this only holds under very specific assumptions (same observations in
both data objects in the same order). Please do refer to the vignette in
the maptools package, and to previous threads which have advised that
merge() should not be used, and that the row.names of the data frames be
used as ID keys. Typically using match() on the row.names of the two
objects will show which are not correctly aligned.
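A minimal sketch of the row.names check described above, using plain data frames as stand-ins. `poly_attr` (mimicking the attribute table of the spatial object) and `stats` (mimicking the csv once its key column has been set as row names) are hypothetical names and data, not objects from this thread.

```r
# Hypothetical stand-ins for the two tables, keyed by their row names
poly_attr <- data.frame(name = c("north", "south", "east"),
                        row.names = c("r1", "r2", "r3"),
                        stringsAsFactors = FALSE)
stats <- data.frame(value = c(30, 10, 40),
                    row.names = c("r3", "r1", "r4"),
                    stringsAsFactors = FALSE)

# Position in 'stats' of each row of 'poly_attr' (NA where there is no partner)
idx <- match(row.names(poly_attr), row.names(stats))

# Rows of 'poly_attr' that have no partner in 'stats'
unmatched <- row.names(poly_attr)[is.na(idx)]

# TRUE only if both tables hold the same keys in the same order;
# only then is a direct assignment of one table over the other safe
aligned <- identical(row.names(poly_attr), row.names(stats))
```

Here `idx` shows that the tables are in a different order and that one key is missing, which is exactly the situation in which overwriting the data slot directly would jumble the attributes.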

Hope this clarifies,

Roger

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: [hidden email]

Re: Merging shapefiles and csv

Roger Bivand
Administrator
In reply to this post by HallS
On Thu, 31 Jul 2014, HallS wrote:

> Hi all,
>
> OK thanks for all your help.  I believe now I have successfully merged the
> two, when I run str(merged_file) it shows all of the correct variables, and
> when I run plot(merged_file), I still get the outline of the correct shape.

Beware that the data from the objects may be jumbled - never use merge,
always use match() on the row.names vectors of the objects to ensure that
the key-IDs agree. Jumbled data happens; it is important not to think
"shapefile" but to think DBMS, with the ID key as your way of staying sane.

Hope this clarifies,

Roger

Re: Merging shapefiles and csv

HallS
In reply to this post by HallS
Hi all,

Thanks for all your help. Roger, you were right!  The merge function looked like it had worked fine, but when I came to plot it the data clearly hadn't joined to the shapefile correctly.  This may be a bit of a bodge, but I've managed to use the match() function to join the csv with the shapefile, then use an if function along with gUnionCascaded to join the internal polygons according to Col_2.

Match code:
frame@data=gc[match(shape@data[,"Col_1"], csv[,"Col_1"]),]
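For readers following along, here is a sketch of the same match() join on plain data frames, under stated assumptions. `shape_data` and `csv` are hypothetical stand-ins for shape@data and the csv table, with made-up values. Note that the line above indexes an object called `gc` while match() looks positions up in `csv`; the indices are only meaningful if both refer to the same table, so the sketch indexes the table that was passed to match().

```r
# Hypothetical stand-ins: 'shape_data' mimics shape@data, 'csv' the table
shape_data <- data.frame(Col_1 = c("a", "b", "c", "d"),
                         stringsAsFactors = FALSE)
csv <- data.frame(Col_1 = c("c", "a", "d"),
                  TERR  = c("2", "1", "2"),
                  stringsAsFactors = FALSE)

# One index per polygon, in polygon order; NA marks a region missing from csv
idx <- match(shape_data$Col_1, csv$Col_1)

# Index the same table that was passed as match()'s second argument
joined <- csv[idx, , drop = FALSE]

# Keep the polygon row names so the table still lines up with the geometry
row.names(joined) <- row.names(shape_data)

# Keep the polygon key even where the csv had no matching row
joined$Col_1 <- shape_data$Col_1

# The joined table must have exactly one row per polygon
stopifnot(nrow(joined) == nrow(shape_data))
```

Regions without a csv row end up with NA attributes rather than being dropped, so the row count always equals the number of polygons.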

Then for each higher level (Col_2):

TERRITORY1 <- gUnionCascaded(frame[ which(frame$TERR=='1'), ])
plot(TERRITORY1)

TERRITORY2 <- gUnionCascaded(frame[ which(frame$TERR=='2'), ])
plot(TERRITORY2)

I'm hoping I can now use a concatenation of "TERRITORY1", "TERRITORY2" etc. to map all of it, or single ones if I want.  Just hoping the rest of my data from the csv is all joined on correctly.  Working on ggplot2 map now.  Thanks for everyone's help, just thought I'd update this!