Help to eliminate duplicated from data.frame but Special Problem

gianni lavaredo
Dear Researchers,
I need to solve the following problem. I wish to delete duplicate rows from
a data.frame, but not all rows with a duplicated Id:


Example:

my.df <- data.frame(Id=c(1,2,3,4,5,5,6,7,8,8,8,9),
value1=c(10,20,30,40,50,50,60,70,80,80,81,90),
value2=c(100,200,300,400,500,500,600,700,800,800,799,900))


> my.df
   Id value1 value2
1   1     10    100
2   2     20    200
3   3     30    300
4   4     40    400
5   5     50    500
6   5     50    500
7   6     60    600
8   7     70    700
9   8     80    800
10  8     80    800
11  8     81    799
12  9     90    900


After eliminating the duplicates, I would like to get:

> my.df
   Id value1 value2
1   1     10    100
2   2     20    200
3   3     30    300
4   4     40    400
5   5     50    500
7   6     60    600
8   7     70    700
9   8     80    800
11  8     81    799
12  9     90    900

But if I use

xx <- my.df[!duplicated(my.df$Id), ]

row 11 is also dropped, and my result is

> xx
   Id value1 value2
1   1     10    100
2   2     20    200
3   3     30    300
4   4     40    400
5   5     50    500
7   6     60    600
8   7     70    700
9   8     80    800
12  9     90    900


Thanks in advance,
Gianni

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Help to eliminate duplicated from data.frame but Special Problem

Sarah Goslee
So you want to compare entire rows (all columns), not just the Id column?
Then specify that:

> my.df[!duplicated(my.df),]
   Id value1 value2
1   1     10    100
2   2     20    200
3   3     30    300
4   4     40    400
5   5     50    500
7   6     60    600
8   7     70    700
9   8     80    800
11  8     81    799
12  9     90    900

R will do exactly what you tell it, and only that.
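
To see why the two calls differ, it can help to look at the logical vectors that duplicated() returns. This is only a minimal sketch using the my.df from the original post; the names by_id and by_row are illustrative and not part of the thread.

```r
my.df <- data.frame(Id     = c(1,2,3,4,5,5,6,7,8,8,8,9),
                    value1 = c(10,20,30,40,50,50,60,70,80,80,81,90),
                    value2 = c(100,200,300,400,500,500,600,700,800,800,799,900))

# duplicated() on a single column flags every repeat of an Id,
# regardless of what the other columns contain
by_id <- duplicated(my.df$Id)
which(by_id)    # rows 6, 10 and 11

# duplicated() on the whole data frame flags only rows that are
# identical to an earlier row in every column
by_row <- duplicated(my.df)
which(by_row)   # rows 6 and 10
```

Row 11 repeats Id 8 but has different value1 and value2, so it is flagged only in the first case.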

And thank you for including a workable example!

Sarah

On Wed, Mar 9, 2011 at 9:42 AM, gianni lavaredo
<[hidden email]> wrote:

> Dear Researchers,
> I need to solve the following problem. I wish to delete duplicate rows from
> a data.frame, but not all rows with a duplicated Id:


--
Sarah Goslee
http://www.functionaldiversity.org

Re: Help to eliminate duplicated from data.frame but Special Problem

Jon Skoien
In reply to this post by gianni lavaredo
Hi Gianni,

From the example, it seems you want to check whether value1 is
duplicated, not Id:
 > my.df[!duplicated(my.df$value1),]
You can also remove duplicated rows based on the values of more than one
column:
 > my.df[!duplicated(my.df[,c("Id","value1")]),]
Does either of these do what you want?
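
For the example data, the two suggestions above happen to keep the same rows. The sketch below checks that, assuming the my.df from the original post; res_value1 and res_pair are illustrative names only. On other data, keying on value1 alone could also drop rows that have different Ids but the same value1, so the two-column version is likely the safer choice.

```r
my.df <- data.frame(Id     = c(1,2,3,4,5,5,6,7,8,8,8,9),
                    value1 = c(10,20,30,40,50,50,60,70,80,80,81,90),
                    value2 = c(100,200,300,400,500,500,600,700,800,800,799,900))

# Keep the first row for each distinct value1
res_value1 <- my.df[!duplicated(my.df$value1), ]

# Keep the first row for each distinct (Id, value1) combination
res_pair <- my.df[!duplicated(my.df[, c("Id", "value1")]), ]

rownames(res_pair)               # "1" "2" "3" "4" "5" "7" "8" "9" "11" "12"
identical(res_value1, res_pair)  # TRUE for this particular data
```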

Cheers,
Jon


Re: Help to eliminate duplicated from data.frame but Special Problem

Georg Ruß
In reply to this post by gianni lavaredo

Does "unique(my.df)" solve your issue?
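
As a quick check, assuming the my.df from the original post, unique() on a data frame drops rows that are identical in all columns and gives the same result as indexing with !duplicated(my.df). The name res_unique is only illustrative.

```r
my.df <- data.frame(Id     = c(1,2,3,4,5,5,6,7,8,8,8,9),
                    value1 = c(10,20,30,40,50,50,60,70,80,80,81,90),
                    value2 = c(100,200,300,400,500,500,600,700,800,800,799,900))

# unique() compares whole rows, so row 11 (Id 8, value1 81) is kept
res_unique <- unique(my.df)
nrow(res_unique)                                      # 10

# Same result as the explicit duplicated() form
identical(res_unique, my.df[!duplicated(my.df), ])    # TRUE
```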

What's this got to do with R-sig-geo, anyway?

Regards,
Georg.
--
Research Assistant
Otto-von-Guericke-Universität Magdeburg
[hidden email]
http://research.georgruss.de
