R: Flagging Samples from Same Specimen w/ Different DOB
I have a dataset that contains duplicate samples, some of which have a different date of birth. This obviously should not be the case, so I am trying to come up with a way to flag / mark those particular specimens. In the end, only samples that are duplicated and have a different DOB should get a 1; duplicates that all share the same DOB, and unique samples, should get a 0. Here is a simplified version of the data.
    test.df <- data.frame(sample = c("A", "A", "B", "C", "B", "D", "C", "D", "E"),
                          DOB = c(as.Date('2000-05-10'), as.Date('2002-04-13'),
                                  as.Date('2001/05/12'), as.Date('2003/06/01'),
                                  as.Date('2003/04/21'), as.Date('2000/10/20'),
                                  as.Date('2003-06-01'), as.Date('2000-10-20'),
                                  as.Date('2001-11-23')))

      sample        DOB
    1      A 2000-05-10
    2      A 2002-04-13
    3      B 2001-05-12
    4      C 2003-06-01
    5      B 2003-04-21
    6      D 2000-10-20
    7      C 2003-06-01
    8      D 2000-10-20
    9      E 2001-11-23

And this is what I want as the end result:
      sample        DOB diff.dob
    1      A 2000-05-10        1
    2      A 2002-04-13        1
    3      B 2001-05-12        1
    4      C 2003-06-01        0
    5      B 2003-04-21        1
    6      D 2000-10-20        0
    7      C 2003-06-01        0
    8      D 2000-10-20        0
    9      E 2001-11-23        0

Identifying the duplicates is obviously the easy part. What I'm having trouble with is adding the extra column of 1s and 0s, and flagging only when the actual duplicates have a different DOB. Any help would be greatly appreciated. Thanks.
You can use ave:

    test.df$diff.dob <- with(test.df, ave(as.numeric(DOB), sample,
                             FUN = function(x) length(unique(x)) != 1))

Or using dplyr:
    library(dplyr)
    test.df %>%
        group_by(sample) %>%
        mutate(diff.dob = (n_distinct(DOB) != 1) + 0)
    #   sample        DOB diff.dob
    # 1      A 2000-05-10        1
    # 2      A 2002-04-13        1
    # 3      B 2001-05-12        1
    # 4      C 2003-06-01        0
    # 5      B 2003-04-21        1
    # 6      D 2000-10-20        0
    # 7      C 2003-06-01        0
    # 8      D 2000-10-20        0
    # 9      E 2001-11-23        0

Or using data.table:
    library(data.table)
    setDT(test.df)[, diff.dob := (!anyDuplicated(DOB) & .N > 1) + 0, sample][]

Or base R with another option:
    indx1 <- !with(test.df, duplicated(DOB) | duplicated(DOB, fromLast = TRUE))
    tbl <- table(test.df$sample) != 1
    (test.df$sample %in% names(tbl)[tbl] & indx1) + 0
    # [1] 1 1 1 0 1 0 0 0 0
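
As a quick sanity check, here is a minimal, self-contained sketch that runs the ave approach end to end on the sample data from the question. The helper name flag_diff_dob and the extra specimen "F" are hypothetical, introduced only for illustration; they do not appear in the question or the answer.

```r
# Rebuild the sample data from the question
test.df <- data.frame(
  sample = c("A", "A", "B", "C", "B", "D", "C", "D", "E"),
  DOB = as.Date(c("2000-05-10", "2002-04-13", "2001-05-12", "2003-06-01",
                  "2003-04-21", "2000-10-20", "2003-06-01", "2000-10-20",
                  "2001-11-23"))
)

# Hypothetical helper (not from the original answer): returns 1 for every
# row whose sample group holds more than one distinct DOB, otherwise 0
flag_diff_dob <- function(sample, dob) {
  ave(as.numeric(dob), sample,
      FUN = function(x) length(unique(x)) != 1)
}

test.df$diff.dob <- flag_diff_dob(test.df$sample, test.df$DOB)
print(test.df)

# Hypothetical edge case: three rows for one specimen, two sharing a DOB.
# The group still contains two distinct DOBs, so every row is flagged.
extra <- data.frame(
  sample = c("F", "F", "F"),
  DOB = as.Date(c("1999-01-01", "1999-01-01", "1999-02-02"))
)
extra$diff.dob <- flag_diff_dob(extra$sample, extra$DOB)
print(extra)
```

Counting distinct DOBs per sample flags a specimen whenever its rows disagree, however many rows it has. Variants that rely on anyDuplicated() within a group, or on duplicated() across the whole DOB column, may not behave the same way once a specimen has three or more rows, so it is worth testing on a case like "F" before choosing one.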