Handling duplicates in combined data frames



First of all, thank you very much for helping with my questions. You are great.

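I need your help once again with coding in R. The situation involves two data frames: Dataframe1 describes a Portuguese class and Dataframe2 describes a Math class. I want to find the duplicates (there are some, because a student can take both classes) and, instead of deleting them, extend the "Class" column to something like "Math + Portuguese".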

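I tried to simplify my data frames by creating two new ones (the real ones are much larger, but the final approach should be the same). There is one duplicate (the student whose parents are both doctors). I just want him to appear once in the data frame, with the wording "Math + Portuguese" in "Class".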

For the identification of duplicates, the column "Grade" must be ignored.

Thank you very much for your help. All the best! Alexander

# Creation of Dataset 1 (Portuguese students)
school <- c(rep("S1",7),rep("S2",3))
Age <- c(18,18,19,19,20,20,21,21,22,22)
professionf <- c(rep("teacher",9),rep("doctor",1))
professionm <- c(rep("police",9),rep("doctor",1))
Class <- rep("Portuguese",10)
Grade <- round(runif(10,1,5),0)
# data.frame() instead of cbind(): cbind() would return a character matrix, not a data frame
DataframeP <- data.frame(school, Age, professionf, professionm, Grade, Class, stringsAsFactors = FALSE)
View(DataframeP)
# Creation of Dataset 2 (Math students)
school <- c(rep("S1",7),rep("S2",3))
Age <- c(18,18,19,19,20,20,21,21,22,22)
professionf <- c(rep("lawyer",9),rep("doctor",1))
professionm <- c(rep("police",9),rep("doctor",1))
Class <- rep("Math",10)
Grade <- round(runif(10,1,5),0)
DataframeM <- data.frame(school, Age, professionf, professionm, Grade, Class, stringsAsFactors = FALSE)
View(DataframeM)
# Combination of the two data frames, where the identification of the duplicates should take place
DF_All <- rbind(DataframeM,DataframeP)
View(DF_All)

This should do it, dear Alexander!

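library(data.table)  # for setnames()
library(dplyr)       # for coalesce()
# Full outer join on the columns that identify a student; Grade is deliberately not a key
df_merged <- merge(x = DataframeP, y = DataframeM, by = c("school", "Age", "professionf", "professionm"), all = TRUE)
# Students found in both classes get a combined label
df_merged <- within(df_merged, Class.x[Class.x == 'Portuguese' & Class.y == 'Math'] <- 'Portuguese + Math')
# Rows that exist only in the Math data have NA in the .x columns; fill them from the .y columns
df_merged$Class.x <- coalesce(df_merged$Class.x, df_merged$Class.y)
df_merged$Grade.x <- coalesce(df_merged$Grade.x, df_merged$Grade.y)
# Drop the last two columns (Grade.y and Class.y), which are no longer needed
df_merged <- df_merged[1:(length(df_merged)-2)]
setnames(df_merged, old = c('Grade.x','Class.x'), new = c('Grade','Class'))
df_merged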
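
For comparison, here is a minimal base R sketch of the same idea without extra packages. It is only one possible variant, not the method above: it assumes the four columns school, Age, professionf and professionm together identify a student, that none of their values contains the "|" character, and that keeping the Portuguese grade for a student found in both classes is acceptable (the question only says Grade must be ignored when matching). The helper make_key and the names key_cols, key_p, key_m and result are made up for this illustration.

```r
set.seed(1)  # only so that the random grades are reproducible

# Sample data as in the question, built as real data frames
DataframeP <- data.frame(
  school      = c(rep("S1", 7), rep("S2", 3)),
  Age         = c(18, 18, 19, 19, 20, 20, 21, 21, 22, 22),
  professionf = c(rep("teacher", 9), "doctor"),
  professionm = c(rep("police", 9), "doctor"),
  Grade       = round(runif(10, 1, 5), 0),
  Class       = "Portuguese",
  stringsAsFactors = FALSE
)
DataframeM <- data.frame(
  school      = c(rep("S1", 7), rep("S2", 3)),
  Age         = c(18, 18, 19, 19, 20, 20, 21, 21, 22, 22),
  professionf = c(rep("lawyer", 9), "doctor"),
  professionm = c(rep("police", 9), "doctor"),
  Grade       = round(runif(10, 1, 5), 0),
  Class       = "Math",
  stringsAsFactors = FALSE
)

# Assumption: these columns identify a student; Grade is left out on purpose
key_cols <- c("school", "Age", "professionf", "professionm")

# Hypothetical helper: one string key per row, e.g. "S2|22|doctor|doctor"
make_key <- function(df) do.call(paste, c(df[key_cols], sep = "|"))

key_p <- make_key(DataframeP)
key_m <- make_key(DataframeM)

# Start from the Portuguese rows and relabel those that also occur in Math
result <- DataframeP
result$Class[key_p %in% key_m] <- "Math + Portuguese"

# Append only the Math rows that have no counterpart in Portuguese
result <- rbind(result, DataframeM[!(key_m %in% key_p), ])
rownames(result) <- NULL

result
```

Unlike a grouped summary, this keeps rows that repeat inside a single class (the sample data has several) and only merges rows that appear in both classes.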
