I have two datasets representing different groups:
student_details <- c("John", "Henrick", "Maria", "Lucas", "Ali")
student_class <- c("High School", "College", "Preschool", "High School", "college")
df1 <- data.frame(student_details, student_class)
# another data frame
Student_details<-c("Bracy","Evin")
Student_class<-c("High school","College")
Student_rank<-c("A","A+")
df2<-data.frame(Student_class,Student_details,Student_rank)
df2
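I need to row-bind df1 and df2 even though their lengths are unequal, and create a third column at the end called "dataset" that indicates which dataset each row came from: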
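You can use the rbindlist() function from the data.table package to do this.
The column names in the two data frames must be identical, because you want to bind by column name.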
#convert uppercase letters in column names to lower case.
names(df2) <- tolower(names(df2))
Next, bind them together:
library(data.table)
final_df <- rbindlist(list(df1, df2), use.names = TRUE, fill = TRUE, idcol = "dataset")
final_df
Output:
   dataset student_details student_class student_rank
1:       1            John   High School         <NA>
2:       1         Henrick       College         <NA>
3:       1           Maria     Preschool         <NA>
4:       1           Lucas   High School         <NA>
5:       1             Ali       college         <NA>
6:       2           Bracy   High school            A
7:       2            Evin       College           A+
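The output above identifies the source by number (1 and 2) and puts "dataset" first. If you would rather have labels, and the column at the end as the question asks, one possible sketch is to pass a named list, since rbindlist() uses the list names for idcol when they are present. The labels "df1" and "df2" and the name labelled_df are my own choices for illustration, not from the question:

```r
library(data.table)

# Same example data as above, with lower-case column names in both frames.
df1 <- data.frame(
  student_details = c("John", "Henrick", "Maria", "Lucas", "Ali"),
  student_class   = c("High School", "College", "Preschool", "High School", "college")
)
df2 <- data.frame(
  student_class   = c("High school", "College"),
  student_details = c("Bracy", "Evin"),
  student_rank    = c("A", "A+")
)

# Naming the list elements makes idcol hold those names instead of 1, 2.
labelled_df <- rbindlist(list(df1 = df1, df2 = df2),
                         use.names = TRUE, fill = TRUE, idcol = "dataset")

# Move "dataset" to the last position (setcolorder modifies by reference).
setcolorder(labelled_df, c(setdiff(names(labelled_df), "dataset"), "dataset"))
labelled_df
```

The setcolorder() call is optional; it only changes the column order, not the contents.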
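I am assuming that the column names student_details and student_class are the same in both data frames. You can use bind_rows, which is more flexible than rbind. It fills the missing values with NA.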
student_details <- c("John", "Henrick", "Maria", "Lucas", "Ali")
student_class <- c("High School", "College", "Preschool", "High School", "college")
df1 <- data.frame(student_details, student_class)
student_details<-c("Bracy","Evin")
student_class<-c("High school","College")
student_rank<-c("A","A+")
df2<-data.frame(student_details,student_class,student_rank)
library(dplyr)
df_full<-bind_rows(df1,df2)
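bind_rows on its own does not add the "dataset" column the question asks for. A minimal sketch of one way to get it is the .id argument, which takes its labels from the argument names. The labels "df1" and "df2" and the name df_labelled are assumptions for illustration, and relocate() needs dplyr 1.0.0 or later:

```r
library(dplyr)

df1 <- data.frame(
  student_details = c("John", "Henrick", "Maria", "Lucas", "Ali"),
  student_class   = c("High School", "College", "Preschool", "High School", "college")
)
df2 <- data.frame(
  student_details = c("Bracy", "Evin"),
  student_class   = c("High school", "College"),
  student_rank    = c("A", "A+")
)

# .id adds an identifier column; named arguments supply its values.
df_labelled <- bind_rows(df1 = df1, df2 = df2, .id = "dataset")

# bind_rows puts the identifier first; move it to the end.
df_labelled <- relocate(df_labelled, dataset, .after = last_col())
df_labelled
```

If the arguments are left unnamed, .id falls back to a numeric sequence, similar to the rbindlist output.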
For this particular df1 and df2, we can try merge from base R
> merge(df1, df2, all = TRUE, sort = FALSE)
  student_details student_class student_rank
1            John   High School         <NA>
2         Henrick       College         <NA>
3           Maria     Preschool         <NA>
4           Lucas   High School         <NA>
5             Ali       college         <NA>
6           Bracy   High school            A
7            Evin       College           A+
However, the rbindlist option from data.table should work in the general case (see @Flap's answer).
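To illustrate why merge only suits this particular data: with all = TRUE it performs a full outer join on the shared columns, so a row that appears in both data frames with the same student_details and student_class is collapsed into one row instead of being stacked. A minimal sketch, using two small hypothetical data frames a and b that are not part of the question, compares that with a plain base R rbind after adding the missing column by hand:

```r
# Hypothetical data: "John" / "College" appears in both frames.
a <- data.frame(
  student_details = c("John", "Maria"),
  student_class   = c("College", "Preschool")
)
b <- data.frame(
  student_details = c("John", "Evin"),
  student_class   = c("College", "College"),
  student_rank    = c("B", "A+")
)

# Full outer join: the two "John" rows match on both key columns and merge.
merged <- merge(a, b, all = TRUE, sort = FALSE)
nrow(merged)   # 3 rows, not 4

# Base R stacking: add the column missing from a, label each source, rbind.
a$student_rank <- NA_character_
a$dataset <- "a"
b$dataset <- "b"
stacked <- rbind(a, b)   # rbind matches data frame columns by name
nrow(stacked)  # 4 rows, one per original row
```

So merge is fine when no row can match across the two data frames, but a row-binding function is the safer choice when the goal is to keep every row.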