Take a look at the following data frame (df):

Date        Modules
26-11-2021  NA, Advanced chemistry, Biochemistry
25-11-2021  Food physics, Food chemistry
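We can use separate_rows from tidyr to put each module on its own row, sort the rows, and then collapse them back into one string per date.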
library(dplyr)
library(tidyr)
df1 %>%
  separate_rows(Modules, sep = ",\\s*") %>%  # one row per module
  arrange(Date, Modules) %>%                 # sort modules within each date
  group_by(Date) %>%
  summarise(Modules = toString(Modules))     # collapse back to one string per date
# A tibble: 2 × 2
  Date       Modules
  <chr>      <chr>
1 25-11-2021 Food chemistry, Food physics
2 26-11-2021 Advanced chemistry, Biochemistry, NA
Data:

df1 <- structure(list(Date = c("26-11-2021", "25-11-2021"),
  Modules = c("NA, Advanced chemistry, Biochemistry",
              "Food physics, Food chemistry")),
  row.names = c(NA, -2L), class = "data.frame")
Split each string on the commas, sort the pieces, and use toString to collapse the sorted pieces back into a single string.
# Split on a comma plus optional whitespace, sort, then rejoin
df$Modules <- sapply(strsplit(as.character(df$Modules), ',\\s*'),
                     function(x) toString(sort(x)))
df
# Date Modules
#1 26-11-2021 Advanced chemistry, Biochemistry, NA
#2 25-11-2021 Food chemistry, Food physics
df <- structure(list(Date = c("26-11-2021", "25-11-2021"),
Modules = c("NA, Advanced chemistry, Biochemistry",
"Food physics, Food chemistry")),
row.names = c(NA, -2L), class = "data.frame")
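If the same split, sort, and collapse step is needed for more than one column, it could be wrapped in a small helper. This is only a minimal sketch: `sort_items` is a hypothetical name (not from any package), and it assumes the items are separated by a comma followed by optional whitespace.

```r
# Hypothetical helper: sort the comma-separated items inside each string.
sort_items <- function(x, pattern = ",\\s*") {
  pieces <- strsplit(as.character(x), pattern)
  # vapply guarantees exactly one character result per input element
  vapply(pieces, function(p) toString(sort(p)), character(1))
}

modules <- c("NA, Advanced chemistry, Biochemistry",
             "Food physics, Food chemistry")
result <- sort_items(modules)
print(result)
```

With the data above, it would be called as `df$Modules <- sort_items(df$Modules)`.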
I guess you could use order()?
v<-c("Z","X","Y","A","B","C")
df<-data.frame(1:6,v)
> df[order(df$v),]
  X1.6 v
4    4 A
5    5 B
6    6 C
2    2 X
3    3 Y
1    1 Z
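Note that order() as used above reorders the rows of a data frame, not the items inside a single cell. A rough sketch of how it might be applied to the comma-separated strings from the question: split each string first, then index the pieces with order(). The variable names here are illustrative only.

```r
modules <- c("NA, Advanced chemistry, Biochemistry",
             "Food physics, Food chemistry")

# Split each cell into its items (comma plus optional whitespace)
pieces <- strsplit(modules, ",\\s*")

# Reorder the items of each cell with order(), then rejoin them
sorted <- sapply(pieces, function(p) paste(p[order(p)], collapse = ", "))
print(sorted)
```

Indexing with `p[order(p)]` gives the same result as `sort(p)` here; `order()` becomes more useful when one vector has to be sorted by the values of another.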