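I'm fairly new to R, so please forgive me if I'm not using the right jargon. I'm trying to tidy a data frame that has one variable lumping together data that I'd like to split into separate variables. Essentially, it looks like this: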
ID Scores
01 Math: 5, Physics: 4, English: 3
02 English: 5, Math: 3, Physics: 6.9
03 Math: 3.75, Chemistry: 4, English: 3
04 History: 8, Math: 2, Physics: 3
I'd like it to look like this:
ID Math Chemistry English History Physics
01 5 NaN 3 NaN 4
02 3 NaN 5 NaN 6.9
03 3.75 4 3 NaN NaN
04 2 NaN NaN 8 3
Thanks a lot!
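I would suggest a tidyverse approach using a few tidyr functions. You can first split the variable Scores into rows (one row per "Subject: grade" pair) and then split each pair into two columns. Finally, reshape to wide format to get the desired output. Here is the code: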
library(tidyverse)
#Data
df <- structure(list(ID = c(1, 2, 3, 4), Scores = c("Math: 5, Physics: 4, English: 3",
"English: 5, Math: 3, Physics: 6.9", "Math: 3.75, Chemistry: 4, English: 3",
"History: 8, Math: 2, Physics: 3")), class = "data.frame", row.names = c(NA,
-4L))
Code:
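#Code
df %>%
  #Split each "Subject: grade" pair into its own row
  separate_rows(Scores, sep = ',') %>%
  #Remove the leading space left after each comma
  mutate(Scores = trimws(Scores)) %>%
  #Split each pair at the colon into Subject and Grade
  separate(Scores, sep = ':', into = c('Subject', 'Grade')) %>%
  #Trim whitespace and convert Grade to numeric
  mutate(Subject = trimws(Subject), Grade = as.numeric(trimws(Grade))) %>%
  #Reshape to one column per subject
  pivot_wider(names_from = Subject, values_from = Grade)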
Output:
# A tibble: 4 x 6
ID Math Physics English Chemistry History
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 5 4 3 NA NA
2 2 3 6.9 5 NA NA
3 3 3.75 NA 3 4 NA
4 4 2 3 NA NA 8
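The output above fills missing subjects with NA and orders the columns by first appearance. If you need exactly the layout from the question (NaN, and the order Math, Chemistry, English, History, Physics), a minimal variant might look like this. It is a sketch that assumes tidyr 1.0 or later (for `values_fill` given as a named list) and hard-codes the column order taken from the question. It also passes regular expressions to `sep` so the spaces are consumed during the split, which lets `convert = TRUE` replace the `trimws()`/`as.numeric()` steps.

```r
library(tidyverse)

# Same sample data as in the question, restated so this block runs on its own
df <- tibble(
  ID = c(1, 2, 3, 4),
  Scores = c("Math: 5, Physics: 4, English: 3",
             "English: 5, Math: 3, Physics: 6.9",
             "Math: 3.75, Chemistry: 4, English: 3",
             "History: 8, Math: 2, Physics: 3")
)

wide <- df %>%
  # One row per "Subject: grade" pair; the regex also eats the space after each comma
  separate_rows(Scores, sep = ",\\s*") %>%
  # Split at the colon (plus following spaces); convert = TRUE makes Grade numeric
  separate(Scores, into = c("Subject", "Grade"), sep = ":\\s*", convert = TRUE) %>%
  # Fill subjects a student has no score for with NaN instead of the default NA
  pivot_wider(names_from = Subject, values_from = Grade,
              values_fill = list(Grade = NaN)) %>%
  # Column order copied from the desired output in the question
  select(ID, Math, Chemistry, English, History, Physics)

wide
```

Note that `is.na()` is also `TRUE` for `NaN`, so code that checks for missing values behaves the same with either fill.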