我有一个大数据集,格式如下:
ID | 得分 |
---|---|
1 | 英语3,法语7,地理8 |
2 | 西班牙语7,经典4 |
3 | 物理5、英语5、体育7、艺术4 |
您可以首先将strsplit
列Scores
列按主题分数对(在列表中(进行拆分,然后将列表列unnest
划分为行。然后separate
将受试者得分配对为Subject
和Score
列。最后将数据从";"长";格式为";宽";总体安排
谢谢@G。Grothendieck改进我的代码:(
library(tidyverse)
df %>%
separate_rows(Scores, sep = ", ") %>%
separate(Scores, sep = " ", into = c("Subject", "Score")) %>%
pivot_wider(names_from = "Subject", values_from = "Score")
# A tibble: 3 × 9
ID English French Geography Spanish Classics Physics PE Art
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 3 7 8 NA NA NA NA NA
2 2 NA NA NA 7 4 NA NA NA
3 3 5 NA NA NA NA 5 7 4
使用数据表
library(data.table)
setDT(dt)
dt <- dt[, .(class_grade = unlist(str_split(Scores, ", "))), by = ID]
dt[, c("class", "grade") := tstrsplit(class_grade, " ")]
dcast(dt, ID ~ class, value.var = c("grade"), sep = "")
结果
# ID Art Classics English French Geography PE Physics Spanish
# 1: 1 <NA> <NA> 3 7 8 <NA> <NA> <NA>
# 2: 2 <NA> 4 <NA> <NA> <NA> <NA> <NA> 7
# 3: 3 4 <NA> 5 <NA> <NA> 7 5 <NA>
数据
dt <- structure(list(ID = 1:3, Scores = c("English 3, French 7, Geography 8",
"Spanish 7, Classics 4", "Physics 5, English 5, PE 7, Art 4")), row.names = c(NA,
-3L), class = c("data.frame"))