1.0 | 2.0 | 3.0 |
---|---|---|
loud | complaint | problems |
pain | stress | confused |
dull | pain | stress |
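We can use `match` in base R: take the `unique` elements of the `unlist`ed data as a vector, loop over the columns, get the index of each column's elements in that vector, `replace` the positions at those indices with the matching elements, and convert the result to a data.frame after padding the columns to a common length.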
v1 <- unique(unlist(df1))
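lst1 <- lapply(df1, function(x) {
  # position of each value of this column within the unique values
  i1 <- match(x, v1)
  # NA vector with each value placed at its position
  replace(rep(NA, max(i1)), i1, v1[i1])
})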
list2DF(lapply(lst1, `length<-`, max(lengths(lst1))))
1.0 2.0 3.0
1 loud <NA> <NA>
2 pain pain <NA>
3 dull <NA> <NA>
4 <NA> complaint <NA>
5 <NA> stress stress
6 <NA> <NA> problems
7 <NA> <NA> confused
Data
df1 <- structure(list(`1.0` = c("loud", "pain", "dull"), `2.0` = c("complaint",
"stress", "pain"), `3.0` = c("problems", "confused", "stress"
)), class = "data.frame", row.names = c(NA, -3L))
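To make the `match`/`replace` step concrete, here is a small sketch that traces it for the single column `2.0`, using the same data as `df1`. The names `x` and `slots` are introduced only for this illustration and are not part of the answer above.

```r
# Same data as df1; check.names = FALSE keeps the numeric-looking column names
df1 <- data.frame(`1.0` = c("loud", "pain", "dull"),
                  `2.0` = c("complaint", "stress", "pain"),
                  `3.0` = c("problems", "confused", "stress"),
                  check.names = FALSE, stringsAsFactors = FALSE)

# All distinct values, in order of first appearance (column by column)
v1 <- unique(unlist(df1))

# Trace one column
x <- df1[["2.0"]]
i1 <- match(x, v1)                              # row each value should land in
slots <- replace(rep(NA, max(i1)), i1, v1[i1])  # NA everywhere else

print(i1)     # 4 5 2
print(slots)  # NA "pain" NA "complaint" "stress"
```

Each value lands in the row given by its position in `v1`, which is why equal values from different columns end up on the same row. The shorter vectors are then padded with NA by `length<-` before `list2DF` combines them.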
Here is a tidyverse version.
suppressMessages(library(tidyverse))
x = tibble(`1.0` = c("loud", "pain", "dull"),
           `2.0` = c("complaint", "stress", "pain"),
           `3.0` = c("problems", "confused", "stress"))
x %>%
  gather("version", "value") %>%
  mutate(id = value) %>%
  spread(version, value) %>%
  select(-id)
#> # A tibble: 7 x 3
#> `1.0` `2.0` `3.0`
#> <chr> <chr> <chr>
#> 1 <NA> complaint <NA>
#> 2 <NA> <NA> confused
#> 3 dull <NA> <NA>
#> 4 loud <NA> <NA>
#> 5 pain pain <NA>
#> 6 <NA> <NA> problems
#> 7 <NA> stress stress
Created on 2023-04-11 by the reprex package (v2.0.0)
If you need the rows in order of appearance, you can change the second step of the pipeline (the `mutate` call) to
mutate(id = fct_inorder(value)) %>%
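Note: the functions `gather` and `spread` have been superseded by `pivot_longer` and `pivot_wider`. In my opinion the older ones are easier to use and are good enough for this case. The new functions are much more powerful.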
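For readers who prefer the newer functions, the same pipeline could be sketched with `pivot_longer` and `pivot_wider`. This is an illustration rather than part of the original answer; it assumes a tidyr version that provides these functions (1.0.0 or later), and the result name `res` is made up for the example. The row order may differ from the `spread` output shown above.

```r
library(dplyr)
library(tidyr)

x <- tibble(`1.0` = c("loud", "pain", "dull"),
            `2.0` = c("complaint", "stress", "pain"),
            `3.0` = c("problems", "confused", "stress"))

res <- x %>%
  # long format: one row per (version, value) pair
  pivot_longer(everything(), names_to = "version", values_to = "value") %>%
  # copy the value into an id column so each distinct value gets its own row
  mutate(id = value) %>%
  # back to wide format: one column per version, one row per id
  pivot_wider(names_from = version, values_from = value) %>%
  select(-id)

print(res)
```

As in the `gather`/`spread` version, the `id` column is what aligns equal values from different columns on the same row.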
A fast and efficient data.table solution:
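library(data.table)
x <- data.table(`1.0` = c("loud", "pain", "dull"),
                `2.0` = c("complaint", "stress", "pain"),
                `3.0` = c("problems", "confused", "stress"))
# melt to long format, drop duplicate rows, then cast back to wide, keyed by value
dcast(unique(melt(x, measure.vars = names(x))), value ~ variable, value.var = "value")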
value 1.0 2.0 3.0
1: complaint <NA> complaint <NA>
2: confused <NA> <NA> confused
3: dull dull <NA> <NA>
4: loud loud <NA> <NA>
5: pain pain pain <NA>
6: problems <NA> <NA> problems
7: stress <NA> stress stress
Here is a base R option using `stack` + `reshape`
reshape(
  transform(stack(df), v = values),
  direction = "wide",
  idvar = "values",
  timevar = "ind"
)[-1]
which gives
v.x1 v.x2 v.x3
1 loud <NA> <NA>
2 pain pain <NA>
3 dull <NA> <NA>
4 <NA> complaint <NA>
5 <NA> stress stress
7 <NA> <NA> problems
8 <NA> <NA> confused
Data
> dput(df)
structure(list(x1 = c("loud", "pain", "dull"), x2 = c("complaint",
"stress", "pain"), x3 = c("problems", "confused", "stress")), class = "data.frame", row.names = c(NA,
-3L))
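To see why the `stack` + `reshape` option works, it may help to look at the intermediate long-format table that `reshape` receives. The sketch below rebuilds `df` and prints that table; the name `long` is introduced only for this illustration.

```r
df <- data.frame(x1 = c("loud", "pain", "dull"),
                 x2 = c("complaint", "stress", "pain"),
                 x3 = c("problems", "confused", "stress"),
                 stringsAsFactors = FALSE)

# stack() puts all columns into one "values" column and records
# the source column name in the factor "ind"
long <- stack(df)

# duplicate the values: "values" serves as the row id,
# "v" is the content that gets spread into the wide columns
long$v <- long$values

print(long)
```

With `idvar = "values"`, `reshape` produces one row per distinct value, and with `timevar = "ind"` it produces one `v.*` column per original column. The trailing `[-1]` then drops the id column.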