r - I want to match similar words between columns


1.0   2.0        3.0
loud  complaint  problems
pain  stress     confused
dull  pain       stress

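We can use match in base R: take the unique elements of the unlisted data as a vector, loop over the columns, get the index of each element in that vector, replace a vector of NAs at those indices with the matching elements, and convert the result to a data.frame after padding the columns to a common length.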

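v1 <- unique(unlist(df1))  # all distinct words, in order of first appearance
lst1 <- lapply(df1, function(x) {
  i1 <- match(x, v1)                     # position of each word in v1
  replace(rep(NA, max(i1)), i1, v1[i1])  # put each word at its own row index
})
# pad every column with NA to the same length, then bind into a data.frame
list2DF(lapply(lst1, `length<-`, max(lengths(lst1))))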
   1.0       2.0      3.0
1 loud      <NA>     <NA>
2 pain      pain     <NA>
3 dull      <NA>     <NA>
4 <NA> complaint     <NA>
5 <NA>    stress   stress
6 <NA>      <NA> problems
7 <NA>      <NA> confused

Data

df1 <- structure(list(`1.0` = c("loud", "pain", "dull"), `2.0` = c("complaint", 
"stress", "pain"), `3.0` = c("problems", "confused", "stress"
)), class = "data.frame", row.names = c(NA, -3L))
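To make the match() step concrete, here is column `2.0` worked through on its own. It uses exactly the calls from the answer above, applied to a single column; the intermediate names `i2`, `col2` and `col2_padded` are illustrative only.

```r
# Same data as above
df1 <- structure(list(`1.0` = c("loud", "pain", "dull"), `2.0` = c("complaint",
"stress", "pain"), `3.0` = c("problems", "confused", "stress"
)), class = "data.frame", row.names = c(NA, -3L))

# Distinct words in order of first appearance (unique() drops the names unlist() adds)
v1 <- unique(unlist(df1))
# v1: "loud" "pain" "dull" "complaint" "stress" "problems" "confused"

# Where does each word of column `2.0` sit in v1?
i2 <- match(df1$`2.0`, v1)
# i2: 4 5 2

# Start from max(i2) NAs and drop each word into the row given by its index
col2 <- replace(rep(NA, max(i2)), i2, v1[i2])
# col2: NA "pain" NA "complaint" "stress"

# `length<-` pads with NA; length(v1) equals max(lengths(lst1)) in the answer,
# because the last new word always lands in row length(v1)
col2_padded <- `length<-`(col2, length(v1))
col2_padded
```

Because every word gets a fixed row (its position in `v1`), the same word lines up across columns, which is what makes `pain` and `stress` share a row in the final output.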

Here is a tidyverse version.

suppressMessages(library(tidyverse))
x = tibble(`1.0` = c("loud", "pain", "dull"),
`2.0` = c("complaint", "stress", "pain"),
`3.0` = c("problems", "confused", "stress"))
x %>% 
gather("version", "value") %>% 
mutate(id = value) %>% 
spread(version, value) %>% 
select(-id)
#> # A tibble: 7 x 3
#>   `1.0` `2.0`     `3.0`   
#>   <chr> <chr>     <chr>   
#> 1 <NA>  complaint <NA>    
#> 2 <NA>  <NA>      confused
#> 3 dull  <NA>      <NA>    
#> 4 loud  <NA>      <NA>    
#> 5 pain  pain      <NA>    
#> 6 <NA>  <NA>      problems
#> 7 <NA>  stress    stress

Created on 2023-04-11 by the reprex package (v2.0.0)

If you need the rows in order of first appearance, change the second statement to

mutate(id = fct_inorder(value)) %>% 
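Putting that change into the full pipeline gives the sketch below. It assumes tidyr, dplyr and forcats are installed (loaded individually here rather than via library(tidyverse)). spread() orders the output rows by the remaining id column, and a factor built with fct_inorder() sorts by its levels, so the rows should come out in the same order as the base R answer.

```r
suppressMessages({
  library(tidyr)
  library(dplyr)
  library(forcats)
})

x <- tibble(`1.0` = c("loud", "pain", "dull"),
            `2.0` = c("complaint", "stress", "pain"),
            `3.0` = c("problems", "confused", "stress"))

res <- x %>%
  gather("version", "value") %>%        # long: one row per (column, word), column by column
  mutate(id = fct_inorder(value)) %>%   # factor levels = order of first appearance
  spread(version, value) %>%            # wide: one row per distinct word, ordered by id levels
  select(-id)

res
```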

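Note: the functions gather and spread have been superseded by pivot_longer and pivot_wider. In my opinion the old ones are easier to use and good enough for this case; the new ones are much more powerful.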
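For comparison, a sketch of the same idea with pivot_longer()/pivot_wider(), assuming tidyr >= 1.0 and dplyr. Each column should end up with the same words as above, but the rows are not sorted: pivot_longer() walks the tibble row by row and pivot_wider() keeps rows in order of first appearance, so add arrange() if you need a particular order.

```r
suppressMessages({
  library(tidyr)
  library(dplyr)
})

x <- tibble(`1.0` = c("loud", "pain", "dull"),
            `2.0` = c("complaint", "stress", "pain"),
            `3.0` = c("problems", "confused", "stress"))

res <- x %>%
  # long: one row per (column name, word)
  pivot_longer(everything(), names_to = "version", values_to = "value") %>%
  # duplicate the word so it can act as the row key
  mutate(id = value) %>%
  # wide again: one row per distinct word, one column per original column
  pivot_wider(id_cols = id, names_from = version, values_from = value) %>%
  select(-id)

res
```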

A fast and efficient data.table solution:

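library(data.table)
x <- data.table(`1.0` = c("loud", "pain", "dull"),
`2.0` = c("complaint", "stress", "pain"),
`3.0` = c("problems", "confused", "stress"))
# melt to long (variable, value), drop duplicate rows, then cast back to wide with one row per word
dcast(unique(melt(x, measure.vars = names(x))), value ~ variable)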
       value  1.0       2.0      3.0
1: complaint <NA> complaint     <NA>
2:  confused <NA>      <NA> confused
3:      dull dull      <NA>     <NA>
4:      loud loud      <NA>     <NA>
5:      pain pain      pain     <NA>
6:  problems <NA>      <NA> problems
7:    stress <NA>    stress   stress

Here is a base R option using stack + reshape:

reshape(
transform(stack(df), v = values),
direction = "wide",
idvar = "values",
timevar = "ind"
)[-1]

which gives

  v.x1      v.x2     v.x3
1 loud      <NA>     <NA>
2 pain      pain     <NA>
3 dull      <NA>     <NA>
4 <NA> complaint     <NA>
5 <NA>    stress   stress
7 <NA>      <NA> problems
8 <NA>      <NA> confused

Data

> dput(df)
structure(list(x1 = c("loud", "pain", "dull"), x2 = c("complaint", 
"stress", "pain"), x3 = c("problems", "confused", "stress")), class = "data.frame", row.names = c(NA,
-3L))
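To see what reshape() is working with, here is the intermediate long table that stack() builds from the same df, followed by the reshape call split into steps. The copy in `v` is needed because the word itself is the idvar; without it reshape() would have no separate value column to spread across the `ind` levels. Duplicated words (pain, stress) collapse into the row of their first occurrence, which is why row names 6 and 9 are missing from the output.

```r
df <- structure(list(x1 = c("loud", "pain", "dull"), x2 = c("complaint",
"stress", "pain"), x3 = c("problems", "confused", "stress")), class = "data.frame", row.names = c(NA,
-3L))

# stack(): one row per cell; `values` = the word, `ind` = factor naming the source column
long <- stack(df)

# copy the word into `v` so it can serve both as the id and as the widened value
long <- transform(long, v = values)

# one row per distinct word; v.x1 / v.x2 / v.x3 hold the word if it occurred in that column
wide <- reshape(long, direction = "wide", idvar = "values", timevar = "ind")[-1]
wide
```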
