Counting specific strings in a column of an R data frame

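I have a data frame with 5 columns, but I am only interested in one of them, "Conditions". I need a way to count how many times each specific entry appears in that column. Each cell can hold one or more entries, separated by commas (,). My data frame looks like this: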

S.NO                   Conditions
11            Eye Color 
12            Sound of your voice
13            Certain disease,Size of a palm,Eye Color
16            Eye Color,Hair color
17            Hair color,Height
18            Sound of your voice,Height

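I want to count all the distinct entries/strings in one go. In total there are 35 different strings in the Conditions column, and I would like my output to look like this: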

Eye color   Sound of your voice   Certain disease   Size of a palm   Hair color   Height
3           2                     1                 1                2            2

Since I don't know the exact structure of your data, I assume it looks like this:

Data

library(tibble)  # provides tribble()

data <- tribble(
  ~Conditions, ~value,
  'Eye color', '3',
  'Sound of your voice', '2',
  'Certain disease, Size of a palm, Eye color', '1,1,2',
  'Eye color, Hair color', '2,2',
  'Hair color, Height', '3,1',
  'Sound of your voice, Height', '1,4'
)

For the data above, we can write the following code to get the expected result:

library(tidyverse)

# Split the comma-separated cells into flat character vectors
Conditions <- unlist(strsplit(data$Conditions, ','))
value <- unlist(strsplit(data$value, ','))

df <- bind_cols(Conditions, value) %>%
  setNames(c('Conditions', 'value')) %>%
  # Trim the whitespace left over from splitting, then make value numeric
  mutate(across(everything(), ~trimws(.x)), value = as.numeric(value)) %>%
  arrange(Conditions) %>%
  group_by(Conditions) %>%
  slice_head(n = 1) %>%  # keeps the first value per condition; it does not count rows
  mutate(row = row_number()) %>%
  pivot_wider(names_from = Conditions, values_from = value)

Output

# A tibble: 1 × 7
row `Certain disease` `Eye color` `Hair color` Height `Size of a palm` `Sound of your voice`
<int>             <dbl>       <dbl>        <dbl>  <dbl>            <dbl>                 <dbl>
1     1                 1           3            2      1                1                     2
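
If the data has only the Conditions column shown in the question and no separate value column, the counts can be computed directly from the strings. The following is a minimal sketch under that assumption: the data frame name `df` and the helper objects `counts` and `wide` are illustrative, and the sample rows are copied from the question. It splits each cell on commas, trims whitespace, and counts each distinct entry.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question (assumed structure: one
# comma-separated string per row in the Conditions column)
df <- tibble(
  S.NO = c(11, 12, 13, 16, 17, 18),
  Conditions = c(
    "Eye Color",
    "Sound of your voice",
    "Certain disease,Size of a palm,Eye Color",
    "Eye Color,Hair color",
    "Hair color,Height",
    "Sound of your voice,Height"
  )
)

counts <- df %>%
  separate_rows(Conditions, sep = ",") %>%   # one row per entry
  mutate(Conditions = trimws(Conditions)) %>% # drop stray spaces around entries
  count(Conditions, name = "n")               # occurrences of each distinct entry

# Reshape to one column per entry, as in the desired output
wide <- counts %>%
  pivot_wider(names_from = Conditions, values_from = n)

print(wide)
```

A base R alternative that should give the same counts as a named table is `table(trimws(unlist(strsplit(df$Conditions, ","))))`. Both versions treat entries as case-sensitive, so "Eye Color" and "Eye color" would be counted separately unless the column is normalised first, for example with `tolower()`.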
