Create a new column for each unique item (word) and show frequency counts



I'm new to R and to programming, and I've been struggling with the following problem.

I have a data frame that looks like this:

id     animals
1     cat dog
2     cat pig dog fish fish
3     horse horse

I want to create a new column for each animal, containing its frequency count for each id:

id    cat  dog  fish  horse  pig
1     1    1     0     0     0
2     1    1     2     0     1
3     0    0     0     2     0

How can I achieve this?

Sample dput:

structure(list(id = 1:3, animals = structure(1:3, .Label = c("cat dog", 
"cat pig dog fish fish", "horse horse"), class = "factor")), .Names = c("id", 
"animals"), class = "data.frame", row.names = c(NA, -3L))

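We can do the following with dplyr and tidyr:

library(dplyr)
library(tidyr)

df %>%
  separate_rows(animals) %>%      # one row per word
  count(id, animals) %>%          # frequency of each word within each id
  spread(animals, n, fill = 0)    # one column per word; missing pairs become 0
# A tibble: 3 x 6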
#     id   cat   dog  fish horse   pig
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1    1.    1.    1.    0.    0.    0.
#2    2.    1.    1.    2.    0.    1.
#3    3.    0.    0.    0.    2.    0.

Sample data

df <- read.table(text =
"id     animals
1     'cat dog'
2     'cat pig dog fish fish'
3     'horse horse'", header = TRUE)
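In recent tidyr releases spread() is marked as superseded in favour of pivot_wider(). The same pipeline can be sketched as follows, assuming tidyr 1.1.0 or later (needed for names_sort). The result name wide is only illustrative, and the sample data is rebuilt inline so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt so the example is self-contained
df <- data.frame(
  id = 1:3,
  animals = c("cat dog", "cat pig dog fish fish", "horse horse"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  separate_rows(animals, sep = " ") %>%  # one row per word
  count(id, animals) %>%                 # frequency of each id/word pair
  pivot_wider(
    names_from  = animals,  # one column per distinct word
    values_from = n,
    values_fill = 0,        # ids that never mention a word get 0
    names_sort  = TRUE      # alphabetical column order, like spread()
  )

wide
```

Without names_sort = TRUE, pivot_wider() orders the new columns by first appearance rather than alphabetically.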

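A one-liner with data.table could be:

library(data.table)
# split each string into words (one row per word), then count words per id;
# fun.aggregate = length makes dcast's default counting behaviour explicit
dcast(setDT(df)[, unlist(strsplit(as.character(animals), " ")), by = id], id ~ V1, fun.aggregate = length)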
#  id cat dog fish horse pig
#1  1   1   1    0     0   0
#2  2   1   1    2     0   1
#3  3   0   0    0     2   0

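Alternatively, you can use dcast from reshape2:

library(reshape2)
# split each string into its words
spl <- strsplit(as.character(df$animals), " ")
# long format: repeat each id once per word it contains
df_m <- data.frame(id = rep(df$id, times = lengths(spl)), animals = unlist(spl))
# count occurrences of each animal per id
dcast(df_m, id ~ animals, fun.aggregate = length, value.var = "animals")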
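The split-then-count idea behind the answers above does not strictly need an extra package. Here is a minimal base R sketch, assuming the words are separated by single spaces and the ids are integers; the names spl, long_id, counts and wide are illustrative only.

```r
# Sample data from the question, rebuilt so the example is self-contained
df <- data.frame(
  id = 1:3,
  animals = c("cat dog", "cat pig dog fish fish", "horse horse"),
  stringsAsFactors = FALSE
)

# Split each string into its words
spl <- strsplit(as.character(df$animals), " ", fixed = TRUE)

# Repeat each id once per word so ids and words line up
long_id <- rep(df$id, times = lengths(spl))

# Cross-tabulate: rows are ids, columns are words, cells are counts
counts <- table(id = long_id, animal = unlist(spl))

# Convert the table to a plain data frame with an explicit id column
# (as.integer() relies on the assumption that ids are integers)
wide <- data.frame(
  id = as.integer(rownames(counts)),
  as.data.frame.matrix(counts),
  row.names = NULL
)

wide
```

table() fills in zeros for id/word pairs that never occur, so no separate fill step is required.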

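You can also use unnest_tokens from tidytext:

library(tidyverse)
library(tidytext)
# one row per word (unnest_tokens lowercases by default), then cross-tabulate id by word
x %>% unnest_tokens(word, animals) %>% table()

Data: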

x <- structure(list(id = 1:3, animals = c("cat dog", "cat pig dog fish fish", 
"horse horse")), .Names = c("id", "animals"), row.names = c(NA, 
-3L), class = "data.frame")

This gives:

   word
id  cat dog fish horse pig
  1   1   1    0     0   0
  2   1   1    2     0   1
  3   0   0    0     2   0

Side note: I like this book, and if you are interested in tidy text analysis it is a must-read: https://www.tidytextmining.com/tidytext.html
