This is the data frame I'm working with:
# A tibble: 268 x 5
Horodateur Gender Age Time Social
<dttm> <chr> <chr> <chr> <chr>
1 2021-04-23 09:59:16 Male [18,24[ 1-5 ho~ Facebook, Instagram, Twitter, Snapcha~
2 2021-04-23 10:11:35 Female [10,18[ 1-5 ho~ Reddit
3 2021-04-23 10:18:24 Male [18,24[ >10 ho~ Facebook, Instagram, Twitter, Linkedi~
4 2021-04-23 10:42:28 Female [18,24[ 5-10 h~ Facebook, Instagram, Twitter, Snapchat
5 2021-04-23 10:42:37 Female [18,24[ 5-10 h~ Facebook, Instagram, TikTok, Snapchat
6 2021-04-23 10:45:35 Female [24,34[ 1-5 ho~ Facebook, Instagram, Twitter, Linkedi~
7 2021-04-23 10:48:09 Male [18,24[ 5-10 h~ Facebook, Instagram, TikTok, Linkedin~
8 2021-04-23 10:49:56 Male [18,24[ 5-10 h~ Facebook, Instagram, Snapchat
9 2021-04-23 10:50:39 Male [24,34[ 0 hours Linkedin, Reddit
10 2021-04-23 10:51:36 Male [18,24[ 5-10 h~ Facebook, Instagram, Twitter, TikTok
# ... with 258 more rows
> str(Survey[1:5])
tibble [268 x 5] (S3: tbl_df/tbl/data.frame)
$ Horodateur: POSIXct[1:268], format: "2021-04-23 09:59:16" "2021-04-23 10:11:35" ...
$ Gender : chr [1:268] "Male" "Female" "Male" "Female" ...
$ Age : chr [1:268] "[18,24[" "[10,18[" "[18,24[" "[18,24[" ...
$ Time : chr [1:268] "1-5 hours" "1-5 hours" ">10 hours" "5-10 hours" ...
$ Social : chr [1:268] "Facebook, Instagram, Twitter, Snapchat, Reddit, Signal" "Reddit" "Facebook, Instagram, Twitter, Linkedin, Snapchat, Reddit, Quora" "Facebook, Instagram, Twitter, Snapchat" ...
I'm trying to split the Social column to get something like this:
Id Facebook Instagram Reddit Signal Snapchat TikTok Twitter
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 No No No No No No Yes
2 2 Yes Yes No No Yes No Yes
3 3 No Yes No Yes No Yes No
4 4 No Yes No No Yes No No
5 5 No Yes No Yes Yes Yes Yes
6 6 No Yes No No No No No
7 7 No No Yes Yes No Yes Yes
8 8 No No Yes No No No Yes
9 9 No No Yes No Yes Yes No
10 10 No Yes Yes Yes Yes No Yes
So I wrote this code:
Survey %>%
mutate(Id = row_number(), HasAccount = "Yes") %>%
unnest_tokens(Survey, Social, to_lower = F) %>%
spread(Survey, HasAccount, fill = "No")
But I get the error mentioned in the title:
Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 2 rows:
* 440, 441
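I thought adding Id = row_number() would fix this error, but it didn't (the same error persists when I remove it). Does anyone know how to solve this?
The error occurs because there are duplicate rows, i.e. several rows that share the same combination of keys. We can make each row unique by creating a sequence within each group with row_number():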
library(dplyr)
library(tidyr)
library(tidytext)
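Survey %>%
  mutate(HasAccount = "Yes") %>%
  # one row per token of Social; the token column is named Survey
  unnest_tokens(Survey, Social, to_lower = FALSE) %>%
  # number the occurrences of each token so every key combination is unique
  group_by(Survey) %>%
  mutate(Id = row_number()) %>%
  ungroup() %>%
  spread(Survey, HasAccount, fill = "No")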
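To see which rows trigger the error, it can help to count the key combinations before spreading. The following is only a minimal sketch on hypothetical toy data (the toy tibble and its values are made up for illustration), and it swaps unnest_tokens() for tidyr's separate_rows() so that it runs without tidytext. It assumes the duplicates come from a respondent whose answer contains the same token twice.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: respondent 2 lists "Facebook" twice.
toy <- tibble(
  Id = 1:3,
  Social = c("Facebook, Reddit", "Facebook, Twitter, Facebook", "Reddit")
)

# One row per (Id, platform) pair.
long <- toy %>%
  separate_rows(Social, sep = ",\\s*")

# Any (Id, Social) pair that occurs more than once is a duplicated key,
# which is what spread() refuses to handle.
dupes <- long %>%
  count(Id, Social) %>%
  filter(n > 1)

dupes
```

If dupes comes back empty on the real data, the duplicates probably have another cause, for example word tokenisation splitting multi-word entries into tokens that repeat within one answer.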
Using a reproducible example:
library(janeaustenr)
d <- tibble(txt = prideprejudice)
d %>%
  mutate(HasAccount = "Yes") %>%
  unnest_tokens(word, txt) %>%
  slice(1:50) %>%
  group_by(word) %>%
  mutate(Id = row_number()) %>%
  ungroup() %>%
  spread(word, HasAccount, fill = "No")
# A tibble: 6 x 41
Id `1` a acknowledged and austen be by chapter entering feelings first fortune good his however `in` is it
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
2 2 No Yes No No No Yes No No No No No No No No No Yes No No
3 3 No Yes No No No No No No No No No No No No No No No No
4 4 No Yes No No No No No No No No No No No No No No No No
5 5 No Yes No No No No No No No No No No No No No No No No
6 6 No Yes No No No No No No No No No No No No No No No No
# … with 22 more variables: jane <chr>, known <chr>, little <chr>, man <chr>, may <chr>, must <chr>, neighbourhood <chr>, of <chr>,
# on <chr>, or <chr>, possession <chr>, prejudice <chr>, pride <chr>, single <chr>, such <chr>, that <chr>, the <chr>, truth <chr>,
# universally <chr>, views <chr>, want <chr>, wife <chr>
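Note that in the output above Id counts occurrences of each token, not respondents. If the goal is one row per respondent, as in the desired output in the question, a possible alternative is to keep the per-respondent Id and drop the repeated (Id, platform) pairs before reshaping. This is a hedged sketch, not the method from the answer: the toy data is hypothetical, separate_rows() and pivot_wider() stand in for unnest_tokens() and spread(), and the scalar values_fill assumes tidyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: respondent 2 lists "Facebook" twice.
toy <- tibble(
  Gender = c("Male", "Female", "Male"),
  Social = c("Facebook, Reddit", "Facebook, Twitter, Facebook", "Reddit")
)

wide <- toy %>%
  # Id identifies the respondent, so it is created before splitting.
  mutate(Id = row_number(), HasAccount = "Yes") %>%
  # One row per (respondent, platform) pair.
  separate_rows(Social, sep = ",\\s*") %>%
  # Drop repeated pairs so each key combination is unique.
  distinct(Id, Social, .keep_all = TRUE) %>%
  # One column per platform, "No" where the respondent did not list it.
  pivot_wider(names_from = Social, values_from = HasAccount, values_fill = "No")

wide
```

With this approach the other columns (Gender here) stay attached to each respondent, because pivot_wider() treats every column not used for names or values as part of the row identifier.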