R - Error: "Each row of output must be identified by a unique combination of keys. Keys are shared for 2 rows" when using unnest_tokens and spread



This is the data frame I am working with:

# A tibble: 268 x 5
Horodateur          Gender Age     Time    Social                                
<dttm>              <chr>  <chr>   <chr>   <chr>                                 
1 2021-04-23 09:59:16 Male   [18,24[ 1-5 ho~ Facebook, Instagram, Twitter, Snapcha~
2 2021-04-23 10:11:35 Female [10,18[ 1-5 ho~ Reddit                                
3 2021-04-23 10:18:24 Male   [18,24[ >10 ho~ Facebook, Instagram, Twitter, Linkedi~
4 2021-04-23 10:42:28 Female [18,24[ 5-10 h~ Facebook, Instagram, Twitter, Snapchat
5 2021-04-23 10:42:37 Female [18,24[ 5-10 h~ Facebook, Instagram, TikTok, Snapchat 
6 2021-04-23 10:45:35 Female [24,34[ 1-5 ho~ Facebook, Instagram, Twitter, Linkedi~
7 2021-04-23 10:48:09 Male   [18,24[ 5-10 h~ Facebook, Instagram, TikTok, Linkedin~
8 2021-04-23 10:49:56 Male   [18,24[ 5-10 h~ Facebook, Instagram, Snapchat         
9 2021-04-23 10:50:39 Male   [24,34[ 0 hours Linkedin, Reddit                      
10 2021-04-23 10:51:36 Male   [18,24[ 5-10 h~ Facebook, Instagram, Twitter, TikTok  
# ... with 258 more rows
> str(Survey[1:5])
tibble [268 x 5] (S3: tbl_df/tbl/data.frame)
$ Horodateur: POSIXct[1:268], format: "2021-04-23 09:59:16" "2021-04-23 10:11:35" ...
$ Gender    : chr [1:268] "Male" "Female" "Male" "Female" ...
$ Age       : chr [1:268] "[18,24[" "[10,18[" "[18,24[" "[18,24[" ...
$ Time      : chr [1:268] "1-5 hours" "1-5 hours" ">10 hours" "5-10 hours" ...
$ Social    : chr [1:268] "Facebook, Instagram, Twitter, Snapchat, Reddit, Signal" "Reddit" "Facebook, Instagram, Twitter, Linkedin, Snapchat, Reddit, Quora" "Facebook, Instagram, Twitter, Snapchat" ...

I am trying to split the Social column to get something like this:

Id Facebook Instagram Reddit Signal Snapchat TikTok Twitter
<int> <chr>    <chr>     <chr>  <chr>  <chr>    <chr>  <chr>  
1     1 No       No        No     No     No       No     Yes    
2     2 Yes      Yes       No     No     Yes      No     Yes    
3     3 No       Yes       No     Yes    No       Yes    No     
4     4 No       Yes       No     No     Yes      No     No     
5     5 No       Yes       No     Yes    Yes      Yes    Yes    
6     6 No       Yes       No     No     No       No     No     
7     7 No       No        Yes    Yes    No       Yes    Yes    
8     8 No       No        Yes    No     No       No     Yes    
9     9 No       No        Yes    No     Yes      Yes    No     
10    10 No       Yes       Yes    Yes    Yes      No     Yes

So I wrote this code:

Survey %>%
  mutate(Id = row_number(), HasAccount = "Yes") %>%
  unnest_tokens(Survey, Social, to_lower = F) %>%
  spread(Survey, HasAccount, fill = "No")

But I get the error mentioned in the title:

Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 2 rows:
* 440, 441

I thought adding Id = row_number() would fix this error, but it did not (the same error remains when I remove it). Does anyone know how to solve this?

The reason is that there are duplicate rows. So we can group by the token column and create an Id with row_number():

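library(dplyr)
library(tidyr)
library(tidytext)
Survey %>%
  mutate(HasAccount = "Yes") %>%
  # one row per token of Social, stored in a new column named Survey
  unnest_tokens(Survey, Social, to_lower = FALSE) %>%
  # number the occurrences of each token so the keys become unique
  group_by(Survey) %>%
  mutate(Id = row_number()) %>%
  ungroup() %>%
  spread(Survey, HasAccount, fill = "No")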
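To see which key combinations trigger the error, it can help to count them before calling spread(). The sketch below uses a small made-up tibble (`toy`, with a hypothetical `Platform` column), not the real `Survey` data; it assumes the duplicates come from a respondent listing the same platform twice, which may or may not be what rows 440 and 441 contain.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data (not the real Survey): respondent 2 lists "Reddit" twice
toy <- tibble(
  Id     = 1:2,
  Social = c("Facebook, Reddit", "Reddit, Reddit")
)

# One row per (Id, token), as in the original pipeline
long <- toy %>%
  unnest_tokens(Platform, Social, to_lower = FALSE)

# Key combinations that occur more than once are the ones spread() rejects
dupes <- long %>%
  count(Id, Platform) %>%
  filter(n > 1)

dupes  # here: Id 2 / Reddit appears twice
```

Running the same count() and filter() steps on the real data, using the actual key columns, should point at the offending rows.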

Using a reproducible example with the same approach:

library(janeaustenr)
d <- tibble(txt = prideprejudice)
d %>%
  mutate(HasAccount = "Yes") %>%
  unnest_tokens(word, txt) %>%
  slice(1:50) %>%
  group_by(word) %>%
  mutate(Id = row_number()) %>%
  ungroup() %>%
  spread(word, HasAccount, fill = "No")
# A tibble: 6 x 41
Id `1`   a     acknowledged and   austen be    by    chapter entering feelings first fortune good  his   however `in`  is    it   
<int> <chr> <chr> <chr>        <chr> <chr>  <chr> <chr> <chr>   <chr>    <chr>    <chr> <chr>   <chr> <chr> <chr>   <chr> <chr> <chr>
1     1 Yes   Yes   Yes          Yes   Yes    Yes   Yes   Yes     Yes      Yes      Yes   Yes     Yes   Yes   Yes     Yes   Yes   Yes  
2     2 No    Yes   No           No    No     Yes   No    No      No       No       No    No      No    No    No      Yes   No    No   
3     3 No    Yes   No           No    No     No    No    No      No       No       No    No      No    No    No      No    No    No   
4     4 No    Yes   No           No    No     No    No    No      No       No       No    No      No    No    No      No    No    No   
5     5 No    Yes   No           No    No     No    No    No      No       No       No    No      No    No    No      No    No    No   
6     6 No    Yes   No           No    No     No    No    No      No       No       No    No      No    No    No      No    No    No   
# … with 22 more variables: jane <chr>, known <chr>, little <chr>, man <chr>, may <chr>, must <chr>, neighbourhood <chr>, of <chr>,
#   on <chr>, or <chr>, possession <chr>, prejudice <chr>, pride <chr>, single <chr>, such <chr>, that <chr>, the <chr>, truth <chr>,
#   universally <chr>, views <chr>, want <chr>, wife <chr>
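Note that with group_by() on the token column, Id numbers the occurrences of each token rather than the respondents. If one row per respondent is wanted, as in the desired output in the question, a different approach (not the one in the answer above) is to keep the per-respondent Id and drop exact duplicates before spreading. This is only a sketch on made-up data: the `toy` tibble is hypothetical, and splitting on ", " with separate_rows() is swapped in for unnest_tokens() on the assumption that platforms are always comma-separated.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the real Survey): respondent 2 lists "Reddit" twice
toy <- tibble(
  Id     = 1:2,
  Gender = c("Male", "Female"),
  Social = c("Facebook, Reddit", "Reddit, Reddit")
)

wide <- toy %>%
  mutate(HasAccount = "Yes") %>%
  # split the comma-separated list into one row per platform
  separate_rows(Social, sep = ",\\s*") %>%
  # drop repeated (respondent, platform) rows so the keys are unique
  distinct() %>%
  spread(Social, HasAccount, fill = "No")

wide
```

Each respondent keeps a single row, with "Yes" or "No" per platform column.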
