r - Troubleshooting pivot_wider after group_by



I have a data frame that lists the food each participant ate each day, followed by another arbitrary variable, "other":

df <- data.frame(participant = c(1,1,1,2,2,2,2),
food = c("pizza", "turkey", "turkey", "pizza", "pizza", "pizza", "turkey"),
other = c("a", "b", "c", "d", "e", "f", "g"))

I want to convert it into a data frame that shows how many times each participant ate each food, while keeping the "other" variable:

df2 <- data.frame(participant = c(1,1,1,2,2,2,2),
pizza = c(1,1,1,3,3,3,3),
turkey=c(2,2,2,1,1,1,1),
other = c("a", "b", "c", "d", "e", "f", "g"))

I tried

df3 <- df %>% group_by(participant) %>% add_count(food) %>%
pivot_wider(names_from = food, values_from = n)

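But this only sort of works: it produces NA values (which makes sense given my understanding of how pivot_wider works). I would like to know whether there is another function, or a way to manipulate pivot_wider, that produces what I want.

I did find a manual workaround: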

df4 <- df3 %>% group_by(participant) %>% arrange(pizza) %>% mutate(pizza = 
pizza[1]) %>% arrange(participant) %>% group_by(participant) %>% 
arrange(turkey) %>% mutate(turkey = turkey[1]) %>% arrange(participant) %>% ungroup

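But for whatever reason, in my original dataset it occasionally produces values like "c(3,3)". Given that I get these weird values, I am hoping for a fix that does not rely on this extra step, so that I can learn how to use pivot_wider or another function in this situation.

Thanks!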

Here is another option with dummy_cols

library(dplyr)
library(fastDummies)
library(stringr)
df %>% 
dummy_cols("food", remove_selected_columns = TRUE) %>% 
group_by(participant) %>% 
mutate(across(starts_with('food_'),
sum, .names = "{str_remove(.col, '.*_')}")) %>% 
ungroup %>% 
select(-starts_with('food_'))

-output

# A tibble: 7 × 4
  participant other pizza turkey
        <dbl> <chr> <int>  <int>
1           1 a         1      2
2           1 b         1      2
3           1 c         1      2
4           2 d         3      1
5           2 e         3      1
6           2 f         3      1
7           2 g         3      1

Or, starting from 'df3', use fill

library(tidyr)
df3 %>% 
fill(c(pizza, turkey), .direction = 'downup') %>%
ungroup

-output

# A tibble: 7 × 4
  participant other pizza turkey
        <dbl> <chr> <int>  <int>
1           1 a         1      2
2           1 b         1      2
3           1 c         1      2
4           2 d         3      1
5           2 e         3      1
6           2 f         3      1
7           2 g         3      1

Here is another option:

library(tidyverse)
df <- tibble(participant = c(1,1,1,2,2,2,2),
food = c("pizza", "turkey", "turkey", "pizza", "pizza", "pizza", "turkey"),
other = c("a", "b", "c", "d", "e", "f", "g"))

df |>
group_by(participant, food)|>
mutate(num = n())|>
pivot_wider(names_from = food, values_from = num) |>
mutate(across(pizza:turkey, \(x) max(x, na.rm = TRUE))) |>
ungroup()
#> # A tibble: 7 x 4
#>   participant other pizza turkey
#>         <dbl> <chr> <int>  <int>
#> 1           1 a         1      2
#> 2           1 b         1      2
#> 3           1 c         1      2
#> 4           2 d         3      1
#> 5           2 e         3      1
#> 6           2 f         3      1
#> 7           2 g         3      1

A simple fix:

df3 <- df %>% group_by(participant) %>% add_count(food) %>%
pivot_wider(names_from = food, values_from = n, values_fill = 0) %>% 
group_by(participant) %>% mutate(across(all_of(unique(df$food)), max)) %>% ungroup()
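The NA values appear because pivot_wider treats every column not named in names_from or values_from (here participant and other) as a row identifier, and each original row carries only one food. A minimal sketch of a different route, not taken from any of the answers above: build the counts per participant first, then join them back onto the original rows. The names counts and result are made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(participant = c(1, 1, 1, 2, 2, 2, 2),
                 food = c("pizza", "turkey", "turkey", "pizza", "pizza", "pizza", "turkey"),
                 other = c("a", "b", "c", "d", "e", "f", "g"))

# One row per participant, one count column per food.
# values_fill = 0 covers a participant who never ate a given food.
counts <- df %>%
  count(participant, food) %>%
  pivot_wider(names_from = food, values_from = n, values_fill = 0)

# Attach the per-participant counts to every original row.
result <- df %>%
  select(-food) %>%
  left_join(counts, by = "participant")
```

Because the pivot happens on a table with exactly one row per participant and food, no NA values need to be filled or collapsed afterwards, and the "other" column never takes part in the reshaping.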
