I have a data frame that lists the food each participant ate each day, plus another arbitrary variable, "other":
df <- data.frame(participant = c(1,1,1,2,2,2,2),
food = c("pizza", "turkey", "turkey", "pizza", "pizza", "pizza", "turkey"),
other = c("a", "b", "c", "d", "e", "f", "g"))
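I would like to transform it into a data frame that shows how many times each participant ate each food, while keeping the "other" variable: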
df2 <- data.frame(participant = c(1,1,1,2,2,2,2),
pizza = c(1,1,1,3,3,3,3),
turkey=c(2,2,2,1,1,1,1),
other = c("a", "b", "c", "d", "e", "f", "g"))
I tried
library(dplyr)
library(tidyr)
df3 <- df %>% group_by(participant) %>% add_count(food) %>%
pivot_wider(names_from = food, values_from = n)
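but this only sort of works: it produces NA values (which makes sense given my understanding of how pivot_wider works). Is there another function, or a way to manipulate pivot_wider, that produces what I want?
I did find a manual workaround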
df4 <- df3 %>% group_by(participant) %>% arrange(pizza) %>% mutate(pizza =
pizza[1]) %>% arrange(participant) %>% group_by(participant) %>%
arrange(turkey) %>% mutate(turkey = turkey[1]) %>% arrange(participant) %>% ungroup
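but for whatever reason, in my original dataset I occasionally get values like "c(3,3)". Given these odd values, I am hoping for a fix that does not rely on this extra step, so that I can learn how to use pivot_wider or another function in this situation.
Thanks!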
Here is another option with dummy_cols
library(dplyr)
library(fastDummies)
library(stringr)
df %>%
dummy_cols("food", remove_selected_columns = TRUE) %>%
group_by(participant) %>%
mutate(across(starts_with('food_'),
sum, .names = "{str_remove(.col, '.*_')}")) %>%
ungroup %>%
select(-starts_with('food_'))
-output
# A tibble: 7 × 4
participant other pizza turkey
<dbl> <chr> <int> <int>
1 1 a 1 2
2 1 b 1 2
3 1 c 1 2
4 2 d 3 1
5 2 e 3 1
6 2 f 3 1
7 2 g 3 1
Or use fill on "df3"
library(tidyr)
df3 %>%
fill(c(pizza, turkey), .direction = 'downup') %>%
ungroup
-output
# A tibble: 7 × 4
participant other pizza turkey
<dbl> <chr> <int> <int>
1 1 a 1 2
2 1 b 1 2
3 1 c 1 2
4 2 d 3 1
5 2 e 3 1
6 2 f 3 1
7 2 g 3 1
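To make the fill approach easier to follow, here is a self-contained sketch that rebuilds df3 from the question and then fills it. It relies on df3 still being grouped by participant after pivot_wider, so fill only borrows values from rows of the same participant. The name `filled` is introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(participant = c(1,1,1,2,2,2,2),
                 food = c("pizza", "turkey", "turkey", "pizza", "pizza", "pizza", "turkey"),
                 other = c("a", "b", "c", "d", "e", "f", "g"))

# Rebuild df3 as in the question: after pivoting, each row holds only the
# count for the food eaten on that row and NA in the other food column.
df3 <- df %>%
  group_by(participant) %>%
  add_count(food) %>%
  pivot_wider(names_from = food, values_from = n)

# df3 is still grouped by participant, so fill() copies the non-missing count
# down and then up within each participant only.
filled <- df3 %>%
  fill(pizza, turkey, .direction = "downup") %>%
  ungroup()

print(filled)
```

One caveat: if a participant never ate a given food, that column stays NA for all of that participant's rows, since there is nothing to fill from. In that case the remaining NAs would still need to be turned into 0, for example with tidyr's replace_na.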
Here is another option:
library(tidyverse)
df <- tibble(participant = c(1,1,1,2,2,2,2),
food = c("pizza", "turkey", "turkey", "pizza", "pizza", "pizza", "turkey"),
other = c("a", "b", "c", "d", "e", "f", "g"))
df |>
group_by(participant, food)|>
mutate(num = n())|>
pivot_wider(names_from = food, values_from = num) |>
mutate(across(pizza:turkey, \(x) max(x, na.rm = TRUE))) |>
ungroup()
#> # A tibble: 7 x 4
#> participant other pizza turkey
#> <dbl> <chr> <int> <int>
#> 1 1 a 1 2
#> 2 1 b 1 2
#> 3 1 c 1 2
#> 4 2 d 3 1
#> 5 2 e 3 1
#> 6 2 f 3 1
#> 7 2 g 3 1
A simple fix:
df3 <- df %>% group_by(participant) %>% add_count(food) %>%
pivot_wider(names_from = food, values_from = n, values_fill = 0) %>%
group_by(participant) %>% mutate(across(all_of(unique(df$food)), max)) %>% ungroup()
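Another way to avoid the NA values altogether, sketched here as an alternative rather than as any poster's method, is to compute the counts in a separate one-row-per-participant table and join them back onto the original rows. The names `counts` and `result` are made up for this example.

```r
library(dplyr)
library(tidyr)

df <- data.frame(participant = c(1,1,1,2,2,2,2),
                 food = c("pizza", "turkey", "turkey", "pizza", "pizza", "pizza", "turkey"),
                 other = c("a", "b", "c", "d", "e", "f", "g"))

# One row per participant and one column per food, holding the counts.
# values_fill = 0 covers participants who never ate a given food.
counts <- df %>%
  count(participant, food) %>%
  pivot_wider(names_from = food, values_from = n, values_fill = 0)

# Attach the per-participant counts to every original row, keeping "other".
result <- df %>%
  select(-food) %>%
  left_join(counts, by = "participant")

print(result)
```

Because the pivot runs on a table that no longer contains "other", each participant collapses to a single row before the join, so there is nothing left to fill afterwards.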