R - How to merge rows based on conditions with character values? (household data)



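I have a data frame where the first column indicates the job (manager, employee or worker), the second indicates whether the person works at night, and the last is a household code (if two individuals share the same code, they share the same house).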

# Here is the reproducible data:
PCS <- c("worker", "manager","employee","employee","worker","worker","manager","employee","manager","employee")
work_night <- c("Yes","Yes","No", "No","No","Yes","No","Yes","No","Yes")
HHnum <- c(1,1,2,2,3,3,4,4,5,5)
df <- data.frame(PCS,work_night,HHnum)

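My problem is that I would like a new data frame of households rather than individuals. I want to group individuals by HHnum and then merge their answers.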

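  • For the variable "PCS", I have new categories based on the combination of answers: manager + worker = "I", manager + employee = "II", employee + employee = "VI", worker + worker = "III", etc.

  • For the variable "work_night", I want to apply a score (if both answered Yes, the score is 2; if one answered Yes, the score is 1; if both answered No, the score is 0).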

To be clear, I would like my data frame to look like this:

HHnum      PCS      work_night
1          "I"           2
2          "VI"          0
3          "III"         1
4          "II"          1
5          "II"          1

How can I do this in R using dplyr? I know I need group_by(), but I don't know what to use after that.

Best, Victor

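Here is one way to do it (though I admit it is verbose). I create a reference data frame (i.e., combos), in case you have more than 3 categories, and then join it with the main data frame (i.e., df_new) to bring in the PCS Roman numerals.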

library(dplyr)
library(tidyr)
# Create a dataframe with all of the unordered combinations of PCS.
# Roman numerals are assigned by row order here, so adjust them if a specific mapping is required.
combos <- expand.grid(unique(df$PCS), unique(df$PCS))
combos <- unique(t(apply(combos, 1, sort))) %>%
  as.data.frame() %>%
  dplyr::mutate(PCS = as.roman(row_number()))
# Create another dataframe with the columns reversed (will make it easier to join to the main dataframe).
combos2 <- data.frame(V1 = c(combos$V2), V2 = c(combos$V1), PCS = c(combos$PCS)) %>%
  dplyr::mutate(PCS = as.roman(PCS))
combos <- rbind(combos, combos2)
# Get the count of "Yes" for each HHnum group.
# Then, put the PCS into 2 columns to join together with "combos" df.
df_new <- df %>%
  dplyr::group_by(HHnum) %>%
  dplyr::mutate(work_night = sum(work_night == "Yes")) %>%
  dplyr::group_by(grp = rep(1:2, length.out = n())) %>%
  dplyr::ungroup() %>%
  tidyr::pivot_wider(names_from = grp, values_from = PCS) %>%
  dplyr::rename("V1" = 3, "V2" = 4) %>%
  dplyr::left_join(combos, by = c("V1", "V2")) %>%
  unique() %>%
  dplyr::select(HHnum, PCS, work_night)
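
If the category labels have to match the ones listed in the question exactly, a shorter variant might be to collapse each household with summarise() and look the pair up in a named vector. This is only a sketch: pcs_lookup and households are hypothetical names, the lookup contains only the four pairs stated in the question (any other pair would come back as NA), and it assumes exactly two people per household.

```r
library(dplyr)

# Reproducible data from the question.
df <- data.frame(
  PCS = c("worker", "manager", "employee", "employee", "worker",
          "worker", "manager", "employee", "manager", "employee"),
  work_night = c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes"),
  HHnum = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
)

# Hypothetical lookup: each name is the two jobs sorted alphabetically and
# joined with "+". Only the pairs given in the question are filled in.
pcs_lookup <- c(
  "manager+worker"    = "I",
  "employee+manager"  = "II",
  "worker+worker"     = "III",
  "employee+employee" = "VI"
)

households <- df %>%
  group_by(HHnum) %>%
  summarise(
    # Sorting makes the key independent of the order of the two people.
    PCS = unname(pcs_lookup[paste(sort(PCS), collapse = "+")]),
    # Number of "Yes" answers in the household: 0, 1 or 2.
    work_night = sum(work_night == "Yes")
  )

households
```

On the sample data this should give the same rows as the table in the question. Pairs missing from pcs_lookup show up as NA in PCS, which makes gaps in the mapping easy to spot.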
