R - Census: households made up of a couple (with or without children) born in the same place (dplyr approach)



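I am working on a French census from the early 20th century, at the household level. Each household has a household_chief (always in position 1). When a household is based on a couple, the wife is always in position 2.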

id_houseold <- c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 5, 5)
members <- c("household_chief", "wife", "child", "child", "household_chief", "wife", "household_chief", "household_chief", "wife", "child", "household_chief", "child")
birthplace <- c("Paris", "Paris", "Paris", "Paris", "Paris", "Bordeaux", "Nantes", "Paris", "Paris", "Nantes", "Nantes,", "Nantes")
data <- data.frame(id_houseold, members, birthplace)

I added each member's position within the household:

library(dplyr)
data <- data %>%
  group_by(id_houseold) %>%
  mutate(position_in_menage = 1:n())
data

Here is my result:

id_houseold members         birthplace position_in_menage
     <dbl> <fct>           <fct>                   <int>
1           1 household_chief Paris                       1
2           1 wife            Paris                       2
3           1 child           Paris                       3
4           1 child           Paris                       4
5           2 household_chief Paris                       1
6           2 wife            Bordeaux                    2
7           3 household_chief Nantes                      1
8           4 household_chief Paris                       1
9           4 wife            Paris                       2
10          4 child           Nantes                      3
11          5 household_chief Nantes,                     1
12          5 child           Nantes                      2

Using the dplyr package, I would like to know:

Which households are made up of a couple (with or without children) born in the same place?

Here is another way, using filter:

library(tidyverse)
data %>% 
 filter(members %in% c("household_chief", "wife")) %>% 
 group_by(id_houseold) %>% 
 filter(n_distinct(birthplace) == 1 & n() > 1)

which gives:

# A tibble: 4 x 3
# Groups:   id_houseold [2]
  id_houseold members         birthplace
        <dbl> <fct>           <fct>     
1           1 household_chief Paris     
2           1 wife            Paris     
3           4 household_chief Paris     
4           4 wife            Paris
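
The filter above keeps only the household_chief and wife rows, so the children disappear from the result. If the goal is to keep every member of the qualifying households, one possible variant is to put both conditions in a single grouped filter. This is only a sketch on the toy data from the question; the names couple and whole_households are introduced here for illustration.

```r
library(dplyr)

# Toy data copied from the question (the column name id_houseold is kept as-is).
data <- data.frame(
  id_houseold = c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 5, 5),
  members = c("household_chief", "wife", "child", "child",
              "household_chief", "wife", "household_chief",
              "household_chief", "wife", "child",
              "household_chief", "child"),
  birthplace = c("Paris", "Paris", "Paris", "Paris", "Paris", "Bordeaux",
                 "Nantes", "Paris", "Paris", "Nantes", "Nantes,", "Nantes")
)

# Hypothetical helper vector: the two roles that form the couple.
couple <- c("household_chief", "wife")

whole_households <- data %>%
  group_by(id_houseold) %>%
  filter(
    # the household must contain both a chief and a wife ...
    all(couple %in% members),
    # ... and those two must share a single birthplace
    n_distinct(birthplace[members %in% couple]) == 1
  ) %>%
  ungroup()

whole_households
```

Each condition evaluates to a single TRUE or FALSE per household, so the filter keeps or drops the household as a whole, children included.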

You can use n_distinct to check, for each id_houseold, whether both a "household_chief" and a "wife" are present and whether they share the same birthplace. If they share the same birthplace, n_distinct returns 1.

library(dplyr)
data %>%
  group_by(id_houseold) %>%
  summarise(is_couple = all(c("household_chief", "wife") %in% members) &
            n_distinct(birthplace[members %in% c("household_chief", "wife")]) == 1)
#  id_houseold is_couple
#        <dbl> <lgl>    
#1           1 TRUE     
#2           2 FALSE     
#3           3 FALSE    
#4           4 TRUE     
#5           5 FALSE  
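
Because the question states that the household_chief is always in position 1 and the wife in position 2, the same check could also be written by position. The sketch below assumes the rows are already ordered that way within each household, as in the toy data; the name same_place_ids is introduced here for illustration.

```r
library(dplyr)

# Toy data copied from the question (the column name id_houseold is kept as-is).
data <- data.frame(
  id_houseold = c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 5, 5),
  members = c("household_chief", "wife", "child", "child",
              "household_chief", "wife", "household_chief",
              "household_chief", "wife", "child",
              "household_chief", "child"),
  birthplace = c("Paris", "Paris", "Paris", "Paris", "Paris", "Bordeaux",
                 "Nantes", "Paris", "Paris", "Nantes", "Nantes,", "Nantes")
)

same_place_ids <- data %>%
  group_by(id_houseold) %>%
  summarise(
    # Assumption: row 1 is the chief and row 2 (if any) is the wife.
    # && short-circuits, so single-person households never index row 2.
    is_couple = n() >= 2 &&
      members[2] == "wife" &&
      birthplace[1] == birthplace[2]
  ) %>%
  filter(is_couple) %>%
  pull(id_houseold)

same_place_ids
```

This returns the household ids as a plain vector, which can then be used to subset the full census table.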
