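I am working on a French census from the early 20th century, and I work at the household level. Each household has a household_chief (always in position 1). When a household is built around a couple, the wife is always in position 2.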
id_houseold <- c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 5, 5)
members <- c("household_chief", "wife", "child", "child", "household_chief", "wife", "household_chief", "household_chief", "wife", "child", "household_chief", "child")
birthplace <- c("Paris", "Paris", "Paris", "Paris", "Paris", "Bordeaux", "Nantes", "Paris", "Paris", "Nantes", "Nantes,", "Nantes")
data <- data.frame(id_houseold, members, birthplace)
I added a column giving each member's position within the household:
library(dplyr)
data <- data %>%
  group_by(id_houseold) %>%
  mutate(position_in_menage = 1:n())
data
Here is my result:
id_houseold members birthplace position_in_menage
<dbl> <fct> <fct> <int>
1 1 household_chief Paris 1
2 1 wife Paris 2
3 1 child Paris 3
4 1 child Paris 4
5 2 household_chief Paris 1
6 2 wife Bordeaux 2
7 3 household_chief Nantes 1
8 4 household_chief Paris 1
9 4 wife Paris 2
10 4 child Nantes 3
11 5 household_chief Nantes, 1
12 5 child Nantes 2
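What I would like to know, using the dplyr package:
Which households consist of a couple (with or without children) whose two members were born in the same place?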
Here is another approach, using filter:
library(tidyverse)
data %>%
  filter(members %in% c("household_chief", "wife")) %>%
  group_by(id_houseold) %>%
  filter(n_distinct(birthplace) == 1 & n() > 1)
This gives:
# A tibble: 4 x 3
# Groups:   id_houseold [2]
  id_houseold members         birthplace
        <dbl> <fct>           <fct>
1           1 household_chief Paris
2           1 wife            Paris
3           4 household_chief Paris
4           4 wife            Paris
You can use n_distinct to check, for each id_houseold, whether both "household_chief" and "wife" are present and whether they share the same birthplace. If they do, n_distinct over their two birthplace values will be 1.
library(dplyr)
data %>%
  group_by(id_houseold) %>%
  # TRUE only if both roles are present and their birthplaces are identical
  summarise(is_couple = all(c("household_chief", "wife") %in% members) &
              n_distinct(birthplace[members %in% c("household_chief", "wife")]) == 1)
# id_houseold is_couple
# <dbl> <lgl>
#1 1 TRUE
#2 2 FALSE
#3 3 FALSE
#4 4 TRUE
#5 5 FALSE
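If you want the full rows of the matching households (children included) rather than one flag per household, the same condition can be used inside a grouped filter. This is only a minimal sketch under the assumptions of the question: the sample data is rebuilt so the snippet runs on its own, and the name couple_households is a hypothetical one chosen for illustration.

```r
library(dplyr)

# Sample data from the question, rebuilt so the snippet is self-contained
id_houseold <- c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 5, 5)
members <- c("household_chief", "wife", "child", "child", "household_chief", "wife",
             "household_chief", "household_chief", "wife", "child",
             "household_chief", "child")
birthplace <- c("Paris", "Paris", "Paris", "Paris", "Paris", "Bordeaux", "Nantes",
                "Paris", "Paris", "Nantes", "Nantes,", "Nantes")
data <- data.frame(id_houseold, members, birthplace)

# couple_households is a hypothetical name, not part of the original post.
# Each condition evaluates to a single TRUE/FALSE per household, so filter()
# keeps or drops the whole household, children included.
couple_households <- data %>%
  group_by(id_houseold) %>%
  filter(
    all(c("household_chief", "wife") %in% members),
    n_distinct(birthplace[members %in% c("household_chief", "wife")]) == 1
  ) %>%
  ungroup()

couple_households
```

With the sample data this should keep every row of households 1 and 4, the two households flagged TRUE above. The children's birthplaces are ignored on purpose, since the condition only compares the household_chief and the wife.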