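I have a df containing the birth history of each woman. Each woman (i.e., id_mdob) has 18 rows, because I reconstruct her births going back 18 years.
My problem is that in some cases the same woman appears more than "once" (i.e., more than 18 times). For example, id_mdob 321988 appears 36 times, and so does 321991. This is because the data come from 2 different waves, or years, but I only need the data from 1 year. I decide which year to keep using the following criteria (basically, I keep the year in which the woman has children, and if there is none, I keep the most recent year):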
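- If the first 6 digits of id_mdob_sy are the same, keep:
  a.) the id_mdob_sy that has at least one non-missing value in the nchild1-nchild10 columns.
- If the nchild1-nchild10 columns are all NA (i.e., the woman has no children), keep:
  b.) the id_mdob_sy with the higher number.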
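For example:
- 3219882014 and 3219882015 -> keep 3219882015, because neither has at least 1 non-missing value in the nchild1-nchild10 columns, and 2015 > 2014.
- 3219912016 and 3219912017 -> keep 3219912017, because it has at least one non-missing value in the nchild1-nchild10 columns (nchild1 = 2015).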
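The examples read id_mdob_sy as id_mdob followed by the survey year. A minimal sketch of splitting it arithmetically instead of with substr(), assuming (from the sample data, e.g. 3219882015 = 321988 + 2015) that the last 4 digits are always the year; integer division also does not depend on the woman id being exactly 6 digits long:

```r
# Assumption: id_mdob_sy = id_mdob followed by a 4-digit survey year
ids <- c(3219882014, 3219882015, 3219912016, 3219912017)

woman <- ids %/% 10000  # drop the last 4 digits -> id_mdob
wave  <- ids %% 10000   # keep only the last 4 digits -> survey year

print(data.frame(id_mdob_sy = ids, id_mdob = woman, wave = wave))
```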
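So far I have come up with the following code, but I don't know how to specify the last part (keeping the id_mdob_sy with the higher number, i.e. the most recent year):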
library(dplyr)

# waves (id_mdob_sy) with at least one non-missing value in nchild1-nchild10
test <- df %>%
  group_by(id_mdob_sy) %>%
  filter(any(!is.na(pick(nchild1:nchild10))))

# waves in which nchild1-nchild10 are all NA (the woman has no children)
test2 <- df %>%
  group_by(id_mdob_sy) %>%
  filter(all(is.na(pick(nchild1:nchild10))))
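For the last part on its own (keeping the higher-numbered id_mdob_sy per woman), one option is dplyr's slice_max(). This is a minimal sketch on hypothetical toy data, assuming the candidate waves for each woman have already been narrowed down. with_ties = TRUE matters here: every row of a wave shares the same id_mdob_sy, so all rows of the most recent wave are kept, not just one.

```r
library(dplyr)

# Hypothetical toy data: one woman, two waves, two rows per wave
cand <- tibble(
  id_mdob    = 321988,
  id_mdob_sy = rep(c(3219882014, 3219882015), each = 2)
)

latest <- cand %>%
  group_by(id_mdob) %>%
  # keep every row that has the highest id_mdob_sy, i.e. the most recent wave
  slice_max(id_mdob_sy, n = 1, with_ties = TRUE) %>%
  ungroup()

print(latest)
```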
Here are the first 115 rows of df:
structure(list(id_mdob_sy = c(1119902018, 1119902018, 1119902018,
1119902018, 1119902018, 1119902018, 1119902018, 1119902018, 1119902018,
1119902018, 1119902018, 1119902018, 1119902018, 1119902018, 1119902018,
1119902018, 1119902018, 1119902018, 2219952018, 2219952018, 2219952018,
2219952018, 2219952018, 2219952018, 2219952018, 2219952018, 2219952018,
2219952018, 2219952018, 2219952018, 2219952018, 2219952018, 2219952018,
2219952018, 2219952018, 2219952018, 3119802018, 3119802018, 3119802018,
3119802018, 3119802018, 3119802018, 3119802018, 3119802018, 3119802018,
3119802018, 3119802018, 3119802018, 3119802018, 3119802018, 3119802018,
3119802018, 3119802018, 3119802018, 3219882014, 3219882014, 3219882014,
3219882014, 3219882014, 3219882014, 3219882014, 3219882014, 3219882014,
3219882014, 3219882014, 3219882014, 3219882014, 3219882014, 3219882014,
3219882014, 3219882014, 3219882014, 3219882015, 3219882015, 3219882015,
3219882015, 3219882015, 3219882015, 3219882015, 3219882015, 3219882015,
3219882015, 3219882015, 3219882015, 3219882015, 3219882015, 3219882015,
3219882015, 3219882015, 3219882015, 3219912016, 3219912016, 3219912016,
3219912016, 3219912016, 3219912016, 3219912016, 3219912016, 3219912016,
3219912016, 3219912016, 3219912016, 3219912016, 3219912016, 3219912016,
3219912016, 3219912016, 3219912016, 3219912017, 3219912017, 3219912017,
3219912017, 3219912017, 3219912017, 3219912017), id_mdob = c(111990,
111990, 111990, 111990, 111990, 111990, 111990, 111990, 111990,
111990, 111990, 111990, 111990, 111990, 111990, 111990, 111990,
111990, 221995, 221995, 221995, 221995, 221995, 221995, 221995,
221995, 221995, 221995, 221995, 221995, 221995, 221995, 221995,
221995, 221995, 221995, 311980, 311980, 311980, 311980, 311980,
311980, 311980, 311980, 311980, 311980, 311980, 311980, 311980,
311980, 311980, 311980, 311980, 311980, 321988, 321988, 321988,
321988, 321988, 321988, 321988, 321988, 321988, 321988, 321988,
321988, 321988, 321988, 321988, 321988, 321988, 321988, 321988,
321988, 321988, 321988, 321988, 321988, 321988, 321988, 321988,
321988, 321988, 321988, 321988, 321988, 321988, 321988, 321988,
321988, 321991, 321991, 321991, 321991, 321991, 321991, 321991,
321991, 321991, 321991, 321991, 321991, 321991, 321991, 321991,
321991, 321991, 321991, 321991, 321991, 321991, 321991, 321991,
321991, 321991), id = c(11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 22, 22, 22, 22, 22, 22, 22, 22,
22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32), survey_date = structure(c(17532,
17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532,
17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532,
17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532,
17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532,
17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532,
17532, 17532, 17532, 17532, 17532, 17532, 17532, 17532, 16102,
16102, 16102, 16102, 16102, 16102, 16102, 16102, 16102, 16102,
16102, 16102, 16102, 16102, 16102, 16102, 16102, 16102, 16467,
16467, 16467, 16467, 16467, 16467, 16467, 16467, 16467, 16467,
16467, 16467, 16467, 16467, 16467, 16467, 16467, 16467, 16801,
16801, 16801, 16801, 16801, 16801, 16801, 16801, 16801, 16801,
16801, 16801, 16801, 16801, 16801, 16801, 16801, 16801, 17167,
17167, 17167, 17167, 17167, 17167, 17167), class = "Date"), survey_year = c(2018,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2014, 2014,
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
2014, 2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017,
2017, 2017, 2017, 2017), mom_dob = c(1990, 1990, 1990, 1990,
1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990,
1990, 1990, 1990, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,
1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1980,
1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980,
1980, 1980, 1980, 1980, 1980, 1980, 1988, 1988, 1988, 1988, 1988,
1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988,
1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988,
1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 1991, 1991,
1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991,
1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991,
1991), date_year = c(2001, 2002, 2003, 2004, 2005, 2006, 2007,
2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018,
2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011,
2012, 2013, 2014, 2015, 2016, 2017, 2018, 2001, 2002, 2003, 2004,
2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015,
2016, 2017, 2018, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 1998,
1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
2010, 2011, 2012, 2013, 2014, 2015, 1999, 2000, 2001, 2002, 2003,
2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014,
2015, 2016, 2000, 2001, 2002, 2003, 2004, 2005, 2006), mom_age = c(11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 8, 9, 10, 11, 12, 13,
14), birth_day = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 15, 15, 15, 15, 15,
15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1), birth_month = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9), nchild1 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2015, 2015, 2015,
2015, 2015, 2015, 2015), nchild2 = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), nchild3 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), nchild4 = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), nchild5 = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), nchild6 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), nchild7 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA), nchild8 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), nchild9 = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), nchild10 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), stock = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0), hh_income_net = c(14120, 14120, 14120, 14120, 14120, 14120,
14120, 14120, 14120, 14120, 14120, 14120, 14120, 14120, 14120,
14120, 14120, 14120, 5510, 5510, 5510, 5510, 5510, 5510, 5510,
5510, 5510, 5510, 5510, 5510, 5510, 5510, 5510, 5510, 5510, 5510,
8203.990234375, 8203.990234375, 8203.990234375, 8203.990234375,
8203.990234375, 8203.990234375, 8203.990234375, 8203.990234375,
8203.990234375, 8203.990234375, 8203.990234375, 8203.990234375,
8203.990234375, 8203.990234375, 8203.990234375, 8203.990234375,
8203.990234375, 8203.990234375, 4850, 4850, 4850, 4850, 4850,
4850, 4850, 4850, 4850, 4850, 4850, 4850, 4850, 4850, 4850, 4850,
4850, 4850, 4800, 4800, 4800, 4800, 4800, 4800, 4800, 4800, 4800,
4800, 4800, 4800, 4800, 4800, 4800, 4800, 4800, 4800, 2400, 2400,
2400, 2400, 2400, 2400, 2400, 2400, 2400, 2400, 2400, 2400, 2400,
2400, 2400, 2400, 2400, 2400, 3410, 3410, 3410, 3410, 3410, 3410,
3410), marital_stat = c(20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
20, 20, 20, 20, 20, 20, 20, 20, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 20,
20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10), emp_stat = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 3, 3, 3, 3, 3,
3, 3), disability_stat = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), weight = c(1294.89001464844,
1294.89001464844, 1294.89001464844, 1294.89001464844, 1294.89001464844,
1294.89001464844, 1294.89001464844, 1294.89001464844, 1294.89001464844,
1294.89001464844, 1294.89001464844, 1294.89001464844, 1294.89001464844,
1294.89001464844, 1294.89001464844, 1294.89001464844, 1294.89001464844,
1294.89001464844, 1294.89001464844, 1294.89001464844, 1294.89001464844,
1294.89001464844, 1294.89001464844, 1294.89001464844, 1294.89001464844,
1294.89001464844, 1294.89001464844, 1294.89001464844, 1294.89001464844,
1294.89001464844, 1294.89001464844, 1294.89001464844, 1294.89001464844,
1294.89001464844, 1294.89001464844, 1294.89001464844, 1975.36999511719,
1975.36999511719, 1975.36999511719, 1975.36999511719, 1975.36999511719,
1975.36999511719, 1975.36999511719, 1975.36999511719, 1975.36999511719,
1975.36999511719, 1975.36999511719, 1975.36999511719, 1975.36999511719,
1975.36999511719, 1975.36999511719, 1975.36999511719, 1975.36999511719,
1975.36999511719, 1412, 1412, 1412, 1412, 1412, 1412, 1412, 1412,
1412, 1412, 1412, 1412, 1412, 1412, 1412, 1412, 1412, 1412, 1368,
1368, 1368, 1368, 1368, 1368, 1368, 1368, 1368, 1368, 1368, 1368,
1368, 1368, 1368, 1368, 1368, 1368, 989, 989, 989, 989, 989,
989, 989, 989, 989, 989, 989, 989, 989, 989, 989, 989, 989, 989,
1039, 1039, 1039, 1039, 1039, 1039, 1039), region = c(2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 26, 26, 26, 26, 26, 26, 26, 26, 26,
26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26,
26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), birth_country = c(616,
616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616,
616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616,
616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616,
616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616,
616, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
), birth_citizenship = c(616, 616, 616, 616, 616, 616, 616, 616,
616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616,
616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616,
616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616,
616, 616, 616, 616, 616, 616, 616, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1), residence = c(616, 616, 616, 616,
616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616,
616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616,
616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616,
616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 616, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), age_sq = c(784,
784, 784, 784, 784, 784, 784, 784, 784, 784, 784, 784, 784, 784,
784, 784, 784, 784, 529, 529, 529, 529, 529, 529, 529, 529, 529,
529, 529, 529, 529, 529, 529, 529, 529, 529, 1444, 1444, 1444,
1444, 1444, 1444, 1444, 1444, 1444, 1444, 1444, 1444, 1444, 1444,
1444, 1444, 1444, 1444, 625, 625, 625, 625, 625, 625, 625, 625,
625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 676, 676, 676,
676, 676, 676, 676, 676, 676, 676, 676, 676, 676, 676, 676, 676,
676, 676, 576, 576, 576, 576, 576, 576, 576, 576, 576, 576, 576,
576, 576, 576, 576, 576, 576, 576, 625, 625, 625, 625, 625, 625,
625), educcat = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), educcat_college = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), maritalcat = c(1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0), hh_income_annual_usd = c(45184, 45184, 45184, 45184,
45184, 45184, 45184, 45184, 45184, 45184, 45184, 45184, 45184,
45184, 45184, 45184, 45184, 45184, 17632, 17632, 17632, 17632,
17632, 17632, 17632, 17632, 17632, 17632, 17632, 17632, 17632,
17632, 17632, 17632, 17632, 17632, 26252.76875, 26252.76875,
26252.76875, 26252.76875, 26252.76875, 26252.76875, 26252.76875,
26252.76875, 26252.76875, 26252.76875, 26252.76875, 26252.76875,
26252.76875, 26252.76875, 26252.76875, 26252.76875, 26252.76875,
26252.76875, 15520, 15520, 15520, 15520, 15520, 15520, 15520,
15520, 15520, 15520, 15520, 15520, 15520, 15520, 15520, 15520,
15520, 15520, 15360, 15360, 15360, 15360, 15360, 15360, 15360,
15360, 15360, 15360, 15360, 15360, 15360, 15360, 15360, 15360,
15360, 15360, 7680, 7680, 7680, 7680, 7680, 7680, 7680, 7680,
7680, 7680, 7680, 7680, 7680, 7680, 7680, 7680, 7680, 7680, 10912,
10912, 10912, 10912, 10912, 10912, 10912), hh_income_annual_log = c(10.7184983208529,
10.7184983208529, 10.7184983208529, 10.7184983208529, 10.7184983208529,
10.7184983208529, 10.7184983208529, 10.7184983208529, 10.7184983208529,
10.7184983208529, 10.7184983208529, 10.7184983208529, 10.7184983208529,
10.7184983208529, 10.7184983208529, 10.7184983208529, 10.7184983208529,
10.7184983208529, 9.77747071195264, 9.77747071195264, 9.77747071195264,
9.77747071195264, 9.77747071195264, 9.77747071195264, 9.77747071195264,
9.77747071195264, 9.77747071195264, 9.77747071195264, 9.77747071195264,
9.77747071195264, 9.77747071195264, 9.77747071195264, 9.77747071195264,
9.77747071195264, 9.77747071195264, 9.77747071195264, 10.175526738648,
10.175526738648, 10.175526738648, 10.175526738648, 10.175526738648,
10.175526738648, 10.175526738648, 10.175526738648, 10.175526738648,
10.175526738648, 10.175526738648, 10.175526738648, 10.175526738648,
10.175526738648, 10.175526738648, 10.175526738648, 10.175526738648,
10.175526738648, 9.64988479373721, 9.64988479373721, 9.64988479373721,
9.64988479373721, 9.64988479373721, 9.64988479373721, 9.64988479373721,
9.64988479373721, 9.64988479373721, 9.64988479373721, 9.64988479373721,
9.64988479373721, 9.64988479373721, 9.64988479373721, 9.64988479373721,
9.64988479373721, 9.64988479373721, 9.64988479373721, 9.63952200670166,
9.63952200670166, 9.63952200670166, 9.63952200670166, 9.63952200670166,
9.63952200670166, 9.63952200670166, 9.63952200670166, 9.63952200670166,
9.63952200670166, 9.63952200670166, 9.63952200670166, 9.63952200670166,
9.63952200670166, 9.63952200670166, 9.63952200670166, 9.63952200670166,
9.63952200670166, 8.94637482614172, 8.94637482614172, 8.94637482614172,
8.94637482614172, 8.94637482614172, 8.94637482614172, 8.94637482614172,
8.94637482614172, 8.94637482614172, 8.94637482614172, 8.94637482614172,
8.94637482614172, 8.94637482614172, 8.94637482614172, 8.94637482614172,
8.94637482614172, 8.94637482614172, 8.94637482614172, 9.29761838008324,
9.29761838008324, 9.29761838008324, 9.29761838008324, 9.29761838008324,
9.29761838008324, 9.29761838008324), rural = c(0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
1, 1, 1, 1, 1), newborn = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
family500 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0), stock_sq = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -115L), class = c("tbl_df",
"tbl", "data.frame"))
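I believe this will do it for you, assuming your data frame is hx:
- pivot the columns of interest (nchild1-nchild10) longer
- group by the id with the year (id_mdob_sy) and the id without the year (id_mdob), and total up the births
- sort by births in descending order, then by year (id_mdob_sy) in descending order
- slice the first row of each group; if there is a tie on births, the previous step ensures it is the maximum year
- select the full id and use it in an inner_join back to the original data set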
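library(dplyr)
library(tidyr)

hx %>% inner_join(
  hx %>%
    pivot_longer(cols = starts_with("nchild")) %>%
    select(id_mdob, id_mdob_sy, name, value) %>%
    group_by(id_mdob, id_mdob_sy) %>%
    # count of non-missing nchild values across the wave's rows (0 = no children)
    summarize(births = sum(!is.na(value))) %>%
    # after summarize() the data are still grouped by id_mdob
    arrange(desc(births), desc(id_mdob_sy)) %>%
    slice_head(n = 1) %>%
    ungroup() %>%
    select(id_mdob_sy),
  by = "id_mdob_sy"
)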
Output (note the drop from 115 to 79 rows, because 36 rows were removed for the two duplicated ids):
# A tibble: 79 × 40
id_mdob_sy id_mdob id survey_date survey_year mom_dob date_year mom_age birth_day birth_month nchild1 nchild2 nchild3 nchild4 nchild5 nchild6 nchild7 nchild8
<dbl> <dbl> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl>
1 1119902018 111990 11 2018-01-01 2018 1990 2001 11 NA NA NA NA NA NA NA NA NA NA
2 1119902018 111990 11 2018-01-01 2018 1990 2002 12 NA NA NA NA NA NA NA NA NA NA
3 1119902018 111990 11 2018-01-01 2018 1990 2003 13 NA NA NA NA NA NA NA NA NA NA
4 1119902018 111990 11 2018-01-01 2018 1990 2004 14 NA NA NA NA NA NA NA NA NA NA
5 1119902018 111990 11 2018-01-01 2018 1990 2005 15 NA NA NA NA NA NA NA NA NA NA
6 1119902018 111990 11 2018-01-01 2018 1990 2006 16 NA NA NA NA NA NA NA NA NA NA
7 1119902018 111990 11 2018-01-01 2018 1990 2007 17 NA NA NA NA NA NA NA NA NA NA
8 1119902018 111990 11 2018-01-01 2018 1990 2008 18 NA NA NA NA NA NA NA NA NA NA
9 1119902018 111990 11 2018-01-01 2018 1990 2009 19 NA NA NA NA NA NA NA NA NA NA
10 1119902018 111990 11 2018-01-01 2018 1990 2010 20 NA NA NA NA NA NA NA NA NA NA
# … with 69 more rows, and 22 more variables: nchild9 <lgl>, nchild10 <lgl>, stock <dbl>, hh_income_net <dbl>, marital_stat <dbl>, emp_stat <dbl>,
# disability_stat <dbl>, weight <dbl>, region <dbl>, birth_country <dbl>, birth_citizenship <dbl>, residence <dbl>, age_sq <dbl>, educcat <dbl>,
# educcat_college <dbl>, maritalcat <dbl>, hh_income_annual_usd <dbl>, hh_income_annual_log <dbl>, rural <dbl>, newborn <dbl>, family500 <dbl>, stock_sq <dbl>
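An alternative sketch that applies the rule directly on the wide data, without the pivot and the self-join. It assumes dplyr >= 1.1.0 (for pick()); the function name keep_one_wave and the toy data are hypothetical. When both waves of a woman have children it keeps the higher id_mdob_sy, a tie-break the question does not specify.

```r
library(dplyr)

# Hypothetical toy data mirroring the two duplicated women in the question:
# 321988 has no children in either wave; 321991 reports a child in the 2017 wave
toy <- tibble(
  id_mdob_sy = c(3219882014, 3219882015, 3219912016, 3219912017),
  id_mdob    = c(321988, 321988, 321991, 321991),
  nchild1    = c(NA, NA, NA, 2015),
  nchild2    = NA_real_
)

keep_one_wave <- function(df) {
  df %>%
    # TRUE for rows with at least one non-missing nchild* value
    mutate(has_child = rowSums(!is.na(pick(starts_with("nchild")))) > 0) %>%
    # a wave (id_mdob_sy) has children if any of its rows does
    group_by(id_mdob_sy) %>%
    mutate(wave_has_child = any(has_child)) %>%
    # within each woman: rule a (prefer waves with children) ...
    group_by(id_mdob) %>%
    filter(wave_has_child == max(wave_has_child)) %>%
    # ... then rule b (highest id_mdob_sy among the remaining waves)
    filter(id_mdob_sy == max(id_mdob_sy)) %>%
    ungroup() %>%
    select(-has_child, -wave_has_child)
}

kept <- keep_one_wave(toy)
print(kept)
```

Calling keep_one_wave(hx) on the sample data should likewise keep 79 rows, one wave per id_mdob.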