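Based on the data below, how can I sum the Inflow, Outflow, NetMigration, InAGI and OutAGI columns by FIPS for the two time periods 2011-2015 and 2016-2020? Some counties may not have data for a particular fiscal year, but that does not matter, because the idea is to add up whatever data falls within each of the two periods. There will naturally be some NAs in the final dataset. I am using FIPS because several counties share the same name. The Key column is therefore no longer needed, since it is just the concatenation of FIPS and Year.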
Desired output schema/columns:
FIPS County State TotInflow TotOutflow TotNetMigration TotInAGI TotOutAGI Time_Period
12001 Alachua County FL 2011-2015
12001 Alachua County FL 2016-2020
08001 Adams County CO 2011-2015
08001 Adams County CO 2016-2020
Sample data:
df = structure(list(Key = c("080012020", "120012020", "120012018",
"120012017", "080012017", "120012016", "120012015", "080012014",
"120012013", "120012012", "080012012", "080012011", "080012016"
), County = c("Adams County", "Alachua County", "Alachua County",
"Alachua County", "Adams County", "Alachua County", "Alachua County",
"Adams County", "Alachua County", "Alachua County", "Adams County",
"Adams County", "Adams County"), State = c("CO", "FL", "FL",
"FL", "CO", "FL", "FL", "CO", "FL", "FL", "CO", "CO", "CO"),
FIPS = c("08001", "12001", "12001", "12001", "08001", "12001",
"12001", "08001", "12001", "12001", "08001", "08001", "08001"
), Inflow = c(38L, 261L, 321L, 339L, 58L, 288L, 254L, 46L,
413L, 433L, 30L, 42L, NA), InAGI = c(1817L, 6287L, 8423L,
8364L, 1865L, 14720L, 5224L, 1074L, 11774L, 10151L, 921L,
500L, NA), FiscalYear = c("2019- 2020", "2019- 2020", "2017 - 2018",
"2016 - 2017", "2016 - 2017", "2015 - 2016", "2014 - 2015",
"2013 - 2014", "2012 - 2013", "2011 - 2012", "2011 - 2012",
"2010 - 2011", "2015 - 2016"), Year = c(2020L, 2020L, 2018L,
2017L, 2017L, 2016L, 2015L, 2014L, 2013L, 2012L, 2012L, 2011L,
2016L), Outflow = c(54L, 447L, 444L, 558L, 44L, 436L, 334L,
49L, 466L, 495L, 39L, 31L, 51L), OutAGI = c(1879L, 13106L,
15409L, 16496L, 2408L, 12675L, 7448L, 733L, 10309L, 11677L,
847L, 605L, 1114L), NetMigration = c(-16L, -186L, -123L,
-219L, 14L, -148L, -80L, -3L, -53L, -62L, -9L, 11L, NA)), row.names = c(NA,
-13L), class = "data.frame")
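Group by 'FIPS', 'County', 'State' and a Time_Period column created from 'Year' (based on whether 'Year' falls between a given start and end year), then get the sum of the columns of interest by looping across those column names: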
library(dplyr)

df %>%
  # label each row with its period while grouping
  group_by(FIPS, County, State,
           Time_Period = case_when(between(Year, 2011, 2015) ~ '2011-2015',
                                   between(Year, 2016, 2020) ~ '2016-2020')) %>%
  # sum each column of interest within a group, ignoring NAs
  summarise(across(c(Inflow, InAGI, Outflow, OutAGI, NetMigration),
                   ~ sum(.x, na.rm = TRUE), .names = "Total{.col}"),
            .groups = "drop")
Output:
# A tibble: 4 × 9
  FIPS  County         State Time_Period TotalInflow TotalInAGI TotalOutflow TotalOutAGI TotalNetMigration
  <chr> <chr>          <chr> <chr>             <int>      <int>        <int>       <int>             <int>
1 08001 Adams County   CO    2011-2015           118       2495          119        2185                -1
2 08001 Adams County   CO    2016-2020            96       3682          149        5401                -2
3 12001 Alachua County FL    2011-2015          1100      27149         1295       29434              -195
4 12001 Alachua County FL    2016-2020          1209      37794         1885       57686              -676
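
One thing to be aware of: with na.rm = TRUE, a group whose values are all NA sums to 0 rather than NA. If an NA is preferred in that case (the question expects some NAs in the result), a small wrapper can be used instead. The sketch below is only an illustration under assumptions: sum_or_na is a hypothetical helper, not a dplyr function, it assumes integer columns like those in the sample data, and toy is a made-up data frame. It also uses the "Tot" prefix from the desired output in place of "Total".

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): sum of the non-missing values,
# but NA when every value in the group is missing. Assumes integer input.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_integer_ else sum(x, na.rm = TRUE)
}

# Made-up toy data, for illustration only
toy <- data.frame(
  FIPS   = c("00001", "00001", "00001"),
  Year   = c(2012L, 2013L, 2016L),
  Inflow = c(10L, 5L, NA)
)

res <- toy %>%
  group_by(FIPS,
           Time_Period = case_when(between(Year, 2011, 2015) ~ "2011-2015",
                                   between(Year, 2016, 2020) ~ "2016-2020")) %>%
  # .names = "Tot{.col}" yields TotInflow, matching the desired column names
  summarise(across(Inflow, sum_or_na, .names = "Tot{.col}"), .groups = "drop")

res
```

Here the 2016-2020 row for the toy county has only a missing Inflow, so its TotInflow stays NA instead of becoming 0. The same helper could be passed to across() for all five columns of the original data.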