r语言 - 按FIPS(即两个时间段范围内的县ID)跨多个列进行汇总(sum)



根据下面的数据,我如何通过FIPS2011-20152016-2020两个时间段内Inflow,Outflow,NetMigration,InAGIOutAGI列进行sum?有些国家可能没有特定财政年度的数据,但这并不重要,因为我们的想法是将这两个时间段范围内的数据相加。在最终的数据集中自然会出现一些NAs。我使用FIPS,因为有几个县有相同的名称。因此,Key列不再需要,因为它是FIPSYear的连接。

期望输出模式/列:

FIPS    County          State   TotInflow   TotOutflow  TotNetMigration TotInAGI    TotOutAGI   Time_Period
12001       Alachua County  FL                                                                      2011-2015
12001       Alachua County  FL                                                                      2016-2020
08001       Adams County    CO                                                                      2011-2015
08001       Adams County    CO                                                                      2016-2020

样本数据:

df = structure(list(Key = c("080012020", "120012020", "120012018", 
"120012017", "080012017", "120012016", "120012015", "080012014", 
"120012013", "120012012", "080012012", "080012011", "080012016"
), County = c("Adams County", "Alachua County", "Alachua County", 
"Alachua County", "Adams County", "Alachua County", "Alachua County", 
"Adams County", "Alachua County", "Alachua County", "Adams County", 
"Adams County", "Adams County"), State = c("CO", "FL", "FL", 
"FL", "CO", "FL", "FL", "CO", "FL", "FL", "CO", "CO", "CO"), 
FIPS = c("08001", "12001", "12001", "12001", "08001", "12001", 
"12001", "08001", "12001", "12001", "08001", "08001", "08001"
), Inflow = c(38L, 261L, 321L, 339L, 58L, 288L, 254L, 46L, 
413L, 433L, 30L, 42L, NA), InAGI = c(1817L, 6287L, 8423L, 
8364L, 1865L, 14720L, 5224L, 1074L, 11774L, 10151L, 921L, 
500L, NA), FiscalYear = c("2019- 2020", "2019- 2020", "2017 - 2018", 
"2016 - 2017", "2016 - 2017", "2015 - 2016", "2014 - 2015", 
"2013 - 2014", "2012 - 2013", "2011 - 2012", "2011 - 2012", 
"2010 - 2011", "2015 - 2016"), Year = c(2020L, 2020L, 2018L, 
2017L, 2017L, 2016L, 2015L, 2014L, 2013L, 2012L, 2012L, 2011L, 
2016L), Outflow = c(54L, 447L, 444L, 558L, 44L, 436L, 334L, 
49L, 466L, 495L, 39L, 31L, 51L), OutAGI = c(1879L, 13106L, 
15409L, 16496L, 2408L, 12675L, 7448L, 733L, 10309L, 11677L, 
847L, 605L, 1114L), NetMigration = c(-16L, -186L, -123L, 
-219L, 14L, -148L, -80L, -3L, -53L, -62L, -9L, 11L, NA)), row.names = c(NA, 
-13L), class = "data.frame")

按'FIPS', 'County', 'State'和从'Year'创建的Time_Period列(基于'Year'是否位于某些开始年和结束年之间)进行分组,然后通过循环across这些列名获得感兴趣的列的sum

library(dplyr)
df %>%
group_by(FIPS, County, State, 
Time_Period = case_when(between(Year, 2011, 2015)~
'2011-2015', between(Year, 2016, 2020)~ '2016-2020')) %>% 
summarise(across(c(Inflow, InAGI, Outflow, OutAGI, NetMigration), 
~ sum(.x, na.rm = TRUE), .names = "Total{.col}"),.groups = "drop")

与产出

# A tibble: 4 × 9
FIPS  County         State Time_Period TotalInflow TotalInAGI TotalOutflow TotalOutAGI TotalNetMigration
<chr> <chr>          <chr> <chr>             <int>      <int>        <int>       <int>             <int>
1 08001 Adams County   CO    2011-2015           118       2495          119        2185                -1
2 08001 Adams County   CO    2016-2020            96       3682          149        5401                -2
3 12001 Alachua County FL    2011-2015          1100      27149         1295       29434              -195
4 12001 Alachua County FL    2016-2020          1209      37794         1885       57686              -676

最新更新