按R中的条件聚合数据帧



我在R中有以下DataFrame:

Y       ...    Price      Year           Quantity      Country    
010190   ...   4781       2021               4           Germany    
010190   ...   367        2021               3           Germany 
010190   ...   4781       2021               6           France    
010190   ...   250        2021               3           France    
020190   ...   690        2021               NA          USA        
020190   ...   10         2021               6           USA  
......         ...         ....              ..          ...   
217834  ...    56        2021                3           USA        
217834 ...     567       2021                9           USA        

正如您所看到的,Y列中的数字从01.、02.…、21…开始。我想通过考虑不同的分类列(例如国家年份(以及数量价格等数字列的总和,将这类行从6位聚合到2位。此外,我还想考虑在计算过程中与NA发生的争吵。所以,最终我想要这样的输出:

Y     Price      Year          Quantity   Country
01     5148       2021           7         Germany
01     5031       2021           9          USA
02     700        2021           6          USA
..     ....       ...           ....        ...      
21     623        2021           12         USA

您可以从dplyr使用group_bysummarize

library(dplyr)
df %>% 
mutate(Y = sprintf(as.numeric(factor(Y, unique(Y))), fmt = '%02d')) %>%
group_by(Y, Year, Country) %>%
summarize(across(where(is.numeric), sum))
#> # A tibble: 4 x 5
#> # Groups:   Y, Year [3]
#>    Y      Year Country Price Quantity
#>    <chr> <int> <chr>   <int>    <int>
#>  1 01     2021 France   5031        9
#>  2 01     2021 Germany  5148        7
#>  3 02     2021 USA       700       NA

更新:请求:

library(dplyr)
df %>% 
mutate(Y = substr(Y, 1, 2)) %>% 
group_by(Y, Year, Country) %>% 
summarise(across(c(Price, Quantity), ~sum(., na.rm = TRUE)))

我们可以使用substrYgroup_by以及summarise()中获得前两个字符,并使用sum()

library(dplyr)
df %>% 
mutate(Y = substr(Y, 1, 2)) %>% 
group_by(Y, Year, Country) %>% 
summarise(Price = sum(Price, na.rm = TRUE),
Quantity = sum(Quantity, na.rm = TRUE)
)
Y      Year Country Price Quantity
<chr> <dbl> <chr>   <dbl>    <dbl>
1 01     2021 France   5031        9
2 01     2021 Germany  5148        7
3 02     2021 USA       700        6
4 21     2021 USA       623       12

使用aggregateYsubstr

aggregate(cbind(Quantity, Price) ~ Y + Year + Country, 
transform(dat, Y=substr(Y, 1, 2)), sum)
#    Y Year Country Quantity Price
# 1 10 2021  France        9  5031
# 2 10 2021 Germany        7  5148
# 3 20 2021     USA        7   700
# 4 21 2021     USA       12   623

数据:

dat <- structure(list(Y = c(10190L, 10190L, 10190L, 10190L, 20190L, 
20190L, 217834L, 217834L), foo = c("...", "...", "...", "...", 
"...", "...", "...", "..."), Price = c(4781L, 367L, 4781L, 250L, 
690L, 10L, 56L, 567L), Year = c(2021L, 2021L, 2021L, 2021L, 2021L, 
2021L, 2021L, 2021L), model = c(NA, NA, NA, NA, NA, NA, "Tesla", 
"Tesla"), Quantity = c(4L, 3L, 6L, 3L, 1L, 6L, 3L, 9L), Country = c("Germany", 
"Germany", "France", "France", "USA", "USA", "USA", "USA")), class = "data.frame", row.names = c(NA, 
-8L))

相关内容

  • 没有找到相关文章

最新更新