I have a data frame (df); the following is an excerpt from a larger version:
txnID date product sold repID lastName
1001 8/5/2020 Clobromizen 600 203 Kappoorthy
1002 6/28/2020 Alaraphosol 276 887 da Silva
1003 6/28/2020 Alaraphosol 184 887 da Silva
1004 4/16/2020 Diaprogenix 36 887 da Silva
1005 6/14/2020 Diaprogenix 40 887 da Silva
1006 5/19/2020 Xinoprozen 5640 332 McRowe
1007 8/23/2020 Diaprogenix 60 332 McRowe
1008 11/14/2020 Clobromizen 2880 332 McRowe
1009 9/26/2020 Colophrazen 738 203 Kappoorthy
1010 2/5/2020 Diaprogenix 20 332 McRowe
1011 9/23/2020 Gerantrazeophem 3740 100 Schwab
1012 12/4/2020 Clobromizen 1584 221 Sixt
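I want to create a new data frame that holds, for each employee shown, the total quantity sold across all products (with every employee listed). It should look like this: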
View(df1)
lastName totalSold
1 Kappoorthy sum(df$sold)
2 da Silva sum(df$sold)
3 McRowe sum(df$sold)
4 Schwab sum(df$sold)
5 Sixt sum(df$sold)
In base R you can do this:
aggregate(sold~lastName, df, sum)
lastName sold
1 da Silva 536
2 Kappoorthy 1338
3 McRowe 8600
4 Schwab 3740
5 Sixt 1584
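Note that the result column above is named sold, while the question asks for totalSold. A minimal sketch of one way to get that name, assuming the sample rows from the question (the date and repID columns are left out here for brevity, and df1 is simply the name used in the question):

```r
# Sample rows copied from the question (date and repID omitted).
df <- data.frame(
  txnID = 1001:1012,
  product = c("Clobromizen", "Alaraphosol", "Alaraphosol", "Diaprogenix",
              "Diaprogenix", "Xinoprozen", "Diaprogenix", "Clobromizen",
              "Colophrazen", "Diaprogenix", "Gerantrazeophem", "Clobromizen"),
  sold = c(600, 276, 184, 36, 40, 5640, 60, 2880, 738, 20, 3740, 1584),
  lastName = c("Kappoorthy", "da Silva", "da Silva", "da Silva", "da Silva",
               "McRowe", "McRowe", "McRowe", "Kappoorthy", "McRowe",
               "Schwab", "Sixt")
)

# Sum sold per employee, then rename the result column to totalSold.
df1 <- aggregate(sold ~ lastName, data = df, FUN = sum)
names(df1)[names(df1) == "sold"] <- "totalSold"
df1
```

The per-employee totals add up to sum(df$sold), which is a quick sanity check that no rows were dropped.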
To exclude certain products, use the subset argument:
aggregate(sold ~ lastName, df, sum, subset = !product %in% c("Xinoprozen", "Diaprogenix"))
lastName sold
1 da Silva 460
2 Kappoorthy 1338
3 McRowe 2880
4 Schwab 3740
5 Sixt 1584
If you have NAs:
aggregate(sold~lastName, df, sum, na.rm =TRUE)
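One detail worth knowing: the formula interface of aggregate defaults to na.action = na.omit, so rows with an NA in sold are already dropped before sum is called. The sketch below, on a small made-up data frame (df_na and its values are purely illustrative), shows when na.rm = TRUE actually makes a difference:

```r
# Hypothetical toy data with one missing value.
df_na <- data.frame(
  lastName = c("McRowe", "McRowe", "Sixt"),
  sold = c(10, NA, 5)
)

# Default na.action = na.omit: the NA row is removed before summing.
res_default <- aggregate(sold ~ lastName, df_na, sum)

# na.pass keeps the NA row, so sum() returns NA for McRowe.
res_pass <- aggregate(sold ~ lastName, df_na, sum, na.action = na.pass)

# With na.pass, na.rm = TRUE is forwarded to sum() and drops the NA there.
res_rm <- aggregate(sold ~ lastName, df_na, sum,
                    na.action = na.pass, na.rm = TRUE)
```

With a single response column, the default call and the na.rm = TRUE call give the same totals; only the na.pass call without na.rm yields NA.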
Here is one approach using dplyr:
library(dplyr)
df %>%
filter(!(product %in% c("Xinoprozen", "Diaprogenix"))) %>%
group_by(lastName) %>%
summarize(totalSold = sum(sold,na.rm = TRUE))
library(dplyr)
df%>%
group_by(lastName)%>%
summarise(Totalsold = sum(sold))
If you want to exclude any products, for example "Xinoprozen" and "Diaprogenix":
df%>%
filter(!(product %in% c("Xinoprozen", "Diaprogenix")))%>%
group_by(lastName)%>%
summarise(Totalsold = sum(sold))
Using base R aggregate:
aggregate(sold ~ lastName, sum, na.rm=TRUE, data=df)