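I'm trying to find a way to insert zeros into a pandas DataFrame wherever the result of the .count() aggregate function is < 1. I've tried setting a condition that looks for null/None values, and I've tried a simple < 1 comparison. So far I've only been able to count the instances where the categorical value actually occurs. Here is some sample code to demonstrate my problem: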
import pandas as pd

data = {'Person': ['Jim', 'Jim', 'Jim', 'Jim', 'Jim', 'Bob', 'Bob', 'Bob', 'Bob', 'Bob'],
        'Result': ['Good', 'Good', 'Good', 'Good', 'Good', 'Good', 'Bad', 'Good', 'Bad', 'Bad']}
dtf = pd.DataFrame.from_dict(data)
names = ['Jim', 'Bob']
append = []
for i in names:
    # Rows for this person with Result == 'Good'
    good = dtf[dtf['Person'] == i]
    good = good[good['Result'] == 'Good']
    if good['Result'].count() > 0:
        good.insert(2, "Count", good['Result'].count())
    elif good['Result'].count() < 1:
        good.insert(2, "Count", 0)
    # Rows for this person with Result == 'Bad'
    bad = dtf[dtf['Person'] == i]
    bad = bad[bad['Result'] == 'Bad']
    if bad['Result'].count() > 0:
        bad.insert(2, "Count", bad['Result'].count())
    elif bad['Result'].count() < 1:
        bad.insert(2, "Count", 0)
    res = [good, bad]
    res = pd.concat(res)
    append.append(res)
    print(res)
The current output is:
  Person Result  Count
0    Jim   Good      5
1    Jim   Good      5
2    Jim   Good      5
3    Jim   Good      5
4    Jim   Good      5
  Person Result  Count
5    Bob   Good      2
7    Bob   Good      2
6    Bob    Bad      3
8    Bob    Bad      3
9    Bob    Bad      3
What I'm trying to achieve is a count of zero for Jim for the "Bad" value in the dtf["Result"] column, like this:
  Person Result  Count
0    Jim   Good      5
1    Jim   Good      5
2    Jim   Good      5
3    Jim   Good      5
4    Jim   Good      5
5    Jim    Bad      0
   Person Result  Count
6     Bob   Good      2
7     Bob   Good      2
8     Bob    Bad      3
9     Bob    Bad      3
10    Bob    Bad      3
I hope this makes sense. Viva la resistance!
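First, create a MultiIndex mi from the product of Person and Result, so that combinations missing from df are preserved. Then count (size) all groups and reindex by the MultiIndex, filling missing combinations with 0. Finally, merge the two DataFrames using the union of keys from both.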
# df is the question's DataFrame (called dtf there)
mi = pd.MultiIndex.from_product([df["Person"].unique(),
                                 df["Result"].unique()],
                                names=["Person", "Result"])
# Count every existing group, then reindex so missing pairs get a count of 0
out = (df.groupby(["Person", "Result"])
         .size()
         .reindex(mi, fill_value=0)
         .rename("Count")
         .reset_index())
out = out.merge(df, on=["Person", "Result"], how="outer")
>>> out
   Person Result  Count
0     Jim   Good      5
1     Jim   Good      5
2     Jim   Good      5
3     Jim   Good      5
4     Jim   Good      5
5     Jim    Bad      0
6     Bob   Good      2
7     Bob   Good      2
8     Bob    Bad      3
9     Bob    Bad      3
10    Bob    Bad      3
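The steps above can be checked end to end with the question's data. The following is a minimal, self-contained sketch rather than a definitive implementation: it rebuilds the question's frame as df, prints the intermediate counts, and merges with how="left" instead of how="outer". That swap is an assumption of this sketch, not part of the answer. Since the reindexed counts already contain every Person/Result pair, both options return the same rows, but a left merge keeps the row order of the counts, whereas the row order of an outer merge may differ between pandas versions. The names counts and full_counts are introduced here for illustration only.

```python
import pandas as pd

# The question's data, rebuilt so the sketch runs on its own.
data = {
    "Person": ["Jim"] * 5 + ["Bob"] * 5,
    "Result": ["Good"] * 5 + ["Good", "Bad", "Good", "Bad", "Bad"],
}
df = pd.DataFrame(data)

# Every Person/Result pair, including pairs that never occur in df.
mi = pd.MultiIndex.from_product(
    [df["Person"].unique(), df["Result"].unique()],
    names=["Person", "Result"],
)

# size() only reports pairs that exist, so (Jim, Bad) is absent here.
counts = df.groupby(["Person", "Result"]).size()

# Reindexing against mi adds the missing pair with a count of 0.
full_counts = counts.reindex(mi, fill_value=0).rename("Count")
print(full_counts)

# Assumption of this sketch: how="left" keeps the row order of full_counts.
out = full_counts.reset_index().merge(df, on=["Person", "Result"], how="left")
print(out)
```

The (Jim, Bad) row survives the merge because it exists on the left side even though no row of df matches it.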
To split the output by person:
names, append = list(zip(*out.groupby("Person")))
>>> names
('Bob', 'Jim')
>>> append
(   Person Result  Count
 6     Bob   Good      2
 7     Bob   Good      2
 8     Bob    Bad      3
 9     Bob    Bad      3
 10    Bob    Bad      3,
   Person Result  Count
 0    Jim   Good      5
 1    Jim   Good      5
 2    Jim   Good      5
 3    Jim   Good      5
 4    Jim   Good      5
 5    Jim    Bad      0)
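Note that groupby sorts group keys by default, which is why names comes back as ('Bob', 'Jim') rather than in the question's order. As a small sketch, assuming out has the shape shown above (it is rebuilt by hand here so the block runs on its own), passing sort=False keeps the groups in order of first appearance. The frames dictionary is a hypothetical convenience for looking up one person's rows and is not part of the answer.

```python
import pandas as pd

# Hand-built copy of the merged result shown above (an assumption of this sketch).
out = pd.DataFrame({
    "Person": ["Jim"] * 6 + ["Bob"] * 5,
    "Result": ["Good"] * 5 + ["Bad"] + ["Good"] * 2 + ["Bad"] * 3,
    "Count": [5] * 5 + [0] + [2] * 2 + [3] * 3,
})

# sort=False yields groups in order of first appearance instead of sorted order.
names, append = zip(*out.groupby("Person", sort=False))
print(names)  # ('Jim', 'Bob')

# Hypothetical helper mapping: one frame per person, keyed by name.
frames = dict(zip(names, append))
print(frames["Jim"])
```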