Pandas:使用2个数据框和通配符复制Excel COUNTIF ?



我正在编写一个脚本,该脚本可以自动执行一些我经常在excel中进行的重复数据转换/清理,我已经能够取得良好的收益,但是我一直被卡在这一点上:

我已经引入了所有相关的数据框架,并做了一些过滤和其他清理。对于我的脚本的这一部分,我已经创建了2个数据框架,df2是我新创建的基于df1的产品ID的唯一列表,我的"主列表";包含产品ID及其采购代码。

我的最终结果是计算产品ID具有p, a, X领先采购代码的次数(这些将被分开到它们自己的列中),但我不确定如何在两个不同的数据框架中做到这一点。

虚构数据样本:

df1 = pd.DataFrame({'ProductID': ["12441","44123","77880","12345","33445","77565","34354","77880","33445", "12345", "12441", "12441","12441","44123"], 
"ProcCode":["P34","P35","P67","P67","X77","P34","P35","P34","X77","P35","A55","P34","P35","A55"]})
ProductID ProcCode
0      12441      P34
1      44123      P35
2      77880      P67
3      12345      P67
4      33445      X77
5      77565      P34
6      34354      P35
7      77880      P34
8      33445      X77
9      12345      P35
10     12441      A55
11     12441      P34
12     12441      P35
13     44123      A55
df2 = pd.DataFrame({"ProductID": ["12441","44123","77880","12345","33445","77565"]})
ProductID
0     12441
1     44123
2     77880
3     12345
4     33445
5     77565

应用countif样式代码后,我应该产生以下数据框架:

df3 = pd.DataFrame({"ProductID":["12441","44123","77880","12345","33445","77565"], "CountofPCode":[3,1,2,3,0,1],"CountofXCode":[0,0,0,0,2,0]})
ProductID  CountofPCode  CountofXCode
0     12441             3             0
1     44123             1             0
2     77880             2             0
3     12345             3             0
4     33445             0             2
5     77565             1             0

一个方法是:

out = (
# Group by the ProductID and First letter. Get Value Counts.
df1.groupby([df1.ProductID, df1.ProcCode.str[0]]).value_counts()
.unstack(-2)  # Unstack the first letter.
.groupby('ProductID').sum()  # Group by ProductID. Get Sum.
)
print(out)

输出:

ProcCode     A    P    X
ProductID
12345      0.0  2.0  0.0
12441      1.0  3.0  0.0
33445      0.0  0.0  2.0
34354      0.0  1.0  0.0
44123      1.0  1.0  0.0
77565      0.0  1.0  0.0
77880      0.0  2.0  0.0

:

df1['ProcCodePrefix'] = df1.ProcCode.str[0]
out = df1.pivot_table(
index='ProductID',
columns='ProcCodePrefix',
values='ProcCodePrefix',
aggfunc='count',
fill_value=0,
)
print(out)

输出:

ProcCodePrefix  A  P  X
ProductID
12345           0  2  0
12441           1  3  0
33445           0  0  2
34354           0  1  0
44123           1  1  0
77565           0  1  0
77880           0  2  0

清理以匹配您的输出:

out.columns.name = None
out.columns = "Countof" + out.columns + "Code"
out = out.reset_index()
print(out)

输出:

ProductID  CountofACode  CountofPCode  CountofXCode
0     12345             0             2             0
1     12441             1             3             0
2     33445             0             0             2
3     34354             0             1             0
4     44123             1             1             0
5     77565             0             1             0
6     77880             0             2             0

相关内容

  • 没有找到相关文章

最新更新