我正在编写一个脚本,该脚本可以自动执行一些我经常在excel中进行的重复数据转换/清理,我已经能够取得良好的收益,但是我一直被卡在这一点上:
我已经引入了所有相关的数据框架,并做了一些过滤和其他清理。对于我的脚本的这一部分,我已经创建了2个数据框架,df2是我新创建的基于df1的产品ID的唯一列表,我的"主列表";包含产品ID及其采购代码。
我的最终结果是计算产品ID具有p, a, X领先采购代码的次数(这些将被分开到它们自己的列中),但我不确定如何在两个不同的数据框架中做到这一点。
虚构数据样本:
df1 = pd.DataFrame({'ProductID': ["12441","44123","77880","12345","33445","77565","34354","77880","33445", "12345", "12441", "12441","12441","44123"],
"ProcCode":["P34","P35","P67","P67","X77","P34","P35","P34","X77","P35","A55","P34","P35","A55"]})
ProductID ProcCode
0 12441 P34
1 44123 P35
2 77880 P67
3 12345 P67
4 33445 X77
5 77565 P34
6 34354 P35
7 77880 P34
8 33445 X77
9 12345 P35
10 12441 A55
11 12441 P34
12 12441 P35
13 44123 A55
df2 = pd.DataFrame({"ProductID": ["12441","44123","77880","12345","33445","77565"]})
ProductID
0 12441
1 44123
2 77880
3 12345
4 33445
5 77565
应用countif样式代码后,我应该产生以下数据框架:
df3 = pd.DataFrame({"ProductID":["12441","44123","77880","12345","33445","77565"], "CountofPCode":[3,1,2,3,0,1],"CountofXCode":[0,0,0,0,2,0]})
ProductID CountofPCode CountofXCode
0 12441 3 0
1 44123 1 0
2 77880 2 0
3 12345 3 0
4 33445 0 2
5 77565 1 0
一个方法是:
out = (
# Group by the ProductID and First letter. Get Value Counts.
df1.groupby([df1.ProductID, df1.ProcCode.str[0]]).value_counts()
.unstack(-2) # Unstack the first letter.
.groupby('ProductID').sum() # Group by ProductID. Get Sum.
)
print(out)
输出:
ProcCode A P X
ProductID
12345 0.0 2.0 0.0
12441 1.0 3.0 0.0
33445 0.0 0.0 2.0
34354 0.0 1.0 0.0
44123 1.0 1.0 0.0
77565 0.0 1.0 0.0
77880 0.0 2.0 0.0
:
df1['ProcCodePrefix'] = df1.ProcCode.str[0]
out = df1.pivot_table(
index='ProductID',
columns='ProcCodePrefix',
values='ProcCodePrefix',
aggfunc='count',
fill_value=0,
)
print(out)
输出:
ProcCodePrefix A P X
ProductID
12345 0 2 0
12441 1 3 0
33445 0 0 2
34354 0 1 0
44123 1 1 0
77565 0 1 0
77880 0 2 0
清理以匹配您的输出:
out.columns.name = None
out.columns = "Countof" + out.columns + "Code"
out = out.reset_index()
print(out)
输出:
ProductID CountofACode CountofPCode CountofXCode
0 12345 0 2 0
1 12441 1 3 0
2 33445 0 0 2
3 34354 0 1 0
4 44123 1 1 0
5 77565 0 1 0
6 77880 0 2 0