我有一个供应商应付账款老化报告,我试图自动化,这是从金融系统导出的。csv文件提供。在报告中,列出了"未开发票的功能金额"一行,后面是$xx。清单上每个供应商的金额为Xx。下面是报告输出的示例(更改了数字):
1000000 Vendor1 USD PO Number 1/1/1900
Item1, Description
100 Each $1.00
INV000000 1/1/1900 000 Each 100 0 $1.00 $24.00
0 0 $24.00
INV000001 1/1/1900 000 Each 50 0 $1.00 $10.50
0 0 $10.50
-------------------
Functional Amount Not Invoiced: $250.00
Amount Not Invoiced Less Returned: $250.00
1000001 Vendor2 USD PO2061994 6/2/2015
Item2, Description 30 Each $38.00
INV000002 7/23/2015 000 Each 9 0 $38.00 $342.00
0 0 $342.00
INV000003 7/23/2015 000 Each 7 0 $38.00 $266.00
0 0 $266.00
-------------------
Functional Amount Not Invoiced: $346,955.00
Amount Not Invoiced Less Returned: $1,245.00
我想知道如何为大于或等于$10,000.00的所有"功能金额未发票"实例解析.csv文件,在这些情况下,取前两个字符串并返回它们(在上面的情况下,我将返回1000000 Vendor1)。下面是我到目前为止的代码:
companyList={'1000000':'Vendor1',...}
with open('Vendor Report.csv',mode='r',encoding='latin1') as file:
csvreader=csv.reader(file)
for row in csvreader:
print(' '.join(row))
if 'Functional Amount Not Invoiced:' in row:
...
我已经到了…部分,我知道逻辑是'如果字符串后的金额至少是$10,000.00,找到供应商ID和供应商名称并返回它们。目标是将所有超过10,000美元的供应商列表自动添加到列表中。我期望的输出如下所示:
Vendor ID Vendor Name $346,955.00
...
IIUC,这里有一个使用read_fwf
和extract
的熊猫选择:
#pip install pandas
import pandas as pd
MIN_AMOUNT = 10000
df = pd.read_fwf("input.csv", header=None)
vendor_vals = df[0].str.extract(r"(d+) ([a-zA-Z]+d+)", expand=False).ffill()
fani_vals = (df.pop(0).str.extract(r"Functional Amount Not Invoiced: $(.*)",
expand=False).replace({r",|.0+": ""}, regex=True).astype(float))
companyList = (
df.assign(VENDOR = vendor_vals, FANI = fani_vals).dropna()
.loc[lambda df_: df_["FANI"].gt(MIN_AMOUNT)].to_dict("list")
)
输出:
>>> print(companyList)
{'VENDOR': ['1000001 Vendor2'], 'FANI': [346955.0]}
更新:
如果你需要一个数据框架(生成。csv),使用这个:
df = pd.read_fwf("input.csv", header=None)
out = (
df.join(df[0].str.extract(r"(d+) ([a-zA-Z]+d+)")
.rename(columns={0: "VENDOR_ID", 1:"VENDOR_NAME"}).ffill())
.assign(FANI = lambda df_: df_.pop(0).str.extract(r"Functional Amount Not Invoiced: $(.*)",
expand=False).replace({r",|.0+": ""}, regex=True).astype(float))
.dropna().loc[lambda df_: df_["FANI"].gt(MIN_AMOUNT)].reset_index(drop=True)
)
输出:
>>> print(out)
VENDOR_ID VENDOR_NAME FANI
0 1000001 Vendor2 346955.0