解释:需要将csv文件转换为json文件。csv包含以下信息:给定一个帐户,该帐户必须支付1或n份文档,其中一份的到期日为当天,另一份的日期为之前。csv文件具有以下结构:
账户 | customer_name | due_dateid_typeid_document | to_pay//tr>||
---|---|---|---|---|
2001 | John Doe | 05082022 | >FC | 987654123113,25 |
2002 | Karl Steven | 05082022 | FC | 987654124113,25 |
2003 | Sarah Miller | <15022022>FC | 98756425 | 199,75|
2004 | Marty Mcfly | <15032022>FC | 98756426 | 113,25|
2001 | John Doe | 06032022 | FC | 98756499150,00 |
2001 | John Doe | 15042022 | FC | 9876547715000 |
2003 | Sarah Miller | 05082022 | FC | 987444123113,25 |
2001 | John Doe | <15032022>FC | 987654123 | 113,25 |
您可以使用.groupby('account')
单独处理每个帐户。
当您有一个帐户时,您可以将第一行([0]
(作为主文档,将其他行([1:]
(作为pending_documents
。
使用.to_dict()
,您可以为account创建dict,稍后您可以将其附加到某个列表中以拥有所有帐户。
具有示例数据的最小工作代码。
我使用io
在内存中创建文件,这样每个人都可以简单地复制和测试它,但您可以使用filename
。
我使用engine='python'
,所以我可以在分隔符sep='s+|s*'
中使用regex
,并删除值周围的所有空格。
data = '''account | customer_name | due_date | id_type | id_document | to_pay |
2001 | John Doe | 05082022 | FC | 987654123 | 113,25 |
2002 | Karl Steven | 05082022 | FC | 987654124 | 113,25 |
2003 | Sarah Miller | 15022022 | FC | 987654125 | 199,75 |
2004 | Marty Mcfly | 15032022 | FC | 987654126 | 113,25 |
2001 | John Doe | 06032022 | FC | 987654999 | 150,00 |
2001 | John Doe | 15042022 | FC | 987654777 | 15,00 |
2003 | Sarah Miller | 05082022 | FC | 987444123 | 113,25 |
2001 | John Doe | 15032022 | FC | 987654123 | 113,25 |'''
import pandas as pd
import io
#df = pd.read_csv('filename.csv')
df = pd.read_csv(io.StringIO(data), sep='s+|s*', engine='python')
#print(df)
df = df.drop('Unnamed: 6', axis=1) # remove column after last `|`
#print(df)
all_accounts = []
for selected, group in df.groupby(['account']):
#print('---', selected, '---')
account = group.iloc[0].to_dict()
docs = group.iloc[1:][['id_type', 'id_document', 'to_pay', 'due_date']]
account["pending_documents"] = docs.to_dict(orient='records')
#print(account)
all_accounts.append(account)
import json
print(json.dumps(all_accounts, indent=2))
结果:
[
{
"account": 2001,
"customer_name": "John Doe",
"due_date": 5082022,
"id_type": "FC",
"id_document": 987654123,
"to_pay": "113,25",
"pending_documents": [
{
"id_type": "FC",
"id_document": 987654999,
"to_pay": "150,00",
"due_date": 6032022
},
{
"id_type": "FC",
"id_document": 987654777,
"to_pay": "15,00",
"due_date": 15042022
},
{
"id_type": "FC",
"id_document": 987654123,
"to_pay": "113,25",
"due_date": 15032022
}
]
},
{
"account": 2002,
"customer_name": "Karl Steven",
"due_date": 5082022,
"id_type": "FC",
"id_document": 987654124,
"to_pay": "113,25",
"pending_documents": []
},
{
"account": 2003,
"customer_name": "Sarah Miller",
"due_date": 15022022,
"id_type": "FC",
"id_document": 987654125,
"to_pay": "199,75",
"pending_documents": [
{
"id_type": "FC",
"id_document": 987444123,
"to_pay": "113,25",
"due_date": 5082022
}
]
},
{
"account": 2004,
"customer_name": "Marty Mcfly",
"due_date": 15032022,
"id_type": "FC",
"id_document": 987654126,
"to_pay": "113,25",
"pending_documents": []
}
]