ID:123, Qty:1, Name:abcdef, ID: 324, Qty:2, Name:giefg,
ID:123, Qty:1, Name:abcdef, ID:123, Qty:1, Name:abcdef, ID:123, Qty:1, Name:abcdef,
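I have a details column in my df that can hold information for several customers, as shown in the first table.
If the details column holds information for more than one customer, I need to create a new row for each customer, as shown in the second table.
How can I do this in Python?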
Split the string with a lookahead regex, then explode it:
df['details'] = df['details'].str.split(r',\s*(?=ID:)')
df.explode('details')
Output:
    date   time  consumer                       details
0  12/11  12:00       abc   ID:123, Qty: 1,Name: abcdef
0  12/11  12:00       abc       ID: 324,Qty:2,Name: ghi
1  13/11  13:00       def  ID:123, Qty: 1,Name: abcdef,
2  14/11  11:00       ghj   ID:123, Qty: 1,Name: abcdef
2  14/11  11:00       ghj      ID:123, Qty: 1,Name: abc
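For a self-contained check, the split-and-explode steps above can be run end to end on a small frame. This is only a minimal sketch: the sample rows are modeled on the question's data, and the names `pattern` and `out` are illustrative, not taken from the original answer.

```python
import pandas as pd

# Sample data modeled on the question; the second row keeps its trailing comma.
df = pd.DataFrame(
    {
        "date": ["12/11", "13/11", "14/11"],
        "time": ["12:00", "13:00", "11:00"],
        "consumer": ["abc", "def", "ghj"],
        "details": [
            "ID:123, Qty: 1,Name: abcdef, ID: 324,Qty:2,Name: ghi",
            "ID:123, Qty: 1,Name: abcdef,",
            "ID:123, Qty: 1,Name: abcdef,ID:123, Qty: 1,Name: abc",
        ],
    }
)

# Split on a comma (plus optional whitespace) only when "ID:" follows.
# The lookahead (?=ID:) is not consumed, so each piece still starts with "ID:".
pattern = r",\s*(?=ID:)"
df["details"] = df["details"].str.split(pattern)

# explode turns each list element into its own row and repeats the other columns.
out = df.explode("details")
print(out)
```

The commas between Qty and Name are left alone because no "ID:" follows them. The original row index is repeated for the exploded rows; chaining `.reset_index(drop=True)` gives a fresh 0..n-1 index if that is preferred.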
Assuming each customer has exactly 3 fields (ID, Qty, Name), you can try this.
import pandas as pd

df = pd.DataFrame(
    {
        "date": ["12/11", "13/11", "14/11"],
        "time": ["12:00", "13:00", "11:00"],
        "consumer": ["abc", "def", "ghj"],
        "details": [
            "ID:123, Qty: 1,Name: abcdef,ID: 324,Qty:2,Name: ghi",
            "ID:123, Qty: 1,Name: abcdef",
            "ID:123, Qty: 1,Name: abcdef,ID:123, Qty: 1,Name: abc",
        ],
    }
)

# Split on commas, then rejoin every 3 consecutive fields (ID, Qty, Name)
# into one customer record.
df["details"] = df["details"].apply(
    lambda txt: [
        ",".join(txt.split(",")[idx : idx + 3])
        for idx in range(0, len(txt.split(",")), 3)
    ]
)

# One row per customer record, with a fresh 0..n-1 index.
df = df.explode("details").reset_index(drop=True)
print(df)

Sample output:
    date   time  consumer                      details
0  12/11  12:00       abc  ID:123, Qty: 1,Name: abcdef
1  12/11  12:00       abc      ID: 324,Qty:2,Name: ghi
2  13/11  13:00       def  ID:123, Qty: 1,Name: abcdef
3  14/11  11:00       ghj  ID:123, Qty: 1,Name: abcdef
4  14/11  11:00       ghj     ID:123, Qty: 1,Name: abc
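The fixed-size grouping relies on the string splitting into an exact multiple of three pieces. A trailing comma, like the one on the 13/11 row in the first answer's output, would add an empty piece and produce an extra empty record. One possible way to guard against that is sketched below; `chunk_fields` is a hypothetical helper name, not something from pandas or from the answer above, and it still assumes that no field value contains a comma.

```python
def chunk_fields(txt, size=3):
    """Group comma-separated fields into records of `size` fields each.

    Hypothetical helper for illustration. Empty pieces (for example the
    one left by a trailing comma) are dropped before grouping.
    """
    fields = [piece for piece in txt.split(",") if piece.strip()]
    return [",".join(fields[i : i + size]) for i in range(0, len(fields), size)]


print(chunk_fields("ID:123, Qty: 1,Name: abcdef,ID: 324,Qty:2,Name: ghi"))
print(chunk_fields("ID:123, Qty: 1,Name: abcdef,"))

# With a DataFrame, the helper would take the place of the lambda:
# df["details"] = df["details"].apply(chunk_fields)
```

Splitting once and filtering also avoids calling `txt.split(",")` twice per row, which the lambda version does.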