导入CSV文件后,我得到这样的输入:
>>> df.head()
id customer products
0 0 0 1 0 1 0
1 1 22 0 5 0 4 5 3 2 1 1
2 2 57 5 2 4
...
我想改变它,以便它可以匹配SQL数据库模式:
order_id, customer_id, product_id, product_qty, primary key(order_id, customer_id, product_id))
想知道是否使用pandas
和to_sql
(在SQLite中测试)可以实现这一点,期望的输出将是:
id customer product_id product_qty
0 0 0 2
0 0 1 2
1 22 0 2
1 22 1 2
1 22 2 1
1 22 3 1
1 22 4 1
1 22 5 2
2 57 2 1
2 57 4 1
2 57 5 1
- 系列。将字符串转换为列表
- DataFrame。将列表变量解嵌到 行
- DataFrame。
请看下面的例子:
import re
from io import StringIO
import pandas as pd
data = """
id customer products
0 0 1 0 1 0
1 22 0 5 0 4 5 3 2 1 1
2 57 5 2 4
""".strip()
data = re.sub(r"s{2,}", ",", data)
df = pd.read_csv(StringIO(data))
df["product_id"] = df.products.str.split()
df = df.explode("product_id", ignore_index=False)
result = df.groupby(["id", "customer", "product_id"], as_index=False).agg(
product_qty=("product_id", "size"))
result
id customer product_id product_qty
0 0 0 0 2
1 0 0 1 2
2 1 22 0 2
3 1 22 1 2
4 1 22 2 1
5 1 22 3 1
6 1 22 4 1
7 1 22 5 2
8 2 57 2 1
9 2 57 4 1
10 2 57 5 1
In [72]: df = pd.DataFrame({'id': [0,1,2], 'customer': [0,22,57], 'products': ['1 0 1 0', '0 5 0 4 5 3 2 1 1', '5 2 4']})
In [73]: df.assign(product_id=df['products'].str.split()).explode('product_id').drop('products', axis='columns').groupby(['id', 'customer', 'pr
...: oduct_id']).size().reset_index(name='product_qty')
Out[73]:
id customer product_id product_qty
0 0 0 0 2
1 0 0 1 2
2 1 22 0 2
3 1 22 1 2
4 1 22 2 1
5 1 22 3 1
6 1 22 4 1
7 1 22 5 2
8 2 57 2 1
9 2 57 4 1
10 2 57 5 1
查看
out = df.assign(products=df.products.str.split()).explode('products').value_counts(list(df)).reset_index(name='product_qty')
id customer products product_qty
0 0 0 0 2
1 0 0 1 2
2 1 22 0 2
3 1 22 1 2
4 1 22 5 2
5 1 22 2 1
6 1 22 3 1
7 1 22 4 1
8 2 57 2 1
9 2 57 4 1
10 2 57 5 1