表示如何创建一对多关系(添加数量列)

导入CSV文件后，我得到这样的输入:

>>> df.head()
id  customer           products
0   0         0            1 0 1 0
1   1        22  0 5 0 4 5 3 2 1 1
2   2        57              5 2 4
...

我想改变它，以便它可以匹配SQL数据库模式:

order_id, customer_id, product_id, product_qty, primary key(order_id, customer_id, product_id))

想知道是否使用pandas和to_sql(在SQLite中测试)可以实现这一点，期望的输出将是:

id customer product_id product_qty
0   0       0          2     
0   0       1          2
1   22      0          2
1   22      1          2
1   22      2          1 
1   22      3          1
1   22      4          1
1   22      5          2
2   57      2          1
2   57      4          1
2   57      5          1

系列。将字符串转换为列表
DataFrame。将列表变量解嵌到
DataFrame。

请看下面的例子:

import re
from io import StringIO
import pandas as pd
data = """
id  customer           products
0         0            1 0 1 0
1        22  0 5 0 4 5 3 2 1 1
2        57              5 2 4
""".strip()
data = re.sub(r"s{2,}", ",", data)
df = pd.read_csv(StringIO(data))
df["product_id"] = df.products.str.split()
df = df.explode("product_id", ignore_index=False)
result = df.groupby(["id", "customer", "product_id"], as_index=False).agg(
product_qty=("product_id", "size"))
result

id  customer product_id  product_qty
0    0         0          0            2
1    0         0          1            2
2    1        22          0            2
3    1        22          1            2
4    1        22          2            1
5    1        22          3            1
6    1        22          4            1
7    1        22          5            2
8    2        57          2            1
9    2        57          4            1
10   2        57          5            1

In [72]: df = pd.DataFrame({'id': [0,1,2], 'customer': [0,22,57], 'products': ['1 0 1 0', '0 5 0 4 5 3 2 1 1', '5 2 4']})
In [73]: df.assign(product_id=df['products'].str.split()).explode('product_id').drop('products', axis='columns').groupby(['id', 'customer', 'pr
...: oduct_id']).size().reset_index(name='product_qty')
Out[73]:
id  customer product_id  product_qty
0    0         0          0            2
1    0         0          1            2
2    1        22          0            2
3    1        22          1            2
4    1        22          2            1
5    1        22          3            1
6    1        22          4            1
7    1        22          5            2
8    2        57          2            1
9    2        57          4            1
10   2        57          5            1

查看

out = df.assign(products=df.products.str.split()).explode('products').value_counts(list(df)).reset_index(name='product_qty')
id  customer products  product_qty
0    0         0        0            2
1    0         0        1            2
2    1        22        0            2
3    1        22        1            2
4    1        22        5            2
5    1        22        2            1
6    1        22        3            1
7    1        22        4            1
8    2        57        2            1
9    2        57        4            1
10   2        57        5            1

相关内容

最新更新

热门标签：