I want to buy my products from as few stores as possible. How can I do that? I have a list of stores, each carrying specific products.
import numpy as np
import pandas as pd

wanted_Products = pd.DataFrame({'p': [1,2,3,4,5,6,7]})
stores = pd.DataFrame({'Store': np.repeat(np.arange(1,5),4),
                       'Product': [1,2,3,5,0,2,3,4,0,6,7,8,0,1,2,6]})
# return 1 if the Product is wanted
stores['Wanted'] = stores.Product.isin(wanted_Products.p).astype(int)
    Store  Product  Wanted
0       1        1       1
1       1        2       1
2       1        3       1
3       1        5       1
4       2        0       0
5       2        2       1
6       2        3       1
7       2        4       1
8       3        0       0
9       3        6       1
10      3        7       1
11      3        8       0
12      4        0       0
13      4        1       1
14      4        2       1
15      4        6       1
# Group products per store and calculate how many wanted products are in a store
w = stores.groupby('Store', as_index=False).agg(list)
w['Number_wanted'] = stores.groupby('Store')['Wanted'].sum().to_numpy()
   Store       Product        Wanted  Number_wanted ?Products_wanted?
0      1  [1, 2, 3, 5]  [1, 1, 1, 1]              4         [1,2,3,5]
1      2  [0, 2, 3, 4]  [0, 1, 1, 1]              3           [2,3,4]
2      3  [0, 6, 7, 8]  [0, 1, 1, 0]              2             [6,7]
3      4  [0, 1, 2, 6]  [0, 1, 1, 1]              3           [1,2,6]
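How can I get the products I want, without the unwanted ones, into a new column (Products_wanted)? When I use isin(), I only get True/False (or 1/0 if I add astype(int)), not the actual product numbers.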
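For the Products_wanted column on its own, one possible sketch is to filter each store's list against the wanted products instead of converting the isin() result to 0/1. The name wanted_set and the list comprehension are illustrative assumptions, not part of the original code.

```python
import numpy as np
import pandas as pd

wanted_Products = pd.DataFrame({'p': [1, 2, 3, 4, 5, 6, 7]})
stores = pd.DataFrame({'Store': np.repeat(np.arange(1, 5), 4),
                       'Product': [1, 2, 3, 5, 0, 2, 3, 4, 0, 6, 7, 8, 0, 1, 2, 6]})

# One row per store, with all of its products collected into a list
w = stores.groupby('Store', as_index=False).agg(list)

# wanted_set is a hypothetical helper name; a set keeps the membership test cheap
wanted_set = set(wanted_Products.p)

# Keep the actual product numbers that are wanted, in their original order
w['Products_wanted'] = w.Product.apply(lambda ps: [p for p in ps if p in wanted_set])
w['Number_wanted'] = w.Products_wanted.apply(len)
```

This only answers "which wanted products does each store carry"; it does not yet avoid picking the same product at several stores.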
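One way to do this is to keep track of all the products available in a store, take them, and then mark those products as "taken" so that you don't pick the same ones again at the next store.

So initially you have wanted_Products = [1,2,3,4,5,6,7]. Since store 1 offers [1, 2, 3, 5], you pick those and return them as the products to get from store 1, then mark them all as "taken" by simply replacing those values in wanted_Products with some other value such as -1 (or any other value you like that signals they have been taken).

Now wanted_Products = [-1,-1,-1,4,-1,6,7]. The products set to -1 are taken, so at the following stores you can only pick from [4,6,7]. Repeating the same logic for every store gives you the products to get from each one, without any duplicates: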
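def get_products(possible, wanted):
    # positions in `wanted` whose product is sold by this store
    i = np.where(np.isin(wanted, possible))
    available = wanted[i]  # fancy indexing returns a copy
    wanted[i] = -1         # mark as taken so later stores skip them
    return available

w = stores.groupby('Store', as_index=False).agg(list)
# every call shares one writable 1-D copy of the wanted products
w['Products to get'] = w.Product.apply(get_products, args=(np.array(wanted_Products).ravel(),))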
Output:
>>> w
   Store       Product Products to get
0      1  [1, 2, 3, 5]    [1, 2, 3, 5]
1      2  [0, 2, 3, 4]             [4]
2      3  [0, 6, 7, 8]          [6, 7]
3      4  [0, 1, 2, 6]              []
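To tie this back to the original goal, a small sketch of how the stores that actually need a visit could be read off the result. The frame below simply retypes the output table above by hand, and to_visit is a name made up for this illustration.

```python
import pandas as pd

# Hand-copied from the output table above (assumption: same column names)
w = pd.DataFrame({
    'Store': [1, 2, 3, 4],
    'Products to get': [[1, 2, 3, 5], [4], [6, 7], []],
})

# A store only needs a visit if something is left to buy there
to_visit = w.loc[w['Products to get'].map(len) > 0, 'Store'].tolist()
print(to_visit)  # [1, 2, 3]
```

Note that this approach walks the stores in their given order, so the number of stores visited can depend on that order.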
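To honor your optimization criterion (always buy from the store that has the most products on your list), the per-store product lists need to be sorted again at every iteration: each time you decide to get a set of products from a given store, the remaining lists need to be cleaned up (already purchased products removed) and re-ordered by length.
As a technical note, I convert your lists to sets. That is fine since you don't want duplicates anyway, and it gives us set operations: intersection (to check which wanted products a given store carries) and difference (to remove purchased products from the wanted list).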
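As a quick illustration of those two set operations, here is a minimal sketch using store 1 from the example data. The variable names store_1 and available are chosen only for this snippet.

```python
wanted = {1, 2, 3, 4, 5, 6, 7}
store_1 = {1, 2, 3, 5}  # products sold by store 1 in the example data

# Intersection: which wanted products does this store carry?
available = store_1 & wanted

# Difference, in place: drop what was just bought from the wanted set
wanted -= available

print(sorted(available))  # [1, 2, 3, 5]
print(sorted(wanted))     # [4, 6, 7]
```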
The full code below isn't very elegant, but I've included plenty of comments:
import numpy as np
import pandas as pd

stores = pd.DataFrame({'Store': np.repeat(np.arange(1,5),4),
                       'Product': [1,2,3,5,0,2,3,4,0,6,7,8,0,1,2,6]})
# stores = pd.DataFrame({'Store': np.repeat(np.arange(1,5),4),
#                        'Product': [0,2,7,6,0,2,4,8,1,2,7,6,1,2,3,5]})
w = stores.groupby('Store', as_index=False).agg(list)
w['Products to get'] = np.nan
w['Products to get'] = w['Products to get'].astype('object')
wanted_Products = [1,2,3,4,5,6,7]
wanted = set(wanted_Products)
# Work on a copy so that w keeps the original product lists
tmp = w[['Store', 'Product']].copy()
# Stop when everything is bought or no store is left to try
while len(wanted) > 0 and len(tmp) > 0:
    # Remove unwanted products (set intersection)
    tmp['Product'] = tmp.Product.apply(lambda x: set(x) & wanted)
    # Sort on length of product sets
    tmp['lengths'] = tmp.Product.apply(len)
    tmp = tmp.sort_values(by='lengths', ascending=False).drop(columns='lengths')
    # Get products from this store, remove them from wanted set
    get = tmp.loc[tmp.index[0], 'Product'] & wanted
    wanted -= get
    # Update Products to get for this store
    row = w[w['Store'] == tmp.loc[tmp.index[0], 'Store']]
    w.at[row.index[0], 'Products to get'] = get
    # Remove the largest product set, work on the remaining ones
    tmp = tmp.iloc[1:].copy()
Here is the output:
In [71]: w
Out[71]:
   Store       Product Products to get
0      1  [1, 2, 3, 5]    {1, 2, 3, 5}
1      2  [0, 2, 3, 4]             {4}
2      3  [0, 6, 7, 8]          {6, 7}
3      4  [0, 1, 2, 6]             NaN
It still works when stores 3 and 4 carry more of the wanted products:
stores = pd.DataFrame({'Store': np.repeat(np.arange(1,5),4),
                       'Product': [0,2,7,6,0,2,4,8,1,2,7,6,1,2,3,5]})
The output is:
In [76]: w
Out[76]:
   Store       Product Products to get
0      1  [0, 2, 7, 6]             NaN
1      2  [0, 2, 4, 8]             {4}
2      3  [1, 2, 7, 6]    {1, 2, 6, 7}
3      4  [1, 2, 3, 5]          {3, 5}
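The same "pick the store with the most remaining wanted products, then repeat" idea can also be sketched without pandas. This is a greedy heuristic written under my own assumptions: the function name greedy_store_plan and the dict-based input are hypothetical, and ties go to whichever store comes first in the dict.

```python
def greedy_store_plan(store_products, wanted):
    """Repeatedly pick the store that carries the most still-wanted products."""
    remaining = set(wanted)
    candidates = {store: set(products) for store, products in store_products.items()}
    plan = {}
    while remaining and candidates:
        # Store with the largest overlap with what is still wanted
        best = max(candidates, key=lambda s: len(candidates[s] & remaining))
        take = candidates.pop(best) & remaining
        if not take:
            break  # no remaining store carries anything still wanted
        plan[best] = sorted(take)
        remaining -= take
    return plan, remaining


# Second example data set from above
store_products = {1: [0, 2, 7, 6], 2: [0, 2, 4, 8], 3: [1, 2, 7, 6], 4: [1, 2, 3, 5]}
plan, missing = greedy_store_plan(store_products, range(1, 8))
print(plan)     # {3: [1, 2, 6, 7], 4: [3, 5], 2: [4]}
print(missing)  # set()
```

A greedy choice like this is not guaranteed to give the smallest possible number of stores in every case; for a handful of stores, trying all store combinations would be a way to check.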