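I don't think I expressed myself well in the title, but basically this is what I need to do. I have a very large list in which index 1 holds the name of the comic, index 2 the unit price, index 3 the number of units sold, and index 4 the total amount paid.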
[['1', 'Tintin', '9.95', '3', '29.85'], ['2', 'Asterix', '12.5', '3', '37.5'], ['3', 'Asterix', '12.5', '3', '37.5'], ['4', 'Asterix', '12.5', '2', '25']]
I need to find the units sold and the total amount paid for each comic. For example, for Asterix here that would be:
['Asterix', 12.5, 8, 100]
Any ideas?
data = [['1', 'Tintin', '9.95', '3', '29.85'], ['2', 'Asterix', '12.5', '3',
         '37.5'], ['3', 'Asterix', '12.5', '3', '37.5'], ['4', 'Asterix', '12.5',
         '2', '25']]

store = {}
for i in data:
    if i[1] not in store:
        # first time this comic is seen: [name, unit price, units sold, total paid]
        store[i[1]] = ['', 0, 0, 0]
        store[i[1]][0] = i[1]
        store[i[1]][1] = float(i[2])
    store[i[1]][2] += int(i[3])    # units sold (index 3)
    store[i[1]][3] += float(i[4])  # total paid (index 4)
print(list(store.values()))
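A variation on the same idea, as a minimal sketch: `collections.defaultdict` removes the explicit membership test. The `summarize` name and the choice to keep the first unit price seen for each title are assumptions of this sketch, not something the question specifies.

```python
from collections import defaultdict

def summarize(rows):
    """Aggregate [id, name, unit_price, units, total] rows per comic name."""
    # name -> [unit price (first seen), units sold, total paid]
    totals = defaultdict(lambda: [None, 0, 0.0])
    for _, name, unit_price, units, total in rows:
        entry = totals[name]
        if entry[0] is None:
            entry[0] = float(unit_price)
        entry[1] += int(units)
        entry[2] += float(total)
    return [[name, *values] for name, values in totals.items()]

rows = [['1', 'Tintin', '9.95', '3', '29.85'], ['2', 'Asterix', '12.5', '3', '37.5'],
        ['3', 'Asterix', '12.5', '3', '37.5'], ['4', 'Asterix', '12.5', '2', '25']]
print(summarize(rows))
```

The list is scanned only once, which matters if it really is very large.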
There are already good answers here; just to post an alternative for larger cases, you could also consider using pandas:
import pandas as pd
purchase_list = [['1', 'Tintin', '9.95', '3', '29.85'], ['2', 'Asterix', '12.5', '3', '37.5'], ['3', 'Asterix', '12.5', '3', '37.5'], ['4', 'Asterix', '12.5', '2', '25']]
purchase_list = [(int(pid), name, float(price), int(count), float(total)) for pid, name, price, count, total in purchase_list]
df = pd.DataFrame.from_records(purchase_list, columns=['id', 'Name', 'price', 'count', 'total'])
So now the data has been converted into a pandas.DataFrame:
   id     Name  price  count  total
0   1   Tintin   9.95      3  29.85
1   2  Asterix  12.50      3  37.50
2   3  Asterix  12.50      3  37.50
3   4  Asterix  12.50      2  25.00
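To group the rows, we can define which function to use for merging each column, then group by name and aggregate with those criteria: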
d = {'price': 'first', 'Name': 'first', 'count': 'sum' ,'total': 'sum'}
df_grouped = df.groupby('Name').aggregate(d)
Output:
         price     Name  count   total
Name
Asterix  12.50  Asterix      8  100.00
Tintin    9.95   Tintin      3   29.85
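Write a function that takes a given name and some data and iterates over the data to perform the logic you described. Just make sure to coerce the values to the correct numeric types before adding them up.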
def stats(comic_data, name):
    unit_price = None
    num_sold = 0
    revenue = 0
    for comic in comic_data:
        _, cname, unit_p, num, amnt = comic
        if cname == name:
            if unit_price is None:
                unit_price = float(unit_p)
            num_sold += int(num)
            revenue += float(amnt)
    return [name, unit_price, num_sold, round(revenue, 2)]

stats(data, "Asterix")
>> ['Asterix', 12.5, 8, 100.0]
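To get the summary for every title rather than a single one, the function can be called once per distinct name. A rough sketch, with `stats` repeated so the snippet runs on its own; `all_stats` is a hypothetical helper, not part of the answer above.

```python
def stats(comic_data, name):
    unit_price = None
    num_sold = 0
    revenue = 0
    for _, cname, unit_p, num, amnt in comic_data:
        if cname == name:
            if unit_price is None:
                unit_price = float(unit_p)
            num_sold += int(num)
            revenue += float(amnt)
    return [name, unit_price, num_sold, round(revenue, 2)]

def all_stats(comic_data):
    # dict.fromkeys drops duplicate names while keeping first-seen order
    names = dict.fromkeys(row[1] for row in comic_data)
    return [stats(comic_data, name) for name in names]

data = [['1', 'Tintin', '9.95', '3', '29.85'], ['2', 'Asterix', '12.5', '3', '37.5'],
        ['3', 'Asterix', '12.5', '3', '37.5'], ['4', 'Asterix', '12.5', '2', '25']]
print(all_stats(data))
```

Note that this rescans the whole list once per name, so for a very large list with many titles a single-pass dictionary approach is likely to scale better.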
It sounds like what you actually have are tuples that really represent purchase orders (or invoices?), so model them as such.
from dataclasses import dataclass
# for nice translations between the string price and the int price
from decimal import Decimal

@dataclass
class PurchaseOrder:
    name: str
    price: int  # in cents
    quantity: int
    total_price: int  # should be price * quantity

    @classmethod
    def from_tuple(cls, tup):
        _, name, price, quantity, total_price = tup
        price = int(Decimal(price) * 100)
        quantity = int(quantity)
        total_price = int(Decimal(total_price) * 100)
        return cls(name, price, quantity, total_price)

raw_purchase_orders = [['1', 'Tintin', '9.95', '3', '29.85'], ['2', 'Asterix', '12.5', '3', '37.5'], ['3', 'Asterix', '12.5', '3', '37.5'], ['4', 'Asterix', '12.5', '2', '25']]

# Populate purchase orders
purchase_orders = []
for raw_po in raw_purchase_orders:
    try:
        po = PurchaseOrder.from_tuple(raw_po)
    except ValueError:  # not the right format, not enough values to unpack etc
        pass
    except TypeError:  # wrong types in wrong places
        pass
    else:
        purchase_orders.append(po)
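One thing to watch when skipping bad rows this way: as far as I know, `Decimal` reports a malformed numeric string with `decimal.InvalidOperation`, which derives from `ArithmeticError` rather than `ValueError`, so a row with a non-numeric price would not be caught by `except ValueError`. A small sketch, where `to_cents` is a hypothetical helper introduced only for illustration:

```python
from decimal import Decimal, InvalidOperation

def to_cents(text):
    """Convert a price string such as '9.95' to integer cents, or None if malformed."""
    try:
        return int(Decimal(text) * 100)
    except InvalidOperation:  # raised for strings like 'abc'
        return None

print(to_cents('9.95'))
print(to_cents('12.5'))
print(to_cents('abc'))
# InvalidOperation is not a ValueError, so `except ValueError` would miss it
print(issubclass(InvalidOperation, ValueError))
```

Catching `InvalidOperation` alongside `ValueError` and `TypeError` in the loop would be one way to keep the "skip malformed rows" behaviour.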
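Now that we have a proper model for what we're looking at, we can do some work based on those fields. Let's use sorting and itertools.groupby to get it into a sensible format.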
from itertools import groupby
from operator import attrgetter

groups = groupby(sorted(purchase_orders, key=attrgetter('name')), attrgetter('name'))
aggregate_purchase_orders = []
for groupname, group in groups:
    dataset = list(group)  # otherwise we can only iterate once
    name = groupname
    price = sum(po.price for po in dataset) // len(dataset)  # average price
    quantity = sum(po.quantity for po in dataset)
    total_price = sum(po.total_price for po in dataset)
    po = PurchaseOrder(name, price, quantity, total_price)
    aggregate_purchase_orders.append(po)

print(aggregate_purchase_orders)
# see output like:
# [
#     PurchaseOrder(name='Asterix', price=1250, quantity=8, total_price=10000),
#     PurchaseOrder(name='Tintin', price=995, quantity=3, total_price=2985)
# ]
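The `sorted` call matters because `itertools.groupby` only merges adjacent items that share a key. A minimal sketch of the difference, using made-up `(name, quantity)` pairs rather than the purchase-order objects; `totals` is a hypothetical helper:

```python
from itertools import groupby
from operator import itemgetter

pairs = [('Asterix', 3), ('Tintin', 3), ('Asterix', 2)]

def totals(items):
    # Sum quantities for each run of consecutive items with the same name
    return [(name, sum(qty for _, qty in group))
            for name, group in groupby(items, key=itemgetter(0))]

unsorted_result = totals(pairs)
sorted_result = totals(sorted(pairs, key=itemgetter(0)))

print(unsorted_result)  # 'Asterix' shows up as two separate groups
print(sorted_result)    # one group per name
```

Without the sort, the two Asterix entries are separated by Tintin and end up in different groups.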