How can I replace lists/tuples with dictionaries in working code to improve its performance?



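I have this code, which works fine and is minimal and reproducible. It uses lists and tuples. Because lists and tuples are slow on large amounts of data, I would like to change the whole setup and use dictionaries to speed things up.

So I'd like to convert this block of code into something similar that uses dictionaries.

The purpose of the code is to create the variables x and y (calculations on numeric data) and add them to a list using append and tuples. I then mine these numbers for certain purposes.

How can I add dictionaries where needed and use them to replace the list/append code? Thank you!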

VERSION WITH TUPLE AND LIST

mylist = {('Jack', 'Grace', 8, 9, '15:00'): [0, 1, 1, 5],
          ('William', 'Dawson', 8, 9, '18:00'): [1, 2, 3, 4],
          ('Natasha', 'Jonson', 8, 9, '20:45'): [0, 1, 1, 2]}
new = []
for key, value in mylist.items():
    # create variables and perform calculations
    calc_x = sum(value) / len(value)
    calc_y = (calc_x * 100) / 2
    # append a record made of 3 one-element lists: [key], [calc_x], [calc_y]
    if calc_x > 0.1:
        new.append([[key], [calc_x], [calc_y]])
print(new)
print(" ")
# example of accessing calc_x
print_x = [tuple(i[1]) for i in new]
print(print_x)

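I tried writing something like this, but I don't think it is suitable, so please disregard it. If possible, I have two requests:

  • I would like sum(value) / len(value) and (calc_x * 100) / 2 to keep their own variables, calc_x and calc_y, so that they can be accessed individually, as you can see above.
  • In the new variable, I would like to be able to access the values when needed, for example as I do with print_x = [tuple(i[1]) for i in new]. Thank you.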

If you really want to improve performance, you can use Pandas (or NumPy) to vectorize the math operations:

import pandas as pd
# Transform your dataset to DataFrame
df = pd.DataFrame.from_dict(mylist, orient='index')
# Compute some operations
df['x'] = df.mean(axis=1)
df['y'] = df['x'] * 50
# Filter out and export
out = df.loc[df['x'] > 0.1, ['x', 'y']].to_dict('split')
new = dict(zip(out['index'], out['data']))

Output:

>>> new
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}

NumPy version:

import numpy as np
# transform keys to numpy array (special hack to keep tuples)
keys = np.empty(len(mylist), dtype=object)
keys[:] = tuple(mylist.keys())
# transform values to numpy array
vals = np.array(tuple(mylist.values()))
x = np.mean(vals, axis=1)
y = x * 50
# boolean mask to exclude some values
m = x > 0.1
out = np.vstack([x, y]).T
new = dict(zip(keys[m].tolist(), out[m].tolist()))
print(new)
# Output
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}

Pure Python version:

new = {}
for k, v in mylist.items():
    x = sum(v) / len(v)
    y = x * 50
    if x > 0.1:
        new[k] = [x, y]
print(new)
# Output
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}
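
If calc_x and calc_y should stay individually addressable, as the question asks, one option is to store a small dict per key instead of a two-element list. This is only a minimal sketch and not part of the answer above: the helper name summarize, the threshold parameter, and the field names "calc_x" and "calc_y" are assumptions chosen for illustration, and it assumes every value list is non-empty.

```python
mylist = {
    ('Jack', 'Grace', 8, 9, '15:00'): [0, 1, 1, 5],
    ('William', 'Dawson', 8, 9, '18:00'): [1, 2, 3, 4],
    ('Natasha', 'Jonson', 8, 9, '20:45'): [0, 1, 1, 2],
}


def summarize(data, threshold=0.1):
    """Hypothetical helper: map each key to its named calc_x / calc_y values."""
    result = {}
    for key, values in data.items():
        # Same calculations as in the question (assumes `values` is non-empty).
        calc_x = sum(values) / len(values)
        calc_y = (calc_x * 100) / 2
        if calc_x > threshold:
            result[key] = {"calc_x": calc_x, "calc_y": calc_y}
    return result


new_named = summarize(mylist)

# Access a single value by key and by field name.
print(new_named[('Jack', 'Grace', 8, 9, '15:00')]["calc_x"])  # 1.75

# Collect one field across all records.
print([fields["calc_y"] for fields in new_named.values()])  # [87.5, 125.0, 50.0]
```

Named fields cost a little more memory than a plain list per entry, but they avoid having to remember that index 0 is x and index 1 is y.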

How to extract x:

# Pandas
>>> df['x'].tolist()  # or simply df['x'] to extract the column
[1.75, 2.5, 1.0]
# Python
>>> [v[0] for v in new.values()]
[1.75, 2.5, 1.0]
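
Where a dict is likely to help most is fetching a single record by its key: the dict is looked up by hash, while the nested-list layout from the question has to be scanned entry by entry. The sketch below illustrates the difference under stated assumptions: the contents of new are copied from the output shown above, and find_in_list is a hypothetical helper written only for the comparison.

```python
# `new` in the shape produced by the dict-based versions (values copied from the output).
new = {
    ('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
    ('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
    ('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0],
}

# The list layout from the question, rebuilt here only for comparison.
new_as_list = [[[k], [x], [y]] for k, (x, y) in new.items()]


def find_in_list(records, key):
    """Hypothetical helper: linear scan over the nested-list layout."""
    for rec_key, rec_x, rec_y in records:
        if rec_key[0] == key:
            return rec_x[0], rec_y[0]
    return None


target = ('William', 'Dawson', 8, 9, '18:00')

# Dict: direct lookup by key, then unpack the two values.
x, y = new[target]
print(x, y)  # 2.5 125.0

# List: every record may need to be inspected before the match is found.
print(find_in_list(new_as_list, target))  # (2.5, 125.0)

# dict.get returns None instead of raising KeyError for a missing key.
print(new.get(('Nobody', 'Here', 0, 0, '00:00')))  # None
```

If the code only ever iterates over all records, as the loops above do, the choice between a list and a dict makes little difference to speed; the benefit shows up when records are looked up by key.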