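I have this code; it works fine and is minimal and reproducible. It uses lists and tuples. Given that lists and tuples are slow on large amounts of data, I want to change the whole setup and use dictionaries to improve performance.
So I want to convert this block into something similar that uses dictionaries.
The purpose of the code is to create the variables x and y (calculations on numeric data) and add them to a list using append and tuples. I then mine these numbers for certain purposes.
How can I add dictionaries where needed and replace the list/append code? Thank you!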
VERSION WITH TUPLE AND LIST
mylist = {('Jack', 'Grace', 8, 9, '15:00'): [0, 1, 1, 5],
          ('William', 'Dawson', 8, 9, '18:00'): [1, 2, 3, 4],
          ('Natasha', 'Jonson', 8, 9, '20:45'): [0, 1, 1, 2]}
new = []
for key, value in mylist.items():
    # create variables and perform calculations
    calc_x = sum(value) / len(value)
    calc_y = (calc_x * 100) / 2
    # append a list holding three one-element lists: [key], [calc_x], [calc_y]
    if calc_x > 0.1:
        new.append([[key], [calc_x], [calc_y]])
print(new)
print(" ")
# example: extract calc_x from every entry
print_x = [tuple(i[1]) for i in new]
print(print_x)
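I tried writing something like this, but I don't think it's right, so please ignore it. If possible, I have two requests:
- I would like sum(value) / len(value) and (calc_x * 100) / 2 to keep their own variables, calc_x and calc_y, so that each can be accessed individually.
- As you can see in the new variable, I would like to be able to access these variables whenever I need them, for example as I do with print_x = [tuple(i[1]) for i in new].
Thank you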
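Staying close to the original loop, one way to address both requests with dictionaries is sketched below. It assumes it is fine to key the results by the original tuple and to store each result as an inner dict; the field names `'calc_x'` and `'calc_y'` are chosen here for illustration only. The calculations keep their own variables, and each value can later be read by name or by key:

```python
mylist = {('Jack', 'Grace', 8, 9, '15:00'): [0, 1, 1, 5],
          ('William', 'Dawson', 8, 9, '18:00'): [1, 2, 3, 4],
          ('Natasha', 'Jonson', 8, 9, '20:45'): [0, 1, 1, 2]}

new = {}
for key, value in mylist.items():
    # keep each calculation in its own named variable
    calc_x = sum(value) / len(value)
    calc_y = (calc_x * 100) / 2
    if calc_x > 0.1:
        # inner dict; the field names are illustrative
        new[key] = {'calc_x': calc_x, 'calc_y': calc_y}

# equivalent of print_x: every calc_x value as a plain number
print_x = [d['calc_x'] for d in new.values()]
print(print_x)  # [1.75, 2.5, 1.0]

# direct lookup of one entry by its key
print(new[('Jack', 'Grace', 8, 9, '15:00')]['calc_y'])  # 87.5
```

Because dicts preserve insertion order (Python 3.7+), print_x comes out in the same order as mylist.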
If you really want to improve performance, you can use Pandas (or NumPy) to vectorize the math operations:
import pandas as pd
# Transform your dataset (mylist from the question) into a DataFrame: one row per key
df = pd.DataFrame.from_dict(mylist, orient='index')
# Compute the row means (x) and y = (x * 100) / 2, i.e. x * 50
df['x'] = df.mean(axis=1)
df['y'] = df['x'] * 50
# Keep the rows where x > 0.1, export the x and y columns, and rebuild a dict
out = df.loc[df['x'] > 0.1, ['x', 'y']].to_dict('split')
new = dict(zip(out['index'], out['data']))
Output:
>>> new
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}
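The last line of the pandas snippet turns the `'split'` export into a dict by pairing each index entry (the original tuple key) with its `[x, y]` row. The sketch below reproduces only that step, using a hand-built dict shaped like the `'index'` / `'columns'` / `'data'` layout of `to_dict('split')` (filled with the values from the output above), so it runs without pandas:

```python
# hand-built stand-in for df.loc[...].to_dict('split')
out = {
    'index': [('Jack', 'Grace', 8, 9, '15:00'),
              ('William', 'Dawson', 8, 9, '18:00'),
              ('Natasha', 'Jonson', 8, 9, '20:45')],
    'columns': ['x', 'y'],
    'data': [[1.75, 87.5], [2.5, 125.0], [1.0, 50.0]],
}

# zip pairs the i-th index entry with the i-th data row
new = dict(zip(out['index'], out['data']))

# the 'columns' list can also label each row, giving access by name
labelled = {k: dict(zip(out['columns'], row)) for k, row in new.items()}
print(labelled[('Natasha', 'Jonson', 8, 9, '20:45')])  # {'x': 1.0, 'y': 50.0}
```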
NumPy version:
import numpy as np
# transform keys to a 1-D object array so each tuple stays one element (special hack: np.array(keys) would build a 2-D array)
keys = np.empty(len(mylist), dtype=object)
keys[:] = tuple(mylist.keys())
# transform values to numpy array
vals = np.array(tuple(mylist.values()))
x = np.mean(vals, axis=1)
y = x * 50
# boolean mask to exclude some values
m = x > 0.1
# stack x and y as columns: one [x, y] row per key
out = np.vstack([x, y]).T
new = dict(zip(keys[m].tolist(), out[m].tolist()))
print(new)
# Output
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}
Pure Python version:
new = {}
for k, v in mylist.items():
    x = sum(v) / len(v)
    y = x * 50
    if x > 0.1:
        new[k] = [x, y]
print(new)
# Output
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}
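If you want to keep the compact `[x, y]` layout but still read each value by name (the calc_x / calc_y request), one option is a `typing.NamedTuple` instead of a list. This is a swapped-in technique, not part of the answer above, and the class name `Result` is hypothetical:

```python
from typing import NamedTuple


class Result(NamedTuple):
    # hypothetical record type: one field per calculated value
    calc_x: float
    calc_y: float


mylist = {('Jack', 'Grace', 8, 9, '15:00'): [0, 1, 1, 5],
          ('William', 'Dawson', 8, 9, '18:00'): [1, 2, 3, 4],
          ('Natasha', 'Jonson', 8, 9, '20:45'): [0, 1, 1, 2]}

new = {}
for k, v in mylist.items():
    x = sum(v) / len(v)
    y = x * 50
    if x > 0.1:
        new[k] = Result(x, y)

print(new[('William', 'Dawson', 8, 9, '18:00')].calc_x)  # 2.5
print([r.calc_y for r in new.values()])  # [87.5, 125.0, 50.0]
```

Since a NamedTuple is still a tuple, index access such as new[k][0] keeps working alongside the named fields.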
How to extract x:
# Pandas
>>> df['x'].tolist() # or simply df['x'] to extract the column
[1.75, 2.5, 1.0]
# Python
>>> [v[0] for v in new.values()]
[1.75, 2.5, 1.0]
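With the plain-Python dict, y can be extracted the same way, and both columns can be split in one step with `zip(*...)`, which transposes the `[x, y]` rows into one tuple per column. A short sketch, starting from the `new` dict printed above:

```python
new = {('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
       ('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
       ('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}

# y column, same pattern as for x
y_values = [v[1] for v in new.values()]
print(y_values)  # [87.5, 125.0, 50.0]

# zip(*rows) transposes the rows into one tuple per column
xs, ys = zip(*new.values())
print(xs)  # (1.75, 2.5, 1.0)

# a single value for a known key
print(new[('Jack', 'Grace', 8, 9, '15:00')][0])  # 1.75
```

Note that `xs, ys = zip(*new.values())` raises ValueError when new is empty, so guard for that case if the filter can remove every row.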