What is the fastest way to look up strings in Python?



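Suppose you have to look up the prices of 100 unique items (in this example I only query 3). What is the fastest way, measured in seconds per query, to look up these item names using a Python data structure?

Here is an example with a pandas DataFrame (it queries only 3 items; I want to query about 50 or 100).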

import pandas as pd
import time
df = pd.DataFrame({"price":[11,33,5,29,999]*100000},index=["car","boat","axe","fork","plane"]*100000)
# now query the price of these items:
time_start = time.time()
query = df.loc[["car","boat","plane"]]
time_elapsed = round(time.time()-time_start,2)
print(f"[INFO] Time elapsed: {time_elapsed} seconds")
print(query) 

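Can you think of a faster approach than the one I have shown? Ideally I am looking for a data structure rather than a database, but I am open to suggestions for databases such as MongoDB (as comments only, not as answers).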

Thanks!

I doubt you will do better than using a boolean mask:

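import pandas as pd

df = pd.DataFrame({"price":[11,33,5,29,999]*100000},index=["car","boat","axe","fork","plane"]*100000)
items = ["car","boat","plane"]

def v1(df, items):
    # Transpose so the index labels become columns, select them, transpose back.
    df = df.copy()
    return df.T[items].T

def v2(df, items):
    # Label-based selection with .loc.
    df = df.copy()
    return df.loc[items]

def v3(df, items):
    # Boolean mask: keep the rows whose index label is in items.
    df = df.copy()
    return df[df.index.isin(items)]

# %timeit is an IPython/Jupyter magic, so run this in IPython or a notebook.
print('Treating index as columns:')
%timeit v1(df, items)
print('\nUsing loc:')
%timeit v2(df, items)
print('\nUsing a mask:')
%timeit v3(df, items)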

Output:

Treating index as columns:
108 ms ± 8.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Using loc:
57 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Using a mask:
12.8 ms ± 305 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
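
The benchmark above uses an index in which every name repeats 100,000 times. If the item names really are unique, as the question states, a plain dict may be worth trying, because each lookup is a hash-table probe rather than a scan of the index. The following is only a minimal sketch under that assumption; `price_by_name` and `lookup_prices` are hypothetical names chosen for illustration, and it has not been timed against the pandas versions above.

```python
# Hypothetical data: every item name is unique, so one name maps to one price.
names = ["car", "boat", "axe", "fork", "plane"]
prices = [11, 33, 5, 29, 999]

# Build the hash table once; each later lookup is O(1) on average.
price_by_name = dict(zip(names, prices))


def lookup_prices(wanted):
    """Return {name: price} for the requested names, skipping unknown names."""
    return {name: price_by_name[name] for name in wanted if name in price_by_name}


result = lookup_prices(["car", "boat", "plane"])
print(result)
```

Whether this beats the mask depends on the data: it only applies when one name maps to one price, and building the dict has a one-off cost that pays off when many queries follow.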
