Below is the code that builds a DataFrame with one million rows, one row per iteration. Is there any way to make it faster?
for x in np.arange(1000000):
    df_tmp = pd.DataFrame({"SalesDate": [random.choice(SalesDates)],
                           "SalesDistrict": [random.choice(district)],
                           "ProductSold": [random.choice(product)],
                           "SalesAmount": [random.SystemRandom().uniform(1, 5000)]})
    df = pd.concat([df, df_tmp])
df.set_index("SalesDate", inplace=True)
One idea is to first collect the per-row DataFrames in a list and run a single pd.concat at the end, which avoids repeatedly copying the growing DataFrame:
dfs = []
for x in np.arange(1000000):
    df_tmp = pd.DataFrame({"SalesDate": [random.choice(SalesDates)],
                           "SalesDistrict": [random.choice(district)],
                           "ProductSold": [random.choice(product)],
                           "SalesAmount": [random.SystemRandom().uniform(1, 5000)]})
    dfs.append(df_tmp)
df = pd.concat(dfs).set_index("SalesDate")
An even faster idea is to skip the Python loop entirely and generate all the columns at once with numpy.random.choice and numpy.random.uniform:
N = 1000000
df = pd.DataFrame({"SalesDistrict":np.random.choice(district, size=N),
"ProductSold":np.random.choice(product, size=N),
"SalesAmount":np.random.uniform(1,5000, N)},
index=np.random.choice(SalesDates, size=N))
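For reference, here is a self-contained sketch of the vectorized approach. The SalesDates, district, and product lists are made-up sample values, since the original post does not show them:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original post does not define these.
SalesDates = pd.date_range("2023-01-01", periods=30, freq="D")
district = ["North", "South", "East", "West"]
product = ["Widget", "Gadget", "Gizmo"]

N = 1000000
# Build all N rows in one shot: each column is a single vectorized draw.
df = pd.DataFrame({"SalesDistrict": np.random.choice(district, size=N),
                   "ProductSold": np.random.choice(product, size=N),
                   "SalesAmount": np.random.uniform(1, 5000, N)},
                  index=np.random.choice(SalesDates, size=N))
df.index.name = "SalesDate"
print(df.shape)  # (1000000, 3)
```

Note that np.random does not use an OS entropy source the way random.SystemRandom does, so this is a change in the random generator as well as a speedup; for simulated sales data that is usually acceptable.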