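I'm trying to use the multiprocessing library to speed up reading CSV files. I already did this with `Pool`, and now I'm trying to do it with `Process()`. However, when I run the code, I get the following error:
AttributeError: 'tuple' object has no attribute 'join'
Can someone tell me what's wrong? I don't understand this error.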
```python
import glob
import pandas as pd
from multiprocessing import Process
import matplotlib.pyplot as plt
import os

location = "/home/data/csv/"
uber_data = []

def read_csv(filename):
    return uber_data.append(pd.read_csv(filename))

def data_wrangling(uber_data):
    uber_data['Date/Time'] = pd.to_datetime(uber_data['Date/Time'], format="%m/%d/%Y %H:%M:%S")
    uber_data['Dia Setmana'] = uber_data['Date/Time'].dt.weekday_name
    uber_data['Num dia'] = uber_data['Date/Time'].dt.dayofweek
    return uber_data

def plotting(uber_data):
    weekdays = uber_data.pivot_table(index=['Num dia','Dia Setmana'], values='Base', aggfunc='count')
    weekdays.plot(kind='bar', figsize=(8,6))
    plt.ylabel('Total Journeys')
    plt.title('Journey on Week Day')

def main():
    processes = []
    files = list(glob.glob(os.path.join(location,'*.csv*')))
    for i in files:
        p = Process(target=read_csv, args=[i])
        processes.append(p)
        p.start()
    for process in enumerate(processes):
        process.join()
    #combined_df = pd.concat(df_list, ignore_index=True)
    #dades_mod = data_wrangling(combined_df)
    #plotting(dades_mod)

main()
```
Thanks.
I'm not 100% sure how `Process` works in this context, but what you wrote here:
```python
for process in enumerate(processes):
    process.join()
```
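will throw that error, which you can see by checking what the built-in `enumerate` returns. Calling `enumerate` on any iterable yields tuples whose first element is a counter, so `process` in your loop is a `(counter, Process)` tuple, and a tuple has no `join()` method.
Try this first: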
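A quick way to see what `enumerate` produces, using plain strings as hypothetical stand-ins for the `Process` objects (the behaviour is the same for any iterable):

```python
# Plain strings stand in for the Process objects; enumerate treats any
# iterable the same way.
processes = ["p0", "p1", "p2"]

pairs = list(enumerate(processes))
print(pairs)  # [(0, 'p0'), (1, 'p1'), (2, 'p2')]

# Each item is a (counter, element) tuple. Tuples have no .join() method,
# which reproduces the AttributeError from the question.
try:
    pairs[0].join()
except AttributeError as exc:
    message = str(exc)
print(message)  # 'tuple' object has no attribute 'join'
```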
```python
for i, process in enumerate(processes):  # i receives the counter; process receives the second element of the tuple
    process.join()
```
Or this:
```python
for process in processes:
    process.join()
```
For more on `enumerate`, see the built-in functions documentation here: https://docs.python.org/3/library/functions.html#enumerate