I have some code that uses read_sql() to create a generator, then loops over the generator to print each chunk:
execute.py
import pandas as pd
from sqlalchemy import event, create_engine

engine = create_engine('path-to-driver')

def getDistance(chunk):
    print(chunk)
    print(type(chunk))

df_chunks = pd.read_sql("select top 2 * from SCHEMA.table_name", engine, chunksize=1)
for chunk in df_chunks:
    result = getDistance(chunk)
It works, and each chunk is printed as a DataFrame. When I try to do the same thing with multiprocessing, like this...
outear_function.py
def getDistance(chunk):
    print(chunk)
    print(type(chunk))
    df = chunk
    return df
execute.py
import pandas as pd
from multiprocessing import Pool
from sqlalchemy import event, create_engine
from outear_function import getDistance

engine = create_engine('path-to-driver')
df_chunks = pd.read_sql("select top 2 * from SCHEMA.table_name", engine, chunksize=1)

if __name__ == '__main__':
    global result
    p = Pool(20)
    for chunk in df_chunks:
        print(chunk)
        result = p.map(getDistance, chunk)
    p.terminate()
    p.join()
...the chunk is printed in the console as the column name, with type "str". Printing result reveals ['column_name'].
Why does the chunk become a string when multiprocessing is applied?
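This is because p.map expects a function and an iterable, and it calls the function once for each item the iterable yields. Iterating over a DataFrame (in this case, your chunk) yields its column names, so getDistance receives each column name as a string.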
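To see this behaviour in isolation, here is a minimal sketch. The one-row DataFrame and its column names are made up for illustration; it only stands in for a chunk produced by read_sql with chunksize=1.

```python
import pandas as pd

# Hypothetical one-row chunk, standing in for what read_sql(..., chunksize=1) yields.
chunk = pd.DataFrame({"id": [1], "name": ["a"]})

# Iterating over a DataFrame walks its column labels, not its rows.
items = [item for item in chunk]

print(items)           # ['id', 'name']
print(type(items[0]))  # <class 'str'>
```

This is the same iteration that p.map performs on its second argument, which is why each worker call receives a column name.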
You need to pass a collection of DataFrames to the map method instead, i.e. the generator itself:
global result
p = Pool(20)
result = p.map(getDistance, df_chunks)
p.terminate()
p.join()
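For a version that can be run without the original database, the following sketch may help. It assumes an in-memory SQLite table named demo in place of SCHEMA.table_name, and get_shape and make_chunks are hypothetical helpers, not part of the question's code.

```python
import sqlite3
from multiprocessing import Pool

import pandas as pd


def get_shape(chunk):
    # Runs in a worker process; receives one whole DataFrame per call.
    return type(chunk).__name__, chunk.shape


def make_chunks():
    # In-memory SQLite table, a stand-in for the real database (assumption).
    conn = sqlite3.connect(":memory:")
    pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}).to_sql("demo", conn, index=False)
    # With chunksize set, read_sql returns an iterator of DataFrames.
    return pd.read_sql("select * from demo", conn, chunksize=1)


if __name__ == "__main__":
    with Pool(2) as pool:
        # map consumes the iterator in the parent process and
        # sends each DataFrame to a worker as a single argument.
        results = pool.map(get_shape, make_chunks())
    print(results)
```

Note that map reads the whole iterator into a list before dispatching work, so every chunk is held in memory at once; for large result sets, pool.imap may be a better fit.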