无法将合并的 Dask 数据帧转换为 pandas 数据帧 - 获取错误"Unalignable boolean Series provided as indexer"



我正在尝试合并3个任务数据框架并相应地查询它们。数据框架是从forma1数据的SQL表的.csv文件创建的。表

laptimes: raceId, driverId, lap, position, time
races: raceId, year, round
drivers: driverId, forename, surname

下面的laptimes表与racesdrivers表有外键关系。我已经读取,合并了数据帧,并试图查询数据帧:

from dask import dataframe as df
import pandas as pd
pd.set_option("max_rows", None)
df_times = df.read_csv("lap_times.csv", blocksize=64000000)
df_races = df.read_csv("races.csv",blocksize=64000000)
df_drivers = df.read_csv("drivers.csv", blocksize=64000000)
pd1 = df_times.merge(df_drivers, on = "driverId").merge(df_races, on = "raceId")
.drop(labels = ["dob", "nationality", "milliseconds", "url_x", "url_y", "time_y", "date"], 
axis = 1)
.loc[(df_drivers["forename"] == "Sebastian") 
& (df_drivers["surname"] == "Vettel") 
& (df_races["year"] == 2018) 
& (df_races["round"] == 1), :]

我能够运行上面的代码没有错误。但是,如果我尝试使用pd1.compute()将过滤后的任务数据框转换为pandas数据框,则会得到错误:

Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

我已经为此搜索了解决方案,并且看到,对于pandas数据框架,重置索引是一个解决方案,但不适用于任务数据框架。我错过了什么?如有任何帮助,不胜感激。

我能够获得所需的子集,但是我必须分别执行每个合并:

pd1 = df_times.merge(df_drivers, on = "driverId")
.drop(labels = ["dob", "nationality", "milliseconds", "url"],axis = 1)
pd1 = pd1.merge(df_races, on = "raceId")

pd2 = pd1.drop(["url", "time_y", "date"], axis = 1)
pd2.loc[(pd2["forename"] == "Lewis") & (pd2["surname"] == "Hamilton") & (pd2["year"] == 2018) & (pd2["round"] == 1)].compute()

这不是最优的,我仍然想找到一个解决方案,在一个语句中执行所有3个表之间的合并。

相关内容

  • 没有找到相关文章

最新更新