I am trying to read SQL tables and merge them inside a task. I am using dask version 2.8.0. Here is my code snippet:
import dask.dataframe as dd

# conn_url is the database connection string (defined elsewhere)
tdf = dd.read_sql_table('comments', conn_url, index_col='author', divisions=list('1234567890'))
adf = dd.read_sql_table('users', conn_url, index_col='id', divisions=list('1234567890'))
dd.merge(tdf, adf, how='left', left_index=True, right_index=True)
The dtype of both indexes is 'O' (object), but I get the following error:
...
...
~/continual/venv/lib/python3.8/site-packages/dask/dataframe/core.py in repartition(self, divisions, npartitions, partition_size, freq, force)
1120 return repartition_npartitions(self, npartitions)
1121 elif divisions is not None:
-> 1122 return repartition(self, divisions, force=force)
1123 elif freq is not None:
1124 return repartition_freq(self, freq=freq)
~/continual/venv/lib/python3.8/site-packages/dask/dataframe/core.py in repartition(df, divisions, force)
5656 tmp = "repartition-split-" + token
5657 out = "repartition-merge-" + token
-> 5658 dsk = repartition_divisions(
5659 df.divisions, divisions, df._name, tmp, out, force=force
5660 )
~/continual/venv/lib/python3.8/site-packages/dask/dataframe/core.py in repartition_divisions(a, b, name, out1, out2, force)
5314 ('c', 2): ('b', 3)}
5315 """
-> 5316 check_divisions(b)
5317
5318 if len(b) < 2:
~/continual/venv/lib/python3.8/site-packages/dask/dataframe/core.py in check_divisions(divisions)
5276 divisions = list(divisions)
5277 if divisions != sorted(divisions):
-> 5278 raise ValueError("New division must be sorted")
5279 if len(divisions[:-1]) != len(list(unique(divisions[:-1]))):
5280 msg = "New division must be unique, except for the last element"
ValueError: New division must be sorted
How can I perform this join?
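The divisions list is indeed not sorted. Remember that your index values are strings, and as a string '0' sorts before '1', so a list that ends in '0' is out of order: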
# check order: '0' moves to the front, so the sorted list differs from the one you passed
sorted(list("1234567890"))
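As a minimal sketch of the fix, the snippet below reproduces the two checks visible in the traceback (sorted, and unique except for the last element) in plain Python and shows that sorting the list makes it pass. The local `check_divisions` here is a simplified stand-in written for illustration, not dask's own function, and the names `original`, `fixed` and `original_ok` are hypothetical.

```python
def check_divisions(divisions):
    """Simplified stand-in for the checks shown in the traceback (not dask's code)."""
    divisions = list(divisions)
    if divisions != sorted(divisions):
        raise ValueError("New division must be sorted")
    if len(divisions[:-1]) != len(set(divisions[:-1])):
        raise ValueError("New division must be unique, except for the last element")


original = list("1234567890")  # the divisions from the question; '0' is last
fixed = sorted(original)       # string ordering puts '0' first

try:
    check_divisions(original)
    original_ok = True
except ValueError:
    original_ok = False

check_divisions(fixed)  # does not raise

print(original_ok)
print(fixed)

# Assumed usage with the tables from the question (not run here):
# tdf = dd.read_sql_table('comments', conn_url, index_col='author', divisions=fixed)
# adf = dd.read_sql_table('users', conn_url, index_col='id', divisions=fixed)
# dd.merge(tdf, adf, how='left', left_index=True, right_index=True)
```

Passing `sorted('1234567890')`, or equivalently `list('0123456789')`, for `divisions` in both `read_sql_table` calls should avoid the ValueError. Whether single-character boundaries cover every value in your index (for example, strings that sort after '9') depends on your data, so it may be worth checking the row counts after loading.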