Reading a large sas7bdat file with Modin pandas: FactoryDispatcher.read_sas() takes 1 positional argument but 2 were given



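I want to read a large file in a Jupyter notebook. Because of memory limits, I cannot read it with pandas: the data file needs more than 35 GB of memory, but I only have 20 GB available. So I tried Modin pandas instead, but I get this error: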

FactoryDispatcher.read_sas() takes 1 positional argument but 2 were given

## Reading sas7bdat
import modin.pandas as pd
import numpy as np

cabd = pd.read_sas(
    '/xxx/xxx/xxx/xxxx/xxx/xxxx_010121.sas7bdat',
    format="sas7bdat",
    encoding="latin-1",
)
cabd
This returns the following error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[9], line 3
1 ## Les datafil fra SAS
----> 3 cabd= pd.read_sas()
File ~/jxxxxxxxxxxxxxxxxxxxxx/.venv/lib/python3.10/site-packages/modin/logging/logger_decorator.py:128, in enable_logging.<locals>.decorator.<locals>.run_and_log(*args, **kwargs)
113 """
114 Compute function with logging if Modin logging is enabled.
115 
(...)
125 Any
126 """
127 if LogMode.get() == "disable":
--> 128     return obj(*args, **kwargs)
130 logger = get_logger()
131 logger_level = getattr(logger, log_level)
File ~/xxxxxxxxxxxxxxxxxxxxxxxxx/.venv/lib/python3.10/site-packages/modin/pandas/io.py:581, in read_sas(filepath_or_buffer, format, index, encoding, chunksize, iterator, compression)
577 Engine.subscribe(_update_engine)
578 from modin.core.execution.dispatching.factories.dispatcher import FactoryDispatcher
580 return DataFrame(
--> 581     query_compiler=FactoryDispatcher.read_sas(
582         filepath_or_buffer,
583         format=format,
584         index=index,
585         encoding=encoding,
586         chunksize=chunksize,
587         iterator=iterator,
588         compression=compression,
589     )
590 )
TypeError: FactoryDispatcher.read_sas() takes 1 positional argument but 2 were given

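This is a Modin bug. It has been fixed in a pull request that has not been released yet. You can install the latest version of Modin to get the fix, or wait for Modin's next patch release. If you cannot change your Modin version, you can use pandas.read_sas to read the data into a pandas data frame, and then construct a Modin data frame from the resulting pandas data frame.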
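The last workaround could look roughly like the sketch below. The helper name `read_sas_via_pandas` is hypothetical and not part of Modin or pandas; it only assumes that `pandas.read_sas` and the `modin.pandas.DataFrame` constructor behave as described above.

```python
import pandas  # plain pandas, used only for the initial read


def read_sas_via_pandas(path, **read_sas_kwargs):
    """Read a SAS file with pandas, then wrap the result in a Modin DataFrame.

    `read_sas_via_pandas` is a hypothetical helper name, not a Modin API.
    """
    # Imported inside the function so the helper can be defined even where
    # Modin is not installed; this placement is a choice made for the sketch.
    import modin.pandas as mpd

    # pandas.read_sas loads the whole file into one in-memory DataFrame.
    pandas_df = pandas.read_sas(path, **read_sas_kwargs)

    # Build the Modin DataFrame from the already-loaded pandas DataFrame.
    return mpd.DataFrame(pandas_df)
```

With the file from the question, the call would be `cabd = read_sas_via_pandas('/xxx/xxx/xxx/xxxx/xxx/xxxx_010121.sas7bdat', format="sas7bdat", encoding="latin-1")`. Note that pandas still has to hold the whole file in memory during the read, so this may not be enough on its own for a file that needs 35 GB on a machine with 20 GB.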
