Pandas failure on Linux that does not occur on Windows: missing _data attribute



I am running a Python script on RHEL Linux and get the following error:

Traceback (most recent call last):
  File "main.py", line 162, in <module>
    find_deltas(logging, snapshot_id)
  File "/ariel/python_scripts/ariel_deltas/deltas.py", line 71, in find_deltas
    data = prepare_frames(logging, file_extracts)
  File "/ariel/python_scripts/ariel_deltas/deltas.py", line 606, in prepare_frames
    logging.info("df_old has %d records", len(df_old))
  File "/ariel/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 1041, in __len__
    return len(self.index)
  File "/ariel/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 5270, in __getattr__
    return object.__getattribute__(self, name)
  File "pandas/_libs/properties.pyx", line 63, in pandas._libs.properties.AxisProperty.__get__
  File "/ariel/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 5270, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute '_data'

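In effect, I read a DataFrame from Oracle, write it to a pickle file, then read that pickle back in along with yesterday's pickle, and join the two on the primary key.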
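For readers who want to reproduce the workflow without an Oracle connection, here is a minimal sketch of the pickle round trip and the join. The two small frames, the file names old.pkl and new.pkl, and the choice of RECORD_ID as the key are assumptions made for illustration; the real primary key is not stated in the question.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for yesterday's and today's Oracle extracts.
df_yesterday = pd.DataFrame({"RECORD_ID": [1, 2, 3], "LU_TITLE": ["a", "b", "c"]})
df_today = pd.DataFrame({"RECORD_ID": [2, 3, 4], "LU_TITLE": ["b", "changed", "d"]})

with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, "old.pkl")  # hypothetical file names
    new_path = os.path.join(tmp, "new.pkl")
    # Write both snapshots to disk, then read them back in.
    df_yesterday.to_pickle(old_path)
    df_today.to_pickle(new_path)
    df_old = pd.read_pickle(old_path)
    df_new = pd.read_pickle(new_path)

# Outer join on the assumed key; the _merge column records which
# snapshot each row came from (left_only, right_only, or both).
joined = df_old.merge(
    df_new,
    on="RECORD_ID",
    how="outer",
    suffixes=("_old", "_new"),
    indicator=True,
)
print(joined)
```

Here both pickles are written and read by the same pandas installation, so the round trip is safe. A pickle written on one machine and read on another carries no such guarantee.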

Why on earth would Linux raise an error about a missing "_data" attribute when the code runs fine on Windows against exactly the same dataset?!

When I read the pickle file on Linux, the columns are as expected.

>>> df.columns
Index(['AS_OF_DT', 'VARIATION_REQUEST_ID', 'LU_NUMBER', 'LU_TITLE', 'COUNTRY',
       'ARCHIVED', 'APPLIED', 'LU_DESCRIPTION', 'HA_LU_REF_NO', 'REMARKS',
       'LU_CATEGORY', 'VARIATION_TYPE', 'INSERT_UPDATE_TIME',
       'INSERT_UPDATE_USER', 'MERGED', 'REVISION_NUMBER', 'VERSION_SEQ',
       'RECORD_ID', 'IMPLEMENTED_SEQ', 'RMS_VERSION_SEQ',
       'REASON_FOR_LOCAL_UPDATE', 'C_ECTD_SEQUENCE_NO', 'INSERT_TIME',
       'ARCHIVED_DATE', 'REASON_FOR_MERGE', 'SCRN_NO'],
      dtype='object')
>>>

The function that triggers the problem is as follows:

import pandas as pd


def prepare_frames(logging, file_extracts):
    # file_extracts is a tuple of dictionaries
    # old_file
    # new_file
    # file_info
    # file_info is a dict describing the file master record including the join keys
    # {"file_id":file_id, "file_desc": r.FILE_DESC, "file_prefix": r.FILE_PREFIX, "compare_col": r.COMPARE_COL}
    # old_file and new_file dictionaries describe the file name of the older snapshot file to be compared
    # old_file["new_old"] = "old"
    # old_file["extract_id"] = extract_id
    # old_file["file_id"] = file_id
    # old_file["file_name"] = file_name
    # old_file["snapshot_id"] = snapshot_id
    # old_file["num_records"] = num_records
    # Strip columns which we know will be different, to remove false positives such as AS_OF_DT
    logging.info("Start: Reading in DataFrames for analysis from pickle files.")
    data = []
    for extract in file_extracts:
        old_file = extract[0]
        new_file = extract[1]
        file_info = extract[2]  # the dictionary
        old_file_name = old_file["file_name"]
        new_file_name = new_file["file_name"]
        logging.info("Reading in old snapshot from pickle file: %s", old_file_name)
        df_old = pd.read_pickle('snapshots/' + old_file_name)
        logging.info("Reading in new snapshot from pickle file: %s", new_file_name)
        df_new = pd.read_pickle('snapshots/' + new_file_name)
        logging.info("df_old has %d records", len(df_old))
        logging.info("df_new has %d records", len(df_new))

        # before we do any comparisons we need to remove as_of_dt type values as this will produce false deltas
        #if "AS_OF_DT" in df_new.columns:
        #    del df_new["AS_OF_DT"]
        #    del df_old["AS_OF_DT"]
        #if "AS_OF_DATE" in df_new.columns:
        #    del df_new["AS_OF_DATE"]
        #    del df_old["AS_OF_DATE"]
        data.append((df_old, df_new, old_file, new_file, file_info))
    logging.info("End: Reading in DataFrames for analysis from pickle files.")
    return data

Line 606 is this one:

logging.info("df_old has %d records", len(df_old))

df_old and df_new are essentially the pickle files read into DataFrames. I copied the same pickle files to Windows and there was no problem at all.

Update: it looks like this was a logic error; the DataFrame was actually empty!
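If an empty frame is a realistic outcome of the extract, an explicit check before the comparison step can make that case obvious in the logs. The helper below is a hypothetical sketch, not part of the original script, and the name describe_frame is made up for illustration. Note that it only helps when the frame was unpickled correctly. It would not by itself avoid the AttributeError shown above, because that error is raised as soon as the frame's index is accessed.

```python
import pandas as pd


def describe_frame(df, label):
    """Return a one-line summary of df, or raise if it has no rows.

    Hypothetical helper: `label` is just a display name such as "df_old".
    """
    if df.empty:
        raise ValueError(f"{label} is empty; check the upstream extract")
    return f"{label} has {len(df)} records"


# Usage with a small made-up frame.
print(describe_frame(pd.DataFrame({"RECORD_ID": [1, 2]}), "df_old"))
```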

I ran into the same problem. I was using pandas=1.0.4 in a conda environment. Updating pandas to 1.1.0 solved it for me.

Hope that works for you.
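One plausible reading of why the upgrade helps, offered as an assumption rather than something confirmed in this thread: pandas 1.1 renamed the internal DataFrame attribute _data to _mgr, so a pickle written by pandas 1.1 or later may unpickle under an older pandas into an object that lacks _data. A quick way to test that theory is to compare the installed pandas version on the Linux and Windows machines. The helper names version_tuple and needs_upgrade below are hypothetical, and the default minimum of 1.1.0 is taken from the answer above.

```python
import re

import pandas as pd


def version_tuple(version):
    """Turn a string such as '1.0.4' into (1, 0, 4).

    Only the leading digits of the first three dot-separated parts are
    used, so a suffix such as 'rc0' is ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def needs_upgrade(installed, minimum="1.1.0"):
    """Return True if `installed` is older than `minimum`."""
    return version_tuple(installed) < version_tuple(minimum)


# Run this on both machines and compare the output.
print("pandas", pd.__version__, "needs upgrade:", needs_upgrade(pd.__version__))
```

If the two machines report different versions, aligning them (or writing the snapshots in a version-independent format such as CSV) is probably a more robust fix than relying on pickle compatibility across releases.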
