"ERROR - 'NoneType' object has no attribute 'axes'" when trying to read a pickle file from an S3 bytes object



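I am running the following code in an Apache Airflow environment to fetch a pickle file from S3 and load it into memory. As soon as I try to read or print the file's contents, I get this error: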

ERROR - 'NoneType' object has no attribute 'axes'


Code

import boto3
import pickle

# [...Omitted code...]  
s3_session = boto3.Session(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key
)
s3 = s3_session.resource('s3')
obj = s3.Object(bucket_name, KEY)
pickle_contents = obj.get()['Body'].read()
body = pickle.loads(pickle_contents)
print(body)
# ^-- This is where the error happens, as soon as I try to read it. 

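This exact code runs fine in a separate Jupyter notebook instance, which makes me suspect a version incompatibility. Because the notebook lets me print(body), I can see that the pickle file contains a dictionary like the one below: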
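To test the version-incompatibility theory, one option is to print the interpreter and package versions in both the Jupyter notebook and an Airflow task and compare them. This is a minimal sketch: report_versions is a hypothetical helper, and it assumes each package exposes a __version__ attribute (pandas and numpy do).

```python
import importlib
import platform


def report_versions(packages=("pandas", "numpy")):
    """Return {name: version} for the interpreter and each package (hypothetical helper)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is missing entirely in this environment.
            versions[name] = "not installed"
        else:
            # Fall back to "unknown" if the package has no __version__ attribute.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Calling print(report_versions()) once in the notebook and once inside the Airflow task should make any pandas or numpy difference visible at a glance.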

Pickle file body:

{75: {
    'recommendation_diversity_metrics': {
        'largest_subcategory_group_proportion': {
            'mean': 0.3369472,
            'sd': 0.1741708739837092,
            'min': 0.05333333333333334,
            'max': 1.0},
        'catalogue_entropy': 3.4412171579585533,
        'subcategory_overweight_frequency':
            School & Office Supplies    0.73020
            Pants                       0.70656
            Bedding                     0.64138
            Sweaters                    0.62616
            Tops                        0.57044
            ...
            Cleanup & Odor Control      0.00144
            UNKNOWN                     0.00036
            Body Piercings              0.00034
            Misc Books                  0.00012
            Home Books                  0.00012
            Length: 94, dtype: float64},
    'recommendation_novelty_metrics': {
        'previously_interacted': {
            'mean': 0.052456533333333326,
            'sd': 0.06291214458333363,
            'min': 0.0,
            'max': 0.6},
        'new_product_frequency': {
            'mean': 0.016672799999999998,
            'sd': 0.01423356021834222,
            'min': 0.0,
            'max': 0.12}
    }}}

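I think the error occurs because the dictionary contains a pandas Series object (see subcategory_overweight_frequency in the dictionary above): as long as I read in every element of the dictionary except that one, the interpreter runs my code fine. Am I missing a dependency I'm not aware of?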
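One way to see which modules and classes a pickle depends on, without unpickling it, is to walk its opcodes with the standard-library pickletools module. The sketch below is a hypothetical helper (referenced_globals) that assumes the file was written by CPython's standard pickler; its memo tracking is a simplification, not a full emulation of the pickle machine.

```python
import pickletools

# Opcodes that push a str onto the pickle machine's stack.
_STRING_OPS = {
    "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8",
    "STRING", "BINSTRING", "SHORT_BINSTRING",
}


def referenced_globals(data):
    """Return the (module, name) pairs a pickle would import, without loading it."""
    found = set()
    memo = {}     # memo index -> str value (None for anything that is not a str)
    strings = []  # str values pushed so far, most recent last
    last = None   # value pushed by the previous opcode, recorded by MEMOIZE/PUT
    for opcode, arg, _pos in pickletools.genops(data):
        op = opcode.name
        if op in ("GLOBAL", "INST"):
            # Protocols 0-3 name the class inline as "module name".
            module, name = arg.split(" ", 1)
            found.add((module, name))
            last = None
        elif op == "STACK_GLOBAL":
            # Protocol 4+ takes module and name from the two strings just pushed.
            if len(strings) >= 2:
                found.add((strings[-2], strings[-1]))
            last = None
        elif op in _STRING_OPS:
            strings.append(arg)
            last = arg
        elif op == "MEMOIZE":
            memo[len(memo)] = last
        elif op in ("PUT", "BINPUT", "LONG_BINPUT"):
            memo[arg] = last
        elif op in ("GET", "BINGET", "LONG_BINGET"):
            # A repeated string is fetched from the memo instead of re-emitted.
            last = memo.get(arg)
            if isinstance(last, str):
                strings.append(last)
        elif op != "FRAME":
            # Any other opcode pushes something that is not a plain string.
            last = None
    return found
```

Calling referenced_globals(pickle_contents) on the bytes read from S3 lists the (module, name) pairs that pickle.loads will import. For pandas objects, expect entries under pandas., which is one reason a pandas pickle can depend on the pandas version that wrote it.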


Full traceback

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 655, in __repr__
    show_dimensions=show_dimensions,
  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 774, in to_string
    line_width=line_width,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/formats/format.py", line 484, in __init__
    self.max_rows_displayed = min(max_rows or len(self.frame), len(self.frame))
  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 996, in __len__
    return len(self.index)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 5175, in __getattr__
    return object.__getattribute__(self, name)
  File "pandas/_libs/properties.pyx", line 63, in pandas._libs.properties.AxisProperty.__get__
AttributeError: 'NoneType' object has no attribute 'axes'
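The traceback shows that pickle.loads itself returned; the failure happens later, inside pandas' __repr__ (pandas/core/frame.py) when print(body) formats the object. A rough way to find which entries are affected is to call repr() on each leaf of the nested dictionary separately. find_unprintable is a hypothetical helper, and it only descends into dicts.

```python
def find_unprintable(obj, path=()):
    """Yield (path, exception) for every leaf of a nested dict whose repr() raises."""
    if isinstance(obj, dict):
        # Recurse instead of calling repr() on the dict, which would fail as a whole.
        for key, value in obj.items():
            yield from find_unprintable(value, path + (key,))
        return
    try:
        repr(obj)
    except Exception as exc:
        yield path, exc
```

For example, for path, exc in find_unprintable(body): print(path, type(exc).__name__, exc) prints the key path of each entry that cannot be displayed, while leaving the rest of the dictionary usable.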

You probably pickled the DataFrame with a newer version of pandas and are now trying to read the pickle file with an older one.

Check which pandas version was used to pickle the DataFrame and which pandas version is installed in your Airflow environment.
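If you control the code that writes the file, one way to catch this mismatch early is to store the writer's package versions next to the payload and compare them before unpickling. This is a minimal sketch with hypothetical helper names (dumps_with_versions, loads_checked); the payload is pickled separately as inner bytes, so the version check runs before any pandas object is reconstructed. It assumes the writer also uses this wrapper.

```python
import importlib
import pickle


def _installed_versions(packages):
    """Map each package name to its __version__, or a placeholder if unavailable."""
    versions = {}
    for name in packages:
        try:
            versions[name] = getattr(importlib.import_module(name), "__version__", "unknown")
        except ImportError:
            versions[name] = "not installed"
    return versions


def dumps_with_versions(obj, packages=("pandas",)):
    """Pickle obj as inner bytes, wrapped together with the writer's package versions."""
    envelope = {"versions": _installed_versions(packages), "payload": pickle.dumps(obj)}
    return pickle.dumps(envelope)


def loads_checked(data, packages=("pandas",)):
    """Compare writer and reader versions, then unpickle the inner payload."""
    # The envelope holds only dicts, strings and bytes, so loading it imports nothing.
    envelope = pickle.loads(data)
    writer = envelope["versions"]
    reader = _installed_versions(packages)
    mismatched = {
        name: (writer.get(name), reader[name])
        for name in packages
        if writer.get(name) != reader[name]
    }
    if mismatched:
        raise RuntimeError(f"package version mismatch (writer, reader): {mismatched}")
    return pickle.loads(envelope["payload"])
```

Raising on any mismatch is deliberately strict; logging a warning instead would also work if minor version differences are acceptable in your setup.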
