How to read all files from a directory in an S3 bucket using Python in a Cloud Function



Here is my code. I am trying to read all files of the same format from an S3 bucket. Error: "Could not establish source connection [Errno 2] No such file or directory: '/user_code/s3:/'"

def s3_file_read(self, source):
    bucket_name = 'xxx'
    region = 'xxx'
    object_name = 's3-folder-name/'
    ACCESS_KEY_ID = 'xxx'
    ACCESS_SECRET_KEY = 'xxx'
    s3_client = boto3.client('s3', aws_access_key_id=ACCESS_KEY_ID, aws_secret_access_key=ACCESS_SECRET_KEY, region_name=region)
    file_path = "s3://your-bucket-name/folder-name/"
    prefix = os.path.abspath(file_path)
    file_list = [os.path.join(prefix, f) for f in os.listdir(prefix) if f.endswith('.csv')]
    print('##################################Reading the file#############################')
    file_type = source['fileType'].lower()
    if source['fileType'] == 'csv':
        try:
            obj = s3_client.get_object(Bucket=bucket_name, Key=object_name)
            file_df = pd.read_csv(obj['Body'])
            print("CSV File read success")
        except Exception as e:
            print("Could not read the file {}".format(e))
    else:
        print("File format supported CSV")

[1]: https://i.stack.imgur.com/6pX8d.png

I've made some assumptions about what you want to do here, but this code will read the keys in the bucket and build a list of .csv objects only. You can then iterate over that list and test whether each file can be loaded into a dataframe. If you want to read all of these files into one larger dataframe, you'll need to rewrite the end of the function.

s3sr = boto3.resource('s3')

# there are other examples of collecting objects, this is just what I use
def get_keys_from_prefix(self, bucket, prefix):
    '''gets list of keys for given bucket and prefix'''
    keys_list = []
    paginator = s3sr.meta.client.get_paginator('list_objects_v2')
    # use Delimiter to limit search to that level of hierarchy
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/'):
        # default to an empty list so an empty page doesn't raise a TypeError
        keys = [content['Key'] for content in page.get('Contents', [])]
        print('keys in page: ', len(keys))
        keys_list.extend(keys)
    return keys_list


def s3_file_read(self, source):
    bucket_name = 'xxx'
    region = 'xxx'
    prefix = 's3-folder-name/'  # if no prefix, pass ''
    ACCESS_KEY_ID = 'xxx'
    ACCESS_SECRET_KEY = 'xxx'
    s3_client = boto3.client('s3', aws_access_key_id=ACCESS_KEY_ID, aws_secret_access_key=ACCESS_SECRET_KEY, region_name=region)
    keys_list = self.get_keys_from_prefix(bucket_name, prefix)
    csv_list = [f for f in keys_list if f.endswith('.csv')]

    for csvfile in csv_list:
        try:
            obj = s3_client.get_object(Bucket=bucket_name, Key=csvfile)
            file_df = pd.read_csv(obj['Body'])
            print("CSV File read success")
        except Exception as e:
            print("Could not read the file {}".format(e))
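If you do want one larger dataframe, the usual approach is to collect each per-file dataframe in a list and concatenate them at the end with pd.concat. A minimal sketch of just the combining step, using in-memory StringIO buffers as hypothetical stand-ins for the obj['Body'] streams returned by get_object (the column names and rows here are made up for illustration):

```python
import io
import pandas as pd

# Hypothetical in-memory CSVs standing in for the obj['Body'] streams from S3.
csv_bodies = [
    io.StringIO("id,value\n1,a\n2,b"),
    io.StringIO("id,value\n3,c"),
]

# Inside your loop you would append each file_df to this list instead of
# overwriting file_df on every iteration.
frames = [pd.read_csv(body) for body in csv_bodies]

# ignore_index=True renumbers the rows so the combined frame has a clean index.
combined = pd.concat(frames, ignore_index=True)
print(combined.shape)  # (3, 2): three rows from two files, two columns
```

Note this assumes every CSV shares the same columns; if they differ, pd.concat will union the columns and fill the gaps with NaN.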
