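I have an S3 bucket whose files sit under a folder structure such as folder1/folder2.
I want to list only the files under that folder structure and iterate over them in a SageMaker Jupyter notebook.
How can I do this? I tried the instructions for listing a bucket's contents with boto3, but that only lists everything recursively from the top level. I want to list at the folder level only.
I also tried the following snippet: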
import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('bucketname/folder1/folder2')
for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object)
and got the following error:
ParamValidationError: Parameter validation failed:
Invalid bucket name...
I'm currently using Python 3.9. Thanks!
There are a few problems here:
- bucketname is the bucket name
- folder1/folder2/ is the key prefix
- you need to filter the listing rather than fetch all objects
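The first two points can be sketched as a small helper that separates a combined path into the two values boto3 expects. The name split_bucket_and_prefix is hypothetical and not part of boto3; this is only an illustration of where the bucket name ends and the key prefix begins.

```python
def split_bucket_and_prefix(path):
    """Split 'bucket/some/folder' into ('bucket', 'some/folder/').

    Hypothetical helper for illustration; boto3 does not provide it.
    """
    bucket_name, _, prefix = path.strip("/").partition("/")
    # A trailing slash keeps the prefix from also matching sibling
    # names such as 'folder1/folder2-old/'.
    if prefix:
        prefix += "/"
    return bucket_name, prefix


bucket_name, prefix = split_bucket_and_prefix("bucketname/folder1/folder2")
print(bucket_name)  # bucketname
print(prefix)       # folder1/folder2/
```

The first value is what goes into s3.Bucket(...), and the second is what goes into the Prefix argument.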
Try:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucketname')
for object_summary in bucket.objects.filter(Prefix='folder1/folder2/'):
    print(object_summary)
This prints a list of ObjectSummary values, for example:
s3.ObjectSummary(bucket_name='bucketname', key='folder1/folder2/')
s3.ObjectSummary(bucket_name='bucketname', key='folder1/folder2/abc.csv')
s3.ObjectSummary(bucket_name='bucketname', key='folder1/folder2/def.csv')
s3.ObjectSummary(bucket_name='bucketname', key='folder1/folder2/xyz.png')
s3.ObjectSummary(bucket_name='bucketname', key='folder1/folder2/folder3/')
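Note that this includes every object under folder1/folder2/, regardless of file extension, and it may also include entries for the folder itself (folder1/folder2/) and for any logical subfolders (such as folder1/folder2/folder3/).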
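If only the files directly at the folder level are wanted, the listed keys can be narrowed on the client side. The sketch below is one possible approach, not the only one: direct_children is a hypothetical helper, and the key list is copied from the sample output above rather than fetched from S3.

```python
def direct_children(keys, prefix):
    """Return keys that sit directly under prefix.

    Skips the folder marker itself and anything that is, or lives
    inside, a logical subfolder. Hypothetical helper for illustration.
    """
    children = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        # An empty remainder is the folder marker; a "/" in the
        # remainder means the key is a subfolder or is nested in one.
        if remainder and "/" not in remainder:
            children.append(key)
    return children


# Sample keys taken from the listing above.
keys = [
    "folder1/folder2/",
    "folder1/folder2/abc.csv",
    "folder1/folder2/def.csv",
    "folder1/folder2/xyz.png",
    "folder1/folder2/folder3/",
]
for key in direct_children(keys, "folder1/folder2/"):
    print(key)
```

With a real bucket, the keys could be collected from the filtered listing, for example with [s.key for s in bucket.objects.filter(Prefix='folder1/folder2/')].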
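You can retrieve the Object from each object summary as follows:
for object_summary in bucket.objects.filter(Prefix="folder1/folder2/"):
    # object_summary.key holds the same value if only the key is needed
    print(object_summary.Object().key)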
This prints a list of object keys, for example:
folder1/folder2/
folder1/folder2/abc.csv
folder1/folder2/def.csv
folder1/folder2/xyz.png
folder1/folder2/folder3/
You can filter these as needed, for example to keep only the CSV files:
summaries = bucket.objects.filter(Prefix="folder1/folder2/")
csvs = [x for x in summaries if x.Object().key.endswith(".csv")]
for objectsummary in csvs:
    print(objectsummary.Object().key)
The result is:
folder1/folder2/abc.csv
folder1/folder2/def.csv
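The endswith(".csv") check is case-sensitive, so a key ending in .CSV would be skipped. A minimal sketch of a more tolerant filter follows; keys_with_suffix is a hypothetical helper, and the sample keys (including the upper-case one) are made up for illustration.

```python
def keys_with_suffix(keys, suffixes):
    """Keep keys ending in any of the given suffixes, ignoring case.

    Hypothetical helper for illustration.
    """
    # str.endswith accepts a tuple of candidate suffixes.
    wanted = tuple(s.lower() for s in suffixes)
    return [k for k in keys if k.lower().endswith(wanted)]


# Made-up sample keys; DEF.CSV is there to show the case handling.
sample_keys = [
    "folder1/folder2/abc.csv",
    "folder1/folder2/DEF.CSV",
    "folder1/folder2/xyz.png",
]
for key in keys_with_suffix(sample_keys, [".csv"]):
    print(key)
```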
You can split off the actual file names as follows:
for objectsummary in csvs:
    print(objectsummary.Object().key.split("/")[-1])
The result is:
abc.csv
def.csv
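As an alternative to split("/"), the standard library's pathlib can take a key apart, since S3 keys use forward slashes like POSIX paths. This is only a sketch of that option; describe_key is a hypothetical name.

```python
from pathlib import PurePosixPath


def describe_key(key):
    """Return (file name, stem, extension) for an S3-style key.

    Hypothetical helper for illustration.
    """
    path = PurePosixPath(key)
    return path.name, path.stem, path.suffix


name, stem, suffix = describe_key("folder1/folder2/abc.csv")
print(name)    # abc.csv
print(stem)    # abc
print(suffix)  # .csv
```

One difference to keep in mind: for a folder marker such as folder1/folder2/, PurePosixPath drops the trailing slash and reports folder2 as the name, whereas split("/")[-1] returns an empty string.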