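I am trying to build a convolutional neural network in Databricks, in Python, on Spark 2.4.4 with the Scala 2.11 backend. I have built CNNs before, but this is my first time using Spark (Databricks) and AWS S3. The files in AWS are organized as follows:
- train_test_small/(train or test)/(0, 1, 2 or 3)/
Each of these directories then contains the list of images belonging to its class (0, 1, 2, 3).
To access the files stored in the S3 bucket, I mounted the bucket in Databricks as follows: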
# load in the image files
AWS_BUCKET_NAME = "sensored_bucket_name/video_topic_modelling/data/train_test_small"
MOUNT_NAME = "train_test_small"
dbutils.fs.mount("s3a://%s" % AWS_BUCKET_NAME, "/mnt/%s" % MOUNT_NAME)
display(dbutils.fs.ls("/mnt/%s" % MOUNT_NAME))
After running display(dbutils.fs.mounts()), I can see that the bucket is mounted at:
MountInfo(mountPoint='/mnt/train_test_small', source='sensored_bucket_name/video_topic_modelling/data/train_test_small', encryptionType='')
I then try to access this mounted directory through Keras's flow_from_directory() method using the following code:
# create extra partition of the training data as a validation set
train_datagen=ImageDataGenerator(preprocessing_function=preprocess_input, validation_split=0) #included in our dependencies
# set scaling to most common shapes
train_generator=train_datagen.flow_from_directory('/mnt/train_test_small',
target_size=(320, 240),
color_mode='rgb',
batch_size=96,
class_mode='categorical',
subset='training')
#shuffle=True)
validation_generator=train_datagen.flow_from_directory('/mnt/train_test_small',
target_size=(320, 240),
color_mode='rgb',
batch_size=96,
class_mode='categorical',
subset='validation')
However, this gives me the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/train_test_small/train/'
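I tried to solve this using the Keras and Databricks documentation, but made no further progress. My best guess at the moment is that Keras's flow_from_directory() cannot detect mounted directories, but I am not sure.
Does anyone know how to apply flow_from_directory() to an S3-mounted directory in Databricks, or know of a good alternative? Any help would be much appreciated!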
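I think you may be missing one directory level in the path you pass to flow_from_directory. From the Keras documentation:
directory: string, path to the target directory. It should contain one subdirectory per class. Any PNG, JPG, BMP, PPM or TIF images inside each subdirectory's directory tree will be included in the generator.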
# set scaling to most common shapes
train_generator=train_datagen.flow_from_directory(
'/mnt/train_test_small/train', # <== add "train" folder
target_size=(320, 240),
...
validation_generator=train_datagen.flow_from_directory(
'/mnt/train_test_small/test', # <== add "test" folder
target_size=(320, 240),
....
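A quick way to check which level to pass is to list the immediate subdirectories of a candidate path and confirm that they are the class folders. The following is only a minimal sketch using the standard library; the helper name list_class_dirs is hypothetical, and the path you give it must be one that ordinary Python file APIs can see.

```python
import os


def list_class_dirs(directory):
    """Return the sorted names of the immediate subdirectories of `directory`.

    flow_from_directory treats each immediate subdirectory as one class,
    so for the layout in the question the train folder should yield
    the class names '0', '1', '2' and '3'.
    """
    return sorted(
        entry
        for entry in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, entry))
    )
```

If this returns the train and test folders rather than the class folders, the path is one level too high.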
Found the answer. To access the folder by its direct local path, prefix the mount path with /dbfs: /dbfs/mnt/train_test_small/train/
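The reason this works is that Keras reads files through ordinary local file APIs, while dbutils addresses DBFS paths directly; on Databricks the same data is exposed to local file APIs under /dbfs. A minimal sketch of that conversion follows; to_local_dbfs_path is a hypothetical helper written for illustration, not part of Databricks or Keras.

```python
def to_local_dbfs_path(path):
    """Map a DBFS path such as '/mnt/x' or 'dbfs:/mnt/x' to '/dbfs/mnt/x'."""
    # Drop the optional 'dbfs:' scheme prefix.
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    # Leave paths that are already in local form untouched.
    if path == "/dbfs" or path.startswith("/dbfs/"):
        return path
    return "/dbfs/" + path.lstrip("/")


# Path from the question, converted to the form Keras can open.
train_dir = to_local_dbfs_path("/mnt/train_test_small/train")
```

train_dir can then be passed as the first argument to flow_from_directory in place of the /mnt/... path.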