Unable to read data using Cloud Storage FUSE

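In a Vertex AI Workbench notebook, I am trying to read data from Cloud Storage using Cloud Storage FUSE. The path of the dataset in Cloud Storage is gs://my_bucket_name/cola_public/raw/in_domain_train.tsv, so I can read it into a pandas DataFrame as follows: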

import pandas as pd
# Load the dataset into a pandas DataFrame (reading gs:// paths requires the gcsfs package).
df = pd.read_csv("gs://my_bucket_name/cola_public/raw/in_domain_train.tsv", delimiter='\t', header=None, names=['sentence_source', 'label', 'label_notes', 'sentence'])
# Report the number of sentences.
print('Number of training sentences: {:,}\n'.format(df.shape[0]))
# Display 10 random rows from the data.
df.sample(10)

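The code above works without problems. However, I want to update it to read the data through Cloud Storage FUSE instead (so that I can later use it in Vertex AI Training). Based on "Read and write Cloud Storage files with Cloud Storage FUSE" and this Codelab, I should be able to load my data with the following code: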

df = pd.read_csv("/gcs/my_bucket_name/cola_public/raw/in_domain_train.tsv", delimiter='\t', header=None, names=['sentence_source', 'label', 'label_notes', 'sentence'])

Unfortunately, it does not work for me. The error message is:

FileNotFoundError: [Errno 2] No such file or directory: '/gcs/my_bucket_name/cola_public/raw/in_domain_train.tsv'

How can I fix this? Thanks in advance!

Thanks to Ayush Sethi's answer:

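> Have you tried running step 5 of the Codelab mentioned above? The GCS bucket is mounted when step 5 is executed. So the training application code containerized in step 4 should be able to access the data in the GCS bucket when it runs as a training job on Vertex AI, as described in step 5.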
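The answer implies that /gcs/... paths only exist where Cloud Storage FUSE has actually mounted the bucket (such as inside the Vertex AI training job), not automatically in the notebook. One way to keep a single code path that works in both places is to check for the FUSE path and fall back to the gs:// URI. This is a minimal sketch: `resolve_data_path` and its `fuse_root="/gcs"` default are hypothetical helpers for illustration, not part of any Google or pandas API.

```python
import os

GCS_PREFIX = "gs://"


def resolve_data_path(gcs_uri: str, fuse_root: str = "/gcs") -> str:
    """Return the Cloud Storage FUSE path for gcs_uri if it exists, else gcs_uri.

    gs://BUCKET/OBJECT is mapped to FUSE_ROOT/BUCKET/OBJECT, the same layout
    as the /gcs/my_bucket_name/... path used in the question.
    """
    if not gcs_uri.startswith(GCS_PREFIX):
        raise ValueError(f"expected a gs:// URI, got {gcs_uri!r}")
    # Strip "gs://" so the bucket name becomes the first directory under fuse_root.
    fuse_path = os.path.join(fuse_root, gcs_uri[len(GCS_PREFIX):])
    # Use the local mount only when the file is actually visible there;
    # otherwise fall back to the gs:// URI (which pandas reads via gcsfs).
    return fuse_path if os.path.exists(fuse_path) else gcs_uri
```

With this, the loading line becomes `pd.read_csv(resolve_data_path("gs://my_bucket_name/cola_public/raw/in_domain_train.tsv"), delimiter='\t', ...)`: in the notebook it reads via gs://, and in a training job where the bucket is mounted under /gcs it reads the local FUSE path.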

This guide (specifically the section on mounting the Cloud Storage bucket) solved the problem for me:

https://cloud.google.com/blog/topics/developers-practitioners/cloud-storage-file-system-vertex-ai-workbench-notebooks/

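In hindsight this is not surprising: you need to mount the bucket yourself into a directory named /gcs in JupyterLab. The only surprising part is that none of the other Cloud Storage FUSE guides mention this, which makes it look as if the mount happens behind the scenes.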
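Mounting the bucket from the notebook can be sketched as a call to the gcsfuse CLI. This assumes gcsfuse is installed on the Workbench instance and that the notebook user can create the mount point; mounting at /gcs/my_bucket_name (to match the training paths) typically needs elevated permissions to create /gcs, so a directory under the home folder may be easier. The helper names are hypothetical, and `--implicit-dirs` is used so that prefixes such as cola_public/raw/ appear as directories.

```python
import os
import shutil
import subprocess


def gcsfuse_mount_command(bucket: str, mount_point: str) -> list[str]:
    """Build the gcsfuse command that mounts `bucket` at `mount_point`.

    --implicit-dirs makes "directories" that exist only as object-name
    prefixes in the bucket (e.g. cola_public/raw/) visible in the mount.
    """
    return ["gcsfuse", "--implicit-dirs", bucket, mount_point]


def mount_bucket(bucket: str, mount_point: str) -> None:
    """Create the mount point if needed and mount the bucket with gcsfuse."""
    if shutil.which("gcsfuse") is None:
        raise RuntimeError("gcsfuse is not installed on this machine")
    os.makedirs(mount_point, exist_ok=True)
    if os.path.ismount(mount_point):
        return  # already mounted; nothing to do
    # check=True raises CalledProcessError if gcsfuse exits with an error.
    subprocess.run(gcsfuse_mount_command(bucket, mount_point), check=True)
```

For example, after `mount_bucket("my_bucket_name", os.path.expanduser("~/gcs/my_bucket_name"))`, the file should be readable at `~/gcs/my_bucket_name/cola_public/raw/in_domain_train.tsv` with the same `pd.read_csv(..., delimiter='\t', ...)` call.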
