I'm trying to learn Spark SQL in Databricks and want to work with the Yelp dataset; however, the file is too large to upload to DBFS from the UI. Thanks, Philipp
There are several approaches:
- Upload local data to DBFS with the Databricks CLI's dbfs command
- Download the dataset directly from a notebook, e.g. with %sh wget URL, and unpack the archive onto DBFS (either by writing to /dbfs/path/... as the destination, or by using the dbutils.fs.cp command to copy the file from the driver node to DBFS); a sketch of this option follows the list
- Upload the file to AWS S3, Azure Data Lake Storage, Google Cloud Storage, or something similar, and access the data from there
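As an illustration of the second option, here is a minimal sketch of downloading a file inside a notebook and then copying it from the driver node to DBFS; the URL and file names are placeholders, not part of the original answer. (For the first option, the legacy Databricks CLI upload is a one-liner along the lines of dbfs cp my_local_file dbfs:/FileStore/tables/.)

from urllib.request import urlretrieve

# Download the archive to the driver node's local disk (placeholder URL and path)
urlretrieve('https://example.com/yelp_dataset.tgz', '/tmp/yelp_dataset.tgz')

# Copy it from the driver's local filesystem into DBFS
dbutils.fs.cp('file:/tmp/yelp_dataset.tgz', 'dbfs:/FileStore/tables/yelp_dataset.tgz')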
Upload the file you want to load in Databricks to Google Drive, then pull it down with the snippet below.
from urllib.request import urlopen
from shutil import copyfileobj

my_url = 'paste your url here'
my_filename = 'give your filename'
file_path = '/FileStore/tables'  # DBFS location you want to move the downloaded file to

# Download the file from Google Drive to the Databricks driver node
with urlopen(my_url) as in_stream, open(my_filename, 'wb') as out_file:
    copyfileobj(in_stream, out_file)

# Check where the file was downloaded; in my case it is the driver's working directory
display(dbutils.fs.ls('file:/databricks/driver'))

# Move the file to the desired location
# dbutils.fs.mv(downloaded_location, desired_location)
dbutils.fs.mv('file:/databricks/driver/' + my_filename, file_path)
I hope this helps.