s3cmd performance is very poor when transferring a 10TB file



I am trying to transfer a 10TB file to COS (Cloud Object Storage) using s3cmd.

To transfer the file, I use the following command:

python3 cloud-s3.py --upload s3cmd /data/10TB.txt pr-bucket1 --multipart-chunk-size-mb 1024 --limit-rate 100M --no-check-md5

Transferring this file takes about 55 hours.

Are there any other parameters I can use to improve its performance?

By comparison, Amazon AWS takes about 22 hours to transfer the same file.

Why is s3cmd's performance so poor? Is it simply how it is designed?

Can anyone help me with this? Below is the content of my cloud-s3.py file:

#!/usr/bin/env python3
import sys
import argparse
import subprocess


def main(argv):
    parser = argparse.ArgumentParser(description='Cloud project. Prereq: pip3')
    parser.add_argument("-i", "--install", help="Command to install either s3cmd / aws cli.", choices=["s3cmd", "aws_cli"], dest='installation', type=str)
    parser.add_argument("-c", "--configure", help="Command to configure either s3cmd / aws cli.", choices=["s3cmd", "aws_cli"], dest='configure', type=str)
    parser.add_argument("-u", "--upload", help="Command to transfer a file to the bucket. Protocol, file path and bucket name are required. Upload supports GPG encryption.", nargs=3, type=str)
    parser.add_argument("-l", "--list", help="Command to list the bucket items. Protocol and bucket name are required.", nargs=2, type=str)
    parser.add_argument("-e", "--encrypt", help="Flag to send an encrypted file. The encryption password needs to be given while configuring s3cmd. Other users would need to use gpg -d <file> to decrypt it and should enter the password you supplied.", action='store_true', dest='encryption')
    parser.add_argument("-d", "--disable-multipart", help="Flag to disable multipart transfer for the current transfer. By default, multipart transfer is enabled for files larger than the default multipart chunk size. Refer to the .s3cfg text file.", dest='disable_multipart', action='store_true')
    parser.add_argument("-s", "--multipart-chunk-size-mb", help="Size of each chunk of a multipart upload. Files bigger than SIZE are automatically uploaded as multithreaded-multipart; smaller files are uploaded using the traditional method. SIZE is in megabytes; the default chunk size is 15MB, the minimum allowed chunk size is 5MB, the maximum is 5GB.", dest='chunk_size', type=str, nargs=1)
    parser.add_argument("--sync", help="Conditional transfer. Only files that don't exist at the destination in the same version are transferred. Note: sync doesn't support GPG encryption.", dest='sync_data', nargs=3, type=str)
    parser.add_argument("--limit-rate", help="Limit the upload or download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the k suffix, or megabytes with the m suffix.", dest='limit_rate', nargs=1, type=str)
    parser.add_argument("--no-check-md5", help="Do not check MD5 sums when comparing files for [sync]. Only size will be compared. May significantly speed up transfer but may also miss some changed files.", dest='no_checksum', action='store_true')
    argument = parser.parse_args()

    install = argument.installation
    config = argument.configure
    upload = argument.upload
    list_bucket = argument.list
    encrypt_enabled = argument.encryption
    disable_multipart = argument.disable_multipart
    chunk_size = argument.chunk_size
    sync = argument.sync_data
    limit_rate = argument.limit_rate
    no_checksum = argument.no_checksum

    if install == 's3cmd':
        print("s3 cmd")
        subprocess.call('sudo pip3 install s3cmd', shell=True)
    elif install == 'aws_cli':
        print("aws cli")

    if config == "s3cmd":
        print("config s3 cmd")
        subprocess.run('s3cmd --configure', shell=True)
    elif config == "aws_cli":
        print("config aws cli")

    if upload:
        print("upload")
        protocol = argument.upload[0]
        filename = argument.upload[1]
        bucketname = "s3://" + argument.upload[2]
        print("protocol = ", protocol)
        print("filename = ", filename)
        print("bucket = ", bucketname)
        # Build the underlying command, e.g. ['s3cmd', 'put', <file>, <bucket>, ...]
        upload_list = [protocol, "put", filename, bucketname]
        if encrypt_enabled:
            upload_list.append("-e")
        if disable_multipart:
            upload_list.append("--disable-multipart")
        if chunk_size:
            upload_list.append("--multipart-chunk-size-mb")
            upload_list.append(argument.chunk_size[0])
        if limit_rate:
            upload_list.append("--limit-rate")
            upload_list.append(argument.limit_rate[0])
        print("\nUpload command list:\n")
        print(upload_list)
        subprocess.run(upload_list)

    if list_bucket:
        print("list")
        protocol = argument.list[0]
        bucketname = "s3://" + argument.list[1]
        subprocess.run([protocol, "ls", bucketname])

    if sync:
        print("executing s3 sync")
        protocol = argument.sync_data[0]
        filename = argument.sync_data[1]
        bucketname = "s3://" + argument.sync_data[2]
        print("protocol = ", protocol)
        print("filename = ", filename)
        print("bucket = ", bucketname)
        sync_list = [protocol, "sync", filename, bucketname]
        if disable_multipart:
            sync_list.append("--disable-multipart")
        if chunk_size:
            sync_list.append("--multipart-chunk-size-mb")
            sync_list.append(argument.chunk_size[0])
        if limit_rate:
            sync_list.append("--limit-rate")
            sync_list.append(argument.limit_rate[0])
        if no_checksum:
            sync_list.append("--no-check-md5")
        print("\nSync command list:\n")
        print(sync_list)
        subprocess.run(sync_list)


if __name__ == "__main__":
    main(sys.argv[1:])
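
For reference, with the arguments shown above the wrapper ends up running roughly:

s3cmd put /data/10TB.txt s3://pr-bucket1 --multipart-chunk-size-mb 1024 --limit-rate 100M

Note that the wrapper only forwards --no-check-md5 on the sync path, so that flag has no effect on a plain upload.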

The general recommendation for uploading to S3 is to use S3 multipart upload if your file is larger than 100MB.
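
If switching tools is an option, a managed multipart transfer with several parts in flight at once is usually what closes this kind of gap. Below is a minimal sketch using boto3's TransferConfig; the bucket name, key, file path and concurrency values are taken from the question or chosen purely for illustration, not tuned for your environment:

import boto3
from boto3.s3.transfer import TransferConfig

# Managed multipart upload: parts are uploaded by a thread pool in parallel.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,   # switch to multipart above ~100MB
    multipart_chunksize=1024 * 1024 * 1024,  # 1GB parts, mirroring --multipart-chunk-size-mb 1024
    max_concurrency=10,                      # parts in flight at once (illustrative value)
    use_threads=True,
)

s3 = boto3.client("s3")  # for COS, pass endpoint_url= pointing at its S3-compatible endpoint
s3.upload_file("/data/10TB.txt", "pr-bucket1", "10TB.txt", Config=config)

Keep in mind that S3 caps a multipart upload at 10,000 parts, so for a file this large the part size needs to be at least on the order of 1GB.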

Another option you can use to speed up uploads is S3 Transfer Acceleration: https://aws.amazon.com/s3/transfer-acceleration/
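
If the target were an actual AWS S3 bucket, acceleration can be enabled per bucket and then used from boto3; whether your COS endpoint offers anything equivalent is something you would have to verify. A sketch, reusing the bucket and file path from the question for illustration:

import boto3
from botocore.config import Config

# One-time: enable Transfer Acceleration on the bucket (an AWS S3 feature).
boto3.client("s3").put_bucket_accelerate_configuration(
    Bucket="pr-bucket1",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Subsequent uploads go through the accelerate endpoint.
s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3.upload_file("/data/10TB.txt", "pr-bucket1", "10TB.txt")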

However, this is an extreme case: even with a 100 Mbps connection, uploading a 1TB file takes roughly 23 hours.
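
That figure is just back-of-the-envelope arithmetic, ignoring protocol overhead and retries:

# Rough transfer-time estimate: size in bytes divided by link speed in bytes/second.
size_bytes = 1 * 10**12         # 1 TB
link_bps = 100 * 10**6          # 100 Mbps link
hours = size_bytes / (link_bps / 8) / 3600
print(round(hours, 1))          # ~22.2 hours for 1 TB; roughly 10x that for a 10 TB file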

Uploading to S3 over the internet is not a good option at this scale; AWS does offer other products for moving data of this size, such as AWS Snowball.