I'm writing a backup script for a SQLite database that changes only very intermittently. It currently looks like this:
from bz2 import BZ2File
from datetime import datetime
from os.path import dirname, abspath, join
from hashlib import sha512

def backup_target_database(target):
    backup_dir = dirname(abspath(target))
    hash_file = join(backup_dir, 'last_hash')
    new_hash = sha512(open(target, 'rb').read()).digest()
    if new_hash != open(hash_file, 'rb').read():
        fmt = '%Y%m%d-%H%M.sqlite3.bz2'
        snapshot_file = join(backup_dir, datetime.now().strftime(fmt))
        BZ2File(snapshot_file, 'wb').write(open(target, 'rb').read())
        open(hash_file, 'wb').write(new_hash)
The database currently weighs in at only 20 MB, so reading the entire file into memory when the script runs (and doing it twice when a change is detected) isn't much of a strain, but I'd rather not wait until it becomes a problem.

What is the proper way to do this kind of (to use Bash-script terminology) stream piping?
First, there is duplication in your code (the target file is read twice). You can use shutil.copyfileobj and hashlib's update() method for a memory-efficient routine:
from bz2 import BZ2File
from datetime import datetime
from hashlib import sha512
from os.path import dirname, abspath, join
from shutil import copyfileobj

def backup_target_database(target_path):
    backup_dir = dirname(abspath(target_path))
    hash_path = join(backup_dir, 'last_hash')
    try:
        with open(hash_path, 'rb') as hash_file:
            old_hash = hash_file.read()
    except FileNotFoundError:
        # No stored hash yet (first run): force a snapshot
        old_hash = b''
    # Feed the file through the hasher in chunks so memory use stays constant
    hasher = sha512()
    with open(target_path, 'rb') as target:
        while True:
            data = target.read(1024)
            if not data:
                break
            hasher.update(data)
    new_hash = hasher.digest()
    if new_hash != old_hash:
        fmt = '%Y%m%d-%H%M.sqlite3.bz2'
        snapshot_path = join(backup_dir, datetime.now().strftime(fmt))
        # Stream the file straight into the compressor, again in chunks
        with open(target_path, 'rb') as target:
            with BZ2File(snapshot_path, 'wb', compresslevel=9) as snapshot:
                copyfileobj(target, snapshot)
        # Record the new hash so unchanged databases are skipped next time
        with open(hash_path, 'wb') as hash_file:
            hash_file.write(new_hash)
(Note: I haven't tested this code. Please let me know if you run into problems with it.)
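As a side note, the explicit while/read/break loop for hashing can also be written more compactly with iter() and functools.partial, which read chunks until the empty-bytes sentinel is returned. A minimal sketch (the function name and chunk size here are my own, not from the answer above):

```python
from functools import partial
from hashlib import sha512

def hash_file_chunked(path, chunk_size=65536):
    # Stream the file through the hasher in fixed-size chunks,
    # so memory use stays constant regardless of file size.
    hasher = sha512()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read(chunk_size)
        # until it returns b'' (end of file).
        for chunk in iter(partial(f.read, chunk_size), b''):
            hasher.update(chunk)
    return hasher.digest()
```

This produces the same digest as hashing the whole file in one call, without ever holding more than one chunk in memory.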