Faster way to modify a gzipped XML file (Adobe Premiere Pro project file) in Python?

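This is my first post in the Stack Overflow community after a few years of lurking and playing with Python in my spare time. I wrote a script that modifies Adobe Premiere Pro project files to downgrade them to "version 1". This lets users open newer project files in older versions of the program.

To be clear, this has been done before. Several people on Stack Overflow and the Adobe forums have posted questions and solutions for this problem. My question is about the speed/efficiency of using Python's gzip module and BeautifulSoup with the lxml parser to decompress and modify the XML file.

Here is the code: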

# Imports used below (the original post only said to assume they were done).
import gzip
import os
import sys

from bs4 import BeautifulSoup
from tqdm import tqdm


def downgrade(prproj_in):  # Main functionality of the program. Downgrades target prproj files.
    """
    Shortened the docstring to save reading...
    """
    new_version = '1'
    root, ext = os.path.splitext(prproj_in)  # Split off the extension so it can be checked.
    new_name = root + '_DOWNGRADED' + '(v.' + str(new_version) + ').prproj'
    try:
        if ext != '.prproj':
            # If not a valid Adobe Premiere file, exit.
            print('Invalid filetype. Must have valid .prproj extension.')
        elif os.path.exists(new_name):
            print('Output file already exists at this location. Please move or rename.')
        else:  # Otherwise, decompress and parse the XML file with BeautifulSoup.
            with tqdm(total=100) as pbar:  # Initialize progress bar.
                with gzip.open(prproj_in, 'rt') as f:  # Decompress project file and open it as text.
                    file_content = f.read()  # Read the whole file into one string.
                    soup = BeautifulSoup(file_content, 'xml')  # Build the soup object (parses the full document).
                    # The tag after the first <Project> is the one that carries the Version attribute.
                    print('Current project version: ' +
                          soup.Project.find_next()['Version'])
                    soup.Project.find_next()['Version'] = new_version  # Change project version number to 1.
                    print('Downgraded project version to: ' +
                          str(soup.Project.find_next()['Version']))
                    pbar.update(80)
                with gzip.open(new_name, 'wt') as f_out:
                    f_out.write(str(soup))  # Serialize the soup back to a string and recompress it.
                    pbar.update(20)  # update() is incremental: 80 + 20 reaches the total of 100.
                    print('Downgrade Complete. New file: ' + new_name)
    except Exception:
        # handle_exceptions is the author's own helper and is not shown in the post.
        exception = sys.exc_info()
        handle_exceptions(exception[0])

Here is the beginning of the decompressed .prproj file, which contains the attribute I need to modify:

<?xml version="1.0" encoding="UTF-8" ?>
<PremiereData Version="3">
<Project ObjectRef="1"/>
<Project ObjectID="1" ClassID="62ad66dd-0dcd-42da-a660-6d8fbde94876" Version="30">

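This code works fine on project files that are only a few MB (before decompression), but once the file size reaches 60, 70, or 80 MB, it takes up to 10 minutes to run. I am currently working on an indie documentary whose project file is over 100 MB compressed and up to 1.6 GB decompressed. I am running this script on an iMac Pro with 128 GB of RAM and a 3 GHz Xeon processor.

I have tested several other scripts floating around on GitHub, and they seem to show similar behavior on large project files.

I would love to hear some ideas on how to solve this. Thanks!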

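@Alfe, thanks for your thoughts! I was able to solve the problem myself by rewriting the XML parsing step with regular expressions. I had not dug into regex before this, but I found that it runs an order of magnitude faster than the Beautiful Soup parser.

My revised code is on GitHub: https://github.com/snorkem/prproj_downgrade/blob/master/prproj_downgrade.py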
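
For illustration only (this is not the code from the linked repository), a regex-based version of the edit might look like the sketch below. The pattern and the name `downgrade_text` are assumptions derived from the header shown in the question; in particular, the sketch assumes the attributes always appear in the order ObjectID, ClassID, Version. It avoids building a parse tree for the whole document and touches only the first matching tag.

```python
import re

# Assumed pattern, based on the header shown in the question: the first
# <Project ...> tag that carries ObjectID, ClassID and Version attributes.
PROJECT_VERSION = re.compile(
    r'(<Project\s+ObjectID="[^"]*"\s+ClassID="[^"]*"\s+Version=")(\d+)(")'
)


def downgrade_text(xml_text, new_version="1"):
    """Return (new_text, old_version); old_version is None if nothing matched."""
    match = PROJECT_VERSION.search(xml_text)
    if match is None:
        return xml_text, None
    start, end = match.span(2)  # span of the digits inside Version="..."
    return xml_text[:start] + new_version + xml_text[end:], match.group(2)
```

Because the search stops at the first match, the cost of the edit itself no longer depends on the size of the rest of the file; decompressing and recompressing still do.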

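If I understand you correctly (please correct me if I'm wrong), you want to change a part near the beginning of a compressed file.

If that is right, then it doesn't matter that the uncompressed version of the file is XML. All that matters is that you want to change something at the beginning of a compressed file.

That is always hard to do. Compression algorithms store later information based on earlier information, so changing only the beginning also changes the later parts. Put the other way around: changing a part at the beginning makes it necessary to change the rest of the compressed file so that the rest of the uncompressed version stays unchanged.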
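
A small experiment can make this concrete. The sketch below uses made-up data (the `<Clip .../>` lines are purely hypothetical, not real Premiere content) and zlib's DEFLATE, the same algorithm gzip uses. It compresses two documents that differ only in the version attribute and reports where the compressed streams first diverge and whether their last bytes still match. The exact numbers will depend on the input and the zlib build.

```python
import zlib


def first_difference(a, b):
    """Index of the first differing byte, or None if the inputs are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


# Hypothetical payload: identical bodies, different version in the header.
body = b"".join(b'<Clip ID="%d"/>\n' % i for i in range(5000))
original = b'<Project Version="30">\n' + body
edited = b'<Project Version="1">\n' + body

c1 = zlib.compress(original)
c2 = zlib.compress(edited)

print("first differing compressed byte:", first_difference(c1, c2))
print("last 64 compressed bytes identical:", c1[-64:] == c2[-64:])
```

Even when large parts of the two streams happen to coincide, the trailing checksum covers the whole uncompressed content, so a byte-level patch of the compressed file would still have to account for it.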

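In your case, I'm afraid you will have to decompress the whole file, change it, and compress it again. To avoid that, you would need to change the architecture of the project so that the beginning of the file and the rest are stored separately, e.g. in two different files. The consuming software would then need to stitch the two files together whenever the XML information is needed.

If you go with the first option and decompress → change → recompress the file, you can do this through a pipe, which lets you do it in a single step. If you need more advice, feel free to ask another question on this topic (and link to the new question from a comment here).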
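
One way to read the pipe suggestion in Python is a streaming copy: decompress, patch, and recompress in one pass without holding the whole document in memory. The sketch below is an assumption-laden illustration, not a tested tool for real project files. The name `downgrade_stream`, the pattern, and the `head_size` parameter are all hypothetical, and it assumes the version attribute lies within the first `head_size` bytes of the decompressed data, as the header in the question suggests.

```python
import gzip
import re
import shutil

# Assumed pattern, based on the header shown in the question (bytes version).
PROJECT_VERSION = re.compile(
    rb'(<Project\s+ObjectID="[^"]*"\s+ClassID="[^"]*"\s+Version=")\d+(")'
)


def downgrade_stream(src_path, dst_path, new_version=b"1", head_size=1 << 20):
    """Copy a gzipped file, rewriting the version found in its first chunk.

    Returns True if a version attribute was found and replaced.
    """
    with gzip.open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        head = src.read(head_size)  # only this chunk is searched
        head, count = PROJECT_VERSION.subn(
            lambda m: m.group(1) + new_version + m.group(2), head, count=1
        )
        dst.write(head)
        # Stream the remainder through in fixed-size chunks.
        shutil.copyfileobj(src, dst, head_size)
    return count == 1
```

The whole file is still decompressed and recompressed, as the answer says it must be, but memory use stays at roughly one chunk and no XML parsing takes place.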
