Split text on spaces, but keep words inside quotes together as a single unit

I want to split text into a list, where a file name containing spaces should be treated as a single item. Example:

s = 'cmd -a -b -c "file with spaces.mp4" -e -f'.split()
print(s)

Output:

['cmd', '-a', '-b', '-c', '"file', 'with', 'spaces.mp4"', '-e', '-f']

Desired output:

['cmd', '-a', '-b', '-c', '"file with spaces.mp4"', '-e', '-f']

I tried some for loops, but it got messy. Is there a decent way to do this with a regular expression, or something else that doesn't look ugly?

Actually, in this case I wouldn't use a regular expression. This is exactly what shlex.split() is for:

import shlex
s = shlex.split('cmd -a -b -c "file with spaces.mp4" -e -f')
print(s)

Prints:

['cmd', '-a', '-b', '-c', 'file with spaces.mp4', '-e', '-f']
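Because shlex follows shell (POSIX) quoting rules by default, it also copes with single quotes and backslash-escaped spaces, which a plain split() or a hand-written loop would not. A minimal sketch; the three input strings are made up for illustration:

```python
import shlex

# shlex.split applies shell-like (POSIX) quoting rules by default
examples = [
    'cmd -a "file with spaces.mp4"',   # double quotes
    "cmd -a 'file with spaces.mp4'",   # single quotes
    r'cmd -a file\ with\ spaces.mp4',  # backslash-escaped spaces
]

for line in examples:
    # each line splits into ['cmd', '-a', 'file with spaces.mp4']
    print(shlex.split(line))
```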

Try shlex:

import shlex
data = 'cmd -a -b -c "file with spaces.mp4" -e -f'
new = shlex.split(data)
print(new)

Yields:

['cmd', '-a', '-b', '-c', 'file with spaces.mp4', '-e', '-f']
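If you later need to turn the list back into a command-line string, shlex.join (Python 3.8+) does the reverse, quoting only the arguments that need it. A small sketch of the round trip, assuming Python 3.8 or newer:

```python
import shlex

args = shlex.split('cmd -a -b -c "file with spaces.mp4" -e -f')

# shlex.join quotes each argument as needed (single quotes) and joins with spaces
command_line = shlex.join(args)
print(command_line)  # cmd -a -b -c 'file with spaces.mp4' -e -f

# Splitting the joined string gives back the same list
assert shlex.split(command_line) == args
```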

This can be done with the built-in shlex module, like this:

import shlex
s = shlex.split('cmd -a -b -c "file with spaces.mp4" -e -f', posix=False)
print(s)

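Passing posix=False to split keeps the quotes around the multi-word file name, since that is how your desired output formats it. If you don't want the quotes kept, just drop the posix argument.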
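A side-by-side sketch of the two modes on the question's input. One side effect worth knowing, per the shlex documentation's parsing rules, is that non-POSIX mode also stops treating backslashes as escape characters.

```python
import shlex

text = 'cmd -a -b -c "file with spaces.mp4" -e -f'

# Default (posix=True): the quotes are consumed, only their content is kept
posix_result = shlex.split(text)

# posix=False: the quote characters stay part of the token
non_posix_result = shlex.split(text, posix=False)

print(posix_result)      # ['cmd', '-a', '-b', '-c', 'file with spaces.mp4', '-e', '-f']
print(non_posix_result)  # ['cmd', '-a', '-b', '-c', '"file with spaces.mp4"', '-e', '-f']
```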

Use a regular expression that matches either a double-quoted string or a run of non-whitespace characters:

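import re

text = 'cmd -a -b -c "file with spaces.mp4" -e -f'  # avoid shadowing the built-in input()
output = re.findall(r'"[^"]*"|\S+', text)  # a quoted string, or else a run of non-whitespace
print(output)  # ['cmd', '-a', '-b', '-c', '"file with spaces.mp4"', '-e', '-f']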
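If you want the regex approach but without the surrounding quotes, one possible variant (a sketch, not part of the original answer) puts each alternative in its own capture group. It still only understands double quotes and does not handle escaped quotes or quotes in the middle of a word, so shlex remains the more robust choice for real command lines.

```python
import re

text = 'cmd -a -b -c "file with spaces.mp4" -e -f'

# Group 1 captures the inside of a double-quoted string, group 2 a bare word.
# With two groups, re.findall returns (group1, group2) tuples; the group that
# did not participate comes back as '', so `quoted or bare` yields the text
# of the alternative that matched.
pattern = re.compile(r'"([^"]*)"|(\S+)')
tokens = [quoted or bare for quoted, bare in pattern.findall(text)]

print(tokens)  # ['cmd', '-a', '-b', '-c', 'file with spaces.mp4', '-e', '-f']
```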
