Brace expansion in Python glob

I have Python 2.7 and am trying to issue:

glob('{faint,bright*}/{science,calib}/chip?/')

I get no matches, but from the shell, echo {faint,bright*}/{science,calib}/chip? gives:

faint/science/chip1 faint/science/chip2 faint/calib/chip1 faint/calib/chip2 bright1/science/chip1 bright1/science/chip2 bright1w/science/chip1 bright1w/science/chip2 bright2/science/chip1 bright2/science/chip2 bright2w/science/chip1 bright2w/science/chip2 bright1/calib/chip1 bright1/calib/chip2 bright1w/calib/chip1 bright1w/calib/chip2 bright2/calib/chip1 bright2/calib/chip2 bright2w/calib/chip1 bright2w/calib/chip2

What is wrong with my expression?

Combine globbing with brace expansion.

pip install braceexpand

Sample:

from glob import glob
from braceexpand import braceexpand

def braced_glob(path):
    # Expand the braces first, then glob each resulting pattern.
    matches = []
    for pattern in braceexpand(path):
        matches.extend(glob(pattern))
    return matches

>>> braced_glob('/usr/bin/{x,z}*k')  
['/usr/bin/xclock', '/usr/bin/zipcloak']

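{..} is called brace expansion, and it is a separate step that the shell applies before globbing takes place.

It is not part of glob, and the Python glob function does not support it.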
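One way to see this for yourself, as a minimal sketch: the fnmatch module, which glob relies on for matching names, treats { and } as ordinary characters. A braced pattern therefore only matches a name that literally contains the braces. The names being matched here are made up for illustration.

```python
from fnmatch import fnmatchcase

pattern = "{faint,bright*}"

# Braces are ordinary characters to fnmatch, so neither alternative matches.
matches_faint = fnmatchcase("faint", pattern)
matches_bright = fnmatchcase("bright1", pattern)

# Only a name that literally contains the braces matches;
# the * inside the pattern still acts as a wildcard.
matches_literal = fnmatchcase("{faint,bright1}", pattern)

print(matches_faint, matches_bright, matches_literal)  # False False True
```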

Since glob() in Python does not support {}, you probably want something like

import os
import re
...
# Rough regex counterpart of the brace pattern; (/chip)? makes the chip part optional.
match_dir = re.compile('(faint|bright.*)/(science|calib)(/chip)?')
for dirpath, dirnames, filenames in os.walk("/your/top/dir"):
    if match_dir.search(dirpath):
        do_whatever_with_files(dirpath, filenames)
        # OR
        do_whatever_with_subdirs(dirpath, dirnames)
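The regex in that loop is fairly loose: it also accepts paths without a chip component and matches anywhere in the path. If you want something closer to the original pattern, one possible sketch is to anchor the regex and match it against the path relative to the top directory. The regex, the function name find_chip_dirs, and the demo directory names below are my own assumptions, not part of the answer above.

```python
import os
import re
import tempfile

# Assumed regex counterpart of {faint,bright*}/{science,calib}/chip?
# [^/] stands in for glob's ? and [^/]* for glob's *, so neither crosses a slash.
MATCH_DIR = re.compile(r"^(faint|bright[^/]*)/(science|calib)/chip[^/]$")

def find_chip_dirs(top):
    """Return matching directories under `top`, as sorted paths relative to `top`."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Normalise separators so the regex can always use forward slashes.
        rel = os.path.relpath(dirpath, top).replace(os.sep, "/")
        if MATCH_DIR.match(rel):
            found.append(rel)
    return sorted(found)

# Demo on a throwaway tree (directory names are made up for illustration).
with tempfile.TemporaryDirectory() as top:
    for d in ["faint/science/chip1", "bright2w/calib/chip2",
              "faint/other/chip1", "dim/science/chip1", "faint/science/chip10"]:
        os.makedirs(os.path.join(top, d))
    chip_dirs = find_chip_dirs(top)

print(chip_dirs)  # ['bright2w/calib/chip2', 'faint/science/chip1']
```

Note that os.walk visits the whole tree, so for deep hierarchies this does more work than globbing a handful of expanded patterns.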

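As that other guy pointed out, Python does not support brace expansion directly. But since brace expansion is done before the wildcards are evaluated, you can do it yourself, e.g.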

result = glob('{faint,bright*}/{science,calib}/chip?/')

becomes

result = [
    f 
    for b in ['faint', 'bright*'] 
    for s in ['science', 'calib'] 
    for f in glob('{b}/{s}/chip?/'.format(b=b, s=s))
]
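If the number of brace groups varies, the nested loops can be folded into itertools.product. The following is only a sketch of that idea: glob_alternatives is a hypothetical helper name, the root parameter is my own addition so that the demo can run inside a temporary directory, and the directory names are invented.

```python
import os
import tempfile
from glob import escape, glob
from itertools import product

def glob_alternatives(root, template, *alternative_lists):
    """Glob `template` under `root` once per combination of alternatives."""
    results = []
    for combo in product(*alternative_lists):
        # escape() keeps special characters in `root` from acting as wildcards.
        pattern = os.path.join(escape(root), template.format(*combo))
        hits = sorted(glob(pattern))
        results.extend(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)
    return results

# Demo on a throwaway tree (directory names are made up for illustration).
with tempfile.TemporaryDirectory() as top:
    for d in ["faint/science/chip1", "bright1/calib/chip2", "bright1/other/chip1"]:
        os.makedirs(os.path.join(top, d))
    found = glob_alternatives(top, "{}/{}/chip?",
                              ["faint", "bright*"], ["science", "calib"])

print(found)  # ['faint/science/chip1', 'bright1/calib/chip2']
```

Results come back grouped by combination, in the same order that the shell would list the expanded patterns.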

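As other answers have stated, brace expansion is a preprocessing step for glob: you expand all the braces, then run glob on each result. (Brace expansion turns one string into a list of strings.)

Orwellophile recommends the braceexpand library. In my opinion this is too small a problem to justify a dependency (although it is a common enough problem that it should be in the standard library, ideally packaged in the glob module).

So here is a way to do it in a few lines of code.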

import itertools
import re
def expand_braces(text, seen=None):
    if seen is None:
        seen = set()
    spans = [m.span() for m in re.finditer("{[^{}]*}", text)][::-1]
    alts = [text[start + 1 : stop - 1].split(",") for start, stop in spans]
    if len(spans) == 0:
        if text not in seen:
            yield text
        seen.add(text)
    else:
        for combo in itertools.product(*alts):
            replaced = list(text)
            for (start, stop), replacement in zip(spans, combo):
                replaced[start:stop] = replacement
            yield from expand_braces("".join(replaced), seen)
### testing
text_to_expand = "{{pine,}apples,oranges} are {tasty,disgusting} to m{}e }{"
for result in expand_braces(text_to_expand):
    print(result)

prints

pineapples are tasty to me }{
oranges are tasty to me }{
apples are tasty to me }{
pineapples are disgusting to me }{
oranges are disgusting to me }{
apples are disgusting to me }{

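What is happening here is:

Nested braces can produce non-unique results, so we use seen to yield only results that have not been seen yet.
spans holds the start and stop indices of all innermost balanced braces in text. The [::-1] slice reverses the order so that the indices go from highest to lowest (this becomes relevant later).
Each element of alts is the corresponding list of comma-delimited alternatives.
If there are no matches (text contains no balanced braces), yield text itself, using seen to ensure it is unique.
Otherwise, use itertools.product to iterate over the Cartesian product of the comma-delimited alternatives.
Replace the braced text with the current alternative. Because we are replacing data in place, it has to be a mutable sequence (list, not str), and we have to replace the highest indices first. If we replaced the lowest indices first, the later positions would no longer agree with the indices stored in spans. This is why spans was reversed when it was created.
text may contain braces within braces. The regex only finds balanced braces that contain no other braces, but nested braces are legal. Therefore we recurse until there are no nested braces left (the len(spans) == 0 case). Recursion with Python generators uses yield from to re-yield each result of the recursive call.

In the output, {{pine,}apples,oranges} is first expanded to {pineapples,oranges} and {apples,oranges}, and then each of those is expanded in turn. The oranges result would appear twice if we did not request unique results with seen.

Empty braces like the ones in m{}e expand to nothing, so this is just me.

Unbalanced braces, like }{, are left as-is.

This is not an algorithm to use if you need high performance on large datasets, but it is a general solution for reasonably sized data.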
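To tie this back to the original question, each expanded string still has to be passed to glob. Below is a self-contained sketch of that last step. To keep it short it uses a simpler stand-in expander (expand_innermost, which expands one innermost group at a time) rather than the expand_braces function above. The function names, the root parameter, and the demo directory names are all assumptions for illustration.

```python
import os
import re
import tempfile
from glob import escape, glob

_INNERMOST = re.compile(r"\{([^{}]*)\}")

def expand_innermost(text):
    """Expand the first innermost {a,b} group, then recurse on each result."""
    m = _INNERMOST.search(text)
    if m is None:
        yield text  # no balanced braces left
        return
    for alt in m.group(1).split(","):
        yield from expand_innermost(text[:m.start()] + alt + text[m.end():])

def braced_glob(pattern, root="."):
    """Brace-expand `pattern`, glob each result under `root`, return relative paths."""
    paths = []
    # dict.fromkeys drops duplicate expansions while keeping their order.
    for expanded in dict.fromkeys(expand_innermost(pattern)):
        hits = sorted(glob(os.path.join(escape(root), expanded)))
        paths.extend(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)
    return paths

# Demo on a throwaway tree (directory names are made up for illustration).
with tempfile.TemporaryDirectory() as top:
    for d in ["faint/science/chip1", "faint/calib/chip2",
              "bright1/science/chip1", "bright1/raw/chip1"]:
        os.makedirs(os.path.join(top, d))
    found = braced_glob("{faint,bright*}/{science,calib}/chip?", root=top)

print(found)
# ['faint/science/chip1', 'faint/calib/chip2', 'bright1/science/chip1']
```

As in the shell, a path can appear more than once if two different expanded patterns happen to match it; wrap the result in dict.fromkeys if that matters.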

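The wcmatch library has an interface similar to Python's standard glob, with options to enable brace expansion, tilde expansion, and more. Enabling brace expansion, for example: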

from wcmatch import glob
glob.glob('{faint,bright*}/{science,calib}/chip?/', flags=glob.BRACE)
