My code is confusing input file names with a regular expression



My regular expression doesn't explicitly include a dash in a character range, but my code fails when the input file name looks like this:

Rage Against The Machine - 1996 - Bulls On Parade [Maxi-Single]
Here is my code:

import os
from glob import glob

def find_cue_files(path):
    found_files = []
    for root, dirs, files in os.walk(path):
        if files:
            fcue = glob(os.path.join(root, '*.[Cc][Uu][Ee]'))  # this is line 81 in my source file (mentioned in the traceback)
            # do a few other things...
    return found_files

Clearly, this part of the file name is the problem: [Maxi-Single]

How can I handle file names like this so that they are treated as fixed strings rather than as part of a regular expression?
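One way to get that behaviour, sketched here assuming Python 3.4 or later, is `glob.escape()`: it wraps each glob metacharacter (`*`, `?` and `[`) in brackets, so the directory part returned by `os.walk()` is matched literally and only the `*.[Cc][Uu][Ee]` suffix is still treated as a pattern. Collecting the matches in `found_files` is an assumption standing in for the elided "do a few other things".

```python
import os
from glob import escape, glob


def find_cue_files(path):
    """Collect .cue files (any letter case) below path.

    glob.escape() turns "[" into "[[]" (and likewise "*" and "?"), so a
    folder such as "... [Maxi-Single]" is matched as a fixed string.
    """
    found_files = []
    for root, dirs, files in os.walk(path):
        if files:
            # Only the file-name part stays a pattern; the directory is literal.
            pattern = os.path.join(escape(root), '*.[Cc][Uu][Ee]')
            # Assumption: the matches are simply collected here.
            found_files.extend(glob(pattern))
    return found_files
```

Note that this keeps `glob` and only changes how `root` is passed in; the bracket-and-dash folder names from the question are then never parsed as character ranges.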

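(Not my main question, but if it's relevant, I'm open to trying a different way of doing the case-insensitive search. I've looked at several Stack Overflow questions on this topic, and so far I haven't found a solution that fits this case.)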

Here is my error:

Traceback (most recent call last):
  File "/usr/bin/xonsh", line 33, in <module>
    sys.exit(load_entry_point('xonsh==0.10.0', 'console_scripts', 'xonsh')())
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21336, in main
    _failback_to_other_shells(args, err)
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21283, in _failback_to_other_shells
    raise err
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21334, in main
    sys.exit(main_xonsh(args))
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21388, in main_xonsh
    run_script_with_cache(
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 3285, in run_script_with_cache
    run_compiled_code(ccode, glb, loc, mode)
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 3190, in run_compiled_code
    func(code, glb, loc)
  File "process_audio_files.xsh", line 160, in <module>
    cue_files = find_cue_files(dest_path)
  File "process_audio_files.xsh", line 81, in find_cue_files
    fcue = glob(os.path.join(root, '*.[Cc][Uu][Ee]'))
  File "/usr/lib/python3.9/glob.py", line 22, in glob
    return list(iglob(pathname, recursive=recursive))
  File "/usr/lib/python3.9/glob.py", line 74, in _iglob
    for dirname in dirs:
  File "/usr/lib/python3.9/glob.py", line 75, in _iglob
    for name in glob_in_dir(dirname, basename, dironly):
  File "/usr/lib/python3.9/glob.py", line 86, in _glob1
    return fnmatch.filter(names, pattern)
  File "/usr/lib/python3.9/fnmatch.py", line 58, in filter
    match = _compile_pattern(pat)
  File "/usr/lib/python3.9/fnmatch.py", line 52, in _compile_pattern
    return re.compile(res).match
  File "/usr/lib/python3.9/re.py", line 252, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.9/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.9/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.9/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.9/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.9/sre_parse.py", line 834, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.9/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.9/sre_parse.py", line 598, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range i-S at position 70
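The message itself can be reproduced with `re` alone. Inside a character class, `x-y` is a range and requires `ord(x) <= ord(y)`; in `[Maxi-Single]` the range is `i-S`, and `'i'` (105) sorts after `'S'` (83). The helper `range_error` below is a hypothetical name used only for this illustration.

```python
import re


def range_error(pattern):
    """Return the re.error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


print(ord('i'), ord('S'))            # 105 83
print(range_error('[Maxi-Single]'))  # bad character range i-S at position 4
# For comparison, "4-9" in "[24-96 HD]" is an ordinary, valid range.
print(range_error('[24-96 HD]'))     # None
```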

Edit: I tried using re.escape, as described here: https://docs.python.org/3/library/re.html

import os
import re
from glob import glob

def find_cue_files(path):
    found_files = []
    for root, dirs, files in os.walk(path):
        if files:
            root2 = re.escape(root)
            fcue = glob(os.path.join(root2, '*.[Cc][Uu][Ee]'))
            # do a few other things...
    return found_files

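That handled the previous file name, but now it fails on another one: the input file name Aerosmith - Aerosmith (2014) [24-96 HD] produces the same error at the same point in my modified code.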
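A likely reason `re.escape()` did not help: `glob` hands patterns to `fnmatch`, and glob patterns have no backslash escape (the documented way to match a metacharacter literally is to wrap it in brackets). The backslashes added by `re.escape()` therefore become literal characters, and in `[24-96 HD]` the bracket expression presumably ends up containing the reversed range `\-9` (backslash is 0x5C, `9` is 0x39). The comparison below is a sketch of the two escaping functions on the failing name; `glob.escape()` is the one designed for glob patterns.

```python
import fnmatch
import glob
import re

name = "Aerosmith - Aerosmith (2014) [24-96 HD]"

# re.escape() targets regex syntax: it backslash-escapes [, ], -, (, ) and spaces.
regex_escaped = re.escape(name)
# glob.escape() targets glob syntax: it wraps "[" (and "*", "?") in brackets.
glob_escaped = glob.escape(name)

print(regex_escaped)  # Aerosmith\ \-\ Aerosmith\ \(2014\)\ \[24\-96\ HD\]
print(glob_escaped)   # Aerosmith - Aerosmith (2014) [[]24-96 HD]

# The glob-escaped pattern matches the original name as a fixed string.
print(fnmatch.fnmatchcase(name, glob_escaped))  # True
```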

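Rather than using glob with odd file patterns passed in through root, just filter the names yourself and then prepend root to them. One possible one-liner: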

fcue=list(map(lambda x: os.path.join(root,x), (f for f in files if f.lower().endswith('.cue'))))
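Written out inside the question's function, the same idea might look like the sketch below, using a list comprehension in place of `map`/`lambda`. Collecting the results in `found_files` is an assumption standing in for the elided "do a few other things". Because the names come straight from `os.walk()` and are never parsed as a pattern, brackets and dashes in folder names cannot trigger a `re.error`, and `str.lower()` gives the case-insensitive match the question asked about.

```python
import os


def find_cue_files(path):
    """Return the paths of all .cue files (any letter case) below path."""
    found_files = []
    for root, dirs, files in os.walk(path):
        # Filter the plain names from os.walk(); nothing is parsed as a
        # pattern, so "[Maxi-Single]" or "[24-96 HD]" in root is harmless.
        fcue = [os.path.join(root, f) for f in files if f.lower().endswith('.cue')]
        # Assumption: the matches are simply collected here.
        found_files.extend(fcue)
    return found_files
```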
