Recursive regex with joined results



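My input files are HTML files with no extension. The desired output is the URLs matched by a regex across all files under root_dir, with the results joined into a single file. My regex works, and I can output the results from a single file.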

import re
with open('/Users/files/filename') as f:
    for line in f:
        urls = re.findall(r"([\w%~+-=]*\.mp3)", line)
        print(*urls)

I could use glob, but I'm not sure how:

import glob
import re
root_dir = '/Users/files/'
for filename in glob.iglob(root_dir + '**/*.*', recursive=True):
    urls = re.findall(r"([\w%~+-=]*\.mp3)", line)  # line is undefined here: the file is never opened
    print(*urls)

Use the following:

import re, glob                                 # Import the libraries
root_dir = r'/Users/files'                      # Set root directory
save_to_file = r'/Users/urls_extracted.txt'     # File path to save results to
all_files = glob.glob("{}/*".format(root_dir))  # Get a list of file paths
with open(save_to_file, 'w') as fw:             # Open stream to write to
    for filename in all_files:                  # Iterate over the files
        with open(filename, 'r') as fr:         # Open file to read from
            for url in re.findall(r"[\w%~+-=]*\.mp3", fr.read()):  # Get all matches and iterate over them
                fw.write("{}\n".format(url))    # Write each URL to write stream
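The snippet above only looks at files directly inside root_dir. If the files are also nested in subdirectories, a recursive variant might look like the sketch below. It assumes the files are text that can be read as UTF-8 (undecodable bytes are ignored), and the names collect_urls and save_urls are made up for illustration. A pattern such as **/*.* would skip files that have no extension, so the sketch uses **/* and filters out directories instead.

```python
import glob
import os
import re

# Same pattern as in the answer; "+-=" is kept as a range on purpose.
URL_RE = re.compile(r"[\w%~+-=]*\.mp3")

def collect_urls(root_dir):
    """Return all regex matches from every file under root_dir, recursively."""
    urls = []
    pattern = os.path.join(root_dir, "**", "*")
    for path in sorted(glob.glob(pattern, recursive=True)):
        if not os.path.isfile(path):        # skip directories
            continue
        with open(path, "r", encoding="utf-8", errors="ignore") as fr:
            urls.extend(URL_RE.findall(fr.read()))
    return urls

def save_urls(root_dir, save_to_file):
    """Write one URL per line to save_to_file (hypothetical helper)."""
    urls = collect_urls(root_dir)
    with open(save_to_file, "w", encoding="utf-8") as fw:
        for url in urls:
            fw.write("{}\n".format(url))
    return urls
```

It could be called as save_urls('/Users/files', '/Users/urls_extracted.txt'). Keeping the output file outside root_dir avoids reading a previous run's results back in as input.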

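Note that inside the character class, +-= is read as a range from + to =. If you mean a literal - character rather than a range, you must escape the dash (\-) in the regex.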
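To see what the dash does in practice, the short check below compares the two character classes. Inside a class, +-= spans every ASCII character from + to =, which includes the comma, dash, dot, slash, the digits, colon, semicolon and <. The sample string is made up for illustration.

```python
import re

# Hypothetical input line, not taken from the original files.
sample = 'src="http://example.com/a-b/song.mp3"'

# "+-=" as a range: ":", "/", "." and digits are matched too.
as_range = re.findall(r"[\w%~+-=]*\.mp3", sample)

# Escaped dash: only "+", "-" and "=" are added to the class.
as_literal = re.findall(r"[\w%~+\-=]*\.mp3", sample)

print(as_range)    # ['http://example.com/a-b/song.mp3']
print(as_literal)  # ['song.mp3']
```

So escaping the dash changes what the pattern matches. If full URLs are wanted with a literal dash, the colon, slash and dot would presumably need to be listed in the class explicitly.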
