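My input files are HTML files without an extension. The desired output is every URL matched by a regular expression across all files in root_dir, with the results combined into a single file. My regular expression works, and I can output the results from a single file: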
import re
with open('/Users/files/filename') as f:
    for line in f:
        urls = re.findall(r"([\w%~+-=]*\.mp3)", line)
        print(*urls)
I could use glob, but I'm not sure how:
import glob
import re
root_dir = '/Users/files/'
for filename in glob.iglob(root_dir + '**/*.*', recursive=True):
    # `line` is undefined here: the file is never opened or read
    urls = re.findall(r"([\w%~+-=]*\.mp3)", line)
    print(*urls)
Use:
import re, glob                                   # Import the libraries
root_dir = r'/Users/files'                        # Set root directory
save_to_file = r'/Users/urls_extracted.txt'       # File path to save results to
all_files = glob.glob("{}/*".format(root_dir))    # Get a list of file paths
with open(save_to_file, 'w') as fw:               # Open stream to write to
    for filename in all_files:                    # Iterate over the files
        with open(filename, 'r') as fr:           # Open file to read from
            for url in re.findall(r"[\w%~+-=]*\.mp3", fr.read()):  # Get all matches and iterate over them
                fw.write("{}\n".format(url))      # Write each URL to write stream
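If the files also sit in subfolders, a recursive variant might look like the sketch below. It is only an illustration under some assumptions: the helper name extract_urls is hypothetical, directories returned by glob are skipped, and undecodable bytes are ignored when reading.

```python
import glob
import os
import re

# Same character class as in the code above (the dash still forms a range).
MP3_URL = re.compile(r"[\w%~+-=]*\.mp3")

def extract_urls(root_dir, pattern=MP3_URL):
    """Yield every match of `pattern` in any file under root_dir (hypothetical helper)."""
    search = os.path.join(root_dir, "**", "*")   # '*' also matches names without an extension
    for path in sorted(glob.iglob(search, recursive=True)):
        if not os.path.isfile(path):             # glob returns directories too; skip them
            continue
        with open(path, "r", errors="ignore") as fr:
            yield from pattern.findall(fr.read())
```

The results could then be written out the same way as above, for example with fw.write("\n".join(extract_urls(root_dir))).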
Note that you must escape the dash in the regular expression if you mean a literal - character rather than a range.
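To see the difference, here is a small sketch comparing the two character classes on a sample string. The variable names are illustrative only.

```python
import re

as_range = re.compile(r"[+-=]")     # range: every character from '+' to '=' in ASCII order
as_literal = re.compile(r"[+\-=]")  # exactly three characters: '+', '-' and '='

sample = "+,-./09:;<="
range_hits = as_range.findall(sample)      # every character of the sample matches
literal_hits = as_literal.findall(sample)  # only '+', '-' and '=' match
```

In this particular pattern the range happens to include /, : and ., which is why whole URLs are matched. If the dash is escaped, those characters would need to be listed in the class explicitly.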