Python - Extract codes from text with regex

I am a Python beginner and am looking for help with an extraction problem.

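I have a bunch of text files and need to extract every occurrence of a special combination ("C" + exactly 9 digits), then write them to a file together with the name of the text file they came from. Each occurrence of the expression I want to capture starts at the beginning of a new line and ends with "\n".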

sample_text = """Some random text here 
and here
and here
C123456789
some random text here
C987654321
and here
and here"""

What the output should look like (in the output file):

My_desired_output_file = "filename,C123456789,C987654321"

My code so far:

import os

min_file_size = 5

def list_textfiles(directory, min_file_size):
    # Creates a list of all files stored in DIRECTORY ending on '.txt'
    textfiles = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            filename = os.path.join(root, name)
            if name.endswith('.txt') and os.stat(filename).st_size > min_file_size:
                textfiles.append(filename)
    return textfiles

for filename in list_textfiles(temp_directory, min_file_size):
    with open(filename, 'r', encoding="utf-8") as infile:
        text = infile.read()
    regex = ???  # <- this is the part I'm stuck on
    with open(filename, 'w', encoding="utf-8") as outfile:
        outfile.write(regex)

Your regex would be '^C[0-9]{9}$':

^           start of line
C           exact match
[0-9]       any digit
{9}         9 times
$           end of line
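Because each code sits on its own line, the same anchors can also be applied to the whole file contents at once if re.MULTILINE is passed, so ^ and $ match at every line boundary instead of only at the start and end of the string. A minimal sketch, assuming the text is already loaded into a string; the extra line C1234567890 is a made-up example showing that the $ anchor rejects a tenth digit:

```python
import re

# With re.MULTILINE, ^ and $ match at the start/end of every line,
# not just at the start/end of the whole string.
pattern = re.compile(r'^C\d{9}$', re.MULTILINE)

sample_text = """Some random text here
and here
C123456789
C1234567890
some random text here
C987654321
and here"""

# C1234567890 is skipped: after 9 digits, $ requires end of line.
codes = pattern.findall(sample_text)
print(codes)  # ['C123456789', 'C987654321']
```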

import re

regex = re.compile(r'^C\d{9}$')
matches = []
with open('file.txt', 'r') as file:
    for line in file:
        line = line.strip()
        if regex.match(line):
            matches.append(line)

You can then write this list to a file in whatever format you need.
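To get the single-line format from the question (filename,C123456789,C987654321), the file name and the matches can be joined with commas. A sketch, where write_matches and the output name results.txt are hypothetical names chosen for illustration; the output file is opened in append mode so that one line per input file accumulates, and the demo runs in a temporary directory so no real files are touched:

```python
import os
import re
import tempfile

def write_matches(source_name, matches, output_path):
    # Hypothetical helper: append one line "filename,match1,match2,..." to output_path.
    with open(output_path, 'a', encoding='utf-8') as outfile:
        outfile.write(','.join([source_name] + matches) + '\n')

regex = re.compile(r'^C\d{9}$')

with tempfile.TemporaryDirectory() as tmp:
    # Create a throwaway input file for the demo.
    src = os.path.join(tmp, 'file.txt')
    with open(src, 'w', encoding='utf-8') as f:
        f.write("Some random text here\nC123456789\nmore text\nC987654321\n")

    # Same line-by-line matching as in the answer above.
    matches = []
    with open(src, 'r', encoding='utf-8') as file:
        for line in file:
            line = line.strip()
            if regex.match(line):
                matches.append(line)

    out = os.path.join(tmp, 'results.txt')
    write_matches('file.txt', matches, out)
    with open(out, encoding='utf-8') as f:
        result = f.read()

print(result)  # file.txt,C123456789,C987654321
```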

How about this:

import re
sample_text = """Some random text here 
and here
and here
C123456789
some random text here
C987654321
and here
and here"""
k = re.findall(r'C\d{9}', sample_text)
print(k)  # ['C123456789', 'C987654321']

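This returns all occurrences of the pattern. Alternatively, you could go through the text one line at a time and store your target combinations as you find them.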
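Reading the text one line at a time and keeping only the target combinations could look like this: a generator yields just the matching lines, so the whole file never has to be held in memory. iter_codes is a hypothetical name, and the sketch assumes one code per line as described in the question; io.StringIO stands in for an open file object:

```python
import io
import re

CODE = re.compile(r'C\d{9}')

def iter_codes(lines):
    # Yield each line (stripped of surrounding whitespace) that is exactly one code.
    for line in lines:
        line = line.strip()
        if CODE.fullmatch(line):
            yield line

# Any iterable of lines works, including a file opened with open().
sample = io.StringIO("Some random text here\nC123456789\nsome random text\nC987654321\n")
codes = list(iter_codes(sample))
print(codes)  # ['C123456789', 'C987654321']
```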

Update:

import glob
import os
import re

search = {}
os.chdir('/FolderWithTxTs')
for file in glob.glob("*.txt"):
    with open(file, 'r') as f:
        data = re.findall(r'C\d{9}', f.read())
    search[f.name] = data
print(search)

This produces a dictionary with the file names as keys and the lists of found matches as values.
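Putting the pieces together for the original task: walk the directory as the question's list_textfiles does, collect the codes per file into a dictionary, and write one filename,code,code line per file. A sketch under assumptions: collect_codes, write_report and report.csv are hypothetical names, codes are assumed to sit alone on their lines, and the demo builds throwaway files in a temporary directory instead of a real folder:

```python
import os
import re
import tempfile

CODE = re.compile(r'^C\d{9}$', re.MULTILINE)

def collect_codes(directory, min_file_size=5):
    # Map each .txt file larger than min_file_size bytes (keyed by its path
    # relative to directory) to the list of codes it contains.
    results = {}
    for root, dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            if name.endswith('.txt') and os.stat(path).st_size > min_file_size:
                with open(path, encoding='utf-8') as f:
                    results[os.path.relpath(path, directory)] = CODE.findall(f.read())
    return results

def write_report(results, output_path):
    # One line per file: "filename,code1,code2,..."
    with open(output_path, 'w', encoding='utf-8') as out:
        for name, codes in results.items():
            out.write(','.join([name] + codes) + '\n')

with tempfile.TemporaryDirectory() as tmp:
    for name, body in [('a.txt', "text\nC123456789\nmore\nC987654321\n"),
                       ('b.txt', "nothing to see here\n")]:
        with open(os.path.join(tmp, name), 'w', encoding='utf-8') as f:
            f.write(body)

    found = collect_codes(tmp)
    report_path = os.path.join(tmp, 'report.csv')
    write_report(found, report_path)
    with open(report_path, encoding='utf-8') as f:
        report = f.read()

print(found)   # {'a.txt': ['C123456789', 'C987654321'], 'b.txt': []}
print(report)
```

Files without any code still get a line containing only their name; skip empty lists in write_report if that is not wanted.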
