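I read PNG file names from a file, use a regex to pick out the four-digit PNG file names, strip the punctuation, and save the result to another file.
What confuses me is how to put each individual value from the list into a string such as: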
<div class="parent"><img class="img" title="" src="images/char/{HERE}.png" ></div>
and then save the result as:
<div class="parent"><img class="img" title="" src="images/char/1432.png" ></div>
<div class="parent"><img class="img" title="" src="images/char/1250.png" ></div>
<div class="parent"><img class="img" title="" src="images/char/1324.png" ></div>
Code:
import re
import pyperclip

def remove_punc(string):
    # Strip every punctuation character (and spaces) from the matched text
    punc = '''!()-[]{};:'", <>./?@#$%^&*_~'''
    for ele in string:
        if ele in punc:
            string = string.replace(ele, "")
    return string

with open(r'C:\My Web Sites\image_data(1).txt', 'r') as text_file:
    s = text_file.read()

# four digits followed by a literal dot, e.g. "1535."
string_pattern = r"\d{4}\."
regex_pattern = re.compile(string_pattern)
# find all the matches in the input text
result = regex_pattern.findall(s)
result = [remove_punc(i) for i in result]
with open(r'C:\My Web Sites\1.txt', 'w') as fp:
    for item in result:
        # write each item on a new line
        fp.write("%s\n" % item)
Edit: here is a sample of the text file:
<div class="cell-imgs"><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1535.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="0" src="resources/images/frames/5.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/60<br/>Level: 0/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='red'><br/>(version)</font>"><img src="resources/images/elements/3.png" class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1510.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="1" src="resources/images/frames/5.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/80<br/>Level: 4/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='#F96700'><br/>(version)</font>"><img src="resources/images/elements/5.png" class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1403.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="2" src="resources/images/frames/5.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/80<br/>Level: 4/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='#071BA0'><br/>(version)</font>"><img src="resources/images/elements/4.png" 
class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1388.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="3" src="resources/images/frames/5.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/80<br/>Level: 4/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='#F96700'><br/>(version)</font>"><img src="resources/images/elements/5.png" class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/6.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1323.png" onerror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="4" src="resources/images/frames/6.png" class="character-thumbnail-frame" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 6★<br/>Level: 200/200<br/>Level: 4/4<br/>Level: 1/5<br/>: 150%<br/>1: 0/10<br/>2: 0/10<br/>3: 0/10<br/>" title="<font color='red'><br/>(version)</font>"><img src="resources/images/elements/3.png" class="character-thumbnail-element"></div><div class="character-thumbnail"><img src="resources/images/bgs/5.png" class="character-thumbnail-background"><img class="character-thumbnail-image" src="resources/images/thumb/1322.png"
Output:
1535
1510
1403
1388
1323
1322
You can use str.format to create the file. For example:
s = """<div class="parent"><img class="img" title="" src="images/char/{}.png"></div>"""
result = [1432, 1250, 1324]  # <-- your result with the punctuation removed
with open("data.txt", "w") as fp:
    for item in result:
        # print() appends the newline, so each formatted <div> ends up on its own line
        print(s.format(item), file=fp)
This creates data.txt with the following content:
<div class="parent"><img class="img" title="" src="images/char/1432.png"></div>
<div class="parent"><img class="img" title="" src="images/char/1250.png"></div>
<div class="parent"><img class="img" title="" src="images/char/1324.png"></div>
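The extraction and formatting steps can also be combined. The following is only a minimal sketch under a few assumptions: the helper name build_rows and the short sample string are made up for illustration, and a capturing group is used instead of remove_punc, so the digits come back without the trailing dot.

```python
import re

TEMPLATE = '<div class="parent"><img class="img" title="" src="images/char/{}.png"></div>'

def build_rows(html):
    """Return one formatted <div> line per four-digit .png name found in html."""
    # The capturing group keeps only the digits, so nothing has to be stripped afterwards.
    ids = re.findall(r"(\d{4})\.png", html)
    return [TEMPLATE.format(i) for i in ids]

# Shortened, hypothetical stand-in for the contents of the input file
sample = (
    '<img src="resources/images/thumb/1535.png">'
    '<img src="resources/images/bgs/5.png">'
    '<img src="resources/images/thumb/1510.png">'
)
rows = build_rows(sample)
print("\n".join(rows))
```

Writing the rows to a file then only needs the same with open(...) loop as above.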
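This pattern should do the job: (\d{4})\.(?=png)
where
- (\d{4}) captures exactly four digits
- \.(?=png) requires that the digits be followed by .png
If you want to support other extensions, for example jpeg, you can change the pattern to (\d{4})\.(?=png|jpeg)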
import re
# Triple quotes let the HTML keep its own double quotes; the last line is a .jpeg and is not matched
string = """<div class="parent"><img class="img" title="" src="images/char/1432.png" ></div>
<div class="parent"><img class="img" title="" src="images/char/1250.png" ></div>
<div class="parent"><img class="img" title="" src="images/char/1324.png" ></div>
<div class="parent"><img class="img" title="" src="images/char/1324.jpeg" ></div>"""
pattern = re.compile(r'(\d{4})\.(?=png)')
print(pattern.findall(string))
which prints:
['1432', '1250', '1324']
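For the jpeg variant mentioned above, a small sketch of how the extended lookahead behaves. The file names in the list are invented for illustration; only the pattern comes from the answer.

```python
import re

# Four digits, a literal dot, then a lookahead that accepts either extension
pattern = re.compile(r"(\d{4})\.(?=png|jpeg)")

# Hypothetical file names chosen to show what is and is not matched
names = [
    "images/char/1432.png",   # matched
    "images/char/1324.jpeg",  # matched thanks to the |jpeg alternative
    "images/char/77.png",     # fewer than four digits, skipped
    "images/char/9001.gif",   # extension not in the lookahead, skipped
]
matches = [m for name in names for m in pattern.findall(name)]
print(matches)
```

Because the extension sits inside a lookahead, it is checked but never becomes part of the match, so only the digits are returned.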