Reading lines from an HTML file and editing them in Python



I have this text file:

test.html

<html>
<body>
<table>
<tr>
<td id="A">A</td>
<td id="B">B</td>
</tr>
<tr>
<td id="C">C</td>
<td id="D">D</td>
</tr>
</table>
</html>
</body>

Python file:

f = open('test.html')
ans = "A"
line = f.readline()
print(line)
if ans == 'line':
#change the row A to a dash: <td>-</td>
line = f.readline()
f.close()

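What I want to do is scan the HTML file and, when I find cell A, change its content to a dash and save the file. I'm a Python beginner and don't know much about file input and output. Please note: no libraries.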

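Using plain Python without any libraries, you can use the following code to replace the line containing A with whatever line you want. I simply used the built-in string method replace() to produce the new line:

<td id="A">-</td>\n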

Code:

ans = "A"
lines = []
# open the file
with open(r'test.html', mode='r') as f:
    for line in f.readlines():  # iterate through the lines
        if ans in line:  # check whether ans occurs in this line
            # replace the whole line containing ans with the new line; change it to whatever you want
            line = ans.replace(ans, '<td id="A">-</td>\n')
        lines.append(line)
# write to a new file
with open(r'myfile.html', mode='w') as new_f:
    new_f.writelines(lines)

Contents of myfile.html:

<html>
<body>
<table>
<tr>
<td id="A">-</td>
<td id="B">B</td>
</tr>
<tr>
<td id="C">C</td>
<td id="D">D</td>
</tr>
</table>
</html>
</body>
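
The check `ans in line` above matches any line that contains a capital A, so it only works because no other line in this file has one. A slightly stricter variant, still without libraries, could look for the exact opening tag and swap only the text between it and `</td>`. This is a minimal sketch; the function name `dash_cell` and the assumption that the cell is written exactly as `<td id="...">` on a single line are mine, not part of the answer above.

```python
def dash_cell(lines, cell_id, new_text="-"):
    """Return a new list of lines where the <td> with this id holds new_text."""
    # Assumption: the cell is written exactly as <td id="X">...</td> on one line.
    marker = '<td id="%s">' % cell_id
    result = []
    for line in lines:
        start = line.find(marker)
        if start != -1:
            text_start = start + len(marker)
            end = line.find("</td>", text_start)
            if end != -1:
                # keep the tags and any indentation, replace only the cell text
                line = line[:text_start] + new_text + line[end:]
        result.append(line)
    return result


sample = ['<tr>\n', '  <td id="A">A</td>\n', '<td id="B">B</td>\n', '</tr>\n']
print("".join(dash_cell(sample, "A")))
```

To apply it to the file, read the lines with `f.readlines()`, pass them through `dash_cell`, and write the result back with `writelines()` as in the code above.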

Try using BeautifulSoup:

from bs4 import BeautifulSoup

# Open test.html for reading
with open('test.html') as html_file:
    soup = BeautifulSoup(html_file.read(), features='html.parser')

# Go through each tag with id 'A' and replace its text with '-'
for tag in soup.find_all(id='A'):
    tag.string.replace_with('-')

# Store prettified version of modified html
new_text = soup.prettify()

# Write new contents to test.html
with open('test.html', mode='w') as new_html_file:
    new_html_file.write(new_text)

This gives the following test.html:

<html>
<body>
<table>
<tr>
<td id="A">
-
</td>
<td id="B">
B
</td>
</tr>
<tr>
<td id="C">
C
</td>
<td id="D">
D
</td>
</tr>
</table>
</body>
</html>

As others have suggested, BeautifulSoup is certainly a very good option, but since you are a beginner, I would like to suggest this regex approach.

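import re

# read the whole file into memory
fh = open('test.html')
content = fh.read()
# single quotes outside, so the double quotes in the HTML need no escaping
content = content.replace(re.findall('<td id="A">A</td>', content)[0], '<td id="A">--</td>')
fh.close()

# write the modified text back to the same file
fh = open('test.html', 'w')
fh.write(content)
fh.close()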

Or, if you want code that uses less memory and you are comfortable with file handling in Python, you can also consider this approach:

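import re

pattern = '<td id="A">A</td>'
# The replacement must be exactly as long as the matched text. A longer
# string (such as "--") would overwrite the start of the next line.
replacement = '<td id="A">-</td>'

fh = open("test.html", 'r+')
while True:
    currpos = fh.tell()  # remember where this line starts
    line = fh.readline()
    if line == '':  # end of file
        break
    if re.findall(pattern, line):
        line = line.replace(re.findall(pattern, line)[0], replacement)
        fh.seek(currpos)  # jump back to the start of the line
        fh.writelines(line)  # overwrite it in place
fh.close()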
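
Overwriting a line in place is only safe when the new line has exactly the same length as the old one, because a file cannot grow or shrink in the middle. If the replacement may differ in length, one common alternative that still reads only one line at a time is to stream into a temporary file and then swap it in. The sketch below is my own, with a hypothetical function name `rewrite_file`; it uses only the standard library.

```python
import os
import tempfile


def rewrite_file(path, old, new):
    """Replace old with new on every line of path, one line at a time."""
    directory = os.path.dirname(os.path.abspath(path))
    # write to a temporary file in the same directory, then swap it in
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as out, open(path) as src:
            for line in src:
                out.write(line.replace(old, new))
        os.replace(tmp_path, path)  # replaces the original file
    except BaseException:
        os.remove(tmp_path)  # do not leave a half-written file behind
        raise
```

For the file in the question this would be called as rewrite_file('test.html', '<td id="A">A</td>', '<td id="A">--</td>').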

You can use either the BeautifulSoup or the HTMLParser library. BeautifulSoup is easier to use, though. You can read how to use it here: https://www.pythonforbeginners.com/beautifulsoup/python-beautifulsoup-basic
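
For comparison, here is roughly what the HTMLParser route might look like with the standard library's `html.parser` module. Treat it as a sketch under assumptions: the class `CellRewriter` and the function `rewrite_cell` are names I made up, and it assumes the target element contains only text (no nested tags). End tags are re-emitted in lowercase, so the output can differ slightly from the input in that respect.

```python
from html.parser import HTMLParser


class CellRewriter(HTMLParser):
    """Re-emit HTML as parsed, replacing the text of one element by id."""

    def __init__(self, target_id, new_text):
        # keep &amp; and friends as written instead of decoding them
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.new_text = new_text
        self.inside_target = False
        self.out = []

    def _emit(self, text):
        # drop the old content while we are inside the target element
        if not self.inside_target:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # tag exactly as written
        if dict(attrs).get("id") == self.target_id:
            self.out.append(self.new_text)
            self.inside_target = True

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        self.inside_target = False
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit("&%s;" % name)

    def handle_charref(self, name):
        self._emit("&#%s;" % name)

    def handle_comment(self, data):
        self._emit("<!--%s-->" % data)

    def handle_decl(self, decl):
        self._emit("<!%s>" % decl)


def rewrite_cell(html, target_id, new_text):
    parser = CellRewriter(target_id, new_text)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = '<tr>\n<td id="A">A</td>\n<td id="B">B &amp; C</td>\n</tr>\n'
print(rewrite_cell(sample, "A", "-"))
```

This is noticeably more code than the BeautifulSoup version above, which is a fair illustration of why BeautifulSoup is usually the easier choice.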
