How to solve a problem using Beautiful Soup with an XML file that has an encoding header



I run into this problem when using BeautifulSoup on an XML file that starts with an encoding declaration.

I have this file:

<?xml version="1.0" encoding="UTF-8"?>

http://maven.apache.org/xsd/maven-4.0.0.xsd>

<modelVersion>4.0.0</modelVersion>
<artifactId>project</artifactId>
<packaging>pom</packaging>.....</project>

And this Python code:

from bs4 import BeautifulSoup

# `dir` and `files` are set earlier in the script (not shown here)
for file in files:
    print(dir + file)
    infile = open(dir + file, "r")
    contents = infile.read()
    soup = BeautifulSoup(contents, features="xml")
    print(soup.prettify())

The printed result is only:

<?xml version="1.0" encoding="utf-8"?>

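The project tag and everything inside it are ignored. This only happens with files that have the encoding declaration on the first line.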
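To rule out a problem in the file itself, one quick cross-check might be to parse the same content with the standard library's `xml.etree.ElementTree`. The sketch below is only an illustration: the `root_tag` helper and the `sample` bytes are hypothetical, and the `xmlns` value in the sample is an assumption, since the real opening `<project ...>` tag is not fully shown above.

```python
import xml.etree.ElementTree as ET


def root_tag(xml_bytes):
    """Return the root element's tag.

    The input is bytes, so the parser reads the encoding
    declaration itself instead of receiving decoded text.
    """
    return ET.fromstring(xml_bytes).tag


# Hypothetical sample modelled on the POM above; the namespace is assumed.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<project xmlns="http://maven.apache.org/POM/4.0.0">\n'
    b'  <modelVersion>4.0.0</modelVersion>\n'
    b'</project>\n'
)

print(root_tag(sample))  # {http://maven.apache.org/POM/4.0.0}project
```

If this reports the project element for the real file, the file is well-formed and the issue more likely lies in how the text is handed to BeautifulSoup.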

import requests
from bs4 import BeautifulSoup
r = requests.get("http://maven.apache.org/xsd/maven-4.0.0.xsd")
soup = BeautifulSoup(r.text, 'xml')

print(soup)
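For local files like the ones in the question, something that may be worth trying is to read the file in binary mode and pass the raw bytes to BeautifulSoup, so the parser decodes the content according to the declaration itself. This is a sketch under assumptions, not a confirmed diagnosis: the helper names `read_xml_bytes` and `parse_xml_file` are made up for illustration, and the BOM check is only a guess at one possible cause.

```python
import codecs
from pathlib import Path


def read_xml_bytes(path):
    """Read an XML file as raw bytes and drop a leading UTF-8 BOM, if any."""
    data = Path(path).read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data


def parse_xml_file(path):
    """Parse the file with BeautifulSoup's XML parser (needs bs4 and lxml)."""
    # Imported here so read_xml_bytes stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    return BeautifulSoup(read_xml_bytes(path), features="xml")
```

In the loop from the question this would be used as `soup = parse_xml_file(dir + file)` followed by `print(soup.prettify())`.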
