Beautifulsoup/JSON: How do I export all the data into a JSON dictionary?



I currently have this code, which stores every td in a JSON file, but I want the data split into 4 labeled fields:

import json
find_table = soup.find("table", id="viewAllSeriesTable")
rows = find_table.select("td")
with open("data_file.json", "w") as write_file:
    data = [j.get_text(strip=True) for j in rows]
    json.dump(data, write_file)

HTML code:

<table width="735" cellpadding="0" cellspacing="2" border="0" id="viewAllSeriesTable">
  <thead>
    <tr>
      <th id="id">Series ID</th>
      <th id="seriesName">Series Name</th>
      <th id="clientName">Client Name</th>
      <th id="Brand">Brand</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td id="9127" style="word-break: break-word;"><a href="seriesDefinition.html?id=9127">9127</a></td>
      <td style="word-break: break-word;">a</td>
      <td style="word-break: break-word;">A</td>
      <td style="word-break: break-word;">B</td>
    </tr>

How can I do this? The output I want looks like this:

{
"id": 9127,
"seriesName": "a",
"clientName": "A",
"Brand": "B"
},
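A minimal sketch of one way to get that shape, assuming the JSON keys should come from the id attributes of the th header cells and that every body row has exactly four cells. The html string is a trimmed copy of the question's table with closing tags added so it parses on its own, and rows_to_dicts is a hypothetical helper name, not part of BeautifulSoup.

```python
import json
from bs4 import BeautifulSoup

# Trimmed copy of the question's table; closing tags added so it is complete (assumption).
html = """
<table id="viewAllSeriesTable">
  <thead>
    <tr>
      <th id="id">Series ID</th>
      <th id="seriesName">Series Name</th>
      <th id="clientName">Client Name</th>
      <th id="Brand">Brand</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td id="9127"><a href="seriesDefinition.html?id=9127">9127</a></td>
      <td>a</td>
      <td>A</td>
      <td>B</td>
    </tr>
  </tbody>
</table>
"""


def rows_to_dicts(table):
    """Hypothetical helper: one dict per <tbody> row, keyed by the <th> id attributes."""
    keys = [th["id"] for th in table.select("thead th")]
    records = []
    for tr in table.select("tbody tr"):
        values = [td.get_text(strip=True) for td in tr.find_all("td")]
        record = dict(zip(keys, values))
        # The desired output has "id" as a number, so convert just that field.
        record["id"] = int(record["id"])
        records.append(record)
    return records


soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="viewAllSeriesTable")
records = rows_to_dicts(table)

# Write the list of row dicts as real JSON.
with open("data_file.json", "w") as write_file:
    json.dump(records, write_file, indent=4)

print(records)
```

Reading the keys from the header means the script keeps working if the column names change, as long as each th keeps an id.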

Try this:

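import json
result = []  # renamed from "list" so the built-in list type is not shadowed
labels = ['Id: ', 'seriesName: ', 'clientName: ', 'Brand: ']
find_table = soup.find("table", id="viewAllSeriesTable")
rows = find_table.find("tbody")
tds = rows.find_all('td')
# zip() pairs each label with one cell and stops after four cells (the first row),
# so extra rows no longer cause an IndexError
for label, td in zip(labels, tds):
    result.append(label + td.get_text(strip=True))
with open("data_file.json", "w") as write_file:
    json.dump(result, write_file)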
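If you would rather keep the question's original flat list (one string per td) and still end up with dictionaries, one option is to cut that list into groups of four and zip each group with the key names. This sketch assumes every row has exactly four cells in column order; the cells list stands in for the output of [j.get_text(strip=True) for j in rows], filled with the sample values used elsewhere on this page.

```python
import json

# Flat list of cell texts, as produced by the question's list comprehension.
cells = ["9127", "a", "A", "B", "9157", "a-2", "A-2", "B-2"]
keys = ["id", "seriesName", "clientName", "Brand"]

records = []
# Step through the flat list four cells at a time; each slice is one table row.
for start in range(0, len(cells), len(keys)):
    row = dict(zip(keys, cells[start:start + len(keys)]))
    row["id"] = int(row["id"])  # match the desired numeric id
    records.append(row)

print(json.dumps(records))
```

This only works if no row is missing a cell; if rows can be ragged, iterating over the tr elements directly is safer.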

You could try this.

This code stores all the <td> data of each <tr> in JSON format.

from bs4 import BeautifulSoup
s = '''<table width="735" cellpadding="0" cellspacing="2" border="0" id="viewAllSeriesTable">
  <thead>
    <tr>
      <th id="id">Series ID</th>
      <th id="seriesName">Series Name</th>
      <th id="clientName">Client Name</th>
      <th id="Brand">Brand</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td id="9127" style="word-break: break-word;"><a href="seriesDefinition.html?id=9127">9127</a></td>
      <td style="word-break: break-word;">a</td>
      <td style="word-break: break-word;">A</td>
      <td style="word-break: break-word;">B</td>
    </tr>
    <tr>
      <td id="9157" style="word-break: break-word;"><a href="seriesDefinition.html?id=9127">9157</a></td>
      <td style="word-break: break-word;">a-2</td>
      <td style="word-break: break-word;">A-2</td>
      <td style="word-break: break-word;">B-2</td>
    </tr>
  </tbody>
'''
import json
d = []
soup = BeautifulSoup(s, 'lxml')
trs = soup.find('tbody').find_all('tr')
for tr in trs:
    # one dict per table row, keyed by column name
    temp = {}
    tds = tr.find_all('td')
    temp['id'] = int(tds[0].text)
    temp['seriesName'] = tds[1].text
    temp['clientName'] = tds[2].text
    temp['Brand'] = tds[3].text
    d.append(temp)
# dumps() produces JSON text; loads() turns it back into Python objects
json_str = json.loads(json.dumps(d))
print(json_str)

json_str looks like this:

[{'id': 9127, 'seriesName': 'a', 'clientName': 'A', 'Brand': 'B'}, {'id': 9157, 'seriesName': 'a-2', 'clientName': 'A-2', 'Brand': 'B-2'}]
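The printed value above is a Python list (single quotes), not JSON text, because json.loads converts the string back into Python objects. To actually export to a file as the question asks, json.dump can write d directly. A minimal sketch, assuming d is the list built by the loop above (copied here so the block runs on its own) and that data_file.json in the working directory is an acceptable target:

```python
import json

# The list of row dicts built by the loop above (copied so this block is standalone).
d = [
    {"id": 9127, "seriesName": "a", "clientName": "A", "Brand": "B"},
    {"id": 9157, "seriesName": "a-2", "clientName": "A-2", "Brand": "B-2"},
]

# json.dumps returns a JSON-formatted string (double quotes, no Python repr).
json_text = json.dumps(d, indent=4)
print(json_text)

# json.dump writes the same JSON straight to a file.
with open("data_file.json", "w", encoding="utf-8") as write_file:
    json.dump(d, write_file, indent=4)
```

The indent argument is optional; leaving it out writes the whole list on one line.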
