Adding HTML td tag items to a dictionary during a for loop



I want to put some HTML into one large dictionary.

Update: the raw HTML (in the "rows" variable) looks like this:

[<tr>
<td class="top">
<span class="bold">Districts</span>
</td>
<td class="left top">
<span class="bold">Symbol</span>
</td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Agricultural Districts
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
 Agriculture
</td>
<td class="top">
A
</td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Residential Districts
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
 Single-family Residential
</td>
<td class="top">
R-1
</td>
</tr>, <tr>
<td class="top">
 Planned Multiple Residential
</td>
<td class="top">
PRD
</td>
</tr>, <tr>
<td class="top">
 Planned Unit Development
</td>
<td class="top">
PUD
</td>
</tr>, <tr>
<td class="top">
Mobile/ Modular Home
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
 Subdivision /Planned Unit
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
 Development
</td>
<td class="top">
MHS/PUD
</td>
</tr>, <tr>
<td class="top">
Mobile Home Park Planned
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Development
</td>
<td class="top">
MHP
</td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Commercial Districts
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
 Commercial Planned Development
</td>
<td class="top">
CPD
</td>
</tr>, <tr>
<td class="top">
 Central Business District
</td>
<td class="top">
CB
</td>
</tr>, <tr>
<td class="top">
 Resort
</td>
<td class="top">
RES
</td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Industrial Districts
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
 General Industry
</td>
<td class="top">
M
</td>
</tr>, <tr>
<td class="top">
 Industrial/Research Park
</td>
<td class="top">
M-RP
</td>
</tr>, <tr>
<td class="top">
 Coastal Dependent Industry
</td>
<td class="top">
M-CD
</td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Other Districts
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
 Recreation
</td>
<td class="top">
REC
</td>
</tr>, <tr>
<td class="top">
 Public Utility
</td>
<td class="top">
UT
</td>
</tr>, <tr>
<td class="top">
 Community Facility
</td>
<td class="top">
CF
</td>
</tr>]

I use Beautiful Soup to get the "td" entries. I then strip them down to just their text (even if it is blank), and now I want to build a dictionary from them.

import bs4 as bs

soup = bs.BeautifulSoup(content, 'lxml')
find_table = soup.find('table')
rows = find_table.find_all('tr')
districts_table = {}
for i in rows:
    table_data = i.find_all('td')
    # Drop en spaces (U+2002) and newlines, then trim surrounding whitespace
    results = [j.text.replace(u'\u2002', '').replace(u'\n', '').strip() for j in table_data]
    print(results)

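This prints the output below. Instead, I want to build a dictionary from these pairs, even where a value is empty (replacing empty values with 'None').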

['Districts', 'Symbol']
['', '']
['Agricultural Districts', '']
['', '']
['Agriculture', 'A']
['', '']
['Residential Districts', '']
['', '']
['Single-family Residential', 'R-1']
['Planned Multiple Residential', 'PRD']
['Planned Unit Development', 'PUD']
['Mobile/ Modular Home', '']
['Subdivision /Planned Unit', '']
['Development', 'MHS/PUD']
['Mobile Home Park Planned', '']
['Development', 'MHP']
['', '']
['Commercial Districts', '']
['', '']
['Commercial Planned Development', 'CPD']
['Central Business District', 'CB']
['', '']
['Industrial Districts', '']
['', '']
['General Industry', 'M']
['Industrial/Research Park', 'M-RP']
['Coastal Dependent Industry', 'M-CD']
['', '']
['Other Districts', '']
['', '']
['Public Utility', 'UT']
['Community Facility', 'CF']

I want something like this:

{'Districts': 'Symbol',
'None': 'None',
'Agricultural Districts': 'None',
'None': 'None',
'None': 'None',
'Agriculture': 'A',
etc..}

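How can I do this? I tried a dictionary comprehension like the following:

results = {j.text.replace(u'\u2002', '').replace(u'\n', '').strip(): j.text.replace(u'\u2002', '').replace(u'\n', '').strip() for j in table_data}

But this only gives me the last row, with each value repeated as its own key: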

{'Community Facility': 'Community Facility', 'CF': 'CF'}

Any help would be greatly appreciated!

Assuming you want districts_table populated as a list of dictionaries:

districts_table = []
for i in rows:
    table_data = i.find_all('td')
    results = [j.text.replace(u'\u2002', '').replace(u'\n', '').strip() for j in table_data]
    # Empty strings become None
    districts_table.append({'Community Facility': results[0] if results[0] else None, 'CF': results[1] if results[1] else None})
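
As a minimal sketch of the same list-of-dictionaries idea, the version below uses descriptive keys instead. The key names "district" and "symbol" and the helper rows_to_dicts are hypothetical (they are not from the original post), and rows_text stands in for the cleaned results lists printed in the question, so the snippet runs without Beautiful Soup.

```python
def rows_to_dicts(rows_text):
    """Turn [name, symbol] pairs into dicts, mapping empty strings to None."""
    table = []
    for name, symbol in rows_text:
        table.append({
            "district": name or None,  # hypothetical key name
            "symbol": symbol or None,  # hypothetical key name
        })
    return table


# Stand-in for the per-row `results` lists from the question
rows_text = [
    ["Districts", "Symbol"],
    ["", ""],
    ["Agricultural Districts", ""],
    ["Agriculture", "A"],
]
districts_table = rows_to_dicts(rows_text)
print(districts_table[2])  # {'district': 'Agricultural Districts', 'symbol': None}
```

Because each row becomes its own dictionary, blank rows and repeated names are all kept, one entry per row.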

Thanks @RJ, I solved it based on your suggestion:

districts_table.update({results[0] if results[0] else None: results[1] if results[1] else None})

Which gives the following:

{'Districts': 'Symbol', None: None, 'Agricultural Districts': None, 'Agriculture': 'A', 'Residential Districts': None, 'Single-family Residential': 'R-1', 'Planned Multiple Residential': 'PRD', 'Planned Unit Development': 'PUD', 'Mobile/ Modular Home': None, 'Subdivision /Planned Unit': None, 'Development': 'MHP', 'Mobile Home Park Planned': None, 'Commercial Districts': None, 'Commercial Planned Development': 'CPD', 'Central Business District': 'CB', 'Resort': 'RES', 'Industrial Districts': None, 'General Industry': 'M', 'Industrial/Research Park': 'M-RP', 'Coastal Dependent Industry': 'M-CD', 'Other Districts': None, 'Recreation': 'REC', 'Public Utility': 'UT', 'Community Facility': 'CF'}
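
One thing worth noting about the output above: a dict keeps only one value per key, so the blank rows collapse into a single None: None entry and 'Development' ends up with only 'MHP' (the 'MHS/PUD' row is overwritten). The sketch below reproduces that behaviour on a few of the rows and shows a list of tuples as one possible way to keep every row. The names as_dict and pairs are illustrative only, and rows_text stands in for the cleaned results lists.

```python
# Stand-in for some of the per-row `results` lists from the question
rows_text = [
    ["Subdivision /Planned Unit", ""],
    ["Development", "MHS/PUD"],
    ["Mobile Home Park Planned", ""],
    ["Development", "MHP"],
    ["", ""],
    ["", ""],
]

# Same update pattern as in the solution: later rows overwrite earlier
# rows that share a key.
as_dict = {}
for name, symbol in rows_text:
    as_dict.update({name or None: symbol or None})

# Alternative: a list of (name, symbol) tuples keeps duplicates and order.
pairs = [(name or None, symbol or None) for name, symbol in rows_text]

print(len(as_dict))            # 4
print(as_dict["Development"])  # MHP
print(len(pairs))              # 6
```

If the repeated 'Development' rows matter, the list of tuples (or a list of dictionaries) may be the safer structure.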

Thanks, everyone!
