I want to put some HTML into one big dictionary.
Update: the raw HTML (held in the rows variable) looks like this:
[<tr>
<td class="top">
<span class="bold">Districts</span>
</td>
<td class="left top">
<span class="bold">Symbol</span>
</td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Agricultural Districts
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Agriculture
</td>
<td class="top">
A
</td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Residential Districts
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Single-family Residential
</td>
<td class="top">
R-1
</td>
</tr>, <tr>
<td class="top">
Planned Multiple Residential
</td>
<td class="top">
PRD
</td>
</tr>, <tr>
<td class="top">
Planned Unit Development
</td>
<td class="top">
PUD
</td>
</tr>, <tr>
<td class="top">
Mobile/ Modular Home
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Subdivision /Planned Unit
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Development
</td>
<td class="top">
MHS/PUD
</td>
</tr>, <tr>
<td class="top">
Mobile Home Park Planned
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Development
</td>
<td class="top">
MHP
</td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Commercial Districts
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Commercial Planned Development
</td>
<td class="top">
CPD
</td>
</tr>, <tr>
<td class="top">
Central Business District
</td>
<td class="top">
CB
</td>
</tr>, <tr>
<td class="top">
Resort
</td>
<td class="top">
RES
</td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Industrial Districts
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
General Industry
</td>
<td class="top">
M
</td>
</tr>, <tr>
<td class="top">
Industrial/Research Park
</td>
<td class="top">
M-RP
</td>
</tr>, <tr>
<td class="top">
Coastal Dependent Industry
</td>
<td class="top">
M-CD
</td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Other Districts
</td>
<td class="top"></td>
</tr>, <tr>
<td class="top"></td>
<td class="top"></td>
</tr>, <tr>
<td class="top">
Recreation
</td>
<td class="top">
REC
</td>
</tr>, <tr>
<td class="top">
Public Utility
</td>
<td class="top">
UT
</td>
</tr>, <tr>
<td class="top">
Community Facility
</td>
<td class="top">
CF
</td>
</tr>]
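I'm using Beautiful Soup to get the td entries. I then strip them down to just their text (even when it is blank), and now I want to build a dictionary from them.
import bs4 as bs  # assumed import; the code below refers to the alias bs
soup = bs.BeautifulSoup(content, 'lxml')
find_table = soup.find('table')
rows = find_table.find_all('tr')
districts_table = {}
for i in rows:
    table_data = i.find_all('td')
    # Drop en spaces (U+2002) and newlines, then trim surrounding whitespace
    results = [(j.text.replace(u'\u2002','').replace(u'\n','')).strip() for j in table_data]
    print(results)
That prints the output below. What I want instead is to build a dictionary from these rows, keeping an entry even when a value is empty (replaced with 'None').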
['Districts', 'Symbol']
['', '']
['Agricultural Districts', '']
['', '']
['Agriculture', 'A']
['', '']
['Residential Districts', '']
['', '']
['Single-family Residential', 'R-1']
['Planned Multiple Residential', 'PRD']
['Planned Unit Development', 'PUD']
['Mobile/ Modular Home', '']
['Subdivision /Planned Unit', '']
['Development', 'MHS/PUD']
['Mobile Home Park Planned', '']
['Development', 'MHP']
['', '']
['Commercial Districts', '']
['', '']
['Commercial Planned Development', 'CPD']
['Central Business District', 'CB']
['', '']
['Industrial Districts', '']
['', '']
['General Industry', 'M']
['Industrial/Research Park', 'M-RP']
['Coastal Dependent Industry', 'M-CD']
['', '']
['Other Districts', '']
['', '']
['Public Utility', 'UT']
['Community Facility', 'CF']
I'd like it to look like this:
{'Districts': 'Symbol',
'None': 'None',
'Agricultural Districts': 'None',
'None': 'None',
'None': 'None',
'Agriculture': 'A',
etc..}
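How can I do this? I tried a dictionary comprehension like the following:
results = {(j.text.replace(u'\u2002','').replace(u'\n','')).strip():(j.text.replace(u'\u2002','').replace(u'\n','')).strip() for j in table_data}
But that only gives me the last row, with each cell repeated as its own value: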
{'Community Facility': 'Community Facility', 'CF': 'CF'}
Any help would be greatly appreciated!
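For what it's worth, the comprehension most likely behaves this way for two reasons: table_data holds only one row's cells at the time it is evaluated (here the last row), and the same cell text is used as both key and value. A minimal sketch of the difference, with plain strings standing in for the td elements (last_row, mirrored and paired are illustrative names, not from the original code):

```python
# Plain strings stand in for the text of the two <td> cells in the last row.
last_row = ['Community Facility', 'CF']

# What the comprehension does: each cell becomes both key and value.
mirrored = {cell: cell for cell in last_row}

# What was probably intended: first cell is the key, second cell is the value.
paired = {last_row[0]: last_row[1]}

print(mirrored)
print(paired)
```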
Assuming you want districts_table populated as a list of dictionaries:
districts_table = []
for i in rows:
    table_data = i.find_all('td')
    results = [(j.text.replace(u'\u2002','').replace(u'\n','')).strip() for j in table_data]
    # Empty strings are falsy, so blank cells are stored as None
    districts_table.append({'Community Facility': results[0] if results[0] else None, 'CF': results[1] if results[1] else None})
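If a list of dictionaries is what you're after, the keys could also come from the header row rather than being hard-coded. Here is a rough sketch of that idea, assuming each row has already been reduced to a list of two strings as in the printed output in the question. The names cell_rows and rows_to_records are made up for illustration, and only a few rows are included.

```python
# A few rows as printed in the question; each one is [district, symbol].
cell_rows = [
    ['Districts', 'Symbol'],
    ['', ''],
    ['Agricultural Districts', ''],
    ['Agriculture', 'A'],
]

def rows_to_records(cell_rows):
    """Use the first row as keys and turn empty cells into None."""
    header, *body = cell_rows
    return [
        {key: (value or None) for key, value in zip(header, row)}
        for row in body
    ]

records = rows_to_records(cell_rows)
print(records)
```

This keeps one dictionary per row, so blank rows and repeated district names are not lost.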
Thanks @RJ, I solved it based on your suggestion:
districts_table.update({results[0] if results[0] else None: results[1] if results[1] else None})
which gives the following:
{'Districts': 'Symbol', None: None, 'Agricultural Districts': None, 'Agriculture': 'A', 'Residential Districts': None, 'Single-family Residential': 'R-1', 'Planned Multiple Residential': 'PRD', 'Planned Unit Development': 'PUD', 'Mobile/ Modular Home': None, 'Subdivision /Planned Unit': None, 'Development': 'MHP', 'Mobile Home Park Planned': None, 'Commercial Districts': None, 'Commercial Planned Development': 'CPD', 'Central Business District': 'CB', 'Resort': 'RES', 'Industrial Districts': None, 'General Industry': 'M', 'Industrial/Research Park': 'M-RP', 'Coastal Dependent Industry': 'M-CD', 'Other Districts': None, 'Recreation': 'REC', 'Public Utility': 'UT', 'Community Facility': 'CF'}
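One caveat about the result above: a dict stores each key only once, so repeated keys collapse and the last value wins. That is why there is a single None: None entry and why 'Development' maps to 'MHP' while 'MHS/PUD' is gone. If every row needs to survive, a list of tuples is one option. A small sketch using rows from the output above (pairs and as_dict are illustrative names):

```python
# Rows taken from the printed output, with blanks already turned into None.
pairs = [
    ('Mobile/ Modular Home', None),
    ('Subdivision /Planned Unit', None),
    ('Development', 'MHS/PUD'),
    ('Mobile Home Park Planned', None),
    ('Development', 'MHP'),
]

# Building a dict keeps one entry per key; the later 'Development' overwrites the earlier.
as_dict = dict(pairs)

print(len(pairs), len(as_dict))
print(as_dict['Development'])
```

The list keeps all five rows, while the dict ends up with four entries.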
Thanks, everyone!