Python/BeautifulSoup scraping and printing to csv



I am writing code to scrape election data and organize it into a dataset. This is what I have so far:

import requests
import urllib.request
import time
from bs4 import BeautifulSoup
response= requests.get('https://elections2018.wallonie.be/fr/resultats-chiffres?el=PR&id=PRA52011')
soup = BeautifulSoup(response.text,"html.parser")
soup.findAll('tr')
import pandas as pd

I get something like this:

[<tr>
<td>
<p><img alt="carte de navigation" border="0" id="navmap" name="navmap" src="/sites/default/files/images/election_nav_pro_wl.gif" title="carte de navigation" usemap="#navmap"/></p>
</td>
</tr>,
<tr><th class="text-left w-5"></th><th "="" class="text-left">Liste</th><th></th><th class="text-right w-10">2018</th><th class="text-right w-10">2012</th><th class="text-right w-10">%2018</th><th class="text-right w-10">%2012</th><th class="text-right w-10">+/- %</th><th class="text-right w-10">Sièges</th></tr>,
<tr class="row-odd"><td>1</td><td>MR</td><td class="text-center"></td><td class="text-right">31.099</td><td class="text-right">40.790</td><td class="text-right">15,58%</td><td class="text-right">18,64%</td><td class="text-right">-3,07%</td><td class="c-accent02 text-right">4 (0)</td></tr>,
<tr class="row-even"><td>2</td><td>ECOLO</td><td class="text-center"></td><td class="text-right">23.053</td><td class="text-right">22.412</td><td class="text-right">11,55%</td><td class="text-right">10,24%</td><td class="text-right">+1,30%</td><td class="c-accent02 text-right">3 (+1)</td></tr>,
<tr class="row-odd"><td>3</td><td>PS</td><td class="text-center"></td><td class="text-right">66.358</td><td class="text-right">89.651</td><td class="text-right">33,24%</td><td class="text-right">40,98%</td><td class="text-right">-7,74%</td><td class="c-accent02 text-right">9 (-1)</td></tr>,

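I need a CSV file with the political list/party, the district/allocation, and the percentage of votes for each year.

How can I scrape the required data and put it into a readable CSV file?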

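You can use Python's csv library to create the CSV file. Once you have the list of tr elements, you can use each one to get the list of td elements inside it. You can then use itemgetter() to extract the elements you want from the resulting list. For example: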

from operator import itemgetter
from bs4 import BeautifulSoup
import requests
import csv

req_values = itemgetter(1, 3, 4, 5, 6)  # extract these columns from the table
response = requests.get('https://elections2018.wallonie.be/fr/resultats-chiffres?el=PR&id=PRA52011')
soup = BeautifulSoup(response.text, "html.parser")
header_values = [v.text for v in soup.thead.find_all('th')]

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(req_values(header_values))

    # Skip the first two rows (the navigation map and the header row)
    for tr in soup.find_all('tr')[2:]:
        values = [v.text for v in tr.find_all('td')]

        # Stop at the first row that has no <td> cells
        if len(values) == 0:
            break

        csv_output.writerow(req_values(values))

This will give you an output.csv file containing:

Liste,2018,2012,%2018,%2012
MR,31.099,40.790,"15,58%","18,64%"
ECOLO,23.053,22.412,"11,55%","10,24%"
PS,66.358,89.651,"33,24%","40,98%"
PTB,30.384,-,"15,22%",-
PTB+,-,6.094,-,"2,79%"
CDH,12.938,25.797,"6,48%","11,79%"
PP,8.212,-,"4,11%",-
DéFI,11.640,-,"5,83%",-
Oxygène,1.655,-,"0,83%",-
AGIR,4.434,-,"2,22%",-
CA,442,-,"0,22%",-
LA DROITE,5.478,-,"2,74%",-
WALLON,1.632,-,"0,82%",-
NWA-NATION,1.412,-,"0,71%",-
PSLHDD,896,-,"0,45%",-
FDF,-,4.665,-,"2,13%"
R.W.F.,-,2.782,-,"1,27%"
DN,-,631,-,"0,29%"
FdG,-,986,-,"0,45%"
NATION,-,1.415,-,"0,65%"
NWA,-,645,-,"0,29%"
FRONT-GAUCHE,-,1.128,-,"0,52%"
R.W.,-,2.224,-,"1,02%"
FN-belge,-,12.699,-,"5,80%"
FNW,-,6.860,-,"3,14%"
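The values in this file are still text in the site's number format: vote counts appear to use `.` as a thousands separator, percentages use `,` as the decimal mark, and `-` stands for a missing result. If numeric values are needed, a minimal sketch along these lines could convert them before writing. The helper names `parse_count` and `parse_percent` are hypothetical and not part of the original answer, and the reading of the number format is an assumption based on the output above.

```python
def parse_count(text):
    """Convert a vote count such as '31.099' to an int.

    Assumes '.' is a thousands separator and '-' means no result.
    """
    text = text.strip()
    if text in ("", "-"):
        return None
    return int(text.replace(".", ""))


def parse_percent(text):
    """Convert a percentage such as '15,58%' to a float.

    Assumes ',' is the decimal mark and '-' means no result.
    """
    text = text.strip()
    if text in ("", "-"):
        return None
    return float(text.rstrip("%").replace(",", "."))


# One row as extracted by req_values(values) in the answer's code
row = ["MR", "31.099", "40.790", "15,58%", "18,64%"]
cleaned = [
    row[0],
    parse_count(row[1]),
    parse_count(row[2]),
    parse_percent(row[3]),
    parse_percent(row[4]),
]
print(cleaned)
```

Passing `cleaned` to `csv_output.writerow()` instead of the raw tuple should work unchanged, since `csv.writer` writes `None` as an empty field.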

If results are missing on some pages, you can try the following:

try:
    row = req_values(values)
    csv_output.writerow(row)
except IndexError:
    print(f"Only {len(values)} values: {values}")
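To show how this fits into the row loop, here is a small self-contained sketch that runs without network access. The second row is made up to simulate a page with missing cells, and the CSV is written to an in-memory buffer rather than a file. Collecting the skipped rows in a list is one possible variation on printing them.

```python
import csv
import io
from operator import itemgetter

req_values = itemgetter(1, 3, 4, 5, 6)

# Stand-ins for the text of each row's <td> cells
rows = [
    ["1", "MR", "", "31.099", "40.790", "15,58%", "18,64%", "-3,07%", "4 (0)"],
    ["2", "ECOLO", ""],  # hypothetical short row with missing cells
]

buffer = io.StringIO()
csv_output = csv.writer(buffer)
skipped = []

for values in rows:
    try:
        # itemgetter raises IndexError if the row has fewer than 7 cells
        csv_output.writerow(req_values(values))
    except IndexError:
        skipped.append(values)

print(buffer.getvalue())
print(f"Skipped {len(skipped)} short row(s): {skipped}")
```

Because `req_values(values)` is evaluated before `writerow()` is called, a short row raises before anything is written, so no partial line ends up in the output.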
