How to scrape a non-table list from Wikipedia and create a DataFrame



en.wikipedia.org/wiki/List_of_neighborhoods_of_Istanbul

The page linked above lists Istanbul's neighbourhoods, but not as a table.

I want to extract these neighbourhoods into a DataFrame with this code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

wikiurl = "https://en.wikipedia.org/wiki/List_of_neighbourhoods_of_Istanbul"
response = requests.get(wikiurl)
soup = BeautifulSoup(response.text, 'html.parser')

# Collect the text of every <a> element that has class="new"
tocList = soup.find_all('a', {'class': "new"})
neighborhoods = []
for item in tocList:
    text = item.get_text()
    neighborhoods.append(text)

df = pd.DataFrame(neighborhoods, columns=['Neighborhood'])
print(df)

I got this output:

Neighborhood
0   Maden
1   Nizam
2   Anadolu
3   Arnavutköy İmrahor
4   Arnavutköy İslambey
...     ...
705     Seyitnizam
706     Sümer
707     Telsiz
708     Veliefendi
709     Yeşiltepe
710 rows × 1 columns

But some of the data is not extracted. Compare the list below with the output:

Adalar

Burgazada
Heybeliada
Kınalıada
Maden
Nizam

find_all() does not pick up the neighbourhoods that are plain links without that class, for example:

<ol><li><a href="/wiki/Burgazada" title="Burgazada">Burgazada</a></li>
<li><a href="/wiki/Heybeliada" title="Heybeliada">Heybeliada</a></li>
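Why the class filter misses these entries can be checked offline. On Wikipedia, `class="new"` is the class MediaWiki puts on red links (links to articles that do not exist yet), so filtering on it skips every neighbourhood that has its own article. The sketch below uses a small hand-written fragment modelled on the snippet above (an assumption, not the live page) and compares matching `a.new` with reading the text of every `li`:

```python
from bs4 import BeautifulSoup

# Hand-written fragment modelled on the snippet above: two ordinary links
# (existing articles) and one red link (class="new", no article yet).
html = """
<ol>
<li><a href="/wiki/Burgazada" title="Burgazada">Burgazada</a></li>
<li><a href="/wiki/Heybeliada" title="Heybeliada">Heybeliada</a></li>
<li><a href="/w/index.php?title=Maden&amp;action=edit&amp;redlink=1" class="new" title="Maden">Maden</a></li>
</ol>
"""
soup = BeautifulSoup(html, "html.parser")

# Filtering on the "new" class finds only the red link.
red_only = [a.get_text() for a in soup.find_all("a", class_="new")]

# Reading every list item finds all entries, whatever kind of link they hold.
all_items = [li.get_text(strip=True) for li in soup.find_all("li")]

print(red_only)
print(all_items)
```

On the real page, reading every `li` also picks up list items from the table of contents and other sections, so the items still have to be scoped to the district lists.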

Can the code be extended to produce two columns: each neighbourhood and its district?

Are you trying to get this list from the table of contents?

Please check whether this solves your problem:

import pandas as pd
import requests
from bs4 import BeautifulSoup

wikiurl = "https://en.wikipedia.org/wiki/List_of_neighbourhoods_of_Istanbul"
response = requests.get(wikiurl)
soup = BeautifulSoup(response.text, 'html.parser')

# Table-of-contents entries; relies on the TOC markup using <span class="toctext">
tocList = soup.find_all('span', {'class': "toctext"})
districts = []
# TOC entries that are section titles rather than district names
blocked_words = ['Neighbourhoods by districts', 'Further reading', 'External links']
for item in tocList:
    text = item.get_text()
    if text not in blocked_words:
        districts.append(text)

df = pd.DataFrame(districts, columns=['districts'])
print(df)

Output:

districts
0          Adalar
1      Arnavutköy
2        Ataşehir
3         Avcılar
4        Bağcılar
5    Bahçelievler
6        Bakırköy
7      Başakşehir
8      Bayrampaşa
9        Beşiktaş
10         Beykoz
11     Beylikdüzü
12        Beyoğlu
13   Büyükçekmece
14        Çatalca
15       Çekmeköy
16        Esenler
17       Esenyurt
18           Eyüp
19          Fatih
20  Gaziosmanpaşa
21       Güngören
22        Kadıköy
23      Kağıthane
24         Kartal
25   Küçükçekmece
26        Maltepe
27         Pendik
28     Sancaktepe
29        Sarıyer
30        Silivri
31    Sultanbeyli
32     Sultangazi
33           Şile
34          Şişli
35          Tuzla
36       Ümraniye
37        Üsküdar
38    Zeytinburnu
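The answer above returns district names only. For the two-column table asked for in the question, one possible approach is to walk the page in document order and attach each list item to the most recent district heading. This is a sketch under an assumption: that each district is an `<h3>` heading followed by its list, and that `<h2>` headings start sections that are not districts. The fragment below is hand-written to match that assumption (the "External links" item is a placeholder), and the function name `neighbourhoods_by_district` is hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hand-written fragment. ASSUMPTION: districts are <h3> headings followed by
# their lists; <h2> headings start unrelated sections. Check the real markup.
html = """
<h2>Neighbourhoods by districts</h2>
<h3>Adalar</h3>
<ol>
<li><a href="/wiki/Burgazada">Burgazada</a></li>
<li><a href="/wiki/Heybeliada">Heybeliada</a></li>
<li><a href="/wiki/K%C4%B1nal%C4%B1ada">Kınalıada</a></li>
<li><a class="new" href="#">Maden</a></li>
<li><a class="new" href="#">Nizam</a></li>
</ol>
<h3>Arnavutköy</h3>
<ol>
<li><a class="new" href="#">Arnavutköy İmrahor</a></li>
</ol>
<h2>External links</h2>
<ul>
<li><a href="#">Placeholder link</a></li>
</ul>
"""


def neighbourhoods_by_district(html):
    """Hypothetical helper: pair each <li> with the nearest preceding <h3>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    district = None
    # find_all with a list of tag names returns matches in document order,
    # so every <li> is visited after the heading that introduces it.
    for tag in soup.find_all(["h2", "h3", "li"]):
        if tag.name == "h2":
            district = None  # a new top-level section ends the district lists
        elif tag.name == "h3":
            district = tag.get_text(strip=True)
        elif district is not None:
            rows.append((tag.get_text(strip=True), district))
    return pd.DataFrame(rows, columns=["Neighbourhood", "District"])


df = neighbourhoods_by_district(html)
print(df)
```

For the live page, pass `requests.get(wikiurl).text` to the function after checking in the browser's inspector which heading level the district headings actually use. In some versions of Wikipedia's HTML the heading element also contains an "[edit]" link, whose text would then need stripping from the district name.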
