Web scraping into a pandas DataFrame



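I am trying to scrape data from https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population for my project. I want to put the data for the top 20 cities into a pandas DataFrame with the following columns: Rank | City | Latitude | Longitude

That way I can extract the coordinates later in my code and calculate the various parameters I need. This is what I have come up with so far, but it seems to be failing: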

rank = []
city = []
state = []
population_present = []
population_past = []
changepercent = []

info = requests.get('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population').text
bs = BeautifulSoup(info, 'html.parser')
for row in bs.find('table').find_all('tr'):
    p = row.find_all('td')

for row in bs.find('table').find_all('tr'):
    p = row.find_all('td')
    if len(p) > 0:
        rank.append(p[0].text)
        city.append(p[1].text)
        latitude.append(p[2].text.rstrip('\n'))

You can do this with pandas. Try the following code.

from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

info = requests.get('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population').text
bs = BeautifulSoup(info, 'html.parser')
# The second 'wikitable' on the page held the city list when this was written
table = bs.find_all('table', class_='wikitable')[1]
# Wrap the HTML in StringIO; newer pandas versions deprecate passing raw HTML strings
df = pd.read_html(StringIO(str(table)))[0]
# Get the first 20 records
df1 = df.iloc[:20]
Rank = df1['2018rank'].values.tolist()
City = df1['City'].values.tolist()
# Get the location strings as a list
locationlist = df1['Location'].values.tolist()
Latitude = []
Longitude = []
for val in locationlist:
    # Keep the decimal-degree part after the last "/"
    val1 = val.split("/")[-1]
    Latitude.append(val1.split()[0])
    Longitude.append(val1.split()[-1])
df2 = pd.DataFrame({"Rank": Rank, "City": City, "Latitude": Latitude, "Longitude": Longitude})
print(df2)

Output

City    Latitude   Longitude  Rank
0        New York[d]  40.6635°N   73.9387°W     1
1        Los Angeles  34.0194°N  118.4108°W     2
2            Chicago  41.8376°N   87.6818°W     3
3         Houston[3]  29.7866°N   95.3909°W     4
4            Phoenix  33.5722°N  112.0901°W     5
5    Philadelphia[e]  40.0094°N   75.1333°W     6
6        San Antonio  29.4724°N   98.5251°W     7
7          San Diego  32.8153°N  117.1350°W     8
8             Dallas  32.7933°N   96.7665°W     9
9           San Jose  37.2967°N  121.8189°W    10
10            Austin  30.3039°N   97.7544°W    11
11   Jacksonville[f]  30.3369°N   81.6616°W    12
12        Fort Worth  32.7815°N   97.3467°W    13
13          Columbus  39.9852°N   82.9848°W    14
14  San Francisco[g]  37.7272°N  123.0322°W    15
15         Charlotte  35.2078°N   80.8310°W    16
16   Indianapolis[h]  39.7767°N   86.1459°W    17
17           Seattle  47.6205°N  122.3509°W    18
18         Denver[i]  39.7619°N  104.8811°W    19
19     Washington[j]  38.9041°N   77.0172°W    20
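
The values in the output above are still strings: city names carry footnote markers such as "[d]", and coordinates carry a degree sign and a hemisphere letter. Since the question wants to calculate with the coordinates later, here is a minimal sketch of one way to clean them up. The helper names `clean_city` and `parse_coord` are hypothetical, and the sketch assumes the coordinate strings keep the decimal-degree form shown in the output.

```python
import re


def clean_city(name):
    """Drop footnote markers such as "[d]" or "[3]" from a city name."""
    return re.sub(r"\[[^\]]*\]", "", name).strip()


def parse_coord(text):
    """Convert a string like "73.9387°W" to a signed float (-73.9387)."""
    # search() rather than fullmatch() so stray invisible characters
    # around the value do not break the match
    match = re.search(r"([0-9.]+)\s*°\s*([NSEW])", text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    value = float(match.group(1))
    # South and West are negative by convention
    return -value if match.group(2) in "SW" else value


print(clean_city("New York[d]"), parse_coord("40.6635°N"), parse_coord("73.9387°W"))
```

With the DataFrame from the answer, these could be applied column-wise, for example `df2["Latitude"].map(parse_coord)`.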

You are accessing the wrong element on the page. To access the table that contains the data you need, use the following:

import requests
from bs4 import BeautifulSoup

info = requests.get('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population').text
bs = BeautifulSoup(info, 'html.parser')
for tr in bs.find_all('table')[4].find_all('tr'):
    # Now take the data from this row that you want, and put it in a DataFrame
    cells = tr.find_all('td')
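
The "put it in a DataFrame" step could look roughly like the sketch below. It runs on a small inline HTML table, a hypothetical stand-in for the real page, so it works offline. The column positions are assumptions for illustration only and would need to be checked against the actual Wikipedia table.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for one table from the page, so the sketch runs offline
html = """
<table>
  <tr><th>Rank</th><th>City</th></tr>
  <tr><td>1</td><td>New York</td></tr>
  <tr><td>2</td><td>Los Angeles</td></tr>
</table>
"""

table = BeautifulSoup(html, "html.parser").find("table")
records = []
for tr in table.find_all("tr"):
    cells = tr.find_all("td")
    if len(cells) < 2:
        continue  # skip rows without enough <td> cells, such as the header row
    records.append({
        "Rank": cells[0].get_text(strip=True),
        "City": cells[1].get_text(strip=True),
    })

# One dict per row becomes one DataFrame row
df = pd.DataFrame(records)
print(df)
```

Collecting a list of dicts and building the DataFrame once at the end avoids keeping several parallel lists in sync, which is where the code in the question went wrong.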
