我是python的新手,正在寻求帮助。
resp =requests.get("https://en.wikipedia.org/wiki/World_War_II_casualties")
soup = bs.BeautifulSoup(resp.text)
table = soup.find("table", {"class": "wikitable sortable"})
deaths = []`
for row in table.findAll('tr')[1:]:
death = row.findAll('td')[5].text.strip()
deaths.append(death)
它出来作为
'30,000',
'40,400',
'',
'88,000',
'2,000',
'21,500',
'252,600',
'43,600',
'15,000,000[35]to 20,000,000[35]',
'100',
'340,000 to 355,000',
'6,000',
'3,000,000to 4,000,000',
'1,100',
'83,000',
'100,000[49]',
'85,000 to 95,000',
'600,000',
'1,000,000to 2,200,000',
'6,900,000 to 7,400,000',
...
'557,000',
'5,900,000[115] to 6,000,000[116]',
'40,000to 70,000',
'500,000[39]',
'36,000–50,000',
'11,900',
'10,000',
'20,000,000[141] to 27,000,000[142][143][144][145][146]',
'',
'2,100',
'100',
'7,600',
'200',
'450,900',
'419,400',
'1,027,000[160] to 1,700,000[159]',
'',
'70,000,000to 85,000,000']`
我想绘制一个图表,但 [] 脚注会完全毁了它。许多值都带有脚注。 当一个单元格中有一对时,是否也可以选择第一个数字?如果你们中有人能教我,我将不胜感激...谢谢
您可以将soup.find_next()
与text=True
参数一起使用,然后相应地拆分/剥离。
例如:
import requests
from bs4 import BeautifulSoup
url = 'https://en.wikipedia.org/wiki/World_War_II_casualties'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
for tr in soup.table.select('tr:has(td)')[1:]:
tds = tr.select('td')
if not tds[0].b:
continue
name = tds[0].b.get_text(strip=True, separator=' ')
casualties = tds[5].find_next(text=True).strip()
print('{:<30} {}'.format(name, casualties.split('–')[0].split()[0] if casualties else ''))
指纹:
Albania 30,000
Australia 40,400
Austria
Belgium 88,000
Brazil 2,000
Bulgaria 21,500
Burma 252,600
Canada 43,600
China 15,000,000
Cuba 100
Czechoslovakia 340,000
Denmark 6,000
Dutch East Indies 3,000,000
Egypt 1,100
Estonia 83,000
Ethiopia 100,000
Finland 85,000
France 600,000
French Indochina 1,000,000
Germany 6,900,000
Greece 507,000
Guam 1,000
Hungary 464,000
Iceland 200
India 2,200,000
Iran 200
Iraq 700
Ireland 100
Italy 492,400
Japan 2,500,000
Korea 483,000
Latvia 250,000
Lithuania 370,000
Luxembourg 5,000
Malaya & Singapore 100,000
Malta 1,500
Mexico 100
Mongolia 300
Nauru 500
Nepal
Netherlands 210,000
Newfoundland 1,200
New Zealand 11,700
Norway 10,200
Papua and New Guinea 15,000
Philippines 557,000
Poland 5,900,000
Portuguese Timor 40,000
Romania 500,000
Ruanda-Urundi 36,000
South Africa 11,900
South Pacific Mandate 10,000
Soviet Union 20,000,000
Spain
Sweden 2,100
Switzerland 100
Thailand 7,600
Turkey 200
United Kingdom 450,900
United States 419,400
Yugoslavia 1,027,000
Approx. totals 70,000,000