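I want to extract the table on this page: https://www.wikirating.com/list-of-countries-by-credit-rating/
When I run the code below, I only get the first two rows. What am I doing wrong, and how do I specify which table I want to extract?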
import requests
import pandas as pd
url = 'https://www.wikirating.com/list-of-countries-by-credit-rating/'
html = requests.get(url).content
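# read_html returns a list of DataFrames, one per table found on the page
df_list = pd.read_html(html)
# Take the first table; without this line, df is undefined
df = df_list[0]
print(df)
df.to_csv('my data.csv')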
I'd recommend using BeautifulSoup. Here is something to get you started:
import requests
from bs4 import BeautifulSoup
url = 'https://www.wikirating.com/list-of-countries-by-credit-rating/'
html = requests.get(url).content
soup = BeautifulSoup(html, 'html.parser')
# Find all tables on the page
tables = soup.find_all('table')
# Loop through each table
for table in tables:
    # Find all rows in the table
    rows = table.find_all('tr')
    # Loop through each row and print its first four cells
    for row in rows:
        # Header rows use <th>, so they have no <td> cells and are skipped
        cells = row.find_all('td')
        # Only print rows that have at least four cells
        if len(cells) >= 4:
            print(cells[0].text, cells[1].text, cells[2].text, cells[3].text)
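
If you want a DataFrame you can save to CSV rather than printed rows, here is a minimal sketch of one way to do it. It assumes the table you want is the one with the most rows, which I have not checked against the actual page. The helper names `tables_to_frames` and `largest_table` are made up for this example, and the sample HTML is placeholder data, not real ratings.

```python
import pandas as pd
from bs4 import BeautifulSoup


def tables_to_frames(html):
    """Parse every <table> in an HTML string into a pandas DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    frames = []
    for table in soup.find_all("table"):
        rows = []
        for tr in table.find_all("tr"):
            # Header rows use <th>, data rows use <td>; collect both kinds.
            cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
            if cells:
                rows.append(cells)
        if len(rows) < 2:
            continue  # need a header row plus at least one data row
        header, body = rows[0], rows[1:]
        # Keep only rows as wide as the header so the columns line up.
        body = [r for r in body if len(r) == len(header)]
        frames.append(pd.DataFrame(body, columns=header))
    return frames


def largest_table(html):
    """Return the DataFrame with the most rows (assumed to be the one you want)."""
    frames = tables_to_frames(html)
    if not frames:
        raise ValueError("no usable tables found")
    return max(frames, key=len)


# Placeholder HTML standing in for the downloaded page (made-up values).
sample_html = """
<table><tr><th>A</th></tr><tr><td>1</td></tr></table>
<table>
  <tr><th>Country</th><th>Rating</th></tr>
  <tr><td>Xland</td><td>AA</td></tr>
  <tr><td>Yland</td><td>BBB</td></tr>
</table>
"""
df = largest_table(sample_html)
print(df)
```

With the real page you would pass `requests.get(url).content` instead of `sample_html`, then call `df.to_csv('my data.csv', index=False)`. If you would rather stay with pandas alone, `pd.read_html` also accepts a `match` argument that keeps only tables containing the given text, or you can index into the returned list (for example `df_list[1]`) after checking which entry holds the table you need.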