Web scraping in Python: multiple attributes on one line (div and id)



I want to scrape this page, so I started with this script:

import requests
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

links = {"Copa do Brasil": "http://www.oddsportal.com/soccer/brazil/copa-do-brasil/results/"}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
data = []
for club, link in links.items():
    response = requests.get(link, headers=headers)
    #print(response.status_code) #200 is OK
    soup = BeautifulSoup(response.text, 'lxml')
    #print(soup.prettify())  #to check if soup downloads correctly.
    # attrs takes a dict mapping attribute name to value
    table = soup.find_all('div', attrs={'id': 'tournamentTable'})
    print(table)

When I inspect the HTML, the problem lies in the following lines:

<div id="tournamentTable" style="display: block;">
<table class=" table-main" id="tournamentTable"> </table>

What should I do to get the table with all the matches? I am stuck on the fact that class, id, and style are used together.
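As a minimal sketch of how id and class matching behave in BeautifulSoup, assuming a static copy of the excerpt above: the table row is invented for illustration, and the live page may build its table differently. It shows that a lookup by id alone returns both the div and the table, while adding the tag name and class narrows it to the table.

```python
from bs4 import BeautifulSoup

# Static stand-in for the page excerpt; the <tr> row is invented for illustration.
html = """
<div id="tournamentTable" style="display: block;">
  <table class=" table-main" id="tournamentTable">
    <tr><td>Team A - Team B</td><td>1:0</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# attrs maps attribute names to values, so it must be a dict.
# Both the div and the table carry this id, so both are returned.
by_id = soup.find_all(attrs={"id": "tournamentTable"})
print([tag.name for tag in by_id])

# Restrict by tag name and class to get only the table.
table = soup.find("table", id="tournamentTable", class_="table-main")

# Collect the cell text of every row in that table.
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")
]
print(rows)

# The same element selected with a CSS selector.
same_table = soup.select_one("table#tournamentTable.table-main")
```

The style attribute does not need to be part of the lookup; the id and class are enough to identify the element in this excerpt.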

Try reading the HTML with pandas:

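from io import StringIO

import pandas as pd

# your_html_response_from_requests is the requests.Response object
html = StringIO(your_html_response_from_requests.text)
try:
    tables = pd.read_html(html)  # one DataFrame per <table> in the page
except ValueError:
    tables = []  # read_html raises ValueError when no table is found

# 'tables.csv' is a placeholder file name
with open('tables.csv', 'w', encoding='utf-8', newline='') as file:
    for df in tables:
        df.to_csv(file, header=False, index=False, sep=';')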

I use to_csv to write a file, but you can do anything you like with the DataFrames.
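If only the table with a given id is wanted, read_html also accepts an attrs dict. The sketch below is an illustration on a static HTML string with invented rows, not the live page, and it assumes an HTML parser that pandas can use (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Static stand-in for the page; both rows are invented for illustration.
html = """
<div id="tournamentTable" style="display: block;">
  <table class=" table-main" id="tournamentTable">
    <tr><td>Team A - Team B</td><td>1:0</td></tr>
    <tr><td>Team C - Team D</td><td>2:2</td></tr>
  </table>
</div>
<table id="other"><tr><td>ignored</td></tr></table>
"""

# attrs filters <table> elements by attribute, so only the matching table is parsed.
tables = pd.read_html(StringIO(html), attrs={"id": "tournamentTable"})
df = tables[0]
print(df)

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(header=False, index=False, sep=";")
print(csv_text)
```

Because attrs is matched against table elements only, the div that shares the same id does not get in the way.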
