List index out of range when scraping multiple weeks

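I am trying to scrape the Billboard Hot 100 chart for every week from 1958 to 2021, but I have run into a problem. For each week from August 1958 to July 2021, I want the name of the number-one song, its artist, the number of weeks it has been on the chart, and the year. I defined a function that gets this information for a given link (a specific week), and I use a for loop to repeat the process for every week (the weekly links for the period are stored in a list). While doing this I get IndexError: list index out of range. All the pages have the same HTML structure, so I don't think that is the problem. What confuses me is that when I rerun the for loop I do not always get the same amount of data: one run might get as far as 1960, and the next might get as far as 1975. If anyone knows what the problem might be and is willing to help, I would appreciate it. Here is the code: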


from datetime import datetime, timedelta

import pandas as pd
import requests
from bs4 import BeautifulSoup

base_url = "https://www.billboard.com/charts/hot-100/{}"
start_date = datetime(1958, 8, 2)
end_date = datetime(2021, 7, 10)
one_week = timedelta(days=7)

# Build one chart URL per week in the period
links = []
while start_date <= end_date:
    url_ = base_url.format(start_date.strftime("%Y-%m-%d"))
    links.append(url_)
    start_date += one_week

song = []
art = []
weeks = []
year = []
dicc = {'Song': song, 'Artist': art, 'Weeks_on_chart': weeks, 'Year': year}

def getdata(url):
    # headers is not defined in the posted snippet; it is assumed to be set elsewhere
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    song.append(soup.find_all('span', {'class': 'chart-element__information__song text--truncate color--primary'})[0].get_text())
    art.append(soup.find_all('span', {'class': 'chart-element__information__artist text--truncate color--secondary'})[0].get_text())
    weeks.append(soup.find_all('span', {'class': 'chart-element__meta text--center color--secondary text--week'})[0].get_text())
    year.append(soup.find_all('button', {'class': 'date-selector__button button--link'})[0].get_text().split()[2])

for element in links:  # repeat for every link (every week)
    getdata(element)   # here is where "list index out of range" pops (lists from function getdata)

df = pd.DataFrame(dicc)
df  # just to visualize until which year I could get info

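You are indexing into a list without checking whether it is empty. You need to rewrite getdata() so that when soup.find_all() returns an empty list, you do not index into it, and handle that case however you want. Also make sure the elements you want to access are actually present on the page you are parsing.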
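One way to apply this advice is sketched below. It is only an illustration, not the only fix: the helper names first_text and extract_row are hypothetical, the class strings are copied from the question, and the code assumes it is handed the soup object that getdata() already builds. Each lookup checks for an empty result before indexing, and a page with any missing field yields None instead of raising IndexError.

```python
# Class strings copied from the question's code.
SONG_CLASS = 'chart-element__information__song text--truncate color--primary'
ARTIST_CLASS = 'chart-element__information__artist text--truncate color--secondary'
WEEKS_CLASS = 'chart-element__meta text--center color--secondary text--week'
DATE_CLASS = 'date-selector__button button--link'


def first_text(soup, tag, css_class):
    """Return the text of the first matching element, or None if nothing matches."""
    matches = soup.find_all(tag, {'class': css_class})
    if not matches:  # empty list: indexing [0] here would raise IndexError
        return None
    return matches[0].get_text()


def extract_row(soup):
    """Return a dict for the first chart entry, or None if any field is missing."""
    date_text = first_text(soup, 'button', DATE_CLASS)
    date_parts = date_text.split() if date_text else []
    row = {
        'Song': first_text(soup, 'span', SONG_CLASS),
        'Artist': first_text(soup, 'span', ARTIST_CLASS),
        'Weeks_on_chart': first_text(soup, 'span', WEEKS_CLASS),
        # The question takes the third whitespace-separated token as the year.
        'Year': date_parts[2] if len(date_parts) > 2 else None,
    }
    if any(value is None for value in row.values()):
        return None
    return row
```

Inside the loop, getdata() could call extract_row(soup), append the result to a list of rows when it is not None, and otherwise record the URL so that week can be retried or inspected later. Because the failures in the question occur at a different week on each run, it may also be worth checking r.status_code before parsing: the response might not be the chart page at all, though that is a guess rather than something the question confirms.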
