Can't get Beautiful Soup to return the correct article titles, links, and imgs. Help debugging?

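For the past 7 hours I've been trying to scrape data for a project. Yes, it has to be done without an API. It's been a war of attrition, but this code, which looks right to me, keeps returning NaNs. Am I missing something simple? Toward the bottom of the page is every story featured on the front page: small cards, each with an image, 3 article titles, and their corresponding links. My code either grabs nothing, grabs part of a card, or grabs something completely wrong. There should be 35 cards with 3 links apiece, for 105 articles. So far it recognizes 27 cards, with a lot of NaNs instead of strings and none of the individual articles.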

import csv, requests, re, json
from bs4 import BeautifulSoup

handle = 'http://www.'
location = 'ny'
ping = handle + location + 'times.com'
pong = requests.get(ping, headers = {'User-agent': 'Gordon'})
soup = BeautifulSoup(pong.content, 'html.parser')

# upper cards attempt
for i in soup.find_all('div', {'class':'css-ki19g7 e1aa0s8g0'}):
    print(i.a.get('href'))
    print(i.a.text)
    print('')

# lower cards attempt
count = 0
for i in soup.find_all('div', {"class":"css-1ee8y2t assetWrapper"}):
    try:
        print(i.a.get('href'))
        count+=1
    except AttributeError:
        # i.a is None when the div contains no <a> tag
        pass
print('current card pickup: ', count)
print('the goal card pickup:', 35)

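Everything clickable uses "css-1ee8y2t assetWrapper", but when I find_all on it I only get 27 of them. I wanted to start from css-guaa7h and work my way down, but that only gives NaNs. Other divs that looked promising but led nowhere are: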

div class="css-2imjyh" data-testid="block-Well" data-block-tracking-id="Well"
div class="css-a11566"
div class="css-guaa7h"
div class="css-zygc9n"
div data-testid="lazyimage-container" # for images

Current attempt:

h3 class="css-1d654v4">Politics

I'm running low on hope. Why is getting a first job harder than working hard?

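I took a look at their website, and it uses AJAX to load articles as you scroll down. You'll probably have to use Selenium. Here's an answer that may help you do that: https://stackoverflow.com/a/21008335/7933710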
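As a rough illustration of the scroll-then-scrape idea (not necessarily what the linked answer does), here is a minimal sketch. It assumes Selenium is installed with a working Chrome driver; the helper names `scroll_to_bottom` and `fetch_rendered_html`, the one-second pause, and the 30-scroll cap are all made up for this example. It keeps scrolling until the page height stops growing, then returns the rendered HTML.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=30):
    """Scroll until the page height stops growing; return the number of scrolls.

    `driver` is anything with a Selenium-style execute_script() method.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, assume we reached the end
        last_height = new_height
    return rounds


def fetch_rendered_html(url, pause=1.0):
    """Open url in a real browser, scroll to the end, return the final HTML."""
    # Imported here so scroll_to_bottom() can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(url)
        scroll_to_bottom(driver, pause=pause)
        return driver.page_source
    finally:
        driver.quit()
```

The returned string can replace `pong.content` in the question's code, e.g. `soup = BeautifulSoup(fetch_rendered_html(ping), 'html.parser')`. Separately, note that `i.a` only returns the first `<a>` inside a tag, so getting all three links from a card would need something like `i.find_all('a')`.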
