Python: for loop never enters its second iteration



I'm trying to get Python to loop through multiple pages by appending an incrementing page number to the end of the URL.

# import get to call a get request on the site
from bs4 import BeautifulSoup
import requests
from warnings import warn

response1 = requests.get('https://lasvegas.craigslist.org/search/mcy?purveyor-input=owner&hasPic=1')  # get rid of those lame-o's that post a housing option without a pic using their filter
html_soup = BeautifulSoup(response1.text, 'html.parser')
results_num = html_soup.find('div', class_='search-legend')
results_total = int(results_num.find('span', class_='totalcount').text)  # pulled the total count of posts as the upper bound of the pages array
pages = np.arange(0, results_total + 1, 120)
iterations = 0
print(pages)

for page in pages:
    response2 = requests.get("https://lasvegas.craigslist.org/search/mcy?purveyor-input=owner&hasPic=1"
                             + "&s="  # the parameter for defining the page number
                             + str(page))  # the page number in the pages array from earlier
    if response2.status_code != 200:
        warn('Request: {}; Status code: {}'.format(requests, response2.status_code))
    iterations = iterations + 1

print(response2)

The code itself raises no runtime errors, but it never moves on to the second page; it just stops at the end of the first page's iteration. I'm pulling my hair out over this and have no idea why it happens.

Can anyone point me in the right direction? I expect `<Response [200]>` to be printed 3 times, but it only appears once.

There are a couple of issues with your code. You're missing the import of the numpy module, and the print statement that outputs the response is incorrectly indented: it sits outside the for loop, so it runs only once, after the loop has finished, instead of once per page.

The script below works as expected:

from bs4 import BeautifulSoup
import requests
from warnings import warn
import numpy as np

response1 = requests.get('https://lasvegas.craigslist.org/search/mcy?purveyor-input=owner&hasPic=1')  # get rid of those lame-o's that post a housing option without a pic using their filter
html_soup = BeautifulSoup(response1.text, 'html.parser')
results_num = html_soup.find('div', class_='search-legend')
results_total = int(results_num.find('span', class_='totalcount').text)  # pulled the total count of posts as the upper bound of the pages array
pages = np.arange(0, results_total + 1, 120)
print(pages)

for page in pages:
    response2 = requests.get("https://lasvegas.craigslist.org/search/mcy?purveyor-input=owner&hasPic=1"
                             + "&s="  # the parameter for defining the page number
                             + str(page))  # the page number in the pages array from earlier
    if response2.status_code != 200:
        warn('Request: {}; Status code: {}'.format(response2.url, response2.status_code))  # report the failing URL, not the requests module
    print(response2)  # now inside the loop, so it prints once per page
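As a side note, instead of concatenating `"&s=" + str(page)` by hand, you can pass the query parameters through requests' `params` argument and let the library build the URL. The sketch below only *prepares* the request (no network call is made) so you can see the resulting URL; `build_page_request` is a hypothetical helper name, not part of the original code:

```python
import requests

# Base search URL from the question, without any query string.
BASE_URL = "https://lasvegas.craigslist.org/search/mcy"

def build_page_request(page_offset):
    """Prepare (but do not send) a GET request for one results page."""
    req = requests.Request(
        "GET",
        BASE_URL,
        # requests URL-encodes these and appends them as ?key=value&...
        params={"purveyor-input": "owner", "hasPic": 1, "s": page_offset},
    )
    return req.prepare()

prepared = build_page_request(120)
print(prepared.url)  # the page offset appears as an ordinary &s=120 parameter
```

In the loop you would then call `requests.get(BASE_URL, params={..., "s": page})` directly; this avoids string-building mistakes and handles encoding of special characters for you.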
