Python requests (2.27.1) and the request URL being modified during a pagination loop



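I am developing a Python SDK for the API my company provides to customers. Many of the requests are paginated, and until now I have been able to handle pagination without any problems.

Just today (I am not sure whether something changed in my development environment, such as package versions), I started seeing this strange behavior from the requests module.

In my class I create a class variable for a requests session, which sets the content-type and authorization headers. Very basic.

I then make a GET call with a URL. Note that the example below is generic and can take other methods, but I am using GET.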

response = self.session.request(method, url, params=params, data=data, json=json)

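I can then extract the data and pagination items I need from the response and iterate until all pages have been fetched.

However, when I try this now, the URL keeps getting modified: the params are appended to the end of it as part of the query string on every iteration. The URL now behaves like this: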

https://example.com/api/apps/abc/environments/def/permissions
https://example.com/api/apps/abc/environments/def/permissions?page=0&size=100
https://example.com/api/apps/abc/environments/def/permissions?page=0&size=100&page=1&size=100
https://example.com/api/apps/abc/environments/def/permissions?page=0&size=100&page=1&size=100&page=1&size=100
https://example.com/api/apps/abc/environments/def/permissions?page=0&size=100&page=1&size=100&page=1&size=100&page=1&size=100
https://example.com/api/apps/abc/environments/def/permissions?page=0&size=100&page=1&size=100&page=1&size=100&page=1&size=100&page=1&size=100

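As you can see, the params dictionary keeps being appended to the end of the URL. It never gets past page 1, because the server takes the first value it finds for a repeated parameter. It is possible that the server's behavior has changed and it now takes the first value rather than the last, but I have not been able to confirm that.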
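To make the first-versus-last point concrete, here is a small sketch using only the standard library. It parses one of the accumulated URLs above and shows what a server would see under each policy. Which policy the real server applies is not known; the `first` and `last` dictionaries are illustrative only.

```python
from urllib.parse import urlsplit, parse_qs

url = ('https://example.com/api/apps/abc/environments/def/permissions'
       '?page=0&size=100&page=1&size=100')

# parse_qs keeps every value of a repeated parameter, in order of appearance
values = parse_qs(urlsplit(url).query)

# What a "first value wins" server would read (assumed policy)
first = {key: vals[0] for key, vals in values.items()}
# What a "last value wins" server would read (assumed policy)
last = {key: vals[-1] for key, vals in values.items()}

print(first)  # {'page': '0', 'size': '100'}
print(last)   # {'page': '1', 'size': '100'}
```

With a first-wins server, the request is always for page 0, which would match the loop never advancing.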

Is this behavior expected? I can do url = url.split('?')[0] to get back to the original URL, and then everything works as expected.
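As an alternative to splitting on '?', the query string can be dropped with the standard library's URL parser, which also handles fragments. This is only a sketch of the workaround; `strip_query` is a hypothetical helper name, not part of requests or of the SDK.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return url without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))


dirty = ('https://example.com/api/apps/abc/environments/def/permissions'
         '?page=0&size=100&page=1&size=100')
print(strip_query(dirty))
# https://example.com/api/apps/abc/environments/def/permissions
```

Either way, this only masks the symptom: it does not explain why the URL picked up a query string in the first place.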

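Any help or insight would be greatly appreciated.

EDIT: Here is the pagination loop. The other methods in the class, such as get, post, etc., call this __request method.

I see the problem when pagination_type is 'inline'. The remaining branches of the if statement are not used here.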

import json as native_json

def __request(self, method, url, params=None, data=None, json=None):
    return_data = []
    num_iterations = 1
    pagination_type = None
    # work on a copy so params is never None and the caller's dict is not mutated
    params = dict(params or {})
    while True:  # infinite loop in case of pagination - we will break the loop when needed
        response = self.session.request(method, url, params=params, data=data, json=json)
        self.__check_response_for_error(response)   # handle an error response
        if self.__response_has_no_content(response):  # handle no content responses
            return None
        # load the result as a dict
        try:
            result = response.json()
        except native_json.decoder.JSONDecodeError:  # if we cannot decode json then the response isn't json
            return response.content.decode('utf-8')
        # check on the pagination and iterate if required - we only need to check on this after the first
        # request - checking it each time can screw up the logic when dealing with pagination coming from
        # the response headers as the header won't exist which will mean pagination_type will change to 'none'
        # which means we drop into the else block below and assign just the LAST page as the result, which
        # is obviously not what we want to be doing.
        if num_iterations == 1:
            pagination_type = self.__pagination_type(response.headers, result)
        if pagination_type == 'inline':
            return_data += result['data']
            count = result['count']
            page = result['page']
            size = result['size']
            if size * (page + 1) >= count:  # if we have reached the max number of records time to break the loop
                break
            else:  # else loop again after incrementing the page number by 1
                params['page'] = page + 1
        elif pagination_type == 'audit':
            pass  # do stuff not relevant to this question
        elif pagination_type == 'report':
            pass  # do stuff not relevant to this question
        elif pagination_type == 'secmgr':
            pass  # do stuff not relevant to this question
        else:  # we are not dealing with pagination so just return the response as-is
            return_data = result
            break
        num_iterations += 1
    # finally return the response data
    return return_data
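The termination logic of the inline branch can be exercised without a server. The sketch below is not the SDK code: `collect_inline` and `fake_fetch` are hypothetical names, and the response shape (`data`, `count`, `page`, `size`) is assumed from the loop above.

```python
def collect_inline(fetch_page, size=100):
    """Collect every record from an inline-paginated source.

    fetch_page(page, size) is a hypothetical callable that returns a dict
    with 'data', 'count', 'page' and 'size' keys.
    """
    records = []
    page = 0
    while True:
        result = fetch_page(page, size)
        records += result['data']
        # stop once the pages fetched so far cover the reported total
        if result['size'] * (result['page'] + 1) >= result['count']:
            break
        page = result['page'] + 1
    return records


def fake_fetch(page, size, total=250):
    """Stand-in for the API: serves integers 0..total-1 in pages."""
    start = page * size
    data = list(range(start, min(start + size, total)))
    return {'data': data, 'count': total, 'page': page, 'size': size}


all_records = collect_inline(fake_fetch)
print(len(all_records))  # 250
```

Because the page number is derived from each response, a source that always answers with page 0 would keep this loop on page 1 forever, which is the symptom described above.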

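EDIT 2: The code below works fine and as expected, so I have clearly messed something up somewhere. I just have not been able to pin it down yet.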

import requests

def main():
    session = requests.session()
    params = {
        'page': 1
    }
    url = 'https://httpbin.org/get'
    for x in range(10):
        print(f'url before request: {url}')
        print(params)
        response = session.request('get', url, params=params)
        print(f'url after request: {response.request.url}')
        params['page'] = params['page'] + 1
        print('*' * 60)

if __name__ == '__main__':
    main()

With the output:

url before request: https://httpbin.org/get
{'page': 1}
url after request: https://httpbin.org/get?page=1
************************************************************
url before request: https://httpbin.org/get
{'page': 2}
url after request: https://httpbin.org/get?page=2
************************************************************
url before request: https://httpbin.org/get
{'page': 3}
url after request: https://httpbin.org/get?page=3
************************************************************
url before request: https://httpbin.org/get
{'page': 4}
url after request: https://httpbin.org/get?page=4
************************************************************
url before request: https://httpbin.org/get
{'page': 5}
url after request: https://httpbin.org/get?page=5
************************************************************
url before request: https://httpbin.org/get
{'page': 6}
url after request: https://httpbin.org/get?page=6
************************************************************
url before request: https://httpbin.org/get
{'page': 7}
url after request: https://httpbin.org/get?page=7
************************************************************
url before request: https://httpbin.org/get
{'page': 8}
url after request: https://httpbin.org/get?page=8
************************************************************
url before request: https://httpbin.org/get
{'page': 9}
url after request: https://httpbin.org/get?page=9
************************************************************
url before request: https://httpbin.org/get
{'page': 10}
url after request: https://httpbin.org/get?page=10
************************************************************

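I finally figured it out. As @TimRoberts suspected, the URL was in fact being overwritten with response.request.url elsewhere in the code. That has been fixed, and everything now works.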
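For anyone hitting the same symptom, the accumulation can be reproduced without any network traffic. This minimal sketch prepares requests instead of sending them. The endpoint is a placeholder, `prepared_url` is a hypothetical helper, and the reassignment stands in for the `url = response.request.url` overwrite described above.

```python
import requests


def prepared_url(url, params):
    """Build the request without sending it and return the final URL."""
    return requests.Request('GET', url, params=params).prepare().url


base = 'https://example.com/api/permissions'  # placeholder endpoint

# Buggy pattern: the loop variable is overwritten with the prepared URL,
# so each new request appends params to an already populated query string.
url = base
buggy = []
for page in range(3):
    url = prepared_url(url, {'page': page, 'size': 100})
    buggy.append(url)

# Fixed pattern: always start from the untouched base URL.
fixed = [prepared_url(base, {'page': page, 'size': 100}) for page in range(3)]

for line in buggy + fixed:
    print(line)
```

requests appends `params` to whatever query string the URL already carries, so the fix is to keep the base URL in its own variable and never assign the prepared URL back to it.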
