Fetching more than 1000 rows from the SODA API using $limit and $offset



I'm using the following Python code to pull data through the SODA API:

response = requests.get('https://healthdata.gov/resource/uqq2-txqb.json')

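The dataset contains 434865 rows, but the API returns only the first 1000. I saw in another question that $limit can be used to get the first 50000 rows, but how do I combine it with $offset to get all 434865 rows?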

I have since figured out how to use $offset and ended up with the code below. Is there any way to condense it?

response1 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000')
response2 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=50001')
response3 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=100002')
response4 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=150003')
response5 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=200004')
response6 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=250005')
response7 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=300006')
response8 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=350007')
response9 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=400008')

This is called paging, and it is documented at, for example, https://dev.socrata.com/docs/paging.html

The documentation also states that there are two versions of the API:

  • v2.0, where $limit can be at most 50000
  • v2.1, where $limit is unlimited

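The endpoint you are using appears to support v2.1, at least according to https://dev.socrata.com/foundry/healthdata.gov/uqq2-txqb, so you should be able to use a large value for $limit and retrieve the whole set at once.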
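As a minimal sketch of the single-request approach, the snippet below only builds the query URL with one large $limit. The value 500000 is an assumption, picked simply because it exceeds the 434865 rows mentioned in the question, and build_query_url is a hypothetical helper name. Whether the server honours such a limit depends on the endpoint really being v2.1.

```python
from urllib.parse import urlencode

BASE_URL = 'https://healthdata.gov/resource/uqq2-txqb.json'


def build_query_url(limit, offset=0):
    """Return the endpoint URL with $limit, plus $offset when it is non-zero."""
    params = {'$limit': limit}
    if offset:
        params['$offset'] = offset
    # safe='$' keeps the dollar signs readable instead of encoding them as %24
    return BASE_URL + '?' + urlencode(params, safe='$')


# Assumed value: any limit larger than the 434865 rows in the question
single_request_url = build_query_url(500000)
print(single_request_url)
```

The resulting string can be passed to requests.get(single_request_url), after which response.json() should contain the rows, provided the endpoint accepts the large limit.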

When paging, the $offset value is 0-based, so your queries should be rewritten as:

response1 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000')
response2 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=50000')
response3 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=100000')
response4 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=150000')

Note that the offsets are aligned on multiples of $limit.
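To make the alignment concrete, here is a small sketch that needs no network access. It computes the aligned offsets for the 434865 rows from the question and checks which row indices the question's offsets (50001, 100002, ...) would skip. The helper names page_offsets and rows_covered are hypothetical, introduced only for this illustration.

```python
def page_offsets(total_rows, limit):
    """Return the 0-based $offset of every page needed to cover total_rows."""
    return list(range(0, total_rows, limit))


def rows_covered(offsets, limit, total_rows):
    """Return the set of 0-based row indices that the given pages cover."""
    covered = set()
    for offset in offsets:
        covered.update(range(offset, min(offset + limit, total_rows)))
    return covered


TOTAL_ROWS = 434865  # row count stated in the question
LIMIT = 50000

# Offsets aligned on multiples of the limit
aligned = page_offsets(TOTAL_ROWS, LIMIT)

# Offsets used in the question: each page starts one row too late
misaligned = [i * (LIMIT + 1) for i in range(9)]

# Row indices that the question's requests never fetch
missing = sorted(set(range(TOTAL_ROWS)) - rows_covered(misaligned, LIMIT, TOTAL_ROWS))

print(aligned)
print(missing)
```

With these numbers the aligned offsets give nine pages starting at 0, 50000, ..., 400000, while the question's offsets leave out one row at each of the eight page boundaries.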

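import requests

url = 'https://healthdata.gov/resource/uqq2-txqb.json'
headers = {}  # optional request headers, left empty here

# Initialize variables for pagination
limit = 50000
offset = 0
data = []
while True:
    # Set query parameters
    params = {
        '$limit': limit,
        '$offset': offset
    }
    # Make a GET request to the API endpoint with the query parameters
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()
    page = response.json()
    data.extend(page)
    # A page shorter than the limit means the last rows have been fetched
    if len(page) < limit:
        break
    offset += limit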
