I have about 200,000 imdb_id values in a file, and I want to use the omdb API to fetch JSON information for each of these imdb_ids.

I wrote this code and it works, but it is very slow (3 seconds per id, which would take about 166 hours):
import urllib.request
import csv
from collections import defaultdict

i = 0
columns = defaultdict(list)

with open('a.csv', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)

with open('a.csv', 'r', encoding='utf-8') as csvinput:
    with open('b.csv', 'w', encoding='utf-8', newline='') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[0] == "item_id":
                writer.writerow(row + ["movie_info"])
            else:
                url = urllib.request.urlopen(
                    "http://www.omdbapi.com/?i=tt" + str(columns['item_id'][i]) + "&apikey=??????").read()
                url = url.decode('utf-8')
                writer.writerow(row + [url])
                i = i + 1
What is the fastest way to get movie info from omdb using Python?
EDIT: I wrote this code, but after getting 1022 URL responses I ran into the error below:
import grequests

urls = open("a.csv").readlines()
api_key = '??????'

def exception_handler(request, exception):
    print("Request failed")

# read the file and turn each line into a request URL
for i in range(len(urls)):
    urls[i] = "http://www.omdbapi.com/?i=tt" + str(urls[i]).rstrip('\n') + "&apikey=" + api_key

requests = (grequests.get(u) for u in urls)
responses = grequests.map(requests, exception_handler=exception_handler)

with open('b.json', 'wb') as outfile:
    for response in responses:
        outfile.write(response.content)
The error is:
Traceback (most recent call last):
File "C:/python_apps/omdb_async.py", line 18, in <module>
outfile.write(response.content)
AttributeError: 'NoneType' object has no attribute 'content'
How can I fix this error?
This code is I/O-bound and would benefit greatly from Python's async/await features. You can iterate over your collection of URLs, creating an asynchronously executed request for each one, much like the example in this SO question.

Once the requests are issued asynchronously, you will probably also need to throttle the request rate to stay within the OMDB API's limits. As for the traceback: `grequests.map` returns `None` for a request that failed, so any response must be checked for `None` before its `content` is used.
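A rough sketch of that approach, assuming the third-party aiohttp library (the API key, file names, and concurrency limit are placeholders): a semaphore caps the number of in-flight requests, failed requests come back as None, and None results are skipped when writing, which also avoids the AttributeError above.

```python
import asyncio
import aiohttp  # third-party: pip install aiohttp

API_KEY = "??????"   # your OMDB API key
CONCURRENCY = 40     # placeholder cap; tune to the OMDB rate limit

async def fetch(session, semaphore, imdb_id):
    """Fetch one movie record; return the JSON text, or None on failure."""
    url = "http://www.omdbapi.com/?i=tt" + imdb_id + "&apikey=" + API_KEY
    async with semaphore:  # limit the number of concurrent requests
        try:
            async with session.get(url) as resp:
                return await resp.text()
        except aiohttp.ClientError:
            return None  # failed request -> None, skipped by the caller

async def fetch_all(imdb_ids):
    """Run all requests concurrently, bounded by the semaphore."""
    semaphore = asyncio.Semaphore(CONCURRENCY)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, semaphore, i) for i in imdb_ids]
        return await asyncio.gather(*tasks)

def run(input_path="a.csv", output_path="b.json"):
    with open(input_path, encoding="utf-8") as f:
        ids = [line.strip() for line in f if line.strip()]
    results = asyncio.run(fetch_all(ids))
    with open(output_path, "w", encoding="utf-8") as out:
        for r in results:
            if r is not None:  # guard against failed requests
                out.write(r + "\n")
```

With 40 requests in flight at a time, 200,000 ids should take minutes rather than days, assuming the API key's quota allows it.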