Handling timeout exceptions and time.sleep with urllib2 + pool.map



I'm new to Python, and I wrote some code to download data from a web API. However, while using the API I have to respect a few limits:

  • 1 request per second per API key
  • If a timeout occurs, wait 30 seconds before trying again
  • Limit of 100k requests per day per API key
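To make the first limit concrete: each key may be used at most once per second. Purely for illustration (the class and attribute names below are made up, they are not part of my project or of the web API), the constraint amounts to something like this:

import time

class KeyThrottle(object):
    '''Hypothetical helper: remembers when each API key was last used so
    that no key is used more than once per second.'''
    def __init__(self, api_keys):
        self.last_used = dict((key, 0.0) for key in api_keys)

    def wait_for(self, key):
        # Sleep just long enough so that at least 1 second passes between
        # two requests made with the same key.
        elapsed = time.time() - self.last_used[key]
        if elapsed < 1:
            time.sleep(1 - elapsed)
        self.last_used[key] = time.time()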

The code for the method that makes the requests to the web API is:

def getMatchDetails(self,match_id):
    '''Calls the WEB Api and requests the data for the match with
    a specific id (in match_id). Then returns the data already decoded 
    from json.'''
    import urllib2
    import json
    import time
    url = self.__makeUrl__(api_key= self.api_key, parameters = ['match_id='+str(match_id)])
    # Sometimes a time out occurs, we keep trying
    while True:
        try:
            start = time.time()
            json_obj = urllib2.urlopen(url)
            end = time.time()
            if end - start < 1:
                time.sleep(1 - (end - start))
        except:
            print('Timed Out, Trying again in 30 seconds')
            time.sleep(30)
            continue
        else:
            break
    detailed_data = json.load(json_obj)
    return detailed_data

The method __makeUrl__ simply concatenates a few strings and returns the result. To change the API key on every call to the method above, I use:

def getMatchDetailsForMap(self,match_id):
    self.counter += 1
    self.api_key = self.api_keys[self.counter%len(self.api_keys)]
    return self.getMatchDetails(match_id)

where self.api_keys is a list containing all my API keys. I then use the method getMatchDetailsForMap with the map function in the following code:

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(14)
ids_to_get = self.__idsToGetChunks__(14)
for chunk in ids_to_get:
    results = pool.map(self.getMatchDetailsForMap, chunk)

The method __idsToGetChunks__ returns a series of lists (chunks) of arguments (match_id), which are fed to the getMatchDetailsForMap method.
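The actual __idsToGetChunks__ is not shown here; for reference, a helper along these lines would produce the structure described above (a rough sketch only, and self.ids_to_get is an assumed attribute holding all the pending match ids):

def __idsToGetChunks__(self, chunk_size):
    '''Rough sketch: split the pending match ids into lists of
    chunk_size elements, one list per call to pool.map.'''
    ids = self.ids_to_get  # assumed attribute with all pending match_id values
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]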

Questions:

    By experimenting with the code, I realized that the 1-request-per-second limit per key is not being respected; why is that? Also, when timeouts occur they really slow down the whole fetching process; is there a better way to handle these exceptions while using map? (Suggestions please.)

Thanks for reading and for your help! Sorry for the long post.

To satisfy all three requirements, I suggest writing a simple for loop that makes one request per iteration. Normally, wait one second between requests. If a timeout occurs, wait 30 seconds instead. Don't loop more than 100k times. (I assume this script runs once per day and takes less than 24 hours. ;))

Have the main program start one Process per API key.

Simple!

# 1 request per second per API key
# If a timeout occurs, wait 30 seconds before trying again
# Limit of 100k requests per day per API key
import logging, time, urllib2
import multiprocessing as mp
def do_fetch(key, timeout):
    return urllib2.urlopen(
        'http://example.com', timeout=timeout
    ).read()

def get_data(api_key):
    logger = mp.get_logger()
    data = None
    # Limit of 100k requests per day per API key
    for num in range(100*1000):
        t = 1 if num != 1 else 0  # zero timeout on the 2nd request forces an error (tests the except path)
        try:
            data = do_fetch(api_key, timeout=t)
            logger.info('%d bytes', len(data))
        except urllib2.URLError as exc:
            logger.error('exc: %s', repr(exc))
            # If a timeout occurs, wait 30 seconds before trying again
            time.sleep(30)
        else:
            # "1 request per second per API key"
            time.sleep(1)

mp.log_to_stderr(level=logging.INFO)
keys = [123, 234]
pool = mp.Pool(len(keys))
pool.map(get_data, keys)

Output
[INFO/PoolWorker-1] child process calling self.run()
[INFO/PoolWorker-2] child process calling self.run()
[INFO/PoolWorker-2] 1270 bytes
[INFO/PoolWorker-1] 1270 bytes
[ERROR/PoolWorker-2] exc: URLError(error(115, 'Operation now in progress'),)
[ERROR/PoolWorker-1] exc: URLError(error(115, 'Operation now in progress'),)
[INFO/PoolWorker-2] 1270 bytes
[INFO/PoolWorker-1] 1270 bytes
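If you prefer to keep your ThreadPool version, the exception handling part can still be improved in the same spirit: pass an explicit timeout to urlopen and catch the specific exceptions instead of using a bare except. A sketch along those lines, assuming a 10-second timeout is acceptable:

def getMatchDetails(self, match_id):
    '''Same as before, but with an explicit timeout and specific
    exception handling instead of a bare except.'''
    import urllib2
    import json
    import time
    import socket
    url = self.__makeUrl__(api_key=self.api_key, parameters=['match_id=' + str(match_id)])
    while True:
        try:
            # Fail fast instead of hanging forever; 10 seconds is an assumed value
            json_obj = urllib2.urlopen(url, timeout=10)
        except (urllib2.URLError, socket.timeout) as exc:
            print('Timed out (%r), trying again in 30 seconds' % exc)
            time.sleep(30)
        else:
            break
    return json.load(json_obj)

Note that this only covers the timeout handling. The 1-request-per-second-per-key pacing is a separate problem: with 14 worker threads sharing one object, self.api_key is reassigned concurrently by getMatchDetailsForMap, so the sleep inside getMatchDetails does not reliably throttle each key; running one sequential loop per key, as above, avoids that.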
