如何在python3中使用线程通过多个帐户获取数据



我想实现一个可以并行获取数据的函数。

背景为100个站点的信息,可从站点a中获取。

同一个账号不能一次使用多次,所以我在a站点创建了5个不同的账号,这样我可以用5个账号获取信息。

account info like

worker1 pawd
worker2 pawd
worker3 pawd
worker4 pawd
worker5 pawd

表示从A站点获取B站点的信息。然后你需要键入cmd,就像站点a上的get info for siteB_IP一样。

假设有100个ip存储在列表名称IPlist

如何通过线程并行获取5个可用账号的100个ip的信息,然后所有的信息都可以存储在一个变量中,没有冲突。

我所尝试的如下,以下代码无法执行,因为我没有办法实现解决方案:

import threading
user = 'root'
pwd = 'Changeme123'
# the first step is to logon with default account
rs = link.send_cmd(r':lognew:' + '"' + user + '","' + pwd + '"')
# then get all nebor ip from the logon site, the function parse_multi is used for parsing data
IPlist = parse_multi(link.send_cmd('get-IP-info:0xffff'))
def Fetchinfo(user, ip):
rs = link.send_cmd(r':lognew:' + '"' + user + '","' + pwd + '"')
areainfo = link.send_cmd('get info for ' + site_IP)

for ip in IPlist:
# how to handle 100 IPs in the situstion of 5 accounts avaliable ?    
thread = threading.thread(target = Fetchinfo, args = [worker, ip]

由于您不希望来自相同帐户id和密码的调用同时发生,因此您可以定义一个函数,该函数依次遍历ip子列表以同步获取:

def fetch_data_for_ips(account_id, account_password, ips_to_fetch):
results = list()
for ip_to_fetch in ips_to_fetch:
# fetch with the account_id and password synchronously
result = ...
results.append(result)
return results // Added this

然后,使用线程池,为每个帐户并发地运行不同的批处理:

from concurrent.futures import ThreadPoolExecutor, as_completed
# Split the workload for each account to fetch
num, remainder = divmod(len(ip_list), len(accounts))
num_ips_for_each_account = num + bool(remainder)
# This gives e.g. [[1,2,3], [4,5,6]], where each sublist is for each account to fetch
ip_lists_for_each_account = [ip_list[i: i + num_ips_for_each_account] for i in range(0, len(ip_list), num_ips_for_each_account)]

# You should only need number of threads = to the number of accounts you have
with ThreadPoolExecutor(len(accounts)) as executor:
# Feel free to use a set instead if you don't need to know which result came from which thread
futures = dict()
results = list()
for (account_id, account_password), ips_to_fetch in zip(accounts, ip_lists_for_each_account):
future = executor.submit(fetch_data_for_ips, account_id, account_password, ips_to_fetch)
futures[future] = account_id
for future in as_completed(futures):
result = future.result()
account_id = futures[future]
print(f'{account_id} fetched these:', result)
results.extend(result)

您可以参考以下rshon建议的示例代码。

def fetch_data_for_ips(account_id,ips_to_fetch):
results = list()
for ip_to_fetch in ips_to_fetch:
# fetch with the account_id and password synchronously
result = ','.join((account_id,ip_to_fetch))
results.append(result)
return results
from concurrent.futures import ThreadPoolExecutor, as_completed
accounts = ['worker1','worker2','worker3','worker4','worker5']
ip_list = [str(_) for _ in range(10)]
# Split the workload for each account to fetch
num, remainder = divmod(len(ip_list), len(accounts))
num_ips_for_each_account = num + bool(remainder)
# This gives e.g. [[1,2,3], [4,5,6]], where each sublist is for each account to fetch
ip_lists_for_each_account = [ip_list[i: i + num_ips_for_each_account] for i in range(0, len(ip_list), num_ips_for_each_account)]
# You should only need number of threads = to the number of accounts you have
with ThreadPoolExecutor(len(accounts)) as executor:
# Feel free to use a set instead if you don't need to know which result came from which thread
futures = dict()
results = list()
for account_id, ips_to_fetch in zip(accounts, ip_lists_for_each_account):
future = executor.submit(fetch_data_for_ips, account_id, ips_to_fetch)
futures[future] = account_id
for future in as_completed(futures):
result = future.result()
account_id = futures[future]
print(f'{account_id} fetched these:', result)
results.extend(result)
output :
worker3 fetched these: ['worker3,4', 'worker3,5']
worker2 fetched these: ['worker2,2', 'worker2,3']
worker1 fetched these: ['worker1,0', 'worker1,1']
worker4 fetched these: ['worker4,6', 'worker4,7']
worker5 fetched these: ['worker5,8', 'worker5,9']

最新更新