Splitting text across different processes



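I am trying to divide a string among several processes. Each process takes one part of the string and applies split() to it. The problem is that I don't know how to make each process take its part of the string without cutting a word in half. For example: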

0 received:  Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's sta
1 received:  ndard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a typ
2 received:  e specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining
3 received:   essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum
4 received:  passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum

I want process 0 to receive the whole word "standard" instead of just "sta". How can I achieve this?

import concurrent.futures

def split_text(text, process_id):
    print(process_id, "received: ", text)
    return text.split()

def main():
    text = ("Lorem Ipsum is simply dummy text of the printing and "
            "typesetting industry. Lorem Ipsum has been the industry's "
            "standard dummy text ever since the 1500s, when an unknown "
            "printer took a galley of type and scrambled it to make a type "
            "specimen book. It has survived not only five centuries, but also "
            "the leap into electronic typesetting, remaining essentially unchanged. "
            "It was popularised in the 1960s with the release of Letraset sheets "
            "containing Lorem Ipsum passages, and more recently with desktop publishing "
            "software like Aldus PageMaker including versions of Lorem Ipsum")
    n_processes = 5
    results = []
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for process_id in range(n_processes):
            results.append(executor.submit(split_text, text[int((process_id / n_processes) * len(text)):
                                                            int(((process_id + 1) / n_processes) * len(text))],
                                           process_id))

if __name__ == '__main__':
    main()

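Consider this. Instead of slicing at fixed offsets, a generator walks back from each cut point to the nearest space, so every piece ends on a word boundary: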

import concurrent.futures

def split_text(text, process_id):
    print(process_id, "received: ", text)
    return text.split()

def dole(body, chunk):
    # Yield pieces of at most `chunk` characters, always cutting at a space.
    start = 0
    while True:
        i = chunk
        if start + i >= len(body):
            # Last piece: it must be yielded, because a plain `return value`
            # inside a generator is never handed to the for loop.
            yield body[start:]
            return
        # Walk back to the nearest space so that no word is cut in half.
        while i > 0 and body[start + i] != ' ':
            i -= 1
        if i == 0:
            raise ValueError("chunk size is smaller than a single word")
        yield body[start:start + i]
        start += i + 1  # skip the space that was used as the cut point

def main():
    text = ("Lorem Ipsum is simply dummy text of the printing and "
            "typesetting industry. Lorem Ipsum has been the industry's "
            "standard dummy text ever since the 1500s, when an unknown "
            "printer took a galley of type and scrambled it to make a type "
            "specimen book. It has survived not only five centuries, but also "
            "the leap into electronic typesetting, remaining essentially unchanged. "
            "It was popularised in the 1960s with the release of Letraset sheets "
            "containing Lorem Ipsum passages, and more recently with desktop publishing "
            "software like Aldus PageMaker including versions of Lorem Ipsum")
    n_processes = 5
    results = []
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Pieces are cut back to a word boundary, so dole() may yield more
        # pieces than n_processes; the pool simply queues the extra ones.
        for process_id, chunk in enumerate(dole(text, len(text) // n_processes)):
            results.append(executor.submit(
                split_text,
                chunk,
                process_id))

if __name__ == '__main__':
    main()
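
Because dole() moves each cut backward to a word boundary, the number of pieces it hands out can differ from n_processes. If you need exactly one piece per process, one possible variation is to compute the evenly spaced cut positions first and then push each cut forward to the next whitespace. This is only a minimal sketch: word_aligned_chunks is a hypothetical helper name, not something from the code above or from the standard library, and it assumes words are separated by whitespace.

```python
def word_aligned_chunks(text, n_chunks):
    """Cut text into exactly n_chunks pieces without splitting a word.

    Hypothetical helper for illustration. A piece can be empty when a
    single word is longer than the ideal piece size.
    """
    chunks = []
    start = 0
    for k in range(1, n_chunks):
        # Ideal cut position if the text were divided evenly by character count.
        cut = max(start, k * len(text) // n_chunks)
        # Push the cut forward to the next whitespace so the word stays whole.
        while cut < len(text) and not text[cut].isspace():
            cut += 1
        chunks.append(text[start:cut])
        start = cut
    # Whatever is left belongs to the last piece.
    chunks.append(text[start:])
    return chunks


if __name__ == '__main__':
    sample = "Lorem Ipsum is simply dummy text of the printing and typesetting industry"
    for process_id, piece in enumerate(word_aligned_chunks(sample, 3)):
        print(process_id, "would receive:", repr(piece))
```

Each piece after the first starts with the whitespace it was cut at, which split() ignores, so the pieces can be passed to executor.submit(split_text, piece, process_id) exactly as in main() above.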
