Python multiprocessing code works on Windows but not on Ubuntu



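I'm trying to download files from a server and process them with a transformer model in separate processes. To practice using queues with multiprocessing, I wrote a small example that runs fine on Windows but not on Ubuntu. On Ubuntu, as soon as a file is passed to the transformer, the process stops doing anything and appears to hang. In the Ubuntu terminal, "Take" is printed only twice (the count depends on the num_process parameter).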

import multiprocessing as mc
import os
import time
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration
processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
model.config.forced_decoder_ids = None

def speech_to_text(path_to_audio):
    sound, sr = librosa.load(path_to_audio, sr=16000)
    input_features = processor(sound, sampling_rate=sr,
                               return_tensors="pt").input_features
    predicted_ids = model.generate(input_features)
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
    return transcription

class Test:
    def __init__(self):
        self.queue = mc.Queue()
    def put_items(self):
        for i in range(100):
            self.queue.put((i, 'speech.wav'))
            time.sleep(2)
    def process(self):
        while True:
            if self.queue.empty():
                time.sleep(1)
                continue
            i, item = self.queue.get()
            print("Take", i, os.getpid())
            print(i, speech_to_text(item)[0], os.getpid())
    def run(self, num_process):
        put = mc.Process(target=self.put_items)
        put.start()
        processes = []
        for _ in range(num_process):
            p = mc.Process(target=self.process)
            processes.append(p)
            p.start()
        put.join()

if __name__ == '__main__':
    t = Test()
    t.run(num_process=2)

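Try creating the processes with the spawn start method and see whether that helps. It is the default on Windows, but on Linux the default has traditionally been fork. A likely cause of the hang is that the model is loaded at import time in the parent, so forked children inherit PyTorch state that was initialized there and can block on the first model.generate call. With spawn, each child starts a fresh interpreter and re-imports the module, so it loads its own copy of the model.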

if __name__ == '__main__':
    mc.set_start_method('spawn')
    t = Test()
    t.run(num_process=2)
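
If changing the global start method is not desirable, a context object from mc.get_context('spawn') should work the same way while leaving the rest of the program alone. Below is a minimal sketch of that variant, not a drop-in replacement for the script above. It makes a few assumptions: load_model is a hypothetical stand-in (it just returns str.upper) for the from_pretrained calls, so the sketch runs without transformers installed; worker, run and the STOP sentinel are illustrative names; and the workers block on get() instead of polling empty().

```python
import multiprocessing as mc
import os

STOP = None  # sentinel: tells a worker there is no more work


def load_model():
    # Hypothetical stand-in for the WhisperProcessor / Whisper model setup.
    # In the real script the from_pretrained(...) calls would go here, so
    # every worker builds its own copy instead of inheriting the parent's.
    return str.upper


def worker(tasks, results):
    model = load_model()  # runs once per worker process
    while True:
        item = tasks.get()  # blocks until an item arrives
        if item is STOP:
            break
        i, payload = item
        results.put((i, model(payload), os.getpid()))


def run(num_process, items, timeout=30):
    # The context is local to this call; the global start method is untouched.
    ctx = mc.get_context("spawn")
    tasks, results = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(tasks, results))
             for _ in range(num_process)]
    for p in procs:
        p.start()
    for item in enumerate(items):
        tasks.put(item)
    for _ in procs:
        tasks.put(STOP)  # one sentinel per worker
    # Collect results before joining so no worker is left blocked on a put().
    out = [results.get(timeout=timeout) for _ in items]
    for p in procs:
        p.join()
    return sorted(out)


if __name__ == "__main__":
    for i, text, pid in run(2, ["a.wav", "b.wav", "c.wav"]):
        print(i, text, pid)
```

The if __name__ == "__main__" guard matters with spawn, because each child re-imports the main module, and anything outside the guard runs again in every child.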
