Ensuring atomicity of subprocess code



I have the following code:

#!/bin/env python
# http://stackoverflow.com/questions/32192938/order-of-subprocesses-execution-and-its-impact-on-operations-atomicity
from multiprocessing import Process
from multiprocessing import Queue
import time
import os
# Define an output queue
output = Queue()
# Define an example function
def f(x, output):
    time.sleep(.5)
    ppid = os.getppid()   # PPID
    pid  = os.getpid()     # PID
    # very computing intensive operation
    result = 10*x
    print "(%s, %s, %s)" % (ppid, pid, result)
    time.sleep(.5)
    # store result as tuple
    result = (ppid, pid, result)
    output.put(result)
    # return result

def queue_size(queue):
    size = int(queue.qsize())
    print size
# Print parent pid
print "Parent pid: %s" % os.getpid()
# Setup a list of processes that we want to run
processes = [Process(target=f, args=(x, output)) for x in range(1,11)]
# Run processes
for p in processes:
    p.start()
# Process has no close attribute
# for p in processes:
#     p.close()
# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
print "Order of result might be different from order of print"
print "See: http://stackoverflow.com/questions/32192938/order-of-subprocesses-execution-and-its-impact-on-operations-atomicity"
print ""
results = [output.get() for p in processes]
print(results)

I want to replace the single-line print of the (ppid, pid, result) tuple in f with several statements like these:

print "ppid: %s" % ppid
print "pid:  %s" % pid
print "result: %s" % result
print "#####################"

To achieve this, I chose a semaphore to make sure the output is atomic. Here is the modified version:

#!/bin/env python
# http://stackoverflow.com/questions/32192938/order-of-subprocesses-execution-and-its-impact-on-operations-atomicity
from multiprocessing import Process
from multiprocessing import Queue
import threading
import time
import os
max_threads = 1
semaphore = threading.BoundedSemaphore(max_threads)
# Define an output queue
output = Queue()
# Define an example function
def f(x, output):
    time.sleep(.5)
    ppid = os.getppid()   # PPID
    pid  = os.getpid()     # PID
    # very computing intensive operation
    result = 10*x
    # print "(%s, %s, %s)" % (pp, p, result)
    semaphore.acquire()
    print "ppid: %s" % ppid
    print "pid:  %s" % pid
    print "result: %s" % result
    print "#####################"
    semaphore.release()
    time.sleep(.5)
    # store result as tuple
    result = (ppid, pid, result)
    output.put(result)
    # return result

def queue_size(queue):
    size = int(queue.qsize())
    print size
# Print parent pid
print "Parent pid: %s" % os.getpid()
# Setup a list of processes that we want to run
processes = [Process(target=f, args=(x, output)) for x in range(1,11)]
# Run processes
for p in processes:
    p.start()
# Process has no close attribute
# for p in processes:
#     p.close()
# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
print "Order of result might be different from order of print"
print "See: http://stackoverflow.com/questions/32192938/order-of-subprocesses-execution-and-its-impact-on-operations-atomicity"
print ""
results = [output.get() for p in processes]
print(results)

But the operations do not seem to be atomic (see PID 10269 and PID 10270), and the semaphore did not help. Here is the output:

Parent pid: 10260
ppid: 10260
pid:  10264
result: 40
#####################
ppid: 10260
pid:  10263
result: 30
#####################
ppid: 10260
pid:  10265
result: 50
#####################
ppid: 10260
pid:  10262
result: 20
#####################
ppid: 10260
pid:  10267
result: 70
#####################
ppid: 10260
pid:  10268
result: 80
#####################
ppid: 10260
pid:  10261
result: 10
#####################
ppid: 10260
ppid: 10260
pid:  10269
pid:  10270
result: 90
result: 100
#####################
#####################
ppid: 10260
pid:  10266
result: 60
#####################
Order of result might be different from order of print
See: http://stackoverflow.com/questions/32192938/order-of-subprocesses-execution-and-its-impact-on-operations-atomicity
[(10260, 10264, 40), (10260, 10263, 30), (10260, 10265, 50), (10260, 10267, 70), (10260, 10262, 20), (10260, 10268, 80), (10260, 10261, 10), (10260, 10270, 100), (10260, 10269, 90), (10260, 10266, 60)]

Why?

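You are running f in separate processes, but you are trying to synchronize them with a threading semaphore. You are mixing incompatible multitasking models here. The processes in your program run in separate memory spaces and have independent program counters, which means you cannot synchronize them as if they were running inside a single program. Threads, by contrast, run inside a single program and share its memory. Each child process therefore ends up with its own copy of the semaphore, so acquiring it in one process never blocks another.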
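The point can be checked with a small experiment. The following is only a sketch, written for Python 3; `try_acquire`, `child` and `main` are names introduced here for illustration and are not part of the question. The child takes the threading semaphore and never releases it, and the parent then tries to take it as well.

```python
import threading
from multiprocessing import Process, Queue

# Module-level threading semaphore, as in the question.
semaphore = threading.BoundedSemaphore(1)


def try_acquire(sem):
    """Try to take `sem` without blocking; return True on success."""
    return sem.acquire(False)


def child(report):
    # The child takes its semaphore and never releases it.
    report.put(try_acquire(semaphore))


def main():
    report = Queue()
    p = Process(target=child, args=(report,))
    p.start()
    child_got_it = report.get()
    p.join()
    # If the semaphore were shared, this would fail because the child
    # still "holds" it. It succeeds: the parent has its own object.
    parent_got_it = try_acquire(semaphore)
    print("child acquired: {}, parent acquired: {}".format(
        child_got_it, parent_got_it))


if __name__ == "__main__":
    main()
```

Both acquisitions succeed, which would be impossible for a semaphore with a single slot if the two processes were really looking at the same object.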

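What I mean is that every process in processes runs as an independent program. You could try multiprocessing.Lock, but I don't think it makes sense to lock independent programs just to print debug output.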

Instead, I suggest you change your print statement:

print("ppid: {}\n"
      "pid:  {}\n"
      "result: {}\n"
      "#####################".format(ppid, pid, result))

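Note that you can place string literals next to each other and the Python interpreter concatenates them automatically. I also added \n to insert the line breaks, and switched to the print() function and format(), which are generally preferred over the print statement and % formatting.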

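With this approach, interleaved output is less likely but can still happen. If that is not good enough, use multiprocessing.Lock instead of the threading semaphore.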
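For reference, here is a minimal sketch of the multiprocessing.Lock variant, written for Python 3. It assumes the lock is created in the parent and passed to every child as an argument; `format_report` and `main` are hypothetical names introduced for this example, and the sleeps from the question are left out for brevity.

```python
import os
import sys
from multiprocessing import Lock, Process, Queue


def format_report(ppid, pid, result):
    """Build the whole report as one string (hypothetical helper)."""
    return ("ppid: {}\n"
            "pid:  {}\n"
            "result: {}\n"
            "#####################".format(ppid, pid, result))


def f(x, output, lock):
    ppid = os.getppid()
    pid = os.getpid()
    result = 10 * x
    # The lock is shared between processes, so only one child at a
    # time can be inside this block.
    with lock:
        print(format_report(ppid, pid, result))
        # Flush while still holding the lock; otherwise buffered
        # output could be written later, outside the critical section.
        sys.stdout.flush()
    output.put((ppid, pid, result))


def main():
    lock = Lock()      # multiprocessing.Lock, not threading.Lock
    output = Queue()
    processes = [Process(target=f, args=(x, output, lock))
                 for x in range(1, 11)]
    for p in processes:
        p.start()
    # Collect the results before joining the children.
    results = [output.get() for _ in processes]
    for p in processes:
        p.join()
    print(results)


if __name__ == "__main__":
    main()
```

The order of the blocks still depends on scheduling, but each four-line block should now come out in one piece.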
