Python SimpleHTTPServer started in a thread does not release its port



I have the following code:

import os
from ghost import Ghost
import urlparse, urllib
import SimpleHTTPServer
import SocketServer
import sys, traceback
from threading import Thread, Event
from time import sleep
please_die = Event() # this is my enemy
httpd = None
PORT = 8001
address = 'http://localhost:'+str(PORT)+'/'
search_dir = './category'
def main():
    """
      basic run script routine, 
      FIXME: is supposed to exit gracefully
    """
    thread = Thread(target = simpleServe)
    try:
      thread.start()
      run()
    except KeyboardInterrupt:
      print "Shutdown requested"
    except Exception:
      traceback.print_exc(file=sys.stdout)
    shutdown()
    sys.exit(0)
def shutdown():
  global httpd
  global please_die
  print "Shutting down"
  # A try - except for the shutdown routine
  try:
    please_die.wait() # how do you do? 
    httpd.shutdown() # Please! I want to run you multiple times. 
    print "Have you died?"
  except Exception:
    traceback.print_exc(file=sys.stdout)
def path2url(path):
  """
  constructs an url from a relative path / concatenates the global address
  variable with the path given
  """
  global address
  return urlparse.urljoin(address, urllib.pathname2url(path))
def simpleServe():
  global httpd, PORT
  please_die.set() # Attaching the event to this thread
  # Start the service
  Handler = SimpleHTTPServer.SimpleHTTPRequestHandler
  httpd = SocketServer.TCPServer(("", PORT), Handler)
  print "serving at port", PORT
  # And loop infinitely in the hope that I can stop you later
  httpd.serve_forever()
def run():
  global search_dir;
  ghost = Ghost() # the webkit facade
  with ghost.start() as session:
    session.set_viewport_size(2560, 1600) # "retina" size
    for directory, subdirectories, files in os.walk(search_dir):
        for file in files:
            path = os.path.join(directory, file)
            urlPath = path2url(path)
            process(session, urlPath);
def process(session, urlPath):
  page, resources = session.open(urlPath)
  assert page.http_status == 200
  # ... other asserts here 

if __name__ == '__main__':
  main()

The idea is to write a script that starts a "simple HTTP server", makes some requests against it, and then exits.

The first run works without any problem:

...
127.0.0.1 - - [31/Jul/2015 13:16:17] "GET /category/52003.html HTTP/1.1" 200 -
127.0.0.1 - - [31/Jul/2015 13:16:17] "GET /category/52003.html HTTP/1.1" 200 -
127.0.0.1 - - [31/Jul/2015 13:16:17] "GET /category/52003.html HTTP/1.1" 200 -
127.0.0.1 - - [31/Jul/2015 13:16:17] "GET /static/img/glyphicons-halflings.png HTTP/1.1" 200 -
Shutting down
Have you died?

On the second run it crashes, saying:

Address already in use

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "download-images.py", line 51, in simpleServe
    httpd = SocketServer.TCPServer(("", PORT), Handler)
  File "/usr/lib/python2.7/SocketServer.py", line 420, in __init__
    self.server_bind()
  File "/usr/lib/python2.7/SocketServer.py", line 434, in server_bind
    self.socket.bind(self.server_address)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 98] Address already in use

If I kill all Python processes, the script runs again, so I assume I am using threads incorrectly, but I can't find the mistake.

Update

I forgot to mention,

my OS is:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 15.04
Release:        15.04
Codename:       vivid

The Python I am using is:

$ python --version
Python 2.7.9

netstat -putelan | grep 8001 prints:

$ netstat -putelan | grep 8001
(Not all processes could be identified, non-owned process info
    tcp        0      0 127.0.0.1:34691         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:8001          127.0.0.1:34866         TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:34798         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:8001          127.0.0.1:34588         TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:34647         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:34915         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:34674         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:34451         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:8001          127.0.0.1:34930         TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:8001          127.0.0.1:34606         TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:34505         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:34717         127.0.0.1:8001          TIME_WAIT   0          0           -               
    tcp        0      0 127.0.0.1:8001          127.0.0.1:34670         0      0 127.0.0.1:8001          127.0.0.1:34626         
...

I can't post the whole listing (because of Stack Overflow's post length limit). The rest follows the same pattern: 34*** ports mixed with port 8001.

As @LFJ said, this is probably due to the allow_reuse_address attribute of TCPServer.

httpd = SocketServer.TCPServer(("", PORT), Handler, bind_and_activate=False)
httpd.allow_reuse_address = True
try:
    httpd.server_bind()
    httpd.server_activate()
except:
    httpd.server_close()
    raise

Equivalent code:

SocketServer.TCPServer.allow_reuse_address = True
httpd = SocketServer.TCPServer(("", PORT), Handler)
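
Setting the attribute on SocketServer.TCPServer itself changes it for every TCPServer in the process. A minimal sketch of a narrower alternative is to set it on a subclass. This sketch is written for Python 3, where the modules are named socketserver and http.server; the names ReusableTCPServer and make_server are hypothetical, and port 0 (let the OS pick a free port) is an assumption made only so the example can run anywhere.

```python
import http.server
import socket
import socketserver


class ReusableTCPServer(socketserver.TCPServer):
    # Hypothetical subclass name. allow_reuse_address is a class attribute
    # of TCPServer that server_bind() checks before calling bind().
    allow_reuse_address = True


def make_server(port=0):
    """Create a bound, listening server without starting its request loop.

    port=0 is an assumption for this sketch: it asks the OS for a free port.
    """
    handler = http.server.SimpleHTTPRequestHandler
    return ReusableTCPServer(("127.0.0.1", port), handler)


def reuse_flag_is_set(server):
    """Return True if SO_REUSEADDR is enabled on the server's socket."""
    value = server.socket.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
    return value != 0
```

Because the attribute is read inside server_bind(), it has to be in place before the constructor runs, which a class attribute on the subclass guarantees.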

Let me explain why.

When you enable TCPServer.allow_reuse_address, it sets an option on the socket:

class TCPServer:
    [...]
    def server_bind(self):
        if self.allow_reuse_address:
            self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        [...]

What is socket.SO_REUSEADDR?

This socket option tells the kernel that even if this port is busy (in
the TIME_WAIT state), go ahead and reuse it anyway.  If it is busy,
but with another state, you will still get an address already in use
error.  It is useful if your server has been shut down, and then
restarted right away while sockets are still active on its port.  You
should be aware that if any unexpected data comes in, it may confuse
your server, but while this is possible, it is not likely.       

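In effect, it allows the socket's bind address to be reused. If another process tries to bind to that address while no socket is listening on it, the bind is allowed to succeed.

The reason you need to enable it is that you are not closing your TCPServer properly. To close it properly, you must call the shutdown method, which stops the serve_forever loop running in the server thread, and then call server_close, which actually closes the socket.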

def shutdown():
    global httpd
    global please_die
    print "Shutting down"
    try:
        please_die.wait() # how do you do? 
        httpd.shutdown() # Stop the serve_forever
        httpd.server_close() # Close also the socket.
    except Exception:
        traceback.print_exc(file=sys.stdout)
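
As a rough end-to-end sketch of that order (shutdown, then server_close), here is one way the start and stop steps could be arranged. It is written for Python 3 (socketserver and http.server instead of SocketServer and SimpleHTTPServer). The helper names start_server, stop_server and run_once are hypothetical, as are QuietHandler and the use of port 0. The server is constructed in the calling thread, so httpd is never None when it is time to shut down and no Event is needed.

```python
import http.client
import http.server
import socketserver
import threading


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Hypothetical handler that only silences per-request logging."""

    def log_message(self, format, *args):
        pass


def start_server(port=0):
    """Bind in the calling thread, then serve from a background thread."""
    httpd = socketserver.TCPServer(("127.0.0.1", port), QuietHandler)
    thread = threading.Thread(target=httpd.serve_forever)
    thread.daemon = True
    thread.start()
    return httpd, thread


def stop_server(httpd, thread):
    httpd.shutdown()       # make serve_forever() return
    thread.join()          # wait for the serving thread to finish
    httpd.server_close()   # close the listening socket


def run_once(port=0):
    """Start the server, fetch "/" once, stop the server, return the status."""
    httpd, thread = start_server(port)
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", httpd.server_address[1], timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        response.read()    # drain the body before closing the connection
        conn.close()
        return response.status
    finally:
        stop_server(httpd, thread)
```

With a fixed port such as 8001, connections left in TIME_WAIT can still make an immediate second run fail at bind time, so this cleanup is usually combined with allow_reuse_address rather than used instead of it.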

I looked at the TCPServer source code:

def server_bind(self):
    """Called by constructor to bind the socket.
    May be overridden.
    """
    if self.allow_reuse_address:
        self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    self.socket.bind(self.server_address)
    self.server_address = self.socket.getsockname()

allow_reuse_address has to be set before bind is called. So try this:

SocketServer.TCPServer.allow_reuse_address=True
httpd = SocketServer.TCPServer(("", PORT), Handler)

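You are not cleaning up the server after it shuts down. That leaves socket resources behind, and the operating system does not release them immediately after the process ends.

You need to call httpd.server_close() in a finally block around the call to httpd.serve_forever(). This call tells the operating system to release the resources associated with that server instance.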

try:
    httpd.serve_forever()
finally:
    httpd.server_close()
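
If you are on Python 3.6 or later (an assumption; the question uses Python 2.7, where this is not available), socketserver servers can be used in a with statement, and leaving the block calls server_close() for you. A small sketch, with the hypothetical helper name serve_briefly and handle_request() with a short timeout standing in for serve_forever() so that the example terminates on its own:

```python
import http.server
import socketserver


def serve_briefly(port=0):
    """Open a server, wait briefly for at most one request, then close it.

    port=0 (let the OS choose a free port) is an assumption for this sketch.
    """
    handler = http.server.SimpleHTTPRequestHandler
    with socketserver.TCPServer(("127.0.0.1", port), handler) as httpd:
        bound_port = httpd.server_address[1]
        httpd.timeout = 0.1      # give up waiting after 0.1 seconds
        httpd.handle_request()   # returns after one request or the timeout
    # Leaving the with block has already called httpd.server_close().
    return httpd, bound_port
```

After the block exits, the listening socket is closed, which is the same effect as the try/finally form.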
