I have a script that reads SimpleDB domains and writes them to S3. The performance is mediocre at best. Is there any way to speed up the reads?
import boto
import datetime
from xml.dom.minidom import Document
from boto.s3.key import Key

awsa = "myawsaccesskey"
awss = "myawssecretkey"

conn = boto.connect_sdb(awsa, awss)
domains = conn.get_all_domains()
s3conn = boto.connect_s3(awsa, awss)
archbucket = s3conn.get_bucket("simpledbbu")

for d in domains:
    print d.name
    # Build one XML document per domain
    doc = Document()
    root = doc.createElement("items")
    doc.appendChild(root)
    countermax = 0  # never updated, so the progress line always reads "of 0"
    counter = 0
    for item in d:
        print "loading {0} of {1}".format(counter, countermax)
        counter += 1
        node = doc.createElement("item")
        node.setAttribute("itemName", item.name)
        for k, v in item.items():
            if not isinstance(v, basestring):
                # Multi-valued attribute: write one XML attribute per value
                i = 0
                for val in v:
                    node.setAttribute("{0}::{1}".format(k, i), val)
                    i += 1
            else:
                node.setAttribute(k, v)
        root.appendChild(node)
    # Upload the finished document to S3 under <YYYYMMDD>/<domain name>
    k = Key(archbucket)
    k.key = "{0}/{1}".format(datetime.date.today().strftime("%Y%m%d"), d.name)
    x = doc.toprettyxml(indent=" ")
    k.set_contents_from_string(x)
Profile:
 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
   2035    1.312    0.001    1.312    0.001  {built-in method read}
      2    0.445    0.223    0.445    0.223  {built-in method do_handshake}
     17    0.355    0.021    0.355    0.021  {built-in method write}
      2    0.321    0.161    0.321    0.161  {_ssl.sslwrap}
      2    0.292    0.146    0.292    0.146  {_socket.getaddrinfo}
      2    0.177    0.089    0.177    0.089  {method 'connect' of '_socket.socket' objects}
     14    0.012    0.001    0.077    0.005  {built-in method Parse}
      2     0.01    0.005    0.047    0.023  __init__.py:24(<module>)
   3369     0.01        0    0.012        0  item.py:71(endElement)
      1    0.009    0.009    3.185    3.185  backupSimpleDb_0.0.py:1(<module>)
   3508    0.009        0     0.03        0  expatreader.py:300(start_element)
   4145    0.008        0    0.011        0  StringIO.py:208(write)
   3508    0.007        0    0.019        0  handler.py:31(startElement)
   3508    0.007        0     0.02        0  handler.py:37(endElement)
      1    0.006    0.006    0.006    0.006  {nt.urandom}
   3114    0.006        0    0.009        0  item.py:58(startElement)
   1208    0.005        0    0.005        0  minidom.py:343(__init__)
   1208    0.005        0    0.025        0  minidom.py:686(setAttribute)
  258/3    0.005        0    0.024    0.008  minidom.py:794(writexml)
      1    0.004    0.004    0.007    0.007  exception.py:26(<module>)
      1    0.004    0.004    0.007    0.007  expatreader.py:4(<module>)
      1    0.004    0.004    0.007    0.007  urllib.py:23(<module>)
      1    0.004    0.004    0.006    0.006  utils.py:5(<module>)
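Run 5 or 10 threads in parallel, with each thread copying one domain.
This is actually quite simple if you put the domains to copy on a queue and have the threads wait for elements from that queue.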
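To make the queue idea concrete, here is a minimal sketch using only the standard library. The names `run_in_threads` and `backup_one` are made up for illustration; `backup_one` stands in for whatever copies a single domain (for example, the body of the `for d in domains:` loop from the question). Since the full list of domains is known up front, this version fills the queue first and lets each worker exit once the queue is empty.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2, which the script in the question targets


def run_in_threads(names, backup_one, num_threads=5):
    """Call backup_one(name) for every name using num_threads worker threads.

    Returns a list of (name, exception) pairs for the calls that failed.
    """
    work = queue.Queue()
    for name in names:
        work.put(name)

    failures = []
    lock = threading.Lock()   # guards the failures list

    def worker():
        while True:
            try:
                name = work.get_nowait()
            except queue.Empty:
                return        # queue drained, this worker is done
            try:
                backup_one(name)
            except Exception as exc:
                # Record the failure and keep going with the next domain
                with lock:
                    failures.append((name, exc))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures
```

In the question's script you would pass the domain names (or the domain objects) as `names`. I would probably open a separate SimpleDB and S3 connection inside each thread rather than assume the shared `conn` and `s3conn` are safe to use concurrently, but that is a precaution on my part, not something I have verified.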
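http://www.ibm.com/developerworks/aix/library/au-threadingpython/

I agree with the accepted answer, but if you just want to back up your SimpleDB domains to S3, I have had decent performance using boto's to_xml() function. As a bonus, you don't need to roll your own SimpleDB parser.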
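A rough sketch of the to_xml() approach might look like the following. The helper name `backup_domain_xml` is hypothetical, and I am assuming (from my reading of boto 2) that `Domain.to_xml()` returns a file-like object holding the XML dump, and that `Bucket.new_key()` and `Key.set_contents_from_file()` behave as they do in boto 2. Check this against the boto version you have installed.

```python
import datetime


def backup_domain_xml(domain, bucket):
    """Dump one SimpleDB domain to S3 as XML and return the S3 key name.

    domain: a boto SimpleDB Domain (assumed to provide .name and .to_xml())
    bucket: a boto S3 Bucket (assumed to provide .new_key())
    """
    # Assumption: to_xml() writes the whole domain to a file-like object
    xml_file = domain.to_xml()
    xml_file.seek(0)   # make sure the upload starts at the beginning

    # Same key layout as the question: <YYYYMMDD>/<domain name>
    key_name = "{0}/{1}".format(
        datetime.date.today().strftime("%Y%m%d"), domain.name)
    key = bucket.new_key(key_name)
    key.set_contents_from_file(xml_file)
    return key.name
```

With the question's variables this would be called as `backup_domain_xml(d, archbucket)` for each domain `d`, and it can be combined with threads by calling it from each worker.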