Inserting text data into SQLite with Python raises a TypeError during string formatting. Why?



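I'm trying to trawl newsgroups to test some text-based grouping algorithms, fetching batches of newsgroup headers and sticking them into an SQLite database. The database is about as simple as it gets, with text columns for all the data, and the header data fetched by Python's nntp library always gives 8 values per header. All but one of them are strings, and I convert the only non-string into a string before inserting the data into the database. Despite this, Python falls over with the rather useless "TypeError: not all arguments converted during string formatting" error, which is only a slight step up from saying "error: good luck, you're on your own".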

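Does someone who understands how string formatting from strings to strings can go wrong know, better than I do, what is going wrong in the following code?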

import nntplib, sqlite3
# newsgroup settings (modify so that this works for you =)
server = 'news.yournewsgroup.com'
port = 119
username = 'your name here'
password = 'your password here'
# set up the newsgroup and sqlite connections
connection = nntplib.NNTP(server, port, username, password)
newsgroup = "comp.graphics.algorithms"
connection.group(newsgroup)
database = sqlite3.connect(newsgroup + ".db")
# create a table definition if it doesn't exist yet
try:
  # SQLite doesn't actually have data types. Everything is stored as plain text.
  # And so is newsgroup data. Bonus!
  database.execute("""CREATE TABLE headers (articleNumber text, subject text,
                                            poster text, date text, id text,
                                            references text, size text,
                                            lines text)""")
except:
  # table definition already exists. Not actually an error.
  pass
# Get the group meta-data, and set up iterator values for running
# through the header list.
resp, count, first, last, name = connection.group(newsgroup)
total = int(last) - int(first)
step = 10000
steps = total / step;
articleRange = first + '-' + str(int(first)+step)
# grab a batch of headers
print "[FETCHING HEADERS]"
resp, list = connection.xover(first, str(int(first)+step))
print "done."
# process the fetched headers
print "[PROCESSING HEADERS]"
for entry in list:
  # Unpack immutable tuple, mutate (because the references list
  # should be a string), then repack.
  articleNumber, subject, poster, date, id, references, size, lines = entry
  argumentList = (articleNumber, subject, poster, date, id, (",".join(references)), size, lines)
  try:
    # try to chronicle the header information. THIS WILL GO WRONG AT SOME POINT.
    database.execute("""INSERT INTO headers (articleNumber, subject, poster,
                                             date, id, reference, size, lines)
                                    VALUES ('?', '?', '?',
                                            '?', '?','?', '?', '?')"""
                                    % argumentList)
  except TypeError as err:
    # And here is an irking point with Python in general. Something went
    # wrong, yet all it tells us is "not all arguments converted during
    # string formatting". Despite that error being generated at a point
    # where the code knows WHICH argument was the problem.
    print err
    print type(argumentList[0]), argumentList[0]
    print type(argumentList[1]), argumentList[1]
    print type(argumentList[2]), argumentList[2]
    print type(argumentList[3]), argumentList[3]
    print type(argumentList[4]), argumentList[4]
    print type(argumentList[5]), argumentList[5]
    print type(argumentList[6]), argumentList[6]
    print type(argumentList[7]), argumentList[7]
    # A quick print set shows us that all arguments are already of type
    # "str", and none of them are empty... so it would take quite a bit
    # of work to make them fail at being legal strings... Wat?
    exit(1)
print "done."
# cleanup
database.close()
connection.quit()

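What that error is telling you is that you supplied n values to the string-formatting operator (%), but the format string expected fewer than n values. Specifically, this string: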

"""INSERT INTO headers (articleNumber, subject, poster,
                        date, id, reference, size, lines)
          VALUES ('?', '?', '?',
                  '?', '?','?', '?', '?')"""

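does not take any values for %-style string formatting. There is no %d in it, no %s, nothing. Instead, the ? placeholders are for the database API's parameter substitution. You don't invoke that with the % operator (it is not needed here at all). Instead, pass the sequence of values as the second argument to the execute call. You also need to remove the quotes around the placeholders to indicate that they are meant to be placeholders, rather than string literals that happen to contain a question mark character. In summary: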

database.execute("""
    INSERT INTO headers (articleNumber, subject, poster,
                         date, id, reference, size, lines)
    VALUES (?, ?, ?, ?, ?, ?, ?, ?)""", # note: comma, not %
     argumentList)
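
To see both halves of this explanation in isolation, here is a minimal sketch written for Python 3 (the question's code is Python 2) against an in-memory database. The table layout follows the question, except that the sixth column is named `refs`, an assumption made for this sketch because `REFERENCES` is an SQL keyword. The sample row is invented.

```python
import sqlite3

sql = "INSERT INTO headers VALUES (?, ?, ?, ?, ?, ?, ?, ?)"
# Invented sample header; all eight values are already strings.
row = ("1", "Re: convex hulls", "someone@example.com", "01 Jan 2010",
       "<id@example.com>", "<a@example.com>,<b@example.com>", "1024", "12")

# The % operator finds no %s or %d in the string, so it cannot consume
# any of the eight values and raises TypeError.
try:
    sql % row
except TypeError as err:
    format_error = str(err)

# Passing the values as the second argument lets sqlite3 bind them
# to the ? placeholders instead.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE headers (articleNumber text, subject text,
                                    poster text, date text, id text,
                                    refs text, size text, lines text)""")
db.execute(sql, row)
stored = db.execute("SELECT subject, refs FROM headers").fetchone()
db.close()
```

The types of the values never enter into the first failure: the format string simply has no slots for them.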

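You don't want to build the SQL statement with string formatting: it is unsafe and error-prone.

You need to use the following pattern: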

argumentList = [1, 2, 3, 4, 5, 6, 7, 8] # or whatever
insert_statement = """INSERT INTO headers (articleNumber, subject, poster,
                                          date, id, reference, size, lines)
                                 VALUES (?, ?, ?,
                                         ?, ?, ?, ?, ?)"""
cursor = database.cursor() # database is the sqlite3 connection from the question
cursor.execute(insert_statement, argumentList)
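
One way to see why the placeholder form is the safer choice: a header value that contains a single quote breaks SQL assembled by string formatting, while a bound parameter passes through untouched. The sketch below assumes Python 3, an in-memory database, and a made-up one-column table `subjects`; none of these names come from the question.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE subjects (subject text)")

subject = "What's the fastest hull algorithm?"  # note the apostrophe

# Pasting the value into the SQL text: the apostrophe ends the string
# literal early, and SQLite rejects the statement as malformed.
try:
    db.execute("INSERT INTO subjects (subject) VALUES ('%s')" % subject)
    formatted_ok = True
except sqlite3.OperationalError:
    formatted_ok = False

# Binding the value: it is passed to SQLite separately from the SQL text,
# so no quoting or escaping is needed.
db.execute("INSERT INTO subjects (subject) VALUES (?)", (subject,))
stored = db.execute("SELECT subject FROM subjects").fetchone()[0]
db.close()
```

Newsgroup subjects routinely contain apostrophes, so the formatted variant would likely fail partway through a batch even once the format string is corrected.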
