Python 2.7: Using a regular expression to find integer strings in a file and add them up: invalid literal for int() with base 10

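I'm trying to read a .txt file (the data is ASCII textbook-style material) that has numbers scattered throughout. I want to extract those numbers into a list using regex, then add all the values as integers into a running total and print it. The problem is that when I run this code: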

import re
hand = open('regexTextData.txt')
numbers = list()
for line in hand:
        if len(line) == 0: continue
        extractedNumbers = re.findall('[0-9+]', line)
        numbers = extractedNumbers + numbers
total = 0
for i in range(len(numbers)):
        value = int(numbers[i])
        total = total + value
print(total)

I get an error:

Traceback (most recent call last):
  File "sum_numbers_in_text_regex.py", line 13, in <module>
    value = int(numbers[i]) 
ValueError: invalid literal for int() with base 10: '+'

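What exactly is going wrong here? I've looked at other solutions, but none of them helped. If I missed a page that already covers this, I'd like to know.

Thanks in advance for reading.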

for n in range(len(numbers)):

not

for n in len(numbers):

Final edit: the finished program

import re
hand = open('regexTextData.txt')
numbers = [] # no need of writing out list(), just use []
for line in hand:
        if len(line) == 0: continue
        extractedNumbers = re.findall('[0-9]+', line) # '+' goes after the brackets (one or more digits); inside them it would match a literal '+'.
        numbers = extractedNumbers + numbers
total = 0
for i in range(len(numbers)):
        value = int(numbers[i]) # Now all your values in numbers should be in numerical string form.
        total = total + value
print(total)

Just change the regex pattern to '([0-9]+)' and it will match every run of digits as a string. That fixed the program.
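For anyone who wants to try the fix without the data file, here is a minimal sketch. The list `lines` is a made-up stand-in for the contents of regexTextData.txt, not the real data. It also shows that the `len(line) == 0` check is not needed: lines read from a file keep their trailing newline, so they are never empty, and `re.findall` simply returns an empty list for a line with no digits.

```python
import re

# Hypothetical stand-in for the lines of regexTextData.txt
lines = [
    "Chapter 3 starts on page 41\n",
    "\n",                          # blank line: findall returns []
    "see 2 + 2 and section 10\n",
]

numbers = []
for line in lines:
    # One or more consecutive digits; a literal '+' in the text is ignored
    numbers.extend(re.findall('[0-9]+', line))

total = 0
for s in numbers:
    total += int(s)

print(numbers)
print(total)
```

With a real file, the same loop body should work unchanged inside `with open('regexTextData.txt') as hand:`, which also closes the file when the block ends.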

Your main problem is the regex. Suppose we have some sample text: line = "0 and 1 and 2 and 2 + and yes mate"

re.findall('[0-9+]', line) # Outputs: ['0', '1', '2', '2', '+']. The '+' is matched because the plus symbol is inside the character class.

Solution (move the '+' outside the brackets):

re.findall('([0-9]+)', line) # Outputs: ['0', '1', '2', '2'] # No more '+'.
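To see the two patterns side by side, here is a small sketch using the sample string from above plus one extra string with a multi-digit number (the second string and the variable names are only for illustration):

```python
import re

line = "0 and 1 and 2 and 2 + and yes mate"

# '+' inside the brackets is just another character in the class,
# so this matches any single digit OR a plus sign.
single_chars = re.findall('[0-9+]', line)

# '+' after the brackets is a quantifier: one or more digits in a row.
digit_runs = re.findall('[0-9]+', line)

print(single_chars)
print(digit_runs)

# The quantifier also keeps multi-digit numbers in one piece.
split_up = re.findall('[0-9+]', "42 + 7")
kept_whole = re.findall('[0-9]+', "42 + 7")
print(split_up)
print(kept_whole)
```

Without the quantifier, 42 would be summed as 4 + 2, so the original pattern would give a wrong total even for text that contains no plus signs.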

Bonus: if you're interested, you can also replace this code:

total = 0
for i in range(len(numbers)):
        value = int(numbers[i]) # Now all your values in numbers should be in numerical string form.
        total = total + value

with this simplified code:

total = sum(map(lambda x: int(x), numbers))

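lambda is an anonymous function that takes x as input and returns int(x). map is a function that applies a function (our lambda) to every element of numbers. Finally, sum simply adds up the numbers in an iterable (after applying map, which returns an iterable, we have only integers).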
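As a quick check that these forms agree, here is a sketch with a made-up list of matches (`numbers` here is hypothetical, not taken from the real file). Since `lambda x: int(x)` only forwards its argument to `int`, passing `int` itself to `map` should behave the same way.

```python
numbers = ['12', '7', '30']  # hypothetical regex matches

# Explicit loop, as in the question
total_loop = 0
for s in numbers:
    total_loop += int(s)

# map with a lambda, as suggested above
total_lambda = sum(map(lambda x: int(x), numbers))

# map with int passed directly
total_map = sum(map(int, numbers))

# Generator expression, another common spelling
total_gen = sum(int(s) for s in numbers)

print(total_loop, total_lambda, total_map, total_gen)
```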

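I like the solution you posted, and it may well be more efficient, but the point of this exercise is to learn regex, so I need to use it. I do appreciate the alternative solution, though.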

You are trying to iterate over an integer. Instead, iterate over a range:

for n in range(len(numbers)):
    value = int(numbers[n])
    total = total + value  # named total rather than sum, so the built-in sum() is not shadowed

Also note the change from numbers[i] to numbers[n].
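A short sketch of why the `range` call matters, using a made-up `numbers` list: `len(numbers)` is a single integer, and a `for` loop cannot iterate over an integer, so Python raises a `TypeError`. Looping over the list directly avoids the index altogether.

```python
numbers = ['3', '4', '5']  # hypothetical regex matches

# len(numbers) is the int 3, which is not iterable
try:
    for n in len(numbers):
        pass
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Iterating over indices with range works
total = 0
for n in range(len(numbers)):
    total += int(numbers[n])

# Iterating over the items themselves needs no index
total_direct = 0
for item in numbers:
    total_direct += int(item)

print(total, total_direct)
```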
