Calculating the percentage of punctuation in a string with Python

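I have been trying to calculate the percentage of punctuation in a sentence. For some reason, my function works when the text is double-spaced, but otherwise it counts every character, whitespace included. For example, take the text DEACTIVATE: OK. Its full length is 14, and after subtracting the space the length is 13, so the percentage should be 1/13 ≈ 7.69%. However, my function gives me 7.14%, which is basically 1/14 ≈ 7.14%.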
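The arithmetic above can be checked directly. This is a minimal sketch that assumes only the space character counts as whitespace; the variable names are illustrative and do not come from the original code.

```python
import string

text = "DEACTIVATE: OK"

# Count characters that appear in string.punctuation (only ":" here).
punct = sum(1 for ch in text if ch in string.punctuation)

total_len = len(text)                         # 14, space included
non_space_len = total_len - text.count(" ")   # 13, space excluded

pct_all = round(100 * punct / total_len, 2)            # 7.14
pct_non_space = round(100 * punct / non_space_len, 2)  # 7.69
print(pct_all, pct_non_space)
```

The two results differ only in the denominator, which is exactly the discrepancy described in the question.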

On the other hand, when there is only a single whitespace, my function gives me an error:

"ZeroDivisionError: division by zero".

Here is my code for reference, along with some simple sample text:

text= "Centre to position, remaining shift is still larger than maximum (retry nbr=1, centring_stroke.r=2.7662e-05, max centring stroke.r=2.5e-05)"
text2= "DEACTIVATE: KU-1421"

import string

def count_punct(text):
    count = sum([1 for char in text if char in string.punctuation])
    return round(count/(len(text) - text.count("  ")), 3)*100
df_sub['punct%'] = df_sub['Err_Text2'].apply(lambda x: count_punct(x))
df_sub.head(20)

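Make the small changes below and your count_punct function will be up and running. The code broke because it checks for ___ instead of _, that is, three consecutive spaces rather than a single space. That run never occurs in your text, so the count is always 0 and the denominator is always the full length of the string.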
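To see why the denominator never changed, it may help to compare the two counts on the sample string. This sketch assumes the original pattern was a run of three spaces, as described above; the names `multi` and `single` are illustrative.

```python
text = "DEACTIVATE: OK"

multi = text.count("   ")   # run of three spaces: never occurs in this text
single = text.count(" ")    # single space: occurs once

print(multi, single)                          # 0 1
print(len(text) - multi, len(text) - single)  # 14 13
```

With the multi-space pattern nothing is subtracted, so the function divides by 14 instead of 13.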

import string

def count_punct(text):
    if text.strip() == "":  # Handle input that is empty or all whitespace
        return 0
    # Count the characters that are punctuation
    count = sum(1 for char in text if char in string.punctuation)
    spaces = text.count(" ")  # The original bug was here: count single spaces, not runs of three
    total_chars = len(text) - spaces
    return round(count / total_chars, 3) * 100

text = "DEACTIVATE: OK"
print(count_punct(text))

Output:

7.7

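As for the ZeroDivisionError: it is a logic error that occurs when total_chars is 0, which happens when the length of the string and the number of spaces are equal, so their difference is 0.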
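The failure can be reproduced with a string made only of spaces. The helper below is hypothetical: it uses the same single-space formula but deliberately leaves out any guard, purely to show where the exception comes from.

```python
import string

def count_punct_unguarded(text):
    # Hypothetical helper: single-space formula with no all-space check.
    count = sum(1 for char in text if char in string.punctuation)
    total_chars = len(text) - text.count(" ")
    return round(count / total_chars, 3) * 100

try:
    count_punct_unguarded(" ")   # one space: total_chars = 1 - 1 = 0
    raised = False
except ZeroDivisionError:
    raised = True

print(raised)  # True
```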

To fix this, you can simply add an if statement (already added above):

if text.strip() == "":
    print(0)
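If tabs or newlines may also appear in the text, one possible variant is to exclude every whitespace character rather than only " ". This is a sketch under that assumption, not the method from the answer; the name `punct_percent` is hypothetical.

```python
import string

def punct_percent(text):
    # Hypothetical variant: excludes every whitespace character, not just " ".
    non_space = [ch for ch in text if not ch.isspace()]
    if not non_space:  # empty or whitespace-only input
        return 0.0
    count = sum(1 for ch in non_space if ch in string.punctuation)
    return round(100 * count / len(non_space), 1)

for sample in ["DEACTIVATE: OK", "DEACTIVATE: KU-1421", " ", ""]:
    print(repr(sample), punct_percent(sample))
```

Filtering first means the guard and the denominator use the same definition of whitespace, so the division can never see a zero.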
