Counting the number of words between punctuation characters in Python



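I want to use Python to count the number of words that occur between certain punctuation characters in a block of text input. For example, such an analysis of everything written up to this point might be represented as: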

[23,2,14]

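...because the first sentence, which has no punctuation other than the period at the end, has 23 words; the "For example" phrase that follows has two; and the rest, which ends with a colon, has 14.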

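This probably wouldn't be too hard to do, but (in keeping with the "don't reinvent the wheel" philosophy that seems particularly Pythonic) is there already something out there that is especially well suited to the task?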

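import re

punctuation_i_care_about = "?.!"
# re.escape keeps characters such as ']' or '^' literal inside the character class
split_by_punc = re.split("[%s]" % re.escape(punctuation_i_care_about), some_big_block_of_text)
words_by_punc = [len(x.split()) for x in split_by_punc]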
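Two details of the snippet above are worth noting: with only "?.!" as delimiters it does not split at the comma or the colon from the question's example, and text that ends with a delimiter produces a final empty piece that is counted as 0. A minimal sketch that addresses both, assuming you want empty pieces dropped, could look like this (the function name `count_words_between` and the default delimiter set are illustrative, not from the original answer):

```python
import re

def count_words_between(text, delimiters=".,:;?!"):
    """Return the word count of each non-empty segment between delimiters."""
    # Build a character class from the delimiters; re.escape keeps them literal.
    pattern = "[" + re.escape(delimiters) + "]"
    segments = re.split(pattern, text)
    counts = [len(segment.split()) for segment in segments]
    # Drop zero-length segments, e.g. the empty piece after a trailing period.
    return [n for n in counts if n > 0]

sample = "First sentence here. For example, a short list: one two three."
print(count_words_between(sample))  # [3, 2, 3, 3]
```

Whether zero counts should be dropped or kept depends on the analysis; keeping them preserves a one-to-one mapping between delimiters and segments.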

Joran beat me to it, but I'll add my approach:

from string import punctuation
import re

s = 'I want to use Python to count the numbers of words that occur between certain punctuation characters in a block of text input. For example, such an analysis of everything written up to this point might be represented as'
# Split on any punctuation character, then split each piece into words
gen = (x.split() for x in re.split('[' + re.escape(punctuation) + ']', s))
list(map(len, gen))
Out[32]: [23, 2, 14]

(I love map.)
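One caveat with splitting on everything in `string.punctuation`: that set includes the apostrophe and the hyphen, so a contraction or a hyphenated word is treated as a segment boundary. A small sketch, under the assumption that those two characters should count as part of a word (the sample text is made up for illustration):

```python
import re
from string import punctuation

# Assumption: apostrophes and hyphens belong to words, not to the separators.
delimiters = "".join(ch for ch in punctuation if ch not in "'-")
pattern = re.compile("[" + re.escape(delimiters) + "]")

text = "I don't want well-known words split. Not here, anyway"
counts = [len(piece.split()) for piece in pattern.split(text)]
print(counts)  # [6, 2, 1]
```

With the full `punctuation` set, the same text would instead be cut at the apostrophe and the hyphen, giving more and shorter segments.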
