Split by comma, but conditionally (ignore commas that separate single words)



With the following code (a bit messy, I admit) I split a string on commas, with one condition: it should not split where the commas separate single words. For example, it does not split "Yup, there's a reason why you want to hit the sack just minutes after climax", but it splits "The increase in heart rate, which you get from masturbating, is directly beneficial to the circulation, and can reduce the likelihood of a heart attack" into ['The increase in heart rate', 'which you get from masturbating', 'is directly beneficial to the circulation', 'and can reduce the likelihood of a heart attack']

The problem is that the code fails at its purpose when it meets a string like this: "When men ejaculate, it releases a slew of chemicals including oxytocin, vasopressin, and prolactin, all of which naturally help you hit the pillow." I don't want a split after oxytocin, only after prolactin. I need a regex that does this.

import os
import textwrap
import re
import io
from textblob import TextBlob

string = str(input_string)  # input_string holds the sentence to split
listy = [x.strip() for x in string.split(',')]
listy = [x.replace('\n', '') for x in listy]  # drop embedded newlines
# Remove periods that are not part of a number such as 3.5
listy = [re.sub(r'(?<!\d)\.(?!\d)', '', x) for x in listy]
listy = list(filter(None, listy))  # Remove any empty strings (list() so it can be indexed on Python 3)
newstring = []
for segment in listy:
    wc = TextBlob(segment).word_counts
    if listy[len(listy)-1] != segment:
        if len(wc) > 3:  # len(segment.split(' ')) > 7:
            newstring.append(segment+"&&")  # long segment: mark a split point
        else:
            newstring.append(segment+",")   # short segment: keep the comma
    else:
        newstring.append(segment)
sep = [x.strip() for x in (' '.join(newstring)).split('&&')]

Consider the following:

import re

mystr="When men ejaculate, it releases a slew of chemicals including oxytocin, vasopressin, and prolactin, all of which naturally help you hit the pillow."
rExp=r",(?!\s+(?:and\s+)?\w+,)"
mylst=re.compile(rExp).split(mystr)
print(mylst)

This should give the following output:

['When men ejaculate', ' it releases a slew of chemicals including oxytocin, vasopressin, and prolactin', ' all of which naturally help you hit the pillow.']

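Let's look at how the string is split...

,(?!\s+\w+,)

This matches every comma that is not followed by whitespace and a single word that itself ends in a comma. The (?! ... ) part is a negative lookahead, and \s+\w+, inside it describes the whitespace, the word, and the trailing comma.
In the case of "vasopressin, and" the pattern above fails, because "and" is not followed by a comma. So make an optional and\s+ part of the lookahead:

,(?!\s+(?:and\s+)?\w+,)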
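
To make the difference between the two patterns concrete, here is a minimal sketch that runs both on the same input. The sample sentence and the variable names are made up for illustration and do not come from the question.

```python
import re

# Hypothetical sample sentence, not taken from the question.
sample = "We bought apples, pears, and plums, then went home."

# First attempt: skip a comma only when the next word ends in a comma.
simple = re.compile(r",(?!\s+\w+,)")
# Extended: also allow an optional "and" before that next word.
with_and = re.compile(r",(?!\s+(?:and\s+)?\w+,)")

simple_parts = simple.split(sample)
and_parts = with_and.split(sample)

print(simple_parts)  # ['We bought apples, pears', ' and plums', ' then went home.']
print(and_parts)     # ['We bought apples, pears, and plums', ' then went home.']
```

The first pattern still splits before "and plums", because the word right after that comma is "and", which is followed by a space rather than a comma.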

I would probably use the following, though, so that lists ending in "or" are handled as well:

,(?!\s+(?:(?:and|or)\s+)?\w+,)
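
As a rough check of what the and|or alternation changes, the sketch below compares the "and"-only pattern with the and|or pattern on a list that ends in "or". The sentence is a made-up example, and the names are assumptions chosen for illustration.

```python
import re

# Hypothetical sample sentence with an "or" list.
sample = "Bring tea, coffee, or juice, whichever you like."

and_only = re.compile(r",(?!\s+(?:and\s+)?\w+,)")
and_or = re.compile(r",(?!\s+(?:(?:and|or)\s+)?\w+,)")

and_only_parts = and_only.split(sample)
and_or_parts = and_or.split(sample)

print(and_only_parts)  # ['Bring tea, coffee', ' or juice', ' whichever you like.']
print(and_or_parts)    # ['Bring tea, coffee, or juice', ' whichever you like.']
```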


Essentially, consider replacing your line

listy= [x.strip() for x in string.split(',')]

with

listy= [x.strip() for x in re.split(r",(?!\s+(?:and\s+)?\w+,)",string)]
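
One possible way to package that replacement line is a small helper that compiles the pattern once, strips each piece, and drops empty ones. This is only a sketch: the names `CLAUSE_COMMA` and `split_clauses` and the sample sentences are hypothetical, not part of the original code.

```python
import re

# Pattern from the answer: a comma NOT followed by a single word
# (optionally preceded by "and") that itself ends in a comma.
CLAUSE_COMMA = re.compile(r",(?!\s+(?:and\s+)?\w+,)")

def split_clauses(text):
    """Split text on clause-level commas and return the stripped, non-empty pieces."""
    parts = (part.strip() for part in CLAUSE_COMMA.split(text))
    return [part for part in parts if part]

# Hypothetical sample sentences.
clauses = split_clauses("We bought apples, pears, and plums, then went home.")
short = split_clauses("Yup, there's a reason.")

print(clauses)  # ['We bought apples, pears, and plums', 'then went home.']
print(short)    # ['Yup', "there's a reason."]
```

Note that the lookahead only recognizes single-word list items. A short lead-in such as "Yup," is still split off by the regex alone, so the word-count check from the original loop would still be needed to keep such segments attached.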
