Create a list from a file, then check and print matching tokens in the list



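I am trying to implement a task where I have a file containing a sample conversation. Separately, I have some action keywords that need to be matched against the start of a sentence, and the whole line should be printed.

File: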

Hey Salam Daniyal, can you hear me alright? hey hey Walikumassalam Joe, how are you? Yes for sure I can hear you. How is it going brother? All good bro all good, so I wanted to discuss something with you, yeah sure shoot. Okay so, I want a simple 4 page website for my business on WordPress. I need home page about us services and contact us. Can you make it? Yeah no problem send me the content and all the images I will start working on it and will show you some samples. Great, one more thing. Can we use "one page multipurpose" theme from themeForest? Yeah sure. Alright great! Sending you the images and content.

To achieve this, I wrote:

import re
textfile = open("FILEPATH", 'r')
filetext = textfile.read()
textfile.close()
# Capture runs of words that end right before one of , . ! ?
a = [i[0].strip() for i in re.findall(r"((\W\w+){1,}(?=(,|\.|!|\?)))", filetext)]
print(a)

Output:

['Salam Daniyal', 'can you hear me alright', 'hey hey Walikumassalam Joe', 'how are you', 'Yes for sure I can hear you', 'How is it going brother', 'All good bro all good', 'so I wanted to discuss something with you', 'yeah sure shoot', 'Okay so', 'I want a simple 4 page website for my business on WordPress', 'I need home page about us services and contact us', 'Can you make it', 'Yeah no problem send me the content and all the images I will start working on it and will show you some samples', 'Great', 'one more thing', 'theme from themeForest', 'Yeah sure', 'Alright great', 'Sending you the images and content']

Another approach is to use nltk, which gives an accurate list:

from nltk.tokenize import sent_tokenize

textfile = open("FILEPATH", 'r')
filetext = textfile.read()
textfile.close()
print(sent_tokenize(filetext))

Output:

['Hey Salam Daniyal, can you hear me alright?', 'hey hey Walikumassalam Joe, how are you?', 'Yes for sure I can hear you.', 'How is it going brother?', 'All good bro all good, so I wanted to discuss something with you, yeah sure shoot.', 'Okay so, I want a simple 4 page website for my business on WordPress.', 'I need home page about us services and contact us.', 'Can you make it?', 'Yeah no problem send me the content and all the images I will start working on it and will show you some samples.', 'Great, one more thing.', 'Can we use "one page multipurpose" theme from themeForest?', 'Yeah sure.', 'Alright great!', 'Sending you the images and content.']

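This creates a list of complete sentences, whereas the regex does not. With the regex result, however, I can print a list item by its index, whereas with nltk I cannot.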

In regex:
print(a[11])  # prints the list item at index 11
Output:
I need home page about us services and contact us
In NLTK:
print(sent_tokenize(filetext[11]))
Output:
['a']
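
Note that the NLTK call above indexes the string before tokenizing it, so the tokenizer only receives the single character filetext[11]. Indexing the list returned by the tokenizer should behave like the regex case. A minimal sketch of the difference, where split_sentences is a hypothetical regex-based stand-in for sent_tokenize (an assumption, used only so the example runs without NLTK data) and the text is the first three sentences of the file:

```python
import re

def split_sentences(text):
    # Hypothetical stand-in for nltk's sent_tokenize:
    # split on whitespace that follows ., ! or ?
    return re.split(r"(?<=[.!?])\s+", text.strip())

filetext = (
    "Hey Salam Daniyal, can you hear me alright? "
    "hey hey Walikumassalam Joe, how are you? "
    "Yes for sure I can hear you."
)

sentences = split_sentences(filetext)

single_char = filetext[11]  # indexes the string: one character
third = sentences[2]        # indexes the list: one whole sentence

print(repr(single_char))
print(third)
```

With NLTK itself the same idea would be written as sent_tokenize(filetext)[6], which, going by the NLTK output listed earlier, is the "I need home page ..." sentence.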

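Which is the better option for creating the list, and what approach should I take to match the action keywords? I have a list of action keywords that need to be matched against the list above, and the results printed: action_keywords = "I need", "Can we", "I want a", "We need"

So, based on the current action keywords, I want my code to print these sentences from the list, because they begin with my action keywords: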

'I need home page about us services and contact us'
'I want a simple 4 page website for my business on WordPress.'
'Can we use "one page multipurpose" theme from themeForest?'

If you have data like the following:

a = ['Hey Salam Daniyal, can you hear me alright?', 'hey hey Walikumassalam Joe, how are you?', 'Yes for sure I can hear you.', 'How is it going brother?', 'All good bro all good, so I wanted to discuss something with you, yeah sure shoot.', 'Okay so, I want a simple 4 page website for my business on WordPress.', 'I need home page about us services and contact us.', 'Can you make it?', 'Yeah no problem send me the content and all the images I will start working on it and will show you some samples.', 'Great, one more thing.', 'Can we use "one page multipurpose" theme from themeForest?', 'Yeah sure.', 'Alright great!', 'Sending you the images and content.']

and action_keywords as follows:

action_keywords = ["I need" , "Can we", "I want a", "We need"]

you can use Python's built-in filter function to filter a, as shown below:

def extract(x):
    # Keep a sentence if any action keyword occurs anywhere in it
    for e in action_keywords:
        if e in x:
            return True
    return False

ans = filter(extract, a)
print(list(ans))

Output:

['Okay so, I want a simple 4 page website for my business on WordPress.', 'I need home page about us services and contact us.', 'Can we use "one page multipurpose" theme from themeForest?']

Note that you can modify the logic in extract depending on the size of your data and other conditions.
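
One such modification: e in x accepts a keyword anywhere in the sentence, while the question asks for sentences that begin with a keyword. A minimal sketch of a stricter check, assuming clauses are separated by commas and that matching should stay case-sensitive; starts_with_keyword is a hypothetical helper name, and the list is a subset of a from above:

```python
action_keywords = ["I need", "Can we", "I want a", "We need"]

a = [
    "Yes for sure I can hear you.",
    "All good bro all good, so I wanted to discuss something with you, yeah sure shoot.",
    "Okay so, I want a simple 4 page website for my business on WordPress.",
    "I need home page about us services and contact us.",
    "Can you make it?",
    'Can we use "one page multipurpose" theme from themeForest?',
]

def starts_with_keyword(sentence, keywords):
    """Return True if the sentence, or any comma-separated clause of it,
    begins with one of the keywords (case-sensitive)."""
    clauses = [clause.strip() for clause in sentence.split(",")]
    # str.startswith accepts a tuple of prefixes
    return any(clause.startswith(tuple(keywords)) for clause in clauses)

matches = [s for s in a if starts_with_keyword(s, action_keywords)]

for sentence in matches:
    print(sentence)
```

Splitting on commas is what lets "Okay so, I want a simple 4 page website ..." through, since its second clause opens with "I want a". If only the very start of each sentence should count, the split can be dropped and startswith applied to the whole sentence.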
