I have two paragraphs in a TXT file, and I need to find the words they have in common using Python NLTK.



Paragraph 1

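Electronic commerce, commonly written as E-Commerce, is the trading or facilitation of trading in goods or services using computer networks, such as the Internet or online social networks. Electronic commerce draws on technologies such as mobile commerce, electronic funds transfer, supply chain management, Internet marketing, online transaction processing, electronic data interchange (EDI), inventory management systems, and automated data collection systems.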

Paragraph 2

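Modern electronic commerce typically uses the World Wide Web for at least one part of the transaction's life cycle, although it may also use other technologies such as e-mail. The benefits of e-commerce include speed of access, a wider selection of goods and services, accessibility, and international reach.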

I need to find the words that appear in both paragraphs and print them.

If you don't need to do anything special in terms of language processing, you don't need NLTK:

# paragraph1 and paragraph2 hold the two paragraphs as strings
words1 = paragraph1.lower().split()
words2 = paragraph2.lower().split()
# words present in both paragraphs
intersection = set(words1) & set(words2)

You can use set.intersection.

p1 = '''
Electronic commerce, commonly written as E-Commerce, is the trading or  
facilitation of trading in goods or services using computer networks, such 
as the Internet or online social networks. Electronic commerce draws on 
technologies such as mobile commerce, electronic funds transfer, supply 
chain management, Internet marketing, online transaction processing, 
electronic data interchange (EDI), inventory management systems, and 
automated data collection systems.
'''.split()
p2 = '''
Modern electronic commerce typically uses the World Wide Web for at least 
one part of the transaction's life cycle although it may also use other 
technologies such as e-mail. The benefits of e-commerce include it’s the 
speed of access, a wider selection of goods and services, accessibility, and 
international reach.
'''.split()
print(set(p1).intersection(p2))
{'and', 'the', 'technologies', 'of', 'electronic', 'such', 'commerce', 'as', 'goods'}
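
Note that splitting on whitespace keeps punctuation attached, so 'services' and 'services,' count as different words, and without lowercasing 'The' and 'the' differ too. A minimal sketch of one way to normalize both before intersecting, using only the standard library, is shown here. The function name common_words, the ignore parameter, and the token pattern are illustrative assumptions, not part of either answer above or of NLTK.

```python
import re

def common_words(text_a, text_b, ignore=frozenset()):
    """Return the words that occur in both texts, ignoring case and punctuation."""
    def tokenize(text):
        # Assumed token pattern: runs of letters/digits, optionally joined
        # by an internal hyphen or straight apostrophe (e.g. "e-commerce").
        return set(re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower()))
    # Intersect the two word sets, then drop any words the caller wants skipped.
    return (tokenize(text_a) & tokenize(text_b)) - set(ignore)

a = "Goods and services, traded online."
b = "A wider selection of goods and services."
print(sorted(common_words(a, b)))                  # ['and', 'goods', 'services']
print(sorted(common_words(a, b, ignore={"and"})))  # ['goods', 'services']
```

Passing a set of filler words such as 'and', 'the', and 'of' through ignore is one simple way to keep only the more meaningful shared terms.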
