Scraping the latest news by checking whether any new item has been added

from bs4 import BeautifulSoup
import requests
import smtplib
import time

def live_news():
    # Download the live blog page
    source = requests.get(
        "https://economictimes.indiatimes.com/news/politics-and-nation/coronavirus-cases-in-india-live-news-latest-updates-april6/liveblog/75000925.cms"
    ).text
    soup = BeautifulSoup(source, "lxml")
    livepage = soup.find("div", class_="pageliveblog")
    # find() returns the first match, which is the most recent story
    each_story = livepage.find("div", class_="eachStory")
    news_time = each_story.span.text
    # Skip the first 8 characters of the story text
    new_news = each_story.div.text[8:]
    print(f"{news_time}\n{new_news}")

# Poll the page every 5 minutes
while True:
    live_news()
    time.sleep(300)

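So basically, I want to get the latest news updates from a news website. I only want to print the most recent news item and its time, not the whole list of headlines. With the code above I can get the latest update, and the program sends a request to the server every 5 minutes (the delay I set). The problem is that if the page has no newer update, it prints the same news item again after 5 minutes. I don't want the program to print the same news twice. Instead, I want to add a condition so that every 5 minutes it checks whether the top item is a new update or the previous one. If there is a new update it should print it; otherwise it should print nothing.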

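The solution I can think of is an if statement. The first time the code runs, the variable check_last_time is empty, and when you call live_news() it is assigned the value of news_time.

After that, each time live_news() is called, it first checks whether the current news_time differs from check_last_time, and if it does, it prints the new story: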

# Initialise the variable outside of the function
check_last_time = None

def live_news():
    # Needed so the assignment below updates the module-level variable
    global check_last_time
    ...
    ...
    # Check whether the times differ
    if news_time != check_last_time:
        print(f"{news_time}\n{new_news}")
        # Update the variable with the new time
        check_last_time = news_time
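If you would rather avoid a global, the same check can live in the polling loop instead. Here is a minimal sketch, not the code from the question: poll_for_updates, fetch, max_polls and sleep are names made up for illustration, and fetch stands for any callable that returns (news_time, new_news), for example a version of live_news() that returns the two values instead of printing them.

```python
import time

def poll_for_updates(fetch, interval=300, max_polls=None, sleep=time.sleep):
    """Call fetch() repeatedly and print a story only when its time changes.

    fetch is a hypothetical callable returning (news_time, new_news).
    max_polls=None means poll forever; a number stops after that many polls.
    """
    last_seen = None  # nothing has been printed yet
    printed = []      # stories that were actually printed
    polls = 0
    while max_polls is None or polls < max_polls:
        news_time, new_news = fetch()
        # Print only if this timestamp differs from the last printed one
        if news_time != last_seen:
            print(f"{news_time}\n{new_news}")
            printed.append((news_time, new_news))
            last_seen = news_time
        polls += 1
        # Wait before the next poll, unless this was the final one
        if max_polls is None or polls < max_polls:
            sleep(interval)
    return printed
```

Calling poll_for_updates(fetch) with the defaults keeps polling every 5 minutes. Note that comparing only news_time assumes two different stories never share the same timestamp; comparing the (news_time, new_news) pair would be a stricter check.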

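I found the answer myself. I feel a bit silly, because it is simple: you just need an extra file to store the value. Since variable values are reset between executions, you need an extra file to read the data from and write it to.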
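A rough sketch of that file-based idea follows. It is only an illustration under assumptions: the file name last_news_time.txt and the helpers read_last_time and is_new are hypothetical, not from the original post. A file like this matters mainly when the script is started afresh for each check (for example by a scheduler); inside a single long-running while loop, an in-memory variable is enough.

```python
from pathlib import Path

# Hypothetical file name used to remember the last printed timestamp
STATE_FILE = Path("last_news_time.txt")

def read_last_time(path=STATE_FILE):
    """Return the timestamp saved by a previous run, or None if none exists."""
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None

def is_new(news_time, path=STATE_FILE):
    """Return True and record news_time if it differs from the stored value."""
    if news_time == read_last_time(path):
        return False
    path.write_text(news_time, encoding="utf-8")
    return True
```

In live_news(), the print call would then be guarded with something like `if is_new(news_time):`.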
