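I want to scrape the publication date as an actual date, not as "1 day ago", "minutes ago" or "hours ago". So if the article was published today the result should be today's date, and otherwise it should be the published date. I am using Python with Scrapy.
This is the code I tried.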
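from datetime import date, datetime

# Example of the scraped text: "Published Jul 30, 2019"
Published_Date = response.css('time::text').get().replace(",", "").replace("Published ", "")
# Relative strings such as "2 hours ago" are matched case-insensitively.
if "ago" in Published_Date.lower():
    Published_Date = date.today()
else:
    Published_Date = datetime.strptime(Published_Date, "%b %d %Y").date()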
URL of the site: https://simpleflying.com/us-carriers-dot-delay-compensation-push/
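You can scrape the @datetime attribute directly from the <time> tag, parse the publication date with the datetime module, and subtract it from the current time (which gives a timedelta) to check how long ago the article was published.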
import datetime

import scrapy


class DTSpider(scrapy.Spider):
    name = 'dt'
    start_urls = ['https://simpleflying.com/us-carriers-dot-delay-compensation-push/']

    def parse(self, response):
        # Read the machine-readable timestamp from the <time> tag's datetime attribute.
        dt = response.css('span.meta_txt.date').xpath('./time/@datetime').get()
        # Drop the last character (presumably a trailing "Z") so fromisoformat() accepts it.
        date = datetime.datetime.fromisoformat(dt[:-1])
        print(date, '|', date.day, '|', date.month, '|', date.year)
        # 2022-10-23 17:10:00 | 23 | 10 | 2022 #<-- output
        today = datetime.datetime.today()
        # Whole days elapsed since publication; 0 means less than 24 hours ago.
        delta = today - date
        print(delta.days)  # 0 <-- output
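If you want to try the date handling without running the spider, here is a minimal sketch of the same parsing step as plain functions. The helper names parse_published and days_since are made up for illustration, and the sample timestamp is only assumed from the output shown above (including the trailing "Z"). Unlike the spider, it keeps the UTC offset instead of dropping it, so the comparison does not depend on the machine's local time zone.

```python
from datetime import datetime, timezone


def parse_published(iso_string):
    """Parse an ISO 8601 timestamp such as '2022-10-23T17:10:00Z' (hypothetical helper)."""
    # Older Python versions reject a trailing 'Z' in fromisoformat(),
    # so replace it with an explicit UTC offset first.
    if iso_string.endswith("Z"):
        iso_string = iso_string[:-1] + "+00:00"
    published = datetime.fromisoformat(iso_string)
    # Assumption for this sketch: a timestamp without an offset is treated as UTC.
    if published.tzinfo is None:
        published = published.replace(tzinfo=timezone.utc)
    return published


def days_since(published, now=None):
    """Whole days between `published` and `now` (defaults to the current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return (now - published).days


# Sample value assumed from the output above, not scraped live.
published = parse_published("2022-10-23T17:10:00Z")
print(published.date())  # 2022-10-23
print(days_since(published, now=datetime(2022, 10, 23, 20, 0, tzinfo=timezone.utc)))  # 0
```

In the spider you would pass the scraped @datetime value to parse_published and yield published.date() as the item's date. Note that days_since returning 0 means "less than 24 hours ago", which is not always the same calendar day.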