How to tell Python not to print items that are already in a list



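My Python script parses titles and links from several RSS feeds. I store the titles in a list, and I want to make sure a duplicate title is never printed. How can I do that?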

#!/usr/bin/python
from twitter import *
from goose import Goose
import feedparser
import time
from pyshorteners import Shortener
import pause
import newspaper
dr = feedparser.parse("http://www.darkreading.com/rss_simple.asp")
sm = feedparser.parse("http://www.securitymagazine.com/rss/topic/2654-cyber-tactics.rss")

dr_posts = [
    "CISO Playbook: Games of War & Cyber Defenses",
    "SWIFT Confirms Cyber Heist At Second Bank; Researchers Tie Malware Code to Sony Hack",
    "The 10 Worst Vulnerabilities of The Last 10 Years",
    "GhostShell Leaks Data From 32 Sites In 'Light Hacktivism' Campaign",
    "OPM Breach: 'Cyber Sprint' Response More Like A Marathon",
    "Survey: Customers Lose Trust In Brands After A Data Breach",
    "Domain Abuse Sinks 'Anchors Of Trust'",
    "The 10 Worst Vulnerabilities of The Last 10 Years",
]
sm_posts = ["10 Steps to Building a Better Cybersecurity Plan"]
x = 1
while True:
    try:
        drtitle = dr.entries[x]["title"]
        drlink = dr.entries[x]["link"]
        if drtitle in dr_posts:
            # Duplicate title: skip ahead one entry and print that one instead.
            x += 1
            drtitle = dr.entries[x]["title"]
            drlink = dr.entries[x]["link"]
            print(drtitle + "\n" + drlink)
            dr_posts.append(drtitle)
            x -= 1
            pause.seconds(10)
        else:
            print(drtitle + "\n" + drlink)
            dr_posts.append(drtitle)
            pause.seconds(10)
        smtitle = sm.entries[x]["title"]
        smlink = sm.entries[x]["link"]
        if smtitle in sm_posts:
            # Same skip-ahead workaround for the second feed.
            x += 1
            smtitle = sm.entries[x]["title"]
            smlink = sm.entries[x]["link"]
            print(smtitle + "\n" + smlink)
            sm_posts.append(smtitle)
            pause.seconds(10)
        else:
            print(smtitle + "\n" + smlink)
            sm_posts.append(smtitle)
            x += 1
            pause.seconds(10)

    except IndexError:
        # Ran past the end of one of the feeds.
        print("FAILURE")
        break

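Right now all I do is skip the entry. That will be a problem: if the RSS feed contains another duplicate right after it, I will still end up printing duplicates.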

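You can take advantage of the set data structure, because its uniqueness property does the work for you. Essentially, you convert your list to a set and then convert that set back to a list, which guarantees that the list now holds only unique values.

If you have a list l, you can make it unique with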

l = list(set(l))
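One caveat is that `list(set(l))` does not keep the original order, and it only cleans up a list after the fact. To stop duplicates from being printed in the first place, the same set idea can be applied while looping over the feed. The following is a minimal sketch in Python 3 syntax, not the asker's actual script: the helper name `unseen_entries` and the sample data are hypothetical, and the plain dicts stand in for `dr.entries`, which is indexed by key the same way in the question.

```python
def unseen_entries(entries, seen):
    """Yield (title, link) for each entry whose title is not yet in `seen`.

    `seen` is a set that is updated in place, so it can be shared across feeds
    or across repeated polls of the same feed.
    """
    for entry in entries:
        title = entry["title"]
        if title in seen:
            continue  # already printed once, so skip it
        seen.add(title)
        yield title, entry["link"]


# Hypothetical sample data standing in for dr.entries.
sample_entries = [
    {"title": "A", "link": "http://example.com/a"},
    {"title": "B", "link": "http://example.com/b"},
    {"title": "A", "link": "http://example.com/a2"},
]

seen_titles = {"B"}  # titles that were printed on an earlier run
new_items = list(unseen_entries(sample_entries, seen_titles))
for title, link in new_items:
    print(title + "\n" + link)
```

Because every entry is checked against the set, any number of consecutive duplicates is skipped, which avoids the skip-one-entry workaround from the question. Membership tests on a set are also faster than `in` on a long list.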

If you don't want to print duplicate links, you can use a Counter or a defaultdict

from collections import defaultdict

sm_posts = defaultdict(int)
sm_posts[sm_links] += 1  # sm_links is the link just read from the feed
print(list(sm_posts.keys()))  # prints all the unique links

The nice part is that you can also get the number of times a link was repeated with

link_count = sm_posts[sm_links]  # number of times this link has been seen
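The Counter variant mentioned above could look roughly like this. It is a small sketch in Python 3 syntax with made-up titles (the `titles` list is hypothetical, not data from the feeds); `collections.Counter` is part of the standard library.

```python
from collections import Counter

# Hypothetical titles standing in for values read from the feeds.
titles = [
    "The 10 Worst Vulnerabilities of The Last 10 Years",
    "Domain Abuse Sinks 'Anchors Of Trust'",
    "The 10 Worst Vulnerabilities of The Last 10 Years",
]

counts = Counter(titles)

# The keys are the unique titles, in first-seen order on Python 3.7+.
unique_titles = list(counts)
for title in unique_titles:
    print(title)

# Look up how many times a given title appeared.
repeats = counts["The 10 Worst Vulnerabilities of The Last 10 Years"]
```

A Counter is convenient when the repeat counts matter. If only uniqueness matters, a plain set is enough.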

Give it a try.
