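Compare two lists and search by field in Python

I have two files that I need to compare in order to produce a specific output:

1) Below are the contents of the username text file (which stores the latest films the user has watched)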

Sci-Fi,Out of the Silent Planet
Sci-Fi,Solaris
Romance, When Harry met Sally

2) Below are the contents of films.txt, which stores all the films available to users in the program

0,Genre, Title, Rating, Likes
1,Sci-Fi,Out of the Silent Planet, PG,3
2,Sci-Fi,Solaris, PG,0
3,Sci-Fi,Star Trek, PG,0
4,Sci-Fi,Cosmos, PG,0
5,Drama, The English Patient, 15,0
6,Drama, Benhur, PG,0
7,Drama, The Pursuit of Happiness, 12, 0
8,Drama, The Thin Red Line, 18,0
9,Romance, When Harry met Sally, 12, 0
10,Romance, You've got mail, 12, 0
11,Romance, Last Tango in Paris, 18, 0
12,Romance, Casablanca, 12, 0

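Example of the output I need: the user has so far watched two sci-fi films and one romance film. The program should therefore search films.txt by genre (picking out Sci-Fi and Romance) and list the films in films.txt that the user has NOT yet watched. In this case: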

3,Sci-Fi,Star Trek, PG,0
4,Sci-Fi,Cosmos, PG,0
10,Romance, You've got mail, 12, 0
11,Romance, Last Tango in Paris, 18, 0
12,Romance, Casablanca, 12, 0

I have the following code, which attempts to do the above, but the output it produces is incorrect:

import csv

def viewrecs(username):
    #set the username variable to the text file -to use it in the next bit
    username = (username + ".txt")
    #open the username file that stores latest viewings
    with open(username,"r") as f:
        #open the csv file reader for the username file
        fReader=csv.reader(f)
        #for each row in the fReader
        for row in fReader:
            #set the genre variable to the row[0], in which row[0] is all the genres (column 1 in username file)
            genre=row[0]
            #next, open the films file
            with open("films.txt","r") as films:
                #open the csv reader for this file (filmsReader as opposed to fReader)
                filmsReader=csv.reader(films)
                #for each row in the films file
                for row in filmsReader:
                    #and for each field in the row
                    for field in row:
                        #print(field)
                        #print(genre)
                        #print(field[0])
                        if genre in field and row[2] not in fReader:
                            print(row)

Output (not what I want):

['1', 'Sci-Fi', 'Out of the Silent Planet', ' PG', '3']
['2', 'Sci-Fi', 'Solaris', ' PG', '0']
['3', 'Sci-Fi', 'Star Trek', ' PG', '0']
['4', 'Sci-Fi', 'Cosmos', ' PG', '0']

I don't want a rewrite or a completely new solution; ideally I'd like a fix to the solution above and its logic.
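
One likely reason for the incorrect output: `csv.reader` returns a one-pass iterator, so the test `row[2] not in fReader` consumes the remaining rows of the username file, and the outer loop ends after the first genre. A minimal sketch of a fix that keeps the same nested-loop shape follows. It assumes the two-column username file shown above; the `films_path` parameter, the `.strip()` calls, and returning a list instead of printing are additions for illustration, not part of the original code.

```python
import csv

def viewrecs(username, films_path="films.txt"):
    # Read the whole username file up front; a csv.reader can only be walked once
    with open(username + ".txt", "r", newline="") as f:
        watched = [[field.strip() for field in row] for row in csv.reader(f) if row]

    # Titles already seen, and watched genres in first-seen order without duplicates
    seen_titles = {row[1] for row in watched}
    genres = list(dict.fromkeys(row[0] for row in watched))

    recommendations = []
    for genre in genres:
        # Re-open the films file for each genre, as the original loop did
        with open(films_path, "r", newline="") as films:
            for row in csv.reader(films):
                if not row:
                    continue
                row = [field.strip() for field in row]
                # Same genre as something watched, but a title not yet watched
                if row[1] == genre and row[2] not in seen_titles:
                    recommendations.append(row)
    return recommendations
```

Comparing `row[1] == genre` rather than testing `genre in field` for every field also avoids accidental matches when a genre name happens to appear inside a title.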

@Gipsy - your solution almost seems to work. I used:

def viewrecs(username):
    #set the username variable to the text file -to use it in the next bit
    username = (username + ".txt")
    #open the username file that stores latest viewings
    lookup_set = set()
    with open(username,"r") as f:
        #open the csv file reader for the username file
        fReader=csv.reader(f)
        #for each row in the fReader
        for row in fReader:
            genre = row[1]
            name = row[2]
            lookup_set.add('%s-%s' % (genre, name))
    with open("films.txt","r") as films:
        filmsReader=csv.reader(films)
        #for each row in the films file
        for row in filmsReader:
            genre = row[1]
            name = row[2]
            lookup_key = '%s-%s' % (genre, name)
            if lookup_key not in lookup_set:
                print(row)

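The output is below. It prints every row of films.txt that is not in the first set, rather than only the rows whose GENRE appears in the first set: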

['0', 'Genre', ' Title', ' Rating', ' Likes']
['3', 'Sci-Fi', 'Star Trek', ' PG', ' 0']
['4', 'Sci-Fi', 'Cosmos', ' PG', ' 0']
['5', 'Drama', ' The English Patient', ' 15', ' 0']
['6', 'Drama', ' Benhur', ' PG', ' 0']
['7', 'Drama', ' The Pursuit of Happiness', ' 12', ' 0']
['8', 'Drama', ' The Thin Red Line', ' 18', ' 0']
['10', 'Romance', " You've got mail", ' 12', ' 0']
['11', 'Romance', ' Last Tango in Paris', ' 18', ' 0']
['12', 'Romance', ' Casablanca', ' 12', ' 0']

Note: for simplicity, I changed the format of the first file to match the format of the film entries:

1,Sci-Fi,Out of the Silent Planet, PG
2,Sci-Fi,Solaris, PG

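How about using a set and a separate list to filter the unwatched films of the appropriate genres? For this purpose we can even (ab)use a dictionary, mapping each watched genre to the set of titles watched in it:

def parse_file(path):
    # Split every non-empty line on commas and strip whitespace around each field
    with open(path) as f:
        return [[w.strip() for w in line.split(',')] for line in f if line.strip()]

def movies_to_see():
    # genre -> set of watched titles; a plain {genre: title} dict would
    # keep only the last title seen for each genre
    seen = {}
    for film in parse_file('seen.txt'):
        seen.setdefault(film[0], set()).add(film[1])
    films = parse_file('films.txt')
    to_see = []
    for film in films:
        # keep films of a watched genre whose title has not been watched
        if film[1] in seen and film[2] not in seen[film[1]]:
            to_see.append(film)
    return to_see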

A solution using the str.split() and str.join() functions:

# change file paths with your actual ones
with open('./text_files/user.txt', 'r') as userfile:
    # splitlines() avoids an empty trailing entry when the file ends with a newline
    viewed = userfile.read().splitlines()
    viewed_genders = set(g.split(',')[0] for g in viewed)

with open('./text_files/films.txt', 'r') as filmsfile:
    films = filmsfile.read().splitlines()
    # keep films of a viewed genre whose "genre,title" pair is not in the user's file
    not_viewed = [f for f in films
                  if f.split(',')[1] in viewed_genders and ','.join(f.split(',')[1:3]) not in viewed]

print('\n'.join(not_viewed))

Output:

3,Sci-Fi,Star Trek, PG,0
4,Sci-Fi,Cosmos, PG,0
10,Romance, You've got mail, 12, 0
11,Romance, Last Tango in Paris, 18, 0
12,Romance, Casablanca, 12, 0

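OK: iterate over the first file and build a set whose entries are genre+name.

Now iterate over the second file and look up each genre+name entry in the set created above; if it is not there, print the row.

I can type up some code once I get home.

As promised, here is my code: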

import csv

def viewrecs(username):
    #set the username variable to the text file -to use it in the next bit
    username = (username + ".txt")
    # In this set we will collect the unique combinations of genre and name
    genre_name_lookup_set = set()
    # In this set we will collect the unique genres
    genre_lookup_set = set()
    with open(username,"r") as f:
        #open the csv file reader for the username file
        fReader=csv.reader(f)
        #for each row in the fReader
        for row in fReader:
            genre = row[0]
            name = row[1]
            # Add the genre name combination to this set, duplicates will be taken care automatically as set won't allow dupes
            genre_name_lookup_set.add('%s-%s' % (genre, name))
            # Add genre to this set
            genre_lookup_set.add(genre)
    with open("films.txt","r") as films:
        filmsReader=csv.reader(films)
        #for each row in the films file
        for row in filmsReader:
            genre = row[1]
            name = row[2]
            # Build a lookup key using genre and name, example:Sci-Fi-Solaris
            lookup_key = '%s-%s' % (genre, name)
            if lookup_key not in genre_name_lookup_set and genre in genre_lookup_set:
                print(row)
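
The '%s-%s' key works for the sample data, but a formatted string could in principle collide when a genre or title itself contains a hyphen, and the lookup depends on both files having identical spacing after the commas. A hedged sketch of the same two-set idea, using (genre, title) tuples as keys and stripping whitespace before comparing, is shown below. The function name `unwatched_in_seen_genres` and its two path parameters are hypothetical, chosen for illustration, and the function returns the matching rows instead of printing them.

```python
import csv

def unwatched_in_seen_genres(user_path, films_path):
    """Return film rows whose genre was watched but whose title was not."""
    seen_pairs = set()   # (genre, title) tuples the user has watched
    seen_genres = set()  # genres the user has watched
    with open(user_path, "r", newline="") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            genre, title = row[0].strip(), row[1].strip()
            seen_pairs.add((genre, title))
            seen_genres.add(genre)

    result = []
    with open(films_path, "r", newline="") as films:
        for row in csv.reader(films):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            genre, title = row[1].strip(), row[2].strip()
            # The header row drops out here because "Genre" is never a watched genre
            if genre in seen_genres and (genre, title) not in seen_pairs:
                result.append(row)
    return result
```

Calling `unwatched_in_seen_genres("username.txt", "films.txt")` with the sample files from the question should give the five Sci-Fi and Romance rows the question asks for, each as a list of fields.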
