Write a new column with the most frequent word in each row



I'm trying to get the most frequent value or word in each row and add it to a new column.

For example:

The original CSV (stock.csv):

egg   meat     egg      lemon   
meat  orange   orange   egg
meat  meat     meat     orange
The new column will be added as follows:

egg   meat     egg      lemon    egg    
meat  orange   orange   egg      orange
meat  meat     meat     orange   meat

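As you can see, a 5th column was added holding the most frequent word in that row. (This is just an example; the actual CSV has almost 20,000 rows and almost 80 columns.)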
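Stripped of the file handling, the rule is: count the words in one row and append the word with the highest count. A minimal sketch on the three example rows, assuming each row has already been split into a list of words (`most_frequent` is a hypothetical helper name):

```python
from collections import Counter


def most_frequent(row):
    """Return the word that occurs most often in one row."""
    # most_common(1) returns a list like [(word, count)]
    return Counter(row).most_common(1)[0][0]


rows = [
    ["egg", "meat", "egg", "lemon"],
    ["meat", "orange", "orange", "egg"],
    ["meat", "meat", "meat", "orange"],
]

# Append each row's most frequent word as a new last column
new_rows = [row + [most_frequent(row)] for row in rows]
for row in new_rows:
    print(row)
```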

Here is my code. I know it's a mess; I'm still learning Python and have been stuck on this problem for 6 days.

egg = meat = lemon = orange = 0
freq = {"egg": 0, "lemon": 0, "Spam": 0, "orange": 0}
with open('stock.csv') as csvfile:
    Myreader = csv.reader(csvfile)
    for row in Myreader:
        for i in row:
            if i == "Trojan":
                Trojan = Trojan + 1
                freq.update({'Trojan': Trojan})
            elif i == "egg":
                egg = egg + 1
                freq.update({'egg': egg})
            elif i == "meat":
                meat = meat + 1
                freq.update({'meat': meat})
            elif i == "orange":
                orange = orange + 1
                freq.update({'orange': orange})
            elif i == "lemon":
                lemon = lemon + 1
                freq.update({'lemon': lemon})
        max_key = max(freq, key=freq.get)
        with open('Most_Frequent.csv', 'w', newline='') as write_object:
            csv_reader = csv.reader(csvfile)
            csv_writer = csv.writer(write_object)
            for row in csv_reader:
                row.append(max_key)
                csv_writer.writerow(row)
        write_object.close()
        egg = meat = lemon = orange = 0

The problem is that it only creates a new CSV file containing a single useless row.

Also: some cells have the value "N/A". I don't want them counted, because they dominate every row.
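A few things in the code above work against it: `freq` is never reset between rows, so counts from earlier rows leak into later ones; `Trojan` is used before it is ever assigned; and the writing step builds a second `csv.reader` over `csvfile`, the same input file the outer loop is already consuming. A sketch that keeps the dict-of-counts idea but resets it for every row, skips "N/A", and opens each file only once. It assumes a comma-separated file; the names `IGNORED`, `with_most_frequent` and `add_most_frequent_column` are hypothetical, and also skipping blank cells is an extra assumption:

```python
import csv

# Assumption: blank cells should be ignored just like "N/A"
IGNORED = {"N/A", ""}


def with_most_frequent(row, ignore=IGNORED):
    """Return a copy of row with its most frequent word appended."""
    freq = {}  # fresh counts for every row
    for word in row:
        word = word.strip()
        if word in ignore:
            continue  # keep "N/A" from dominating the result
        freq[word] = freq.get(word, 0) + 1
    # max() returns the first word with the highest count; "" if nothing is left
    max_key = max(freq, key=freq.get) if freq else ""
    return row + [max_key]


def add_most_frequent_column(in_path, out_path):
    """Copy in_path to out_path, appending each row's most frequent word."""
    # Each file is opened once, outside the row loop
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow(with_most_frequent(row))
```

Usage would be `add_most_frequent_column('stock.csv', 'Most_Frequent.csv')`. Writing each row as soon as it is read keeps memory flat, which matters little at 20,000 rows but costs nothing.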

I suggest using the excellent pandas library and its .mode() method for this kind of task:

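import pandas as pd
# The example file has no header row; read every cell as a string
df = pd.read_csv('stock.csv', header=None, dtype=str)
# mode(axis=1) returns a DataFrame (one column per tied value), so keep column 0.
# "N/A" is parsed as NaN by default, and mode() skips NaN by default.
df['most_frequent'] = df.mode(axis=1)[0]
df.to_csv('Most_Frequent.csv', index=False, header=False)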

You can use collections.Counter:

import csv
from collections import Counter

rows = []
# build the new rows
with open('stock.csv', newline='') as csvfile:
    Myreader = csv.reader(csvfile)
    for row in Myreader:
        # most_common returns a list of (key, count) tuples
        most_common_in_row = Counter(
            column for column in row if column.strip() != 'N/A'
        ).most_common(1)
        # make sure we haven't filtered out every column in the row
        if most_common_in_row:
            most_common_in_row = [most_common_in_row[0][0]]
        # Not necessary, just explicit: if the entire row was filtered,
        # most_common_in_row is an empty list that adds nothing to the row
        else:
            most_common_in_row = []
        rows.append(row + most_common_in_row)
# write the new rows to a separate file
with open('Most_Frequent.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(rows)
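When two words tie, `Counter.most_common` orders elements with equal counts in the order they were first encountered, so the word that appears first in the row is the one appended. If ties should be detected rather than silently resolved, a small check like this could help (`modes` is a hypothetical helper, not part of `collections`):

```python
from collections import Counter

row = ["lemon", "egg", "egg", "lemon", "meat"]
counts = Counter(row)

# Ties are broken by first appearance in the row
print(counts.most_common(1))  # [('lemon', 2)]


def modes(words):
    """Return every word that shares the highest count, in first-seen order."""
    counts = Counter(words)
    if not counts:
        return []
    top = max(counts.values())
    return [word for word, n in counts.items() if n == top]


print(modes(row))  # ['lemon', 'egg']
```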

Here is a solution that doesn't use any library, so you can see every step needed to get what you want.

delimiter_character = ','
with open('stock.csv') as f1, open('updated_stock.csv', 'w') as f2:
    for line in f1:
        # Construct the word counts dictionary
        word_count_dict = {}
        words = [word.strip() for word in line.split(delimiter_character)]
        for word in words:
            if word in word_count_dict:
                word_count_dict[word] += 1
            else:
                word_count_dict[word] = 1
        # Drop 'N/A' when there are other words (pop avoids a KeyError if it is absent)
        if len(word_count_dict) > 1:
            word_count_dict.pop('N/A', None)
        # Identify the word with most repetitions
        most_frequent_words = sorted(word_count_dict, key=lambda w: word_count_dict[w], reverse=True)
        # Append the most frequent word
        words.append(most_frequent_words[0])
        f2.write(delimiter_character.join(words) + '\n')
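Splitting each line on `','` by hand works for simple data like the example, but it breaks as soon as a field is quoted and contains the delimiter, which the standard `csv` module handles. A short comparison on a made-up line (the quoted field is a hypothetical example, not from the question's data):

```python
import csv

line = 'egg,"meat, smoked",egg\n'  # hypothetical line with a quoted comma

# Naive split: the quoted field is cut in two and keeps its quote marks
naive = [word.strip() for word in line.split(",")]
print(naive)  # ['egg', '"meat', 'smoked"', 'egg']

# csv.reader accepts any iterable of lines and applies the quoting rules
parsed = next(csv.reader([line]))
print(parsed)  # ['egg', 'meat, smoked', 'egg']
```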

Using a dict plus sorting:

import csv

with open('stock.csv', newline='') as csvfile, \
        open('Most_Frequent.csv', 'w', newline='') as write_object:
    Myreader = csv.reader(csvfile)
    csv_writer = csv.writer(write_object)
    for row in Myreader:
        a = dict()
        for i in row:  # Store the count of each word in a
            if i == "N/A":
                continue
            if i not in a:
                a[i] = 0
            a[i] = a[i] + 1
        if a:  # every cell may have been "N/A"
            top = sorted(a.items(), key=lambda item: item[1], reverse=True)[0]
            row.append(top[0])  # top[0] is the key, top[1] is the count
        csv_writer.writerow(row)
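Sorting every (word, count) pair only to keep the first one does more work than needed; `max` with `key=a.get` finds the top word in a single pass. Both pick the same word even on ties, because `sorted(..., reverse=True)` is stable and `max` returns the first maximal item. A quick check on one made-up row:

```python
row = ["lemon", "egg", "egg", "lemon", "N/A"]

a = {}
for i in row:
    if i == "N/A":
        continue
    a[i] = a.get(i, 0) + 1

# The approach above: sort all pairs by count, keep the first key
by_sort = sorted(a.items(), key=lambda item: item[1], reverse=True)[0][0]
# One pass: compare the keys by their counts
by_max = max(a, key=a.get)
print(by_sort, by_max)  # lemon lemon
```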
