Python pandas | How to assign keywords extracted with the Rake function to a new column



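I am learning to build a content-based book recommender system (reference: https://towardsdatascience.com/how-to-build-from-scratch-a-content-based-movie-recommender-with-natural-language-processing-25ad400eb243). I use the Rake function to extract keywords from the "Plot" column. How do I assign these keywords to a new column?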

I am working with pandas, numpy, CountVectorizer, and rake_nltk. I tried the following code: row['Key_words'] = list(key_words_dict_scores.keys()), but the column is still empty.

import pandas as pd
from rake_nltk import Rake
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
df = pd.read_csv('cleaned DATA set.csv')
df = df[['Book_ID','Title','Author','Genre1','Genre2','Plot']]

for index, row in df.iterrows():
    plot = row['Plot']
    # instantiating Rake, by default it uses english stopwords from NLTK
    # and discards all punctuation characters as well
    r = Rake()
    # extracting the words by passing the text
    r.extract_keywords_from_text(plot)
    # getting the dictionary with key words as keys and their scores as values
    key_words_dict_scores = r.get_word_degrees()
    # assigning the key words to the new column for the corresponding movie
    row['Key_words'] = list(key_words_dict_scores.keys())

I expect to see a new column named 'Key_words' containing all the keywords for the corresponding book title.

The actual output shows that the 'Key_words' column is empty.

You missed the step of initializing the new column before the for loop:

df['Key_words'] = ""
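
If the column stays empty even after it is initialized, a likely cause is that iterrows() can yield a copy of each row rather than a view (the pandas documentation warns that writing to the yielded row may have no effect, depending on the column dtypes). The following is a minimal sketch of one alternative, not the method from the referenced tutorial: it builds the whole column with Series.apply. The helper names rake_keywords and add_keywords are hypothetical, and the sketch assumes rake_nltk and its NLTK stopword data are installed.

```python
import pandas as pd


def rake_keywords(plot):
    """Return the RAKE keywords for one plot string.

    Assumes rake_nltk and the NLTK stopword data are available.
    """
    from rake_nltk import Rake  # imported lazily so the rest of the sketch runs without it

    r = Rake()
    r.extract_keywords_from_text(plot)
    # get_word_degrees() maps each keyword to its degree; only the keys are kept
    return list(r.get_word_degrees().keys())


def add_keywords(df, extract=rake_keywords, source="Plot", target="Key_words"):
    """Return a copy of df with a new column holding one keyword list per row."""
    out = df.copy()
    # apply() calls `extract` once per plot and assigns the results as a full column,
    # so nothing relies on writing into the row objects yielded by iterrows()
    out[target] = out[source].apply(extract)
    return out
```

With the DataFrame from the question, this would be called as df = add_keywords(df). If the explicit loop is preferred, writing through the frame itself with df.at[index, 'Key_words'] = ... (after initializing the column as above) should also avoid the copy problem.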
