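I'm trying to recode values in a dataframe column whose entries are lists. I know how to replace string values in a dataframe column, but I'm struggling with how to do it inside lists.
Here is a snippet of my data: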
{0: '[Crime, Drama]',
1: '[Crime, Drama]',
2: '[Crime, Drama]',
3: '[Action, Crime, Drama, Thriller]',
4: '[Crime, Drama]',
5: '[Biography, Drama, History]',
6: '[Crime, Drama]',
7: '[Adventure, Drama, Fantasy]',
8: '[Western]',
9: '[Drama]'}
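For example, I want to recode every Crime as Thriller and every Biography as History.
I know the following works for replacing plain string values:
df.loc[df['genre'] == 'Crime', 'genre'] = 'Thriller'
But how do I adapt this for lists?
Thanks!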
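The snippet above shows each genre as a string such as '[Crime, Drama]' rather than a Python list. If the column really holds strings (an assumption; the edit below suggests the values are real lists), they have to be parsed into lists before any list-based recoding. A minimal sketch, assuming the exact `[A, B]` format with comma separators and no quotes or commas inside genre names; `to_list` is a hypothetical helper:

```python
import pandas as pd

# Hypothetical sample mirroring the snippet: genres stored as strings, not lists
df = pd.DataFrame({'genre': ['[Crime, Drama]',
                             '[Biography, Drama, History]',
                             '[Western]']})

def to_list(s):
    """Parse a string like '[Crime, Drama]' into ['Crime', 'Drama'].

    Assumes names contain no commas or brackets; '[]' becomes [].
    """
    inner = s.strip()[1:-1].strip()  # drop the surrounding brackets
    return [part.strip() for part in inner.split(',')] if inner else []

df['genre'] = df['genre'].map(to_list)
print(df['genre'].tolist())
# [['Crime', 'Drama'], ['Biography', 'Drama', 'History'], ['Western']]
```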
Edit: the code used to create this dataframe (using data pulled from the IMDB database) is:
import numpy as np
import pandas as pd

# these are the variables we want to (ie are able to) extract from the movie object
metadata = ('title', 'rating', 'genre', "plot", "language", "runtime", "year", "color", "country", "votes")
# creates dataframe with variable name headers
df = pd.DataFrame(np.random.randn(250, len(metadata)), columns=list(metadata))
# these are all different data types, including lists; object dtype lets a cell hold any of them
df = df.astype('object')
# populate df with movie objects (.at sets a single cell, so a list value is stored as-is)
for i in range(250):
    for j in metadata:
        df.at[i, j] = movies_list[i].get(j)
# convert to the right data types (Python 3: str replaces unicode, items() replaces iteritems())
metadata_dict_dtypes = {"title": str,
                        "rating": float,
                        "genre": list,
                        "plot": str,
                        "language": list,
                        "runtime": list,
                        "year": int,
                        "color": list,
                        "country": list,
                        "votes": int}
for colname, my_dtype in metadata_dict_dtypes.items():
    df[colname] = df[colname].astype(my_dtype)
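Preallocating a random-number frame and filling it cell by cell works, but pandas can also build the frame directly from a list of dicts, which stores list values as object cells without any per-cell assignment. A sketch, assuming each movie supports `.get(key)`; the `movies_list` below is a hypothetical stand-in made of plain dicts, not real IMDb movie objects, and the metadata tuple is shortened for brevity:

```python
import pandas as pd

metadata = ('title', 'rating', 'genre', 'year')

# Hypothetical stand-in for movies_list: plain dicts instead of IMDb movie objects
movies_list = [
    {'title': 'Movie A', 'rating': 9.2, 'genre': ['Crime', 'Drama'], 'year': 1972},
    {'title': 'Movie B', 'rating': 8.9, 'genre': ['Western'], 'year': 1966},
]

# One dict per movie; .get returns None for keys a movie lacks
records = [{key: movie.get(key) for key in metadata} for movie in movies_list]
df = pd.DataFrame(records, columns=list(metadata))
print(df['genre'].tolist())
# [['Crime', 'Drama'], ['Western']]
```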
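Assuming the genres are correctly stored as lists in the dataframe, you can write a function that takes a row and a mapping of genre-name changes as arguments, and apply it to the dataframe. For example:
name_map = {'Crime': 'Thriller', 'Biography': 'History'}

def change_names(row, name_map):
    # replace the first occurrence of each mapped genre (modifies the list in place)
    for name in name_map:
        if name in row.genre:
            row.genre[row.genre.index(name)] = name_map[name]
    return row

df = df.apply(lambda row: change_names(row, name_map), axis=1)
It isn't vectorized, but it gets the job done.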
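If you prefer to avoid a row-wise `apply`, one alternative (not part of the answer above) is to explode the lists into one genre per row, recode with `Series.replace`, and regroup by the original index. A sketch, assuming pandas 0.25 or later (for `Series.explode`) and that no row holds an empty list, since an empty list would come back as `[nan]`:

```python
import pandas as pd

name_map = {'Crime': 'Thriller', 'Biography': 'History'}
df = pd.DataFrame({'genre': [['Crime', 'Drama'],
                             ['Biography', 'Drama', 'History'],
                             ['Western']]})

# One row per (movie, genre), recode, then rebuild one list per original row
df['genre'] = (df['genre']
               .explode()
               .replace(name_map)
               .groupby(level=0)
               .agg(list))
print(df['genre'].tolist())
# [['Thriller', 'Drama'], ['History', 'Drama', 'History'], ['Western']]
```

Unlike `change_names`, this builds new lists rather than mutating the original ones.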
Consider updating the column with a list comprehension. The example below uses a single-column dataframe of genre lists.
import pandas as pd

df = pd.DataFrame({'Genre': [['Crime', 'Drama'],
                             ['Crime', 'Drama'],
                             ['Crime', 'Drama'],
                             ['Action', 'Crime', 'Drama', 'Thriller'],
                             ['Crime', 'Drama'],
                             ['Biography', 'Drama', 'History'],
                             ['Crime', 'Drama'],
                             ['Adventure', 'Drama', 'Fantasy'],
                             ['Western'],
                             ['Drama']]})
print(df)
# Genre
# 0 [Crime, Drama]
# 1 [Crime, Drama]
# 2 [Crime, Drama]
# 3 [Action, Crime, Drama, Thriller]
# 4 [Crime, Drama]
# 5 [Biography, Drama, History]
# 6 [Crime, Drama]
# 7 [Adventure, Drama, Fantasy]
# 8 [Western]
# 9 [Drama]
df['Genre'] = [['Thriller' if i=='Crime' else i for i in m] for m in df['Genre']]
print(df)
# Genre
# 0 [Thriller, Drama]
# 1 [Thriller, Drama]
# 2 [Thriller, Drama]
# 3 [Action, Thriller, Drama, Thriller]
# 4 [Thriller, Drama]
# 5 [Biography, Drama, History]
# 6 [Thriller, Drama]
# 7 [Adventure, Drama, Fantasy]
# 8 [Western]
# 9 [Drama]
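The comprehension above handles a single replacement. A small generalization, assuming a mapping dict like the `name_map` from the first answer, looks up every genre with `dict.get` so unmapped names pass through unchanged. Because recoding can create duplicates (row 3 above already lists Thriller, and row 5 already lists History), the sketch optionally drops them with `dict.fromkeys`, which keeps first-seen order in Python 3.7+:

```python
import pandas as pd

name_map = {'Crime': 'Thriller', 'Biography': 'History'}
df = pd.DataFrame({'Genre': [['Crime', 'Drama'],
                             ['Action', 'Crime', 'Drama', 'Thriller'],
                             ['Biography', 'Drama', 'History']]})

# Map every genre through name_map; names not in the dict stay as they are
df['Genre'] = [[name_map.get(g, g) for g in genres] for genres in df['Genre']]

# Optional: remove duplicates created by the recoding, keeping first-seen order
df['Genre'] = [list(dict.fromkeys(genres)) for genres in df['Genre']]
print(df['Genre'].tolist())
# [['Thriller', 'Drama'], ['Action', 'Thriller', 'Drama'], ['History', 'Drama']]
```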