Unpacking a Series object in a DataFrame



This is my DataFrame:

   userId  movieId  ...  vote_average  vote_count
0       1       31  ...           7.7      5415.0
1       1     1029  ...           6.9      2413.0
2       1     1061  ...           6.5        92.0
3       1     1129  ...           6.1        34.0
4       1     1172  ...           5.7       173.0

This is the column in the DataFrame that I want to unpack:

this is genrecol
0    [{'id': 16, 'name': 'Animation'}, {'id': 35, '...
1    [{'id': 12, 'name': 'Adventure'}, {'id': 14, '...
2    [{'id': 10749, 'name': 'Romance'}, {'id': 35, ...
3    [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...
4                       [{'id': 35, 'name': 'Comedy'}]
Name: genres, dtype: object

I want the result to be:

0    ['Animation','Comedy','Romance']
1    ['Adventure','Action','Romance']
2    ['Romance', 'Comedy']
.
.
.

My understanding is that the 'genres' column is a Series with dtype object. I would like some guidance on how to get the result I want.
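Since dtype object can hold either real Python lists or their string representations, it may help to look at the type of a single element before picking an approach. A minimal sketch, assuming a small made-up frame in place of the real data (the sample rows here are hypothetical):

```python
import pandas as pd

# Hypothetical two-row sample standing in for the real 'genres' column
df = pd.DataFrame({
    "genres": [
        "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]",
        "[{'id': 35, 'name': 'Comedy'}]",
    ]
})

# Inspect one element: a str means the lists still have to be parsed,
# a list means the dicts can be read directly
first = df["genres"].iloc[0]
print(type(first).__name__)

# Count the Python types present in the whole column
element_types = df["genres"].map(type).value_counts()
print(element_types)
```

If the elements turn out to be strings, they need to be parsed into lists before the names can be pulled out.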

> Use a list comprehension in apply:

import json
# Assumes every cell is a valid JSON string (double-quoted keys and values)
df['genres'] = df['genres'].apply(lambda x: [y['name'] for y in json.loads(x)])

Or a nested list comprehension:

df['genres'] = [[y['name'] for y in json.loads(x)] for x in df['genres']]
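If the cells are strings that use single quotes, as in the printed column in the question, they are Python literals rather than valid JSON, and json.loads would raise an error on them. A hedged sketch of the same idea using ast.literal_eval from the standard library follows; the helper name extract_names and the sample frame are made up for illustration, and the helper also passes through values that are already lists:

```python
import ast

import pandas as pd


def extract_names(value):
    """Return the 'name' of each dict in a list, or in the string repr of one."""
    if isinstance(value, str):
        # Safely evaluate the Python-literal string into a list of dicts
        value = ast.literal_eval(value)
    return [item["name"] for item in value]


# Hypothetical sample: one cell is a string, the other is already a list
df = pd.DataFrame({"genres": [
    "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]",
    [{"id": 35, "name": "Comedy"}],
]})

df["genres"] = df["genres"].apply(extract_names)
print(df["genres"].tolist())
```

This sketch does not handle missing values; rows containing NaN would need to be filtered or given a default first.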

This is the answer I was able to come up with:

# Create a list of all elements in genrecol
list_1 = []
for element in genrecol:
    list_1.append(element)
print(list_1)

# Remove the unnecessary characters from each string
list_1 = list(map(lambda x: x.replace('name','').replace('id','').replace('{','').replace('}','').replace(':','').replace(" '' ",'').replace("''", '').replace(",'","'").replace('[','').replace(']','').replace(' ','').replace("'",''), list_1))
print(list_1)
print(type(list_1))

# Remove digits
result = []
for s in list_1:
    result.append(''.join([i for i in s if not i.isdigit()]))
print(result)

# Put the cleaned strings into a new list of lists
newres = []
for i in result:
    newres.append(i.split(','))
print(newres)
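The chained replace calls depend on the exact text of each cell, and they would also alter genre names that contain spaces, digits, or the substrings 'id' and 'name' (for example, 'Science Fiction' would lose its space). One alternative that still treats the cells as plain strings is a regular expression that captures only the quoted value after 'name'. A rough sketch, assuming the names themselves contain no single quotes; the sample list and the name NAME_PATTERN are made up for illustration:

```python
import re

# Capture the quoted value that follows each 'name' key
NAME_PATTERN = re.compile(r"'name':\s*'([^']*)'")

# Hypothetical sample strings in the same shape as the 'genres' column
genre_strings = [
    "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]",
    "[{'id': 878, 'name': 'Science Fiction'}]",
]

# findall returns the captured names for each string, in order
newres = [NAME_PATTERN.findall(s) for s in genre_strings]
print(newres)
```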
