This is my DataFrame:
userId movieId ... vote_average vote_count
0 1 31 ... 7.7 5415.0
1 1 1029 ... 6.9 2413.0
2 1 1061 ... 6.5 92.0
3 1 1129 ... 6.1 34.0
4 1 1172 ... 5.7 173.0
This is the column of the DataFrame that I want to unpack:
this is genrecol
0 [{'id': 16, 'name': 'Animation'}, {'id': 35, '...
1 [{'id': 12, 'name': 'Adventure'}, {'id': 14, '...
2 [{'id': 10749, 'name': 'Romance'}, {'id': 35, ...
3 [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...
4 [{'id': 35, 'name': 'Comedy'}]
Name: genres, dtype: object
I want the result to be:
0 ['Animation','Comedy','Romance']
1 ['Adventure','Action','Romance']
2 ['Romance', 'Comedy']
.
.
.
My understanding is that the "genres" column is a Series with dtype object. I would like some guidance on how to get the result I want.
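> Use a list comprehension inside apply:
import ast
# Assuming the cells are strings as printed: they use single quotes, so they are
# Python literals rather than strict JSON. ast.literal_eval parses them, whereas
# json.loads would raise a JSONDecodeError on the single quotes.
df['genres'] = df['genres'].apply(lambda x: [y['name'] for y in ast.literal_eval(x)])
Or a nested list comprehension:
df['genres'] = [[y['name'] for y in ast.literal_eval(x)] for x in df['genres']]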
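To make the idea above easy to try, here is a minimal sketch on made-up sample cells. It assumes each cell is a string holding a Python-style literal with single quotes, as the printed column suggests; strict JSON parsing rejects single-quoted strings, so the sketch uses ast.literal_eval. The helper name extract_names is hypothetical, and a plain list stands in for the column so the snippet runs without pandas.

```python
import ast

# Made-up sample cells mimicking the printed column (assumption: cells are strings).
genre_cells = [
    "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]",
    "[{'id': 35, 'name': 'Comedy'}]",
]


def extract_names(cell):
    """Return the genre names from one cell (hypothetical helper)."""
    # Parse only when the cell is still a string; real lists pass through as-is.
    items = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return [item["name"] for item in items]


result = [extract_names(cell) for cell in genre_cells]
print(result)  # [['Animation', 'Comedy'], ['Comedy']]
```

With a DataFrame, the same helper could be passed to apply, for example df['genres'].apply(extract_names). The isinstance check is there because an object column can hold either strings or already-parsed lists.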
This is the answer I was able to come up with:
# Create a list of all elements in genrecol
list_1 = []
for element in genrecol:
    list_1.append(element)
print(list_1)

# Remove the unnecessary characters from each string
list_1 = list(map(lambda x: x.replace('name','').replace('id','').replace('{','').replace('}','').replace(':','').replace(" '' ",'').replace("''", '').replace(",'","'").replace('[','').replace(']','').replace(' ','').replace("'",''), list_1))
print(list_1)
print(type(list_1))

# Remove digits
result = []
for s in list_1:
    result.append(''.join([i for i in s if not i.isdigit()]))
print(result)

# Put the cleaned strings into a new list, split on commas
newres = []
for i in result:
    newres.append(i.split(','))
print(newres)
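One caveat with stripping characters this way: replace(' ', '') also removes the spaces inside names, so a multi-word genre gets squashed together. The sketch below compares the two approaches on a single made-up cell. The helper names by_stripping and by_parsing are hypothetical, and the cell is assumed to be a string.

```python
import ast

# Made-up cell with a multi-word genre name (assumption for illustration).
cell = "[{'id': 878, 'name': 'Science Fiction'}, {'id': 35, 'name': 'Comedy'}]"


def by_stripping(text):
    """Same replace chain, digit removal, and split as in the answer above."""
    cleaned = text.replace('name','').replace('id','').replace('{','').replace('}','').replace(':','').replace(" '' ",'').replace("''", '').replace(",'","'").replace('[','').replace(']','').replace(' ','').replace("'",'')
    no_digits = ''.join(ch for ch in cleaned if not ch.isdigit())
    return no_digits.split(',')


def by_parsing(text):
    """Parse the cell as a Python literal and read the 'name' keys."""
    return [item['name'] for item in ast.literal_eval(text)]


print(by_stripping(cell))  # ['ScienceFiction', 'Comedy']
print(by_parsing(cell))    # ['Science Fiction', 'Comedy']
```

Parsing keeps each name exactly as stored, which is the main reason to prefer it when the cells are well-formed literals.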