我有这样的数据框架:
Name Nationality Tall Age
John USA 190 24
Thomas French 194 25
Anton Malaysia 180 23
Chris Argentina 190 26
所以,我可以得到这样的传入数据结构。每个代表每行数据的元素。:
data = [{
'food':{'lunch':'Apple',
'breakfast':'Milk',
'dinner':'Meatball'},
'drink':{'favourite':'coke',
'dislike':'juice'}
},
..//and 3 other records
].
'数据'是一些变量,可以从我的机器学习中节省预测的食物和饮料。有更多记录(大约400k行(,但我通过批处理大小(现在我每次迭代处理2K数据(来对其进行处理。预期结果如下:
Name Nationality Tall Age Lunch Breakfast Dinner Favourite Dislike
John USA 190 24 Apple Milk Meatball Coke Juice
Thomas French 194 25 ....
Anton Malaysia 180 23 ....
Chris Argentina 190 26 ....
有一个有效的方法来实现该数据框架吗?到目前为止,我已经尝试迭代数据变量并获取每个预测标签的值。感觉就像那个过程花了很多时间。
您需要首先需要扁平字典,创建DataFrame
并加入原始:
data = [{
'a':{'lunch':'Apple',
'breakfast':'Milk',
'dinner':'Meatball'},
'b':{'favourite':'coke',
'dislike':'juice'}
},
{
'a':{'lunch':'Apple1',
'breakfast':'Milk1',
'dinner':'Meatball2'},
'b':{'favourite':'coke2',
'dislike':'juice3'}
},
{
'a':{'lunch':'Apple4',
'breakfast':'Milk5',
'dinner':'Meatball4'},
'b':{'favourite':'coke2',
'dislike':'juice4'}
},
{
'a':{'lunch':'Apple3',
'breakfast':'Milk8',
'dinner':'Meatball7'},
'b':{'favourite':'coke4',
'dislike':'juice1'}
}
]
#or use another solutions, both are nice
L = [{k: v for x in d.values() for k, v in x.items()} for d in data]
df1 = pd.DataFrame(L)
print (df1)
breakfast dinner dislike favourite lunch
0 Milk Meatball juice coke Apple
1 Milk1 Meatball2 juice3 coke2 Apple1
2 Milk5 Meatball4 juice4 coke2 Apple4
3 Milk8 Meatball7 juice1 coke4 Apple3
df2 = df.join(df1)
print (df2)
Name Nationality Tall Age breakfast dinner dislike favourite
0 John USA 190 24 Milk Meatball juice coke
1 Thomas French 194 25 Milk1 Meatball2 juice3 coke2
2 Anton Malaysia 180 23 Milk5 Meatball4 juice4 coke2
3 Chris Argentina 190 26 Milk8 Meatball7 juice1 coke4
lunch
0 Apple
1 Apple1
2 Apple4
3 Apple3