Efficient way to store a list of nested dicts



I have a DataFrame like this:

Name    Nationality    Tall    Age
John    USA            190     24
Thomas  French         194     25
Anton   Malaysia       180     23
Chris   Argentina      190     26

I also receive an incoming data structure like the one below, where each element corresponds to one row of the DataFrame:

data = [{
         'food':{'lunch':'Apple',
                'breakfast':'Milk',
                'dinner':'Meatball'},
         'drink':{'favourite':'coke',
                   'dislike':'juice'}
         },
         # ... and 3 other records
       ]

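`data` is a variable that holds the food and drink predictions from my machine learning model. There are more records (about 400k rows), but I process them in batches (currently 2K rows per iteration). The expected result looks like this: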

Name    Nationality    Tall    Age Lunch Breakfast Dinner   Favourite Dislike
John    USA            190     24  Apple Milk      Meatball Coke      Juice
Thomas  French         194     25  ....
Anton   Malaysia       180     23  ....
Chris   Argentina      190     26  ....

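Is there an efficient way to build that DataFrame? So far I have tried iterating over the `data` variable and pulling out the value of each predicted label, but that process seems to take a lot of time.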

You need to flatten the dictionaries first, then create a DataFrame from them and join it to the original:

data = [{
         'a':{'lunch':'Apple',
                'breakfast':'Milk',
                'dinner':'Meatball'},
         'b':{'favourite':'coke',
              'dislike':'juice'}
         },
         {
         'a':{'lunch':'Apple1',
                'breakfast':'Milk1',
                'dinner':'Meatball2'},
         'b':{'favourite':'coke2',
              'dislike':'juice3'}
         },
         {
         'a':{'lunch':'Apple4',
                'breakfast':'Milk5',
                'dinner':'Meatball4'},
         'b':{'favourite':'coke2',
              'dislike':'juice4'}
         },
         {
         'a':{'lunch':'Apple3',
                'breakfast':'Milk8',
                'dinner':'Meatball7'},
         'b':{'favourite':'coke4',
              'dislike':'juice1'}
         }
]

import pandas as pd

# flatten each record by merging its inner dicts into one dict
# (other flattening approaches work as well)
L = [{k: v for x in d.values() for k, v in x.items()} for d in data]
df1 = pd.DataFrame(L)
print(df1)
  breakfast     dinner dislike favourite   lunch
0      Milk   Meatball   juice      coke   Apple
1     Milk1  Meatball2  juice3     coke2  Apple1
2     Milk5  Meatball4  juice4     coke2  Apple4
3     Milk8  Meatball7  juice1     coke4  Apple3
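As an alternative to the comprehension, `pd.json_normalize` can do the flattening. The following is only a minimal sketch, assuming pandas 1.0 or newer and the same nested layout as `data`; the variable name `flat` and the shortened two-record sample are illustrative. Note that the comprehension silently overwrites values if two inner dicts share a key, whereas `json_normalize` keeps them apart by prefixing the outer key.

```python
import pandas as pd

# Shortened sample with the same nested layout as `data` in the answer.
sample = [
    {"a": {"lunch": "Apple", "breakfast": "Milk", "dinner": "Meatball"},
     "b": {"favourite": "coke", "dislike": "juice"}},
    {"a": {"lunch": "Apple1", "breakfast": "Milk1", "dinner": "Meatball2"},
     "b": {"favourite": "coke2", "dislike": "juice3"}},
]

# json_normalize flattens nested dicts; columns are named "outer.inner",
# for example "a.lunch" and "b.dislike".
flat = pd.json_normalize(sample)

# Optional: drop the outer-key prefix so the names match the inner keys.
# Only safe when the inner keys are unique across the outer dicts.
flat.columns = [c.split(".", 1)[1] for c in flat.columns]
```

Whether this is faster than the comprehension for 2K-row batches is not something the example shows; timing both on real data would be the way to decide.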

# df is the original DataFrame from the question; join() aligns rows on the index
df2 = df.join(df1)
print(df2)
     Name Nationality  Tall  Age breakfast     dinner dislike favourite  
0    John         USA   190   24      Milk   Meatball   juice      coke   
1  Thomas      French   194   25     Milk1  Meatball2  juice3     coke2   
2   Anton    Malaysia   180   23     Milk5  Meatball4  juice4     coke2   
3   Chris   Argentina   190   26     Milk8  Meatball7  juice1     coke4   
    lunch  
0   Apple  
1  Apple1  
2  Apple4  
3  Apple3  
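Because the question processes the rows in batches, one detail is worth a sketch: `join` aligns on the index, so a batch taken from the middle of the frame would not line up with a freshly built prediction frame whose index starts at 0. The code below is one possible way to handle that, assuming the i-th prediction in a batch belongs to the i-th row of that batch. The helper names `flatten` and `attach_predictions`, the small sample frame, and the batch size of 2 are all hypothetical.

```python
import pandas as pd


def flatten(record):
    """Merge the inner dicts of one prediction record into a single dict."""
    return {k: v for inner in record.values() for k, v in inner.items()}


def attach_predictions(batch_df, batch_data):
    """Join flattened predictions onto one batch of the original frame.

    Assumes batch_data[i] belongs to the i-th row of batch_df.
    """
    # Reuse the batch's index so join() aligns correctly even when the
    # batch is a slice such as df.iloc[2000:4000].
    preds = pd.DataFrame([flatten(d) for d in batch_data], index=batch_df.index)
    return batch_df.join(preds)


# Small stand-in for the real 400k-row frame and its predictions.
df = pd.DataFrame({
    "Name": ["John", "Thomas", "Anton", "Chris"],
    "Age": [24, 25, 23, 26],
})
data = [
    {"a": {"lunch": "Apple"}, "b": {"dislike": "juice"}},
    {"a": {"lunch": "Apple1"}, "b": {"dislike": "juice3"}},
    {"a": {"lunch": "Apple4"}, "b": {"dislike": "juice4"}},
    {"a": {"lunch": "Apple3"}, "b": {"dislike": "juice1"}},
]

batch_size = 2  # the question uses 2K; 2 keeps the example small
parts = [
    attach_predictions(df.iloc[i:i + batch_size], data[i:i + batch_size])
    for i in range(0, len(df), batch_size)
]
result = pd.concat(parts)
```

Collecting the per-batch frames in a list and calling `pd.concat` once at the end avoids repeatedly copying a growing DataFrame inside the loop.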
