Expand dictionaries into a DataFrame, then add them to the original DataFrame with new columns and the original data duplicated



I'm not sure how to describe this, so here is an example:

This is the original:

original = pd.DataFrame({'a':[{1:'hi',2:'there'}],'b':[1]})
   a                      b
0  {1: 'hi', 2: 'there'}  1

Here is the expected output, with the columns relabeled:

expected = pd.DataFrame({'numbers':[1,2],'text':['hi','there'],'b':[1,1]})
   b  numbers   text
0  1        1     hi
1  1        2  there

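Edit:

I tried to simplify the problem, and the solution works on the simplified example, but it does not work when applied to my data. Here is the data I am actually using, so that nothing gets lost in communication: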

record_1 = {'1': {
                 'url': 'https://www.politico.com/magazine', 
                 'title': 'Worst case '}, 
           '2': {
                 'url': 'https://www.nbcnews.com/pol', 
                 'title': 'Bad Night '},
           '3': {
                 'url': 'https://www.usatoday.com/stor', 
                 'title': "On the anniversary"
                 }}
record_2 = {'1': {
                 'url': 'https://www.nytimes.com/maga', 
                  'title': 'Bad Things Happ '}, 
            '2': {
                  'url': 'https://www.cnn.com/pols', 
                  'title': 'Best Night '}}               
original = pd.DataFrame([[1,record_1],[2,record_2]],columns=['position','news_results'])

   position                                       news_results
0         1  {'1': {'title': 'Worst case ', 'url': 'https:/...
1         2  {'1': {'title': 'Bad Things Happ ', 'url': 'ht...

Here is the expected result:

data = [[1,1,'https://www.politico.com/magazine','Worst case '],
       [1,2,'https://www.nbcnews.com/pol','Bad Night ',],
       [1,3,'https://www.usatoday.com/stor',"On the anniversary"],        
       [2,1,'https://www.nytimes.com/maga','Bad Things Happ '],
       [2,2,'https://www.cnn.com/pols','Best Night ']]
expected = pd.DataFrame(data,columns=['position','sub_rank','url','title'])
   position  sub_rank                                url               title
0         1         1  https://www.politico.com/magazine         Worst case 
1         1         2        https://www.nbcnews.com/pol          Bad Night 
2         1         3      https://www.usatoday.com/stor  On the anniversary
3         2         1       https://www.nytimes.com/maga    Bad Things Happ 
4         2         2           https://www.cnn.com/pols         Best Night 

Here is one way to do it. I still think you could use a better constructor to reach the expected output.

(original.set_index('b').a.apply(pd.Series).stack()
    .reset_index(name='text').rename(columns={'level_1':'numbers'}))
Out[1623]: 
   b  numbers   text
0  1        1     hi
1  1        2  there
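
As a rough sketch of the "better constructor" idea, the rows can also be built in plain Python and handed to pd.DataFrame directly, which avoids the apply/stack round trip. The names rows and result are only illustrative, and this assumes every cell in column a is a flat dict:

```python
import pandas as pd

original = pd.DataFrame({'a': [{1: 'hi', 2: 'there'}], 'b': [1]})

# One output row per (key, value) pair in each dict; 'b' is repeated
# for every pair that came from the same original row.
rows = [
    {'b': b, 'numbers': key, 'text': value}
    for mapping, b in zip(original['a'], original['b'])
    for key, value in mapping.items()
]

result = pd.DataFrame(rows, columns=['b', 'numbers', 'text'])
print(result)
```

This keeps the column names explicit, so no rename of level_1 is needed afterwards.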

Edit

original.set_index('position')['news_results'].apply(pd.Series).stack().apply(pd.Series).reset_index()
Out[1633]: 
   position level_1               title                                url
0         1       1         Worst case   https://www.politico.com/magazine
1         1       2          Bad Night         https://www.nbcnews.com/pol
2         1       3  On the anniversary      https://www.usatoday.com/stor
3         2       1    Bad Things Happ        https://www.nytimes.com/maga
4         2       2         Best Night            https://www.cnn.com/pols
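
The same row-building idea seems to carry over to the nested data. The following is a minimal sketch, not a definitive solution: it assumes every value in news_results is a dict of dicts with 'url' and 'title' keys, and it converts the string keys to integers to match the sub_rank column in the expected output. The names rows and result are made up for illustration.

```python
import pandas as pd

record_1 = {'1': {'url': 'https://www.politico.com/magazine', 'title': 'Worst case '},
            '2': {'url': 'https://www.nbcnews.com/pol', 'title': 'Bad Night '},
            '3': {'url': 'https://www.usatoday.com/stor', 'title': 'On the anniversary'}}
record_2 = {'1': {'url': 'https://www.nytimes.com/maga', 'title': 'Bad Things Happ '},
            '2': {'url': 'https://www.cnn.com/pols', 'title': 'Best Night '}}
original = pd.DataFrame([[1, record_1], [2, record_2]],
                        columns=['position', 'news_results'])

# One output row per inner dict; the outer key becomes sub_rank and
# the inner dict supplies the url and title columns.
rows = [
    {'position': position, 'sub_rank': int(rank), **fields}
    for position, results in zip(original['position'], original['news_results'])
    for rank, fields in results.items()
]

result = pd.DataFrame(rows, columns=['position', 'sub_rank', 'url', 'title'])
print(result)
```

Because the rows are built explicitly, records with different numbers of entries need no NaN handling, and the columns come out already named as in the expected frame.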
