Pandas DataFrame join error



I have the following DataFrame:

columnName |            columnText              |       columnTextContents
----------------------------------------------------------------------------
Linda      | [{age:45, category:technical},     | [{city:Mexico,type:member}]
           |   {age:55, category:nontechnical}] |  
----------------------------------------------------------------------------
Richeal    | [{age:65, category:technical}]     | [{city:Mexico,type:member}]
----------------------------------------------------------------------------

In the DataFrame above, the second and third columns each hold a list of dicts, and I want to reshape it into the DataFrame below:

columnName | age  |  category    |  city    | type
--------------------------------------------------------
Linda      | 45   | technical    | Mexico   | member
--------------------------------------------------------
Linda      | 55   | nontechnical | NaN      | NaN
--------------------------------------------------------
Richeal    | 65   | technical    | Mexico   | member
--------------------------------------------------------

I wrote the code below, but it does not produce the expected output:

for k, v in zip(columnDataDF["columnText"].iteritems(), columnDataDF["columnTextContents"].iteritems()):
    tempDF = tempDF.append(pd.concat([pd.DataFrame.from_dict(k[1]), pd.DataFrame.from_dict(v[1])], axis=1))
columnDataDF = columnDataDF.drop('columnText', 1)
columnDataDF = columnDataDF.drop('columnTextContents', 1).join(tempDF)

This is the output the code above produces:

columnName | age  |  category    |  city    | type
--------------------------------------------------------
Linda      | 45   | technical    | Mexico   | member
--------------------------------------------------------
Linda      | 65   | technical    | Mexico   | member
--------------------------------------------------------
Richeal    |  55  | nontechnical | NaN      | NaN
--------------------------------------------------------

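When you build up tempDF row by row, you lose track of the original index: each small frame you append starts again at 0. The join then goes wrong because it matches rows on index labels, and those labels no longer correspond to the rows of columnDataDF.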
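To see the lost index concretely, here is a minimal sketch. The sample data is rebuilt from the tables in the question, the names `column_data_df`, `frames` and `temp_df` are illustrative, and `pd.concat` stands in for `DataFrame.append`, which was removed in pandas 2.0.

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption about its shape).
column_data_df = pd.DataFrame({
    "columnName": ["Linda", "Richeal"],
    "columnText": [
        [{"age": 45, "category": "technical"},
         {"age": 55, "category": "nontechnical"}],
        [{"age": 65, "category": "technical"}],
    ],
    "columnTextContents": [
        [{"city": "Mexico", "type": "member"}],
        [{"city": "Mexico", "type": "member"}],
    ],
})

# One small frame per original row, built the same way as in the question.
frames = [
    pd.concat([pd.DataFrame(text), pd.DataFrame(contents)], axis=1)
    for text, contents in zip(column_data_df["columnText"],
                              column_data_df["columnTextContents"])
]
temp_df = pd.concat(frames)

# Each small frame numbers its rows from 0, so the stacked labels no longer
# say which original row a record came from.
stacked_index = temp_df.index.tolist()
print(stacked_index)

# join() matches on those labels, so Richeal's record lands under Linda.
joined = column_data_df[["columnName"]].join(temp_df)
print(joined)
```

The stacked labels come out as 0, 1, 0, so both label-0 records (ages 45 and 65) attach to Linda and the label-1 record (age 55) attaches to Richeal, which is the wrong output shown in the question.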

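One way to fix this is to track the index manually and assign it to the final tempDF. The original row label is available as k[0] (or, equivalently, v[0]). Something like the following may work: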

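import pandas as pd

index = []
frames = []
# Series.items() replaces iteritems(), which was removed in pandas 2.0.
for k, v in zip(columnDataDF["columnText"].items(),
                columnDataDF["columnTextContents"].items()):
    frame = pd.concat([pd.DataFrame.from_dict(k[1]),
                       pd.DataFrame.from_dict(v[1])],
                      axis=1)
    # Repeat the original row label k[0] once per row of this frame.
    index.extend([k[0]] * len(frame))
    frames.append(frame)
# DataFrame.append was removed in pandas 2.0, so concatenate once instead.
tempDF = pd.concat(frames)
tempDF.index = index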
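
For reference, the index-tracking idea can be put together into a self-contained sketch that also performs the final join. It assumes pandas 2.x and rebuilds the sample data from the tables in the question. The snake_case names (`column_data_df`, `frames`, `temp_df`, `result`) are illustrative and not part of the original code.

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption about its shape).
column_data_df = pd.DataFrame({
    "columnName": ["Linda", "Richeal"],
    "columnText": [
        [{"age": 45, "category": "technical"},
         {"age": 55, "category": "nontechnical"}],
        [{"age": 65, "category": "technical"}],
    ],
    "columnTextContents": [
        [{"city": "Mexico", "type": "member"}],
        [{"city": "Mexico", "type": "member"}],
    ],
})

index = []
frames = []
for (label, text), (_, contents) in zip(
        column_data_df["columnText"].items(),
        column_data_df["columnTextContents"].items()):
    # Side-by-side frame for one original row; the shorter list is padded with NaN.
    frame = pd.concat([pd.DataFrame(text), pd.DataFrame(contents)], axis=1)
    # Remember which original row every record in this frame belongs to.
    index.extend([label] * len(frame))
    frames.append(frame)

temp_df = pd.concat(frames)
temp_df.index = index  # restore the original row labels

# With matching labels, join() attaches each record to the right name.
result = (column_data_df
          .drop(columns=["columnText", "columnTextContents"])
          .join(temp_df)
          .reset_index(drop=True))
print(result)
```

With the labels restored, Linda should get the records with ages 45 and 55 and Richeal the record with age 65, which matches the table the question asks for.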
