Python pandas: splitting a column of a large table



I have a large table (4M rows and 20 columns). In one particular column, each cell holds a list, like this:

                                        8 
0       [key1=it, key3=domain, key6=0001]                                                                                              
1                             [key2=home]
2                [key4=pippo, key5=pluto]

Given a list of keys, keys=[], I would like to efficiently replace column "8" with one column per key, like this:

       key1  key2    key3   key4  key5  key6
0        it  None  domain   None  None  0001
1      None  home    None   None  None  None
2      None  None    None  pippo pluto  None

Thanks!

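# Split each 'key=value' string into a [key, value] pair
s = lambda x: x.split('=')
rows = df.loc[:, 8].values.tolist()
# Build one dict per row; keys missing from a row become NaN
pd.DataFrame([dict(map(s, r)) for r in rows])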
  key1  key2    key3   key4   key5  key6
0   it   NaN  domain    NaN    NaN  0001
1  NaN  home     NaN    NaN    NaN   NaN
2  NaN   NaN     NaN  pippo  pluto   NaN

Setup

df = pd.Series([
        ['key1=it', 'key3=domain', 'key6=0001'],
        ['key2=home'],
        ['key4=pippo', 'key5=pluto']
    ]).to_frame(8)
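
The question also asks to limit the result to a given list of keys and to replace column 8 in the original table. A minimal sketch of that step, building on the answer above: the extra column 0 and the contents of `keys` (including the unused `key7`) are made up for illustration, and `reindex` and `join` are used here as one possible way to do it, not necessarily the fastest.

```python
import pandas as pd

df = pd.Series([
    ['key1=it', 'key3=domain', 'key6=0001'],
    ['key2=home'],
    ['key4=pippo', 'key5=pluto'],
]).to_frame(8)
df[0] = ['a', 'b', 'c']  # hypothetical extra column standing in for the other 19

# Hypothetical key list; key7 never occurs in the data
keys = ['key1', 'key2', 'key3', 'key4', 'key5', 'key6', 'key7']

# One dict per row, built from the 'key=value' strings
records = [dict(item.split('=') for item in row) for row in df[8].tolist()]

# Keep only the requested keys, in the requested order; absent keys become NaN columns
expanded = pd.DataFrame(records, index=df.index).reindex(columns=keys)

# Swap column 8 for the expanded columns
result = df.drop(columns=8).join(expanded)
print(result)
```

Passing `index=df.index` keeps the new columns aligned with the original rows, which matters if the table's index is not a plain 0..n-1 range.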

I worked around the bad rows this way, but it uses a for loop:

        # Split each 'key=value' string into a [key, value] pair
        self.s = lambda x: x.split('=')
        self.rows = self.df.loc[:, 8].values.tolist()
        dictList8 = []
        for self.r in self.rows:
            try:
                dictList8.append(dict(map(self.s, self.r)))
            except Exception:
                # Malformed row: record a marker instead of failing
                dictList8.append({'skipped': 'True'})
        self.dfMod8 = pd.DataFrame(dictList8)
        del self.df[8]

Any ideas on how to make it faster?
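
One possible direction, sketched under assumptions: move the bad-row check into a small helper so the rows can go through a list comprehension again. `parse_row`, `SKIPPED`, and the sample data are hypothetical names and values, and the helper treats a row as bad when it is not a list or when an item does not contain exactly one '='. This is still a Python-level loop, and whether it beats the try/except version on 4M rows is untested here, so it is worth timing on a sample first.

```python
import pandas as pd

SKIPPED = {'skipped': 'True'}  # same marker as in the loop above

def parse_row(items):
    """Turn ['k=v', ...] into {'k': 'v', ...}; return the marker for bad rows."""
    if not isinstance(items, (list, tuple)):
        return dict(SKIPPED)
    parsed = {}
    for item in items:
        # Require a string with exactly one '=' so split yields a [key, value] pair
        if not isinstance(item, str) or item.count('=') != 1:
            return dict(SKIPPED)
        key, value = item.split('=')
        parsed[key] = value
    return parsed

# Hypothetical sample with two bad rows (None and an item without '=')
df = pd.DataFrame({
    0: ['a', 'b', 'c', 'd'],
    8: [['key1=it', 'key3=domain', 'key6=0001'], ['key2=home'], None, ['broken']],
})

records = [parse_row(r) for r in df[8].tolist()]
expanded = pd.DataFrame(records, index=df.index)
result = df.drop(columns=8).join(expanded)
print(result)
```

Good rows get NaN in the `skipped` column, so the bad rows can be filtered afterwards with `result['skipped'].notna()`.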
