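Background:
Given the following pandas df (note that the Holding Account values themselves contain `|` characters, escaped here as `\|`):

Holding Account | Model Type | Entity ID | Direct Owner ID
---|---|---|---
WF LLC \| 100 Jones Street 26th Floor San Francisco Ca Ltd Liability - Income Based Gross USA Only (486941515) | 51364633 | 4564564 | 5646546
RF LLC \| Neuberger \| LLC \| Aukai Services LLC-Neuberger Smid - Income Accuring Net of Fees Worldwide Fund (456456218) | 46256325 | 1645365 | 4926654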
Why not just use .str, as you did before?
df['Holding Account'] = df['Holding Account'].str[:80]
Output:
>>> df
Holding Account Model Type Entity ID Direct Owner ID
0 WF LLC | 100 Jones Street 26th Floor San Francisco Ca Ltd Liability - Income Bas 51364633 4564564 5646546
1 RF LLC | Neuberger | LLC | Aukai Services LLC-Neuberger Smid - Income Accuring N 46256325 1645365 4926654
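To reproduce the truncation without the original data, a minimal sketch could look like this. The two strings are rebuilt by hand from the sample rows and are only illustrative; `truncated` and `max_len` are names introduced here, not part of the original code.

```python
import pandas as pd

# Illustrative frame rebuilt from the sample rows; the real data may differ.
df = pd.DataFrame({
    "Holding Account": [
        "WF LLC | 100 Jones Street 26th Floor San Francisco Ca Ltd Liability"
        " - Income Based Gross USA Only (486941515)",
        "RF LLC | Neuberger | LLC | Aukai Services LLC-Neuberger Smid"
        " - Income Accuring Net of Fees Worldwide Fund (456456218)",
    ],
    "Model Type": [51364633, 46256325],
})

# .str[:80] slices every string in the column to its first 80 characters.
truncated = df["Holding Account"].str[:80]

# Longest remaining string length; at most 80 after slicing.
max_len = truncated.str.len().max()
```

Strings shorter than 80 characters pass through unchanged, so the slice only affects the long values.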
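Slicing throws away part of each value, so I would suggest factorizing the column and keeping a mapping table instead. This also saves storage space on the server or in the database.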
s = df['Holding Account'].factorize()[0]  # one integer code per row
d = dict(zip(s, df['Holding Account']))  # code -> original string; build this before overwriting the column
df['Holding Account'] = s
If you want to get the original values back, just do
df['new'] = df['Holding Account'].map(d)
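Putting the steps together, an end-to-end sketch might look like the following. The sample strings are shortened placeholders rather than the real data, and `mapping` is just an illustrative name for the lookup table. It builds the table from the second return value of `pd.factorize` (the unique values) instead of zipping, which should produce the same dictionary.

```python
import pandas as pd

# Placeholder data: shortened strings, with one repeated value.
df = pd.DataFrame({
    "Holding Account": [
        "WF LLC | 100 Jones Street",
        "RF LLC | Neuberger | LLC",
        "WF LLC | 100 Jones Street",
    ],
    "Entity ID": [4564564, 1645365, 4564564],
})

# codes: one integer per row; uniques: the distinct strings in order of first appearance.
codes, uniques = pd.factorize(df["Holding Account"])

# Lookup table from integer code back to the full original string.
mapping = dict(enumerate(uniques))

# Store only the compact integer codes in the frame.
df["Holding Account"] = codes

# Recover the full strings on demand, with nothing truncated.
df["new"] = df["Holding Account"].map(mapping)
```

One caveat: by default `factorize` codes missing values as -1, which is not a key in `mapping`, so those rows come back as NaN after `.map`.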