How do I convert a bytes object to a string that ends at "\x00" in Python?



I have a bytes object that I want to convert to a string.

Consider this pandas DataFrame:

In [19]: a
Out[19]: 
                                tk         sec    usec      bp1      bp2      bp3      bp4      bp5      ap1      ap2  ...  as1  as2  as3  as4  as5       lp           amt  ls     vol     oi
0                        b'ZN2106'  1619743523  646104  21920.0  21915.0  21910.0  21905.0  21900.0  21930.0  21935.0  ...   11    5    3    8    3  21930.0  1.642792e+10   0  149210  96841
1         b'ZN2106\x0010250\x0009'  1619744401  684254  21935.0  21930.0  21925.0  21920.0  21910.0  21940.0  21945.0  ...    1    8    3    3   17  21940.0  1.642990e+10   0  149228  96843
2          b'ZN2106\x0016750\x009'  1619744402  319044  21940.0  21935.0  21930.0  21925.0  21920.0  21945.0  21950.0  ...    1    1    6    1   13  21940.0  1.643615e+10   0  149285  96829
3         b'ZN2106\x0014750\x0009'  1619744403  422966  21945.0  21940.0  21935.0  21930.0  21925.0  21950.0  21955.0  ...    7    5   11    4   15  21940.0  1.644120e+10   0  149331  96838
4          b'ZN2106\x0012750\x002'  1619744403  883381  21945.0  21940.0  21935.0  21930.0  21925.0  21955.0  21960.0  ...    3    7    6   16   59  21950.0  1.644647e+10   0  149379  96846
...                            ...         ...     ...      ...      ...      ...      ...      ...      ...      ...  ...  ...  ...  ...  ...  ...      ...           ...  ..     ...    ...
20343      b'ZN2106\x0067000\x009'  1619765999  791039  21795.0  21790.0  21785.0  21780.0  21775.0  21800.0  21805.0  ...   95   12    2   11   14  21795.0  2.768403e+10   0  252355  85339
20344  b'ZN2106\x0061000\x00\x000'  1619766000  302063  21795.0  21790.0  21785.0  21780.0  21775.0  21800.0  21805.0  ...   93   13    2   11   14  21800.0  2.768424e+10   0  252357  85339
20345     b'ZN2106\x0013750\x0010'  1619766000  781186  21795.0  21790.0  21785.0  21780.0  21775.0  21800.0  21805.0  ...   93   13    2   11   14  21795.0  2.768435e+10   0  252358  85338
20346      b'ZN2106\x0019000\x009'  1619766001  317317  21795.0  21790.0  21785.0  21780.0  21775.0  21800.0  21805.0  ...   92   13    2   11   14  21795.0  2.768490e+10   0  252363  85338
20347           b'ZN2106\x0019000'  1619766002  518211  21795.0  21790.0  21785.0  21780.0  21775.0  21800.0  21805.0  ...   92   13    2   11   14  21795.0  2.768490e+10   0  252363  85338
[20348 rows x 28 columns]

The tk column holds bytes objects, and I want to turn them into strings.

I tried:

a['tk'].str.decode('utf-8')

But I got:

In [17]: a['tk'].str.decode('utf-8')
Out[17]: 
0                 ZN2106
1        ZN21061025009
2         ZN2106167509
3        ZN21061475009
4         ZN2106127502
...       
20343     ZN2106670009
20344    ZN2106610000
20345    ZN21061375010
20346     ZN2106190009
20347       ZN210619000
Name: tk, Length: 20348, dtype: object

This is not what I want. Look at the second row:

I want ZN2106, but it gives me 'ZN21061025009'.

The decode call does not treat "\x00" as the end of the string. How can I fix this?

Try apply:

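import pandas as pd

# Sample bytes values; \x00 is the NUL byte that marks the end of the ticker
a = pd.DataFrame([b'ZN2106', b'ZN2106\x0010250\x0009'], columns=['tk'])
print(a)
print()
# Decode each value, then keep only the text before the first NUL character
print(a['tk'].apply(lambda x: x.decode('utf8').split('\x00')[0]))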

Output:

                         tk
0                 b'ZN2106'
1  b'ZN2106\x0010250\x0009'

0    ZN2106
1    ZN2106
Name: tk, dtype: object
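
As a possible variation on the answer above, the value can be cut at the first NUL byte before decoding, so the bytes after the terminator are never decoded at all. The sketch below first checks that decode alone keeps the NUL characters (they are only invisible when printed), then applies the bytes-level cut. The helper name c_string and the variable names decoded and tk_clean are made up for this illustration, and the sample data is limited to the two values used in the answer.

```python
import pandas as pd

# Two sample values modeled on the tk column; \x00 is the NUL byte.
a = pd.DataFrame({'tk': [b'ZN2106', b'ZN2106\x0010250\x0009']})

# decode() keeps the NUL characters; they just do not show up when printed.
decoded = a['tk'].str.decode('utf-8')
print(decoded.map(len).tolist())   # [6, 15]

# Hypothetical helper: keep the bytes before the first NUL, then decode them.
def c_string(raw: bytes) -> str:
    return raw.partition(b'\x00')[0].decode('utf-8')

tk_clean = a['tk'].map(c_string)
print(tk_clean.tolist())           # ['ZN2106', 'ZN2106']
```

bytes.partition returns the whole value as the first element when no NUL byte is present, so rows such as b'ZN2106' pass through unchanged.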
