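Is there a clean way to turn a wrongly converted binary string back into bytes so it can be decoded? I have headers like these in a tab-separated (TSV) file: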
import pandas as pd
tsv_file = r"C:\Users\ruser\Downloads\mydata.tsv"  # raw string so the backslashes stay literal
tsv_table = pd.read_table(tsv_file, sep='\t')
print(tsv_table.columns)
>>> Index(["b'time (s)'", "b'Red (mN)'", "b'Blue (mN)'", "b'Green (mN)'",
       "b'Pink (mN)'"],
      dtype='object')
I'd like a clean way to handle this that doesn't involve manual string manipulation. Is there one?
Edit: I thought about just using
tsv_table.columns.str.decode('utf-8')
but the labels aren't UTF-8 bytes, are they? They are strings, so we end up with NaNs:
print(tsv_table.columns.str.decode('utf-8'))
Float64Index([nan, nan, nan, nan, nan], dtype='float64')
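The NaNs come from the labels being ordinary `str` objects whose text merely looks like a bytes literal, while `.str.decode` expects real `bytes`. A quick check in plain Python, using one header copied from the output above (the `real` value is an assumption about what the header was before it was written to the file):

```python
label = "b'time (s)'"            # what pandas actually stored: an 11-character str
print(type(label).__name__)      # str
print(hasattr(label, "decode"))  # False: only bytes objects have .decode()

real = b"time (s)"               # assumed original bytes value of the header
print(real.decode("utf-8"))      # time (s)
```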
Edit 2:
Contents of mydata.tsv:
b'time (s)' b'Red (mN)' b'Blue (mN)' b'Green (mN)' b'Pink (mN)'
0.0 28.0393760805021 29.350510817307736 0.5422318347392547 1.1041605247641542
0.010000008061766026 1.1736308159200206 29.327035757211547 0.5235093941717537 1.1041605247641542
0.02000001612353205 1.1736308159200206 29.373985877403868 0.5422318347392547 1.1425474154873996
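Headers like these typically appear when `bytes` objects are written to a text file: the writer calls `str()` on each field, and `str(b'time (s)')` is the literal text `b'time (s)'`. The question does not say how mydata.tsv was produced, so the following is only an assumed reconstruction using the standard `csv` module; the header values are hypothetical. Decoding before writing avoids the problem at the source:

```python
import csv
import io

header = [b"time (s)", b"Red (mN)"]  # hypothetical bytes headers

# Writing bytes directly: csv.writer converts each non-str field with str()
broken = io.StringIO()
csv.writer(broken, delimiter="\t", lineterminator="\n").writerow(header)
print(repr(broken.getvalue()))  # "b'time (s)'\tb'Red (mN)'\n"

# Decoding first yields clean column names
fixed = io.StringIO()
csv.writer(fixed, delimiter="\t", lineterminator="\n").writerow(
    [h.decode("utf-8") for h in header]
)
print(repr(fixed.getvalue()))  # 'time (s)\tRed (mN)\n'
```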
Convert the Python-literal strings into actual bytes objects, then decode them:
import pandas as pd
import ast
table = pd.read_table('mydata.tsv', sep='\t')
table.columns = [ast.literal_eval(x).decode('utf8') for x in table.columns]
print(table)
time (s) Red (mN) Blue (mN) Green (mN) Pink (mN)
0 0.00 28.039376 29.350511 0.542232 1.104161
1 0.01 1.173631 29.327036 0.523509 1.104161
2 0.02 1.173631 29.373986 0.542232 1.142547
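The same idea can be wrapped in a small helper and passed to `DataFrame.rename`, which calls it once per column label. This is a sketch: `decode_bytes_repr` is a hypothetical name, the inline sample stands in for mydata.tsv, and labels that are not bytes literals are returned unchanged instead of raising, so it can be applied to a mix of clean and broken headers:

```python
import ast
import io

import pandas as pd


def decode_bytes_repr(label, encoding="utf-8"):
    """Turn a header such as "b'time (s)'" into "time (s)"; leave other labels as-is."""
    if isinstance(label, str) and label.startswith(("b'", 'b"')):
        try:
            value = ast.literal_eval(label)  # parses literals only, never runs code
        except (ValueError, SyntaxError):
            return label  # looked like a bytes literal but is not valid Python
        if isinstance(value, bytes):
            return value.decode(encoding)
    return label


# Hypothetical two-row sample in the same layout as mydata.tsv
sample = (
    "b'time (s)'\tb'Red (mN)'\tBlue (mN)\n"
    "0.0\t28.0393760805021\t29.350510817307736\n"
)
table = pd.read_csv(io.StringIO(sample), sep="\t")
table = table.rename(columns=decode_bytes_repr)
print(list(table.columns))  # ['time (s)', 'Red (mN)', 'Blue (mN)']

# literal_eval also resolves escape sequences, which slicing off b'...' would not
escaped = "b'Force (\\xc2\\xb5N)'"  # hypothetical header with a non-ASCII unit
print(decode_bytes_repr(escaped))  # Force (µN)
print(escaped[2:-1])               # Force (\xc2\xb5N)
```

Using `ast.literal_eval` rather than stripping the `b'` and `'` by hand is what makes escaped non-ASCII bytes come out as the intended characters.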