I am trying to split a dataset into train and test sets, and then I need to save them in .txt format.
This is the code so far:
import pandas as pd
from sklearn.model_selection import train_test_split
category = pd.read_csv('dataset.tsv', delimiter='\t', encoding='utf-8')
train, test = train_test_split(category, test_size=0.2)
test.to_csv('checkme.txt')
However, when I try this, it gives an error:
Traceback (most recent call last):
  File "splitter.py", line 8, in <module>
    test.to_csv('checkme.tsv')
  File "/home/abc/micro/micro/local/lib/python2.7/site-packages/pandas/core/frame.py", line 1745, in to_csv
    formatter.save()
  File "/home/abc/micro/micro/local/lib/python2.7/site-packages/pandas/io/formats/csvs.py", line 171, in save
    self._save()
  File "/home/abc/micro/micro/local/lib/python2.7/site-packages/pandas/io/formats/csvs.py", line 286, in _save
    self._save_chunk(start_i, end_i)
  File "/home/abc/micro/micro/local/lib/python2.7/site-packages/pandas/io/formats/csvs.py", line 313, in _save_chunk
    self.cols, self.writer)
  File "pandas/_libs/writers.pyx", line 64, in pandas._libs.writers.write_csv_rows
UnicodeEncodeError: 'ascii' codec can't encode character u'\u026a' in position 111: ordinal not in range(128)
What could be going wrong here, and how can I fix it?
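You need to write the DataFrame out with a Unicode encoding. The file is read as UTF-8, but to_csv is called without an encoding, so it falls back to ASCII on Python 2 and fails on the first non-ASCII character: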
test.to_csv('checkme.txt', sep='\t', encoding='utf-8')
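
For reference, here is a minimal sketch of the whole round trip: split, write both halves as tab-separated UTF-8 text, and read them back. The small in-memory DataFrame is a hypothetical stand-in for dataset.tsv, and the column names, output file names, random_state=0 and index=False are assumptions added for illustration, not part of the original code.

```python
# -*- coding: utf-8 -*-
import os
import tempfile

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for dataset.tsv: a few rows containing the
# non-ASCII character u'\u026a' that appears in the traceback.
category = pd.DataFrame({
    'word': [u'b\u026at', u's\u026at', u'f\u026at', u'h\u026at', u'k\u026at'],
    'label': [0, 1, 0, 1, 0],
})

# random_state is fixed only so the example is reproducible (assumption).
train, test = train_test_split(category, test_size=0.2, random_state=0)

# Write into a temporary directory so the sketch leaves no files behind.
out_dir = tempfile.mkdtemp()
train_path = os.path.join(out_dir, 'train.txt')
test_path = os.path.join(out_dir, 'test.txt')

# Pass the encoding explicitly; sep='\t' keeps the files tab-separated.
# index=False drops the row index column (assumption; omit it to keep the index).
train.to_csv(train_path, sep='\t', encoding='utf-8', index=False)
test.to_csv(test_path, sep='\t', encoding='utf-8', index=False)

# Read the files back with the same separator and encoding to check them.
train_back = pd.read_csv(train_path, sep='\t', encoding='utf-8')
test_back = pd.read_csv(test_path, sep='\t', encoding='utf-8')
```

On Python 3, to_csv already defaults to UTF-8, so the error is specific to Python 2, but passing encoding='utf-8' explicitly works on both.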