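I am trying to use np.savetxt() to save an array as a text file, but I get an error: UnicodeEncodeError: 'latin-1' codec can't encode character '\u1ec7' in position 15: ordinal not in range(256)
I looked up the character "\u1ec7"; it is a Latin small letter e with a circumflex and a dot below (ệ).
I tried to remove it from the text in the array with x = x.replace("[^a-zA-Z#]", " "), but the error persists.
What exactly is this error, and what can I do to fix it? Here is my code: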
import numpy as np

# X_train and y_train are defined earlier (not shown)
duplicate = X_train[y_train == 1]
not_duplicate = X_train[y_train == 0]
# Interleave question1/question2 of each pair into one flat array
p = np.dstack([duplicate['question1'], duplicate['question2']]).flatten()
n = np.dstack([not_duplicate['question1'], not_duplicate['question2']]).flatten()
print("Number of data points in class 1 (duplicate pairs) :", len(p))
print("Number of data points in class 0 (non duplicate pairs) :", len(n))
# Save each array to a text file
np.savetxt('train_p.txt', p, delimiter=' ', fmt='%s', encoding='latin-1')
np.savetxt('train_n.txt', n, delimiter=' ', fmt='%s', encoding='latin-1')
The variable p:
array(['how can i solve an encrypted text ',
'where should i start to solve this encrypted text ',
'how do i skip a class ', ..., 'how do know that you are in love ',
'which is most beautiful place to visit in kerala ',
'which place in kerala is most beautiful '], dtype=object)
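On the replace attempt in the question: a minimal sketch, assuming x is a plain Python str (the question does not say what type x is). str.replace treats its first argument as a literal substring, not as a regular expression, so a pattern such as "[^a-zA-Z#]" matches nothing and the text is left as it was. The sample string below is hypothetical.

```python
import re

# Hypothetical sample text containing U+1EC7 (not taken from the question's data)
text = "vi\u1ec7t nam"

# str.replace searches for the literal substring "[^a-zA-Z#]", so nothing changes
unchanged = text.replace("[^a-zA-Z#]", " ")

# re.sub interprets the pattern as a regular expression and replaces
# every character that is not an ASCII letter or '#' with a space
cleaned = re.sub(r"[^a-zA-Z#]", " ", text)

print(repr(unchanged))
print(repr(cleaned))
```

Note that this discards the accented character entirely; writing the file in an encoding that can represent it keeps the text intact.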
It looks like simply omitting the encoding parameter is enough:
In [171]: '\u1ec7'
Out[171]: 'ệ'
In [172]: txt = ' '.join(['abc',_,_,'def',_])
In [173]: txt
Out[173]: 'abc ệ ệ def ệ'
Works:
In [174]: np.savetxt('test.txt', [txt], fmt='%s')
In [175]: cat test.txt
abc ệ ệ def ệ
Does not work:
In [176]: np.savetxt('test.txt', [txt], fmt='%s', encoding='latin-1')
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-176-8ba623098d70> in <module>
----> 1 np.savetxt('test.txt', [txt], fmt='%s', encoding='latin-1')
<__array_function__ internals> in savetxt(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in savetxt(fname, X, fmt, delimiter, newline, header, footer, comments, encoding)
1450 file : str or file
1451 Filename or file object to read.
-> 1452 regexp : str or regexp
1453 Regular expression used to parse the file.
1454 Groups in the regular expression correspond to fields in the dtype.
UnicodeEncodeError: 'latin-1' codec can't encode character '\u1ec7' in position 4: ordinal not in range(256)
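The "ordinal not in range(256)" part of the message can be checked directly. The sketch below uses only built-in str methods; latin-1 maps each character to a single byte, so it can only represent code points 0 through 255, while UTF-8 can encode any code point.

```python
ch = "\u1ec7"

# Code point of the character: 0x1EC7 == 7879, far above the latin-1 limit of 255
code_point = ord(ch)

# latin-1 cannot represent it, so encoding raises UnicodeEncodeError
try:
    ch.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 encodes the same character as three bytes
utf8_bytes = ch.encode("utf-8")

print(code_point, latin1_ok, utf8_bytes)
```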
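The default value of encoding is None, which is passed on to the io.open function. With None, io.open falls back to the locale's preferred encoding, which is UTF-8 on this machine: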
In [185]: f = open('test','w', encoding=None)
In [186]: f
Out[186]: <_io.TextIOWrapper name='test' mode='w' encoding='UTF-8'>
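Because the fallback depends on the locale, it may be safer to name the encoding explicitly rather than rely on the default. A minimal sketch, assuming UTF-8 is acceptable for the output file; the sample rows and the temporary path are made up for illustration:

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data; one row contains U+1EC7
rows = np.array(["abc \u1ec7 def", "plain ascii"], dtype=object)

# Write to a temporary directory so the example leaves the working directory alone
path = os.path.join(tempfile.mkdtemp(), "train_p.txt")

# An explicit UTF-8 encoding makes the result independent of the locale
np.savetxt(path, rows, fmt="%s", encoding="utf-8")

# Read the file back with the same encoding to confirm the round trip
with open(path, encoding="utf-8") as fh:
    lines = fh.read().splitlines()

print(lines)
```

Whatever reads the file later has to use the same encoding.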