How do I fix a UnicodeEncodeError when using np.savetxt()?

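I am trying to save an array as a text file with np.savetxt(), but I get an error: UnicodeEncodeError: 'latin-1' codec can't encode character '\u1ec7' in position 15: ordinal not in range(256)

I looked up the character "\u1ec7": it is LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW (ệ).

I tried to remove it from the text in the array with x = x.replace("[^a-zA-Z#]", " "), but the error persists.

What exactly is this error, and what can I do to fix it? Here is my code: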

duplicate = X_train[y_train == 1]
not_duplicate = X_train[y_train == 0]
p = np.dstack([duplicate['question1'], duplicate['question2']]).flatten()
n = np.dstack([not_duplicate['question1'], not_duplicate['question2']]).flatten()
print ("Number of data points in class 1 (duplicate pairs) :",len(p))
print ("Number of data points in class 0 (non duplicate pairs) :",len(n))
#Saving the np array into a text file
np.savetxt('train_p.txt', p, delimiter=' ', fmt='%s', encoding = 'latin-1')
np.savetxt('train_n.txt', n, delimiter=' ', fmt='%s', encoding = 'latin-1')

The variable p:

array(['how can i solve an encrypted  text  ',
'where should i start to solve this encrypted  text  ',
'how do i skip a class ', ..., 'how do know that you are in love ',
'which is most beautiful place to visit  in kerala ',
'which place in kerala is most beautiful '], dtype=object)

It looks like simply omitting the encoding parameter works:

In [171]: '\u1ec7'
Out[171]: 'ệ'
In [172]: txt = ' '.join(['abc',_,_,'def',_])                                   
In [173]: txt                                                                   
Out[173]: 'abc ệ ệ def ệ'

This works:

In [174]: np.savetxt('test.txt', [txt], fmt='%s')                               
In [175]: cat test.txt                                                          
abc ệ ệ def ệ

This does not:

In [176]: np.savetxt('test.txt', [txt], fmt='%s', encoding='latin-1')           
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-176-8ba623098d70> in <module>
----> 1 np.savetxt('test.txt', [txt], fmt='%s', encoding='latin-1')
<__array_function__ internals> in savetxt(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in savetxt(fname, X, fmt, delimiter, newline, header, footer, comments, encoding)
1450     file : str or file
1451         Filename or file object to read.
-> 1452     regexp : str or regexp
1453         Regular expression used to parse the file.
1454         Groups in the regular expression correspond to fields in the dtype.
UnicodeEncodeError: 'latin-1' codec can't encode character '\u1ec7' in position 4: ordinal not in range(256)
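
As for what the error means: latin-1 can only represent code points 0 through 255, and 'ệ' lies far outside that range. The following is a minimal standard-library sketch of that point, not taken from the original post; the variable names are made up for illustration.

```python
# Illustrative sketch: why 'latin-1' cannot encode U+1EC7 while UTF-8 can.
ch = '\u1ec7'  # LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW

code_point = ord(ch)             # 0x1EC7, i.e. 7879
fits_latin1 = code_point < 256   # latin-1 only covers code points 0..255

try:
    ch.encode('latin-1')
    latin1_error = None
except UnicodeEncodeError as exc:
    # Same exception type that np.savetxt surfaces when writing the file
    latin1_error = exc

utf8_bytes = ch.encode('utf-8')  # UTF-8 can represent every Unicode code point

print(hex(code_point), fits_latin1)
print(latin1_error)
print(utf8_bytes)
```

The "ordinal not in range(256)" part of the message is exactly this check failing.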

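The default value of encoding is None, which is passed on to io.open. With encoding=None, open falls back to the platform's default text encoding, which is UTF-8 on this system: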

In [185]: f = open('test','w', encoding=None)                                   
In [186]: f                                                                     
Out[186]: <_io.TextIOWrapper name='test' mode='w' encoding='UTF-8'>
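
Since that default depends on the platform, it may be safer to name the encoding explicitly. Below is a hedged sketch of two options, using a small stand-in array rather than the asker's data; the file names and the temporary directory are assumptions for illustration. The second option also addresses the attempted cleanup in the question: assuming x is a plain Python string, str.replace treats "[^a-zA-Z#]" as a literal substring rather than a regular expression, so nothing is removed, whereas re.sub applies it as a pattern.

```python
import os
import re
import tempfile

import numpy as np

# Stand-in data (assumption): an object array of strings like the asker's p
texts = np.array(['abc \u1ec7 def', 'plain ascii'], dtype=object)
out_dir = tempfile.mkdtemp()

# Option 1: keep the characters and state the encoding explicitly.
utf8_path = os.path.join(out_dir, 'train_utf8.txt')
np.savetxt(utf8_path, texts, fmt='%s', encoding='utf-8')
with open(utf8_path, encoding='utf-8') as fh:
    utf8_lines = fh.read().splitlines()

# Option 2: replace everything except a-z, A-Z and '#' with a space,
# after which latin-1 can encode the result.
# re.sub treats the pattern as a regex; str.replace would search for
# the literal text "[^a-zA-Z#]" and find nothing.
cleaned = np.array([re.sub(r'[^a-zA-Z#]', ' ', t) for t in texts], dtype=object)
latin1_path = os.path.join(out_dir, 'train_latin1.txt')
np.savetxt(latin1_path, cleaned, fmt='%s', encoding='latin-1')
with open(latin1_path, encoding='latin-1') as fh:
    latin1_lines = fh.read().splitlines()
```

Option 1 preserves the original text; option 2 discards every character outside the allowed set, including digits and punctuation, so it only fits if that loss is acceptable.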
