'ascii' codec can't encode character u'\u2026' in position 115: ordinal not in range(128) when converting an RDD to a DataFrame: PySpark: Azure



I get an error in PySpark when converting my CSV file to a DataFrame.

import csv

read_rdd = sc.textFile("path to my container/myfile.csv")
intermediate_rdd = read_rdd.mapPartitions(lambda x: csv.reader(x, delimiter=","))
header = intermediate_rdd.first()
data_1 = intermediate_rdd.filter(lambda row: row != header).toDF(header)
data_1.show(5)

Error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 115: ordinal not in range(128)

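import csv

read_rdd = sc.textFile("path/to/file")
# Parse each partition's lines into CSV rows (lists of strings)
intermediate_rdd = read_rdd.mapPartitions(lambda x: csv.reader(x, delimiter=","))
# The first row holds the column names; it must be read before it is used below
header = intermediate_rdd.first()
# Drop the header row and use it as the DataFrame's column names
data = intermediate_rdd.filter(lambda row: row != header).toDF(header)
data.show(20)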
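The `u'...'` literal in the traceback suggests Python 2. There, `sc.textFile` yields `unicode` lines, but the Python 2 `csv` module only reads byte strings, so the lines get implicitly encoded as ASCII and the parse fails on characters such as `\u2026` (the ellipsis "…"). Below is a minimal sketch of a workaround, adapted from the `unicode_csv_reader` recipe in the Python 2 `csv` documentation and assuming the file is UTF-8 encoded: encode each line to UTF-8 before parsing, then decode every cell back. The helper name `parse_csv_partition` is hypothetical. On Python 3 it simply delegates to `csv.reader`, which handles `str` natively.

```python
import csv
import sys


def parse_csv_partition(lines, delimiter=","):
    """Parse an iterator of text lines into rows of text cells.

    Hypothetical helper intended for rdd.mapPartitions; assumes UTF-8 input.
    """
    if sys.version_info[0] >= 3:
        # Python 3: csv.reader accepts str lines directly
        for row in csv.reader(lines, delimiter=delimiter):
            yield row
    else:
        # Python 2: csv only handles byte strings, so encode each unicode
        # line to UTF-8 before parsing (avoids the implicit ASCII encode)...
        encoded = (line.encode("utf-8") for line in lines)
        for row in csv.reader(encoded, delimiter=delimiter):
            # ...and decode every cell back to unicode afterwards
            yield [cell.decode("utf-8") for cell in row]


# Usage on plain Python data (no Spark needed)
sample = [u"name,note", u'a,"x, y \u2026"']
for row in parse_csv_partition(sample):
    print(row)
```

In the Spark job it would replace the lambda: `intermediate_rdd = read_rdd.mapPartitions(parse_csv_partition)`. The header comparison in `filter` still works because the header and the data rows then hold the same (unicode) cell type. On Spark 2.0 or later, `spark.read.csv("path/to/file", header=True)` is another option: it parses the file with Spark's built-in CSV reader and bypasses the Python `csv` module entirely.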