utf-8 and cp1254 encoding error in Python



Even after renaming the columns İl (city) and İlçe (district) in my data to UTF-8, I still get the following error:

Usecols do not match columns, columns expected but not found: ['İl', 'İlçe']

My code reads the Turkish letters İ and ç as "Ä°" and "Ã§" respectively.

The whole project is written in cp1254.

How can I fix this so that it works with cp1254?

"ç".encode("utf8")
Output: b'\xc3\xa7'
b'\xc3\xa7'.decode("cp1254")
Output: 'Ã§'
"İ".encode("utf8")
Output: b'\xc4\xb0'
b'\xc4\xb0'.decode("cp1254")
Output: 'Ä°'
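The session above can be replayed in both directions to see why the column-name literals stop matching. This is a minimal sketch that assumes the source file's UTF-8 bytes are being decoded as cp1254; the helper names `as_misread` and `undo_misread` are hypothetical.

```python
def as_misread(text: str) -> str:
    """Simulate UTF-8 bytes being wrongly decoded as cp1254 (mojibake)."""
    return text.encode("utf-8").decode("cp1254")


def undo_misread(text: str) -> str:
    """Reverse the simulation: re-encode as cp1254, decode as UTF-8."""
    return text.encode("cp1254").decode("utf-8")


for name in ["İl", "İlçe"]:
    garbled = as_misread(name)
    # 'İl' -> 'Ä°l' -> 'İl', then 'İlçe' -> 'Ä°lÃ§e' -> 'İlçe'
    print(repr(name), "->", repr(garbled), "->", repr(undo_misread(garbled)))
```

A literal that arrives in the program as 'Ä°l' can never match the Excel header 'İl', which is exactly what the `usecols` error reports.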

Here is the data-processing code:

import pandas as pd


def read_rakip_df(city_name, data_path):
    # Final column names, in the same order as use_cols below
    rename_cols = ["ID", "Marka", "İl", "İlçe", "Mahalle", "Mahalle Kodu", "Latitude", "Longitude"]
    # Header names exactly as they appear in the Excel file
    use_cols = [
        "Nokta ID",
        "Kategori 5",
        "İl",
        "İlçe",
        "Mahalle",
        "Mahalle Kodu",
        "Latitude - Enlem - Y",
        "Longitude - Boylam - X",
    ]
    rename_fix = dict(zip(use_cols, rename_cols))
    df_rakip = pd.read_excel(data_path, usecols=use_cols)
    df_rakip.rename(columns=rename_fix, inplace=True)
    df_rakip.columns = [k.strip() for k in df_rakip.columns]
    # Keep only the rows for the requested city
    df_rakip = df_rakip.loc[df_rakip["İl"] == city_name]
    df_rakip = df_rakip.copy().rename(
        columns={
            "Mahalle": "mahalle",
            "Mahalle Kodu": "mahalle_kodu",
            "Latitude": "latitude",
            "Longitude": "longitude",
            "Marka": "isim",
            "İl": "il",
            "İlçe": "ilce",
        }
    )
    df_rakip["latitude"] = df_rakip["latitude"].astype(float)
    df_rakip["longitude"] = df_rakip["longitude"].astype(float)

    return df_rakip
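To confirm that the names in `use_cols` really differ from the file's header, the missing names can be printed with `ascii()`, which turns visually similar characters into distinguishable escape sequences. A small sketch; `missing_columns` is a hypothetical helper and the header list is hard-coded for illustration.

```python
def missing_columns(expected, header):
    """Return the expected names absent from header, rendered with ascii()
    so that hidden differences such as 'Ä°l' vs 'İl' become visible."""
    present = set(header)
    return [ascii(name) for name in expected if name not in present]


header = ["Nokta ID", "İl", "İlçe"]      # what the Excel file contains
expected = ["Nokta ID", "Ä°l", "Ä°lÃ§e"]  # what mis-decoded literals look like
print(missing_columns(expected, header))
```

With a real file, the header could be read with `pd.read_excel(data_path, nrows=0).columns` and compared against `use_cols`. Escapes like `\xc4\xb0` instead of `\u0130` point to the encoding mismatch rather than to a typo.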

You can convert the column names to the correct encoding before passing them to read_excel():

pd.read_excel(data_path, usecols=[s.encode("cp1254").decode("utf8") for s in use_cols])
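One caveat: `s.encode("cp1254").decode("utf8")` raises `UnicodeDecodeError` for names that are already correct, such as "İl", because the cp1254 byte for İ (0xDD) is not followed by a valid UTF-8 continuation byte. A guarded variant, sketched here as a hypothetical `repair_name` helper, only applies the reversal when the round trip succeeds.

```python
def repair_name(name: str) -> str:
    """Undo UTF-8-read-as-cp1254 mojibake; return the name unchanged if the
    round trip fails (typically because the name was already correct)."""
    try:
        return name.encode("cp1254").decode("utf-8")
    except UnicodeError:  # covers UnicodeEncodeError and UnicodeDecodeError
        return name


print([repair_name(s) for s in ["Nokta ID", "Ä°l", "Ä°lÃ§e", "İl"]])
```

In `read_rakip_df` this would be applied as `use_cols = [repair_name(s) for s in use_cols]` before calling `read_excel()`. It remains a workaround; fixing the encoding mismatch itself removes the need for it.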

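However, I think the main problem is that your Python source file declares # -*- coding: cp1254 -*- but is actually saved as UTF-8. If that is the case, the simplest fix is to make the two agree: the declared encoding must match the encoding the file is actually saved in.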
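To check whether a script is in this state, the declared coding cookie can be compared with the file's raw bytes using `tokenize.detect_encoding` from the standard library. This is a heuristic sketch (the helper name `check_source_encoding` is hypothetical): it flags files that declare a non-UTF-8 encoding but whose bytes are valid UTF-8 and contain non-ASCII characters.

```python
import io
import tokenize
from typing import Tuple


def check_source_encoding(source: bytes) -> Tuple[str, bool]:
    """Return (declared encoding, suspicious) for the raw bytes of a .py file."""
    # detect_encoding reads the coding cookie (or BOM) from the first two lines
    declared, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    try:
        source.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    has_non_ascii = any(byte > 0x7F for byte in source)
    # Suspicious: a non-UTF-8 cookie on bytes that look like UTF-8 text
    suspicious = declared not in ("utf-8", "utf-8-sig") and valid_utf8 and has_non_ascii
    return declared, suspicious


text = '# -*- coding: cp1254 -*-\ncols = ["İl", "İlçe"]\n'
print(check_source_encoding(text.encode("utf-8")))   # cookie and bytes disagree
print(check_source_encoding(text.encode("cp1254")))  # cookie and bytes agree
```

For a real script, `check_source_encoding(open("script.py", "rb").read())` (path hypothetical) returning `("cp1254", True)` suggests either re-saving the file as cp1254 or changing the cookie to `# -*- coding: utf-8 -*-`. Since Python 3 reads source files as UTF-8 by default, removing the cookie from a UTF-8 file also works.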
