Setting column names from a .names file with Pandas



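I have these instructions: Import the dataset from https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data, or the original data, with Pandas. Use the names parameter of read_csv to add the column names, consulting https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.names. You can check by using sep=r"\s+".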
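As a minimal sketch of what the instructions ask for, assuming the column names are typed in by hand from the "Attribute Information" section of auto-mpg.names: sep=r"\s+" splits each row on any run of spaces or tabs, and names= supplies the header that the data file itself lacks. The two rows are inlined (in the assumed layout of auto-mpg.data, with the quoted car name at the end) so the example runs offline; the variable names raw and column_names are only illustrative.

```python
import io

import pandas as pd

# Two rows in the assumed layout of auto-mpg.data, inlined so this runs offline.
# The car name is a quoted string that itself contains spaces.
raw = io.StringIO(
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '15.0   8   350.0      165.0      3693.      11.5   70  1\t"buick skylark 320"\n'
)

# Column names copied by hand from the attribute list in auto-mpg.names.
column_names = ["mpg", "cylinders", "displacement", "horsepower", "weight",
                "acceleration", "model year", "origin", "car name"]

# sep=r"\s+" treats any whitespace run as one delimiter; quotes keep the car name whole.
df = pd.read_csv(raw, sep=r"\s+", names=column_names)
print(df)
```

Hard-coding the list is the simplest route when the file only has nine attributes; parsing the .names file is only worth it if the attribute list may change.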

So I downloaded the dataset and used sep=r"\s+":

data_auto = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original", comment="#", sep=r"\s+")

But I don't know how to use this strange file with the .names extension to set the column names. Any ideas? Thanks

Here is a proposal using pandas.read_fwf:

import pandas as pd

# --- Retrieving the column names from the fixed-width .names file
cols_df = pd.read_fwf("https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.names", header=None)
# Tag the "Attribute Information:" heading row in the first column...
cols_df.loc[cols_df[1] == 'Attribute Information:', 0] = 'Attribute Information'
# ...and forward-fill that tag onto the following rows whose first column is empty
cols_df = cols_df.ffill()
# Keep the first word after the "N." numbering, e.g. "7. model year: ..." -> "model"
cols_df['column_name'] = cols_df[1].str.split(r'\s* \s*|\s*:\s*').str[1]
cols_df = cols_df[cols_df[0] == 'Attribute Information']
# Drop the heading row itself and keep the attribute names
cols_names = cols_df['column_name'].tolist()[1:]

# --- Creating the AUTO dataframe
data_auto = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original", header=None, comment="#", sep=r"\s+", names=cols_names)
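To see what the split pattern in the answer does, here is a small sketch on three attribute lines, assuming they have the form "N. name: description" as in auto-mpg.names. The pattern splits on a space or on a colon, so element 1 is the first word after the number. That is why the output shows model and car rather than "model year" and "car name".

```python
import pandas as pd

# Attribute lines in the assumed "N. name: description" form of auto-mpg.names.
lines = pd.Series([
    "1. mpg:           continuous",
    "7. model year:    multi-valued discrete",
    "9. car name:      string (unique for each instance)",
])

# Split on a single space (with optional surrounding whitespace) or on a colon;
# element 0 is the "N." number, element 1 the first word of the attribute name.
first_word = lines.str.split(r"\s* \s*|\s*:\s*").str[1]
print(first_word.tolist())  # ['mpg', 'model', 'car']
```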

Output:

print(data_auto.head())
    mpg  cylinders  displacement  horsepower  weight  acceleration  model  origin                        car
0  18.0        8.0         307.0       130.0  3504.0          12.0   70.0     1.0  chevrolet chevelle malibu
1  15.0        8.0         350.0       165.0  3693.0          11.5   70.0     1.0          buick skylark 320
2  18.0        8.0         318.0       150.0  3436.0          11.0   70.0     1.0         plymouth satellite
3  16.0        8.0         304.0       150.0  3433.0          12.0   70.0     1.0              amc rebel sst
4  17.0        8.0         302.0       140.0  3449.0          10.5   70.0     1.0                ford torino
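If you want the full two-word names ("model year", "car name") and do not want to depend on how read_fwf infers the column widths, one alternative is to parse the text of the .names file with regular expressions. This is a sketch under assumptions: the excerpt below is typed in to mirror the layout of the "Attribute Information" section (indented "N. name: description" lines, followed by a non-indented "8. ..." item), and parse_attribute_names is a hypothetical helper, not part of pandas. Check the excerpt against the real file before relying on it.

```python
import re

# Assumed excerpt of auto-mpg.names, inlined so the sketch runs offline.
NAMES_TEXT = """\
7. Attribute Information:

    1. mpg:           continuous
    2. cylinders:     multi-valued discrete
    3. displacement:  continuous
    4. horsepower:    continuous
    5. weight:        continuous
    6. acceleration:  continuous
    7. model year:    multi-valued discrete
    8. origin:        multi-valued discrete
    9. car name:      string (unique for each instance)

8. Missing Attribute Values:  horsepower has 6 missing values
"""


def parse_attribute_names(text):
    """Hypothetical helper: full names listed under 'Attribute Information:'."""
    # Everything after the section heading...
    section = text.split("Attribute Information:", 1)[1]
    # ...up to the next non-indented numbered item, e.g. "8. Missing ..."
    section = re.split(r"^\d+\.\s", section, maxsplit=1, flags=re.MULTILINE)[0]
    # Indented lines look like "    7. model year:    ..."; capture the name before the colon.
    return re.findall(r"^[ \t]+\d+\.[ \t]+([^:\n]+):", section, flags=re.MULTILINE)


names = parse_attribute_names(NAMES_TEXT)
print(names)
```

With network access, the text could instead come from something like urllib.request.urlopen(url).read().decode(), and the resulting list can be passed straight to pd.read_csv(..., sep=r"\s+", names=names).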
