Importing and parsing a .data file



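There is a file I am trying to import and save as a pandas DataFrame. At first glance it is already arranged in rows and columns, but in the end I had to do quite a lot of work to build the DataFrame. Could you check whether there is a faster way to handle it?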

url='https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'

My approach:

import requests
import pandas as pd
r = requests.get(url)
file = r.text    
step_1 = [line for line in file.split('\n') if line]   # split into lines, dropping empty strings
step_2 = [i.split('\t') for i in step_1]               # a tab separates the numeric fields from the car name
cars_names = [i[1] for i in step_2]
step_3 = [i[0].split(' ') for i in step_2]
for e in range(len(step_3)):         # remove empty strings in each sublist
    step_3[e] = [item for item in step_3[e] if item != '']

mpg        = [i[0] for i in step_3]
cylinders  = [i[1] for i in step_3]
disp       = [i[2] for i in step_3]
horsepower = [i[3] for i in step_3]
weight     = [i[4] for i in step_3]
acce       = [i[5] for i in step_3]
year       = [i[6] for i in step_3]
origin     = [i[7] for i in step_3]

list_cols = [cars_names, mpg, cylinders, disp, horsepower, weight, acce, year, origin]
# list_labels written manually:
list_labels = ['car name', 'mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin']
zipped = list(zip(list_labels, list_cols))
data = dict(zipped)
df = pd.DataFrame(data)

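Once you replace the tab (\t) with a space, you can read the text with read_csv. You do need to wrap the text, though, because the first parameter of read_csv is filepath_or_buffer, which expects a path or an object with a read() method (such as a file handle or StringIO). Your question then reduces to "read_csv doesn't read the column names correctly on this file?"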

import requests
import pandas as pd
from io import StringIO
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
r = requests.get(url)
file = r.text.replace("\t", " ")   # turn the tab before the car name into a space
# list_labels written manually:
list_labels = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'car name']
# sep=r"\s+" splits on runs of whitespace; the quoted car names stay in one field
df = pd.read_csv(StringIO(file), sep=r"\s+", header=None, names=list_labels)
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df)
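
To try the same idea without downloading anything, here is a minimal sketch that runs on a few hand-typed rows. The rows in `sample` are an assumption: they only imitate the layout of the file (numeric fields separated by spaces, then a tab and a quoted car name) and are not guaranteed to match the real data. The `na_values="?"` argument is also an addition that the answer above does not use; it assumes that missing horsepower values are written as `?`, so that the column is parsed as numeric rather than as strings.

```python
from io import StringIO

import pandas as pd

# Illustrative rows in the same layout as the file (assumed, not verbatim data):
# whitespace-separated numbers, then a tab, then the car name in double quotes.
sample = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '15.0   8   350.0      165.0      3693.      11.5   70  1\t"buick skylark 320"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
)

list_labels = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
               'acceleration', 'model year', 'origin', 'car name']

# StringIO gives read_csv an object with a read() method.
# r"\s+" treats any run of spaces or tabs as one separator, and the
# default quote character keeps each quoted car name in a single field.
df = pd.read_csv(
    StringIO(sample),
    sep=r"\s+",
    header=None,
    names=list_labels,
    na_values="?",   # assumption: "?" marks a missing value
)

print(df.dtypes)
print(df)
```

Because `\s+` already matches tabs, the explicit `replace("\t", " ")` step should not be needed in this variant.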
