我必须从以下路径通过panda读取CTG.xls文件:https://archive.ics.uci.edu/ml/machine-learning-databases/00193/.
我必须从该文件中选择图纸数据。此外,我必须从文件的K列到AT列进行选择。因此,在最后,我们有一个数据集,其中有以下列:
["LB","AC","FM","UC","DL","DS","DP","ASTV","MSTV","ALTV","MLTV","宽度","最小","最大","Nmax"、"Nzeros"、"Mode"、"Mean"、"Middle"、"Variance"、"Trendy"、"CLASS"、"NSP"]
如何使用panda中的read函数来完成此操作?
使用:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00193/CTG.xls'
df = pd.read_excel(url, sheet_name='Data', skipfooter=3)
df = df.drop(columns=df.filter(like='Unnamed').columns)
df.columns = df.iloc[0].to_list()
df = df[1:].reset_index(drop=True)
输出
LB AC FM UC DL DS DP ASTV MSTV ALTV MLTV Width Min Max Nmax Nzeros Mode Mean Median Variance Tendency CLASS NSP
0 120 0 0 0 0 0 0 73 0.5 43 2.4 64 62 126 2 0 120 137 121 73 1 9 2
1 132 0.00638 0 0.00638 0.00319 0 0 17 2.1 0 10.4 130 68 198 6 1 141 136 140 12 0 6 1
2 133 0.003322 0 0.008306 0.003322 0 0 16 2.1 0 13.4 130 68 198 5 1 141 135 138 13 0 6 1
3 134 0.002561 0 0.007682 0.002561 0 0 16 2.4 0 23 117 53 170 11 0 137 134 137 13 1 6 1
4 132 0.006515 0 0.008143 0 0 0 16 2.4 0 19.9 117 53 170 9 0 137 136 138 11 1 2 1
... ... ... ... ... ... .. .. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..
2121 140 0 0 0.007426 0 0 0 79 0.2 25 7.2 40 137 177 4 0 153 150 152 2 0 5 2
2122 140 0.000775 0 0.006971 0 0 0 78 0.4 22 7.1 66 103 169 6 0 152 148 151 3 1 5 2
2123 140 0.00098 0 0.006863 0 0 0 79 0.4 20 6.1 67 103 170 5 0 153 148 152 4 1 5 2
2124 140 0.000679 0 0.00611 0 0 0 78 0.4 27 7 66 103 169 6 0 152 147 151 4 1 5 2
2125 142 0.001616 0.001616 0.008078 0 0 0 74 0.4 36 5 42 117 159 2 1 145 143 145 1 0 1 1
[2126 rows x 23 columns]