Read an online Excel file with a specific sheet and only selected columns



I have to read the CTG.xls file with pandas from the following path: https://archive.ics.uci.edu/ml/machine-learning-databases/00193/.

From this file I have to select the Data sheet, and within it only columns K through AT. In the end, the dataset should have the following columns:

["LB","AC","FM","UC","DL","DS","DP","ASTV","MSTV","ALTV","MLTV","Width","Min","Max","Nmax","Nzeros","Mode","Mean","Median","Variance","Tendency","CLASS","NSP"]

How can I do this with the read function in pandas?

Use:

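import pandas as pd

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00193/CTG.xls'
# Read the 'Data' sheet, skipping the last 3 rows of the sheet
df = pd.read_excel(url, sheet_name='Data', skipfooter=3)
# Drop the columns pandas could not name ('Unnamed: ...')
df = df.drop(columns=df.filter(like='Unnamed').columns)
# The first data row holds the real column names: promote it, then remove it
df.columns = df.iloc[0].to_list()
df = df[1:].reset_index(drop=True)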
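
The three cleanup steps are easier to follow on a small frame. The sketch below is illustrative only: the toy values and the `promote_first_row` helper name are invented here, and the layout (an `Unnamed` column plus the real names sitting in the first data row) is an assumption about what `read_excel` returns for this sheet. The final `pd.to_numeric` step is an addition, not part of the code above; it is there because values read alongside header strings end up as object dtype.

```python
import pandas as pd


def promote_first_row(df: pd.DataFrame) -> pd.DataFrame:
    """Drop 'Unnamed' columns, then use the first row as the header."""
    # Columns pandas could not name come back as 'Unnamed: <n>'.
    df = df.drop(columns=df.filter(like="Unnamed").columns)
    # The first data row holds the real column names.
    df.columns = df.iloc[0].to_list()
    df = df[1:].reset_index(drop=True)
    # Extra step (not in the original answer): convert object columns to numbers.
    return df.apply(pd.to_numeric)


# Toy stand-in for the raw sheet (made-up values, not the real data).
raw = pd.DataFrame({
    "Unnamed: 0": [None, 1, 2],
    "x": ["LB", 120, 132],
    "y": ["AC", 0, 0.00638],
})
clean = promote_first_row(raw)
print(clean)
```

If a column in the real sheet contains non-numeric text, `pd.to_numeric` will raise, so that last step may need to be restricted to the numeric columns.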

Output:

LB        AC        FM        UC        DL DS DP ASTV MSTV ALTV  MLTV Width  Min  Max Nmax Nzeros Mode Mean Median Variance Tendency CLASS NSP
0     120         0         0         0         0  0  0   73  0.5   43   2.4    64   62  126    2      0  120  137    121       73        1     9   2
1     132   0.00638         0   0.00638   0.00319  0  0   17  2.1    0  10.4   130   68  198    6      1  141  136    140       12        0     6   1
2     133  0.003322         0  0.008306  0.003322  0  0   16  2.1    0  13.4   130   68  198    5      1  141  135    138       13        0     6   1
3     134  0.002561         0  0.007682  0.002561  0  0   16  2.4    0    23   117   53  170   11      0  137  134    137       13        1     6   1
4     132  0.006515         0  0.008143         0  0  0   16  2.4    0  19.9   117   53  170    9      0  137  136    138       11        1     2   1
...   ...       ...       ...       ...       ... .. ..  ...  ...  ...   ...   ...  ...  ...  ...    ...  ...  ...    ...      ...      ...   ...  ..
2121  140         0         0  0.007426         0  0  0   79  0.2   25   7.2    40  137  177    4      0  153  150    152        2        0     5   2
2122  140  0.000775         0  0.006971         0  0  0   78  0.4   22   7.1    66  103  169    6      0  152  148    151        3        1     5   2
2123  140   0.00098         0  0.006863         0  0  0   79  0.4   20   6.1    67  103  170    5      0  153  148    152        4        1     5   2
2124  140  0.000679         0   0.00611         0  0  0   78  0.4   27     7    66  103  169    6      0  152  147    151        4        1     5   2
2125  142  0.001616  0.001616  0.008078         0  0  0   74  0.4   36     5    42  117  159    2      1  145  143    145        1        0     1   1
[2126 rows x 23 columns]
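
Since the question asks for columns K through AT specifically, another option may be to let `read_excel` do the selection through `usecols`, which accepts Excel-style letter ranges. This is a minimal sketch, not a verified recipe for this file: it assumes the real column names sit on the second row of the sheet (hence `header=1`), that blank spacer columns may fall inside the range, and that the range may contain more columns than the 23 listed in the question. The function name `read_ctg_columns` and its `keep` parameter are made up for illustration, and `skipfooter=3` is carried over from the code above. Reading `.xls` files with pandas also requires the `xlrd` package.

```python
import pandas as pd

CTG_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00193/CTG.xls"


def read_ctg_columns(source=CTG_URL, cols="K:AT", keep=None):
    """Read the 'Data' sheet, restricted to an Excel-style column range."""
    df = pd.read_excel(
        source,
        sheet_name="Data",
        usecols=cols,   # Excel column letters, e.g. "K:AT"
        header=1,       # assumption: real names are on the second row
        skipfooter=3,   # same trailing rows skipped as in the answer above
    )
    # Remove spacer columns and rows that contain no values at all.
    df = df.dropna(axis=1, how="all").dropna(axis=0, how="all")
    if keep is not None:
        df = df[list(keep)]  # optionally narrow to named columns
    return df.reset_index(drop=True)
```

Calling `read_ctg_columns(keep=["LB", "AC", "NSP"])` would then download the file and return only those named columns, provided the header assumption holds. Compared with dropping `Unnamed` columns after the fact, this keeps the selection tied to the sheet positions named in the question.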
