Creating a single DataFrame by reading multiple HTML files



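I have 23 HTML files that contain data in the same table format. I want to create a DataFrame from each file and combine them into one large DataFrame for further analysis. Here is my code: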


import glob
import pandas as pd

all_rec = glob.glob('D:python*.html')
#print(all_rec)
list_data = []
for filename in all_rec:
    data = pd.read_html(filename)
    list_data.append(data)

list_data  # sample output (two files shown):
[[                   start_time                    user_host       query_time  
0  2020-02-19 07:01:56.411155  lrdba[lrdba] @ localhost []  00:01:55.299187   
1  2020-02-20 07:01:56.005284  lrdba[lrdba] @ localhost []  00:01:54.210222   
    db                                    sql_text  
0  kvb  call PROC_PROCESSINGSUMMARY(null,null)  
1  kvb  call PROC_PROCESSINGSUMMARY(null,null)  ],
[                   start_time                    user_host       query_time  
0  2020-02-19 07:01:56.411155  lrdba[lrdba] @ localhost []  00:01:55.299187   
1  2020-02-20 07:01:56.005284  lrdba[lrdba] @ localhost []  00:01:54.210222   
    db                                    sql_text  
0  kvb  call PROC_PROCESSINGSUMMARY(null,null)  
1  kvb  call PROC_PROCESSINGSUMMARY(null,null)  ]]

list_data = list_data[0]   # with list_data[0] I only get the data for the first file
list_data = list_data[-1]  # with list_data[-1] I only get the data for the last file
pd.concat(list_data, ignore_index=True)

What value should I put inside the [] to get the details of all the files in one big DataFrame?

import glob
import pandas as pd

all_rec = glob.glob('D:python*.html')
#print(all_rec)
list_data = []
for filename in all_rec:
    # read_html returns a list of DataFrames (one per <table>), so extend
    # list_data with those tables instead of appending the whole list
    data = pd.read_html(filename)
    list_data.extend(data)
list_data  # now a flat list holding the tables from all HTML files
# removed: list_data = list_data[0]
# removed: list_data = list_data[-1]
pd.concat(list_data, ignore_index=True)  # stack every table into one big DataFrame
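Why indexing only ever returns one file: `pd.read_html` returns a *list* of DataFrames, one per `<table>` element it finds, so `list_data.append(data)` builds a list of lists. The sketch below fakes that structure with two small hand-built DataFrames (a stand-in for two HTML files, so no HTML parser such as lxml is needed) and shows that `pd.concat` rejects the nested list but accepts it once it is flattened.

```python
import pandas as pd

# Stand-in for what pd.read_html returns per file: a list with one DataFrame per <table>.
file1_tables = [pd.DataFrame({"db": ["kvb"], "query_time": ["00:01:55.299187"]})]
file2_tables = [pd.DataFrame({"db": ["kvb"], "query_time": ["00:01:54.210222"]})]

# What list_data.append(data) produces: a list of lists of DataFrames.
list_data = [file1_tables, file2_tables]

# Indexing picks out a single file's list of tables, never all of them.
print(type(list_data[0]), len(list_data[0]))

# pd.concat only accepts Series/DataFrame objects, so the nested list raises TypeError.
try:
    pd.concat(list_data, ignore_index=True)
except TypeError as exc:
    print("TypeError:", exc)

# Flatten one level (every table from every file), then concatenate.
flat = [df for tables in list_data for df in tables]
combined = pd.concat(flat, ignore_index=True)
print(combined)
```

Flattening with `extend` inside the loop, as in the answer code, and flattening afterwards with a comprehension give the same flat list.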

Hope this works!
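If you also want to know which file each row came from, one option is to tag each table before concatenating. This is a minimal sketch under two assumptions: each HTML file holds exactly one relevant table (so only the first table per file is kept), and `combine_html_tables` and the `source_file` column are hypothetical names chosen for illustration. The reader is a parameter that defaults to `pd.read_html` (which needs a parser such as lxml installed), so the function can be exercised without real files.

```python
import glob
from typing import Callable, Iterable, List

import pandas as pd


def combine_html_tables(
    paths: Iterable[str],
    reader: Callable[[str], List[pd.DataFrame]] = pd.read_html,
) -> pd.DataFrame:
    """Read each HTML file and stack its first table into one DataFrame.

    Hypothetical helper: assumes every file contains exactly one relevant table.
    """
    frames = []
    for path in paths:
        tables = reader(path)      # list of DataFrames, one per <table>
        df = tables[0].copy()      # assumption: the first table is the one we want
        df["source_file"] = path   # hypothetical column recording the origin file
        frames.append(df)
    if not frames:
        # pd.concat([]) would raise too; fail with a clearer message
        raise ValueError("no HTML files matched")
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    # Path pattern copied from the question; adjust it to where the files live.
    paths = sorted(glob.glob('D:python*.html'))
    if paths:
        print(combine_html_tables(paths).shape)
```

Sorting the glob result keeps the row order stable between runs, since `glob.glob` does not guarantee any particular order.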
