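I'm trying to export data from a single HTML table to CSV, but I only manage to export all of the tables to CSV.
This is my code. What do I need to change?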
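from io import StringIO

from bs4 import BeautifulSoup
import pandas as pd

# The report is UTF-16 encoded; the context manager closes the file afterwards
with open(r"C:report.html", "r", encoding="utf-16") as html_file:
    contents = html_file.read()

soup = BeautifulSoup(contents, 'html.parser')
tables = soup.find_all('table')

# Writes one CSV per table found in the report: table0.csv, table1.csv, ...
for t, table in enumerate(tables):
    # read_html returns a list of DataFrames, here with a single entry
    df = pd.read_html(StringIO(str(table)), skiprows=2)
    df[0].to_csv('table%s.csv' % t)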
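You don't need the bs4 library. pandas alone is enough: read_html() returns a list of DataFrames, one per table, and you select the table you are interested in.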
dfs = pd.read_html(r"C:report.html")
df = dfs[index_interesting_table]
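If the position of the table is not known in advance, read_html() can also filter tables by their text through its match parameter. The following is a minimal sketch, not the asker's real report: the inline HTML and the column names are made up for illustration, and it assumes an HTML parser such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the report: two small tables.
html = """
<table>
  <tr><th>Key</th><th>Value</th></tr>
  <tr><td>Device</td><td>DUT-1</td></tr>
</table>
<table>
  <tr><th>Test Name</th><th>Worst Margin</th></tr>
  <tr><td>Droop Test</td><td>32.2 %</td></tr>
</table>
"""

# Option 1: select by position (0-based index into the returned list).
dfs = pd.read_html(StringIO(html))
by_position = dfs[1]

# Option 2: keep only the tables whose text matches a string or regex.
matched = pd.read_html(StringIO(html), match="Test Name")
by_text = matched[0]

print(by_text)
```

Either way, the selected frame can then be written out with to_csv().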
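To read the tables from the HTML (the file C:report.html) as pandas DataFrames, use read_html() directly, as suggested in dimay's answer.
Then index the table you want, counting from zero. For example, the 4th large table, which contains the test details, is dfs[3].
That frame can then be exported to CSV with to_csv(), just as you already do.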
import pandas as pd
dfs = pd.read_html(r"C:report.html")
# only the 4th table with test-details (tables as 0-indexed data-frames)
print(dfs[3])
dfs[3].to_csv('report_table-4_test-details.csv')
This writes the following CSV file:
,0,1,2,3,4,5,6
0,Pass,# Failed,# Trials,Test Name,Worst Actual,Worst Margin,Pass Limits
1,,4,4,"1000 Base-T, Point A Peak Output Voltage(w/o Disturbing Signal)",510.0 mV,-106.7 %,670.0 mV < VALUE < 820.0 mV
2,,4,4,"1000 Base-T, Point B Peak Output Voltage(w/o Disturbing Signal)",510.0 mV,-106.7 %,670.0 mV < VALUE < 820.0 mV
3,,0,4,"1000 Base-T, Difference A,B Peak Output Voltage(w/o Disturbing Signal)",140 m%,86.0 %,VALUE < 1.00 %
4,,0,4,"1000 Base-T, Point C Peak Output Voltage(w/o Disturbing Signal)",1.19 %,40.5 %,|VALUE| < 2.00 %
5,,0,4,"1000 Base-T, Point D Peak Output Voltage(w/o Disturbing Signal)",1.18 %,41.0 %,|VALUE| < 2.00 %
6,,0,4,"1000 Base-T, Point A Template Test(w/o Disturbing Signal)",0.000,100.0 %,No Mask Failures
7,,0,4,"1000 Base-T, Point B Template Test(w/o Disturbing Signal)",0.000,100.0 %,No Mask Failures
8,,0,4,"1000 Base-T, Point C Template Test(w/o Disturbing Signal)",0.000,100.0 %,No Mask Failures
9,,0,4,"1000 Base-T, Point D Template Test(w/o Disturbing Signal)",0.000,100.0 %,No Mask Failures
10,,0,4,"1000 Base-T, Point F Template Test(w/o Disturbing Signal)",0.000,100.0 %,No Mask Failures
11,,0,4,"1000 Base-T, Point H Template Test(w/o Disturbing Signal)",0.000,100.0 %,No Mask Failures
12,,0,4,"1000 Base-T, Point G Droop Test(w/o Disturbing Signal)",96.62 %,32.2 %,VALUE > 73.10 %
13,,0,4,"1000 Base-T, Point J Droop Test(w/o Disturbing Signal)",96.82 %,32.4 %,VALUE > 73.10 %
14,,0,3,"1000 Base-T, MDI Common Mode Output Voltage",26.6 mV,46.8 %,|VALUE| < 50.0 mV
15,,0,4,"1000 Base-T, Transmitter Distortion(w/o Disturbing Signal)",4.95 mV,50.5 %,VALUE <= 10.00 mV
16,,1,1,"100 Base-TX, UTP +Vout Differential Output Voltage",707.5 mV,-242.5 %,950.0 mV < VALUE < 1.0500 V
17,,1,1,"100 Base-TX, UTP -Vout Differential Output Voltage",-715.9 mV,-234.1 %,950.0 mV < |VALUE| < 1.0500 V
18,,0,1,"100 Base-TX, UTP Signal Amplitude Symmetry",-988 m,20.0 %,980 m < |VALUE| < 1.020
19,,0,1,"100 Base-TX, +Vout Overshoot",-1.9 %,138.0 %,VALUE < 5.0 %
20,,0,1,"100 Base-TX, -Vout Overshoot",-1.4 %,128.0 %,VALUE < 5.0 %
21,,0,1,"100 Base-TX, UTP AOI Template",0.000,100.0 %,No Mask Failures
22,,0,1,"100 Base-TX, AOI +Vout Rise Time",3.872 ns,43.6 %,3.000 ns < VALUE < 5.000 ns
23,,0,1,"100 Base-TX, AOI +Vout Fall Time",4.040 ns,48.0 %,3.000 ns < VALUE < 5.000 ns
24,,0,1,"100 Base-TX, AOI +Vout Rise/Fall Symmetry",167.77 ps,66.4 %,VALUE < 500.00 ps
25,,0,1,"100 Base-TX, AOI -Vout Rise Time",3.960 ns,48.0 %,3.000 ns < VALUE < 5.000 ns
26,,0,1,"100 Base-TX, AOI -Vout Fall Time",4.151 ns,42.5 %,3.000 ns < VALUE < 5.000 ns
27,,0,1,"100 Base-TX, AOI -Vout Rise/Fall Symmetry",191.65 ps,61.7 %,VALUE < 500.00 ps
28,,0,1,"100 Base-TX, AOI Overall Rise/Fall Symmetry",278.72 ps,44.3 %,VALUE < 500.00 ps
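In the output above, the real column names end up in data row 0 and the DataFrame index is written as an extra first column. If that is not wanted, one possible adjustment is to pass header=0 to read_html() and index=False to to_csv(). This is a sketch on a made-up three-column excerpt, assuming the header cells are ordinary td cells as the output suggests:

```python
from io import StringIO

import pandas as pd

# Hypothetical excerpt of the test-details table; header cells are plain <td>.
html = """
<table>
  <tr><td>Pass</td><td># Failed</td><td>Test Name</td></tr>
  <tr><td></td><td>4</td><td>Point A Peak Output Voltage</td></tr>
  <tr><td></td><td>0</td><td>Point G Droop Test</td></tr>
</table>
"""

# header=0 promotes the first table row to column names.
df = pd.read_html(StringIO(html), header=0)[0]

# index=False drops the leading 0, 1, 2, ... column from the CSV.
# Without a path argument, to_csv() returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

With a file name as the first argument, the same call writes the CSV to disk instead of returning a string.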