Parsing an XML-based .xls file with pandas



I have an Excel file (.xls) that is not a real Excel file; it is just XML. It is available here:

https://pastebin.com/raw/3MQS7RMJ
How can I parse the data it contains? I only need two columns, the 8th and the 11th. I have already tried:
pd.read_excel
pd.read_xml
pd.read_html
pd.read_csv

The following should work:

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" 
xmlns:o="urn:schemas-microsoft-com:office:office" 
xmlns:x="urn:schemas-microsoft-com:office:excel" 
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" 
xmlns:html="http://www.w3.org/TR/REC-html40"> 
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office"> 
</DocumentProperties> 
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel"> 
</ExcelWorkbook> 
<Styles> 
</Styles> 
<Worksheet ss:Name="Table1"> 
<Table > 
<Row> 
<Cell><Data ss:Type="String">tipo_evento</Data></Cell> 
<Cell><Data ss:Type="String">data_inizio</Data></Cell> 
<Cell><Data ss:Type="String">data_fine</Data></Cell> 
<Cell><Data ss:Type="String">ora_inizio</Data></Cell> 
<Cell><Data ss:Type="String">ora_fine</Data></Cell> 
<Cell><Data ss:Type="String">tutto_il_giorno</Data></Cell> 
<Cell><Data ss:Type="String">data_inserimento</Data></Cell> 
<Cell><Data ss:Type="String">autore</Data></Cell> 
<Cell><Data ss:Type="String">classe_desc</Data></Cell> 
<Cell><Data ss:Type="String">gruppo_desc</Data></Cell> 
<Cell><Data ss:Type="String">nota</Data></Cell> 
<Cell><Data ss:Type="String">aula</Data></Cell> 
<Cell><Data ss:Type="String">tipo</Data></Cell> 
<Cell><Data ss:Type="String">materia</Data></Cell> 
</Row> 
<Row> 
<Cell><Data ss:Type="String">Nota Agenda</Data></Cell> 
<Cell><Data ss:Type="String">2021-10-18</Data></Cell> 
<Cell><Data ss:Type="String">2021-10-18</Data></Cell> 
<Cell><Data ss:Type="String">09:40:00</Data></Cell> 
<Cell><Data ss:Type="String">10:30:00</Data></Cell> 
<Cell><Data ss:Type="String">NO</Data></Cell> 
<Cell><Data ss:Type="String">2021-10-15 15:42:31</Data></Cell> 
<Cell><Data ss:Type="String">POMPILI MATTEO</Data></Cell> 
<Cell><Data ss:Type="String">1CL SCIENTIFICO -  OPZIONE SCIENZE APPLICATE</Data></Cell> 
<Cell><Data ss:Type="String">1CL SCIENTIFICO -  OPZIONE SCIENZE APPLICATE</Data></Cell> 
<Cell><Data ss:Type="String">Studiare pag. 4, 5, 12 e 13.
Svolgere es. 24 e 30 pag. 40 e 34 pag. 41 (solo domande b e c)</Data></Cell> 
<Cell><Data ss:Type="String">-</Data></Cell> 
<Cell><Data ss:Type="String">nota</Data></Cell> 
<Cell><Data ss:Type="String"></Data></Cell> 
</Row>
<Row> 
<Cell><Data ss:Type="String">Nota Agenda</Data></Cell> 
<Cell><Data ss:Type="String">2021-10-18</Data></Cell> 
<Cell><Data ss:Type="String">2021-10-18</Data></Cell> 
<Cell><Data ss:Type="String">10:30:00</Data></Cell> 
<Cell><Data ss:Type="String">10:30:00</Data></Cell> 
<Cell><Data ss:Type="String">NO</Data></Cell> 
<Cell><Data ss:Type="String">2021-10-12 10:48:03</Data></Cell> 
<Cell><Data ss:Type="String">MICHELINI DORIANA</Data></Cell> 
<Cell><Data ss:Type="String">1CL SCIENTIFICO -  OPZIONE SCIENZE APPLICATE</Data></Cell> 
<Cell><Data ss:Type="String"></Data></Cell> 
<Cell><Data ss:Type="String">Antologia: studiare+ schema pag. 24; leggere pag. 34-37; es. Pag. 37 n. 1,2; pag. 38 n. 4,5,6,7</Data></Cell> 
<Cell><Data ss:Type="String">-</Data></Cell> 
<Cell><Data ss:Type="String">nota</Data></Cell> 
<Cell><Data ss:Type="String"></Data></Cell> 
</Row>
<Row> 
<Cell><Data ss:Type="String">Nota Agenda</Data></Cell> 
<Cell><Data ss:Type="String">2021-10-19</Data></Cell> 
<Cell><Data ss:Type="String">2021-10-19</Data></Cell> 
<Cell><Data ss:Type="String">09:30:00</Data></Cell> 
<Cell><Data ss:Type="String">09:30:00</Data></Cell> 
<Cell><Data ss:Type="String">NO</Data></Cell> 
<Cell><Data ss:Type="String">2021-10-12 11:42:38</Data></Cell> 
<Cell><Data ss:Type="String">MICHELINI DORIANA</Data></Cell> 
<Cell><Data ss:Type="String">1CL SCIENTIFICO -  OPZIONE SCIENZE APPLICATE</Data></Cell> 
<Cell><Data ss:Type="String"></Data></Cell> 
<Cell><Data ss:Type="String">Grammatica: studiare pag. 38-39 es. Pa. 42 -44 n.39 - 46</Data></Cell> 
<Cell><Data ss:Type="String">-</Data></Cell> 
<Cell><Data ss:Type="String">nota</Data></Cell> 
<Cell><Data ss:Type="String"></Data></Cell> 
</Row>
<Row> 
<Cell><Data ss:Type="String">Nota Agenda</Data></Cell> 
<Cell><Data ss:Type="String">2021-10-19</Data></Cell> 
<Cell><Data ss:Type="String">2021-10-19</Data></Cell> 
<Cell><Data ss:Type="String">11:30:00</Data></Cell> 
<Cell><Data ss:Type="String">12:30:00</Data></Cell> 
<Cell><Data ss:Type="String">NO</Data></Cell> 
<Cell><Data ss:Type="String">2021-10-14 11:44:29</Data></Cell> 
<Cell><Data ss:Type="String">PITARO MARIA GRAZIA</Data></Cell> 
<Cell><Data ss:Type="String">1CL SCIENTIFICO -  OPZIONE SCIENZE APPLICATE</Data></Cell> 
<Cell><Data ss:Type="String"></Data></Cell> 
<Cell><Data ss:Type="String">Scienze della Terra Cap 1 par 1-2-3 fino a pag 16.</Data></Cell> 
<Cell><Data ss:Type="String">-</Data></Cell> 
<Cell><Data ss:Type="String">nota</Data></Cell> 
<Cell><Data ss:Type="String"></Data></Cell> 
</Row>
</Table> 
</Worksheet> 
</Workbook> '''
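# Zero-based cell positions within each row (use [7, 10] if you count columns from 1)
cols = [8, 11]
root = ET.fromstring(xml)
# Row, Cell and Data all live in the SpreadsheetML default namespace
for row in root.findall('.//{urn:schemas-microsoft-com:office:spreadsheet}Row'):
    cells = row.findall('{urn:schemas-microsoft-com:office:spreadsheet}Cell')
    for col in cols:
        # The text of each cell is stored in its <Data> child
        print(cells[col].find('{urn:schemas-microsoft-com:office:spreadsheet}Data').text)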

Output:
classe_desc
aula
1CL SCIENTIFICO -  OPZIONE SCIENZE APPLICATE
-
1CL SCIENTIFICO -  OPZIONE SCIENZE APPLICATE
-
1CL SCIENTIFICO -  OPZIONE SCIENZE APPLICATE
-
1CL SCIENTIFICO -  OPZIONE SCIENZE APPLICATE
-
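
If you would rather pick the columns by header name than by position, the same ElementTree approach can be adapted. This is only a minimal sketch, not part of the answer above: the helper name `rows_to_records` and the shortened `sample` document are made up for illustration, and it assumes the first `<Row>` is the header and that no row skips cells.

```python
import xml.etree.ElementTree as ET

# Prefix map so that "ss:Row" resolves to the SpreadsheetML namespace
NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}


def rows_to_records(xml_text, names):
    """Return one dict per data row, keeping only the columns listed in `names`.

    Hypothetical helper: assumes the first <Row> holds the column headers.
    """
    root = ET.fromstring(xml_text)
    table = []
    for row in root.findall(".//ss:Row", NS):
        values = []
        for cell in row.findall("ss:Cell", NS):
            data = cell.find("ss:Data", NS)
            # Empty or missing <Data> elements become empty strings
            text = data.text if data is not None else None
            values.append(text if text is not None else "")
        table.append(values)
    header, body = table[0], table[1:]
    wanted = [header.index(name) for name in names]
    return [{header[i]: row[i] for i in wanted} for row in body]


# Shortened stand-in for the real file, using the same element layout
sample = '''<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<Worksheet ss:Name="Table1"><Table>
<Row>
<Cell><Data ss:Type="String">autore</Data></Cell>
<Cell><Data ss:Type="String">nota</Data></Cell>
<Cell><Data ss:Type="String">aula</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">POMPILI MATTEO</Data></Cell>
<Cell><Data ss:Type="String">Studiare pag. 4</Data></Cell>
<Cell><Data ss:Type="String">-</Data></Cell>
</Row>
</Table></Worksheet></Workbook>'''

records = rows_to_records(sample, ["autore", "aula"])
print(records)
```

Since the question is about pandas, the resulting list of dicts can be passed straight to `pandas.DataFrame(records)` to get a frame with one column per selected header.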