I want to extract data from a specific txt file and save it in Excel or any other spreadsheet application



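I wrote a small script that fetches data from a website and stores it in a txt file, so that I can extract specific data and then store it in an Excel sheet. So far I have only managed to write a few lines of Python. I am just a beginner.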

#Importing modules
import requests
import json
#Program start here
######################
#loading required info for post request
pload = {'iec':'0200006797','name':'GLOBA'}
# Sending request to web
r = requests.post('http://dgft.delhi.nic.in:8100/dgft/IecPrint',data = pload)


# Save the response body to a file
with open("data.txt", "w") as f:
    f.write(r.text)
# Reopen the file for reading and further processing
with open("data.txt", "r") as f:
    contents = f.read()

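So basically I want to extract data from it. I only want the name and phone number of the first director, of the second director, and of the third director. The data is in HTML format, and the HTML below is not complete. There is more than one table, so I only want to extract data from the first table, the one labelled "Directors". Thanks.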

</TABLE> 
<BR> 
<BR> 
<B>Directors:</B> 
<BR> 
<TABLE BORDER=1>
<TR><TD VALIGN= TOP ALIGN=LEFT COLSPAN=50>1.</TD><TD VALIGN= TOP ALIGN=LEFT COLSPAN=100>SANJAY CHAKRABORTY                                <BR>LATE PRASAD KUMAR CHAKRABORTY                     <BR>19 K K MUKHERJEE SARANI,SERAMPORE  <BR>                                   <BR>HOOGHLY,WEST BENGAL                <BR>PIN-712204<BR>Phone/Email:919339624590                       </TD></TR>
<TR><TD VALIGN= TOP ALIGN=LEFT COLSPAN=50>2.</TD><TD VALIGN= TOP ALIGN=LEFT COLSPAN=100>SANJAY DHANUKA                                    <BR>BASUDEO DHANUKA                                   <BR>BA -206,SECTOR-1,SALT LAKE,PS-BIDH <BR>ANNAGAR                            <BR>KOLKATA,WEST BENGAL                <BR>PIN-700064<BR>Phone/Email:9674448777                         </TD></TR>
<TR><TD VALIGN= TOP ALIGN=LEFT COLSPAN=50>3.</TD><TD VALIGN= TOP ALIGN=LEFT COLSPAN=100>ISHITA NANDI                                      <BR>INDRANIL BURMAN ROY                               <BR>112, DR B C ROY SARANI,NEW BARRACK <BR>PORE                               <BR>KOLKATA,WEST BENGAL                <BR>PIN-700131<BR>Phone/Email:9804561441                         </TD></TR>
</TABLE>

Below is an example using lxml and XPath. It exports the data to a CSV file.

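from lxml import html
import requests
import csv

# Form fields for the POST request
pload = {'iec': '0200006797', 'name': 'GLOBA'}
# Send the request and parse the returned HTML
r = requests.post('http://dgft.delhi.nic.in:8100/dgft/IecPrint', data=pload)
tree = html.fromstring(r.content)

# Select the rows of the first table that follows the "Directors:" label.
# The sample HTML has no <tbody>, so the rows are matched without it.
rows = tree.xpath("//b[starts-with(normalize-space(), 'Directors')]/following::table[1]//tr")

# Collect (name, phone) pairs, one per director
contacts = []
for row in rows:
    # Text nodes of the second cell, one per <BR>-separated line
    parts = [t.strip() for t in row.xpath("./td[2]/text()")]
    if not parts:
        continue
    name = parts[0]
    # The phone number is on the line that starts with "Phone/Email:"
    phone = next((p.split(':', 1)[1].strip() for p in parts
                  if p.startswith('Phone/Email:')), '')
    contacts.append((name, phone))

# Write the extracted contacts to a CSV file
with open('directors.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(('name', 'phonenumber'))
    writer.writerows(contacts)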
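
If lxml is not installed, or you want to work from the text already saved in data.txt, the same extraction can be sketched with the standard library only. This is a rough illustration, not a general HTML parser: it assumes the markup looks like the sample in the question (a "Directors:" label followed by a table, lines separated by <BR>, and a line starting with "Phone/Email:"). The names `extract_directors`, `to_csv` and `SAMPLE` are made up for this example, and `SAMPLE` is abridged from the question's HTML.

```python
import csv
import io
import re

# Abridged from the HTML in the question (address lines removed for brevity)
SAMPLE = """<B>Directors:</B>
<BR>
<TABLE BORDER=1>
<TR><TD VALIGN= TOP ALIGN=LEFT COLSPAN=50>1.</TD><TD VALIGN= TOP ALIGN=LEFT COLSPAN=100>SANJAY CHAKRABORTY      <BR>LATE PRASAD KUMAR CHAKRABORTY   <BR>PIN-712204<BR>Phone/Email:919339624590       </TD></TR>
<TR><TD VALIGN= TOP ALIGN=LEFT COLSPAN=50>2.</TD><TD VALIGN= TOP ALIGN=LEFT COLSPAN=100>SANJAY DHANUKA      <BR>BASUDEO DHANUKA   <BR>PIN-700064<BR>Phone/Email:9674448777     </TD></TR>
</TABLE>"""


def extract_directors(html_text):
    """Return (name, phone) tuples from the first table after 'Directors:'."""
    # Find the first <TABLE>...</TABLE> that follows the "Directors:" label
    table = re.search(r"Directors:.*?<TABLE[^>]*>(.*?)</TABLE>",
                      html_text, re.I | re.S)
    if table is None:
        return []
    contacts = []
    for row in re.findall(r"<TR[^>]*>(.*?)</TR>", table.group(1), re.I | re.S):
        cells = re.findall(r"<TD[^>]*>(.*?)</TD>", row, re.I | re.S)
        if len(cells) < 2:
            continue
        # Lines inside the second cell are separated by <BR> tags
        lines = [part.strip()
                 for part in re.split(r"<BR\s*/?>", cells[1], flags=re.I)]
        name = lines[0]
        # Assumption: the phone is on the line starting with "Phone/Email:"
        phone = next((line.split(":", 1)[1].strip() for line in lines
                      if line.startswith("Phone/Email:")), "")
        contacts.append((name, phone))
    return contacts


def to_csv(contacts):
    """Render the contacts as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(("name", "phonenumber"))
    writer.writerows(contacts)
    return buf.getvalue()
```

To use it with the question's script, pass the `contents` string read from data.txt to `extract_directors` and write the result of `to_csv` to a file ending in .csv, which Excel and other spreadsheet applications can open. Regular expressions are brittle for HTML in general, so the lxml version is the safer choice if the page layout varies.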
