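I wrote a small script that fetches data from a website and stores it in a txt file, so that I can extract specific pieces of it and then save them to an Excel sheet. So far I can only write a few lines of Python. I'm just a beginner.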
# Import modules
import requests

# Program starts here
######################
# Form fields for the POST request
pload = {'iec': '0200006797', 'name': 'GLOBA'}
# Send the request to the website
r = requests.post('http://dgft.delhi.nic.in:8100/dgft/IecPrint', data=pload)
# Save the response to a file (the with block closes it, so the data is flushed)
with open("data.txt", "w") as f:
    f.write(r.text)
# Reopen the file for reading and further manipulation
with open("data.txt", "r") as f:
    contents = f.read()
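So basically I want to extract data from this response. I only want the name and phone number of the first director, the second director, and the third director. The data is in HTML format, and the HTML below is only an excerpt. There is more than one table, so I only want to extract data from the first table, the one labeled "Directors". Thanks.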
</TABLE>
<BR>
<BR>
<B>Directors:</B>
<BR>
<TABLE BORDER=1>
<TR><TD VALIGN= TOP ALIGN=LEFT COLSPAN=50>1.</TD><TD VALIGN= TOP ALIGN=LEFT COLSPAN=100>SANJAY CHAKRABORTY <BR>LATE PRASAD KUMAR CHAKRABORTY <BR>19 K K MUKHERJEE SARANI,SERAMPORE <BR> <BR>HOOGHLY,WEST BENGAL <BR>PIN-712204<BR>Phone/Email:919339624590 </TD></TR>
<TR><TD VALIGN= TOP ALIGN=LEFT COLSPAN=50>2.</TD><TD VALIGN= TOP ALIGN=LEFT COLSPAN=100>SANJAY DHANUKA <BR>BASUDEO DHANUKA <BR>BA -206,SECTOR-1,SALT LAKE,PS-BIDH <BR>ANNAGAR <BR>KOLKATA,WEST BENGAL <BR>PIN-700064<BR>Phone/Email:9674448777 </TD></TR>
<TR><TD VALIGN= TOP ALIGN=LEFT COLSPAN=50>3.</TD><TD VALIGN= TOP ALIGN=LEFT COLSPAN=100>ISHITA NANDI <BR>INDRANIL BURMAN ROY <BR>112, DR B C ROY SARANI,NEW BARRACK <BR>PORE <BR>KOLKATA,WEST BENGAL <BR>PIN-700131<BR>Phone/Email:9804561441 </TD></TR>
</TABLE>
Here is an example using lxml and XPath. It exports the data to a CSV file.
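from lxml import html
import requests
import csv

# Form fields for the POST request
pload = {'iec': '0200006797', 'name': 'GLOBA'}
# Send the request to the website
r = requests.post('http://dgft.delhi.nic.in:8100/dgft/IecPrint', data=pload)
tree = html.fromstring(r.content)

# Rows of the first table after the "Directors:" label
# (lxml lowercases tag names and does not insert a tbody element)
rows = tree.xpath("//b[starts-with(normalize-space(), 'Directors')]/following::table[1]//tr")

# Build a list of (name, phone) pairs
contacts = []
for row in rows:
    # Text fragments of the second cell, which the <BR> tags split apart
    parts = [t.strip() for t in row.xpath("./td[2]/text()")]
    parts = [p for p in parts if p]
    if not parts:
        continue
    name = parts[0]
    # The phone number is the fragment that starts with "Phone/Email:"
    phone = next((p.split(':', 1)[1].strip() for p in parts if p.startswith('Phone/Email:')), '')
    contacts.append((name, phone))

# Write the extracted data to a CSV file
with open('directors.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile)
    writer.writerow(('name', 'phonenumber'))
    writer.writerows(contacts)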
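If you want to check the extraction logic without contacting the server, the following is a rough sketch that runs on the HTML excerpt from the question. It swaps lxml for the standard library's html.parser, so it needs no extra packages. The names DirectorsParser, extract_directors and SAMPLE_HTML are made up for this illustration, and it assumes every director cell starts with the name and contains a fragment beginning with "Phone/Email:", as in the excerpt.

```python
from html.parser import HTMLParser

# Excerpt copied from the question; the real response contains more tables.
SAMPLE_HTML = """
</TABLE>
<BR>
<BR>
<B>Directors:</B>
<BR>
<TABLE BORDER=1>
<TR><TD VALIGN= TOP ALIGN=LEFT COLSPAN=50>1.</TD><TD VALIGN= TOP ALIGN=LEFT COLSPAN=100>SANJAY CHAKRABORTY <BR>LATE PRASAD KUMAR CHAKRABORTY <BR>19 K K MUKHERJEE SARANI,SERAMPORE <BR> <BR>HOOGHLY,WEST BENGAL <BR>PIN-712204<BR>Phone/Email:919339624590 </TD></TR>
<TR><TD VALIGN= TOP ALIGN=LEFT COLSPAN=50>2.</TD><TD VALIGN= TOP ALIGN=LEFT COLSPAN=100>SANJAY DHANUKA <BR>BASUDEO DHANUKA <BR>BA -206,SECTOR-1,SALT LAKE,PS-BIDH <BR>ANNAGAR <BR>KOLKATA,WEST BENGAL <BR>PIN-700064<BR>Phone/Email:9674448777 </TD></TR>
<TR><TD VALIGN= TOP ALIGN=LEFT COLSPAN=50>3.</TD><TD VALIGN= TOP ALIGN=LEFT COLSPAN=100>ISHITA NANDI <BR>INDRANIL BURMAN ROY <BR>112, DR B C ROY SARANI,NEW BARRACK <BR>PORE <BR>KOLKATA,WEST BENGAL <BR>PIN-700131<BR>Phone/Email:9804561441 </TD></TR>
</TABLE>
"""


class DirectorsParser(HTMLParser):
    """Collect (name, phone) pairs from the first table after 'Directors:'."""

    def __init__(self):
        super().__init__()
        self.seen_label = False  # set once the "Directors:" text is seen
        self.in_table = False    # True while inside the directors table
        self.done = False        # set after that table has closed
        self.cells = []          # current row: one list of text fragments per cell
        self.contacts = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase
        if self.done:
            return
        if tag == "table" and self.seen_label:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.cells = []
        elif self.in_table and tag == "td":
            self.cells.append([])

    def handle_data(self, data):
        text = data.strip()
        if not text or self.done:
            return
        if not self.in_table:
            if text.startswith("Directors"):
                self.seen_label = True
        elif self.cells:
            self.cells[-1].append(text)

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag == "tr" and len(self.cells) >= 2 and self.cells[1]:
            parts = self.cells[1]
            # Assumption: the phone fragment is prefixed with "Phone/Email:"
            phone = next(
                (p.split(":", 1)[1].strip() for p in parts if p.startswith("Phone/Email:")),
                "",
            )
            self.contacts.append((parts[0], phone))
        elif tag == "table":
            self.in_table = False
            self.done = True  # ignore any later tables


def extract_directors(html_text):
    parser = DirectorsParser()
    parser.feed(html_text)
    parser.close()
    return parser.contacts


if __name__ == "__main__":
    for name, phone in extract_directors(SAMPLE_HTML):
        print(name, phone)
```

To run it on the saved response, pass the contents of data.txt to extract_directors instead of SAMPLE_HTML.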