正在使用定义的id将xml文件导入Access数据库



我正在努力将大量数据从xml文件导入Access。我面临的问题是,我想导入的文件确实包含id为的第一行

<vin id="11111111111111111">
<description>Mazda3 L 2.0l MZR 150 PS 4T 5AG AL-EDITION TRA-P</description>
<type>BL</type>
<typeapproval>e11*2001/116*0262*07</typeapproval>
<variant>B2F</variant>
<version>7EU</version>
<series>Mazda3</series>
<body>L</body>
<engine>2.0l MZR 150 PS</engine>
<grade>AL-EDITION</grade>
<transmission>5AG</transmission>
<colourtype>Mica</colourtype>
<extcolourcode>34K</extcolourcode>
<extcolourcodedescription>Crystal White Pearl</extcolourcodedescription>
<intcolourcode>BU4</intcolourcode>
<intcolourcodedescription>Black</intcolourcodedescription>
<registrationdate>2012-07-20</registrationdate>
<productiondate>2011-11-30</productiondate>
</vin>

所以我导入的结果是除了实际定义为id的车辆的VIN号之外的所有行。

我试图手动替换字符,如:">等等

删除那个id,但我实际上有几十个文件,每个文件中有几十万条记录,所以这很痛苦。。。

所以我考虑用python中的脚本将所有文件连接在一起:

import os 
import csv
import pandas as pd
import numpy as np
ver='2011'
dirName =r'C:UsersdawidDesktopDE_DATAMazda_DEVINs_DEMazdaxml'.format(ver);
out_file=r'C:UsersdawidDesktopDE_DATAMazda_DEVINs_DEMazdaOutput.xml'.format(ver);

def getListOfFiles(dirName):
# create a list of file and sub directories 
# names in the given directory

listOfFile = os.listdir(dirName)
allFiles = list()
# Iterate over all the entries
for entry in listOfFile:
# Create full path

fullPath = os.path.join(dirName, entry)
# If entry is a directory then get the list of files in this directory 
if os.path.isdir(fullPath):
allFiles = allFiles + getListOfFiles(fullPath)
else:
allFiles.append(fullPath)
if os.path.isdir(fullPath):
allFiles = allFiles + getListOfFiles(fullPath)

return allFiles
listOfFileOut=getListOfFiles(dirName)
#filenames = allFiles
with open(out_file, 'w',encoding='ANSI') as outfile:
for fname in listOfFileOut:
with open(fname,encoding='ANSI') as infile:
for line in infile:
outfile.write(line)

print("Done")

但这完全破坏了xml文件的结构,我无法再导入它了。有人能建议是否可以使用python来清除所有这些id,以便能够导入整个数据库进行访问吗?

提前感谢。在此处输入图像描述

试试这个。

from simplified_scrapy import utils, SimplifiedDoc, req
dirName = r'C:UsersdawidDesktopDE_DATAMazda_DEVINs_DEMazdaxml'
listFile = utils.getSubFile(dirName, end='.xml')
for f in listFile:
doc = SimplifiedDoc(utils.getFileContent(f, encoding='ANSI'))
doc.replaceReg('<vin[^>]*>', '<vin>')
print(doc.html)
# utils.saveFile(f, doc.html, encoding='ANSI') # write to original file

结果:

<vin>
<description>Mazda3 L 2.0l MZR 150 PS 4T 5AG AL-EDITION TRA-P</description>
<type>BL</type>
<typeapproval>e11*2001/116*0262*07</typeapproval>
<variant>B2F</variant>
<version>7EU</version>
...

最新更新