从Oracle Parse中选择XMLTYPE,并将记录插入另一个Oracle数据库中



我使用Python从Oracle中选择XMLTYPE字段,并解析为单独的字段并插入到另一个Oracle数据库中。下面是我的代码。主标签中大约有300个标签,一些子标签有多个值。

以下是我的问题:

1.表中有数百万条记录,而且速度非常非常慢。它需要很长时间才能完成。

请在不影响性能的情况下,通过python帮助更好地实现这一点。

我的原产地表:

desc cust_test
Name   Null Type      
------ ---- --------- 
RECID       NUMBER    
RECORD      XMLTYPE()

数据:

100227  
"<row xml:space="preserve" id="100227">
<c1>BRENT</c1>
<c2>BRENT2</c2>
<c3>Brent3</c3>
<c4>Brent4</c4>
<c5>CP</c5>
<c7>GL</c7>
<c9>US</c9>
<c23>1001</c23>
<c24>26</c24>
<c25>4</c25>
<c26>1000</c26>
<c27>2</c27>
<c28>US</c28>
<c29>1</c29>
<c30>GB</c30>
<c31>20210315</c31>
<c42>19581212</c42>
<c45>1</c45>
<c48>US0010001</c48>
<c52>NO</c52>
<c57>VALUED.CUSTOMER</c57>
<c58>11</c58>
<c60>MR</c60>
<c61>Brent61</c61>
<c63>MALE</c63>
<c64>19720915</c64>
<c68>+12345678</c68>
<c69>bc@gmail.com</c69>
<c132>YES</c132>
<c133>NULL</c133>
<c134>NULL</c134>
<c137>NULL</c137>
<c138>NULL</c138>
<c149>VALUED.CUSTOMER</c149>
<c150>11</c150>
<c151>11</c151>
<c179/>
<c179 m="6">NO</c179>
<c179 m="15">OPT-IN</c179>
<c179 m="16">20210315</c179>
<c179 m="176"/>
<c180>EB.US.ADD.RES.CHAN.AGR}Field ADDRESS/RESIDENCE changed. Still agree?</c180>
<c180 m="2">KYC/US*41 FROM 10 NOT RECEIVED</c180>
<c180 m="3">PWM/US*41 FROM 10 NOT RECEIVED</c180>
<c180 m="4">INTRO/US*41 FROM 10 NOT RECEIVED</c180>
<c182>2</c182>
<c183>17338_OFFICER__OFS_SEAT</c183>
<c184>2104271357</c184>
<c185>17338_OFFICER_OFS_SEAT</c185>
<c186>GB0010001</c186>
<c187>1</c187>
</row>"

Python代码:

#!/usr/bin/env python3
import os
import sys
import time
import csv
import cx_Oracle
import xml.etree.ElementTree as ET
import pandas as pd
con = cx_Oracle.connect('system/Manager@localhost:1521/cdb1')
start = time.time()
SQL = "SELECT RECID,RECORD FROM cust_test2"
#print(SQL)
cur = con.cursor()
cur.prefetchrows = 1000
cur.arraysize = 1000
f = open("D:Variousmyfile21.csv", "w")
writer = csv.writer(f, lineterminator="n", quoting=csv.QUOTE_NONNUMERIC)
r = cur.execute(SQL)
#col_names = [row[0] for row in cur.description]
#writer.writerow(col_names)
res = cur.fetchall()
for row in res:
tree = ET.ElementTree(ET.fromstring(row[1]))
root = tree.getroot()
for child in root:
#print(child.tag, child.text, child.attrib)
if (child.tag == 'c1') :
s1 = pd.Series(child.text)
if (child.tag == 'c3') :
s3 = pd.Series(child.text)
if (child.tag == 'c23') :
s23 = pd.Series(child.text)
if (child.tag == 'c179') :
s179 = pd.Series(child.text)
.......
df = pd.DataFrame({"c1": s1,
"c3": s3,
"c23": s23,
"c179": s179
.......})
file_name = 'events.csv'
df.to_csv(file_name, sep='t')
ResultSet_Py_List = []
ora_conn = cx_Oracle.connect('custom/custom@orcl')
ora_cursor = ora_conn.cursor()
my_file = open(file_name, 'r', newline='')
reader = csv.reader(my_file, dialect='excel', delimiter='t')
row1 = next(reader)
for index, row in enumerate(reader):
print(row)
ResultSet_Py_List.append(row)
print(str(len(ResultSet_Py_List)) + ' Records from Source')
sql_insert = """
INSERT INTO cust_ins (c0,c1,c3,c23,c179,.....) 
VALUES (:1,:2,:3,:4,:5,.....)
"""
ora_cursor.prepare(sql_insert)
ora_cursor.executemany(None, ResultSet_Py_List)
ora_conn.commit()
#writer.writerow(row)
f.close()
elapsed = (time.time() - start)
print(elapsed, "seconds")

考虑使用Oracle的XMLTABLEXMLTYPE列中提取所需的值。从Python中运行这样一个查询,并在两个DB连接之间直接将值从cursor.fetchall传递到cursor.executemany。这种方法避免了XML解析、Pandas操作和CSV写/读步骤。另外,源SQL直接在服务器上查询XML数据。

SQL (完成所有c节点,根据需要调整数据类型,另存为.SQL(

SELECT x.c1, x.c2, ..., x.c187
FROM MY_TABLE, 
XMLTABLE('/row' 
PASSING XMLTYPE(MY_TABLE.RECORD) 
COLUMNS c1 VARCHAR2(50) PATH 'c1',
c2 VARCHAR2(50) PATH 'c2',
c3 VARCHAR2(50) PATH 'c3',
...
c187 VARCHAR2(50) PATH 'c187'
) AS x

Python

import cx_Oracle
# READ SQL QUERY
with open('/path/to/myquery.sql', mode='r') as f:
src_sql = f.read()
# SOURCE CONNECTION
src_con = cx_Oracle.connect('system/Manager@localhost:1521/cdb1')
src_cur = src_con.cursor() 
src_cur.prefetchrows = 1000
src_cur.arraysize = 1000
# DESTINATION CONNECTION
dst_con = cx_Oracle.connect('custom/custom@orcl') 
dst_cur = dst_con.cursor()
# APPEND QUERY (MAKE SURE COLUMNS ALIGN TO SOURCE QUERY)
dst_sql = """INSERT INTO cust_ins (c0,c1,c3,c23,c179,.....) 
VALUES (:1,:2,:3,:4,:5,.....)""" 
# FETCH SOURCE DATA
src_cur.execute(src_sql)
src_res = src_cur.fetchall()
# APPEND TO DESTINATION TABLE
dst_cur.prepare(dst_sql)
dst_cur.executemany(None, src_res) 
dst_con.commit()
# CLOSE CURSORS AND CONNECTIONS
src_cur.close(); src_con.close()
dst_cur.close(); dst_con.close()

最新更新