SQLXML 4-xml数据的大容量插入



我有大约2TB+的XML数据需要加载到MSSQL DB中(在尽可能短的时间内(。问题是,每个XML文件中的数据范围从5MB到100MB,所有文本都在第一行。一个文件的第一行大约有4400多万个字符。

我写了一个脚本来读取XML文件并填充数据库,目前它说处理大约需要300天。我现在正在尝试SQLXML 4.0。我使用XmlSchemaInference从多个XML文件生成XSD。我使用XSD2DB工具从XSD生成数据库。我注意到该工具创建了从父表到子表的主键和外键。此后,我将该工具创建的关系插入XSD中,但这些ID不在XML文件中。

我运行了SQLXML代码,得到了一个无法插入主键列的错误。如果我删除主键并使其可以为null,则会插入数据,但我无法将数据关联到不同的表中。

我需要将唯一ID从主父表持久化到子表,或者尝试让SQLXML忽略主键,但将它们插入子节点(相关表(。如果您查看下面的XSD,您会注意到在relationship标记中,我包含了XSD2DB工具生成的主键和外键。有没有一种方法可以让SQLXML填充这些内容或允许MSSQL自增量?我指的是这里的批量加载到标识列中。非常感谢您的帮助!

XSD示例(道歉必须用Foo替换一个常用词(:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
<xs:annotation>
<xs:appinfo>
<sql:relationship name="FoosFoo"
parent="Foos"
parent-key="Foos_Id"
child="Foo"
child-key="Foos_Id" 
/>      
<sql:relationship name="FooFooIdentification"
parent="Foo"
parent-key="Foo_Id"
child="FooIdentification"
child-key="Foo_Id" 
/>
<sql:relationship name="FooIdentificationFooFlags"
parent="FooIdentification"
parent-key="FooIdentification_Id"
child="FooFlags"
child-key="FooIdentification_Id" 
/>
<sql:relationship name="FooFlagsFooFlag"
parent="FooFlags"
parent-key="FooFlags_Id"
child="FooFlag"
child-key="FooFlags_Id" 
/>
<sql:relationship name="FooFlagFooFlagDetails"
parent="FooFlag"
parent-key="FooFlag_Id"
child="FooFlagDetails"
child-key="FooFlag_Id" 
/>      
<sql:relationship name="FooContacts"
parent="Foo"
parent-key="Foo_Id"
child="Contacts"
child-key="Foo_Id" 
/>                        
<sql:relationship name="ContactsContact"
parent="Contacts"
parent-key="Contacts_Id"
child="Contact"
child-key="Contacts_Id" 
/>                        
<sql:relationship name="ContactContactTypes"
parent="Contact"
parent-key="Contact_Id"
child="ContactTypes"
child-key="Contact_Id" 
/>                        
<sql:relationship name="ContactTypesContactType"
parent="ContactTypes"
parent-key="ContactTypes_Id"
child="ContactType"
child-key="ContactTypes_Id" 
/>                        
<sql:relationship name="ContactTypeContactTypeCode"
parent="ContactType"
parent-key="ContactType_Id"
child="ContactTypeCode"
child-key="ContactType_Id" 
/>
<sql:relationship name="FooAddresses"
parent="Foo"
parent-key="Foo_Id"
child="Addresses"
child-key="Foo_Id" 
/>                        
<sql:relationship name="AddressesAddress"
parent="Addresses"
parent-key="Addresses_Id"
child="Address"
child-key="Addresses_Id" 
/>                                                        
</xs:appinfo>
</xs:annotation>
<xs:element name="FooBulkXMLFile" sql:is-constant="1">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" name="Header" sql:relation="Header">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" name="DateTimeCreated" type="xs:string" />
<xs:element minOccurs="0" name="FileType" type="xs:string" />
<xs:element minOccurs="0" name="MonthlyDate" nillable="true" />
<xs:element minOccurs="0" name="DailyDate" type="xs:string" />
<xs:element minOccurs="0" name="FooSourceSystem" type="xs:string" />
<xs:element minOccurs="0" name="FooSourceVersion" type="xs:string" />
<xs:element minOccurs="0" name="FooCount" type="xs:long" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element minOccurs="0" name="Foos" sql:relation="Foos">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" name="Foo" sql:relation="Foo" sql:relationship="FoosFoo">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" name="FooIdentification" sql:relation="FooIdentification" sql:relationship="FooFooIdentification">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" name="FooNumber" type="xs:string" />
<xs:element minOccurs="0" name="IsActive" type="xs:boolean" />
<xs:element minOccurs="0" name="CountryOfOriginCode" type="xs:string" />
<xs:element minOccurs="0" name="FooName" type="xs:string" />
<xs:element minOccurs="0" name="CreatedDate" type="xs:string" />
<xs:element minOccurs="0" name="EditDate" type="xs:string" />
<xs:element minOccurs="0" name="FooFlags" sql:relation="FooFlags" sql:relationship="FooIdentificationFooFlags">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" name="FooFlag" sql:relation="FooFlag" sql:relationship="FooFlagsFooFlag">
<xs:complexType>
   <xs:sequence>
       <xs:element minOccurs="0" name="FooFlagType" type="xs:string" />
       <xs:element minOccurs="0" name="FooFlagDescription" type="xs:string" />
       <xs:element minOccurs="0" name="FooFlagValue" type="xs:boolean" />
       <xs:element minOccurs="0" name="FooFlagLastVerificationDate" type="xs:string" />
       <xs:element minOccurs="0" name="FooFlagDetails" sql:relation="FooFlagDetails" sql:relationship="FooFlagFooFlagDetails">
           <xs:complexType>
               <xs:sequence>
                   <xs:element minOccurs="0" name="FooFlagDetail" />
               </xs:sequence>
           </xs:complexType>
       </xs:element>
   </xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>                                        
<xs:element minOccurs="0" name="Contacts" sql:relation="Contacts" sql:relationship="FooContacts">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" name="Contact" sql:relation="Contact" sql:relationship="ContactsContact">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" name="ContactID" type="xs:long" />
<xs:element minOccurs="0" name="ContactTypes" sql:relation="ContactTypes" sql:relationship="ContactContactTypes">
<xs:complexType>
   <xs:sequence>
       <xs:element minOccurs="0" name="ContactType" sql:relation="ContactType" sql:relationship="ContactTypesContactType">
           <xs:complexType>
               <xs:sequence>
                   <xs:element minOccurs="0" maxOccurs="unbounded" name="ContactTypeCode" type="xs:string" sql:field="ContactTypeCode_Column"  sql:relation="ContactTypeCode" sql:relationship="ContactTypeContactTypeCode" />
               </xs:sequence>
           </xs:complexType>
       </xs:element>
   </xs:sequence>
</xs:complexType>
</xs:element>                                                        
<xs:element minOccurs="0" name="Name" type="xs:string" />
<xs:element minOccurs="0" name="Surname" type="xs:string" />
<xs:element minOccurs="0" name="EmailAddress" type="xs:string" /><xs:element minOccurs="0" name="CreatedDate" type="xs:string" />
<xs:element minOccurs="0" name="EditDate" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element minOccurs="0" name="Addresses"  sql:relation="Addresses" sql:relationship="FooAddresses">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" name="Address"  sql:relation="Address" sql:relationship="AddressesAddress">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" name="AddressID" type="xs:long" />
<xs:element minOccurs="0" name="IsPreferred" type="xs:boolean" />
<xs:element minOccurs="0" name="IsActive" type="xs:boolean" />
<xs:element minOccurs="0" name="AddressTypeCode" type="xs:string" />
<xs:element minOccurs="0" name="AddressLine1" type="xs:string" />
<xs:element minOccurs="0" name="AddressLine2" type="xs:string" />
<xs:element minOccurs="0" name="CountryCode" type="xs:string" />
<xs:element minOccurs="0" name="ProvinceCode" nillable="true" type="xs:string" />
<xs:element minOccurs="0" name="DistrictCode" nillable="true" type="xs:string" />
<xs:element minOccurs="0" name="MunicipalityCode" nillable="true" type="xs:string" />
<xs:element minOccurs="0" name="CityCode" nillable="true" type="xs:string" />
<xs:element minOccurs="0" name="SuburbCode" nillable="true" type="xs:string" />
<xs:element minOccurs="0" name="WardCode" nillable="true" type="xs:string" />
<xs:element minOccurs="0" name="PostalCode" nillable="true" type="xs:string" />
<xs:element minOccurs="0" name="IsPostalAddress" type="xs:boolean" />
<xs:element minOccurs="0" name="IsDeliveryAddress" type="xs:boolean" />
<xs:element minOccurs="0" name="IsPhysicalAddress" type="xs:boolean" />
<xs:element minOccurs="0" name="IsPaymentAddress" type="xs:boolean" />
<xs:element minOccurs="0" name="Field1" type="xs:string" />
<xs:element minOccurs="0" name="Field2" type="xs:string" />
<xs:element minOccurs="0" name="Field3" type="xs:string" />
<xs:element minOccurs="0" name="CreatedDate" type="xs:string" />
<xs:element minOccurs="0" name="EditDate" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>                                        
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

C#SQLXML代码:

public static void TestSqlXmlBulk()
{
try
{
SQLXMLBULKLOADLib.SQLXMLBulkLoad4Class xs;
xs = new SQLXMLBULKLOADLib.SQLXMLBulkLoad4Class
{
ConnectionString = @"Provider=sqloledb;server=fooFoo01;database=FooXsd;integrated security=SSPI",
ErrorLogFile = @"c:tmperror.xml",
KeepIdentity = false,
CheckConstraints = false,
Transaction = false
};
var list = Directory.GetFiles(@"c:tmpextract", "foo*.xml");
foreach (var f in list)
{
Console.WriteLine("{0} - Processing {1}", DateTime.Now.ToString("hh:mm:ss.fff"), f);
xs.Execute(@"c:tmpfoo_rel.xsd", f);
}
xs = null;
}
catch (Exception x)
{
throw x;
}
}

(1(源代码正确地具有以下设置:

KeepIdentity = false

(2( 每个具有IDENTITY列的表在XSD中都应该具有sql:key-fields="PrimaryKeyColumn"属性。

例如,

<xs:element minOccurs="0" name="Foos" sql:relation="Foos">

应调整如下:

<xs:element minOccurs="0" name="Foos" sql:relation="Foos" sql:key-fields="Foos_Id">

最新更新