How to parse RIXML/XML and create a DataFrame in Python



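I have just started learning Python. We have a use case where we need to parse an XML structure in the RIXML format and save it in tabular form, produce JSON from it, or write it out as a CSV file. Whichever output we choose, we have to parse the XML first. I have tried many approaches, and ElementTree seems to be what we should use to parse this kind of XML structure; please correct me if that assumption is wrong. Here is my XML tree. Sorry for posting the whole XML.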

<?xml version="1.0" encoding="UTF-8"?>
<Research xmlns="http://www.rixml.org/2013/2/RIXML" language="eng" createDateTime="2022-03-25T12:18:11.527Z" researchID="123456">
<Product sequence="0" eventIndicator="No" productID="7448811">
<StatusInfo currentStatusIndicator="Yes" statusDateTime="2022-03-25T12:21:12.269Z" statusType="Published" />
<Source>
<Organization type="SellSideFirm" primaryIndicator="Yes">
<OrganizationID idType="TrRefi">5219</OrganizationID>
<OrganizationID idType="VendorCode">ABCD</OrganizationID>
<OrganizationID idType="ONEAccess">497</OrganizationID>
<OrganizationName nameType="Display">abcd Securities (Australia) Limited</OrganizationName>
<PersonGroup>
<PersonGroupMember primaryIndicator="Yes">
<Person personID="3881">
<FamilyName>Bairstow</FamilyName>
<GivenName>mATHEY</GivenName>
<DisplayName>mATHEY Bairstow</DisplayName>
<JobTitle>Division Director</JobTitle>
<ContactInfo nature="Business">
<Email>mATHEY.bairstow@abcd.com</Email>
<Phone type="Voice">
<CountryCode />
<Number>+61 165 123 321</Number>
</Phone>
</ContactInfo>
</Person>
</PersonGroupMember>
<PersonGroupMember primaryIndicator="No">
<Person personID="5554">
<FamilyName>Scholtz</FamilyName>
<GivenName>John</GivenName>
<DisplayName>John Scholtz</DisplayName>
<JobTitle>Research Analyst</JobTitle>
<ContactInfo nature="Business">
<Email>John.scholtz@abcd.com</Email>
<Phone type="Voice">
<CountryCode />
<Number>+61 100 036 200</Number>
</Phone>
</ContactInfo>
</Person>
</PersonGroupMember>
<PersonGroupMember primaryIndicator="No">
<Person personID="5116">
<FamilyName>Bowler</FamilyName>
<GivenName>Andrew</GivenName>
<DisplayName>Andrew Bowler</DisplayName>
<JobTitle>Research Analyst</JobTitle>
<ContactInfo nature="Business">
<Email>andrew.bowler@abcd.com</Email>
<Phone type="Voice">
<CountryCode />
<Number>+61 448 433 736</Number>
</Phone>
</ContactInfo>
</Person>
</PersonGroupMember>
<PersonGroupMember primaryIndicator="No">
<Person personID="6076">
<FamilyName>Yun</FamilyName>
<GivenName>Austin</GivenName>
<DisplayName>Austin Yun,  CFA</DisplayName>
<JobTitle>Senior Research Associate Analyst</JobTitle>
<ContactInfo nature="Business">
<Email>austin.yun@abcd.com</Email>
<Phone type="Voice">
<CountryCode />
<Number>+61 457 429 116</Number>
</Phone>
</ContactInfo>
</Person>
</PersonGroupMember>
</PersonGroup>
</Organization>
</Source>
<Content>
<Title>Australian Lithium and Rare Earths Miners</Title>
<SubTitle>DLE under the spotlight</SubTitle>
<Abstract>Key Lithium and Rare Earths market themes
.</Abstract>
<Synopsis>Spodumene prices edged higher during the week while Chinese lithium carbonate prices remained flat. We review Direct Lithium Extraction method following POSCO’s .</Synopsis>
<Resource primaryIndicator="Yes" resourceID="7448811">
<MIMEType>application/pdf</MIMEType>
<Name>ref.0007448811.20220325.pdf</Name>
<URL>https://www.abcdresearch.com</URL>
</Resource>
</Content>
<Context external="Yes">
<IssuerDetails>
<Issuer primaryIndicator="No" issuerType="Corporate">
<SecurityDetails>
<Security primaryIndicator="No">
<SecurityID tradingCountryCode="KR" idValue="005490.KS" idType="RIC" />
<SecurityID tradingCountryCode="KR" idValue="005490 KS" idType="Bloomberg" />
<SecurityFinancials securityFinancialsType="Price">
<Currency>KRW</Currency>
<FinancialValue estimateActual="Actual">304000.00</FinancialValue>
</SecurityFinancials>
<SecurityFinancials securityFinancialsType="TargetPrice">
<Currency>KRW</Currency>
<FinancialValue dateTime="2022-03-25T12:18:14.215Z" estimateActual="Estimate">280000.00</FinancialValue>
</SecurityFinancials>
<SectorIndustry primaryIndicator="No" focusLevel="No" classificationType="GICS" level="4" code="15104050">
<ShortName>Steel</ShortName>
<Name>Steel (15104050)</Name>
</SectorIndustry>
<SecurityName>POSCO</SecurityName>
<SecurityShortName>POSCO</SecurityShortName>
<AssetClass assetClass="Equity" />
<AssetType assetType="Stock" />
<Rating rating="NeutralSentiment" priorCurrent="Current" timeFrame="ShortTerm">
<PublisherDefinedValue>NEUTRAL</PublisherDefinedValue>
<RatingEntity ratingEntity="Publisher" />
</Rating>
</Security>
</SecurityDetails>
<IssuerName nameType="Display">
<PublisherDefinedValue />
<NameValue>POSCO</NameValue>
</IssuerName>
</Issuer>
<Issuer primaryIndicator="No" issuerType="Corporate">
<SecurityDetails>
<Security primaryIndicator="No">
<SecurityID tradingCountryCode="CN" idValue="6445490" idType="SEDOL" />
<SectorIndustry primaryIndicator="No" focusLevel="No" classificationType="GICS" level="4" code="15101030">
<ShortName>Fertilizers &amp; Agricultural Chemicals</ShortName>
<Name>Fertilizers &amp; Agricultural Chemicals (15101030)</Name>
</SectorIndustry>
<SecurityName>ZANGGE MINING CO L</SecurityName>
<SecurityShortName>ZANGGE MINING CO L</SecurityShortName>
<AssetClass assetClass="Equity" />
<AssetType assetType="Stock" />
<Rating rating="NoRating" priorCurrent="Current" timeFrame="ShortTerm">
<PublisherDefinedValue />
<RatingEntity ratingEntity="Publisher" />
</Rating>
</Security>
</SecurityDetails>
<IssuerName nameType="Display">
<PublisherDefinedValue />
<NameValue>ZANGGE MINING CO L</NameValue>
</IssuerName>
</Issuer>
</IssuerDetails>
<ProductDetails publicationDateTime="2022-03-24T13:00:00.000Z">
<ProductCategory publisherDefinedValue="Flyer" productCategory="PublisherDefined" />
<ProductFocus primaryIndicator="Yes" focus="SectorIndustry" />
<EntitlementGroup>
<Entitlement includeExcludeIndicator="Include" primaryIndicator="Yes">
<AudienceTypeEntitlement entitlementContext="TrRefi" audienceType="PublisherDefined">1</AudienceTypeEntitlement>
</Entitlement>
</EntitlementGroup>
</ProductDetails>
<ProductClassifications>
<Discipline researchApproach="Fundamental" disciplineType="Investment" />
<Country primaryIndicator="Yes" code="AU" />
<Region regionType="Australasia" primaryIndicator="Yes" />
<AssetClass assetClass="Equity" />
<SectorIndustry primaryIndicator="Yes" focusLevel="Yes" classificationType="GICS" level="3" code="151040">
<ShortName>Metals &amp; Mining</ShortName>
<Name>Metals &amp; Mining (151040)</Name>
</SectorIndustry>
</ProductClassifications>
</Context>
</Product>
</Research>

Is there a way to parse this XML along with all of its attributes?
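One way to capture every element together with its attributes, using only the standard library, is to walk the tree recursively and emit one row per element. The sketch below assumes a simple row layout (element path, attributes joined as `key=value`, stripped text) that is purely illustrative; `local_name` and `flatten` are hypothetical helpers, and the XML is an abridged copy of the document above. Note that ElementTree consumes the `xmlns` declaration, so it does not appear in `attrib`.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Abridged copy of the RIXML above; with a file use ET.parse(path).getroot()
SAMPLE = """<Research xmlns="http://www.rixml.org/2013/2/RIXML" researchID="123456">
  <Product productID="7448811">
    <StatusInfo currentStatusIndicator="Yes" statusType="Published" />
    <Content>
      <Title>Australian Lithium and Rare Earths Miners</Title>
    </Content>
  </Product>
</Research>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on every tag."""
    return tag.split("}", 1)[-1]


def flatten(elem, path=""):
    """Yield one dict per element: its path, its attributes and its text."""
    here = f"{path}/{local_name(elem.tag)}"
    yield {
        "path": here,
        "attributes": ";".join(f"{k}={v}" for k, v in elem.attrib.items()),
        "text": (elem.text or "").strip(),
    }
    for child in elem:
        yield from flatten(child, here)


root = ET.fromstring(SAMPLE)
rows = list(flatten(root))

# Write the rows as CSV into an in-memory buffer (use open(...) for a file)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["path", "attributes", "text"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

If pandas is available, `pd.DataFrame(rows)` turns the same list of dicts into a DataFrame and `df.to_csv("out.csv", index=False)` writes it. For a more useful table (for example one row per analyst), it is usually better to pick the fields explicitly rather than dumping every element.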

This is how I started:

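import xml.etree.ElementTree as ETree
import pandas as pd

# Give the path where you saved the XML file inside the quotes.
# Use a raw string (r"...") so the backslashes in the Windows path are not
# treated as escape sequences ("\U" would otherwise be a syntax error).
xmldata = r"C:\Users\myxmlfile-rixml.xml"
prstree = ETree.parse(xmldata)
root = prstree.getroot()
print("root tag ----" + root.tag)
print(root.attrib)
# Print the tag and attributes of each direct child of the root
for child in root:
    print(child.tag, child.attrib)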
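Two things about this loop may be surprising. Because the document declares a default namespace, every tag ElementTree returns is prefixed with it in braces, and `for child in root` visits only the direct children of `<Research>`. The sketch below (on a trimmed-down sample) shows both, and uses `root.iter()` to walk every descendant instead; stripping the prefix with `split("}")` is just one convenient approach.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Research xmlns="http://www.rixml.org/2013/2/RIXML" researchID="123456">
  <Product productID="7448811">
    <StatusInfo statusType="Published" />
  </Product>
</Research>"""

root = ET.fromstring(SAMPLE)

# Direct children only: each tag carries the default namespace in braces
top_level = [child.tag for child in root]
print(top_level)  # ['{http://www.rixml.org/2013/2/RIXML}Product']

# root.iter() walks every element (root included) in document order
all_tags = [el.tag.split("}", 1)[-1] for el in root.iter()]
print(all_tags)  # ['Research', 'Product', 'StatusInfo']
```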

With this approach, how can we get the StatusInfo and Organization tags and their child elements?
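A likely reason plain `root.find("Product")` returns `None` here is the default namespace: searches must be namespace-qualified. ElementTree's `find`, `findall` and `findtext` accept a `namespaces` mapping, so a prefix of your choosing (`r` below, an arbitrary assumption) can stand for the RIXML namespace. The sketch pulls `StatusInfo` attributes, the organization IDs and name, and one row per analyst from an abridged copy of the XML; the chosen columns are illustrative.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Research xmlns="http://www.rixml.org/2013/2/RIXML" researchID="123456">
  <Product productID="7448811">
    <StatusInfo currentStatusIndicator="Yes" statusType="Published" />
    <Source>
      <Organization type="SellSideFirm" primaryIndicator="Yes">
        <OrganizationID idType="TrRefi">5219</OrganizationID>
        <OrganizationID idType="VendorCode">ABCD</OrganizationID>
        <OrganizationName nameType="Display">abcd Securities (Australia) Limited</OrganizationName>
        <PersonGroup>
          <PersonGroupMember primaryIndicator="Yes">
            <Person personID="3881">
              <DisplayName>mATHEY Bairstow</DisplayName>
              <JobTitle>Division Director</JobTitle>
              <ContactInfo nature="Business"><Email>mATHEY.bairstow@abcd.com</Email></ContactInfo>
            </Person>
          </PersonGroupMember>
          <PersonGroupMember primaryIndicator="No">
            <Person personID="5554">
              <DisplayName>John Scholtz</DisplayName>
              <JobTitle>Research Analyst</JobTitle>
              <ContactInfo nature="Business"><Email>John.scholtz@abcd.com</Email></ContactInfo>
            </Person>
          </PersonGroupMember>
        </PersonGroup>
      </Organization>
    </Source>
  </Product>
</Research>"""

# Map an arbitrary prefix to the document's default namespace
NS = {"r": "http://www.rixml.org/2013/2/RIXML"}

root = ET.fromstring(SAMPLE)

status = root.find("r:Product/r:StatusInfo", NS)
print(status.attrib)

org = root.find("r:Product/r:Source/r:Organization", NS)
org_ids = {oid.get("idType"): oid.text for oid in org.findall("r:OrganizationID", NS)}
org_name = org.findtext("r:OrganizationName", namespaces=NS)
print(org_ids, org_name)

# One dict per PersonGroupMember, ready for a DataFrame or CSV
people = []
for member in org.findall("r:PersonGroup/r:PersonGroupMember", NS):
    person = member.find("r:Person", NS)
    people.append({
        "organization": org_name,
        "primary": member.get("primaryIndicator"),
        "personID": person.get("personID"),
        "name": person.findtext("r:DisplayName", namespaces=NS),
        "jobTitle": person.findtext("r:JobTitle", namespaces=NS),
        "email": person.findtext("r:ContactInfo/r:Email", namespaces=NS),
    })

for row in people:
    print(row)
```

With pandas, `pd.DataFrame(people)` gives one row per analyst, and `.to_csv("people.csv", index=False)` saves it.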

You can use the xmltodict library.

After that, you can serialize the dict to JSON or any other format.
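xmltodict is a third-party package (`pip install xmltodict`); `xmltodict.parse(xml_string)` returns a nested dict in which attributes appear under `@`-prefixed keys and mixed text under `#text`, and `json.dumps` then turns that dict into JSON. If installing it is not an option, the sketch below approximates the same layout with the standard library only. `element_to_dict` is a hypothetical helper, not part of xmltodict or ElementTree, and it drops namespace prefixes from tag names, which may differ from xmltodict's default output.

```python
import json
import xml.etree.ElementTree as ET

SAMPLE = """<Research xmlns="http://www.rixml.org/2013/2/RIXML" researchID="123456">
  <Product productID="7448811">
    <SecurityID idType="RIC" idValue="005490.KS" />
    <SecurityID idType="Bloomberg" idValue="005490 KS" />
    <Title>Australian Lithium and Rare Earths Miners</Title>
  </Product>
</Research>"""


def local_name(tag):
    return tag.split("}", 1)[-1]


def element_to_dict(elem):
    """Hypothetical helper mimicking xmltodict's layout: attributes become
    '@name' keys, text becomes '#text' (or the whole value for a text-only
    leaf), and repeated child tags are collected into a list."""
    node = {f"@{k}": v for k, v in elem.attrib.items()}
    for child in elem:
        key = local_name(child.tag)
        value = element_to_dict(child)
        if key in node:
            if not isinstance(node[key], list):
                node[key] = [node[key]]
            node[key].append(value)
        else:
            node[key] = value
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text
        node["#text"] = text
    return node or None  # empty element -> None


root = ET.fromstring(SAMPLE)
data = {local_name(root.tag): element_to_dict(root)}
print(json.dumps(data, indent=2))
```

Repeated elements such as the two `SecurityID` entries end up as a JSON array, while a single occurrence stays a plain object; code consuming the result should be prepared for both shapes.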
