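I want to build a tool that converts XML to CSV. I'm using Python, but I could switch to another tool if that would work better.
The XML files don't always follow the same schema, so I need to convert the structure to CSV automatically, without always knowing the tree structure in advance. The main tags are known and always the same; some XML files use all of them, others only a few. I tried xml.etree and managed to process the XML, but not with dynamic XML input. Is this possible?
Here is a sample of the content of my XML input files: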
<Process>
  <ProcessName>Vault-2-A</ProcessName>
  <ProcessEnabled>True</ProcessEnabled>
  <ProcessType>N2N</ProcessType>
  <NonDuplicationMethod>Delete</NonDuplicationMethod>
  <OnFileExistsInDest>Overwrite</OnFileExistsInDest>
  <ProcessScheduling>ExternalActivation</ProcessScheduling>
  <ExternalActivationLevel>Process</ExternalActivationLevel>
  <ProcessRecursive>True</ProcessRecursive>
  <FileSelectionPattern>*</FileSelectionPattern>
  <Rules>
    <Rule1>
      <RuleName>V2A</RuleName>
      <SourcePort>
        <Name>xxx</Name>
        <Type>Vault</Type>
        <VaultName>yyy</VaultName>
        <UserName>user</UserName>
        <FolderName>Root</FolderName>
      </SourcePort>
      <DestPort>
        <Name>MyFileSystem</Name>
        <Type>FileSystem</Type>
        <FolderName>D:xxx</FolderName>
      </DestPort>
    </Rule1>
    <Rule2>
      <RuleName>A2V</RuleName>
      <SourcePort>
        <Name>xxx</Name>
        <Type>Vault</Type>
        <VaultName>yyyn</VaultName>
        <UserName>user</UserName>
        <SafeName>userTest</SafeName>
        <FolderName>Root</FolderName>
      </SourcePort>
      <DestPort>
        <Name>sftp</Name>
        <Type>sftp</Type>
        <FolderName>D:Accellion TestsDCA-IN</FolderName>
        <ArchiveFolder>arc</ArchiveFolder>
      </DestPort>
    </Rule2>
    <Rule3>
      <RuleName>Vault-2-Accellion</RuleName>
      <NOND>true</NOND>
      <SourcePort>
        <Name>A</Name>
        <Type>Vault</Type>
        <VaultName>Am</VaultName>
        <UserName>g</UserName>
        <SafeName>test</SafeName>
        <FolderName>Root</FolderName>
      </SourcePort>
      <DestPort>
        <Name>MyFileSystem</Name>
        <Type>FileSystem</Type>
        <FolderName>D:TestsDCA-IN</FolderName>
      </DestPort>
    </Rule3>
  </Rules>
  <UserExits>
  </UserExits>
</Process>
Thanks, David
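As per the comments, you can use xmltodict to convert the XML to a dictionary. You can then write the result out with the csv module, using DictWriter().
You need to think about how you want the data to appear in the CSV. By default, the <Rules> data will be written to a single cell as an OrderedDict. You might want to flatten the dictionary, or allow duplicated data?
For example: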
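import csv
import xmltodict

def save_dict_to_csv(filename, data):
    # newline='' keeps the csv module from writing blank lines on Windows
    with open(filename, 'w', newline='') as csvfile:
        # the dictionary keys become the CSV header
        w = csv.DictWriter(csvfile, fieldnames=data.keys())
        w.writeheader()
        w.writerow(data)
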
xml = r"""
<Process>
<ProcessName>Vault-2-A</ProcessName>
<ProcessEnabled>True</ProcessEnabled>
<ProcessType>N2N</ProcessType>
<NonDuplicationMethod>Delete</NonDuplicationMethod>
<OnFileExistsInDest>Overwrite</OnFileExistsInDest>
<ProcessScheduling>ExternalActivation</ProcessScheduling>
<ExternalActivationLevel>Process</ExternalActivationLevel>
<ProcessRecursive>True</ProcessRecursive>
<FileSelectionPattern>*</FileSelectionPattern>
<Rules>
<Rule1>
<RuleName>V2A</RuleName>
<SourcePort>
<Name>xxx</Name>
<Type>Vault</Type>
<VaultName>yyy</VaultName>
<UserName>user</UserName>
<FolderName>Root</FolderName>
</SourcePort>
<DestPort>
<Name>MyFileSystem</Name>
<Type>FileSystem</Type>
<FolderName>D:xxx</FolderName>
</DestPort>
</Rule1>
<Rule2>
<RuleName>A2V</RuleName>
<SourcePort>
<Name>xxx</Name>
<Type>Vault</Type>
<VaultName>yyyn</VaultName>
<UserName>user</UserName>
<SafeName>userTest</SafeName>
<FolderName>Root</FolderName>
</SourcePort>
<DestPort>
<Name>sftp</Name>
<Type>sftp</Type>
<FolderName>D:Accellion TestsDCA-IN</FolderName>
<ArchiveFolder>arc</ArchiveFolder>
</DestPort>
</Rule2>
<Rule3>
<RuleName>Vault-2-Accellion</RuleName>
<NOND>true</NOND>
<SourcePort>
<Name>A</Name>
<Type>Vault</Type>
<VaultName>Am</VaultName>
<UserName>g</UserName>
<SafeName>test</SafeName>
<FolderName>Root</FolderName>
</SourcePort>
<DestPort>
<Name>MyFileSystem</Name>
<Type>FileSystem</Type>
<FolderName>D:TestsDCA-IN</FolderName>
</DestPort>
</Rule3>
</Rules>
<UserExits>
</UserExits>
</Process>"""
my_dict = xmltodict.parse(xml)
save_dict_to_csv('test.csv', next(iter(my_dict.values()))) # pass value for Process
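
If you would rather flatten the nested data than keep <Rules> in one cell, the following is a rough sketch of one way to do it. It swaps xmltodict for the standard-library xml.etree.ElementTree that the question mentions, so it has no third-party dependency. The helper names flatten and write_rows, the dotted column naming (for example Rules.Rule1.RuleName), and the two shortened sample documents are assumptions made for illustration only. It also assumes sibling tags under one parent have distinct names, as Rule1, Rule2 and Rule3 do in the sample; repeated sibling tags would overwrite each other.

```python
import csv
import io
import xml.etree.ElementTree as ET


def flatten(element, prefix=""):
    """Map each leaf element to a dotted path such as Rules.Rule1.RuleName.

    Assumption: sibling tags are unique under a parent; a repeated tag
    would overwrite the earlier value.
    """
    flat = {}
    for child in element:
        path = f"{prefix}.{child.tag}" if prefix else child.tag
        if len(child):
            # the element has children, so descend into it
            flat.update(flatten(child, path))
        else:
            # leaf element; an empty element such as <UserExits> becomes ""
            flat[path] = (child.text or "").strip()
    return flat


def write_rows(csvfile, rows):
    """Write dicts with differing keys; the header is the union of all keys."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills the columns that a given document does not have
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Two shortened, hypothetical documents that use different subsets of tags
docs = [
    "<Process><ProcessName>Vault-2-A</ProcessName><Rules><Rule1>"
    "<RuleName>V2A</RuleName><SourcePort><Type>Vault</Type></SourcePort>"
    "</Rule1></Rules><UserExits></UserExits></Process>",
    "<Process><ProcessName>Other</ProcessName>"
    "<ProcessType>N2N</ProcessType></Process>",
]
rows = [flatten(ET.fromstring(doc)) for doc in docs]

buffer = io.StringIO()  # stands in for open('test.csv', 'w', newline='')
write_rows(buffer, rows)
print(buffer.getvalue())
```

Each input document becomes one CSV row, and tags missing from a document are left as empty cells, which is one way to cope with files that only use some of the known tags. If you want one row per rule instead, with the process-level fields repeated on every row, you would loop over the children of <Rules> and flatten each one separately.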