How can I use XQuery to extract specific XML records, based on multiple conditions, in comma-separated format?



Input file:

<?xml version="1.0" encoding="UTF-8"?> 
<books>
<book id="6636551">
<master_information>
<book_xref>
<xref type="Fiction" type_id="1">72771KAM3</xref>
<xref type="Non_Fiction" type_id="2">US72771KAM36</xref>
</book_xref>
</master_information>
<book_details>
<price>24.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book_details>
<global_information>
<ratings>
<rating agency="ABC Agency" type="Author Rating">A++</rating>
<rating agency="DEF Agency" type="Author Rating">A+</rating>
<rating agency="DEF Agency" type="Book Rating">A</rating>
</ratings>
</global_information>
<country_info>
<country_code>US</country_code>
</country_info>
</book>
<book id="119818569">
<master_information>
<book_xref>
<xref type="Fiction" type_id="1">070185UL5</xref>
<xref type="Non_Fiction" type_id="2">US070185UL50</xref>
</book_xref>
</master_information>
<book_details>
<price>19.25</price>
<publish_date>2002-11-01</publish_date>
<description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
</book_details>
<global_information>
<ratings>
<rating agency="ABC Agency" type="Author Rating">A+</rating>
<rating agency="ABC Agency" type="Book Rating">A</rating>
<rating agency="DEF Agency" type="Author Rating">A</rating>
<rating agency="DEF Agency" type="Book Rating">B+</rating>
</ratings>
</global_information>
<country_info>
<country_code>CA</country_code>
</country_info>
</book>
<book id="119818568">
<master_information>
<book_xref>
<xref type="Fiction" type_id="1">070185UK7</xref>
<xref type="Non_Fiction" type_id="2">US070185UK77</xref>
</book_xref>
</master_information>
<book_details>
<price>5.95</price>
<publish_date>2004-05-01</publish_date>
<description>After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society.</description>
</book_details>
<global_information>
<ratings>
<rating agency="ABC Agency" type="Author Rating">A+</rating>
<rating agency="ABC Agency" type="Book Rating">A+</rating>
<rating agency="DEF Agency" type="Author Rating">B++</rating>
<rating agency="DEF Agency" type="Book Rating">A+</rating>
</ratings>
</global_information>
<country_info>
<country_code>UK</country_code>
</country_info>
</book>
<book id="119818567">
<master_information>
<book_xref>
<xref type="Fiction" type_id="1">070185UJ0</xref>
<xref type="Non_Fiction" type_id="2">US070185UJ05</xref>
</book_xref>
</master_information>
<book_details>
<price>4.95</price>
<publish_date>2000-09-02</publish_date>
<description>When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled.</description>
</book_details>
<global_information>
<ratings>
<rating agency="ABC Agency" type="Author Rating">B+</rating>
<rating agency="ABC Agency" type="Book Rating">A+</rating>
<rating agency="DEF Agency" type="Author Rating">B++</rating>
<rating agency="DEF Agency" type="Book Rating">A+</rating>
</ratings>
</global_information>
<country_info>
<country_code>US</country_code>
</country_info>
</book>
</books>

I wrote an XQuery to fetch specific XML records in CSV format:

for $x in string-join(('book_id, xref_type, xref, country, desc, rating_agency, rating_type, rating', //book//global_information/ratings/rating[@type='Author Rating' and .=('A+','B++')]/string-join((ancestor::book/@id, ancestor::book//book_xref/xref/@type, ancestor::book//book_xref/xref, ancestor::book//country_info/country_code, ancestor::book//book_details/description, @agency, @type, .), ',')), '&#10;')
return $x

The expected output is:

book_id, xref_type, xref, country, desc, rating_agency, rating_type, rating
6636551,Fiction,72771KAM3,US,An in-depth look at creating applications with XML.,DEF Agency,Author Rating,A+
6636551,Non_Fiction,US72771KAM36,US,An in-depth look at creating applications with XML.,DEF Agency,Author Rating,A+
119818569,Fiction,070185UL5,CA,A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.,ABC Agency,Author Rating,A+
119818569,Non_Fiction,US070185UL50,CA,A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.,ABC Agency,Author Rating,A+
etc.

I expect the book_id to repeat: since I am not filtering on xref_type, each xref should appear on its own row. That is not what happens. The generated output is:

book_id, xref_type, xref, country, desc, rating_agency, rating_type, rating
6636551,Fiction,Non_Fiction,72771KAM3,US72771KAM36,US,An in-depth look at creating applications with XML.,DEF Agency,Author Rating,A+
119818569,Fiction,Non_Fiction,070185UL5,US070185UL50,CA,A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.,ABC Agency,Author Rating,A+

When I filter on xref_type = "Fiction", I have to apply the filter to both xref/@type and xref. It works, but is there a better way to do this?

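Essentially, my question breaks down into three smaller questions:

  1. How do I list the "Fiction" and "Non_Fiction" items on separate rows for the same book_id?
  2. Is there a better way to write the conditions in this code?
  3. How do I write the conditions so that a blank value is output when the data for a particular condition is missing? Is this done automatically?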

Thanks for your help!

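When the XPath expression ancestor::book//book_xref/xref/@type is evaluated, the context is a single rating element and the result is a sequence of type="..." attributes. So when you assemble the CSV row, the xrefs of every type under the ancestor book element are treated as one value (which can be a sequence of items) and all end up in that single row.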
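A minimal illustration of that flattening, using literal strings in place of the attribute and element nodes (the values are copied from the first book in the question; the nesting of the parentheses is only there to mimic the separate path expressions):

```xquery
(: Nested sequences are flattened in XQuery, so both types and both
   xref values become separate items of ONE sequence, and therefore
   fields of ONE CSV row rather than two rows. :)
string-join(
  ('6636551', ('Fiction', 'Non_Fiction'), ('72771KAM3', 'US72771KAM36'), 'US'),
  ','
)
(: → "6636551,Fiction,Non_Fiction,72771KAM3,US72771KAM36,US" :)
```

This has the same shape as the rows in the generated output: one row per rating, with every xref of the book spliced into it.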

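If you want each item on its own CSV row, you have to iterate over the sequence explicitly, for example with a nested for loop. I bind the book to a variable with let and loop over all of its xrefs: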

string-join(
  (
    'book_id, xref_type, xref, country, desc, rating_agency, rating_type, rating',
    (: one row per matching rating and per xref of that rating's book :)
    for $rating in //book//global_information/ratings/rating[@type='Author Rating' and .=('A+','B++')]
    let $book := $rating/ancestor::book
    for $xref in $book//xref
    return (
      string-join(
        (
          $book/@id,
          $xref/@type,
          $xref/text(),
          $book//country_info/country_code,
          $book//book_details/description/text(),
          $rating/@agency,
          $rating/@type,
          $rating/text()
        ),
        ','
      )
    )
  ),
  '&#10;'
)
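On the third question (blank values for missing data): as far as I can tell this does not happen automatically. A path that selects nothing yields the empty sequence, which contributes no item to the row, so the remaining columns shift left. One way around it is to turn every field into a string first. The following is only a sketch, assuming each field holds at most one value per row; `local:field` is a hypothetical helper name and the inline `<book>` element is made-up test data, neither comes from the question:

```xquery
(: Hypothetical helper: string(()) is the zero-length string, so a
   missing node still produces an (empty) column. :)
declare function local:field($value as item()?) as xs:string {
  string($value)
};

(: Made-up book with no country_code and no description. :)
let $book := <book id="1"><country_info/></book>
return string-join(
  (
    local:field($book/@id),
    local:field($book/country_info/country_code),
    local:field($book/book_details/description)
  ),
  ','
)
(: → "1,," :)
```

In the query above, each field expression would be wrapped in such a call. If a path can select more than one node, the `item()?` parameter type raises a type error, which is arguably preferable to a silently misaligned row.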

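If you want to avoid the reverse axis (which can be a bit awkward at times), you can also iterate over the books first and then over the ratings you are interested in: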

string-join(
  (
    'book_id, xref_type, xref, country, desc, rating_agency, rating_type, rating',
    for $book in //book
    for $rating in $book/global_information/ratings/rating[@type='Author Rating' and .=('A+','B++')]
    for $xref in $book//xref
[...]