How do I extract the complete HTML (including tags) from XML?

I have the following code:

package main

import (
	"encoding/xml"
	"fmt"
)

func main() {
	xr := &xmlResponse{}
	// xr is already a pointer, so pass it directly.
	if err := xml.Unmarshal([]byte(x), xr); err != nil {
		panic(err)
	}
	fmt.Printf("%+v", xr)
}

type xmlResponse struct {
	//Title string `xml:"title,omitempty"`
	Title struct {
		// Each <bold> child is collected here, but its position is lost.
		BoldWords []struct {
			Bold string `xml:",chardata"`
		} `xml:"bold,omitempty"`
		// Only the text outside the <bold> elements ends up here.
		Title string `xml:",chardata"`
	} `xml:"title,omitempty"`
}

var x = `<?xml version="1.0" encoding="utf-8"?>
<mytag version="1.0">
<title><bold>Go</bold> is a programming language. I repeat: <bold>Go</bold> is a programming language.</title>
</mytag>`

This outputs:

&{Title:{BoldWords:[{Bold:Go} {Bold:Go}] Title: is a programming language. I repeat:  is a programming language.}}

How can I get this instead:

<bold>Go</bold> is a programming language. I repeat: <bold>Go</bold> is a programming language.

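In other words, I need not only the tags but also their original positions in the text, not just a slice of the bold items. Trying to read the element as a plain string (for example, by uncommenting the first "Title" field in the xmlResponse struct) drops the bold items entirely.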

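From the documentation:

If the XML element contains character data, that data is accumulated in the first struct field that has tag ",chardata". The struct field may have type []byte or string. If there is no such field, the character data is discarded.

That is not actually what you want. What you are looking for is:

If the struct has a field of type []byte or string with tag ",innerxml", Unmarshal accumulates the raw XML nested inside the element in that field. The rest of the rules still apply.

So use innerxml instead of chardata.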

package main

import (
	"encoding/xml"
	"fmt"
)

func main() {
	xr := &xmlResponse{}
	// xr is already a pointer, so pass it directly.
	if err := xml.Unmarshal([]byte(x), xr); err != nil {
		panic(err)
	}
	fmt.Printf("%+v", xr)
}

type xmlResponse struct {
	//Title string `xml:"title,omitempty"`
	Title struct {
		// innerxml captures the raw markup inside <title>, tags included.
		Title string `xml:",innerxml"`
	} `xml:"title,omitempty"`
}

var x = `<?xml version="1.0" encoding="utf-8"?>
<mytag version="1.0">
<title><bold>Go</bold> is a programming language. I repeat: <bold>Go</bold> is a programming language.</title>
</mytag>`

Output:

&{Title:{Title:<bold>Go</bold> is a programming language. I repeat: <bold>Go</bold> is a programming language.}}
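
Since the documentation says the rest of the rules still apply, an innerxml field can probably sit next to the fields from the question, so you get the raw markup and the parsed bold words at the same time. Here is a minimal sketch of that idea; the names response, title, Raw, Bold, doc and parse are made up for illustration and do not come from the original code, and the input is shortened to a single sentence.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// title is a hypothetical type: Raw keeps the markup exactly as written,
// while Bold still collects the text of every <bold> child element.
type title struct {
	Raw  string   `xml:",innerxml"`
	Bold []string `xml:"bold"`
}

// response is a hypothetical stand-in for xmlResponse from the question.
type response struct {
	Title title `xml:"title"`
}

// doc is a shortened version of the XML from the question.
const doc = `<mytag version="1.0"><title><bold>Go</bold> is a programming language.</title></mytag>`

// parse unmarshals s into a response value.
func parse(s string) (response, error) {
	var r response
	err := xml.Unmarshal([]byte(s), &r)
	return r, err
}

func main() {
	r, err := parse(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Title.Raw)  // raw inner XML of <title>, tags included
	fmt.Println(r.Title.Bold) // text of each <bold> element, in document order
}
```

If you only need the markup, the single innerxml field from the answer above is enough; the extra Bold slice is just a convenience when you also want the bold words without parsing the string again.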
