Aspose.Words ImportNode ignores font formatting when appending child nodes



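I'm currently using Aspose.Words to open a document, pull out the content between a bookmark start and a bookmark end, and then place that content into another document. The problem is that when I use the ImportNode method, the content is imported into my document, but all fonts change from Calibri to Times New Roman and the font size changes from the original document's size to 12pt.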

I get the content from the bookmark by using the Aspose ExtractContent method.

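Because ImportNode was stripping my font formatting, I tried a workaround: saving each node to an HTML string with ToString(HtmlSaveOptions). This mostly works, but it strips the paragraph returns from my Word document, so my text is not spaced properly. The returns come back as HTML in the following format:

<p style="margin-top:0pt; margin-bottom:8pt; line-height:108%; font-size:11pt"><span style="font-family:Calibri; display:none; -aw-import:ignore">&#xa0;</span></p>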

When I insert it with

DocumentBuilder.InsertHtml("<p style=\"margin-top:0pt; margin-bottom:8pt; line-height:108%; font-size:11pt\"><span style=\"font-family:Calibri; display:none; -aw-import:ignore\">&#xa0;</span></p>");

the return is not added to the Word document correctly.

Here is the code I'm using. Please excuse the comments and commented-out code; they come from my attempts to fix this problem.

public async Task<string> GenerateHtmlString(Document srcDoc, ArrayList nodes)
{
// Create a blank document.
Document dstDoc = new Document();
ELSLogHelper.InsertInfoLog(_callContext, ELSLogHelper.AsposeLogMessage("Open"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod().DeclaringType?.Name, Environment.StackTrace);
// Remove the first paragraph from the empty document.
dstDoc.FirstSection.Body.RemoveAllChildren();
// Create a new Builder for the temporary document that gets generated with the header or footer data.
// This allows us to control font and styles separately from the main document being built.
var newBuilder = new DocumentBuilder(dstDoc);
Aspose.Words.Saving.HtmlSaveOptions htmlSaveOptions = new Aspose.Words.Saving.HtmlSaveOptions();
htmlSaveOptions.ExportImagesAsBase64 = true;
htmlSaveOptions.SaveFormat = SaveFormat.Html;
htmlSaveOptions.ExportFontsAsBase64 = true;
htmlSaveOptions.ExportFontResources = true;
htmlSaveOptions.ExportTextBoxAsSvg = true;
htmlSaveOptions.ExportRoundtripInformation = true;
htmlSaveOptions.Encoding = Encoding.UTF8;
// Obtain all the links from the source document
// This is used later to add hyperlinks to the html
// because by default extracting nodes using Aspose
// does not pull in the links in a usable way.
var srcDocLinks = srcDoc.Range.Fields.GroupBy(x => x.DisplayResult).Select(x => x.First()).Where(x => x.Type == Aspose.Words.Fields.FieldType.FieldHyperlink).Distinct().ToList();
var childNodes = nodes.Cast<Node>().Select(x => x).ToList();
var oldBuilder = new DocumentBuilder(srcDoc);
oldBuilder.MoveToBookmark("Header");
var allchildren = oldBuilder.CurrentParagraph.Runs;
var allChildNodes = childNodes[0].Document.GetChildNodes(NodeType.Any, true);
var headerText = allChildNodes[0].Range.Bookmarks["Header"].BookmarkStart.GetText();
foreach (Node node in nodes)
{
var html = node.ToString(htmlSaveOptions);
try
{
// &#xa0; is used by aspose because it works in XML
// If we see this character and the text of the node is \r (a paragraph break) we need to insert a break
if (html.Contains("&#xa0;") && node.Range.Text == "\r")
{
newBuilder.InsertHtml(html, false);
// Change the node into an HTML string
/*var htmlString = node.ToString(SaveFormat.Html);
var tempHtmlLinkDoc = new HtmlDocument();
tempHtmlLinkDoc.LoadHtml(htmlString);
// Get all the child nodes of the html document
var allChildNodes = tempHtmlLinkDoc.DocumentNode.SelectNodes("//*");
// Loop over all child nodes so we can make sure we apply the correct font family and size to the break.
foreach (var childNode in allChildNodes)
{
// Get the style attribute from the child node
var childNodeStyles = childNode.GetAttributeValue("style", "").Split(';');
foreach (var childNodeStyle in childNodeStyles)
{
// Apply the font name and size to the new builder on the document.
if (childNodeStyle.ToLower().Contains("font-family"))
{
newBuilder.Font.Name = childNodeStyle.Split(':')[1].Trim();
}
if (childNodeStyle.ToLower().Contains("font-size"))
{
newBuilder.Font.Size = Convert.ToDouble(childNodeStyle.Split(':')[1]
.Replace("pt", "")
.Replace("px", "")
.Replace("em", "")
.Replace("rem", "")
.Replace("%", "")
.Trim());
}
}
}
// Insert the break with the corresponding font size and name.
newBuilder.InsertBreak(BreakType.ParagraphBreak);*/
}
else
{
// Loop through the source document links so the link can be applied to the HTML.
foreach (var srcDocLink in srcDocLinks)
{
if (html.Contains(srcDocLink.DisplayResult))
{
// Now that we know the html string has one of the links in it we need to get the address from the node.
var linkAddress = srcDocLink.Start.NextSibling.GetText().Replace(" HYPERLINK \"", "").Replace("\"", "");
//Convert the node into an HTML String so we can get the correct font color, name, size, and any text decoration.
var htmlString = srcDocLink.Start.NextSibling.ToString(SaveFormat.Html);
var tempHtmlLinkDoc = new HtmlDocument();
tempHtmlLinkDoc.LoadHtml(htmlString);
var linkStyles = tempHtmlLinkDoc.DocumentNode.ChildNodes[0].GetAttributeValue("style", "").Split(';');
var linkStyleHtml = "";
foreach (var linkStyle in linkStyles)
{
if (linkStyle.ToLower().Contains("color"))
{
linkStyleHtml += $"color:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("font-family"))
{
linkStyleHtml += $"font-family:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("font-size"))
{
linkStyleHtml += $"font-size:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("text-decoration"))
{
linkStyleHtml += $"text-decoration:{linkStyle.Split(':')[1].Trim()};";
}
}

if (linkAddress.ToLower().Contains("mailto:"))
{
// Since the link includes mailto:, don't add the target attribute to the link.
// Regex.Escape makes characters such as '.' or '?' in the display text match literally.
html = new Regex($@"\b{Regex.Escape(srcDocLink.DisplayResult)}\b").Replace(html, $"<a href=\"{linkAddress}\" style=\"{linkStyleHtml}\">{srcDocLink.DisplayResult}</a>");
//html = html.Replace(srcDocLink.DisplayResult, $"<a href=\"{linkAddress}\" style=\"{linkStyleHtml}\">{srcDocLink.DisplayResult}</a>");
}
else
{
// Since the link is not an email address, include the target attribute.
html = new Regex($@"\b{Regex.Escape(srcDocLink.DisplayResult)}\b").Replace(html, $"<a href=\"{linkAddress}\" style=\"{linkStyleHtml}\" target=\"_blank\">{srcDocLink.DisplayResult}</a>");
//html = html.Replace(srcDocLink.DisplayResult, $"<a href=\"{linkAddress}\" style=\"{linkStyleHtml}\" target=\"_blank\">{srcDocLink.DisplayResult}</a>");
}
}
}
// Insert the HTML string into the temporary document.
newBuilder.InsertHtml(html, false);
}
}
catch (Exception ex)
{
throw;
}
}
// This is just for debugging/troubleshooting purposes and to make sure things look correct
string tempDocxPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "temp", "TemporaryCompiledDocument.docx");
dstDoc.Save(tempDocxPath);
// We generate this HTML file then load it back up and pass the DocumentNode.OuterHtml back to the requesting method.
ELSLogHelper.InsertInfoLog(_callContext, ELSLogHelper.AsposeLogMessage("Save"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod().DeclaringType?.Name, Environment.StackTrace);
string tempHtmlPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "temp", "TemporaryCompiledDocument.html");
dstDoc.Save(tempHtmlPath, htmlSaveOptions);
var tempHtmlDoc = new HtmlDocument();
tempHtmlDoc.Load(tempHtmlPath);
var htmlText = tempHtmlDoc.DocumentNode.OuterHtml;
// Clean up our mess...
if (File.Exists(tempDocxPath))
{
File.Delete(tempDocxPath);
}
if (File.Exists(tempHtmlPath))
{
File.Delete(tempHtmlPath);
}
// Return the generated HTML string.
return htmlText;
}

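Saving each node to HTML and then inserting it into the destination document is not a good idea. Not every node can be saved to HTML correctly, and some formatting may be lost in the Aspose.Words DOM -> HTML -> Aspose.Words DOM round trip.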

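As for the original problem, it most likely occurs because you are using ImportFormatMode.UseDestinationStyles. In that mode the styles and defaults of the destination document are used, so the font can change. If you need to keep the source document's formatting, use ImportFormatMode.KeepSourceFormatting instead.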
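A minimal sketch of that approach, assuming `nodes` holds block-level nodes (paragraphs, tables) such as those returned by the ExtractContent helper: one `NodeImporter` created with `ImportFormatMode.KeepSourceFormatting` imports each node, and the imported copy is appended to the destination body. The class and method names (`BookmarkContentCopier`, `AppendKeepingSourceFormatting`) are hypothetical. A blank `new Document()`, as created in the question's code, carries Aspose.Words' Times New Roman 12pt defaults, which is likely where the font change comes from under UseDestinationStyles.

```csharp
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper, not part of Aspose.Words: copies block-level nodes
// from srcDoc into the last section body of dstDoc, keeping source formatting.
public static class BookmarkContentCopier
{
    public static void AppendKeepingSourceFormatting(Document srcDoc, Document dstDoc, IEnumerable<Node> nodes)
    {
        // A single importer for the whole batch, so style and list mappings
        // between the two documents are reused for every node.
        var importer = new NodeImporter(srcDoc, dstDoc, ImportFormatMode.KeepSourceFormatting);

        // Assumption: every node is block-level (Paragraph or Table),
        // since only those can be children of a Body.
        Body body = dstDoc.LastSection.Body;
        foreach (Node node in nodes)
        {
            // true = import the node's children too (runs, fields, ...).
            Node imported = importer.ImportNode(node, true);
            body.AppendChild(imported);
        }
    }
}
```

From the question's code this could be called as `BookmarkContentCopier.AppendKeepingSourceFormatting(srcDoc, dstDoc, nodes.Cast<Node>());` (requires `System.Linq`). `NodeImporter` is intended for repeatedly importing nodes between the same pair of documents, which fits copying a run of extracted nodes.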

If the problem still occurs with ImportFormatMode.KeepSourceFormatting, it is most likely a bug, and you should report it to the Aspose.Words staff in the support forum.
