r - Can I manually reorder an LDA_Gibbs topic model?



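I have an LDA_Gibbs topic model from the topicmodels library. I also have an LDAvis interactive visualization.

My problem: the topics are not in the same order in the LDA object and in LDAvis.

I would like to map one onto the other (it does not matter which). My attempt so far does not work: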

ldavis_data <- fromJSON(json_lda)
topic_order <- ldavis_data$topic.order
lda@gamma[order(topic_order), ]
lda@beta[, order(topic_order)]

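This was inspired by this GitHub issue, which deals with a different topic model package.

However, this completely breaks my LDA object.

No reprex/MWE (yet; I could link an .rds file), but here is a glimpse(lda) of the output: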

<snip>
..@ beta  :num [1:45, 1:333...]
..@ gamma :num [1:111..., 1:45]
</snip>

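For now, I am manually mapping the LDAvis topics to the LDA() object in parallel.

---- EDIT ----

I found a reasonable workaround, almost: my further analysis relies on the tidy.LDA() function from tidytext, so I can attach the correct order to the topic-term mapping, like this: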


library(dplyr)
library(tidytext)

# terms to topics
tidy(lda, matrix = "beta") %>%
  # probably unnecessary, but make sure we're in topic order
  arrange(topic) %>%
  # turn topics into a factor, with levels according to new order
  mutate(topic = factor(topic, levels = topic_order)) %>%
  # group by new factor order
  group_by(topic) %>%
  # make the current group id the current topic
  mutate(topic = cur_group_id()) %>%
  # don't forget! had me scratching my head for a few minutes
  ungroup()

# documents to topics
tidy(lda, matrix = "gamma") %>%
  arrange(topic) %>%
  mutate(topic = factor(topic, levels = topic_order)) %>%
  group_by(topic) %>%
  mutate(topic = cur_group_id()) %>%
  ungroup()

Yes, this works for the document mapping too. Now to fold them into a function ;)
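As a rough sketch of what such a function might look like, the relabelling can be written in base R with match(), which returns each original topic id's position in topic_order. As long as every topic id appears in topic_order, this should give the same new ids as the factor() plus cur_group_id() steps. The helper name relabel_topics and the toy data are hypothetical and only stand in for the output of tidy(lda, matrix = "beta").

```r
# Hypothetical helper: relabel the `topic` column of a tidy data frame so that
# topic ids follow the positions in `topic_order` (the LDAvis display order).
relabel_topics <- function(tidy_df, topic_order) {
  # match() gives, for each original topic id, its position in topic_order
  tidy_df$topic <- match(tidy_df$topic, topic_order)
  tidy_df
}

# Toy data standing in for tidy(lda, matrix = "beta"); the values are made up
toy_tidy <- data.frame(
  topic = c(1L, 2L, 3L),
  term  = c("a", "b", "c"),
  beta  = c(0.5, 0.3, 0.2)
)
toy_order <- c(3L, 1L, 2L)  # assumed: LDAvis topic 1 is original topic 3, etc.

relabelled <- relabel_topics(toy_tidy, toy_order)
```

The same helper could be applied to the gamma output, since it only touches the topic column.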

Reposting my edit as an answer, though I am not ready to accept it yet.

I got the result I wanted, sure; but not the way I wanted.


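I think order() is the problem, and I also think you are sorting columns where you need to sort rows, and vice versa. Assuming you have already built a working LDAvis from your topicmodels model, this should bring the two into sync: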

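library(jsonlite)  # assumed source of fromJSON()
ldavis_data <- fromJSON(json_lda)
topic_order <- ldavis_data$topic.order
# gamma is documents x topics, so reorder its columns
lda@gamma <- lda@gamma[, topic_order]
# beta is topics x terms, so reorder its rows
lda@beta <- lda@beta[topic_order, ]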
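To illustrate the row-versus-column point, here is a small sketch on toy matrices that have the same orientation as the slots in the question (gamma as documents x topics, beta as topics x terms). All values and the topic order are made up. It also shows that order(topic_order) is the inverse permutation, so it maps back to the model's order rather than to the LDAvis order.

```r
# Toy matrices; values are invented for illustration only
toy_gamma <- matrix(c(0.7, 0.2, 0.1,
                      0.1, 0.3, 0.6),
                    nrow = 2, byrow = TRUE)       # 2 documents, 3 topics
toy_beta  <- matrix(1:12, nrow = 3, byrow = TRUE)  # 3 topics, 4 terms
toy_order <- c(3L, 1L, 2L)                         # hypothetical LDAvis order

# Column j of the result is original topic toy_order[j]
gamma_vis <- toy_gamma[, toy_order]
# Row j of the result is original topic toy_order[j]
beta_vis  <- toy_beta[toy_order, ]

# order(toy_order) inverts the permutation and restores the model's order
gamma_back <- gamma_vis[, order(toy_order)]
```

Indexing the document dimension instead (rows of gamma, or columns of beta) would shuffle documents or terms rather than topics, which probably explains the broken object described in the question.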

Also, if you want the phi and theta data for LDAvis from a model built with the topicmodels package, you can do the following:

lda_posterior <- posterior(lda)
lda_theta <- lda_posterior$topics[, topic_order]
lda_phi <- lda_posterior$terms[topic_order, ]
