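I have an LDA_Gibbs topic model from the topicmodels library. I also have an LDAvis interactive visualization.
My problem: the topic order in the LDA object and in LDAvis is not the same.
I want to map one onto the other (it doesn't matter which). So far I don't have a working approach: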
ldavis_data <- fromJSON(json_lda)
topic_order <- ldavis_data$topic.order
lda@gamma[order(topic_order), ]
lda@beta[, order(topic_order)]
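This was inspired by this GitHub issue, though that one is for a different topic-model package.
However, this completely breaks my LDA object.
No reprex/MWE (yet; I could link an .rds file), but here is the output of glimpse(lda):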
<snip>
..@ beta :num [1:45, 1:333...]
..@ gamma :num [1:111..., 1:45]
</snip>
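For now, I'm manually mapping the LDAvis topics to the LDA() object side by side.
----EDIT----
I found a reasonable workaround, almost: my further analysis relies on the tidy.LDA function from tidytext, so I can apply the correct order to the topic-term mapping like this: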
library(dplyr)
library(tidytext)
# terms to topics
tidy(lda, matrix = "beta") %>%
# probably unnecessary, but make sure we're in topic order
arrange(topic) %>%
# turn topics into a factor, with levels according to new order
mutate(topic = factor(topic, levels = topic_order)) %>%
# group by new factor order
group_by(topic) %>%
# make the current group id the current topic
mutate(topic = cur_group_id()) %>%
# don't forget! had me scratching my head for a few minutes
ungroup()
# documents to topics
tidy(lda, matrix = "gamma") %>%
arrange(topic) %>%
mutate(topic = factor(topic, levels = topic_order)) %>%
group_by(topic) %>%
mutate(topic = cur_group_id()) %>%
ungroup()
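Yep, this works for the document mapping as well. Now to fold them into a function ;)
Reposting my edit as an answer, though I'm not willing to accept it yet.
I got the result I wanted, sure, but not the way I wanted to get it.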
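I found a reasonable workaround, almost: my further analysis relies on the tidy.LDA() function from tidytext, so I can apply the correct order to the topic-term mapping as follows: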
library(dplyr)
library(tidytext)
# terms to topics
tidy(lda, matrix = "beta") %>%
# probably unnecessary, but make sure we're in topic order
arrange(topic) %>%
# turn topics into a factor, with levels according to new order
mutate(topic = factor(topic, levels = topic_order)) %>%
# group by new factor order
group_by(topic) %>%
# make the current group id the current topic
mutate(topic = cur_group_id()) %>%
# don't forget! had me scratching my head for a few minutes
ungroup()
# documents to topics
tidy(lda, matrix = "gamma") %>%
arrange(topic) %>%
mutate(topic = factor(topic, levels = topic_order)) %>%
group_by(topic) %>%
mutate(topic = cur_group_id()) %>%
ungroup()
Yep, this works for the document mapping as well. Now to fold them into a function ;)
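One way the two pipelines above might be folded into a single function is sketched below. The helper name `renumber_topics` and the toy data frame are hypothetical and not part of tidytext or LDAvis. The sketch swaps the factor / group_by() / cur_group_id() chain for match(), which should yield the same numbering as long as every topic appears in the data, and it assumes topic_order[i] is the original topic that LDAvis shows as topic i.

```r
library(dplyr)

# Hypothetical helper: renumber the `topic` column of a tidy() result so
# that it follows the LDAvis display order.
# Assumption: topic_order[i] is the original topic shown as topic i.
renumber_topics <- function(tidy_df, topic_order) {
  tidy_df %>%
    # position of each original topic within topic_order = new topic id
    mutate(topic = match(topic, topic_order)) %>%
    arrange(topic)
}

# Toy stand-in for tidy(lda, matrix = "beta"); the values are made up.
toy_beta <- data.frame(
  topic = c(1L, 2L, 3L),
  term  = c("a", "b", "c"),
  beta  = c(0.5, 0.3, 0.2),
  stringsAsFactors = FALSE
)

renumber_topics(toy_beta, topic_order = c(3, 1, 2))
```

The same call should work on the result of tidy(lda, matrix = "gamma"), since only the `topic` column is touched.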
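I think order() is the problem, and I also think you were sorting rows where you needed to sort columns, and vice versa. Assuming you have already built a working LDAvis from your topicmodels model, this should bring the two in sync: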
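ldavis_data <- fromJSON(json_lda)
topic_order <- ldavis_data$topic.order
# gamma is documents x topics, so reorder its columns
lda@gamma[, topic_order]
# beta is topics x terms, so reorder its rows
lda@beta[topic_order, ]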
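To see why order() gives a different result from indexing with topic_order directly, here is a small base-R sketch on a toy matrix. The values and the example topic_order are made up, and it assumes topic_order[i] is the original topic that LDAvis displays in position i.

```r
# Assumed example: LDAvis topic 1 is original topic 3, and so on.
topic_order <- c(3, 1, 2)

# Toy documents-by-topics matrix; columns are named after the original topics.
toy_gamma <- matrix(1:6, nrow = 2,
                    dimnames = list(NULL, c("t3_is_col3", "x", "y")))
colnames(toy_gamma) <- c("t1", "t2", "t3")

# Direct indexing puts the original topics into LDAvis display order.
colnames(toy_gamma[, topic_order])         # "t3" "t1" "t2"

# order() returns the inverse permutation, so the columns land elsewhere.
colnames(toy_gamma[, order(topic_order)])  # "t2" "t3" "t1"
```

The two only agree when the permutation happens to be its own inverse, which is why the order() version can look right on some models and wrong on others.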
Also, if you want the phi and theta data as shown in LDAvis for a model produced with the topicmodels package, you can do the following:
lda_posterior <- posterior(lda)
lda_theta <- lda_posterior$topics[, topic_order]
lda_phi <- lda_posterior$terms[topic_order, ]