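I built an stm topic model and I'm having trouble with summary() on an estimateEffect object. My data cover about 150 days, but it only prints regression estimates for 10 days.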
parlPrevFit<- stm(document = out$documents, vocab = out$vocab, K = 0, prevalence =~s(day),
max.em.its = 150, data = out$meta, init.type = "Spectral")
prep<- estimateEffect(c(14, 40, 5, 41)~s(day), parlPrevFit, meta = meta, uncertainty = "Global")
summary(prep, topics = c(14, 40, 5, 41))
Topic 14 coefficients: https://prnt.sc/105pg1a
Can anyone suggest how to print more than 10 days?
Instead of using summary(), whose output you can't control, load the tidytext package and use tidy().
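A side note on where the count of ten may come from, offered as a minimal sketch rather than a confirmed diagnosis: the stm documentation describes s() as a wrapper around splines::bs() that defaults to 10 degrees of freedom. If so, s(day) contributes ten spline basis terms to the regression rather than one coefficient per day. The `day` vector below is a hypothetical stand-in for the real covariate.

```r
library(splines)

# Hypothetical stand-in for the `day` covariate: 150 distinct days
day <- 1:150

# Assumption: this mirrors what stm::s(day) builds by default
# (a B-spline basis with 10 degrees of freedom and no intercept column)
basis <- bs(day, df = 10)

# One row per day, but only one column (one regression term) per basis function
dim(basis)
```

If that is what is happening, the ten rows are the complete coefficient table for the spline, not a truncated printout, and each row describes a basis function rather than a single day.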
Let's walk through an example in which we train a topic model on Jane Austen's novels, with each chapter as a document:
library(tidyverse)
library(tidytext)
library(stm)
#> stm v1.3.6 successfully loaded. See ?stm for help.
#> Papers, resources, and other materials at structuraltopicmodel.com
library(janeaustenr)
books <- austen_books() %>%
group_by(book) %>%
mutate(chapter = cumsum(str_detect(text, regex("^chapter ", ignore_case = TRUE)))) %>%
ungroup() %>%
filter(chapter > 0) %>%
unite(document, book, chapter, remove = FALSE)
austen_sparse <- books %>%
unnest_tokens(word, text) %>%
anti_join(stop_words) %>%
count(document, word) %>%
cast_sparse(document, word, n)
#> Joining, by = "word"
Let's train a topic model with 6 topics (there are 6 books):
topic_model <- stm(
austen_sparse,
K = 6,
init.type = "Spectral",
verbose = FALSE
)
Let's create a dataset for estimateEffect() to use:
chapters <- books %>%
group_by(document) %>%
summarize(text = str_c(text, collapse = " ")) %>%
ungroup() %>%
inner_join(books %>%
distinct(document, book))
#> Joining, by = "document"
chapters
#> # A tibble: 269 x 3
#> document text book
#> <chr> <chr> <fct>
#> 1 Emma_1 "CHAPTER I Emma Woodhouse, handsome, clever, and rich, with… Emma
#> 2 Emma_10 "CHAPTER X Though now the middle of December, there had yet… Emma
#> 3 Emma_11 "CHAPTER XI Mr. Elton must now be left to himself. It was n… Emma
#> 4 Emma_12 "CHAPTER XII Mr. Knightley was to dine with them--rather ag… Emma
#> 5 Emma_13 "CHAPTER XIII There could hardly be a happier creature in t… Emma
#> 6 Emma_14 "CHAPTER XIV Some change of countenance was necessary for e… Emma
#> 7 Emma_15 "CHAPTER XV Mr. Woodhouse was soon ready for his tea; and w… Emma
#> 8 Emma_16 "CHAPTER XVI The hair was curled, and the maid sent away, a… Emma
#> 9 Emma_17 "CHAPTER XVII Mr. and Mrs. John Knightley were not detained… Emma
#> 10 Emma_18 "CHAPTER XVIII Mr. Frank Churchill did not come. When the t… Emma
#> # … with 259 more rows
Now let's estimate regressions from our topic model for the first three topics, using the chapters dataset of documents:
effects <- estimateEffect(1:3 ~ book, topic_model, chapters)
summary(effects)
#>
#> Call:
#> estimateEffect(formula = 1:3 ~ book, stmobj = topic_model, metadata = chapters)
#>
#>
#> Topic 1:
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.018033 0.023726 0.760 0.448
#> bookPride & Prejudice 0.799555 0.037140 21.528 <2e-16 ***
#> bookMansfield Park -0.006387 0.032662 -0.196 0.845
#> bookEmma 0.003188 0.033393 0.095 0.924
#> bookNorthanger Abbey 0.002535 0.039017 0.065 0.948
#> bookPersuasion 0.025725 0.044281 0.581 0.562
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Topic 2:
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.015289 0.016478 0.928 0.354
#> bookPride & Prejudice 0.001785 0.023489 0.076 0.939
#> bookMansfield Park 0.001616 0.024664 0.066 0.948
#> bookEmma 0.892516 0.037833 23.591 <2e-16 ***
#> bookNorthanger Abbey 0.006032 0.031530 0.191 0.848
#> bookPersuasion -0.001142 0.030052 -0.038 0.970
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Topic 3:
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.0196151 0.0225115 0.871 0.3844
#> bookPride & Prejudice -0.0004909 0.0286302 -0.017 0.9863
#> bookMansfield Park 0.0148960 0.0341272 0.436 0.6628
#> bookEmma -0.0004006 0.0301741 -0.013 0.9894
#> bookNorthanger Abbey 0.8730570 0.0457994 19.063 <2e-16 ***
#> bookPersuasion 0.1030537 0.0495148 2.081 0.0384 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
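This example doesn't run into the printing limit you mentioned, but you can avoid any such problem by using tidy(), which gives you the actual contents of the regressions: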
tidy(effects)
#> # A tibble: 18 x 6
#> topic term estimate std.error statistic p.value
#> <int> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 (Intercept) 0.0179 0.0238 0.753 4.52e- 1
#> 2 1 bookPride & Prejudice 0.799 0.0373 21.4 1.09e-59
#> 3 1 bookMansfield Park -0.00614 0.0325 -0.189 8.50e- 1
#> 4 1 bookEmma 0.00350 0.0336 0.104 9.17e- 1
#> 5 1 bookNorthanger Abbey 0.00323 0.0394 0.0820 9.35e- 1
#> 6 1 bookPersuasion 0.0253 0.0443 0.571 5.68e- 1
#> 7 2 (Intercept) 0.0153 0.0165 0.925 3.56e- 1
#> 8 2 bookPride & Prejudice 0.00165 0.0234 0.0707 9.44e- 1
#> 9 2 bookMansfield Park 0.00167 0.0246 0.0680 9.46e- 1
#> 10 2 bookEmma 0.892 0.0381 23.4 2.84e-66
#> 11 2 bookNorthanger Abbey 0.00606 0.0317 0.191 8.49e- 1
#> 12 2 bookPersuasion -0.00107 0.0298 -0.0359 9.71e- 1
#> 13 3 (Intercept) 0.0197 0.0228 0.864 3.89e- 1
#> 14 3 bookPride & Prejudice -0.000835 0.0288 -0.0290 9.77e- 1
#> 15 3 bookMansfield Park 0.0147 0.0342 0.428 6.69e- 1
#> 16 3 bookEmma -0.000707 0.0305 -0.0232 9.82e- 1
#> 17 3 bookNorthanger Abbey 0.873 0.0461 18.9 4.93e-51
#> 18 3 bookPersuasion 0.103 0.0496 2.08 3.85e- 2
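Because tidy() returns an ordinary tibble, you control how much of it is shown. The sketch below uses a hypothetical stand-in called `tidied`, with the same column names as the tidy(effects) output above but placeholder values, so that it runs without fitting a model. Tibbles longer than 20 rows print only their first 10 by default, and print(n = Inf) lifts that limit.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for tidy(effects): same columns, placeholder values
tidied <- tibble(
  topic     = rep(1:3, each = 10),
  term      = rep(paste0("term", 1:10), times = 3),
  estimate  = 0,
  std.error = 0,
  statistic = 0,
  p.value   = 1
)

# Keep the rows for a single topic
topic_1 <- tidied %>%
  filter(topic == 1)

# Show every row of the full table instead of the default first 10
print(tidied, n = Inf)
```

With a real model, the same filter() and print() calls would apply to the result of tidy() on your estimateEffect object.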
Created on 2021-02-26 by the reprex package (v1.0.0)