Counting observations by group using the collapse R package



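I want to convert the following R code from tidyverse to collapse. The code counts the observations in each group and appends the count as a column to the data.frame.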

library(tidyverse)
library(collapse)
head(wlddev)
wlddev %>%
  group_by(income) %>%
  add_count(name = "Size") %>%
  select(country, income, Size) %>%
  distinct()
# A tibble: 216 x 3
# Groups:   income [4]
country             income               Size
<chr>               <fct>               <int>
1 Afghanistan         Low income           1830
2 Albania             Upper middle income  3660
3 Algeria             Upper middle income  3660
4 American Samoa      Upper middle income  3660
5 Andorra             High income          4819
6 Angola              Lower middle income  2867
7 Antigua and Barbuda High income          4819
8 Argentina           Upper middle income  3660
9 Armenia             Upper middle income  3660
10 Aruba               High income          4819
# ... with 206 more rows

Now I want to accomplish the same task with the collapse R package.

The following code works as expected.

wlddev %>%
  fgroup_by(income) %>%
  fselect(country) %>%
  fnobs()
income country
1         High income    4819
2          Low income    1830
3 Lower middle income    2867
4 Upper middle income    3660

However, I cannot append this column to the original data.frame.

wlddev %>%
  fgroup_by(income) %>%
  fselect(country) %>%
  fnobs() %>%
  ftransform(.data = wlddev, Size = .)
Error in ftransform_core(.data, e) : 
Lengths of replacements must be equal to nrow(.data) or 1, or NULL to delete columns

Any hints?

I found a very simple solution:

wlddev %>%
  fmutate(Size = fnobs(income, income, TRA = "replace_fill")) %>%
  fselect(country, income, Size) %>%
  funique()
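To illustrate what the TRA argument does here, a minimal sketch on made-up data (the vectors x and g are hypothetical, not part of wlddev). Without TRA, fnobs returns one count per group; with TRA = "replace_fill" each element is replaced by the count of its group, so the result has one value per row and can be stored as a column.

```r
library(collapse)

# Hypothetical toy data: five values in two groups
x <- c(10, 20, 30, 40, 50)
g <- c("a", "a", "b", "b", "b")

# Without TRA: one count per group (length 2)
per_group <- fnobs(x, g)

# With TRA = "replace_fill": every element is replaced by its group's count (length 5)
per_row <- fnobs(x, g, TRA = "replace_fill")

length(per_group)
length(per_row)
```

This is why the ftransform attempt in the question fails: the aggregated result has one row per group, not one per observation.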

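Unlike add_count, which creates a column in the original data, fnobs returns summarized data, which we can then join back to the original:

library(collapse)
library(dplyr)  # rename(), left_join() and distinct() come from dplyr

wlddev %>%
  fgroup_by(income) %>%
  fselect(country) %>%
  fnobs() %>%
  rename(size = country) %>%
  left_join(wlddev %>% slt(country, income), .) %>%
  distinct()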

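So in principle fnobs counts the number of non-missing values, and it does not really offer an option for adding group counts (I also wonder why this is necessary; I have never needed it). However, the counts are present in the grouping object, which can be retrieved with GRP(.). So you could create a function:

gcount <- function(x) {
  # Just turning some unnecessary things off in case we pass a plain vector
  g <- GRP(x, sort = FALSE, return.groups = FALSE, call = FALSE)
  # Index the per-group sizes by each observation's group id
  g$group.sizes[g$group.id]
}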
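As a rough illustration of what the grouping object holds, here is a small sketch on a made-up character vector (x is hypothetical). It looks at the N.groups, group.sizes and group.id elements of the GRP object and shows how indexing the sizes by the ids expands them to one value per element, which is the idea behind gcount.

```r
library(collapse)

# Hypothetical toy vector with two groups
x <- c("a", "b", "a", "a")
g <- GRP(x)

g$N.groups     # number of distinct groups
g$group.sizes  # number of observations in each group
g$group.id     # group index of each element of x

# Indexing the sizes by the ids gives one group size per element
sizes <- g$group.sizes[g$group.id]
sizes
```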

Then we can do

wlddev %>%
  ftransform(Size = gcount(income)) %>%
  fselect(country, income, Size) %>%
  funique(cols = 1) # Observations are uniquely identified by country
# or
wlddev %>%
  fgroup_by(income) %>%
  ftransform(Size = gcount(.)) %>%
  fselect(country, income, Size) %>%
  fungroup() %>%
  funique(cols = 1)

Of course, we could also use fnobs:

wlddev %>%
  fgroup_by(income) %>%
  fmutate(Size = fnobs(income)) %>%
  fselect(country, income, Size) %>%
  fungroup() %>%
  funique(cols = 1)

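But this could be misleading if income contains missing values. Note (as stated in the documentation) that ftransform is a faster version of base::transform that ignores grouping, whereas fmutate is a faster version of dplyr::mutate that respects grouping.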
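To make the missing-value caveat concrete, a small sketch on a made-up data frame (d, g and x are hypothetical names). It counts a column x that has one NA within groups g, once with fnobs and once from the group sizes stored in the GRP object; the two only agree for groups without missing values.

```r
library(collapse)

# Hypothetical toy data: group "a" has one missing value in x
d <- data.frame(g = c("a", "a", "b", "b"),
                x = c(1, NA, 3, 4))

# Non-missing values of x per group, expanded to one value per row
n_nonmissing <- fnobs(d$x, d$g, TRA = "replace_fill")

# Actual group sizes taken from the grouping object
grp <- GRP(d$g)
n_rows <- grp$group.sizes[grp$group.id]

n_nonmissing  # group "a" is counted as 1 because of the NA
n_rows        # group "a" has 2 rows
```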

If you tell me why you need the group counts as a variable in the data frame, I could consider adding gcount to the next collapse release.
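For readers on a more recent collapse release: as far as I am aware, later versions export a helper called GRPN that returns group sizes directly, which would make a hand-written gcount unnecessary. The sketch below assumes your installed version provides GRPN (please check its documentation); the vector x is made up.

```r
library(collapse)

# Hypothetical toy vector with two groups
x <- c("a", "b", "a", "a")

# Assumption: GRPN() is available in the installed collapse version.
# By default it expands the group sizes to one value per element.
sizes <- GRPN(x)
sizes
```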
