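I want to convert the following R code from tidyverse to collapse. The code counts the observations in each group and appends the count to the data.frame as a column.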
library(tidyverse)
library(collapse)
head(wlddev)
wlddev %>%
group_by(income) %>%
add_count(., name = "Size") %>%
select(country, income, Size) %>%
distinct()
# A tibble: 216 x 3
# Groups: income [4]
country income Size
<chr> <fct> <int>
1 Afghanistan Low income 1830
2 Albania Upper middle income 3660
3 Algeria Upper middle income 3660
4 American Samoa Upper middle income 3660
5 Andorra High income 4819
6 Angola Lower middle income 2867
7 Antigua and Barbuda High income 4819
8 Argentina Upper middle income 3660
9 Armenia Upper middle income 3660
10 Aruba High income 4819
# ... with 206 more rows
Now I want to accomplish the same task with the collapse R package. The following code works as expected:
wlddev %>%
fgroup_by(income) %>%
fselect(country) %>%
fnobs()
income country
1 High income 4819
2 Low income 1830
3 Lower middle income 2867
4 Upper middle income 3660
However, I cannot append this column to the original data.frame:
wlddev %>%
fgroup_by(income) %>%
fselect(country) %>%
fnobs() %>%
ftransform(.data = wlddev, Size = .)
Error in ftransform_core(.data, e) :
Lengths of replacements must be equal to nrow(.data) or 1, or NULL to delete columns
Any hints?
I found a very simple solution:
wlddev %>%
fmutate(Size = fnobs(income, income, TRA = "replace_fill")) %>%
fselect(country, income, Size) %>%
funique()
Unlike add_count, which creates a column in the original data, fnobs returns a summarised result, which we can join back onto the data:
library(collapse)
library(dplyr) # rename(), left_join() and distinct() below come from dplyr
wlddev %>%
fgroup_by(income) %>%
fselect(country) %>%
fnobs() %>%
rename(size = country) %>%
left_join(wlddev %>%
slt(country, income), .) %>%
distinct
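So, in principle, fnobs counts the number of non-missing values and does not really offer an option to add a group count (I also wonder why this is necessary; I have never needed it). The counts do exist, however, in the grouping object, which can be retrieved with GRP(.). So you could create a function: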
gcount <- function(x) {
# Just turning some unnecessary things off in case we pass a plain vector
g <- GRP(x, sort = FALSE, return.groups = FALSE, call = FALSE)
g$group.sizes[g$group.id]
}
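As a quick check of what gcount returns, here is a minimal sketch on a toy vector. The vector x is a hypothetical example and not part of wlddev; the function body is repeated from above so the snippet runs on its own.

```r
library(collapse)

# gcount() as defined above, repeated so this snippet is self-contained
gcount <- function(x) {
  g <- GRP(x, sort = FALSE, return.groups = FALSE, call = FALSE)
  g$group.sizes[g$group.id]
}

# Hypothetical toy vector: "a" occurs 3 times, "b" twice, "c" once
x <- c("a", "b", "a", "c", "a", "b")

# Each element is replaced by the size of the group it belongs to
sizes <- gcount(x)
print(sizes)  # 3 2 3 1 3 2
```

The result has the same length as the input, which is why it can be assigned directly as a column in ftransform.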
Then we can do:
wlddev %>%
ftransform(Size = gcount(income)) %>%
fselect(country, income, Size) %>%
funique(cols = 1) # Observations are uniquely identified by country
# or
wlddev %>%
fgroup_by(income) %>%
ftransform(Size = gcount(.)) %>%
fselect(country, income, Size) %>%
fungroup() %>%
funique(cols = 1)
Of course, we can also use fnobs:
wlddev %>%
fgroup_by(income) %>%
fmutate(Size = fnobs(income)) %>%
fselect(country, income, Size) %>%
fungroup() %>%
funique(cols = 1)
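But this can be misleading if income contains missing values, because fnobs counts only non-missing values. Note (as stated in the documentation) that ftransform is a faster version of base::transform that ignores grouping, whereas fmutate is a faster dplyr::mutate that respects grouping.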
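To illustrate the caveat about missing values, here is a small sketch comparing fnobs with the group sizes stored in the grouping object. The vectors g and v are hypothetical toy data, not taken from wlddev.

```r
library(collapse)

# Hypothetical toy data: g is the grouping vector, v has one missing value
g <- c("a", "a", "b")
v <- c(1, NA, 3)

# Number of non-missing values of v within each group of g
non_missing <- fnobs(v, g)

# Number of rows in each group, taken from the grouping object
group_sizes <- GRP(g)$group.sizes

print(non_missing)  # a: 1, b: 1
print(group_sizes)  # 2 1
```

Group "a" has two rows but only one non-missing value, so the two results differ as soon as the counted column contains an NA.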
If you tell me why you need the group count as a variable in the data frame, I can consider adding gcount to the next collapse release.