I have a data frame in R that looks like this:
ID REGION FACTOR
01 north 1
02 north 1
03 north 0
04 south 1
05 south 1
06 south 1
07 south 0
08 south 0
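I want to create a column containing the number of rows per REGION, counting only the rows that meet a condition (FACTOR == 1).
I know how to compute the values, but I can't find a function that produces this output: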
ID REGION FACTOR COUNT
01 north 1 2
02 north 1 2
03 north 0 2
04 south 1 3
05 south 1 3
06 south 1 3
07 south 0 3
08 south 0 3
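Can anyone help me?
We can use add_count from dplyr, which adds the number of rows in each REGION as a new column (this counts every row, regardless of FACTOR):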
library(dplyr)
df1 %>%
add_count(REGION)
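If the count should instead be the sum of FACTOR within each REGION (that is, the number of rows with FACTOR == 1):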
df1 %>%
group_by(REGION) %>%
mutate(COUNT = sum(FACTOR))
#or use
# mutate(COUNT = sum(FACTOR != 0))
# A tibble: 8 x 4
# Groups: REGION [2]
# ID REGION FACTOR COUNT
# <int> <chr> <int> <int>
#1 1 north 1 2
#2 2 north 1 2
#3 3 north 0 2
#4 4 south 1 3
#5 5 south 1 3
#6 6 south 1 3
#7 7 south 0 3
#8 8 south 0 3
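As a possible shortcut, add_count also takes a wt argument, in which case it sums that variable per group instead of counting rows. The sketch below assumes a reasonably recent dplyr (one where add_count has the name argument) and rebuilds the example data so it runs on its own; res_wt is just an illustrative name.

```r
library(dplyr)

# Same example data as in the question
df1 <- data.frame(
  ID = 1:8,
  REGION = c("north", "north", "north",
             "south", "south", "south", "south", "south"),
  FACTOR = c(1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L)
)

# wt = FACTOR makes add_count() sum FACTOR within each REGION
# instead of counting rows; name = "COUNT" names the new column
res_wt <- df1 %>%
  add_count(REGION, wt = FACTOR, name = "COUNT")

res_wt
```

This only matches "rows with FACTOR == 1" because FACTOR is coded 0/1 here; for other codings, the explicit sum(FACTOR == 1) form is safer.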
Or using data.table:
library(data.table)
setDT(df1)[, COUNT := sum(FACTOR), by = REGION]
Data
df1 <- structure(list(ID = 1:8, REGION = c("north", "north", "north",
"south", "south", "south", "south", "south"), FACTOR = c(1L,
1L, 0L, 1L, 1L, 1L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-8L))
A base R solution using ave:
dfout <- within(df, COUNT <- ave(FACTOR,REGION, FUN = sum))
which gives
> dfout
ID REGION FACTOR COUNT
1 1 north 1 2
2 2 north 1 2
3 3 north 0 2
4 4 south 1 3
5 5 south 1 3
6 6 south 1 3
7 7 south 0 3
8 8 south 0 3
Data
df <- structure(list(ID = 1:8, REGION = c("north", "north", "north",
"south", "south", "south", "south", "south"), FACTOR = c(1L,
1L, 0L, 1L, 1L, 1L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-8L))
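If FACTOR were not coded strictly as 0/1, summing it would no longer give the number of matching rows. A minimal sketch of the same ave idea with the condition written out explicitly, assuming the condition of interest is FACTOR == 1:

```r
# Same example data as above
df <- data.frame(
  ID = 1:8,
  REGION = c("north", "north", "north",
             "south", "south", "south", "south", "south"),
  FACTOR = c(1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L)
)

# as.integer(FACTOR == 1) is 1 for rows that meet the condition and 0 otherwise,
# so summing it within each REGION counts the matching rows
df$COUNT <- ave(as.integer(df$FACTOR == 1), df$REGION, FUN = sum)

df
```

Any other logical condition can be substituted for FACTOR == 1 in the same place.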
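group_by REGION, then create (mutate) a new column named COUNT that holds the number of observations in each group, n():
library(tidyverse)
# n() counts every row in the group; use sum(FACTOR == 1) to count only rows with FACTOR == 1
group_by(df, REGION) %>%
mutate(COUNT = n()) %>%
ungroup()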
You'll want to ungroup() at the end so that later computations are not carried out at the group level.
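To make the difference between counting all rows and counting only the filtered rows concrete, here is a small sketch that computes both side by side. N_ROWS and out are illustrative names, not part of the original question.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  ID = 1:8,
  REGION = c("north", "north", "north",
             "south", "south", "south", "south", "south"),
  FACTOR = c(1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L)
)

out <- df %>%
  group_by(REGION) %>%
  mutate(
    N_ROWS = n(),               # all rows in the group
    COUNT  = sum(FACTOR == 1)   # only rows with FACTOR == 1
  ) %>%
  ungroup()                     # drop the grouping for later steps

out
is_grouped_df(out)  # check that the grouping is gone
```

With this data, N_ROWS is 3 for north and 5 for south, while COUNT is 2 and 3, which is the column asked for in the question.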