我正在努力在我的数据帧中创建一个新的变量/列,其中包含每个名称每年wi=1的频率。
这是一个测试数据帧df = tibble::as_tibble(data.frame(Name=c("x","x","x","x","x", "y","y","y","y","y","y"), Year=c(2011,2011,2011,2012,2012,2011,2011,2012,2012,2012,2012), id=c(8,23, 1,5,7,25,83,6,2,9,10), wi =c(1,0,1,1,0,1,1,0,0,1,0)))
# A tibble: 11 × 4
Name Year id wi
<chr> <dbl> <dbl> <dbl>
1 x 2011 8 1
2 x 2011 23 0
3 x 2011 1 1
4 x 2012 5 1
5 x 2012 7 0
6 y 2011 25 1
7 y 2011 83 1
8 y 2012 6 0
9 y 2012 2 0
10 y 2012 9 1
11 y 2012 10 0
理想情况下,数据帧最终会是这样的:
df
# A tibble: 11 × 5
Name Year id wi freq_wi
<chr> <dbl> <dbl> <dbl> <dbl>
1 x 2011 8 1 0.66
2 x 2011 23 0 0.66
3 x 2011 1 1 0.66
4 x 2012 5 1 0.5
5 x 2012 7 0 0.5
6 y 2011 25 1 1
7 y 2011 83 1 1
8 y 2012 6 0 0.25
9 y 2012 2 0 0.25
10 y 2012 9 1 0.25
11 y 2012 10 0 0.25
感谢所有的帮助!!
这里是另一个dplyr
解决方案:
library(dplyr)
df %>%
group_by(Name, Year) %>%
mutate(Count = ifelse(wi == 1, sum(wi), sum(wi)),
req_wi = Count/sum(Count)*Count) %>%
ungroup() %>%
select(-Count)
Name Year id wi req_wi
<chr> <dbl> <dbl> <dbl> <dbl>
1 x 2011 8 1 0.667
2 x 2011 23 0 0.667
3 x 2011 1 1 0.667
4 x 2012 5 1 0.5
5 x 2012 7 0 0.5
6 y 2011 25 1 1
7 y 2011 83 1 1
8 y 2012 6 0 0.25
9 y 2012 2 0 0.25
10 y 2012 9 1 0.25
11 y 2012 10 0 0.25
如果wi始终为0或1,则可以执行以下操作(因为mean(wi(等于"wi〃的频率(
library(dplyr)
df %>%
group_by(Name, Year) %>%
summarise(freq_wi=mean(wi)) %>%
left_join(df, .)
Name Year id wi freq_wi
<fct> <dbl> <dbl> <dbl> <dbl>
1 x 2011 8 1 0.667
2 x 2011 23 0 0.667
3 x 2011 1 1 0.667
4 x 2012 5 1 0.5
5 x 2012 7 0 0.5
6 y 2011 25 1 1
7 y 2011 83 1 1
8 y 2012 6 0 0.25
9 y 2012 2 0 0.25
10 y 2012 9 1 0.25
11 y 2012 10 0 0.25