I want to look at the level of attrition/growth of group members, by group, in R.
My data:
year1 <-
  tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"), group = c(1, 1, 1, 2, 2))
year2 <-
  tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"), group = c(1, 1, 1, 2, 2, 2, 2))
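- Group 1 lost Max but gained Jane, who moved over from group 2
- Group 2 lost Jane but gained Mohamad (and, in the sample data, Mike and Jen as well)
Is there a way to find out how many people joined/left each group every year, and the percentage change each year?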
There may be easier options, but you could do it like this:
library(tidyverse)
year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"), group = c(1, 1, 1, 2, 2))
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"), group = c(1, 1, 1, 2, 2, 2, 2))
# For each group, stack both years and compare who is present in each
map(.x = unique(year1$group),
    .f = ~ year1 |>
      filter(group == .x) |>
      mutate(year = 1) |>
      bind_rows(year2 |>
                  filter(group == .x) |>
                  mutate(year = 2)) |>
      summarize(group = unique(group),
                joined = length(setdiff(people[year == 2], people[year == 1])),
                left = length(setdiff(people[year == 1], people[year == 2])),
                n_year1 = sum(year == 1),
                n_year2 = sum(year == 2),
                # relative change in group size from year 1 to year 2
                pct_change = n_year2 / n_year1 - 1)) |>
  bind_rows()
# A tibble: 2 × 6
  group joined  left n_year1 n_year2 pct_change
  <dbl>  <int> <int>   <int>   <int>      <dbl>
1     1      1     1       3       3          0
2     2      3     1       2       4          1
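If you also want to tell people who switched groups apart from people who are entirely new or gone, a join on the person might help. This is only a rough sketch under my own assumptions: that `people` uniquely identifies a person in each year, and that the `moves` table, the `status` labels, and the `_y1`/`_y2` suffixes are names I made up for illustration.

```r
library(dplyr)

year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"),
                group = c(1, 1, 1, 2, 2))
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"),
                group = c(1, 1, 1, 2, 2, 2, 2))

# One row per person with their group in each year (NA = not present that year)
moves <- full_join(year1, year2, by = "people", suffix = c("_y1", "_y2")) %>%
  mutate(status = case_when(
    is.na(group_y1)      ~ "new",    # only appears in year 2
    is.na(group_y2)      ~ "gone",   # only appears in year 1
    group_y1 != group_y2 ~ "moved",  # present in both years, different group
    TRUE                 ~ "stayed"
  ))

# Tally each status
count(moves, status)
```

With the sample data, Jane K should come out as "moved" rather than being counted as a plain loss for group 2 and a plain gain for group 1.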
I made a few changes to the code, based on some assumptions:
library(tidyverse)
year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"), group = c(1, 1, 1, 2, 2), year = 1)
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"), group = c(1, 1, 1, 2, 2, 2, 2), year = 2)
years <- year1 %>% bind_rows(year2)
years %>%
  group_by(group, year) %>%
  summarise(n = n()) %>%
  group_by(group) %>%
  mutate(pct_change = n / lag(n) - 1)  # change relative to the previous year
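I assumed that your second data frame represents another year, then bound the two into a single data frame, using a year column to identify which year each row belongs to.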
Output:
  group  year     n pct_change
  <dbl> <dbl> <int>      <dbl>
1     1     1     3         NA
2     1     2     3          0
3     2     1     2         NA
4     2     2     4          1
library(tidyverse)
year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"), group = c(1, 1, 1, 2, 2))
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"), group = c(1, 1, 1, 2, 2, 2, 2))
year1 %>%
  group_by(group) %>%
  summarise(n = n()) %>%
  full_join(year2 %>%
              group_by(group) %>%
              summarise(n = n()), by = "group") %>%
  mutate(change = n.y - n.x, percent_change = change / n.x) %>%
  ungroup() %>%
  select(group, n.x, n.y, change, percent_change) %>% print()
Output (n.y = year 2, n.x = year 1):
# A tibble: 2 x 5
  group   n.x   n.y change percent_change
  <dbl> <int> <int>  <int>          <dbl>
1     1     3     3      0              0
2     2     2     4      2              1
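One thing to watch with the `full_join()` approach: a group that exists in only one of the two years gets `NA` for the missing count, so `change` and `percent_change` become `NA` too. Here is a hedged sketch of one way to deal with that. The extra person "Pat Q" in a group 3 is hypothetical, added by me purely to trigger the case, and the column names `n_year1`/`n_year2` are my own choice.

```r
library(dplyr)

year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"),
                group = c(1, 1, 1, 2, 2))
# "Pat Q" / group 3 is a made-up row, not part of the question's data
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T", "Pat Q"),
                group = c(1, 1, 1, 2, 2, 2, 2, 3))

result <- count(year1, group, name = "n_year1") %>%
  full_join(count(year2, group, name = "n_year2"), by = "group") %>%
  mutate(
    # treat a group that is missing in a year as having zero members
    n_year1 = coalesce(n_year1, 0L),
    n_year2 = coalesce(n_year2, 0L),
    change = n_year2 - n_year1,
    # a percentage change from zero members is undefined, so keep it NA
    percent_change = if_else(n_year1 > 0, change / n_year1, NA_real_)
  )

result
```

Whether a brand-new group should show `NA`, `Inf`, or something else for `percent_change` is a judgment call; `NA` just seemed the least misleading to me.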