I want to calculate a percent share and create new columns using mutate. I have the following data:
country, metric, segment, value1990, value2000, value2010
canada, abc, rural, 10, 15, 16
canada, abc, urban, 12, 12, 18
canada, abc, total, 22, 27, 34
canada, xyz, rural, 6, 9, 10
canada, xyc, urban, 7, 8, 8
canada, xyc, total, 13, 17, 18
canada, population, rural, 80, 86, 95
canada, population, urban, 102, 110, 121
canada, population, total, 182, 196, 216
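The data frame contains data for several countries and several years. I want to create new columns with the following values:
country, metric, segment, value1990, value2000, value2010, percent1990, percent2000, percent2010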
canada, abc, rural, 10, 15, 16, 12.5%, 17.4%, 16.8%
canada, abc, urban, 12, 12, 18, 11.7%, 10.9%, 14.8%
canada, abc, total, 22, 27, 34, 12.1%, 13.7%, 15.7%
canada, xyz, rural, 6, 9, 10, 7.5%, 10.4%, 10.5%
canada, xyc, urban, 7, 8, 8, 6.8%, 7.2%, 6.6%
canada, xyc, total, 13, 17, 18, 7.1%, 8.6%, 8.3%
canada, population, rural, 80, 86, 95, 100%, 100%, 100%
canada, population, urban, 102, 110, 121, 100%, 100%, 100%
canada, population, total, 182, 196, 216, 100%, 100%, 100%
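Essentially, I want to express each value as a percentage of the population for the same segment (rural/urban/total) and year.
For example, for 1990:
(row 1) percent_share = (10/80)*100 = 12.5%
(row 2) percent_share = (12/102)*100 = 11.76%
(row 3) percent_share = (22/182)*100 = 12.09%
I can't get past the group_by in the chain to work out how to supply the necessary function: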
df = df %>%
  group_by(country, metric) %>%
  mutate(...)
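Edit: for the new question that includes years
This is easier if you move the years and the population into their own columns. Here is one way to do that.
Assuming your example data is in a data frame named df1, first gather the years.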
library(dplyr)
library(tidyr)
df1 <- df1 %>% gather(Year, Value, 4:6)
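As a side note, gather is superseded in current tidyr releases, and pivot_longer is the suggested replacement. The sketch below shows what an equivalent reshape might look like. The data frame is rebuilt here from the question's sample so the snippet runs on its own, and the name long is only illustrative. The rows come out in a different order than with gather (each original row is expanded in turn), which does not affect the join that follows.

```r
library(tidyr)

# Sample data copied from the question (metric labels kept exactly as posted)
df1 <- data.frame(
  country = "canada",
  metric  = c("abc", "abc", "abc", "xyz", "xyc", "xyc",
              "population", "population", "population"),
  segment = rep(c("rural", "urban", "total"), times = 3),
  value1990 = c(10, 12, 22, 6, 7, 13, 80, 102, 182),
  value2000 = c(15, 12, 27, 9, 8, 17, 86, 110, 196),
  value2010 = c(16, 18, 34, 10, 8, 18, 95, 121, 216),
  stringsAsFactors = FALSE
)

# Stack the three year columns into Year/Value pairs,
# selecting them by name prefix rather than by position
long <- pivot_longer(df1, cols = starts_with("value"),
                     names_to = "Year", values_to = "Value")
```

Selecting by name prefix instead of 4:6 keeps the call working if the column order changes.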
Then filter for metric == "population" and join the result to the remaining rows of the data.
df1 %>% filter(metric == "population") %>%
  left_join(filter(df1, metric != "population"),
            by = c("country", "segment", "Year")) %>%
  select(country, segment, Year, population = Value.x, metric = metric.y, value = Value.y)
Result:
country segment Year population metric value
1 canada rural value1990 80 abc 10
2 canada rural value1990 80 xyz 6
3 canada urban value1990 102 abc 12
4 canada urban value1990 102 xyc 7
5 canada total value1990 182 abc 22
6 canada total value1990 182 xyc 13
7 canada rural value2000 86 abc 15
8 canada rural value2000 86 xyz 9
9 canada urban value2000 110 abc 12
10 canada urban value2000 110 xyc 8
11 canada total value2000 196 abc 27
12 canada total value2000 196 xyc 17
13 canada rural value2010 95 abc 16
14 canada rural value2010 95 xyz 10
15 canada urban value2010 121 abc 18
16 canada urban value2010 121 xyc 8
17 canada total value2010 216 abc 34
18 canada total value2010 216 xyc 18
Then add a mutate:
df1 %>% filter(metric == "population") %>%
  left_join(filter(df1, metric != "population"),
            by = c("country", "segment", "Year")) %>%
  select(country, segment, Year, population = Value.x, metric = metric.y, value = Value.y) %>%
  mutate(percent_share = 100 * (value / population))
Result:
country segment Year population metric value percent_share
1 canada rural value1990 80 abc 10 12.500000
2 canada rural value1990 80 xyz 6 7.500000
3 canada urban value1990 102 abc 12 11.764706
4 canada urban value1990 102 xyc 7 6.862745
5 canada total value1990 182 abc 22 12.087912
6 canada total value1990 182 xyc 13 7.142857
7 canada rural value2000 86 abc 15 17.441860
8 canada rural value2000 86 xyz 9 10.465116
9 canada urban value2000 110 abc 12 10.909091
10 canada urban value2000 110 xyc 8 7.272727
11 canada total value2000 196 abc 27 13.775510
12 canada total value2000 196 xyc 17 8.673469
13 canada rural value2010 95 abc 16 16.842105
14 canada rural value2010 95 xyz 10 10.526316
15 canada urban value2010 121 abc 18 14.876033
16 canada urban value2010 121 xyc 8 6.611570
17 canada total value2010 216 abc 34 15.740741
18 canada total value2010 216 xyc 18 8.333333
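If the wide layout from the question (one percent column per year) is wanted, the long result can be spread back out. The following is a minimal sketch, assuming the sample data from the question and using pivot_longer/pivot_wider rather than gather. The names long, pop and wide are illustrative only. It keeps the population rows, so they come out as 100, and it rounds to one decimal, so some figures differ slightly from the hand-computed ones in the question (for example 11.8 instead of 11.7).

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question (metric labels kept exactly as posted)
df1 <- data.frame(
  country = "canada",
  metric  = c("abc", "abc", "abc", "xyz", "xyc", "xyc",
              "population", "population", "population"),
  segment = rep(c("rural", "urban", "total"), times = 3),
  value1990 = c(10, 12, 22, 6, 7, 13, 80, 102, 182),
  value2000 = c(15, 12, 27, 9, 8, 17, 86, 110, 196),
  value2010 = c(16, 18, 34, 10, 8, 18, 95, 121, 216),
  stringsAsFactors = FALSE
)

# Long format; names_prefix strips "value" so Year holds just "1990", "2000", "2010"
long <- pivot_longer(df1, cols = starts_with("value"),
                     names_to = "Year", values_to = "value",
                     names_prefix = "value")

# One population figure per country/segment/year
pop <- long %>%
  filter(metric == "population") %>%
  select(country, segment, Year, population = value)

wide <- long %>%
  left_join(pop, by = c("country", "segment", "Year")) %>%
  mutate(percent = round(100 * value / population, 1)) %>%
  select(-population) %>%
  # Back to one row per country/metric/segment, giving value1990 ... percent2010
  pivot_wider(names_from = Year, values_from = c(value, percent), names_sep = "")
```

The percent columns stay numeric here; a "%" suffix could be added with paste0 at display time if needed.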
You can also group by segment and then divide by max(value), since the population value should be the largest in each group:
df %>%
  group_by(country, segment) %>%
  mutate(percent_share = value / max(value))
# A tibble: 9 x 5
# Groups: segment [3]
country metric segment value percent_share
<chr> <chr> <chr> <dbl> <dbl>
1 canada abc rural 10 0.125
2 canada abc urban 12 0.118
3 canada abc total 22 0.121
4 canada xyz rural 6 0.075
5 canada xyc urban 7 0.0686
6 canada xyc total 13 0.0714
7 canada population rural 80 1
8 canada population urban 102 1
9 canada population total 182 1
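The max(value) shortcut only holds while no metric exceeds the population within a group. A slightly more defensive variant, sketched below under the assumption that each country/segment group contains exactly one population row, looks that row up by name instead. The single-year data frame df is rebuilt from the question's 1990 figures, and res is an illustrative name.

```r
library(dplyr)

# Single-year sample (the 1990 values from the question)
df <- data.frame(
  country = "canada",
  metric  = c("abc", "abc", "abc", "xyz", "xyc", "xyc",
              "population", "population", "population"),
  segment = rep(c("rural", "urban", "total"), times = 3),
  value   = c(10, 12, 22, 6, 7, 13, 80, 102, 182),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(country, segment) %>%
  # Divide by the group's population row rather than by the group maximum;
  # this assumes exactly one population row per country/segment group
  mutate(percent_share = 100 * value / value[metric == "population"]) %>%
  ungroup()
```

On the sample data this gives the same shares as the max(value) version, scaled to percentages.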