r - Sum row values and create new category

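I am working with a population data frame that holds totals for different ages and age groups, in five-year bands. After filtering for the location I am interested in, I have this: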

Location    age group   total90  total95  total00  total05  total10
A          0 to 4      10428    118902     76758   967938   205472
A          5 to 9      18530    238928    260331   277635   303180    
A         10 to 14    180428    208902    226758   267938   305472
A         15 to 19    185003    332089    242267   261793   135472

Now I want to create new age groups, like this:

Location    age group       total90  total95    total00  total05    total10
A          5 to 14        198958   447830     487089    545573    608652
A           other         195431   450991     319025   1229731    340944

The age group "5 to 14" is the sum of "5 to 9" + "10 to 14" for each year, and

"other" is the sum of "0 to 4" + "15 to 19" for each year.

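I tried selecting the numeric columns so that I could add up the totals for each age group and create a row for the new age group, but I could not add the rows in a simple way and only made things more complicated. I am sure there is an easy way to solve this, but I am stuck.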

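I had to alter your dummy data slightly (I only removed some spaces, to make it easy to read from plain text) so that it works without further manipulation.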

df <- data.table::fread("Location    age_group   total90  total95  total00  total05  total10
A          0_to_4      10428    118902     76758   967938   205472
A          5_to_9      18530    238928    260331   277635   303180    
A         10_to_14    180428    208902    226758   267938   305472
A         15_to_19    185003    332089    242267   261793   135472")
library(tidyverse)
df %>%
  # recode the character variable age_group, reducing the problem to a single ifelse clause
  dplyr::mutate(age_group = ifelse(age_group == "5_to_9" | age_group == "10_to_14", "5_to_14", "other")) %>%
  # build the grouping (I included Location, but your needs may be different)
  dplyr::group_by(Location, age_group) %>%
  # sum all non-grouping columns in one call (so remove Location first if you do not want it in the grouping)
  dplyr::summarize(across(everything(), ~sum(.x))) %>%
  # ungrouping prevents unwanted behaviour downstream
  dplyr::ungroup()
# A tibble: 2 x 7
Location age_group total90 total95 total00 total05 total10
<chr>    <chr>       <int>   <int>   <int>   <int>   <int>
1 A        5_to_14    198958  447830  487089  545573  608652
2 A        other      195431  450991  319025 1229731  340944

Please see my answer below:

My first line reads in the data as shown.

library(tidyverse)
# read in data
my_data <- read_csv("pop_data.csv")
# add extra tags
my_data1 <- my_data %>%
  mutate(Category = c("other", "5 to 14", "5 to 14", "other")) %>%
  select(-`age group`)
# find numeric columns
numeric_col <- unlist(lapply(my_data1, is.numeric))
# combine the data
my_data2 <- aggregate(my_data1[, numeric_col],
                      by = list(my_data1$Location, my_data1$Category),
                      FUN = sum)
# rename first 2 columns
colnames(my_data2)[1:2] <- c("Location", "age group")

And the result:

Location age group total90 total95 total00 total05 total10
1        A   5 to 14  198958  447830  487089  545573  608652
2        A     other  195431  450991  319025 1229731  340944
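The Category vector in the code above is assigned by row position, so it only holds while the rows stay in that exact order. A minimal base R sketch that derives the category from the age group value instead is shown here; the names pop and pop_sum are illustrative, and the data frame is built inline from the question's numbers rather than read from pop_data.csv.

```r
# Sample data copied from the question; check.names = FALSE keeps the
# space in the "age group" column name.
pop <- data.frame(
  Location    = rep("A", 4),
  `age group` = c("0 to 4", "5 to 9", "10 to 14", "15 to 19"),
  total90     = c(10428, 18530, 180428, 185003),
  total95     = c(118902, 238928, 208902, 332089),
  total00     = c(76758, 260331, 226758, 242267),
  total05     = c(967938, 277635, 267938, 261793),
  total10     = c(205472, 303180, 305472, 135472),
  check.names = FALSE
)

# Derive the new category from the value, not from the row position
pop$Category <- ifelse(pop$`age group` %in% c("5 to 9", "10 to 14"),
                       "5 to 14", "other")

# Sum every numeric column within each Location/Category pair
numeric_col <- vapply(pop, is.numeric, logical(1))
pop_sum <- aggregate(pop[, numeric_col],
                     by = list(pop$Location, pop$Category),
                     FUN = sum)

# aggregate() names the grouping columns Group.1 and Group.2; rename them
colnames(pop_sum)[1:2] <- c("Location", "age group")
print(pop_sum)
```

Because the category is computed with %in%, the result does not change if the input rows are reordered or if further locations are added.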

A dplyr solution

You can first mutate the age group column into the groups of interest, then summarise across the columns of interest with the sum function.

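library(dplyr)
df %>%
  # the age group values are strings, so they need quotes rather than backticks
  mutate(`age group` = ifelse(`age group` %in% c("5 to 9", "10 to 14"), "5 to 14", "other")) %>%
  group_by(`age group`, Location) %>%
  # sum is the second argument of across(), not of summarise()
  summarise(across(total90:total10, sum)) %>%
  ungroup()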
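To try this approach end to end, here is a self-contained sketch that builds the question's sample data inline and checks it against the expected totals. It assumes the column is named `age group` (with a space) as in the question; if_else and .groups = "drop" are substitutions of mine for ifelse and the trailing ungroup(), and the name result is illustrative.

```r
library(dplyr)

# Sample data copied from the question
df <- tibble::tribble(
  ~Location, ~`age group`, ~total90, ~total95, ~total00, ~total05, ~total10,
  "A",       "0 to 4",       10428,   118902,    76758,   967938,   205472,
  "A",       "5 to 9",       18530,   238928,   260331,   277635,   303180,
  "A",       "10 to 14",    180428,   208902,   226758,   267938,   305472,
  "A",       "15 to 19",    185003,   332089,   242267,   261793,   135472
)

result <- df %>%
  # collapse the four original bands into "5 to 14" and "other"
  mutate(`age group` = if_else(`age group` %in% c("5 to 9", "10 to 14"),
                               "5 to 14", "other")) %>%
  group_by(Location, `age group`) %>%
  # sum each yearly total within the new groups and drop the grouping
  summarise(across(total90:total10, sum), .groups = "drop")

print(result)
```

The two rows should match the table the question asks for, for example 198958 for "5 to 14" in total90 (18530 + 180428).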

For completeness: if you want to change the target start/end ages, here is one way to parameterize target_start and target_end:

library(tidyverse)
target_start <- 5
target_end <- 14
df %>%
  separate(`age group`, into = c("grp_start", "grp_end"), sep = " to ") %>%
  mutate(across(starts_with("grp"), as.numeric),
         age_group =
           if_else(grp_start >= target_start & grp_end <= target_end,
                   glue::glue("{target_start} to {target_end}"),
                   "other")
  ) %>%
  group_by(age_group, Location) %>%
  summarise(across(total90:total10, sum)) %>%
  ungroup()
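The same idea can be wrapped in a function so the target range is passed as arguments. This is only a sketch under a few assumptions: regroup_ages is a hypothetical helper name, the label is built with paste() instead of glue, and every age group is assumed to have the form "<start> to <end>".

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- tibble::tribble(
  ~Location, ~`age group`, ~total90, ~total95, ~total00, ~total05, ~total10,
  "A",       "0 to 4",       10428,   118902,    76758,   967938,   205472,
  "A",       "5 to 9",       18530,   238928,   260331,   277635,   303180,
  "A",       "10 to 14",    180428,   208902,   226758,   267938,   305472,
  "A",       "15 to 19",    185003,   332089,   242267,   261793,   135472
)

# Hypothetical helper: bands lying fully inside [target_start, target_end]
# are merged into one group, everything else becomes "other".
regroup_ages <- function(data, target_start, target_end) {
  target_label <- paste(target_start, "to", target_end)
  data %>%
    # split "5 to 9" into two columns; convert = TRUE makes them integers
    separate(`age group`, into = c("grp_start", "grp_end"),
             sep = " to ", convert = TRUE) %>%
    mutate(age_group = if_else(grp_start >= target_start & grp_end <= target_end,
                               target_label, "other")) %>%
    group_by(Location, age_group) %>%
    summarise(across(starts_with("total"), sum), .groups = "drop")
}

out_5_14 <- regroup_ages(df, 5, 14)  # same grouping as in the question
out_0_9  <- regroup_ages(df, 0, 9)   # a different target range
print(out_5_14)
print(out_0_9)
```

Note that a band which only partly overlaps the target range (for example "10 to 14" with a target of 5 to 12) falls into "other", since the condition requires the whole band to lie inside the range.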
