Trying to categorize and condense data in R using a keyword


df <- read.csv("https://query.data.world/s/gzjmftivszsy44ukfak2e7ksig35jm", header=TRUE, stringsAsFactors=FALSE);
library(ggplot2)
library(qqplotr)
library(stats)
library(dplyr)

coverage_by_Geography = data.frame(avgcancerdiag= df$avgAnnCount, county = df$Geography, PubCoverage = df$PctPublicCoverage, privcoverage = df$PctPrivateCoverage, deathrt = df$avgDeathsPerYear)
ggplot(data = coverage_by_Geography, aes(x = privcoverage, y = deathrt))+geom_col()
ggplot(data = coverage_by_Geography, aes(x = PubCoverage, y = deathrt))+geom_col()

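I'm trying to take a bunch of counties that sit in one column, condense them into states, and average their data so that I get state-level numbers instead of county-level ones. I don't know how to go about it.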

A general tidyverse solution looks like this:

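library(tidyverse)
df <- read_csv("https://query.data.world/s/gzjmftivszsy44ukfak2e7ksig35jm")
df %>%
  # split "County, State" into two columns
  separate(Geography, c("county", "state"), sep = ", ") %>%
  # move the two new columns to the front
  select(state, county, everything()) %>%
  # one group per state
  group_by(state) %>%
  # average every remaining column except county within each state
  summarize(across(-c(county), mean))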

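The code splits the county and state into two columns. Grouping by state then lets you summarize the data per state. Here I'm asking for the mean of every column, which may not work for all data types: non-numeric columns, for example, come back as NA with a warning. Hopefully this gets you closer to what you're looking for.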
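If the non-numeric columns get in the way, one possible variation is to average only the numeric columns and to skip missing values. This is a minimal sketch, not the only way to do it. The toy data frame below is made up so the example runs without downloading anything; `income_bin` is a hypothetical text column, and only `Geography`, `avgDeathsPerYear` and `PctPrivateCoverage` are names taken from the question.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for the real CSV (values are illustrative only).
toy <- tibble(
  Geography = c("Kitsap County, Washington",
                "Kittitas County, Washington",
                "Adams County, Ohio"),
  avgDeathsPerYear = c(469, 70, 80),
  PctPrivateCoverage = c(75.1, NA, 70.2),
  income_bin = c("high", "mid", "low")  # hypothetical non-numeric column
)

by_state <- toy %>%
  # split "County, State" into two columns
  separate(Geography, c("county", "state"), sep = ", ") %>%
  group_by(state) %>%
  # keep only numeric columns and ignore NA values when averaging
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

print(by_state)
```

The text columns (`county`, `income_bin`) are dropped from the result because `where(is.numeric)` never selects them. Note that this is a plain average over counties; a state with all values missing in a column would still come back as NaN.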