Using a dataset with a "USA_State" column to create a new column called "USA_Region"



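My R data cleaning and manipulation skills are quite lacking, and I realize that if I want to create visualizations with ggplot, it would be faster to just export something like this to Excel and bring it back into R... However, I am sure there is a relatively simple/elegant way to handle this.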

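I have a dataset with a column "USA_State" that specifies the state for each row (most cells contain a single state, but a few cells list multiple states). I would like to group these into regions according to the following system: Northeast, North Central, Southern, Western: https://nifa.usda.gov/efnep-where-you-live-partner-websites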

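Is there an easy way to make a new column called "USA_Region" that places each row in one of these 4 regions based on the "USA_State" column? I assume I need to specify the states in each region (by creating a data frame that contains the regions and every state in each region) and then do something (possibly with mutate())?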

Thanks for your help!

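A very simple solution is to start with a list of states (which you seem to already have) and then add a column that simply lists the USA_Region for each one, like so: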

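library(readr)  # read_csv()
library(dplyr)  # %>% and filter()

# Read only the State column from a CSV of state names
states <- read_csv('https://raw.githubusercontent.com/jasonong/List-of-US-States/master/states.csv', col_select = 'State')
# Regions are assigned by position, so their order must match the row order of the file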
states$USA_Region = c("Southern", "Western", "Western", "Southern", "Western", "Western", "Northeast", "Northeast", "Northeast", "Southern", "Southern", "Western", "Western", "North Central", "North Central", "North Central", "North Central", "Southern",
"Southern", "North Central", "Western", "Western", "Western", "Northeast", "Northeast", "Western", "Northeast", "Southern", "North Central", "North Central", "Southern", "Western", "Northeast", "Northeast",
"North Central", "North Central", "Southern", "North Central", "Northeast", "Northeast", "Southern", "North Central", "Southern", "Southern", "Western", "North Central", "Southern", "Western",
"Southern", "North Central", "Western")

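Once you have that, do a simple check to make sure the list is accurate. The regions are matched to states purely by position, so print every row and compare each state with its region: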

states %>%
  print(n = 51)
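Beyond reading the printed table by eye, a couple of automated checks can catch typos in the region labels. The following is only a sketch: `states_demo` is a hypothetical four-row stand-in for the full `states` table, and `valid_regions`, `bad_rows`, and `region_counts` are names made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the full `states` lookup table (a few rows only)
states_demo <- tibble(
  State = c("Alabama", "Alaska", "Connecticut", "Illinois"),
  USA_Region = c("Southern", "Western", "Northeast", "North Central")
)

# The four region names used in the question
valid_regions <- c("Northeast", "North Central", "Southern", "Western")

# Rows whose region label is missing or misspelled end up here
bad_rows <- states_demo %>%
  filter(!USA_Region %in% valid_regions)

# Number of states assigned to each region
region_counts <- states_demo %>%
  count(USA_Region)
```

If `bad_rows` is empty, every label is one of the four expected names. These checks do not confirm that a state is in the correct region, so comparing against the source list is still needed.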

After that, using dplyr verbs is a breeze. For example, finding all the states in the West looks like this:

west <- states %>%
  filter(USA_Region == "Western")
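To get the "USA_Region" column onto the original dataset, one option is to join it to the lookup table by state name. This is a rough sketch under assumptions, not a tested solution for your data: `my_data` and `region_lookup` are hypothetical, and it assumes that cells listing multiple states separate them with commas, which the question does not specify.

```r
library(dplyr)
library(tidyr)

# Hypothetical lookup table; in practice use the full `states` data frame
region_lookup <- tibble(
  State = c("Alabama", "Alaska", "Connecticut", "Illinois"),
  USA_Region = c("Southern", "Western", "Northeast", "North Central")
)

# Hypothetical dataset; the comma separator in multi-state cells is an assumption
my_data <- tibble(
  id = 1:3,
  USA_State = c("Alabama", "Alaska, Illinois", "Connecticut")
)

with_region <- my_data %>%
  # Split multi-state cells so that each row holds exactly one state
  separate_rows(USA_State, sep = ",\\s*") %>%
  # Add USA_Region by matching USA_State against the lookup's State column
  left_join(region_lookup, by = c("USA_State" = "State"))
```

Any state name that has no match in the lookup table gets `NA` in "USA_Region", which makes spelling mismatches easy to spot. Splitting multi-state cells means one original row can become several, so keep an identifier column such as `id` if you need to trace rows back.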
