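I am trying to find the correlation between two columns (Sunshine_in_hours and AgeGroup_30_to_34) of a combined dataset in R. However, every time I run the cor() function I end up with this error: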
Error in pmatch(use, c("all.obs", "complete.obs", "pairwise.complete.obs", :
object 'AgeGroup_30_to_34' not found
Here is the dput() output of the head of the data:
structure(list(Date = structure(c(18659, 18660, 18661, 18663,
18665, 18666, 18667, 18668, 18669, 18670, 18671, 18673, 18674,
18675, 18676, 18677, 18678, 18679, 18680, 18681, 18682, 18683,
18684, 18685, 18686, 18687, 18688, 18689, 18690, 18691), class = "Date"),
Year = c(2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
2021, 2021), Month = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3), AgeGroup_30_to_34 = c(0,
0, 0, 2, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0,
2, 0, 0, 1, 2, 0, 3, 0, 0, 0), Sunshine_in_hours = c(1.6,
3.4, 13.1, 8.9, 2, 1.7, 12.7, 11.6, 5.5, 5.6, 4.9, 9.2, 8.3,
11.9, 12.4, 12.4, 5.9, 0, 6.3, 8.5, 9.9, 8.7, 6.3, 1, 9.2,
6.3, 1.4, 2.1, 2.6, 3.6), City = c("Melbourne", "Melbourne",
"Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne",
"Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne",
"Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne",
"Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne",
"Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne",
"Melbourne", "Melbourne", "Melbourne")), row.names = c(NA,
-30L), class = c("tbl_df", "tbl", "data.frame"))
This is the code I tried to run:
Combined <- inner_join(covidS, weatherS, by = 'Date')%>%
mutate(Date = mdy(Date),
Year = year(Date),
Month = month(Date),
Day = day(Date))%>%
select(Date, Year, Month, AgeGroup_30_to_34, Sunshine_in_hours, City)%>%
filter(City == 'Melbourne')%>%
cor(Sunshine_in_hours, AgeGroup_30_to_34 )
I have looked for tutorials on how to do this, but I keep hitting a wall. Any help would be appreciated.
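cor() takes two inputs (x and y), but you gave it three, and it cannot make sense of two of them. Because of the pipe, the data frame becomes x, Sunshine_in_hours becomes y, and AgeGroup_30_to_34 lands in the use argument, which is why the error is raised from pmatch(use, ...). Try this instead: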
Combined <- inner_join(covidS, weatherS, by = 'Date')%>%
mutate(Date = mdy(Date),
Year = year(Date),
Month = month(Date),
Day = day(Date))%>%
select(Date, Year, Month, AgeGroup_30_to_34, Sunshine_in_hours, City)%>%
filter(City == 'Melbourne')
corr <- cor(Combined$Sunshine_in_hours, Combined$AgeGroup_30_to_34)
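Remember that when you use the pipe, the object on its left-hand side is passed as the first argument of the function you call. In this case, your code is equivalent to: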
cor(inner_join(covidS, weatherS, by = 'Date')%>%
mutate(Date = mdy(Date),
Year = year(Date),
Month = month(Date),
Day = day(Date))%>%
select(Date, Year, Month, AgeGroup_30_to_34, Sunshine_in_hours, City)%>%
filter(City == 'Melbourne'),
Sunshine_in_hours, AgeGroup_30_to_34 )
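So Sunshine_in_hours and AgeGroup_30_to_34 mean nothing to the function, because it has no way of knowing that they are columns of a data frame. The underlying issue is that cor() is a base R function, while the rest of your code is dplyr, and the two follow different paradigms: dplyr verbs look names up inside the data, whereas cor() looks them up as ordinary objects. When in doubt, always check the documentation.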
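To see the pipe's rewriting rule in isolation, here is a minimal sketch on a made-up data frame (toy, sun_hours and case_count are hypothetical names, not the data from the question). It checks that x %>% f(y) gives the same result as f(x, y), and reproduces the "object not found" failure in miniature:

```r
library(magrittr)

# Hypothetical toy data; the column names are made up for illustration
toy <- data.frame(sun_hours  = c(1.5, 3.0, 4.5, 6.0, 7.5),
                  case_count = c(2, 1, 3, 0, 1))

# The pipe inserts its left-hand side as the FIRST argument:
piped  <- toy %>% head(2)
direct <- head(toy, 2)
same_result <- identical(piped, direct)

# So toy %>% cor(sun_hours, case_count) is cor(toy, sun_hours, case_count):
# x = toy, y = sun_hours, use = case_count. Neither name exists as an
# object in the calling environment, so the call fails.
res <- tryCatch(toy %>% cor(sun_hours, case_count),
                error = function(e) e)
failed <- inherits(res, "error")
```

Here same_result and failed should both come out TRUE, assuming no objects called sun_hours or case_count exist in your workspace.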
With magrittr's exposition pipe %$% in place of %>%, you can do this:
library(magrittr)
dat %$%
cor(Sunshine_in_hours, AgeGroup_30_to_34)
#> [1] -0.0006941058
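If you would rather not load magrittr just for %$%, two other common approaches make the column names visible to cor(): base R's with(), and dplyr's summarise(), which evaluates its arguments inside the data frame. A minimal sketch on a made-up data frame (toy and its columns are hypothetical, not the data from the question):

```r
library(dplyr)

# Hypothetical toy data; the column names are made up for illustration
toy <- data.frame(sun_hours  = c(1.5, 3.0, 4.5, 6.0, 7.5),
                  case_count = c(2, 1, 3, 0, 1))

# Reference: explicit column extraction with $
r_dollar <- cor(toy$sun_hours, toy$case_count)

# Base R: with() evaluates the expression using the data frame's columns
r_with <- with(toy, cor(sun_hours, case_count))

# dplyr: summarise() looks names up in the data, so it fits in a %>% chain;
# pull() extracts the single value from the one-row result
r_dplyr <- toy %>%
  summarise(r = cor(sun_hours, case_count)) %>%
  pull(r)

all.equal(r_dollar, r_with)
all.equal(r_dollar, r_dplyr)
```

All three compute the same Pearson correlation; which one to use is mostly a matter of whether the rest of the code is base R or dplyr.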