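I have two data frames: the original one, and a second one made up of rows extracted from the original with the values in one of the columns changed.
Both have the same number of columns (10), but the original has more rows than the second. The data types are the same in both, and I need to replace the values in one column, which is a factor. I tried merging with left_join, but I ran into errors. That is probably my own mistake, but I cannot tell what I am doing wrong, because I still do not understand R properly.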
For example, data frame 1:
- ALB Med
- SKJ Equat
- ALB Gyre
- BUM Equat
- WHM Trans
- YFT Equat
Data frame 2:
- ALB North Atl
- BUM South Atl
- WHM Gyre
- YFT Gyre
What I would like to get:
- ALB Med
- SKJ Equat
- ALB North Atl
- BUM South Atl
- WHM Gyre
- YFT Gyre
An excerpt of the raw data from R (dataset 1):
> print(catchesbyPPOW[1:10,])
Species Long Lat tCatch_sqrt ECOREGION REALM PROVINC TYPE
1 ALB 17.5 -57.5 0.5099020 <NA> Southern Cold Water Antarctic PPOW
2 YFT 17.5 -57.5 0.2812472 <NA> Southern Cold Water Antarctic PPOW
3 BFT -67.5 -52.5 2.9238673 Patagonian Shelf Temperate South America Magellanic MEOW
4 BFT -62.5 -52.5 3.2256782 <NA> Atlantic Warm Water Malvinas Current PPOW
5 ALB -52.5 -52.5 0.2323575 <NA> Southern Cold Water Subantarctic PPOW
6 SWO -52.5 -52.5 0.9996549 <NA> Southern Cold Water Subantarctic PPOW
7 ALB -32.5 -52.5 0.4097926 <NA> Southern Cold Water Antarctic PPOW
8 BET -32.5 -52.5 1.4336387 <NA> Southern Cold Water Antarctic PPOW
9 SWO -32.5 -52.5 1.2541730 <NA> Southern Cold Water Antarctic PPOW
10 YFT -32.5 -52.5 1.2215236 <NA> Southern Cold Water Antarctic PPOW
BIOME optional
1 Polar TRUE
2 Polar TRUE
3 <NA> TRUE
4 Boundary - western TRUE
5 Polar TRUE
6 Polar TRUE
7 Polar TRUE
8 Polar TRUE
9 Polar TRUE
10 Polar TRUE
Dataset 2:
> print(outliers[1:10,])
Species Long Lat tCatch_sqrt ECOREGION REALM TYPE BIOME optional
3 BFT -67.5 -52.5 2.9238673 Patagonian Shelf Temperate South America MEOW <NA> TRUE
39 SWO -62.5 -42.5 0.6316645 North Patagonian Gulfs Temperate South America MEOW <NA> TRUE
130 ALB -57.5 -37.5 7.6342489 Uruguay-Buenos Aires Shelf Temperate South America MEOW <NA> TRUE
131 BET -57.5 -37.5 0.8367258 Uruguay-Buenos Aires Shelf Temperate South America MEOW <NA> TRUE
132 BUM -57.5 -37.5 0.5127475 Uruguay-Buenos Aires Shelf Temperate South America MEOW <NA> TRUE
133 SAI -57.5 -37.5 1.3915028 Uruguay-Buenos Aires Shelf Temperate South America MEOW <NA> TRUE
134 SKJ -57.5 -37.5 1.2453915 Uruguay-Buenos Aires Shelf Temperate South America MEOW <NA> TRUE
135 SWO -57.5 -37.5 2.4453357 Uruguay-Buenos Aires Shelf Temperate South America MEOW <NA> TRUE
136 WHM -57.5 -37.5 0.2320991 Uruguay-Buenos Aires Shelf Temperate South America MEOW <NA> TRUE
137 YFT -57.5 -37.5 2.2360680 Uruguay-Buenos Aires Shelf Temperate South America MEOW <NA> TRUE
PROVINC
3 Malvinas Current
39 Malvinas Current
130 Malvinas Current
131 Malvinas Current
132 Malvinas Current
133 Malvinas Current
134 Malvinas Current
135 Malvinas Current
136 Malvinas Current
137 Malvinas Current
I deleted my failed attempts, so I only have the most recent one, which uses left_join:
PPOWoutliers<-left_join(catchesbyPPOW, outliers, by = NULL)
This gave me the following message and warning:
Joining, by = c("Species", "Long", "Lat", "tCatch_sqrt", "ECOREGION", "REALM", "PROVINC", "TYPE", "BIOME", "optional")
Warning message:
In left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y) :
joining factors with different levels, coercing to character vector
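The warning says that the factor columns being joined have different sets of levels, so they are coerced to character vectors. A minimal sketch of making that conversion explicit before joining, assuming toy stand-ins `df1` and `df2` for the real data and a hypothetical helper named `factors_to_character` (neither appears in the original post):

```r
# Toy stand-ins for the two data frames (hypothetical values)
df1 <- data.frame(
  Species = factor(c("ALB", "SKJ")),
  PROVINC = factor(c("Med", "Equat"))
)
df2 <- data.frame(
  Species = factor("ALB"),
  PROVINC = factor("North Atl")
)

# Hypothetical helper: turn every factor column into a character column
factors_to_character <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  df[is_fac] <- lapply(df[is_fac], as.character)
  df
}

df1_chr <- factors_to_character(df1)
df2_chr <- factors_to_character(df2)
```

With character columns on both sides there are no level sets left to disagree, so a later join or assignment has nothing to coerce.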
Edited to reflect that the columns in your two datasets may not be identical.
Merging does not take row names into account.
Try this:
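# Columns whose values should be taken from df2
columns_to_replace <- c("ECOREGION", "REALM", "TYPE")
dfnew <- df1
# df2 kept the row names of the rows it was extracted from, so they index df1 directly
dfnew[as.numeric(rownames(df2)), columns_to_replace] <- df2[, columns_to_replace]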
Or, if the columns are the same in both datasets, like this:
dfnew[as.numeric(rownames(df2)),] <- df2
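A small self-contained sketch of the row-name approach, using made-up data loosely modelled on the example in the question (the values and the single `PROVINC` column are assumptions for illustration). The columns are kept as character here because assigning a value that is not an existing level into a factor column yields NA with a warning:

```r
# Hypothetical toy data standing in for the original data frame
df1 <- data.frame(
  Species = c("ALB", "SKJ", "ALB", "BUM", "WHM", "YFT"),
  PROVINC = c("Med", "Equat", "Gyre", "Equat", "Trans", "Equat"),
  stringsAsFactors = FALSE
)

# Extract rows 3 to 6 and edit them; the subset keeps row names "3".."6"
df2 <- df1[3:6, ]
df2$PROVINC <- c("North Atl", "South Atl", "Gyre", "Gyre")

# Write the edited values back into a copy, located by the original row names
dfnew <- df1
dfnew[as.integer(rownames(df2)), "PROVINC"] <- df2$PROVINC
```

This only works while the row names of the second data frame still correspond to row positions in the first, which is the case when it was created by subsetting and has not been re-indexed since.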
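Thank you very much for your help! In the end I found a way. I gave the variable I wanted to change a different name in dataset 2 and then did a left_join of the two datasets. After that I wrote a for loop that replaces all the resulting NAs with the values from the original column, and it worked.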
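library(dplyr)

# The changed variable was renamed in outliers beforehand, so it arrives as column 11
PPOWoutliers <- left_join(catchesbyPPOW, outliers, by = NULL)
summary(PPOWoutliers)

# Column 12 takes the new value where one exists, otherwise the original column 7.
# The loop bound was hard-coded as 1:2448 in the original attempt.
for (i in seq_len(nrow(PPOWoutliers))) {
  if (is.na(PPOWoutliers[i, 11])) {
    PPOWoutliers[i, 12] <- as.character(PPOWoutliers[i, 7])
  } else {
    PPOWoutliers[i, 12] <- as.character(PPOWoutliers[i, 11])
  }
}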
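The same NA fallback can also be written without a loop. A minimal sketch on made-up data, where `joined`, `PROVINC_new`, and `PROVINC_final` are hypothetical names standing in for the joined result, the renamed column, and the new column:

```r
# Hypothetical joined result: PROVINC comes from the original data,
# PROVINC_new from the edited rows (NA where a row was not edited)
joined <- data.frame(
  Species     = c("ALB", "SKJ", "ALB", "BUM"),
  PROVINC     = c("Med", "Equat", "Gyre", "Equat"),
  PROVINC_new = c(NA, NA, "North Atl", "South Atl"),
  stringsAsFactors = FALSE
)

# Take the new value where there is one, otherwise keep the original
joined$PROVINC_final <- ifelse(
  is.na(joined$PROVINC_new),
  as.character(joined$PROVINC),
  as.character(joined$PROVINC_new)
)
```

`ifelse` works on whole columns at once, so no row count needs to be hard-coded. With dplyr loaded, `coalesce(PROVINC_new, PROVINC)` on character columns expresses the same idea.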