Transpose only some elements of a data frame, conditional on matching two specific variables



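How can I transpose only part of a data frame into another part of the same data frame, restricted by the values in certain cells of that data frame?

I have a data frame in R with the following structure: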

str(df)
# 'data.frame': 20 obs. of  11 variables:
#  $ Code1    : int  1 1 1 1 2 2 3 3 4 4 ...
#  $ SpeciesG1: chr  "Sp1" "Sp1" "Sp1" "Sp1" ...
#  $ Family   : chr  "Fam1" "Fam2" "Fam3" "Fam4" ...
#  $ Pct_feed : num  55.4 12.56 1.35 30.69 62.93 ...
#  $ Code2    : int  0 0 0 0 0 0 0 0 0 0 ...
#  $ SpeciesG2: chr  "0" "0" "0" "0" ...
#  $ Fam1     : int  0 0 0 0 0 0 0 0 0 0 ...
#  $ Fam2     : int  0 0 0 0 0 0 0 0 0 0 ...
#  $ Fam3     : int  0 0 0 0 0 0 0 0 0 0 ...
#  $ Fam4     : int  0 0 0 0 0 0 0 0 0 0 ...
#  $ Fam5     : int  0 0 0 0 0 0 0 0 0 0 ...

The data look like this (this is only a small sample of a df that is several hundred rows long):

df
#    Code1 SpeciesG1 Family Pct_feed Code2 SpeciesG2 Fam1 Fam2 Fam3 Fam4 Fam5
# 1      1       Sp1   Fam1    55.40     0         0    0    0    0    0    0
# 2      1       Sp1   Fam2    12.56     0         0    0    0    0    0    0
# 3      1       Sp1   Fam3     1.35     0         0    0    0    0    0    0
# 4      1       Sp1   Fam4    30.69     0         0    0    0    0    0    0
# 5      2       Sp1   Fam2    62.93     0         0    0    0    0    0    0
# 6      2       Sp1   Fam3    37.07     0         0    0    0    0    0    0
# 7      3       Sp2   Fam4    17.49     0         0    0    0    0    0    0
# 8      3       Sp2   Fam5    82.51     0         0    0    0    0    0    0
# 9      4       Sp2   Fam1    36.97     0         0    0    0    0    0    0
# 10     4       Sp2   Fam2    51.46     0         0    0    0    0    0    0
# 11     4       Sp2   Fam3    11.57     0         0    0    0    0    0    0
# 12     5       Sp3   Fam1    41.81     0         0    0    0    0    0    0
# 13     5       Sp3   Fam2     9.64     0         0    0    0    0    0    0
# 14     5       Sp3   Fam3    31.74     0         0    0    0    0    0    0
# 15     5       Sp3   Fam4     5.12     0         0    0    0    0    0    0
# 16     5       Sp3   Fam5    11.69     0         0    0    0    0    0    0
# 17     5       Sp4   Fam2    41.16     0         0    0    0    0    0    0
# 18     5       Sp4   Fam3    40.04     0         0    0    0    0    0    0
# 19     5       Sp4   Fam4     4.32     0         0    0    0    0    0    0
# 20     5       Sp4   Fam5    14.48     0         0    0    0    0    0    0

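I need to replace some of the zero values in columns 7-11 ("Fam1" … "Fam5") according to the following three-step procedure:

Step 1: For every possible pair of "Code1" and "SpeciesG1", I want the Pct_feed values to replace the zeros in whichever of columns 7 → 11 (i.e. "Fam1", "Fam2", "Fam3", "Fam4" and "Fam5") matches the name in the **Family** column, with the values spread horizontally (transposed, i.e. all values in the same row).

Step 2: In the same single row created in step 1, the values of "Code1" and "SpeciesG1" should replace the "0" values under the columns "Code2" and "SpeciesG2".

Step 3: Apart from the single row created in step 1, all rows with the same pair of "Code1" and "SpeciesG1" values should be deleted.

The final result of this three-step operation applied to df should look like this: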

#Code1   SpeciesG1   Family  Pct_feed   Code2   SpeciesG2    Fam1    Fam2    Fam3    Fam4    Fam5
#1        Sp1        Fam1     55.40     1       Sp1          55.40   12.56  1.35     30.69    0
#2        Sp1        Fam2     62.93     2       Sp1          0       62.93  37.07    0        0
#3        Sp2        Fam4     17.49     3       Sp2          0       0      0        17.49    82.51
#4        Sp2        Fam1     36.97     4       Sp2          36.97   51.46  11.57    0        0
#5        Sp3        Fam1     41.81     5       Sp3          41.81   9.64   31.74    5.12     11.69
#5        Sp4        Fam2     41.16     5       Sp4          0       41.16  40.04    4.32     14.48

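Note 1: As a check, the Pct_feed values for any given pair of "Code1" and "SpeciesG1" should sum to 100.

Note 2: I have looked into the dplyr package, but I could not find a way to get the desired result.

In your output, the values in Code2 are the same as those in Code1, and the values in SpeciesG2 are the same as those in SpeciesG1. Nevertheless, here is code that produces the exact expected output.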

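library(dplyr)
library(tidyr)

df %>%
  # copy the group keys and keep each row's own value and family for pivoting
  mutate(Code2 = Code1, SpeciesG2 = SpeciesG1,
         val = Pct_feed, col = Family) %>%
  group_by(Code1, SpeciesG1) %>%
  # within each group, overwrite Pct_feed and Family with the first row's values
  mutate(across(c(Pct_feed, Family), first)) %>%
  ungroup() %>%
  # spread the families into columns, one row per group, filling gaps with 0
  pivot_wider(names_from = col, values_from = val, values_fill = 0)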
#  Code1 SpeciesG1 Family Pct_feed Code2 SpeciesG2  Fam1  Fam2  Fam3  Fam4  Fam5
#  <int> <chr>     <chr>     <dbl> <int> <chr>     <dbl> <dbl> <dbl> <dbl> <dbl>
#1     1 Sp1       Fam1       55.4     1 Sp1        55.4 12.6   1.35 30.7    0  
#2     2 Sp1       Fam2       62.9     2 Sp1         0   62.9  37.1   0      0  
#3     3 Sp2       Fam4       17.5     3 Sp2         0    0     0    17.5   82.5
#4     4 Sp2       Fam1       37.0     4 Sp2        37.0 51.5  11.6   0      0  
#5     5 Sp3       Fam1       41.8     5 Sp3        41.8  9.64 31.7   5.12  11.7
#6     5 Sp4       Fam2       41.2     5 Sp4         0   41.2  40.0   4.32  14.5
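
As a sanity check along the lines of Note 1 in the question, the Fam columns of each output row should add up to 100. The following is only a minimal sketch: it runs the same pipeline on a two-group subset of the data, and the names `df_small`, `res` and `totals` are introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)

# Small subset of the question's data (Code1 values 1 and 2 only)
df_small <- data.frame(
  Code1     = c(1L, 1L, 1L, 1L, 2L, 2L),
  SpeciesG1 = "Sp1",
  Family    = c("Fam1", "Fam2", "Fam3", "Fam4", "Fam2", "Fam3"),
  Pct_feed  = c(55.4, 12.56, 1.35, 30.69, 62.93, 37.07)
)

# Same pipeline as above, with the result stored for checking
res <- df_small %>%
  mutate(Code2 = Code1, SpeciesG2 = SpeciesG1,
         val = Pct_feed, col = Family) %>%
  group_by(Code1, SpeciesG1) %>%
  mutate(across(c(Pct_feed, Family), first)) %>%
  ungroup() %>%
  pivot_wider(names_from = col, values_from = val, values_fill = 0)

# Sum Fam1, Fam2, ... row by row; the regex deliberately excludes "Family"
totals <- rowSums(select(res, matches("^Fam[0-9]+$")))

# Compare with a tolerance rather than ==, since these are floating-point sums
stopifnot(isTRUE(all.equal(unname(totals), rep(100, nrow(res)))))
```

The selection uses `matches("^Fam[0-9]+$")` rather than `starts_with("Fam")` because the latter would also pick up the character column `Family` and make `rowSums()` fail.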

Data

Do not pre-initialise columns with 0 values. Keep only the columns that contain data.

df <- structure(list(Code1 = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 
4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), SpeciesG1 = c("Sp1", 
"Sp1", "Sp1", "Sp1", "Sp1", "Sp1", "Sp2", "Sp2", "Sp2", "Sp2", 
"Sp2", "Sp3", "Sp3", "Sp3", "Sp3", "Sp3", "Sp4", "Sp4", "Sp4", 
"Sp4"), Family = c("Fam1", "Fam2", "Fam3", "Fam4", "Fam2", "Fam3", 
"Fam4", "Fam5", "Fam1", "Fam2", "Fam3", "Fam1", "Fam2", "Fam3", 
"Fam4", "Fam5", "Fam2", "Fam3", "Fam4", "Fam5"), Pct_feed = c(55.4, 
12.56, 1.35, 30.69, 62.93, 37.07, 17.49, 82.51, 36.97, 51.46, 
11.57, 41.81, 9.64, 31.74, 5.12, 11.69, 41.16, 40.04, 4.32, 14.48
)), row.names = c(NA, -20L), class = "data.frame")
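
If the data frame already carries the zero-filled placeholder columns, as in the question, they can be dropped before running the pipeline. A minimal sketch in base R, using a hypothetical two-row stand-in (`df_q`) for the question's data frame:

```r
# Hypothetical stand-in for the question's data frame, including
# some of the zero-filled placeholder columns
df_q <- data.frame(
  Code1 = c(1L, 1L), SpeciesG1 = c("Sp1", "Sp1"),
  Family = c("Fam1", "Fam2"), Pct_feed = c(55.4, 12.56),
  Code2 = 0L, SpeciesG2 = "0", Fam1 = 0L, Fam2 = 0L
)

# Keep only the four columns that actually hold data
df_clean <- df_q[, c("Code1", "SpeciesG1", "Family", "Pct_feed")]
```

Dropping them first presumably matters because `pivot_wider()` would otherwise have to create Fam columns whose names already exist in the input.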
