library(tidyverse)
set.seed(12345)
dat <- data.frame(year = c(rep(1990, 100), rep(1991, 100), rep(1992, 100)),
                  fish_length = sample(x = seq(from = 10, 131, by = 0.1), 300, replace = F),
                  nb_caught = sample(x = seq(from = 1, 200, by = 0.1), 300, replace = T), 
                  stringsAsFactors = F) %>% 
       mutate(age = ifelse(fish_length < 20, 1,
                           ifelse(fish_length >= 20 & fish_length < 100, 2, 
                                  ifelse(fish_length >= 100 & fish_length < 130, 3, 4)))) %>% 
       arrange(year, fish_length) 
head(dat)
  year fish_length nb_caught age
1 1990        10.1      45.2   1
2 1990        10.7     170.0   1
3 1990        10.9      62.0   1
4 1990        12.1     136.0   1
5 1990        14.1      80.8   1
6 1990        15.0     188.9   1
dat %>% group_by(year) %>% summarise(ages = n_distinct(age)) # Only 1992 has age 4 fish
# A tibble: 3 x 2
   year  ages
  <dbl> <int>
1  1990     3
2  1991     3
3  1992     4
dat %>% filter(age == 4) # only 1 row for age 4
  year fish_length nb_caught age
1 1992       130.8      89.2   4

此处：

年份=采样年份
fish_length=鱼的长度，单位为厘米
nb_caut=使用年龄长度键后捕获的鱼的数量，从而解释小数的存在
年龄=鱼的年龄

`graph1`:geom_小提琴不使用重量美学

在这里，我可以根据nb_caut中的值复制dat的每一行。

dim(dat) # 300 rows
dat_graph1 <- dat[rep(1:nrow(dat), floor(dat$nb_caught)), ]
dim(dat_graph1) # 30932 rows
dat_graph1$nb_caught <- NULL # useless now
sum(dat$nb_caught) - nrow(dat_graph1) # 128.2 rows lost here

由于我有nb_cauct的十进制值，所以我使用整数值来创建dat_graph1。在这个过程中，我丢了128.2行。

现在来看图表：

dat_tile <- data.frame(year = sort(unique(dat$year))[sort(unique(dat$year)) %% 2 == 0]) 
# for the figure's background
graph1 <- ggplot(data = dat_graph1, 
                 aes(x = as.factor(year), y = fish_length, fill = as.factor(age), 
                     color = as.factor(age), .drop = F)) +
          geom_tile(data = dat_tile, aes(x = factor(year), y = 1, height = Inf, width = 1),
                    fill = "grey80", inherit.aes = F) +
          geom_violin(draw_quantiles = c(0.05, 0.5, 0.95), color = "black",
                      scale = "width", position = "dodge") + 
          scale_x_discrete(expand = c(0,0)) +
          labs(x = "Year", y = "Fish length", fill = "Age", color = "Age", title = "graph1") +
          scale_fill_brewer(palette = "Paired", drop = F) + # drop = F for not losing levels
          scale_color_brewer(palette = "Paired", drop = F) + # drop = F for not losing levels
          scale_y_continuous(expand = expand_scale(mult = 0.01)) +
          theme_bw()
graph1

图1

请注意，我有一个适用于年龄1992年4岁的扁平条。

dat_graph1 %>% filter(year == 1992, age == 4) %>% pull(fish_length) %>% unique
[1] 130.8

这是因为我只有一个特定年龄组合的长度。

`graph2`:geom_小提琴使用重量美学

现在，我们不使用number_caught的值来复制dat的每一行，而是使用权重美学。

让我们计算dat的每条线在计算每个年龄组合的密度曲线时所具有的权重wt。

dat_graph2 <- dat %>%
              group_by(year, age) %>%
              mutate(wt = nb_caught / sum(nb_caught)) %>%
              as.data.frame()
head(dat_graph2)
  year fish_length nb_caught age         wt
1 1990        10.1      45.2   1 0.03573123
2 1990        10.7     170.0   1 0.13438735
3 1990        10.9      62.0   1 0.04901186
4 1990        12.1     136.0   1 0.10750988
5 1990        14.1      80.8   1 0.06387352
6 1990        15.0     188.9   1 0.14932806
graph2 <- ggplot(data = dat_graph2, 
                 aes(x = as.factor(year), y = fish_length, fill = as.factor(age), 
                     color = as.factor(age), .drop = F)) +
          geom_tile(data = dat_tile, aes(x = factor(year), y = 1, height = Inf, width = 1),
                    fill = "grey80", inherit.aes = F) +
          geom_violin(aes(weight = wt), draw_quantiles = c(0.05, 0.5, 0.95), color = "black",
                      scale = "width", position = "dodge") + 
          scale_x_discrete(expand = c(0,0)) +
          labs(x = "Year", y = "Fish length", fill = "Age", color = "Age", title = "graph2") +
          scale_fill_brewer(palette = "Paired", drop = F) + # drop = F for not losing levels
          scale_color_brewer(palette = "Paired", drop = F) + # drop = F for not losing levels
          scale_y_continuous(expand = expand_scale(mult = 0.01)) + 
          theme_bw()
graph2
dat_graph2 %>% filter(year == 1992, age == 4)
  year fish_length nb_caught age wt
1 1992       130.8      89.2   4  1

图2

请注意，图1中1992年年4岁年龄的扁条已被删除，尽管该线存在于dat_graph2中。

我的问题

为什么使用重量美学时1992年的4岁水平下降？我该如何克服这一点
为什么这两张图虽然使用了相同的数据，但在视觉上却不一样

提前感谢您的帮助！

1。

问题1与使用重量美学无关。您可以通过在第二个图形的代码中删除权重美学来检查这一点。问题是，当观测值太少时，计算密度的算法会失败。

这就是为什么第4组在图1中显示为扩展数据集(grpah 1(的原因。在这里，你可以通过复制obs的数量来增加观测的数量

不幸的是，geom_violin在您的具体情况下没有给出任何警告。但是，如果您为age == 4过滤dat_graph2，则geom_violin会向您发出警告

Warning message:
Computation failed in `stat_ydensity()`:
replacement has 1 row, data has 0

geom_density在这个问题上要清楚得多，并发出警告，即拥有少于两个obs的组已被删除。

不幸的是，除了使用扩展的数据集之外，我没有解决方案来克服这一问题。

关于问题2，我没有令人信服的答案，只是我猜测这与geom_violent使用的核密度估计器的细节有关，geom_density。。。并且可能在某种程度上也与数据点的数量有关。

geom_小提琴使用重量美学出乎意料地下降水平

`graph1`:geom_小提琴不使用重量美学

`graph2`:geom_小提琴使用重量美学

我的问题

相关内容

最新更新

热门标签：

geom_小提琴使用重量美学出乎意料地下降水平

graph1:geom_小提琴不使用重量美学

graph2:geom_小提琴使用重量美学

我的问题

相关内容

最新更新

热门标签：

`graph1`:geom_小提琴不使用重量美学

`graph2`:geom_小提琴使用重量美学