R - Instances of interaction between partners



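Research context: speakers (authors) and recipients communicate in writing about a topic. The first speaker is the person who made the original post.

The data look like this: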

df <- structure(list(topic = c(1, 1, 1, 1, 1, 1, 2, 2), thread = c(1, 
1, 1, 2, 2, 2, 3, 3), speaker_id = c(111, 111, 111, 222, 222, 
222, 111, 222), recipient_id = c(222, 333, 444, 111, 555, 444, 
222, 111), dyad = structure(c(1L, 2L, 3L, 1L, 5L, 4L, 1L, 1L), .Label = c("111_222", 
"111_333", "111_444", "222_444", "222_555"), class = "factor")), class = "data.frame", row.names = c(NA, 
-8L), codepage = 65001L)
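
The dyad column in the sample appears to be the two ids ordered from smaller to larger and joined with an underscore. The following is a minimal sketch of how such a key could be derived, assuming that reading is right; the name dyad_key is made up for illustration and does not come from the original data.

```r
# Hypothetical reconstruction of the dyad key from the two id columns.
speaker_id   <- c(111, 111, 111, 222, 222, 222, 111, 222)
recipient_id <- c(222, 333, 444, 111, 555, 444, 222, 111)

# pmin()/pmax() work element-wise, so each pair is ordered before pasting.
dyad_key <- paste(pmin(speaker_id, recipient_id),
                  pmax(speaker_id, recipient_id),
                  sep = "_")
dyad_key
```

Because the pair is ordered first, 111 writing to 222 and 222 writing to 111 get the same key.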

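The goal is to create two variables:

  1. threads_partnered: within a discussion topic, the number of threads in which the speaker and the recipient partnered (i.e., formed a dyad, or interacted directly)
  2. threads_present: within a discussion topic, and excluding the given thread, the number of threads in which the speaker and the recipient were both present as recipients without partnering (forming a dyad)

Based on the sample data, the result would be: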

╔═══════╦════════╦═════════╦═══════════╦═════════╦═══════════╦══════════════════════════════════════════╦═════════╦════════════════════════════════════════════╗
║ topic ║ thread ║ speaker ║ recipient ║   dyad  ║  threads  ║                   note                   ║ threads ║                    note                    ║
║       ║        ║    id   ║     id    ║         ║ partnered ║                                          ║ present ║                                            ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║  1.00 ║    1   ║   111   ║    222    ║ 111_222 ║     2     ║ 111 and 222 interacted (made a dyad)     ║    0    ║ Outside the given thread (thread #1) of    ║
║       ║        ║         ║           ║         ║           ║ in two different threads (thread #1, #2) ║         ║ the given topic (topic #1), 111 and 222    ║
║       ║        ║         ║           ║         ║           ║ within topic 1                           ║         ║ are not found together as recipients       ║
║       ║        ║         ║           ║         ║           ║                                          ║         ║ other than being in a dyad                 ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║  1.00 ║    1   ║   111   ║    333    ║ 111_333 ║     1     ║ 111 and 333 interacted in                ║    0    ║                                            ║
║       ║        ║         ║           ║         ║           ║ one thread (thread #1)                   ║         ║                                            ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║  1.00 ║    1   ║   111   ║    444    ║ 111_444 ║     1     ║ 111 and 444 interacted in                ║    1    ║ 111 and 444 are found in thread #2,        ║
║       ║        ║         ║           ║         ║           ║ one thread (thread #1)                   ║         ║ where they did not interact (made a dyad), ║
║       ║        ║         ║           ║         ║           ║                                          ║         ║ but were only recipients of                ║
║       ║        ║         ║           ║         ║           ║                                          ║         ║ the original speaker (222)                 ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║  1.00 ║    2   ║   222   ║    111    ║ 111_222 ║     2     ║ 111 and 222 interacted in two different  ║    0    ║                                            ║
║       ║        ║         ║           ║         ║           ║ threads within topic 1                   ║         ║                                            ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║  1.00 ║    2   ║   222   ║    555    ║ 222_555 ║     1     ║ 222 and 555 interacted in one thread     ║    0    ║                                            ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║  1.00 ║    2   ║   222   ║    444    ║ 222_444 ║     1     ║ 222 and 444 interacted in one thread     ║    1    ║ 222 and 444 are found together             ║
║       ║        ║         ║           ║         ║           ║                                          ║         ║ in thread #1, where they did not           ║
║       ║        ║         ║           ║         ║           ║                                          ║         ║ interact                                   ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║  2.00 ║    3   ║   111   ║    222    ║ 111_222 ║     1     ║ 111 and 222 interacted in one thread     ║    0    ║                                            ║
║       ║        ║         ║           ║         ║           ║ (thread 3) within topic 2                ║         ║                                            ║
╠═══════╬════════╬═════════╬═══════════╬═════════╬═══════════╬══════════════════════════════════════════╬═════════╬════════════════════════════════════════════╣
║  2.00 ║    3   ║   222   ║    111    ║ 111_222 ║     1     ║ same as above                            ║    0    ║                                            ║
╚═══════╩════════╩═════════╩═══════════╩═════════╩═══════════╩══════════════════════════════════════════╩═════════╩════════════════════════════════════════════╝

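I'm not entirely sure this meets your needs, but it may help in some respects.

I created a custom function that takes the speaker, recipient, thread, and topic and determines threads_present based on your description. It looks at the other threads within the same topic and drops any rows in which the speaker and recipient form a dyad. A thread then qualifies only if both the speaker and the recipient appear as recipients in its remaining rows. The qualifying threads are counted.

The second variable, threads_partnered, is more straightforward and is described in the comments. After group_by on topic and dyad, n_distinct gives the number of unique threads.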
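
The grouping step for threads_partnered can be tried in isolation. This is only a sketch on a trimmed copy of the sample data; the names df_demo and partnered are made up for illustration.

```r
library(dplyr)

# Trimmed copy of the sample data: only the columns this step needs.
df_demo <- data.frame(
  topic  = c(1, 1, 1, 1, 1, 1, 2, 2),
  thread = c(1, 1, 1, 2, 2, 2, 3, 3),
  dyad   = c("111_222", "111_333", "111_444", "111_222",
             "222_555", "222_444", "111_222", "111_222")
)

# Count the distinct threads in which each dyad occurs within a topic.
partnered <- df_demo %>%
  group_by(topic, dyad) %>%
  mutate(threads_partnered = n_distinct(thread)) %>%
  ungroup()

partnered$threads_partnered
```

Using mutate rather than summarise keeps one row per message, so the count can sit next to the original columns.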

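library(tidyr)
library(dplyr)
library(purrr)

# Assumes the sample data shown in the question is stored in `df`.
# For one row, count the other threads in the same topic where the speaker
# and the recipient both appear as recipients without forming a dyad.
my_fun <- function(the_speaker, the_recipient, the_thread, the_topic) {
  df %>%
    filter(
      topic == the_topic,    # same topic
      thread != the_thread,  # every thread except the given one
      # drop rows in which the two interact directly
      dyad != paste(min(the_speaker, the_recipient), max(the_speaker, the_recipient), sep = "_")) %>%
    group_by(thread) %>%
    # keep threads where both ids occur among the recipients
    filter(all(c(the_speaker, the_recipient) %in% recipient_id)) %>%
    ungroup() %>%
    distinct(thread) %>%
    count(name = "threads_present")
}

df %>%
  mutate(threads_present = pmap(
    list(the_speaker = speaker_id, the_recipient = recipient_id, the_thread = thread, the_topic = topic),
    my_fun)
  ) %>%
  unnest(cols = threads_present) %>%
  group_by(topic, dyad) %>%
  # number of distinct threads per topic in which the dyad occurs
  mutate(threads_partnered = n_distinct(thread))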

Output

  topic thread speaker_id recipient_id dyad    threads_present threads_partnered
  <dbl>  <dbl>      <dbl>        <dbl> <fct>             <int>             <int>
1     1      1        111          222 111_222               0                 2
2     1      1        111          333 111_333               0                 1
3     1      1        111          444 111_444               1                 1
4     1      2        222          111 111_222               0                 2
5     1      2        222          555 222_555               0                 1
6     1      2        222          444 222_444               1                 1
7     2      3        111          222 111_222               0                 1
8     2      3        222          111 111_222               0                 1
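
As a cross-check, the threads_present column can also be recomputed without purrr or tidyr. The following is a base R sketch under the same reading of the rules, not part of the original answer; df_check and count_present are names made up for illustration.

```r
# Copy of the sample data with the columns this check needs.
df_check <- data.frame(
  topic        = c(1, 1, 1, 1, 1, 1, 2, 2),
  thread       = c(1, 1, 1, 2, 2, 2, 3, 3),
  speaker_id   = c(111, 111, 111, 222, 222, 222, 111, 222),
  recipient_id = c(222, 333, 444, 111, 555, 444, 222, 111)
)

# Hypothetical helper: threads_present for row i of `data`.
count_present <- function(i, data) {
  pair <- c(data$speaker_id[i], data$recipient_id[i])
  # Rows from the other threads of the same topic.
  other <- data[data$topic == data$topic[i] & data$thread != data$thread[i], ]
  # Drop rows where the pair interacts directly (in either direction).
  is_dyad <- other$speaker_id %in% pair & other$recipient_id %in% pair
  other <- other[!is_dyad, ]
  # Count threads in which both members appear among the recipients.
  both <- vapply(split(other$recipient_id, other$thread),
                 function(r) all(pair %in% r), logical(1))
  sum(both)
}

threads_present <- vapply(seq_len(nrow(df_check)), count_present,
                          integer(1), data = df_check)
threads_present
```

On the sample data this gives the same threads_present values as the output above: 1 for the 111_444 and 222_444 rows and 0 elsewhere.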
