R - Find the most similar items by Euclidean distance and cosine similarity



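How can I find similar items in R? The similarity measures I care most about are cosine similarity and KNN. I think the key part is getting the data into a usable shape.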

For example, using the built-in mtcars dataset, I want to find the most similar items.

    library(tidyverse)
    mtcars$item = rownames(mtcars)
    mtcars = mtcars %>% select(item, mpg, hp, qsec) # use these 3 fields to find similar items
    # help <here>
    # Desired format: the <N> most similar items in <N> columns, ordered by similarity,
    # together with the score (weighting) of each of these items:
    # mtcars$similar_1       = <most similar item>
    # mtcars$similar_1_score = <its score, e.g. 0.8>
    # ...
    # mtcars$similar_5       = <5th most similar item>
    # mtcars$similar_5_score = <score associated with it>

I would like to be able to do this with a KNN approach based on Euclidean distance, and then again separately with cosine scores.

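Here is one possible solution, using the dist() function to compute Euclidean distances. First, compute the distances between all items, then get the ordering of all items relative to each item. From that ordering, pick the i-th entry, look up its score and item label for every item, put them in a data frame, and bind that to the original data frame.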

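    library(tidyverse)  # provides %>% and select()
    mtcars$item = rownames(mtcars)
    # Use only the first 10 cars to keep the example small
    data <- (mtcars %>% select(item, mpg, hp, qsec))[1:10,]
    # Pairwise Euclidean distances on the numeric columns (drop the item column)
    euc_dist <- as.matrix(dist(data[,-1]))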
    # Get the label of the i-th most similar car for the car in column `col`
    ith_item <- function(col, euc_dist, top_i) {
      labels(euc_dist)[[1]][top_i[col]]
    }
    # Get the distance to the i-th most similar car for the car in column `col`
    ith_score <- function(col, euc_dist, top_i) {
      euc_dist[top_i[col], col]
    }
    # Create a data frame with the i-th most similar item for every item
    ith_similar <- function(euc_dist, i) {
      # Rank all items by distance within each column; rank 1 is the item itself (distance 0)
      orders <- apply(euc_dist, 2, order)
      top_i <- orders[i + 1, ]
      top_i_score <- sapply(1:ncol(euc_dist), ith_score, euc_dist, top_i)
      top_i_items <- sapply(1:ncol(euc_dist), ith_item, euc_dist, top_i)
      similarities <- data.frame(placeholder1 = top_i_score,
                                 placeholder2 = top_i_items)
      col_names <- c(paste0("similar_", i, "_score"), paste0("similar_", i))
      names(similarities) <- col_names
      similarities
    }
    # For example, the top 2 most similar items
    n <- 2
    for(i in 1:n) {
      tmp_similarities <- ith_similar(euc_dist, i)
      data <- cbind(data, tmp_similarities)
    }
    data

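This gives the following output (the scores are Euclidean distances, so smaller values mean more similar items):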

                           item  mpg  hp  qsec similar_1_score         similar_1 similar_2_score      similar_2
Mazda RX4                 Mazda RX4 21.0 110 16.46        0.560000     Mazda RX4 Wag        3.006726 Hornet 4 Drive
Mazda RX4 Wag         Mazda RX4 Wag 21.0 110 17.02        0.560000         Mazda RX4        2.452835 Hornet 4 Drive
Datsun 710               Datsun 710 22.8  93 18.61        4.733297          Merc 230       12.987767        Valiant
Hornet 4 Drive       Hornet 4 Drive 21.4 110 19.44        2.452835     Mazda RX4 Wag        3.006726      Mazda RX4
Hornet Sportabout Hornet Sportabout 18.7 175 17.02       52.018155          Merc 280       65.040680  Mazda RX4 Wag
Valiant                     Valiant 18.1 105 20.22        6.041391    Hornet 4 Drive        6.606815  Mazda RX4 Wag
Duster 360               Duster 360 14.3 245 15.84       70.148075 Hornet Sportabout      122.123141       Merc 280
Merc 240D                 Merc 240D 24.4  62 20.00       31.072369        Datsun 710       33.165796       Merc 230
Merc 230                   Merc 230 22.8  95 22.90        4.733297        Datsun 710       11.369802        Valiant
Merc 280                   Merc 280 19.2 123 18.30       13.186296     Mazda RX4 Wag       13.234032 Hornet 4 Drive
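
The solution above covers only the Euclidean case. For the cosine part of the question, a minimal base-R sketch might look like the following. The helper names `cosine_sim` and `top_n_similar` are hypothetical (they do not come from any package), and standardizing the columns with `scale()` first is an assumption; drop that call to get cosine similarity on the raw values. Unlike a distance, a higher cosine score means a more similar item.

```r
# Cosine similarity between all pairs of rows of a numeric matrix (base R only).
# `cosine_sim` is a hypothetical helper name, not a package function.
cosine_sim <- function(x) {
  x <- as.matrix(x)
  norms <- sqrt(rowSums(x^2))          # Euclidean norm of each row
  (x %*% t(x)) / (norms %o% norms)     # dot products divided by norm products
}

# Hypothetical helper: for each item, return the n most similar other items
# and their scores in wide format (similar_1, similar_1_score, ...).
top_n_similar <- function(sim, n) {
  diag(sim) <- -Inf                    # exclude each item from its own ranking
  out <- data.frame(item = rownames(sim), stringsAsFactors = FALSE)
  for (i in seq_len(n)) {
    # Column index of the i-th largest similarity in each row
    idx <- apply(sim, 1, function(r) order(r, decreasing = TRUE)[i])
    out[[paste0("similar_", i)]] <- rownames(sim)[idx]
    out[[paste0("similar_", i, "_score")]] <- sim[cbind(seq_len(nrow(sim)), idx)]
  }
  out
}

# Assumption: standardize the three fields so that hp does not dominate
feats <- scale(mtcars[1:10, c("mpg", "hp", "qsec")])
sim <- cosine_sim(feats)
result <- top_n_similar(sim, 2)
result
```

Changing the second argument of `top_n_similar` to 5 would produce the `similar_1` through `similar_5` columns asked for in the question, and `merge(data, result, by = "item")` is one way to attach them to the original data frame.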
