Creating a scatter plot between two lists for each class



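I read a CSV file and created the two lists below. For each class, I would like to create a scatter plot of values2 against values3, so that I can see the correlation between values2 and values3 within each class. Is this possible?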

List1=vehiclesData.groupby('Class')['values2'].apply(list)
List2=vehiclesData.groupby('Class')['values3'].apply(list)

If I print List1, it looks like this:

Compact Cars                          [2.2, 1.8, 1.8...
Large Cars                            [3.8, 3.8, 3.8...
Midsize Cars                          [2.8, 2.8, 4.0...
Midsize Station Wagons                [3.0, 3.0, 2.3...

If I print List2, it looks like this:

Compact Cars                          [19, 22, 25...
Large Cars                            [20, 18, 20...
Midsize Cars                          [19, 19, 16...
Midsize Station Wagons                [19, 18, 21...

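There are three popular correlation measures: Pearson, Spearman, and Kendall Tau. Pearson measures linear association, while Spearman and Kendall Tau are rank-based and capture any monotonic relationship. A scatter plot of the two variables also helps in understanding the correlation visually. The code below computes all three correlations for each class and creates a class-wise scatter plot of values2 against values3.

Code: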
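As a quick illustration of how the three measures can disagree, here is a minimal sketch on synthetic data. The arrays x and y are made up for this example and are not taken from the vehicles dataset. Because y grows monotonically but not linearly with x, the rank-based measures report a perfect association while Pearson does not.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

# Synthetic data (an assumption for illustration, not the vehicles dataset):
# y increases monotonically with x, but the relationship is not linear.
x = np.arange(1, 11, dtype=float)
y = np.exp(x)

# Each function returns the statistic and a p-value; only the statistic is kept
pearson_r, _ = pearsonr(x, y)
spearman_r, _ = spearmanr(x, y)
kendall_t, _ = kendalltau(x, y)

print(f"Pearson:  {pearson_r:.3f}")   # below 1, since the relation is not linear
print(f"Spearman: {spearman_r:.3f}")  # 1.000, the ranks agree perfectly
print(f"Kendall:  {kendall_t:.3f}")   # 1.000, every pair is concordant
```

If the scatter plot for a class looks curved rather than straight, the Spearman or Kendall Tau value is probably the more informative one to read.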

import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import pearsonr, spearmanr, kendalltau

df = ...  # pandas data frame containing the original data

# unique classes, sorted so that the plot order is reproducible
classes = sorted(df["Class"].unique())

# creating space for making plots; squeeze=False keeps ax two-dimensional
# even when there is only one class
fig, ax = plt.subplots(ncols=len(classes), figsize=(len(classes) * 5, 4.5),
                       dpi=120, squeeze=False)

# rows of the table that holds class-wise correlation values
rows = []

# populate class-wise correlation values and create class-wise scatter plots
for i, c in enumerate(classes):
    # filter dataset by class
    sub_df = df[df["Class"] == c]
    x, y = sub_df["values2"], sub_df["values3"]
    # Pearson correlation
    p_c, _ = pearsonr(x, y)
    # Spearman correlation
    s_c, _ = spearmanr(x, y)
    # Kendall Tau correlation
    k_c, _ = kendalltau(x, y)
    # populate correlation values
    # (DataFrame.append was removed in pandas 2.0, so rows are collected in a list)
    rows.append({"Class": c,
                 "Pearson correlation": p_c,
                 "Spearman correlation": s_c,
                 "Kendall Tau correlation": k_c})

    # Create scatter plot
    ax[0, i].scatter(x, y)
    ax[0, i].set_title(f"Class: {c}")
    ax[0, i].set_xlabel("values2")
    ax[0, i].set_ylabel("values3")

# build the correlation table once, after the loop
correlation_table = pd.DataFrame(rows, columns=["Class", "Pearson correlation",
                                                "Spearman correlation",
                                                "Kendall Tau correlation"])
print(correlation_table)
fig.tight_layout()
plt.show()  # blocks until the window is closed when run as a script
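If you would rather plot straight from the two grouped Series you already have (List1 and List2), something along these lines should also work. This is only a sketch: scatter_per_class is a hypothetical helper name, and the small vehiclesData frame is a stand-in that reuses the first few values printed in the question, since the full CSV is not shown.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small stand-in for vehiclesData (illustration only; the real CSV is not shown)
vehiclesData = pd.DataFrame({
    "Class": ["Compact Cars"] * 3 + ["Large Cars"] * 3,
    "values2": [2.2, 1.8, 1.8, 3.8, 3.8, 3.8],
    "values3": [19, 22, 25, 20, 18, 20],
})

# The two lists from the question: one list of values per class
List1 = vehiclesData.groupby("Class")["values2"].apply(list)
List2 = vehiclesData.groupby("Class")["values3"].apply(list)


def scatter_per_class(xs, ys):
    """Draw one scatter plot per class from two Series of lists (hypothetical helper)."""
    # squeeze=False keeps axes two-dimensional even for a single class
    fig, axes = plt.subplots(ncols=len(xs), figsize=(5 * len(xs), 4.5),
                             squeeze=False)
    # Both Series come from the same groupby, so they share the same class index
    for ax, cls in zip(axes[0], xs.index):
        ax.scatter(xs[cls], ys[cls])
        ax.set_title(f"Class: {cls}")
        ax.set_xlabel("values2")
        ax.set_ylabel("values3")
    fig.tight_layout()
    return fig


fig = scatter_per_class(List1, List2)
```

Call plt.show() or fig.savefig(...) afterwards to view the result. With many classes a single row of plots gets very wide, so you may want to arrange the axes in a grid instead.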