Weka API: How to obtain a joint probability, e.g. Pr(A=x, B=y), from a BayesNet object

I am using the Weka Java API. I trained a BayesNet on an Instances object (a data set) for which no class (label) was specified.

/**
* Initialization
*/
Instances data = ...;
BayesNet bn = new EditableBayesNet(data);
SearchAlgorithm learner = new TAN();
SimpleEstimator estimator = new SimpleEstimator();
/**
* Training
*/
bn.initStructure();
learner.buildStructure(bn, data);
estimator.estimateCPTs(bn);

Suppose the Instances object data has three attributes, A, B and C, and that the discovered dependencies are B->A and C->B.

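The trained BayesNet object bn is not used for classification (I did not specify a class attribute for data); I only want to compute the joint probability Pr(A=x, B=y). How can I obtain this probability from bn?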

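As far as I can tell, BayesNet's distributionForInstance function is probably the closest thing available. It returns a probability distribution for a given instance (in our case, the instance (A=x, B=y)). To use it, I can create a new Instance object testDataInstance, set the values A=x and B=y, and then call distributionForInstance with testDataInstance.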

/**
* Obtain Pr(A="x", B="y")
*/ 
Instance testDataInstance = new SparseInstance(3);
Instances testDataSet = new Instances(
bn.m_Instances);
testDataSet.clear();
testDataInstance.setValue(testDataSet.attribute("A"), "x");
testDataInstance.setValue(testDataSet.attribute("B"), "y");
testDataSet.add(testDataInstance);
bn.distributionForInstance(testDataSet.firstInstance());

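However, as far as I understand, that probability distribution gives the probability of every possible value of the class attribute in the Bayes net. Since I did not specify a class attribute for data, it is not clear to me what the returned distribution means.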

The javadoc page for distributionForInstance says that it computes the class membership probabilities: http://weka.sourceforge.net/doc.dev/weka/classifiers/bayes/BayesNet.html#distributionForInstance-weka.core.Instance-

So this is probably not what you want. I think you can use getDistribution(int nTargetNode) or getDistribution(java.lang.String sName) to get your answer.

P(A=x, B=y) can be computed as follows:

P(A=x|B=y) = P(A=x, B=y)/P(B=y), which implies,
P(A=x, B=y) = P(A=x|B=y)*P(B=y)
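
With the structure from the question (C is the parent of B, and B is the parent of A), the network stores P(C), P(B|C) and P(A|B) rather than P(B) itself, so P(B=y) has to be obtained by summing C out. As a sketch, assuming that factorization holds for the learned network:

```latex
\begin{aligned}
P(A{=}x, B{=}y) &= \sum_{c} P(C{=}c)\, P(B{=}y \mid C{=}c)\, P(A{=}x \mid B{=}y) \\
                &= P(A{=}x \mid B{=}y) \sum_{c} P(B{=}y \mid C{=}c)\, P(C{=}c) \\
                &= P(A{=}x \mid B{=}y)\, P(B{=}y)
\end{aligned}
```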

Here is some pseudocode that illustrates my approach:

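double[][] AP = bn.getDistribution("A"); // P(A|B) table: AP[b][a], one row per value of the parent B
double[][] BP = bn.getDistribution("B"); // P(B|C) table: BP[c][b], one row per value of the parent C
double[][] CP = bn.getDistribution("C"); // P(C) table: C has no parents, so a single row CP[0][c]
// x and y are assumed to be the int indices of the values A=x and B=y;
// for nominal labels they can be looked up with, e.g.,
// data.attribute("A").indexOfValue("x")
// Marginalize C out: P(B=y) = sum over c of P(B=y|C=c) * P(C=c)
double BPy = 0;
for (int c = 0; c < BP.length; c++) {
    BPy += BP[c][y] * CP[0][c];
}
// BPy now holds P(B=y), and AP[y][x] is P(A=x|B=y)
System.out.println(AP[y][x] * BPy);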
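
The arithmetic can also be checked without Weka on hand-written tables. The following is a minimal sketch, assuming the chain C -> B -> A from the question and tables laid out as [parent configuration][value], which is the layout I would expect getDistribution to return. The class name JointFromCpts, the method joint and all the numbers are made up for illustration.

```java
class JointFromCpts {

    /**
     * Computes P(A=x, B=y) for the chain C -> B -> A.
     * Each table is indexed as [parent configuration][value];
     * the root node C has a single row.
     */
    static double joint(double[][] aGivenB, double[][] bGivenC,
                        double[][] c, int x, int y) {
        // Marginalize C out: P(B=y) = sum over ci of P(B=y|C=ci) * P(C=ci)
        double pB = 0.0;
        for (int ci = 0; ci < bGivenC.length; ci++) {
            pB += bGivenC[ci][y] * c[0][ci];
        }
        // Chain rule: P(A=x, B=y) = P(A=x|B=y) * P(B=y)
        return aGivenB[y][x] * pB;
    }

    public static void main(String[] args) {
        // Hypothetical tables for three binary attributes
        double[][] c = {{0.4, 0.6}};                   // P(C)
        double[][] bGivenC = {{0.9, 0.1}, {0.2, 0.8}}; // P(B|C), rows = C
        double[][] aGivenB = {{0.7, 0.3}, {0.5, 0.5}}; // P(A|B), rows = B

        // P(A=0, B=1) = 0.5 * (0.1*0.4 + 0.8*0.6), roughly 0.26
        System.out.println(joint(aGivenB, bGivenC, c, 0, 1));
    }
}
```

In real use the three arrays would come from bn.getDistribution("A"), bn.getDistribution("B") and bn.getDistribution("C"). A quick sanity check is that the values returned for all combinations of x and y add up to 1.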
