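The previous post mentioned several open-source implementations of Bayesian filtering. This post covers how to integrate one of them, called b8, into CakePHP.
For your Cake application to be able to call b8, you need to write a component. Create a new file spam_shield.php under controllers/components/ and add the following code: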
class SpamShieldComponent extends Object
{
    /**
     * b8 instance
     */
    var $b8;

    /**
     * standard rating
     *
     * comments with a rating higher than this one are considered SPAM
     */
    var $standardRating = 0.7;

    /**
     * text to be classified
     */
    var $text;

    /**
     * rating of the text
     */
    var $rating;

    /**
     * Component startup callback
     *
     * CakePHP calls this after the controller's beforeFilter() and before the action runs.
     *
     * @date 2009-1-20
     */
    function startup(&$controller)
    {
        //register a Comment model to get the DBO resource link
        $comment = ClassRegistry::init('Comment');
        //import b8 and create an instance
        App::import('Vendor', 'b8/b8');
        $this->b8 = new b8($comment->getDBOResourceLink());
        //use the configured rating if there is one, otherwise keep the default
        $configuredRating = Configure::read('LT.bayesRating');
        if ($configuredRating) {
            $this->standardRating = $configuredRating;
        }
    }

    /**
     * Set the text to be classified
     *
     * @param $text String the text to be classified
     * @date 2009-1-20
     */
    function set($text)
    {
        $this->text = $text;
    }

    /**
     * Get the Bayesian rating
     *
     * @date 2009-1-20
     */
    function rate()
    {
        //get the Bayes rating, remember it and return it
        return $this->rating = $this->b8->classify($this->text);
    }

    /**
     * Validate a message based on the rating, return true if it's NOT SPAM
     *
     * @date 2009-1-20
     */
    function validate()
    {
        //ratings higher than the standard rating are SPAM
        return $this->rate() <= $this->standardRating;
    }

    /**
     * Learn the text as SPAM or HAM
     *
     * @date 2009-1-20
     */
    function learn($mode)
    {
        $this->b8->learn($this->text, $mode);
    }

    /**
     * Unlearn the text as SPAM or HAM
     *
     * @date 2009-1-20
     */
    function unlearn($mode)
    {
        $this->b8->unlearn($this->text, $mode);
    }
}
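The threshold rule from the docblock (a rating higher than the standard rating means spam) can be tried without CakePHP or b8. The following is only a minimal sketch: `FakeClassifier`, `isHam` and the scores are hypothetical stand-ins invented for illustration, not part of b8.

```php
<?php
// Hypothetical stand-in for b8: returns a fixed score per text
// instead of computing a real Bayesian rating.
class FakeClassifier
{
    private $scores;

    public function __construct(array $scores)
    {
        $this->scores = $scores;
    }

    public function classify($text)
    {
        // Unknown text gets 0.5 here purely for illustration.
        return isset($this->scores[$text]) ? $this->scores[$text] : 0.5;
    }
}

// Applies the docblock's rule: a rating higher than the threshold is spam,
// so the text counts as ham when the rating is at or below the threshold.
function isHam($classifier, $text, $threshold = 0.7)
{
    return $classifier->classify($text) <= $threshold;
}

$classifier = new FakeClassifier(array(
    'Buy cheap pills now'     => 0.98,
    'Thanks for the write-up' => 0.05,
));

var_dump(isHam($classifier, 'Buy cheap pills now'));     // false: 0.98 is above 0.7
var_dump(isHam($classifier, 'Thanks for the write-up')); // true: 0.05 is below 0.7
```

Raising the threshold lets more borderline comments through, while lowering it sends more of them to moderation.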
A few notes:
Add the following code to models/comment.php:
/**
 * get the resource link of the MySQL connection
 */
public function getDBOResourceLink()
{
    return $this->getDataSource()->connection;
}
That completes the preparation, and we can finally use the Bayesian algorithm to classify comments.
In controllers/comments_controller.php, first load SpamShieldComponent:
var $components = array('SpamShield');
Then, in the add() method, do the following:
//set data for Bayesian validation
$this->SpamShield->set($this->data['Comment']['body']);
//validate the comment with Bayesian
if (!$this->SpamShield->validate())
{
    //set the status
    $this->data['Comment']['status'] = 'spam';
    //save
    $this->Comment->save($this->data);
    //learn it
    $this->SpamShield->learn("spam");
    //render
    $this->renderView('unmoderated');
    return;
}
//it's a normal post
$this->data['Comment']['status'] = 'published';
//save for publish
$this->Comment->save($this->data);
//learn it
$this->SpamShield->learn("ham");
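With this in place, b8 automatically classifies each comment as it arrives and learns from it, so you should be largely free of spam!
A reminder: after the first run, don't forget to change the createDB setting mentioned earlier to FALSE.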
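The component also exposes unlearn(), which the add() code never calls. One plausible use, assumed here rather than taken from the original setup, is a moderation step that reverses a wrong decision: unlearn the text under the old category, then learn it under the correct one. The sketch below runs on its own by using `FakeSpamShield`, a hypothetical stand-in that only records calls. The helper name `reclassify` is likewise made up for illustration.

```php
<?php
// Hypothetical stand-in for SpamShieldComponent: it records what it is
// asked to do instead of talking to b8.
class FakeSpamShield
{
    public $log = array();
    private $text;

    public function set($text)
    {
        $this->text = $text;
    }

    public function learn($mode)
    {
        $this->log[] = "learn:$mode:" . $this->text;
    }

    public function unlearn($mode)
    {
        $this->log[] = "unlearn:$mode:" . $this->text;
    }
}

// Move a text from one category to the other: first undo the earlier
// training, then train it under the corrected category.
function reclassify($shield, $text, $from, $to)
{
    $shield->set($text);
    $shield->unlearn($from);
    $shield->learn($to);
}

$shield = new FakeSpamShield();
// A moderator decides this comment was wrongly flagged as spam.
reclassify($shield, 'Nice post, thanks!', 'spam', 'ham');

print_r($shield->log);
```

In a real controller the same two calls would go through $this->SpamShield, and the comment's status field would be updated alongside them.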