Proposal for "Text_Bayes"

» Description
This package provides a simple PHP implementation of Bayesian text analysis for detecting spam. It is best suited to small text fragments such as blog comments.
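To illustrate the general idea of Bayesian text analysis, here is a minimal sketch of a naive Bayes spam score. It is not the Text_Bayes API: every function name (`sketch_tokenize`, `sketch_train`, `sketch_spam_probability`) is hypothetical, and the sketch assumes equal prior probabilities for spam and ham, add-one smoothing, and a simple word tokenizer.

```php
<?php
// Hypothetical sketch of naive Bayes scoring; NOT the Text_Bayes API.

// Split text into lowercase alphanumeric tokens.
function sketch_tokenize($text)
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Count how often each token occurs across a list of training texts.
function sketch_train($texts)
{
    $counts = array();
    foreach ($texts as $text) {
        foreach (sketch_tokenize($text) as $token) {
            $counts[$token] = isset($counts[$token]) ? $counts[$token] + 1 : 1;
        }
    }
    return $counts;
}

// Estimate P(spam | text), assuming equal priors and add-one smoothing.
function sketch_spam_probability($text, $spamCounts, $hamCounts)
{
    // Vocabulary size: distinct tokens seen in either class (array union by key).
    $vocabulary = max(1, count($spamCounts + $hamCounts));
    $spamTotal = array_sum($spamCounts);
    $hamTotal = array_sum($hamCounts);

    // Accumulate log odds so that many small probabilities do not underflow.
    $logOdds = 0.0;
    foreach (sketch_tokenize($text) as $token) {
        $spam = isset($spamCounts[$token]) ? $spamCounts[$token] : 0;
        $ham = isset($hamCounts[$token]) ? $hamCounts[$token] : 0;
        $logOdds += log(($spam + 1) / ($spamTotal + $vocabulary));
        $logOdds -= log(($ham + 1) / ($hamTotal + $vocabulary));
    }

    // Convert log odds back to a probability between 0 and 1.
    return 1.0 / (1.0 + exp(-$logOdds));
}

$spamCounts = sketch_train(array('buy cheap pills now', 'cheap pills online'));
$hamCounts = sketch_train(array('thanks for the great article', 'great post thanks'));
```

With this training data, a comment such as 'cheap pills' scores above 0.5 and 'great article thanks' scores below it. A text with no known tokens scores exactly 0.5, which reflects the equal-priors assumption.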

The package consists of the Text_Bayes class, which implements the Bayesian text analysis, and a generic storage driver interface (Text_Bayes_Storage). A storage driver maintains the statistical data that the analysis requires. Support for any data source can be added by implementing the generic interface. At the moment, drivers exist for pdo_mysql and pdo_sqlite (recommended).
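As a rough illustration of what implementing a storage driver could involve, the sketch below defines a driver contract and an in-memory implementation. The proposal does not list the methods of Text_Bayes_Storage, so the interface name `Sketch_Bayes_Storage`, its three methods, and the class `Sketch_Bayes_Storage_Memory` are all assumptions made for this example.

```php
<?php
// Hypothetical driver contract; the real Text_Bayes_Storage interface may differ.
interface Sketch_Bayes_Storage
{
    // Add $amount occurrences of $token to the given text class (e.g. 'spam').
    public function increment($class, $token, $amount = 1);

    // Return how often $token has been seen in the given text class.
    public function getCount($class, $token);

    // Return the total number of tokens seen in the given text class.
    public function getTotal($class);
}

// In-memory driver for experiments; all data is lost when the script ends.
class Sketch_Bayes_Storage_Memory implements Sketch_Bayes_Storage
{
    // Nested array: text class => token => count.
    private $counts = array();

    public function increment($class, $token, $amount = 1)
    {
        if (!isset($this->counts[$class][$token])) {
            $this->counts[$class][$token] = 0;
        }
        $this->counts[$class][$token] += $amount;
    }

    public function getCount($class, $token)
    {
        return isset($this->counts[$class][$token]) ? $this->counts[$class][$token] : 0;
    }

    public function getTotal($class)
    {
        return isset($this->counts[$class]) ? array_sum($this->counts[$class]) : 0;
    }
}

$storage = new Sketch_Bayes_Storage_Memory();
$storage->increment('spam', 'pills');
$storage->increment('spam', 'pills');
$storage->increment('ham', 'thanks');
```

A database-backed driver would presumably implement the same methods with queries in place of array access, which keeps the analysis code independent of where the counts are stored.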

An example with configuration/usage instructions is included.

Improvements, especially to the text analysis, and additional storage drivers are welcome.

Requires PHP >= 5.1.0, because the package uses the class constants of PDO.
» Timeline
  • First Draft: 2006-07-28
  • Proposal: 2006-07-29
  • Call for Votes: 2006-08-14
» Changelog
  • Andreas Ahlenstorf
    [2006-08-06 19:59 UTC]

    As requested, I added

    - support for multiple instances
    - support for multiple text classes (ham/spam, love letters/hate mail etc.)

    I kept the simple good/bad scheme, because

    - it keeps things simple
    - almost everything can be extrapolated from good/bad decisions with a little bit of coding
    - my knowledge of statistics isn't good enough to create spectacular calculations