  Categorizer::Learner::SVM, scores of categories?         

Author: Jiuan-Ru Jennifer Lai
Date: Mar 18, 2008 11:53


I used the Categorizer::SVM library for large data classification (great
tool); however, I'm having trouble analyzing the results from the SVM:

- Categories have scores of either 0 or 1, with 1 meaning that the document
belongs to the category and 0 that it does not. Are there any scores
representing the probability or confidence level of belonging to a certain
category, other than these 0/1 values?
- Suppose a document could belong to 3 possible categories: cat1, cat2,
and cat3. The best_category method simply picks the first category as the
classification decision. If you call $hypothesis->categories, the
categories returned don't seem to be ordered by probability or
confidence level. They seem to be in a fixed order, and whichever is listed
first is favored.

I hope someone can clear up my confusion about the category scores in the
SVM module.

Thank you very much in advance,
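
A minimal sketch of the ranking the question asks about, assuming per-category scores are available as a plain hash. The hash %score and its values are made up for illustration and are not real SVM output. It only shows that with 0/1 scores the order among tied categories comes from the tie-break rule, not from any confidence value.

```perl
use strict;
use warnings;

# Hypothetical scores for one document; not produced by any real learner.
my %score = (cat1 => 1, cat2 => 0, cat3 => 1);

# Sort by descending score; break ties by category name so that the
# resulting order is deterministic.
my @ranked = sort { $score{$b} <=> $score{$a} or $a cmp $b } keys %score;

print "$_\t$score{$_}\n" for @ranked;
```

With only 0 and 1 as scores, cat1 and cat3 are tied here, so whichever comes first is decided purely by the name comparison.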
  Add documents to a learner?         

Author: Ignacio J. Ortega Lopera
Date: Jun 7, 2007 00:46

Is it possible to add training data to a learner? How?

What I am trying to do is reopen a state file and add new documents to the
training set without reading the entire corpus again.

It seems that Algorithm::NaiveBayes has a "purge" parameter that helps with
this: it permits adding new instances after training.

Regards, Ignacio J. Ortega
  Re: Problems trying to predict         

Author: Ignacio J. Ortega Lopera
Date: Jun 7, 2007 00:41

Hello Ken:

Many thanks, your advice did the trick. And yes, it happened when reloading
the state from a file.
  AI::Categorizer suggestion for repackaging         

Author: Robert Barta
Date: Jun 4, 2007 09:27


This is probably more relevant to the maintainer of AI::Categorizer:

It would be a bit simpler to debianize the package if the dependency
on the Weka system were factored out into a separate Perl package.

Otherwise I have not found a problem in making it a Debian package.

  AI::Categorizer and Umlauts?         

Author: Robert Barta
Date: Jun 4, 2007 09:08


I seem to have problems with umlauts, such as in words


When a document is added with

return AI::Categorizer::Document->new(name    => $filename,
                                      content => $content);

to the collection, after loading and finishing, the feature vector
contains only fragments of these words, such as

pr => 1
sentation => 1

Setting the locale in the shell or in Perl has no effect:

use locale;

not even with turning on de_AT explicitly.


Aaaaaah, lib/AI/Categorizer/ is NOT using locale and use locale
is very, uhm, local %%-)

Patching the file does not seem to break the test cases.
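
A small sketch of how fragments like the ones above can arise, using only core Perl. It is a different route from the locale patch described in this post: it compares tokenizing raw UTF-8 bytes with tokenizing a decoded character string. The word "Präsentation" is a hypothetical input chosen for illustration, and the `\w+` tokenizer is an assumption, not necessarily what the module does.

```perl
use strict;
use warnings;
use utf8;                     # this source contains a literal umlaut
use Encode qw(encode);

my $chars = "Präsentation";            # decoded character string (hypothetical input)
my $bytes = encode('UTF-8', $chars);   # the same word as raw UTF-8 bytes

# On a byte string, \w follows ASCII rules for bytes above 127,
# so the two bytes that encode the umlaut act as separators.
my @from_bytes = $bytes =~ /(\w+)/g;

# On a character string, \w follows Unicode rules and keeps the word whole.
my @from_chars = $chars =~ /(\w+)/g;

binmode STDOUT, ':encoding(UTF-8)';
print scalar(@from_bytes), " token(s) from bytes: @from_bytes\n";
print scalar(@from_chars), " token(s) from characters: @from_chars\n";
```

Decoding the input before it reaches the tokenizer is one possible alternative to relying on `use locale`, which only affects the lexical scope in which it appears.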

  Problems trying to predict         

Author: Ignacio J. Ortega Lopera
Date: May 30, 2007 11:47

I'm getting this:

Can't locate object method "predict" via package
Frequency" at
m line 28.

when trying to get hypotheses for a new document.

Does anyone know if this is a silly mistake?

Thanks in advance

Regards, Ignacio J. Ortega
Technical manager
  package AI::Categorizer::Collection::DBI;         

Author: Ignacio J. Ortega Lopera
Date: May 30, 2007 09:48

Hello everyone:

When trying to use it, I found that the DBI collection tries to read
categories from the database. It reads a second column that seems to contain
categories, using something like [$result[1]]. My Perl knowledge is a little
poor, to say the least, but it seems to me that the code later expects this
parameter to be an array of Category objects.

I've made a small change that allows this second column to be a list of
categories separated by commas. The code is attached (it can be a diff -u
if needed); maybe it's useful to someone.

Thanks for that package, Ken, it's ... wonderful :)..

Regards, Ignacio J. Ortega
Technical Manager
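
The attached patch is not reproduced here. The following is only a sketch of the splitting step described above, assuming the second column arrives as a single comma-separated string. The row contents and the variable names are hypothetical, and turning the names into Category objects is left out.

```perl
use strict;
use warnings;

# Hypothetical row as it might come back from a database fetch:
# (document name, comma-separated category names).
my @result = ('doc42', 'sports, politics ,economy');

# Split on commas, tolerating whitespace around them, and drop empty entries.
my @categories = grep { length } split /\s*,\s*/, $result[1];

print join('|', @categories), "\n";
```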
  how to use the function of "feature selection" under AI::Categorizer         

Author: Jhoon
Date: May 25, 2007 02:58


I’d like to select the more important features using AI::Categorizer, so I
modified the code as follows:
=== FROM ===
my $k = AI::Categorizer::KnowledgeSet->new( verbose => 1 );
=== TO ===
my $k = AI::Categorizer::KnowledgeSet->new(
  verbose => 1,
  feature_selector => AI::Categorizer::FeatureSelector::DocFrequency->new(
    verbose => 1,
    features_kept => 1000,
  ),
);
=== END ===
I measured the performance while varying the value of features_kept,
but the performance is always the same. I’d appreciate it if you could tell
me how to do feature selection using AI::Categorizer.

Thank you very much in advance.
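
For reference, a self-contained sketch of what selection by document frequency means in general: count in how many documents each feature occurs and keep the most frequent ones. This is not the module's own implementation. The toy corpus, the variable names, and the cutoff are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical toy corpus: document name => list of tokens.
my %docs = (
    d1 => [qw(perl module svm)],
    d2 => [qw(perl module bayes)],
    d3 => [qw(perl weka)],
);

# Document frequency: number of documents that contain each feature
# at least once.
my %df;
for my $tokens (values %docs) {
    my %seen;
    $df{$_}++ for grep { !$seen{$_}++ } @$tokens;
}

# Rank features by descending document frequency (ties broken by name)
# and keep at most $features_kept of them.
my $features_kept = 2;    # hypothetical cutoff
my @ranked = sort { $df{$b} <=> $df{$a} or $a cmp $b } keys %df;
my $last   = $features_kept - 1;
$last = $#ranked if $last > $#ranked;
my @kept = @ranked[0 .. $last];

print "kept: @kept\n";
```

In this sketch, any cutoff at or above the number of distinct features keeps everything, so raising it further changes nothing.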

  how to do feature selection         

Author: Jianmin WU
Date: May 19, 2007 05:42

hi, buddies,

I am not sure if i am in the right place. :-)

I am new to Perl and to the Perl AI modules.

I am trying to run NaiveBayes experiments with the help of the example code
in the AI::Categorizer module.
Now I am confused about how to do feature selection.

The documentation says that KnowledgeSet::load() does feature selection and
reads the corpus at the same time. So I changed the construction of the
KnowledgeSet from
my $k = AI::Categorizer::KnowledgeSet->new( verbose => 1 );
$k->load( collection => $training );
to
my $k = AI::Categorizer::KnowledgeSet->new( verbose => 1, features_kept => 5000 );
$k->load( collection => $training );
