Text Mining Project Blog

Summary

Status report 2006.12.28.

I've told my teacher that in the long run I want to implement all 4 scenarios, but that what I do in the short term depends on what he expects. He told me to implement 2 of my choice.

For the first one, I chose spam classification, because it seemed trivial... Well... Let's see what happened.

Status report 2006.12.30.

I gave up on the SVMlight spam/not-spam classification for now (maybe the bag-of-words vector doesn't correlate with usefulness? Or I did something wrong? Or the training data is too noisy?). I thought it would be easier. Of course, I made a few attempts before giving up:
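
As an aside on the bag-of-words representation mentioned above (an illustrative sketch, not one of the attempts themselves): each message becomes a vector of word counts, written as one line per message in the `<label> <index>:<value>` form that SVMlight reads, with feature indices in ascending order. The tokenizer, the toy messages, and all function names below are assumptions made up for the example.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase and keep runs of letters (an assumed, very naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents):
    """Map each distinct token to a 1-based feature index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens, start=1)}

def to_svmlight_line(text, label, vocabulary):
    """Format one document as '<label> <index>:<count> ...', indices ascending."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocabulary)
    features = sorted((vocabulary[tok], n) for tok, n in counts.items())
    return " ".join([f"{label:+d}"] + [f"{i}:{n}" for i, n in features])

# Hypothetical toy data: +1 = spam, -1 = not spam.
training = [
    ("Buy cheap pills now, buy now", +1),
    ("Meeting notes for the text mining project", -1),
]
vocab = build_vocabulary(text for text, _ in training)
lines = [to_svmlight_line(text, label, vocab) for text, label in training]
for line in lines:
    print(line)
```

With raw counts like these, very common words dominate the vector, which is one plausible reason a classifier might struggle; weighting or stop-word removal would be the usual next thing to try.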

But seeing that the parser works, I moved on to clustering, using textgarden. It has two clustering programs: one does hierarchical binary clustering, the other a flat one. They can produce XML output, which made me very happy, and they work and are fast. So I wrote XSL stylesheets to make use of them.
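
To give an idea of what post-processing a hierarchical clustering result can look like, here is a small sketch in Python rather than XSL. The XML layout (nested `cluster` elements with `doc` leaves, `name` and `id` attributes) is a made-up schema for illustration and is not the actual textgarden output format; the function names are hypothetical too.

```python
import xml.etree.ElementTree as ET

# Assumed schema, for illustration only: nested <cluster> elements, <doc> leaves.
SAMPLE = """
<cluster name="root">
  <cluster name="left">
    <doc id="a.txt"/>
    <doc id="b.txt"/>
  </cluster>
  <cluster name="right">
    <doc id="c.txt"/>
  </cluster>
</cluster>
"""

def outline(node, depth=0):
    """Return indented text lines for a cluster and everything below it."""
    indent = "  " * depth
    if node.tag == "doc":
        return [f"{indent}- {node.get('id')}"]
    lines = [f"{indent}{node.get('name')}"]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines

def flatten(node, flat=None):
    """Collapse the tree into {cluster name: [doc ids]} for clusters holding docs."""
    flat = {} if flat is None else flat
    docs = [d.get("id") for d in node.findall("doc")]
    if docs:
        flat[node.get("name")] = docs
    for child in node.findall("cluster"):
        flatten(child, flat)
    return flat

root = ET.fromstring(SAMPLE.strip())
tree_lines = outline(root)
flat_clusters = flatten(root)
print("\n".join(tree_lines))
print(flat_clusters)
```

The recursive walk plays the same role as a recursive template in a stylesheet: one rule for a cluster node, one for a document leaf.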

Today was a very long, tiring, and not very useful day... At least I refreshed my Eclipse and XSLT knowledge.