
About Similar Topics

(This is part of the TextMiningProject)

The idea

Many WikiTopics cover similar fields but are not always linked together. I wanted an automatic way to find the "See also" topics and display them.

TextMining assumes that topics sharing common words are similar in meaning and field. So I downloaded some tools and wrote some scripts that read the contents of a wiki, calculate the "bags of words", and group the similar topics into clusters. The result is written to the _ClusteringResults topic, and the WikiTalk script in ClusteringResultsBorders displays it, as seen now in the right border (see also AdministratorsGuide).
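To make the "common words" assumption concrete, here is a minimal Python sketch. It is not the TextGarden implementation: the function names `bag_of_words` and `cosine_similarity` are made up for illustration, and cosine similarity is just one common way to compare two bags of words.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Return a score in [0, 1]; higher means more shared vocabulary."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty topic is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical topic texts, only for demonstration.
clustering = bag_of_words("wiki topic clustering")
related = bag_of_words("clustering of wiki pages")
unrelated = bag_of_words("coffee recipes")

print(cosine_similarity(clustering, related))
print(cosine_similarity(clustering, unrelated))
```

Topics that share words get a higher score than topics that share none, which is the property a clustering step can build on.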

Right now, to use it, you have to include the following line in the topic:

Borders

Implementation

  1. FwSync is used to get the actual topics (60 sec)
    1. It doesn't remove locally the topics that were deleted from the wiki, so the local repository should be fully deleted once in a while
  2. A script prepares the content (30 sec)
    1. copies the .wiki files with acceptable names (no special chars)
    2. separates the PascalCased words
    3. writes out the topic name, summary, keywords and headings several times, to give greater weight to the words found there.
  3. Txt2Bow.exe from TextGarden calculates the bag of words model (5 sec)
  4. BowKMeans.exe from TextGarden calculates the clusters, the output is an xml file (30 sec)
  5. An XSLT stylesheet creates _ClusteringResults (run by msxsl.exe)
    1. The XML file usually has character encoding problems, so it needs a workaround
    2. The topics' path should be removed, but XSLT 1.0 is not capable of this, so there is a script for this too.
  6. FwSync uploads the new _ClusteringResults
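The preparation work in step 2 and the path removal in step 5 can be sketched in Python as follows. This is only an illustration, not the original scripts: the function names, the default weight of 3, and the example path are all assumptions.

```python
import re
from pathlib import PureWindowsPath


def split_pascal_case(text):
    """Insert a space wherever a lowercase letter or digit meets a capital."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)


def prepare_topic(name, body, headings=(), weight=3):
    """Repeat the name and headings `weight` times so their words count more.

    The weight of 3 is an arbitrary assumption, not the original value.
    """
    boosted = [name, *headings] * weight
    return "\n".join(split_pascal_case(part) for part in boosted + [body])


def topic_name_from_path(path):
    """Drop the directory and the extension, leaving only the topic name."""
    return PureWindowsPath(path).stem


print(split_pascal_case("TextMiningProject"))
print(prepare_topic("SimilarTopics", "See TextMiningProject.", ["The idea"], weight=2))
print(topic_name_from_path(r"C:\repo\FlexWiki\SimilarTopics.wiki"))  # hypothetical path
```

Repeating a field is a simple way to weight it when the downstream tool only counts words, and it needs no change to the bag of words program itself.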

I'll write a script that does this as a batch job and can be run periodically, e.g. once a day.

-- SzaMa 2007.01.04.
