Summary
(This is part of the TextMiningProject.)
The idea
Many WikiTopics cover similar fields but are not always linked to each other. I wanted an automatic way to find the "See also" topics and display them.
TextMining rests on the assumption that topics with common words are similar in meaning and field. So I downloaded some tools and wrote some scripts that read the contents of a wiki, calculate the "bags of words", and group similar topics into clusters. The result is written to the _ClusteringResults topic, and the WikiTalk script in ClusteringResultsBorders displays it, as seen now in the right border (also take a look at: AdministratorsGuide).
Right now, to use it, you have to include the following line in the topic:
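To make the "common words" assumption concrete, here is a minimal Python sketch. It is not the TextGarden implementation; the function names `bag_of_words` and `cosine_similarity` are made up for illustration, and cosine similarity is just one common way to compare two bags of words.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count how often each alphabetic word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two topics that share most of their words score close to 1, and topics with no words in common score 0. A clustering tool then groups the topics whose vectors lie close together.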
Borders
Implementation
FwSync is used to get the current topics (60 sec)
It doesn't remove topics that have been deleted from the wiki, so the local repository should be fully deleted once in a while
A script prepares the content (30 sec)
copies the .wiki files with acceptable names (no special chars)
separates the PascalCased words
writes the topic name, summary, keywords and headings several times, to give greater weight to the words found there.
Txt2Bow.exe from TextGarden calculates the bag-of-words model (5 sec)
BowKMeans.exe from TextGarden calculates the clusters; the output is an XML file (30 sec)
An XSLT stylesheet creates _ClusteringResults (run by msxsl.exe)
There are usually character encoding problems with the XML file, so it needs a workaround
The topics' paths should be removed; XSLT 1.0 is not capable of this. There is a script for this too.
FwSync uploads the new _ClusteringResults
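The text-handling parts of the steps above (cleaning file names, separating PascalCased words, repeating the important fields, and removing the path from a topic's file name) could look roughly like this in Python. This is only a sketch: the original scripts are not shown on this page, so every function name here is hypothetical, and the weight of 3 is an arbitrary example value.

```python
import re
from pathlib import PureWindowsPath


def split_pascal_case(text):
    """Insert a space at each lowercase-or-digit to uppercase boundary."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)


def safe_file_name(topic_name):
    """Keep only ASCII letters, digits and underscores."""
    return re.sub(r"[^A-Za-z0-9_]", "", topic_name)


def weighted_document(name, summary, body, weight=3):
    """Repeat name and summary `weight` times ahead of the body.

    The repetition raises the word counts of those fields in the
    bag-of-words model. `weight=3` is an assumed example value.
    """
    boosted = " ".join([name, summary] * weight)
    return split_pascal_case(boosted + " " + body)


def strip_path(file_path):
    """Reduce a Windows path such as C:\\dir\\Topic.wiki to the bare topic name."""
    return PureWindowsPath(file_path).stem
```

For example, `split_pascal_case("TextMiningProject")` gives `"Text Mining Project"`, so the three words are counted separately instead of as one rare token.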
I'll write a script that does all of this as a batch job and can be run periodically, e.g. once a day.
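Such a batch job could be a small driver that runs the steps in order and stops at the first one that fails. The Python sketch below is an assumption about how it might be structured, not the planned script. The real command lines for FwSync, Txt2Bow.exe, BowKMeans.exe and msxsl.exe are not given on this page, so the example uses placeholder commands that do nothing.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each (label, argv) step in order and stop at the first failure.

    Returns the labels of the steps that completed successfully.
    """
    completed = []
    for label, argv in steps:
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(f"step '{label}' failed with exit code {result.returncode}",
                  file=sys.stderr)
            break
        completed.append(label)
    return completed


if __name__ == "__main__":
    # Placeholder command: replace each entry with the real command line.
    placeholder = [sys.executable, "-c", "pass"]
    steps = [
        ("download topics", placeholder),
        ("prepare content", placeholder),
        ("bag of words", placeholder),
        ("clustering", placeholder),
        ("transform results", placeholder),
        ("upload results", placeholder),
    ]
    print(run_pipeline(steps))
```

Stopping at the first failure keeps a broken intermediate file from being uploaded as the new _ClusteringResults. The driver itself could then be started once a day by the operating system's scheduler.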
-- SzaMa 2007.01.04.
SzaMa's project for enhancing FlexWiki with text mining algorithms
Marcell Szabó, student in computer sciences at bme.hu