Summary
(This is part of the TextMiningProject.)
The idea
Many WikiTopics cover similar fields but are not always linked together. I wanted to create an automatic solution that finds the "See also" topics and displays them.
A basic assumption of TextMining is that topics with common words are similar in meaning or field. So I downloaded some tools and wrote some scripts that read the contents of a wiki, calculate the "bags of words", and create clusters of similar topics. The result is put into the _ClusteringResults topic, and the WikiTalk script in ClusteringResultsBorders displays it, as seen on the AdministratorsGuide or TextMiningProject topics.
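To make the "common words" assumption concrete, here is a minimal Python sketch. It is not the TextGarden implementation: it builds plain word-count bags and compares them with cosine similarity, and it ranks neighbours directly instead of clustering. The topic texts and all function names are invented for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word occurrences in a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    # A Counter returns 0 for words it has not seen.
    dot = sum(count * bag_b[word] for word, count in bag_a.items())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def see_also(name, bags, top_n=2):
    """Rank the other topics by similarity to `name` (a stand-in for clustering)."""
    others = [other for other in bags if other != name]
    others.sort(key=lambda other: cosine_similarity(bags[name], bags[other]),
                reverse=True)
    return others[:top_n]


# Hypothetical topic texts, invented for this example.
topics = {
    "InstallGuide": "install the wiki server and configure the wiki",
    "ConfigGuide": "configure the wiki server settings",
    "CookieRecipe": "bake cookies with butter and sugar",
}
bags = {name: bag_of_words(text) for name, text in topics.items()}
print(see_also("InstallGuide", bags, top_n=1))
```

The two wiki-related texts share several words, so they score much higher against each other than against the recipe text, which only shares the word "and".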
Right now, to use it, you have to include the following line in the topic:
Borders
Implementation
FwSync is used to get the current topics (60 sec)
A script prepares the content (30 sec)
copies the .wiki files with acceptable names (no special chars)
separates the PascalCased words
writes the topic name, summary, keywords and headings several times, to give greater weight to the words found there.
Txt2Bow.exe from TextGarden calculates the bag-of-words model (5 sec)
BowKMeans.exe from TextGarden calculates the clusters; the output is an XML file (30 sec)
An XSLT stylesheet creates _ClusteringResults (run by msxsl.exe)
Usually there are character encoding problems with the XML file, so it needs a workaround
The topics' paths should be removed, but XSLT 1.0 has no simple built-in way to do this, so there is a script for this too.
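The preparation and path-stripping scripts are not shown in this topic. The following Python sketch only illustrates what such helpers could look like. All function names, the default weight of 3 and the sample values are assumptions made for this example, not the original scripts.

```python
import re
from pathlib import PureWindowsPath


def safe_file_name(topic_name):
    """Drop every character that is not an ASCII letter, digit or underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "", topic_name)


def split_pascal_case(text):
    """Insert a space wherever a capital follows a lowercase letter or digit."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)


def weighted_document(important_fields, body, weight=3):
    """Repeat the important fields `weight` times ahead of the body text.

    `important_fields` would hold the topic name, summary, keywords and
    headings; repeating them raises their word counts in the bag of words.
    """
    important = split_pascal_case(" ".join(important_fields))
    return "\n".join([important] * weight + [split_pascal_case(body)])


def topic_name_from_path(path):
    """Strip the directories and the extension from a Windows file path."""
    return PureWindowsPath(path).stem


print(split_pascal_case("TextMiningProject"))  # prints "Text Mining Project"
print(topic_name_from_path(r"C:\temp\wiki\TextMiningProject.wiki"))
```

PureWindowsPath is used here because the tools run on Windows; it parses backslash-separated paths correctly on any platform.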
I'll write a script that does this as a batch job and can be run periodically, e.g. once a day.
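A batch job of this kind could be as simple as running the steps in order and stopping at the first failure. The Python sketch below is only an outline under that assumption: the step labels mirror the list above, but the commands are placeholders that just print a message, because the real arguments of FwSync, Txt2Bow.exe, BowKMeans.exe and msxsl.exe are not given in this topic.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each (label, command) pair in order; stop at the first failure."""
    for label, command in steps:
        print(f"running: {label}")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on the output of a failed one.
        subprocess.run(command, check=True)


# Placeholder commands: each one starts Python and prints its own label.
# The real job would list the actual tool invocations here instead.
placeholder_steps = [
    (label, [sys.executable, "-c", f"print({label!r})"])
    for label in [
        "sync topics",
        "prepare content",
        "bag of words",
        "k-means clustering",
        "xslt to _ClusteringResults",
    ]
]

if __name__ == "__main__":
    run_pipeline(placeholder_steps)
```

A scheduler such as the Windows Task Scheduler could then start this script once a day.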
-- SzaMa 2007.01.04.
Marcell Szabó, student in computer sciences at bme.hu