- Summary of results. After implementing the plain ID3 algorithm, I experimented with
various modifications. The process of finding a legal phoneme/stress was improved in two
ways by using statistical information about the letter-to-phoneme/stress mapping
in the training set.
Adding the CHI-SQUARE test to the ID3 algorithm succeeded in reducing
the size of the trees, but did not improve performance. However, on the random
k-DNF concepts the CHI-SQUARE test was very effective.
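As an illustration of how a chi-square test can stop tree growth, the following is a minimal sketch, not the implementation used in these experiments. It assumes the test is applied to a contingency table of attribute values (rows) against classes (columns) at a candidate split; the function names and the caller-supplied critical value are hypothetical.

```python
from typing import Sequence


def chi_square_statistic(counts: Sequence[Sequence[int]]) -> float:
    """Pearson chi-square statistic for a table of observed counts.

    counts[i][j] is the number of examples with attribute value i and class j.
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    total = sum(row_totals)
    if total == 0:
        return 0.0
    stat = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            # Count expected if the attribute were irrelevant to the class.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def split_is_significant(counts: Sequence[Sequence[int]],
                         critical_value: float) -> bool:
    """Assumed stopping rule: split only if the statistic exceeds the
    critical value chosen by the caller for the table's degrees of freedom."""
    return chi_square_statistic(counts) > critical_value
```

With a 2x2 table there is one degree of freedom, for which the 5% critical value is 3.841. A split that separates the classes perfectly passes the test, while one that leaves the class proportions unchanged does not, so the node would become a leaf. This is how such a test shrinks trees.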
Learning the stresses from the phonemes instead of from the English text showed the
interesting characteristic that the overall performance improved, although the stresses got
When a separate set of trees was learned for each letter, some letters improved and others
did not, while the stress performance generally suffered. Keeping separate trees only
for the "winner letters" and learning the stresses from the phonemes allowed for further gains.
In the "multi-level-ID3" experiment the goal was to learn on a higher level in some cases.
Common letter combinations such as "er" or "ion" were extracted from the data and treated
as a single entity during the learning phase. Learning on this level was not as successful
as expected, probably because the number of training examples became too small. In fact,
it turned out that using the common letter blocks just to constrain the outcome found with
the standard letter-by-letter classification was even more successful.
A combined ID3/NN approach kept subsets of the training examples as exception lists
in some of the leaves of the tree. This method was not as successful as the regular way of
generalizing such leaves with the majority rule.
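The two ways of handling such a leaf can be contrasted in a small sketch. It rests on assumptions that the text does not confirm: that "NN" stands for nearest-neighbour lookup in the stored exception list, that examples are (letter window, label) pairs, and that Hamming distance is the similarity measure. All names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Example = Tuple[str, str]  # (letter window, label), an assumed representation


def majority_label(examples: List[Example]) -> str:
    """Majority rule: the leaf predicts its most frequent training label."""
    labels = [label for _, label in examples]
    return Counter(labels).most_common(1)[0][0]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length windows differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_exception_label(examples: List[Example], query: str) -> str:
    """Exception list: the leaf keeps its examples and returns the label
    of the stored window closest to the query (assumed Hamming distance)."""
    _, label = min(examples, key=lambda ex: hamming(ex[0], query))
    return label
```

For a leaf holding `("abc", "X")`, `("abd", "X")` and `("xyz", "Y")`, the majority rule always answers `X`, whereas the exception list answers `Y` for the query `"xyq"`, because `"xyz"` is the closest stored window.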
The postprocessing of rules extracted from the decision trees, as suggested by [Quinlan87],
enhanced performance for the boolean k-DNF concepts but was not successful on the
The selection of examples for the training set seems to have little influence on the
performance. Trees trained on ten randomly selected 1000-word training sets show only
minor deviations in their performance. However, all ten randomly selected training sets
result in much bigger trees than the training set of the thousand most common words.
Here is a summary of the achieved improvements:
                        WORDS  LETTERS  PHON  STRESS  BITS  LEAVES  DEPTH
(0) Neural Net   TEST:   12.1     67.1  77.6    79.6  96.1       -      -
(1) ID3, plain   TEST:    9.1     60.8  75.4    74.4  95.8   165.3   23.1
(2) ID3, b.g.(3) TEST:   12.9     67.2  79.9    78.0  96.2   165.3   23.1
(3) ID3          TEST:   14.3     68.2  80.5    76.5  96.2       -      -
(4) ID3          TEST:   15.8     69.3  80.3    78.6  96.2   165.3   23.1
(0) Neural Net, back-propagation [Dietterich89].
(1) Plain ID3, with the straightforward "best guess" strategy.
(2) ID3, with an improved "best guess" strategy.
(3) Combined method of learning some letters individually and
classifying the stresses from the phonemes.
(4) Learning on a higher level with common letter blocks.
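The size of the gains can be read off the table with a little arithmetic. The sketch below copies the figures for rows (0), (1) and (4) and prints the differences; treating the five leading values of each row as WORDS, LETTERS, PHON, STRESS and BITS is an assumed reading of the table header, and the variable and function names are hypothetical.

```python
# Figures copied from the summary table, rows (0), (1) and (4).
COLUMNS = ["WORDS", "LETTERS", "PHON", "STRESS", "BITS"]
neural_net = [12.1, 67.1, 77.6, 79.6, 96.1]  # (0)
id3_plain = [9.1, 60.8, 75.4, 74.4, 95.8]    # (1)
id3_blocks = [15.8, 69.3, 80.3, 78.6, 96.2]  # (4)


def differences(new, old):
    """Per-column difference, rounded to one decimal like the table."""
    return {name: round(n - o, 1) for name, n, o in zip(COLUMNS, new, old)}


gain_over_plain = differences(id3_blocks, id3_plain)
gap_to_neural_net = differences(id3_blocks, neural_net)

for name in COLUMNS:
    print(f"{name:8s} (4)-(1): {gain_over_plain[name]:+5.1f}   "
          f"(4)-(0): {gap_to_neural_net[name]:+5.1f}")
```

On this reading, row (4) improves on plain ID3 in every column and exceeds the neural net everywhere except STRESS, where it remains one point lower.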
The neural net (0) outperforms the comparable ID3 result (1). None of the ID3 versions
can learn the stresses as well as the back-propagation algorithm. It would be interesting to
see how much performance one could gain if the methods used to improve ID3 were applied
to the neural net.
One reason for the better performance of the neural nets might be the fact that they learn
all bits simultaneously.