### Abstract:

The coverage of a learning algorithm is the number of concepts that can be
learned by that algorithm from samples of a given size for given accuracy and confidence
parameters. This thesis begins by asking whether good learning algorithms
can be designed by maximizing their coverage. There are three questions raised
by this approach: (i) For given sample size and other learning parameters, what is
the largest possible coverage that any algorithm can achieve? (ii) Can we design a
learning algorithm that attains this optimal coverage? (iii) What is the coverage
of existing learning algorithms?
This thesis contributes to answering each of these questions. First, we generalize
the upper bound on coverage given in [Dietterich 89]. Next, we present two
learning algorithms and determine their coverage analytically. The coverage of the
first algorithm, Multi-Balls, is shown to be quite close to the upper bound. The
coverage of the second algorithm, Large-Ball, turns out to be even better than
Multi-Balls in many situations. Third, we considerably improve upon Dietterich's
limited experiments for estimating the coverage of existing learning algorithms.
We find that the coverage of Large-Ball exceeds the coverage of ID3 [Quinlan 86]
and FRINGE [Pagano and Haussler 90] by more than an order of magnitude in
most cases. Nevertheless, further analysis of Large-Ball shows that this algorithm
is not likely to be of any practical help. Although this algorithm learns many
concepts, these do not seem to be very interesting concepts.
These results lead us to the conclusion that coverage maximization alone does
not appear to yield practically-useful learning algorithms. The results motivate
considering the biased-coverage under which different concepts are assigned different
weight or importance based on given background assumptions.
As an example of the new setting, we consider learning situations where many
of the features present in the domain are irrelevant to the concept being learned.
These situations are often encountered in practice. For this problem, we define
and study the MIN-FEATURES bias in which hypotheses definable using a smaller
number of features involved are preferred. We prove a tight bound on the number of
examples needed for learning. Our results show that, if the MIN-FEATURES bias
is implemented, then the presence of many irrelevant features does not make the
learning problem substantially harder in terms of the needed number of examples.
The thesis also introduces and evaluates a number of algorithms that implement
or approximate the MIN-FEATURES bias.