Converting English text to speech : a machine learning approach

Bakiri, Ghulum

Graduate Thesis Or Dissertation

Citeable URL: https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/dr26z127d

Descriptions

Attribute Name	Values
Creator	Bakiri, Ghulum
Abstract	The task of mapping spelled English words into strings of phonemes and stresses ("reading aloud") has many practical applications. Several commercial systems perform this task by applying a knowledge base of expert-supplied letter-to-sound rules. This dissertation presents a set of machine learning methods for automatically constructing letter-to-sound rules by analyzing a dictionary of words and their pronunciations. Taken together, these methods provide a substantial performance improvement over the best commercial system, DECtalk from Digital Equipment Corporation. In a performance test, the learning methods were trained on a dictionary of 19,002 words. Then, human subjects were asked to compare the performance of the resulting letter-to-sound rules against the dictionary for an additional 1,000 words not used during training. In a blind procedure, the subjects rated the pronunciations of both the learned rules and the DECtalk rules according to whether they were noticeably different from the dictionary pronunciation. The error rate for the learned rules was 28.8% (288 words noticeably different), while the error rate for the DECtalk rules was 44.3% (443 words noticeably different). If, instead of using human judges, we required that the pronunciations of the letter-to-sound rules exactly match the dictionary to be counted correct, then the error rate for our learned rules is 35.2% and the error rate for DECtalk is 63.6%. Similar results were observed at the level of individual letters, phonemes, and stresses. To achieve these results, several techniques were combined. The key learning technique represents the output classes by the codewords of an error-correcting code. Boolean concept learning methods, such as the standard ID3 decision-tree algorithm, can be applied to learn the individual bits of these codewords. This converts the multiclass learning problem into a number of boolean concept learning problems.
This method is shown to be superior to several other methods: multiclass ID3, one-tree-per-class ID3, the domain-specific distributed code employed by T. Sejnowski and C. Rosenberg in their NETtalk system, and a method developed by D. Wolpert. Similar results in the domain of isolated-letter speech recognition with the backpropagation algorithm show that error-correcting output codes provide a domain-independent, algorithm-independent approach to multiclass learning problems.
Resource Type	Dissertation
Date Available	2009-06-11T21:29:21+00:00
Date Issued	1991-01-09
Degree Level	Doctoral
Degree Name	Doctor of Philosophy (Ph.D.)
Degree Field	Computer Science
Degree Grantor	Oregon State University
Commencement Year	1991
Advisor	Dietterich, Thomas G.
Academic Affiliation	Computer Science
Non-Academic Affiliation	Oregon State University. Graduate School
Subject	Machine learning Text processing (Computer science) Computational linguistics
Rights Statement	Copyright Not Evaluated
Publisher	Oregon State University
Language	English [eng]
Digitization Specifications	PDF derivative scanned at 300 ppi (256 B+W), using Capture Perfect 3.0.82, on a Canon DR-9080C. CVista PdfCompressor 4.0 was used for pdf compression and textual OCR.
Replaces	http://hdl.handle.net/1957/11818
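
The error-correcting output code idea described in the abstract can be illustrated with a minimal Python sketch. The four class names and the 7-bit codewords below are hypothetical and are not taken from the dissertation, and the boolean learners themselves (for example ID3 trees, one per bit) are left out; the sketch only shows how multiclass labels become per-bit boolean targets and how a vector of predicted bits might be decoded to the nearest codeword.

```python
# Illustrative 7-bit codewords for four hypothetical classes (not from the thesis).
# Every pair of codewords differs in 4 bits, so one wrong bit is still decodable.
CODEWORDS = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (0, 0, 0, 0, 1, 1, 1),
    "C": (0, 0, 1, 1, 0, 0, 1),
    "D": (0, 1, 0, 1, 0, 1, 0),
}


def bit_targets(labels, bit):
    """Turn multiclass labels into the boolean targets for one codeword bit."""
    return [CODEWORDS[label][bit] for label in labels]


def hamming(u, v):
    """Count the positions at which two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))


def decode(bits):
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(CODEWORDS, key=lambda label: hamming(CODEWORDS[label], bits))


# Each bit defines one boolean learning problem over the same training examples.
labels = ["A", "C", "D", "B"]
print(bit_targets(labels, 2))  # boolean targets for the learner of bit 2

# Assume the seven bit learners output C's codeword with the first bit wrong.
noisy = (1, 0, 1, 1, 0, 0, 1)
print(decode(noisy))  # nearest codeword is still C's
```

Because the codewords are far apart in Hamming distance, a mistake by one bit learner does not have to change the decoded class, which is the sense in which the code is error-correcting.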

Relationships

Parents:

This work has no parents.

In Collection:

Graduate Theses and Dissertations (GTD)

Items

Title	Date Uploaded	Visibility
Bakiri_Ghulum_1991.pdf	2017-08-14	Public

ScholarsArchive@OSU
