### Abstract:

We consider the parallelization of Monte Carlo algorithms for analyzing numerical
models of charge transport used in semiconductor device physics. Parallel
algorithms for the standard k-space Monte Carlo simulation of a three band model
of bulk GaAs on hypercube multicomputers are first presented. This Monte Carlo
model includes scattering due to polar-optical, intervalley, and acoustic phonons, as
well as electron-electron scattering. The k-space Monte Carlo program, excluding
electron-electron scattering, is then extended to simulate a semiconductor device
by the addition of the real space position of each simulated particle and the assignment
of particle charge, using a cloud in cell scheme, to solve the Poisson's equation
with particle dynamics. Techniques for effectively partitioning this device so as
to balance the computational load while minimizing the communication overhead
are discussed. Approaches for improving the efficiency of the parallel algorithm,
either by dynamically balancing of load or by employing the usual techniques for
enhancing rare events in Monte Carlo simulations are also considered. The parallel
algorithms were implemented on a 64-node NCUBE multiprocessor and test results
were generated to validate the parallel k-space, as well as the device simulation
programs. Timing measurements were also made to study the variation of speedups
as both the problem size and number of processors are varied.
The effective exploitation of the computational power of message passing
multiprocessors requires the efficient mapping of parallel programs onto processors
so as to balance the computational load while minimizing the communication overhead between processors. A lower bound for this communication volume when
mapping arbitrary task graphs onto distributed processor systems is derived. For
a K processor system this lower bound can be computed from the K (possibly)
largest eigenvalues of the adjacency matrix of the task graph and the eigenvalues
of the adjacency matrix of the processor graph. We also derive the eigenvalues of
the adjacency matrix of the processor graph for a hypercube and give test results
comparing the lower bound for the communication volume with the values given by
a heuristic algorithm for a number of task graphs.