In this thesis, we introduce a novel Explanation Neural Network (XNN) to explain the predictions made by a deep network. The XNN works by embedding a high-dimensional activation vector of a deep network layer non-linearly into a low-dimensional explanation space while retaining faithfulness i.e., the original deep learning predictions can be constructed from the few concepts extracted by our explanation network. We then visualize such concepts for humans to learn about the high-level concepts that deep learning is using to make decisions. We propose an algorithm called Sparse Reconstruction Autoencoder (SRAE) for learning the embedding to the explanation space. SRAE aims to reconstruct only parts of the original feature space while retaining faithfulness. A pull-away term is applied to SRAE to make the explanation space more orthogonal. A visualization system is then introduced for human understanding of the features in the explanation space. The proposed method is applied to explain CNN models in image classification tasks. We conducted a human study, which shows that the proposed approach outperforms a saliency map baseline, and improves human performance on a difficult classification task. Also, several novel metrics are introduced to evaluate the performance of explanations quantitatively without human involvement.
Further, we propose DeepFacto where a factorization layer similar to non-negative matrix factorization (NMF) is added to the intermediate layer of the network and showcase its capabilities in supervised feature disentangling. Jointly training an NMF decomposition with deep learning is highly non-convex and cannot be addressed by the conventional backpropagation and SGD algorithms. To address this obstacle, we also introduce a novel training scheme for training DNNs using ADMM called Stochastic Block ADMM which allows for simultaneous leaning of non-differentiable decompositions. Stochastic Block ADMM works by separating neural network variables into blocks, and utilizing auxiliary variables to connect these blocks while optimizing with stochastic gradient descent. Moreover, we provide a convergence proof for our proposed method and justify its capabilities through experiments in supervised learning and DeepFacto settings.