Incorporating Prior Domain Knowledge Into Deep Neural Networks


Nikhil Muralidhar‡, Mohammad Raihanul Islam‡, Manish Marwah, Anuj Karpatne‡, and Naren Ramakrishnan‡
Department of Computer Science, Virginia Tech, VA, USA
‡Discovery Analytics Center, Virginia Tech, USA
Micro Focus, Sunnyvale, CA, USA
Email: {nik90, raihan8, karpatne, naren}, [email protected] (equal contribution)

Abstract—In recent years, the large amount of labeled data available has helped steer research toward using minimal domain knowledge, e.g., in deep neural network research. However, in many situations, data is limited and of poor quality. Can domain knowledge be useful in such a setting? In this paper, we propose domain adapted neural networks (DANN) to explore how domain knowledge can be integrated into model training for deep networks. In particular, we incorporate loss terms for knowledge available as monotonicity constraints and approximation constraints. We evaluate our model on both synthetic data generated using the popular Bohachevsky function and a real-world dataset for predicting oxygen solubility in water. In both situations, we find that our DANN model outperforms its domain-agnostic counterpart, yielding an overall mean performance improvement of 19.5%, with worst- and best-case performance improvements of 4% and 42.7%, respectively.

Keywords—Noisy Data; Domain Knowledge; Neural Networks; Deep Learning; Limited Training Data

Figure 1: Advantages of hybrid models like domain adapted neural networks (DANN) as opposed to using purely inductive or purely domain based models.

I. INTRODUCTION

Deep learning has witnessed tremendous success in recent years in areas such as computer vision [1], natural language understanding [2], and game playing [3].
In each of these areas, considerable improvements have been made in tasks such as image recognition [4], machine translation [5], [6], and in games such as Go, where top human players have been roundly defeated [7].

A common philosophy behind these machine learning successes has been the use of end-to-end models with minimally processed input features and minimal use of domain or innate knowledge¹, so as not to introduce user bias into the system and, instead, to let the models learn mostly from data. This is in contrast to the past, where domain knowledge played a central role in engineering model features.

There is an ongoing debate [8] on how much domain knowledge is necessary for efficient learning. At one extreme is "blank slate" (or tabula rasa) learning, where no domain knowledge is assumed a priori and everything is induced from data, including model structure and hyperparameters. At the other end is the approach where everything is manually hard-wired based on domain expertise with little help from data. While researchers agree that these extremes lead to poor models, it is unclear where the sweet spot lies.

In deep learning, domain knowledge often contributes to the selection of network architecture. The most successful example of this idea is the use of convolutional neural networks for tasks involving images or videos, because images exhibit translational invariance. Similarly, recurrent neural networks are preferred for data with sequential structure. However, in these situations, large amounts of training data are available. What about cases where data may be limited or sparse² (i.e., training data that is not fully representative of the entire data distribution) and of poor quality?

¹We use the terms domain knowledge and innate knowledge interchangeably here to refer to anything not learned from data.
²Note that we use the terms limited data and sparse data interchangeably here.

In fact, while data in general has become abundant in recent years, there are several applications where sufficient and representative data is hard to come by for building machine learning models, e.g., in modeling physical processes in critical infrastructure such as power plants or nuclear reactors. There are several impediments to collecting data from such systems: 1) limited data: the data available is limited in terms

of feature coverage, since these systems typically run in an operationally optimized setting, and collecting data outside this narrow range is usually expensive or even unsafe, if possible at all; 2) expensive data: in some instances, for example manufacturing facilities, collection of data may be disruptive or require destructive measurements; 3) poor quality data: the quality of data collected from physical infrastructure systems is usually poor (e.g., missing, corrupted, or noisy data), since such systems typically contain old and legacy components.

We posit that in these situations, model performance can be significantly improved by integrating domain knowledge, which might readily be available for these physical processes in the form of physical models, constraints, dependency relationships, and knowledge of valid ranges of features. In particular, we ask:

1) When data is limited or noisy, can model performance be improved by incorporation of domain knowledge?
2) When data is expensive, can satisfactory model performance be achieved with reduced data sizes through incorporation of domain knowledge?

To address these questions, in this paper we propose DANN (domain adapted neural networks), where domain-based constraints are integrated into the training process. As shown in Fig. 1, DANN attempts to find a balance between inductive loss and domain loss. Specifically, we address the problem of incorporating monotonic relationships between process variables (monotonicity constraints [9]) as well as incorporating knowledge relating to the normal quantitative range of operation of process variables (approximation constraints [9]). We also study the change in model performance when multiple domain constraints are incorporated into the learning model.
In each case, we show that our proposed domain adapted neural network model is able to achieve significant performance improvements over domain-agnostic models.

Our main contributions are as follows:

1) We propose DANN, which augments the methodology in [10] to incorporate both monotonicity constraints and approximation constraints in the training of deep neural networks.
2) We conduct a rigorous analysis by characterizing the performance of domain based models with increasing data corruption and decreasing training data size on synthetic and real data sets.
3) Finally, we also showcase the effect of incorporating multiple domain constraints into the training process of a single learning model.

II. RELATED WORK

In recent times, with the permeation of machine learning into various physical sciences, there has been an increasing attempt to leverage the power of learning models to augment and simplify experimentation, and otherwise replace costly simulations in these fields. However, owing to the underlying complexity of the function space and the corresponding lack of representative datasets, there have been a number of attempts at incorporating existing domain knowledge about a system into a machine learning framework, or at overcoming drawbacks of existing simulation frameworks using machine learning models. In [11], the authors utilize a stacked generalization approach to incorporate domain knowledge into a logistic regression classifier for predicting 30-day hospital readmission. In [12], the authors utilize random forests for reconstructing discrepancies in a Reynolds-Averaged Navier-Stokes (RANS) system for modeling industrial fluid flows. It is a well-known problem that the predictive capabilities of RANS models exhibit large discrepancies. Wang et al. try to reconstruct these discrepancies through generalization of machine learning models in contexts where data is not available.
There have also been efforts to utilize machine learning techniques to quantify and reduce model-form uncertainty in decisions made by physics-driven simulation models. In [13], [14], the authors achieve this goal using a Bayesian network modeling approach incorporating physics-based priors. From a Bayesian perspective, our approach of integrating domain knowledge into the loss function is equivalent to adding it as a prior.

In addition to incorporating domain knowledge, there have also been attempts to develop models that are capable of performing more fundamental operations, like sequential number counting and other related tasks that require the system to generalize beyond the data presented during the training phase. Trask et al. [15] propose a new deep learning computational unit called the Neural Arithmetic Logic Unit (NALU), which is designed to perform arithmetic operations like addition, subtraction, multiplication, division, exponentiation, etc., and posit that NALUs help vastly improve the generalization capabilities of deep learning models. Another related research work is the paper by Arabshahi et al. [16], in which the authors employ black-box function evaluations and incorporate domain knowledge through symbolic expressions that define relationships between the given functions using tree LSTMs. Bongard et al. [17] propose the inverse problem of uncovering domain knowledge given time-series data, in a framework for automatically reverse engineering the functioning of a system. Their model learns domain rules through the intrusive approach of intelligently perturbing the operation of a system and analyzing the resulting consequences. In addition, they assume that all the data variables are available for observation, which is quite often not the case in many machine learning and physical system settings.

Abu-Mostafa [9] proposes a framework for learning from hints in inductive learning systems.
The proposed framework incorporates different types of hints using a data assimilation process, wherein data is generated in accordance with a particular domain rule and fed into a machine learning model as an extension of the normal training process. Each such domain-based data point is considered one of the hints that guides the model toward more domain-amenable solutions. Generating data that is truly representative of a particular piece of innate knowledge, without overtly biasing the model, is costly and non-trivial. Also, as stated in [9], direct implementation of hints in the learning process is much more beneficial than methods that incorporate domain knowledge through data assimilation. Hence, we develop methods wherein innate knowledge about a system is directly incorporated into the learning process, rather than through external, costly means like data assimilation. We show that incorporating domain constraints directly into the loss function can greatly improve the model quality of a learning algorithm like a deep neural network (NN), even if it is trained on a sparse, noisy dataset that is not completely representative of the spectrum of operational characteristics of a system.

The research closest to ours is that of Karpatne et al. [10]. There, the authors propose a physics-guided neural network model for modeling lake temperature. They utilize the monotonically increasing relationship of water density measurements with increasing depth as the physical domain knowledge incorporated into the loss function. They predict the density of water in a lake at different depths and utilize the predicted densities to calculate the corresponding water temperature at those depths, using a well-established physical relationship between water temperature and density. However, they incorporate only a single type of domain knowledge (i.e., monotonic relationships). In this work, we augment the approach in [10] to model other types of domain rules and characterize model behavior in many challenging circumstances (detailed in later sections).
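A monotonicity constraint of the kind used in [10] can be expressed as a hinge-style penalty on adjacent prediction pairs. The NumPy sketch below is illustrative rather than the authors' exact implementation; it assumes the predictions are already ordered by the variable the target should increase with (e.g., water density ordered by depth).

```python
import numpy as np

def monotonicity_loss(y_pred):
    """Hinge-style penalty for violating a monotonically increasing
    constraint. `y_pred` holds predictions ordered by the variable the
    target should increase with (e.g., water density ordered by depth).
    Each adjacent pair that decreases contributes its violation
    magnitude; a non-decreasing sequence incurs zero loss."""
    diffs = np.diff(np.asarray(y_pred, dtype=float))  # y[i+1] - y[i]
    return float(np.mean(np.maximum(0.0, -diffs)))    # penalize decreases only

# A non-decreasing sequence satisfies the constraint:
print(monotonicity_loss([1.0, 1.5, 2.0]))  # 0.0
# A decreasing pair is penalized in proportion to the violation:
print(monotonicity_loss([2.0, 1.0, 3.0]))  # mean of [1.0, 0.0] = 0.5
```

Because the penalty is zero whenever the ordering holds, it biases training toward domain-consistent solutions without forcing any particular shape on the predictions.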
III. PROBLEM FORMULATION AND SOLUTION APPROACH

Problem Statement: Leverage domain knowledge to train a robust, accurate learning model that yields good model performance even with sparse, noisy training data.

Innate knowledge about the functioning of a system S may be available in several forms. One of the most common forms of knowledge is a quantitative range of normal operation for a particular process variable Y in S. Another type of domain knowledge is a monotonically increasing or decreasing relationship between different process variables, or between measurements of the same process variable taken in different contexts. To incorporate these domain-based constraints into the inductive learning process, we develop domain adapted neural networks (DANN).

We select deep neural network models as the inductive learner owing to their ability to model complex relationships, and adopt the framework proposed in [10] for incorporating domain knowledge in the training of deep neural network models. The generic hybrid loss function of the deep learning model is depicted in Eqn. 1. Here, Loss(Y, Ŷ) is a mean squared error loss used in many inductive learning applications for regression, and Y, Ŷ are the ground-truth and predicted values, respectively, of the target system variable. R(f) is an L2 regularization term used to control the complexity of the model f. The Loss_D(Ŷ) term is the domain loss directly incorporated into the neural network loss function, used to enforce that the model learned from training data is also in accordance with certain accepted domain rules.

    argmin_f  Loss(Y, Ŷ) + λ_D Loss_D(Ŷ) + λ R(f)    (1)

Here, λ_D is a hyper-parameter determining the weight of the domain loss in the objective function. We chose the value of λ_D empirically (see Fig. 3). λ is another hyper-parameter determining the weight of the regularizer. We model two types of constraints: 1) approximation constraints; and 2) monotonicity constraints.
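The hybrid objective of Eqn. 1 can be illustrated with a small NumPy sketch. The λ values, the operating interval, and the interval-based domain loss below are hypothetical placeholders chosen for illustration (the paper tunes λ_D empirically), not the paper's actual settings.

```python
import numpy as np

def approximation_domain_loss(y_pred, lo, hi):
    """Penalize predictions outside a domain-supplied normal operating
    range [lo, hi] (an approximation constraint); zero inside the range."""
    y = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(0.0, lo - y) + np.maximum(0.0, y - hi)))

def hybrid_loss(y_true, y_pred, weights, lam_d=0.5, lam=1e-3, lo=0.0, hi=10.0):
    """Eqn. 1: inductive MSE + lambda_D * domain loss + lambda * L2 penalty.
    lam_d, lam, lo, and hi are illustrative values, not the paper's."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(np.mean((y_true - y_pred) ** 2))          # Loss(Y, Y_hat)
    domain = approximation_domain_loss(y_pred, lo, hi)    # Loss_D(Y_hat)
    reg = float(np.sum(np.asarray(weights, dtype=float) ** 2))  # R(f)
    return mse + lam_d * domain + lam * reg
```

During training, this scalar would be minimized over the parameters of f by gradient descent; the out-of-range penalty pulls predictions back toward the domain-sanctioned interval even when the labels themselves are noisy.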
A. Approximation Constraints

Noisy measurements quite often cause significant deviations in model quality. In such cases, the insights domain experts possess about reasonable ranges of normal operation of the target variable can help in training higher-quality models. We wish to incorporate these approximation constraints during model training, to produce more robust models. Such constraints m