Abstract
Multi-fidelity modeling and calibration are data fusion tasks that ubiquitously arise in engineering design. However, there is currently a lack of general techniques that can jointly fuse multiple data sets with varying fidelity levels while also estimating calibration parameters. To address this gap, we introduce a novel approach that, using latent-map Gaussian processes (LMGPs), converts data fusion into a latent space learning problem where the relations among different data sources are automatically learned. This conversion endows our approach with attractive advantages such as increased accuracy and reduced overall costs compared to existing techniques that must take a combinatorial approach to fuse multiple datasets. Additionally, we have the flexibility to jointly fuse any number of data sources and the ability to visualize correlations between data sources. This visualization allows an analyst to detect model form errors or determine the optimum strategy for high-fidelity emulation by fitting LMGP only to the sufficiently correlated data sources. We also develop a new kernel that enables LMGPs to not only build a probabilistic multi-fidelity surrogate but also estimate calibration parameters with high accuracy and consistency. The implementation and use of our approach are considerably simpler and less prone to numerical issues compared to alternate methods. Through analytical examples, we demonstrate the benefits of learning an interpretable latent space and fusing multiple (in particular, more than two) sources of data.
1 Introduction
Computer models are increasingly employed in the analysis and design of complex systems. For a particular system, there are typically various models available whose fidelity is generally related to their costs; i.e., accurate models are generally more expensive. In such a scenario, multi-fidelity modeling techniques are adopted to balance costs and accuracy when using all these models in the analyses [1,2]. Additionally, computer models typically have some calibration parameters which are estimated by systematically comparing their predictions to experiments/observations [3]. These parameters either correspond to some properties of the underlying system being modeled or act as tuning knobs that compensate for model deficiencies. In this paper, we introduce a versatile, efficient, and unified approach for emulation-based multi-fidelity modeling and calibration (henceforth, we use the term data fusion to refer to both multi-fidelity modeling and calibration because both involve fusing or assimilating multiple sources of data). Our approach is based on latent-map Gaussian processes and its core idea is to convert data fusion into a learning problem where different data sources are related in a nonlinearly learned manifold.
Over the past few decades, many data fusion techniques have been developed for outer-loop applications such as design optimization, sequential sampling, or inverse parameter estimation. For example, multi-fidelity modeling can be achieved via space mapping [4–6] or multi-level [7–9] techniques where the inputs of the low-fidelity data are mapped via xl = F(xh). In this equation, xl and xh are the inputs of low- and high-fidelity data sources, respectively, and F(·) is the transformation function whose predefined functional form is calibrated such that yl(F(xh)) approximates yh(xh) as closely as possible. These techniques are particularly useful in applications where higher fidelity data are obtained by successively refining the discretization of the simulation domain [7,9], e.g., by refining the mesh when simulating the flow over an airfoil. The main disadvantage of space mapping techniques is that choosing a near-optimal functional form for F(·) is iterative and very cumbersome.
Two of the most important aspects of multi-fidelity modeling are choosing the emulators that surrogate the data sources and formulating the relation between these emulators. Correspondingly, several methods have been developed based on Gaussian processes (GPs) [3], Co-Kriging [10], polynomial chaos expansions [11,12], and moving least squares [13]. The interested reader is referred to Refs. [2,14] for more comprehensive reviews on multi-fidelity modeling and how they benefit outer-loop applications.
Multi-fidelity modeling is closely related to the calibration of computer models since the latter also involves working with at least two data sources where typically the low-fidelity one possesses the calibration parameters. Besides the traditional ways of estimation that are ad hoc and involve trial and error, there are more systematic methods that are based on generalized likelihood [15] or Bayesian principles [16].
Among existing methods for multi-fidelity modeling and calibration, the most popular emulator-based method in engineering design is that of Kennedy and O’Hagan (KOH) [3] which assimilates and emulates two data sources while estimating calibration parameters of the low-fidelity source (if there are any such parameters). KOH’s approach is one of the first attempts that considers a broad range of uncertainty sources arising during the calibration and subsequent uses of the emulator. This approach has been used in many applications including climate simulations [17], materials modeling [18], and modeling shock hydrodynamics [19].
KOH’s approach assumes that the discrepancy between the two data sources is additive and that both data sources and the discrepancy between them can be modeled via GPs. The approach then uses (fully [20,21] or modular [18,22–25]) Bayesian inference to find the posterior estimates of the GPs as well as the calibration parameters. The fully Bayesian version of KOH’s method offers advantages such as low computational costs for small data sets and the ability to quantify various uncertainty sources (e.g., lack of data, noise, model form error, and unknown simulation parameters). However, obtaining the joint posteriors via Markov chain Monte Carlo (MCMC) is laborious and expensive, especially in high dimensions or with relatively large datasets. The modular version of KOH’s approach addresses this limitation by typically using point estimates for the GP hyperparameters of the low-fidelity data [3,23]. These estimates are obtained via maximum likelihood estimation (MLE) and, while they slightly underestimate uncertainties when data are scarce, they provide accurate mean predictions.
A major limitation of KOH’s approach and other reviewed data fusion techniques is that they only accommodate two data sources at a time. That is, the fusion process must be repeated p times if there are p low-fidelity and one high-fidelity data sources. In addition to being tedious and expensive, this repetitive process does not provide a straightforward diagnostic mechanism for comparing the low-fidelity sources to identify, e.g., which one(s) perform similarly or have the smallest model form error.
In this paper, we aim to address the abovementioned limitations of the existing technologies for data fusion. Our primary contributions are threefold and summarized as follows. First, we convert multi-fidelity modeling into a latent space learning problem. This conversion is achieved via latent-map Gaussian processes (LMGPs) and endows our approach with important advantages such as flexibility to jointly fuse any number of data sources and ability to visualize correlations between them. This visualization provides the user with an easy-to-interpret diagnostic measure for identifying the relations between different data sources. We believe the joint fusion (of more than two sources) and the accompanying visualization aids reduce the overall costs of multi-fidelity modeling compared to reviewed methods since they eliminate the iterative process of data source selection and link the fusion results across the iterations (note that our approach is also applicable to problems with two data sources). Second, we develop a new kernel function that enables LMGPs to not only build a probabilistic multi-fidelity surrogate but also estimate calibration parameters with high accuracy and consistency. Third, the implementation of our approach is considerably simpler and less prone to numerical issues compared to the reviewed technologies (especially KOH’s approach).
The rest of the paper is organized as follows. In Sec. 2, we briefly review the relevant technical background on GPs and LMGPs (see Sec. 7 for Nomenclature). In Sec. 3, we introduce our approach to multi-fidelity modeling and calibration while demonstrating its performance on four pedagogical examples. In Sec. 4, we validate our approach against GPs and KOH’s method on six analytic and engineering examples. We conclude the paper in Sec. 5 by discussing the advantages and limitations of our approach, considerations that should be made in its application, and its application to multi-response problems.
2 Emulation via Latent-Map Gaussian Processes
We review emulation via GPs and a variation of GPs (i.e., LMGP) for data sets that include categorical inputs. Throughout, symbols or numbers enclosed in parentheses encode sample numbers and are used either as subscripts or as superscripts. For example, x_(i) or x^(i) denote the ith sample in a training data set while x_i indicates the ith component of the vector x. We use h and l either as superscript or as subscript to denote high- and low-fidelity data sources. For instance, x_h^(i) and y_h^(i) denote, respectively, the inputs and output of the ith sample in the high-fidelity data set. In cases where there is more than one low-fidelity source, we add a number to the l symbol, e.g., l3 denotes the third low-fidelity source. Lastly, we distinguish between the data source (or the underlying function) and samples by specifying the functional dependence (e.g., y(x) is a function while y and the boldfaced y are, respectively, a scalar and a vector of values).
The correlation function in Eq. (2) depends on the distance between two arbitrary input points x and x′. Hence, traditional GPs cannot accommodate categorical inputs (such as gender and zip code) as they do not possess a distance metric. This issue is well established in the literature, and there exist a number of strategies that address it by reformulating the covariance function such that it can handle categorical variables [29–32]. In this paper, we use LMGPs [33] which are recently developed and shown to outperform previous methods.
We can construct ζ in a number of ways; see Ref. [33] for more information on selecting the priors. In this paper, we use a form of one-hot encoding. Specifically, we first construct the 1 × m_i vector ζ_i for the categorical variable t_i such that its jth element is one when t_i is at level j and its kth element is zero when t_i is at level k ≠ j, for k ∈ {1, 2, · · ·, m_i}. Then, we set ζ(t) = [ζ_1, ζ_2, · · ·]. For instance, in the above example with two categorical variables, t1 = {92697, 92093} and t2 = {math, physics, chemistry}, we encode the combination t = [92093, physics]T by ζ(t) = [0, 1, 0, 1, 0] where the first two elements encode zip code while the rest encode the subject.
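As a concrete illustration of this encoding, the sketch below builds ζ(t) for the zip-code/subject example above; the helper function and its name are ours and only mirror the construction described in the text, not the reference implementation.

```python
# Minimal sketch of the one-hot-style prior encoding zeta(t) described above.
# The level sets (zip codes and subjects) come from the paper's example.
import numpy as np

def encode(t, levels):
    """Concatenate one-hot vectors, one per categorical variable."""
    parts = []
    for value, level_set in zip(t, levels):
        v = np.zeros(len(level_set))
        v[level_set.index(value)] = 1.0
        parts.append(v)
    return np.concatenate(parts)

levels = [[92697, 92093], ["math", "physics", "chemistry"]]
print(encode([92093, "physics"], levels))  # -> [0. 1. 0. 1. 0.]
```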
By minimizing L, one can solve for the hyperparameters and subsequently obtain the remaining estimates using Eqs. (6) and (7). While many heuristic global optimization methods exist, such as genetic algorithms [35] and particle swarm optimization [36], gradient-based optimization techniques based on, e.g., the L-BFGS algorithm [37] are generally preferred due to their ease of implementation and superior computational efficiency [26,38]. With gradient-based approaches, it is essential to start the optimization from numerous initial guesses to improve the chances of achieving global optimality [33,38].
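The multi-start strategy described above can be sketched as follows; the objective `neg_log_like` stands in for the negative log-likelihood of Eq. (8), and the bounds and number of restarts are illustrative assumptions rather than the settings used in the paper.

```python
# Hedged sketch of multi-start, gradient-based MLE for the hyperparameters.
import numpy as np
from scipy.optimize import minimize

def fit_hyperparameters(neg_log_like, n_params, n_starts=32, bounds=(-3.0, 3.0), seed=0):
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(bounds[0], bounds[1], size=n_params)   # random initial guess
        res = minimize(neg_log_like, x0, method="L-BFGS-B",
                       bounds=[bounds] * n_params)
        if best is None or res.fun < best.fun:
            best = res
    return best.x  # hyperparameters with the smallest objective across restarts
```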
After obtaining the hyperparameters via MLE, the response at any x* is estimated via E[y(x*)] = β̂ + r(x*)ᵀ V⁻¹ (y − 1β̂), where E[·] denotes expectation, r(x*) is an n × 1 vector with the ith element c(x*, x^(i)), and V is the covariance matrix with the (i, j)th element c(x^(i), x^(j)). Additionally, the posterior covariance between the responses at the two inputs x and x′ is cov(y(x), y(x′)) = c(x, x′) − r(x)ᵀ V⁻¹ r(x′), where c(·, ·) denotes the covariance function.
The above formulations can be easily extended to cases where the data set is noisy. GPs (and hence LMGPs) can address noise and smoothen data by using a nugget or jitter parameter, δ, which is incorporated into the correlation matrix. That is, R becomes R + δI_{n×n}, where I_{n×n} is the identity matrix of size n × n. If the nugget parameter is used, the estimated (stationary) noise variance in the data will be δσ². The version of LMGP used in this paper finds only one nugget parameter and uses it for all categorical combinations; i.e., we assume that the noise level is the same for each data set. LMGP can be modified in a straightforward manner to have a separate nugget parameter (and hence a separate noise estimate) for each categorical combination.
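To make the prediction formulas and the nugget concrete, the following sketch evaluates the posterior mean and covariance of a generic GP with a nugget added to the correlation matrix; the kernel, β, and σ² are assumed to have been estimated already, and this is not the reference LMGP implementation.

```python
# Illustrative GP posterior with a nugget delta added to the correlation matrix
# (R -> R + delta*I). `corr` is any correlation function of two sample arrays.
import numpy as np

def gp_posterior(X, y, X_star, corr, beta, sigma2, delta=1e-6):
    n = X.shape[0]
    R = corr(X, X) + delta * np.eye(n)           # correlation matrix with nugget
    V = sigma2 * R                               # covariance matrix
    r = sigma2 * corr(X_star, X)                 # cross-covariances, shape (m, n)
    L = np.linalg.cholesky(V)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - beta))
    mean = beta + r @ alpha                      # posterior mean at X_star
    tmp = np.linalg.solve(L, r.T)
    cov = sigma2 * corr(X_star, X_star) - tmp.T @ tmp   # posterior covariance
    return mean, cov
```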
3 Proposed Framework for Data Fusion
In this section, we first explain the core idea and rationale of our approach in Sec. 3.1. Then, we detail how it is used for multi-fidelity modeling and calibration in Secs. 3.2 and 3.3, respectively. In the latter two subsections, we provide pedagogical examples to facilitate the discussions and elaborate on the benefits of the learned latent space in diagnosing the results. The notation introduced in Sec. 2 is also used here (see Sec. 7 for Nomenclature).
3.1 The Rationale Behind Using a Latent Space for Data Fusion.
Factors that affect the fidelity of various data sources may be known or unknown; in either case, they typically cannot be easily used in the fusion process. Consider an engineering application on predicting the fracture toughness of an alloy where an engineer states that model A and model B achieve certain errors when their predictions are tested against experimental data. These inaccuracies and their difference can be due to many underlying factors such as noise in the experiments, missing physics in either of the models (especially model B), uncertain material properties (i.e., calibration parameters) that affect the fracture behavior, or numerical errors associated with the computer models (e.g., coarse discretization). It is very difficult to quantitatively incorporate all these factors into data fusion. Hence, existing fusion methods such as that of Kennedy and O’Hagan [3] assign labels or qualitative variables to data (e.g., data from “model A” or data from “experiments”) and then develop fusion formulas that break down if the underlying assumptions are incorrect or if there are many information sources.
We argue that data fusion should be based on learned quantitative variables instead of assigned qualitative labels to enable instruction-free and versatile fusion. We use LMGPs to learn these quantitative variables (other methods can be used as well) in a latent space that aims to encode the underlying factors which distinguish different data sources. The power of latent spaces in learning hidden factors is perhaps best exemplified in computer vision, where deep neural networks encode high-dimensional images into a low-dimensional latent space in which a single axis can learn a complex feature such as smiling (Fig. 1(a)).

Fig. 1: Data fusion as a latent space learning problem: (a) latent representation of facial features: A latent representation enables drastic reduction of dimensionality such that each axis encodes a complex feature. (b) Data fusion with LMGP: Calibration inputs and outputs, denoted by θ, are absent in multi-fidelity problems.
As shown in Fig. 1(b), data fusion via LMGP is achieved via the following steps. First, we augment the various datasets with categorical inputs that aim to distinguish the data sources and also add unknown calibration parameters (if applicable). Then, we fit a single LMGP to the combined data set to obtain emulators of the data sources and estimates of the calibration parameters (if applicable). Finally, once the LMGP is trained, we visualize the learned latent space to analyze the relations between the sources. In the following subsections, we provide more details on each of these steps.
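A minimal sketch of the first step (augmenting and stacking the data sets) is given below; the column names and helper function are hypothetical and only illustrate the bookkeeping, not the actual LMGP interface.

```python
# Each data set gets a categorical source ID and is stacked into one training
# set for a single LMGP, as in Fig. 1(b).
import numpy as np
import pandas as pd

def combine_sources(datasets):
    """datasets: dict mapping a source label (e.g., 'h', 'l1') to (X, y) arrays."""
    frames = []
    for label, (X, y) in datasets.items():
        df = pd.DataFrame(X, columns=[f"x{i+1}" for i in range(X.shape[1])])
        df["source"] = label          # categorical input distinguishing the source
        df["y"] = y
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```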
Following the above description, we summarize our goals in data fusion as building emulators for each data source (especially the high-fidelity one), estimating any unknown calibration parameters, and automatically obtaining the relation between the various data sources. We also note that our approach can simultaneously fuse any number of data sources with any level of fidelity. Without loss of generality, hereafter we will designate only one source as high fidelity and treat the rest as low fidelity. This assignment is adopted to simplify the descriptions and does not affect our approach since we do not use any knowledge of the fidelity level during fusion (e.g., if there are two experimental and three simulation data sets, we can assign any one of them as high-fidelity and the rest as low-fidelity).
3.2 Multi-Fidelity Modeling via LMGP.
Using LMGP for multi-fidelity modeling is quite straightforward. Consider the case where multiple (i.e., two or more) data sources with different levels of accuracy are available, and the goal is to emulate each source while (1) having limited data, especially from the most accurate source, (2) accounting for potential noise with unknown variance, and (3) avoiding a priori determination of how different sources are related to each other. The last condition indicates that we do not know (1) how the accuracy of the low-fidelity models compares across sources, and (2) whether the low-fidelity models have inherent discrepancies, which may or may not be additive. While not necessary, we assume it is known which data source provides the highest fidelity because this source typically corresponds to either observations/experiments or a very expensive computer model.
We assume nh high-fidelity samples are available whose inputs and output are denoted by xh and yh, respectively. We also presume that the data set obtained from the ith low-fidelity source has n_li samples whose inputs and outputs are denoted by x_li and y_li, respectively.
With the above-mentioned points in mind, we use two examples in the following subsections to demonstrate our approach to multi-fidelity modeling.
3.2.1 A Simple Analytical Example.
To perform data fusion with LMGP, we first append the inputs with one or more categorical variables that distinguish the data sources. We can use any number of multi-level categorical variables. That is, we can either (1) select a single variable with at least as many levels as there are data sources or (2) use a few multi-level categorical variables with at least as many level combinations as there are data sources. For example, with one categorical variable, we can choose t = {h, l1, l2, l3}, t = {1, 2, 3, 4}, t = {1, a, ab, 2}, or t = {a, b, c, d, e} for our pedagogical example with four data sources (in the last case level e does not correspond to any of the data sources).
For the remainder of this paper, we use two strategies for choosing categorical variables; see Fig. 2. Strategy 1 uses one categorical variable with as many levels as data sources, e.g., t = {a, b, c, d} or t = {1, 2, 3, 4}. We add the subscript s to an LMGP that uses this strategy since a single categorical variable is used to encode the data sources. Strategy 2 employs multiple categorical variables where the number of variables and their levels both equal the number of data sources, e.g., ti = {a, b, c, d} with i = 1, 2, 3, 4. We append the subscript m to an LMGP that uses strategy 2 to indicate that multiple categorical variables are employed. As we explain below, having more levels (or level combinations if more than one t is used) than data sources provides LMGP with more flexibility to learn the relation between the sources. This flexibility comes at the expense of having a larger A and higher computational costs. As we demonstrate in Sec. 4, the performance of LMGP is relatively robust to this modeling choice as long as there are sufficient training samples and the number of latent positions does not greatly exceed the number of hyperparameters in A. Regarding the latter condition, note that when LMGP must find many latent positions with a small A (i.e., a very simple map), performance may suffer due to local optimality. For example, strategy 2 with 4 data sources results in 4^4 = 256 latent positions (one for each possible categorical level combination, of which only 4 correspond to data sources) but there are only 32 elements in A. These elements are supposed to map the 256 points in the latent space such that the 4 points which encode the data sources have inter-distances that reflect the underlying relation between their corresponding data sources. Without sufficient data and regularization, the learned map may provide a locally optimal solution.

Fig. 2: Data preprocessing for multi-fidelity modeling via LMGP: We can use any number of multi-level categorical variables when fusing data with LMGP. Shown above are two strategies for the choice of t for our example with four data sources. In strategy 1, we use one categorical variable with four levels (one for each data source) and assign each level to a unique data source. In strategy 2, we use a different categorical variable for each data source, and we give each categorical variable four levels (one for each data source) for a total of 4^4 = 256 categorical combinations. We assign only four of these combinations to our data sources (only these four are enumerated in the figure), leaving 252 combinations unused. Note that while LMGP finds latent positions for these 252 combinations, the positions are not meaningful since they do not correspond to any of the data sources. The number of elements in the A matrix (see Eq. (4)) that must be estimated for LMGP is 8 and 32 for the first and second strategies, respectively.
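The bookkeeping behind the two strategies can be summarized with a few lines of arithmetic; the sketch below reproduces the counts quoted above (8 and 32 elements of A, and 4^4 = 256 level combinations) for p = 4 data sources and a two-dimensional latent space.

```python
# Counting latent positions and entries of the 2D mapping matrix A for the two
# categorical-encoding strategies in Fig. 2. Purely illustrative arithmetic.
p, dz = 4, 2

# Strategy 1: one categorical variable with p levels -> p one-hot columns.
A_entries_s1 = dz * p                      # 2 * 4 = 8

# Strategy 2: p categorical variables, each with p levels -> p*p one-hot columns
# and p**p possible level combinations (only p of them map to data sources).
combinations_s2 = p ** p                   # 4**4 = 256
A_entries_s2 = dz * (p * p)                # 2 * 16 = 32

print(A_entries_s1, A_entries_s2, combinations_s2)  # 8 32 256
```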
The above description clearly indicates that LMGP can, in principle, fuse any number of data sets simultaneously. In practice, this ability of LMGP is bounded by the natural limitations of GPs such as scalability to big data or very high dimensions. Recent advancements in GP modeling for big or high-dimensional data [38–44] have addressed these limitations to some extent and can be directly used in LMGP for multi-fidelity modeling in future work.

Fig. 3: Approaches to data fusion: (a) LMGP with all available data: LMGP fit to all available data is able to emulate each data source with high accuracy. The inaccuracy of the least accurate source does not negatively impact high-fidelity emulation performance. (b) Standard GP: Standard GP fit to only the three available high-fidelity samples performs poorly. (c) Learned latent space: LMGP only uses four data sets to learn a latent space that indicates how “close” different data sources are with respect to each other. While the data sets are quite unbalanced (nh = 3 while the low-fidelity sets are much larger), LMGP can clearly visualize the relative accuracy of each low-fidelity model with respect to the high-fidelity data. (d) LMGP with only the most accurate low-fidelity source and yh(x): Despite the fact that this source misrepresents yh(x) in some regions, LMGP is able to use correlations between the two sources to accurately emulate yh(x) with approximately equivalent accuracy to when all sources are used.
Plugging the latent positions into Eq. (12) shows that a relative distance of d between two points scales the correlation function by exp{−d²}. Thus, we can interpret the latent space as being a distillation of the correlations between the data sources. Note, however, that the term exp{− (x − x′)TΩx(x − x′)}, which accounts for the correlation between outputs at different points in the input space, remains the same as we change data sources. Thus, our modeling assumption is that this correlation is similar for all data sources. In layman’s terms, we expect each data source to have a relatively similar shape. This is often true in multi-fidelity problems and, if this modeling assumption is not met, LMGP estimates Ωx to provide the best compromise between different sources, which may result in poor emulation performance for some or all sources. To avoid making such a compromise, we can use the latent space to identify the dissimilar data source(s) and then repeat the fusion process after excluding them.
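As a quick numerical check of this scaling, the snippet below evaluates exp{−d²} for two latent distances that appear later in the paper (0.05 and 0.4), recovering the quoted correlations of roughly 0.998 and 0.85.

```python
# Distance-to-correlation scaling between latent positions: exp(-d**2).
import numpy as np

for d in (0.05, 0.4):
    print(d, np.exp(-d**2))   # ~0.9975 and ~0.852
```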
Note also that the objective function in Eq. (8) that is used to find the latent positions is invariant under translation and rotation. In order to find a unique solution, we enforce the following constraints in two dimensions (more constraints are needed for dz > 2): latent point 1 is placed at the origin, latent point 2 is positioned on the positive z1-axis, and latent point 3 is restricted to the z2 > 0 half-plane. We assign yh(x) to position 1 for both of our strategies as it yields more readable latent plots, but this choice is arbitrary and does not affect the relative distances between the latent positions, as shown in Sec. 4.
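One simple way to impose these constraints is to parameterize the latent positions so that the fixed coordinates are removed from the optimization; the sketch below is an illustrative parameterization for a two-dimensional latent space under the stated constraints, not the paper's implementation.

```python
# Point 1 at the origin, point 2 on the positive z1-axis, point 3 in the z2 >= 0
# half-plane; this removes the translation/rotation/reflection degrees of freedom.
import numpy as np

def constrained_latent_positions(free_params, n_points):
    """free_params: unconstrained optimizer variables of length 2*n_points - 3."""
    z = np.zeros((n_points, 2))
    z[1, 0] = np.abs(free_params[0])            # point 2: (r, 0) with r >= 0
    z[2, 0] = free_params[1]
    z[2, 1] = np.abs(free_params[2])            # point 3: z2 >= 0
    if n_points > 3:
        z[3:] = free_params[3:].reshape(-1, 2)  # remaining points unconstrained
    return z
```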
Returning to our example with the above constraints in mind, we can see that the latent point corresponding to the most accurate low-fidelity source is close to that of yh(x) while the other points are relatively distant, especially the point representing the least accurate source. This observation matches our knowledge of the relative accuracies of the underlying functions with respect to yh(x) (this knowledge is not provided to LMGP). In other words, LMGP has accurately determined the correlations between the data sources despite the sparse sampling for yh(x). Given that this source appears to be much more accurate than the other low-fidelity sources with respect to yh(x), one might consider fitting LMGP using only data from these two sources, rather than all of the data, to produce a more accurate high-fidelity emulator. The results of this approach, shown in Fig. 3(d), demonstrate that high-fidelity emulation performance is actually equivalent to that obtained with all sources used; i.e., using less accurate sources does not make our estimate of yh(x) worse in this case because they include useful information about yh(x).
In order to support our assertion that a two-dimensional latent space is typically sufficient to encode the relationships between data sources, we show the latent space for an LMGP fit to all data sources with dz = 3 in Fig. 4. We enforce the following constraints in three dimensions: latent point 1 is placed at the origin, latent point 2 is positioned on the positive z1 axis, latent point 3 is restricted to the z2 > 0 half-plane, and latent point 4 is restricted to z3 ≥ 0. These constraints reduce the degrees-of-freedom by restricting translation, rotation, and reflection. In this case, we find that the relative distances between the latent points in Fig. 4 are nearly the same as those in Fig. 3(c), which indicates that two dimensions are sufficient to encode the relationships between the data sources.

Fig. 4: Learned latent space with dz = 3: LMGP finds the latent positions to lie on a two-dimensional subspace.
3.2.2 Effect of Categorical Variable Assignment.

Fig. 5: Approaches to categorical variable assignment: (a) accuracy of data sources: Both low-fidelity sources are equally accurate, i.e., they have the same RRMSE with respect to yh(x). (b) Latent space for LMGPs All: We show the latent space for one repetition, but LMGP consistently finds one source to be close to and another to be distant from the position for yh(x) across repetitions. (c) Latent space for LMGPm All: We show the latent space for one repetition. The positions and relative distances are not consistent across repetitions. The gray dots correspond to latent positions that do not correspond to any data source. (d) High-fidelity emulation performance across 30 repetitions: LMGP outperforms GP in high-fidelity emulation for both categorical variable strategies. MSEs are calculated by comparing emulator predictions to analytic function outputs at 10,000 points. (e) and (f) Latent spaces with more data: With more data, LMGPs All, shown in (e), and LMGPm All, shown in (f), consistently find latent positions that accurately reflect the relative accuracies of the data sources. We do not show the latent positions not corresponding to any data sources in (f), and as such, the shown points do not conform to the 2D constraints.
The latent space for LMGP using one categorical variable is shown in Fig. 5(b) and indicates that this strategy enables LMGP to learn that both sources are inaccurate with respect to yh(x). However, LMGP consistently finds one source to be significantly more accurate than the other as a result of the sparse sampling. By contrast, the positions found by LMGP using multiple categorical variables are very inconsistent across repetitions and often estimate one of the sources as being either extremely correlated or uncorrelated with yh(x) (Fig. 5(c)). This inconsistency arises because LMGPm All has quite a few hyperparameters (1 roughness parameter and 18 parameters in the A matrix), which are difficult to estimate with scarce data. Across the repetitions of LMGPm All, at least one data source is always found to be well correlated with yh(x), so high-fidelity predictions are still good and much better than fitting a traditional GP to only the high-fidelity data (Fig. 5(d)). When we increase the available data (to nh = 15 along with more low-fidelity samples), both LMGPs All and LMGPm All consistently (i.e., across repetitions) find latent positions for the low-fidelity functions that are approximately equidistant from yh(x). We demonstrate this in Fig. 6, which shows histograms of the distances between the latent points for yh(x) and each of the two low-fidelity sources in (a) and (b), respectively. Notably, LMGPm All is less consistent in both cases, with a few poor-performing outliers in Fig. 6(b). Interestingly, the positions for the two low-fidelity sources are in opposite directions from yh(x), which agrees with the fact that the discrepancies are equal but of opposite sign (Figs. 5(e) and 5(f)). Notably, as we show in Fig. 7, this property is not a result of the constraints we apply to the latent points during fitting and persists even when no constraints are applied; i.e., all three points lie on a line.

Fig. 6: Histogram of latent distances: The above figure shows a histogram of the relative distances across 30 repetitions between yh and each low-fidelity source for both strategies. (a) yh and the first low-fidelity source: Both strategies find similar distances, with LMGPm All being only slightly less consistent. (b) yh and the second low-fidelity source: Both strategies again find similar distances. This time, however, LMGPm All displays a higher number of poor-performing outliers.

Fig. 7: Latent space with no constraints: The relative relationships between the data sources in the latent space remain the same without applying constraints to the locations of the points.
While we did not apply noise to the samples in these pedagogical examples, as we demonstrate in Sec. 4, LMGP is fairly robust to noise both with respect to emulation performance and finding latent positions.
3.3 Calibration via LMGP.
Calibration problems closely resemble multi-fidelity modeling in that a number of high- and low-fidelity data sets are assimilated or fused together. However, in such problems, the low-fidelity data sets typically involve calibration inputs which are not directly controlled, observed, or measured in the high-fidelity data (i.e., the high-fidelity data have fewer inputs). Hence, in addition to building surrogate models, one seeks to inversely estimate these inputs during the calibration process.
Preprocessing the data for calibration via LMGP is schematically illustrated in Fig. 8. Following the same procedure described in Sec. 3.2, we append the inputs with categorical variables to distinguish the data sources. We also augment the high-fidelity inputs with some unknown values to account for the missing calibration parameters. Once the mixed data set that contains all the low- and high-fidelity data is built, we directly use it in LMGP to not only build emulators for each data source but also estimate the calibration parameters θ. Similar to multi-fidelity modeling, any number of data sets can be simultaneously used via LMGP for calibration.

Fig. 8: Preprocessing of data for calibration: Multiple data sets are combined in a specific way and then directly used by LMGP. The high-fidelity data are augmented with NaNs since they lack calibration parameters, and all data are augmented with categorical IDs that denote the source from which a datum is drawn. We use strategy 1 for the choice of t in both examples in Sec. 3.3.
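A minimal sketch of this preprocessing is shown below; the NaN placeholders mark the unknown calibration parameters in the high-fidelity rows and would be replaced by the trainable estimate of θ during fitting. The helper and column names are hypothetical and only illustrate the layout of Fig. 8.

```python
# Build the mixed calibration data set: pad high-fidelity inputs with NaNs for
# the unknown calibration parameters and tag every row with its source ID.
import numpy as np
import pandas as pd

def build_calibration_data(X_h, y_h, low_fidelity, n_theta):
    """low_fidelity: dict label -> (X_with_theta_columns, y)."""
    pad = np.full((X_h.shape[0], n_theta), np.nan)      # unknown theta for HF rows
    rows = [np.column_stack([X_h, pad]), *[X for X, _ in low_fidelity.values()]]
    X = np.vstack(rows)
    y = np.concatenate([y_h, *[y for _, y in low_fidelity.values()]])
    source = ["h"] * len(y_h) + sum(
        [[lab] * len(yl) for lab, (_, yl) in low_fidelity.items()], [])
    cols = [f"x{i+1}" for i in range(X_h.shape[1])] + [f"theta{j+1}" for j in range(n_theta)]
    df = pd.DataFrame(X, columns=cols)
    df["source"], df["y"] = source, y
    return df
```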
We now illustrate the capabilities of LMGPs for calibration via two analytical examples where there is one high-fidelity data source yh(x) and up to two low-fidelity data sources, denoted by yl1(x) and yl2(x). We presume that in both examples the goals are to accurately emulate the high-fidelity data source and estimate the calibration parameters. We note that once an LMGP is trained, it provides an emulator for each data source, but here we only evaluate accuracy in surrogating yh(x) since far fewer data points are available from it and, hence, emulating it is more difficult.
3.3.1 A Simple Calibration Problem.
We set the true calibration parameter to 0.1 because this is the true value of the coefficient on the leading x³ term. Note that one of the low-fidelity sources can match yh(x) perfectly with an appropriate choice of θ; i.e., it has no model form error when θ is set to this true value (Fig. 9(a)). Conversely, no value of θ allows the other low-fidelity source to match yh(x) since it has a linear model form error. When solving this calibration problem, we assume there is no knowledge of whether the low-fidelity models have discrepancies and expect the learned latent space of LMGP to provide diagnostic measures that indicate potential model form errors.

Fig. 9: Calibration with LMGP: (a) Underlying functions with true calibration parameters: the low-fidelity source without model form error and yh(x) are coincident at the true calibration value. (b) Latent space for LMGPs All: Latent positions for yh(x) and the source without model form error are coincident while the position for the other source is relatively more distant (albeit still quite close). (c) Histogram of estimated calibration parameters: We estimate θ over 30 repetitions, where the LMGP fitted via all data yields more consistent estimates. All three models use a single categorical variable to encode data sources. (d) High-fidelity emulation performance: Using all data yields the best performance since the data sources are correlated. (e) and (f) Latent spaces for the two-source fits: LMGP cannot detect the model form error of the erroneous source since data are scarce and an appropriately estimated θ enables that source to resemble yh(x) fairly well, as shown in (e). LMGP can correctly detect that the other source does not have model form error, as shown in (f). (g) The erroneous low-fidelity source with estimated calibration parameters versus yh(x): this source can nearly interpolate the sparse training data for yh(x) with the appropriate calibration parameter.
As shown in Fig. 9(b), the latent positions learned by LMGP are quite consistent with our expectations despite the fact that limited and unbalanced data are used in LMGP’s training. It is evident that the latent positions corresponding to yh(x) and the low-fidelity source without model form error are very close to each other, indicating negligible model form error. In contrast, the position corresponding to the other low-fidelity source is more distant, which signals that this source has model form error.
The learned latent positions in Fig. 9(b) suggest that, when calibrated properly, the source without model form error captures the behavior of yh(x) better than the one with model form error. Correspondingly, one may argue that calibrating this source individually (i.e., excluding the other low-fidelity source) may improve performance. To assess this argument, we fit LMGPs to three combinations of the available data sets and compare the performance of these LMGPs in terms of estimating the calibration parameter and emulating yh(x). In all three cases, we use a single categorical variable to encode the data source, and hence the subscript s is appended to the model names. The results are shown in Figs. 9(c) and 9(d) and indicate that using both low-fidelity data sets provides the best performance since (1) the calibration parameter is estimated more consistently, as the distribution is centered at the true value of 0.1 with small variations, and (2) the errors (measured in terms of mean squared error, MSE) for predicting yh(x) are smaller. These observations can be explained by the fact that the largest relative distance between data sources in Fig. 9(b) is on the order of 0.05, which indicates that LMGP finds even the source with model form error to be very similar to yh(x), as this distance scales the correlation function by exp{−(0.05)²} ≈ 0.998. That is, LMGP can distill useful knowledge from the correlation between this source and the others to improve its performance in estimating θ and emulating yh(x). When the source without model form error is excluded from the calibration process and only the erroneous source is used, LMGP provides biased and less consistent estimates for θ and relatively large MSEs for predicting yh(x).
While the distance in the latent space typically encodes model form error that is not reducible by adjusting θ, LMGP may mistake model form error for noise when certain calibration parameter values allow the low-fidelity model to closely match the high-fidelity function. This is the case if we fit LMGP to only yh(x) and the source with model form error. As shown in Fig. 9(e), LMGP places the latent positions for these two sources very close to each other when the other low-fidelity source is excluded. We explain this observation by referring back to Fig. 9(c), where this two-source model finds an estimate of approximately 0.25. Plotting the erroneous low-fidelity source for this value of θ reveals that it can nearly interpolate the training data (Fig. 9(g)). As such, LMGP mistakes 0.25 for the true value of θ and dismisses the small resultant error as noise. This also explains the aforementioned bias and inconsistency in estimating θ across repetitions, as the value that comes closest to interpolating yh(x) differs depending on sampling variations. By contrast, LMGP fit to all data is able to leverage the information from the accurate low-fidelity source to determine that the other source has model form error. And, as expected, no model form error is indicated in the latent space if only the accurate source is used in calibration (Fig. 9(f)).
As this simple example clearly indicates, a simultaneous fusion of multiple (i.e., more than 2) data sources can decrease identifiability issues in calibration. This property is one of the main strengths of our data fusion approach.
3.3.2 Calibration With Severe Model Form Error.
Based on Eq. (18), the true calibration parameter can be either π or 10π, so the range of θ in yl(x) is chosen wide enough to include both values. As shown in Fig. 10(a), considering θ = π implies that the high-fidelity source is either noisy or has a high-frequency component that is missing from the low-fidelity source (note that in realistic applications the functional form of the data sources is unknown, so high-frequency trends can easily be misclassified as noise, in which case they are typically smoothed out, i.e., not learned). Conversely, considering θ = 10π implies that yl(x) is expected to surrogate the high-frequency component of yh(x) and that sin(πx) is the discrepancy. Note that the analytic MSEs (calculated by comparing yh(x) and yl(x) at 10,000 sample points equally spaced over the input range) and cosine similarities (between yh(x) and yl(x), also at 10,000 sample points equally spaced over the input range) are identical for each choice of θ, i.e., both choices yield a discrepancy of the same magnitude, and we cannot determine which choice is better a priori based on MSE or cosine similarity. We are interested in finding out which value is a better estimate for the calibration parameter and whether LMGP is able to consistently infer this value purely from the low- and high-fidelity data sets. We do not corrupt the data sets with noise here and investigate the effect of noise in Sec. 4.2.
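The claim that both candidate values of θ yield the same analytic MSE can be verified numerically. The sketch below assumes yh(x) = sin(πx) + sin(10πx) and yl(x; θ) = sin(θx) on x in [0, 1], which are forms inferred from the discrepancies described above rather than copied from Eq. (18).

```python
# Sanity check: both candidate theta values give an analytic MSE of ~0.5.
import numpy as np

x = np.linspace(0, 1, 10_000)
yh = np.sin(np.pi * x) + np.sin(10 * np.pi * x)     # assumed high-fidelity form
for theta in (np.pi, 10 * np.pi):
    yl = np.sin(theta * x)                          # assumed low-fidelity form
    print(round(float(np.mean((yh - yl) ** 2)), 3))  # ~0.5 for both choices
```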

Fig. 10: Calibration via LMGP: (a) Plot of the underlying functions: Due to model form error, yl(x) is unable to capture the behavior of yh(x) regardless of the choice of θ. Choosing θ = π indicates a discrepancy of sin(10πx), while choosing θ = 10π indicates a discrepancy of sin(πx). Notably, the analytic MSEs (calculated by comparing yh(x) and yl(x) at 10,000 sample points equally spaced over the input range) for both choices of θ are 0.5, i.e., the magnitude of the error is the same for both choices of θ. (b) High-fidelity emulation performance: As we provide more low-fidelity data, LMGP’s performance on high-fidelity emulation increases.
We now explore the effect of the low-fidelity data set size on performance while holding the number of high-fidelity samples constant. Specifically, we examine nl = 30, 100, 200 with nh = 15 in each case. Note that a standard GP trained on only the 15 available high-fidelity samples cannot learn the high-frequency behavior of yh(x) and instead interprets it as noise.
As shown in Fig. 10(b), increasing nl improves high-fidelity prediction and we can therefore consider the estimates of θ and the latent distances in the nl = 200 case to be the most accurate since they maximize prediction performance. Shown in Fig. 11(a) are histograms of the latent distances over 30 repetitions for each case. When few low-fidelity data are available, the latent distances are close to zero; with plentiful data, the latent distances are clustered around 0.5. This indicates that LMGP interprets yh(x) and yl(x) as being closely correlated when we have few low-fidelity data, but consistently learns that yl(x) has a noticeable error with respect to yh(x) as we provide more data. Without sufficient low-fidelity data, LMGP learns the low-frequency behavior of yh(x) which follows sin(πx) and dismisses the high-frequency behavior as noise. Consequently, LMGP finds a small latent distance since yl(x) can capture sin(πx) without error.

Fig. 11: Analysis for the sine wave example: (a) Histogram of latent distances: LMGP estimates distances near zero and 0.5 with few and plentiful data points, respectively. There is a large variance in the latent distances for nl = 100, with a large spike at zero and a cluster near 0.5, which correspond to LMGP’s estimates for nl = 30 and nl = 200, respectively. That is, as the size of the data increases, LMGP’s interpretation of model form error changes. (b) Histogram of estimated θ: As more low-fidelity data are provided, estimates become more closely clustered around 10π. With few low-fidelity data, LMGP guesses θ = π almost half of the time, but with nl = 200 LMGP almost consistently guesses θ = 10π, which means that yl(x) has a high-frequency behavior.
We now examine the histogram of the estimated θ in Fig. 11(b). When few low-fidelity data are available, the estimates are clustered around both π and 10π, while with plentiful data the estimates are tightly clustered around only 10π. This observation indicates that when little data are available, LMGP interprets yh(x) as more closely resembling sin(πx) almost half of the time, which matches the observation on the learned latent distances; i.e., the high-frequency behavior is interpreted as noise and not learned. As more low-fidelity data become available, LMGP is able to learn the high-frequency behavior of yh(x) using the low-fidelity data and interprets yh(x) as more closely resembling sin(10πx).
Why does LMGP prefer θ = 10π with more data? To answer this question, we note that in LMGP, shifting the levels of the categorical variable is expected to reflect a change in data source. With θ = π, the shift in the categorical variable is supposed to “model” sin(10πx), which is much more difficult than the alternative. In other words, LMGP is trying to learn the simplest function that must be represented by a shift in the categorical variable (Fig. 12(a)). We further explore this conjecture by fitting an LMGP to 100 noiseless samples from yh(x) and 200 samples from yl(x). This amount of data is sufficient to learn both the high-frequency behavior of yh(x) and the high-frequency behavior of yl(x) (i.e., the behavior of yl(x) for large θ), and as such, we expect the latent positions and calibration estimates found by LMGP in this case to be optimal. As shown in Fig. 12(b), LMGP finds latent distances near 0.5 and θ = 10π very consistently; i.e., LMGP prefers to estimate the calibration parameters so as to minimize the complexity of the discrepancy function.

Fig. 12: Effect of categorical variable and data set size: (a) Effect of shifting the level: With θ = π, the shift in the categorical variable is supposed to “model” sin(10πx), which is much more difficult than the alternative. (b) Effect of data set size: With nh = 100 and nl = 200, LMGP consistently estimates θ as 10π so that the shift in the categorical variable learns the simplest discrepancy candidate, i.e., sin(πx).
4 Results
To validate our approach in both multi-fidelity and calibration problems, we test our method on analytical functions and assess its performance against competing methods. In each example, we vary the size of the training data and the added noise variance and repeat the training process to account for randomness (20 times for the multi-fidelity problems and 30 times for the calibration problems). The value of the noise variance is not used in training. To measure accuracy, we use 10,000 noisy test samples to obtain the MSE (note that since the test data are noisy, the MSE obtained by an emulator cannot be smaller than the noise variance).
In our LMGP implementation, we always use dz = 2 and restrict −3 ≤ ai,j ≤ 3 during optimization, where ai,j are the elements of the mapping matrix A. When using LMGP for calibration, the search space for each element of θ is restricted to [−2, 3] after scaling the data to the range [0, 1] (i.e., we select a search space larger than the sampling range for θ). We use the modular version of KOH’s approach where we set a uniform prior for θ over the sampling range defined in each problem statement. All optimizations are done with the L-BFGS method, which is a second-order gradient-based optimization technique.
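For reference, the settings listed above can be collected into a single configuration object; the field names below are our own and simply mirror the choices stated in the text.

```python
# Illustrative summary of the experimental settings described above.
from dataclasses import dataclass

@dataclass
class FusionConfig:
    dz: int = 2                        # dimension of the learned latent space
    a_bounds: tuple = (-3.0, 3.0)      # bounds on each element a_ij of A
    theta_bounds: tuple = (-2.0, 3.0)  # search space for scaled calibration inputs
    optimizer: str = "L-BFGS-B"        # gradient-based optimizer used for MLE
    n_test: int = 10_000               # noisy test samples used to compute MSE

config = FusionConfig()
```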
4.1 Multi-Fidelity Results.
We consider two analytical problems with high-dimensional inputs. In the first multi-fidelity problem, we consider a set of four functions, given in Eq. (19), that model the weight of a light aircraft wing [45].
These functions are ten-dimensional and have varying degrees of fidelity where, following the notation introduced in Sec. 3, yh(x) has the highest fidelity. Note that in one of the low-fidelity sources we multiply Wp by zero, which is equivalent to reducing the dimensionality of the function by one. As enumerated in Table 2, the low-fidelity functions are listed in decreasing order with respect to accuracy; that is, yl1(x) and yl3(x) are the most and least accurate models, respectively. Table 2 is generated by evaluating the four functions in Eq. (19) on the same 10,000 inputs as described in Sec. 3.2 (no noise is added to the outputs). This knowledge of the relative accuracy of the data sources is not used when fitting an LMGP.
Table 2: Relative accuracy of functions for the wing-weight problem

Function | yl1(x) | yl2(x) | yl3(x) |
RRMSE | 0.19912 | 1.1423 | 5.7484 |

Note: The functions are listed in decreasing order with respect to accuracy, with yl3(x) being especially inaccurate. 10,000 points are used in calculating RRMSE.
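For readers who want to reproduce the setup, the sketch below implements the standard ten-dimensional wing-weight benchmark commonly associated with Ref. [45]; we assume the paper's high-fidelity source yh(x) takes this standard form, and the low-fidelity variants of Eq. (19) are not reproduced here.

```python
# Assumed form of the standard wing-weight benchmark (high-fidelity source).
import numpy as np

def wing_weight(Sw, Wfw, A, Lam_deg, q, lam, tc, Nz, Wdg, Wp):
    """Ten-dimensional wing-weight function; Lam_deg is the sweep angle in degrees."""
    Lam = np.deg2rad(Lam_deg)
    return (0.036 * Sw**0.758 * Wfw**0.0035 * (A / np.cos(Lam)**2)**0.6
            * q**0.006 * lam**0.04 * (100 * tc / np.cos(Lam))**(-0.3)
            * (Nz * Wdg)**0.49 + Sw * Wp)
```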
We consider various amounts of available low-fidelity data, with and without noise. We also compare the two different settings introduced in Sec. 3.2 where subscripts s and m indicate whether a single or multiple categorical variables are used to encode the data sources in LMGP. We only take 15 samples for yh(x), which is a very small number given the high dimensionality of the input space. Additionally, we investigate the effect of fusing the four datasets jointly against fusing the high-fidelity data with each of the low-fidelity sources (in the former case the subscript All is appended to LMGP while in the latter case l1, l2 or l3 is used in the subscript depending on which source is used in addition to yh(x)).
The results are summarized in Fig. 13 and indicate that the different versions of LMGP consistently outperform traditional GPs (fitted only to the high-fidelity data) in all cases, even when only the least accurate data source is used to augment high-fidelity emulation. This superior performance of LMGP is due to taking advantage of the correlations between datasets, which compensates, to some extent, for the sparsity of the high-fidelity data. LMGP also has an advantage in consistency, with fewer MSE outliers observed compared to GP. This consistency indicates that our modeling assumptions (e.g., how to encode the data source) only marginally affect the performance in this example.

Fig. 13: High-fidelity emulation performance for the wing weight example: Performance of the LMGP strategies follows the same trend as data source accuracy for all cases, with LMGP using only the most accurate low-fidelity source arguably outperforming LMGP using all data sources. (a) nh = 15, σ2 = 0: LMGP using all data sources provides consistent estimates with some outliers. (b) nh = 15, σ2 = 25: LMGPs All performs noticeably better than other LMGP strategies for this case. (c) nh = 15, σ2 = 0: LMGP using only the most accurate low-fidelity source arguably outperforms LMGP using all data sources by a very slim margin. (d) nh = 15, σ2 = 25: Both LMGP strategies that use all data sources outperform those that only use this source and yh(x) by a slim margin.
In the cases without noise, i.e., Figs. 13(a) and 13(c), LMGPs fit to the data from the most accurate low-fidelity source and yh(x) perform on par with or better than the LMGPs that are fit to all data, and the small differences are mostly due to sample-to-sample variations. However, in the cases with noise, i.e., Figs. 13(b) and 13(d), using all the data sets improves the performance of LMGP. We explain this observation as follows: in the noiseless cases, LMGP is able to quite accurately learn the behavior of yh(x) using just the most accurate low-fidelity source, and using all four data sets provides no additional advantage in learning yh(x) while (1) requiring the estimation of additional hyperparameters (in the A matrix) and (2) compromising the estimate of Ωx to handle the discrepancies between the four sources. By contrast, in the cases with noise, one source is insufficient for LMGP to reach the threshold in emulation accuracy (which equals the noise variance) for yh(x). Including additional data sources in these cases helps LMGP to differentiate noise from model form error.
For the remainder of this example, we investigate the most challenging version, which has the fewest available data and the highest level of noise. The latent space learned by LMGPs All for this problem, shown in Fig. 14(a), is once again a powerful diagnostic tool. Even though LMGP only has access to 15 noisy samples from the ten-dimensional function yh(x), the relative distances between the latent positions match the relative accuracies of the data sources with respect to yh(x). The distance between yh(x) and one of the low-accuracy sources is ≈ 0.4, yielding an approximate correlation of exp{−(0.4)²} ≈ 0.85 (a minimal sketch of this computation is given below), which means that LMGP still uses information from that source when predicting the response of yh(x) despite its low accuracy.
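To make the link between latent distance and correlation concrete, the following minimal sketch (assuming the latent contribution to the correlation has the form exp{−‖Δz‖²}, which is consistent with the numbers quoted above) reproduces the ≈ 0.85 value from a distance of 0.4. The function name latent_correlation is ours and not part of any released code.

```python
import numpy as np

def latent_correlation(z_s, z_t):
    """Correlation implied by the distance between two latent positions,
    assuming the latent part of the kernel is exp(-||z_s - z_t||^2)."""
    d = np.linalg.norm(np.asarray(z_s, dtype=float) - np.asarray(z_t, dtype=float))
    return np.exp(-d**2)

# A latent distance of about 0.4 gives exp(-0.16) ~ 0.85, as quoted in the text.
print(latent_correlation([0.0, 0.0], [0.4, 0.0]))
```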

Fig. 14: Effect of constraints on the latent space: (a) Default constraints: The latent space for one sample repetition of LMGP fit to all available data for the wing-weight function with nh = 15, , σ2 = 25. The sources are positioned at, respectively, the origin, the positive z1-axis, and the first or second quadrant. While the learned latent spaces differ across the 30 repetitions, the relative latent distances are consistent both for different repetitions and for different amounts of data/noise; we only show the latent space of a randomly selected repetition. (b) Alternate constraints: The training procedure and data are exactly the same as before except that the three constraints are now applied to a different ordering of the data sources. Note that the relative distances between data sources are the same in the two plots.
We impose a number of constraints to obtain a unique solution for the latent positions since our objective function in Eq. (8) is invariant under translation, rotation, and reflection of the latent points. For a two-dimensional latent space, we fix the first position to the origin, the second position to the positive z1-axis, and the third position to the z2 > 0 half-plane. As mentioned in Sec. 3.2, we also assign the data sources to these positions sequentially, with yh(x) at the origin, for easier visualization of the relative correlations. While the assignment of data sources to latent positions affects the learned positions themselves, the relative distances between them remain the same, as shown in Fig. 14(b) and illustrated by the sketch below. Since we typically know which data source has the highest fidelity, the learned latent space of LMGP provides an extremely easy way to assess the fidelity of the other sources with respect to it.
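As an illustration of these constraints, the sketch below (our own post-hoc canonicalization, not necessarily how the constraints are enforced during training) maps an arbitrary set of 2D latent positions into the constrained configuration described above while preserving all pairwise distances.

```python
import numpy as np

def canonicalize_latent(Z):
    """Map 2D latent positions into the constrained configuration described above.

    Minimal sketch (not the authors' implementation): the first position is moved
    to the origin, the second is rotated onto the positive z1-axis, and z2 is
    flipped if needed so the third position lies in the z2 > 0 half-plane.
    Pairwise distances between positions are unchanged.
    """
    Z = np.asarray(Z, dtype=float)
    Z = Z - Z[0]                          # translation: first point at the origin
    angle = np.arctan2(Z[1, 1], Z[1, 0])  # angle of the second point
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, -s], [s, c]])
    Z = Z @ R.T                           # rotation: second point on the positive z1-axis
    if Z.shape[0] > 2 and Z[2, 1] < 0:
        Z[:, 1] *= -1                     # reflection: third point in the z2 > 0 half-plane
    return Z

# Example: three arbitrary latent positions; relative distances are preserved.
print(canonicalize_latent([[1.0, 1.0], [2.0, 1.5], [0.5, 2.0]]))
```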
Prediction performance on the low-fidelity sources for LMGPs All, shown in Fig. 15, follows the same trend as data source accuracy; i.e., it is best for the most accurate source and worst for the least accurate one. When fitting LMGP to multiple data sources, we expect prediction accuracy to be high on sources that are well correlated with others, i.e., whose latent positions are close together or form a cluster. Leveraging information from a well-correlated source improves prediction performance more than leveraging a poorly correlated one, so each source in a cluster gains a boost in prediction performance from the information of the other sources in that cluster. In this case, yh(x) and two of the low-fidelity sources form a cluster, and as such, the MSEs for those two sources are much lower than for the remaining one.
The above equations indicate that all low-fidelity functions have nonlinear model form discrepancies. To roughly quantify these discrepancies, we follow the same procedure as in the previous example and calculate RRMSEs (Table 3; a minimal sketch of this calculation is given after the table). As can be seen, the accuracy of the models increases with i (unlike the previous example; LMGP is robust with respect to this choice).
Table 3: Relative accuracy of functions for borehole problem

      | yl1(x) | yl2(x) | yl3(x)
RRMSE | 3.6671 | 1.3688 | 0.36232

Note: The functions are listed in increasing order of accuracy, with yl3(x) being the most accurate by a significant margin.
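The following minimal sketch shows how such an RRMSE could be computed. We assume here that RRMSE is the root mean squared discrepancy between a low-fidelity source and the high-fidelity one, normalized by the standard deviation of the high-fidelity response over the test points; the exact normalization used in the paper may differ slightly.

```python
import numpy as np

def rrmse(y_low, y_high):
    """Relative RMSE of a low-fidelity source with respect to the high-fidelity one.

    Sketch only: RMSE of the discrepancy normalized by the standard deviation of
    the high-fidelity response (assumed normalization, see the lead-in text).
    """
    y_low, y_high = np.asarray(y_low), np.asarray(y_high)
    return np.sqrt(np.mean((y_low - y_high) ** 2)) / np.std(y_high)

# Usage: evaluate both sources at the same (typically large) set of test inputs.
rng = np.random.default_rng(0)
y_h = rng.normal(size=1000)
print(rrmse(y_h + 0.1 * rng.normal(size=1000), y_h))  # small value -> accurate source
```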
We consider various amounts of available low-fidelity data, with and without noise. We also use a few combinations for training LMGP based on which data sets are selected and how the data sources are encoded. The results are summarized in Fig. 16 where, once again, LMGP convincingly outperforms GP in high-fidelity emulation, especially with noisy data (Figs. 16(b) and 16(d)). The overall performance trends among the LMGP strategies are consistent across the various cases, with LMGP fit to only one low-fidelity source performing worse than LMGP fit to all data sources and with LMGPs All performing the best. LMGPm All yields inconsistent results with nl = 50 or nl = 100, especially in the latter case where the box plots stretch to include the outliers. This behavior is due to overfitting and the fact that many latent positions must be placed in the latent space via a simple matrix-based map (256 positions and 32 elements in the A matrix; see the counting sketch below). Note that even with these inconsistencies, LMGPm All frequently outperforms GP and the LMGPs fit to a single low-fidelity source, which indicates that using more than two data sets in fusion is indeed beneficial.
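The following sketch reproduces this bookkeeping for the two encoding strategies. The counts for strategy m match those quoted in the text; the strategy s counts assume a one-hot encoding of a single categorical variable (an assumption on our part), and the function name is ours.

```python
def latent_bookkeeping(n_sources, dz=2):
    """Count latent positions and elements of A for the two encoding strategies.

    Strategy s: a single categorical variable whose levels index the data sources
    (one-hot encoding assumed). Strategy m: one categorical variable per source,
    each with n_sources levels, which reproduces the counts quoted in the text.
    """
    strategy_s = {"positions": n_sources, "A_elements": n_sources * dz}
    strategy_m = {"positions": n_sources ** n_sources,
                  "A_elements": (n_sources * n_sources) * dz}
    return strategy_s, strategy_m

print(latent_bookkeeping(4))  # strategy m: 256 positions, 32 elements in A
print(latent_bookkeeping(2))  # strategy m: 4 positions, 8 elements in A
print(latent_bookkeeping(3))  # strategy m: 27 positions, 18 elements in A
```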

Fig. 16: High-fidelity emulation performance for the borehole problem: (a) nh = 15, , σ2 = 0: LMGP strategies that use all data sources perform better than those using only one data source, with LMGPs All performing the best. (b) nh = 15, , σ2 = 6.25: LMGPs All performs noticeably better than other LMGP strategies for this case. (c) nh = 15, , σ2 = 0: LMGPs All again performs noticeably better than other LMGP strategies for this case. LMGPm All displays inconsistency in its estimates. (d) nh = 15, , σ2 = 6.25: LMGPs All again performs noticeably better than other LMGP strategies for this case. LMGPm All again displays inconsistency in its estimates.
The learned latent space of LMGPs All for the most challenging version of this problem (noisy samples and the fewest available data) is shown in Fig. 17(a). It clearly indicates that the relative distances among the latent positions match the relative accuracies of the low-fidelity sources with respect to the high-fidelity one: the position of the most accurate low-fidelity source is very close to that of yh(x), so LMGP weighs data from that source heavily when emulating yh(x) and vice versa. The position of the second most accurate source is also close to both, but relatively more distant from yh(x).

Fig. 17: Effects of correlations between data sources for borehole example: (a) Latent space: The latent space for one sample repetition of LMGP fit to all available data for the borehole function with nh = 15, , σ2 = 6.25. While the individual latent spaces are different for each repetition, the relative latent distances are consistent both for different repetitions and for different amounts of data/noise. (b) Low-fidelity MSEs: Low-fidelity prediction accuracy is better for the two more accurate sources than for the least accurate one.
As in our first example, prediction performance on the low-fidelity sources for LMGPs All, shown in Fig. 17(b), follows the same trend as data source accuracy; i.e., it is best for the two more accurate sources and worst for the least accurate one. As mentioned before, we expect prediction accuracy to be high on sources whose latent positions are close together or form a cluster. In this case, yh(x) and the two more accurate low-fidelity sources form a cluster, and as such, their MSEs are much lower than that of the remaining source.
4.2 Calibration Results.
We compare our calibration approach to that of KOH by considering four test cases with varying degrees of complexity. Note that, while LMGP can simultaneously assimilate and calibrate any number of sources, KOH’s approach only works with two data sets at a time and relies on repeating the process as many times as there are low-fidelity sources.
Relative accuracy of functions for simple calibration problem

RRMSE | 0.22241 | 0.1285

Note: We find the RRMSE in calibration problems using the same method as before but with the calibration parameters fixed to their true values at all input points. Both low-fidelity functions are relatively accurate, with one noticeably more accurate than the other.
We show high-fidelity emulation performance for this problem in Fig. 18 where, similar to Sec. 4.1, LMGPs are trained under various settings in terms of which data sources are selected and how they are encoded. As can be observed, LMGP performs on par with or better than KOH's approach in high-fidelity emulation accuracy in all cases, and LMGPs All offers the most consistent performance in most of them. LMGP also performs particularly well in the cases with noise (Figs. 18(b) and 18(d)). Despite the inaccuracy of the less accurate low-fidelity source, LMGP fit to all data sources offers the most accurate emulation in all cases.

Fig. 18: High-fidelity emulation performance: (a) nh = 3, , σ2 = 0: LMGP strategies generally perform better than KOH's approach, with LMGPs All performing the best. Estimates for all strategies except LMGPs All are fairly inconsistent. (b) nh = 3, , σ2 = 2 · 10−5: LMGPs All performs noticeably better than other LMGP strategies for this case (and better than KOH's approach). (c) nh = 3, , σ2 = 0: With the addition of more low-fidelity data, all approaches perform better. LMGPs All performs best by a very slim margin, and is more consistent in its performance than comparable strategies. (d) nh = 3, , σ2 = 2 · 10−5: With noise, performs nearly on par with LMGPs All and produces more consistent performance.
We next show calibration performance in Fig. 19, where LMGPs All consistently outperforms KOH's approach in both accuracy and consistency, especially in the noiseless cases (Figs. 19(a) and 19(c)). Notably, KOH's approach yields biased estimates. With noise and little data (Fig. 19(b)), neither LMGP nor KOH's approach is able to obtain a consistent estimate of the calibration parameter across the repetitions. When more low-fidelity data are provided (Fig. 19(d)), LMGP leverages the additional data to find a consistent estimate of θ while KOH's approach does not improve in consistency.

Fig. 19: Calibration performance: (a) nh = 3, , σ2 = 0: LMGP offers consistent and unbiased estimates. KOH's approach suffers from bias and inconsistency. (b) nh = 3, , σ2 = 2 · 10−5: All approaches yield inconsistent estimates. (c) nh = 3, , σ2 = 0: Both KOH's approach and LMGP yield consistent estimates, but KOH's approach still suffers from bias. (d) nh = 3, , σ2 = 2 · 10−5: LMGP achieves higher consistency than KOH's approach with the addition of more low-fidelity data. LMGP's estimate is unbiased, while KOH's approach still yields biased estimates.
We show the latent space from fitting LMGP to the most challenging version of this problem, i.e., nh = 3, , σ2 = 2 × 10−5, in Fig. 20(a). As demonstrated there, LMGP is able to accurately infer the correlations with only three noisy high-fidelity samples, as the relative latent distances match the relative accuracies of the data sources. Thus, we expect low-fidelity prediction performance to be better for the source whose latent position is closer to that of yh(x), since LMGP leverages more information from yh(x) in predicting that source. We verify this expectation by examining low-fidelity prediction performance in Fig. 20(b), which indicates that prediction accuracy is indeed better for the closer, more accurate source.

Fig. 20: Effects of correlations between data sources: (a) Latent space: The latent space for one sample repetition of LMGP fit to all available data with nh = 3, , σ2 = 2 × 10−5. While the individual latent spaces are different for each repetition, the relative latent distances are consistent both for different repetitions and for different amounts of data/noise. (b) Low-fidelity MSEs: Low-fidelity prediction accuracy is better for the source whose latent position is closer to that of yh(x).
Next, we reconsider the example in Eq. (18) where θ = π and θ = 10π are the two valid choices for the true calibration parameter, as discussed in Sec. 3.3. We fit LMGP with both approaches to categorical variable selection and consider various amounts of available low-fidelity data, all with noise (the noiseless case is considered in Sec. 3.3).
The high-fidelity emulation performance is summarized in Fig. 21, which indicates that LMGP outperforms KOH's approach by a similar margin in each case. Notably, LMGP's performance is robust to the choice of categorical variable assignment for this problem, as we see a similar variation in performance over the repetitions for LMGPs All and LMGPm All. We explain this by noting that, since there are only two data sources, LMGPm All finds a total of 2² = 4 latent positions with (2 + 2) × 2 = 8 elements in A, which indicates that overfitting should not be a concern.

Fig. 21: High-fidelity emulation performance for sin wave example: (a) nh = 30, nl = 30, σ2 = 0.09, (b) nh = 30, nl = 60, σ2 = 0.09, (c) nh = 30, nl = 100, σ2 = 0.09, and (d) nh = 30, nl = 200, σ2 = 0.09. LMGP outperforms KOH's approach by a similar margin in all cases.
The estimates of the calibration parameter are provided in Fig. 22 and indicate that the estimation consistency of both approaches increases as nl is increased from 30 to 200, with the increase being more prominent for LMGP. However, while LMGP converges on θ = 10π, the estimates from KOH's approach are approximately evenly split between π and 10π. This behavior arises because the L2 distances of sin(10πx) and sin(πx) from yh(x) are the same, and hence KOH's approach cannot favor one over the other [21,47,48]. As explained in Sec. 3.3, LMGP converges on θ = 10π because this choice not only provides a simpler discrepancy but also enables learning the high-frequency nature of yh(x).

Fig. 22: Calibration performance for sin wave problem: (a) nh = 30, nl = 30, σ2 = 0.09, (b) nh = 30, nl = 60, σ2 = 0.09, (c) nh = 30, nl = 100, σ2 = 0.09, and (d) nh = 30, nl = 200, σ2 = 0.09
Finally, we show histograms of latent distances learned by LMGP in Fig. 23. The trends are quite similar to those seen in Sec. 3.3, with the latent distances being close to 0 for low amounts of low-fidelity data and converging on 0.5 as the amount of data is increased. When high-fidelity data are insufficient to learn the high-frequency behavior of yh(x), LMGP treats the high-frequency behavior as noise and finds yh(x) ≈ sin(πx). When low-fidelity data are also insufficient, LMGP cannot learn the behavior of yl(x) at high frequencies (i.e., for large θ). Thus, LMGP finds θ = π, which implies yl(x) = sin(πx), i.e., no model form error and a corresponding latent distance near zero. With sufficient low-fidelity data, however, LMGP learns the behavior of yl(x) for large θ and finds that θ = 10π yields a less complex discrepancy between yh(x) and yl(x).

Fig. 23: Histogram of latent distances for sin wave problem: (a) nh = 30, nl = 30, σ2 = 0.09, (b) nh = 30, nl = 60, σ2 = 0.09, (c) nh = 30, nl = 100, σ2 = 0.09, and (d) nh = 30, nl = 200, σ2 = 0.09
Relative accuracy of functions for borehole calibration problem

RRMSE | 0.049219 | 0.19838

Note: Both low-fidelity functions are relatively accurate, with one less accurate than the other.
We hold nh = 25 and nl = 100 constant and examine two cases, one without noise and one with noise applied to the samples (σ2 = 100, with Range(yh(x)) ≈ 974 over the input range), and again fit LMGP with various strategies. In both cases, LMGP convincingly outperforms KOH's approach in high-fidelity emulation; see Fig. 24. Notably, LMGP outperforms KOH's approach given equivalent access to data, e.g., when both are fit to the same pair of data sources. LMGP's performance is also robust to the modeling choices, which we explain by noting that with three data sources the m strategy for categorical variable selection yields 3³ = 27 latent positions and 2 × (3 × 3) = 18 elements of A; i.e., the number of latent positions is of the same order of magnitude as the number of hyperparameters in A, and the size of the dataset is large relative to the number of hyperparameters.

Fig. 24: High-fidelity emulation performance: (a) nh = 25, , σ2 = 0: LMGPm All arguably performs better than LMGPs All. (b) nh = 25, , σ2 = 100: Results with noise are quite similar to those without.
As shown in Fig. 25(a) for the noiseless case, the latent positions found by LMGPs All show no model form error for one of the low-fidelity sources and little model form error for the other; i.e., LMGP mistakes the former's model form error for noise because that error is so small. While these latent positions are not fully accurate, since that source does still have model form error, the relative distances do correctly indicate which low-fidelity source is more accurate. With noise, shown in Fig. 25(b), the relative distances to yh(x) are nearly the same for both low-fidelity sources, with one only slightly closer to yh(x) than the other, which indicates that LMGP has more difficulty determining the magnitudes of the errors in the low-fidelity data sources in this case. The latent distances are quite small in both cases, which reflects the fact that both low-fidelity data sources are relatively accurate when calibrated appropriately.

Fig. 25: Latent positions: (a) nh = 25, , σ2 = 0: LMGP finds no model form error for one of the low-fidelity sources and instead mistakes its error for noise. (b) nh = 25, , σ2 = 100: LMGP correctly finds little error for both sources but is unable to accurately determine the relative magnitudes of those errors.
Calibration performance, shown in Fig. 26, reveals inconsistent estimates of θ1 but consistent estimates of θ2 for both LMGP and KOH's approach in all cases. We explain this by noting that the main (first-order) Sobol sensitivity indices of the low-fidelity functions (calculated using 10,000 inputs sampled via a Sobol sequence; a minimal sketch of such a sensitivity calculation is given below) are on the order of 10−4 for θ1 and 10−1 for θ2; i.e., variation in θ1 has very little effect on their outputs, so we expect θ1 to be very difficult to estimate. While LMGP's estimates of θ1 suffer from high variance, their distributions are centered on the true parameter in both cases. By contrast, KOH's approach produces biased estimates in all cases, although it finds nearly the correct parameter almost half the time in the case with noise (Fig. 26(b)). Both methods estimate θ2 quite accurately and consistently; KOH's approach has lower variance in its estimates but more outliers compared to LMGP's estimates using all data sources.
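For reference, the sketch below implements a standard Saltelli-type estimator of first-order Sobol indices using a Sobol sequence. It is applied to a stand-in toy function rather than the paper's borehole models, and the function names are ours, not from any released code.

```python
import numpy as np
from scipy.stats import qmc

def first_order_sobol(f, dim, m=13, seed=0):
    """Saltelli-type estimator of first-order Sobol indices S_i = V_i / Var(Y).

    f   : vectorized function taking an (n, dim) array of inputs in [0, 1]^dim
    dim : number of inputs
    m   : uses 2**m base samples (a power of two keeps the Sobol sequence balanced)
    """
    n = 2 ** m
    base = qmc.Sobol(d=2 * dim, scramble=True, seed=seed).random(n)
    A, B = base[:, :dim], base[:, dim:]
    fA, fB = f(A), f(B)
    var_y = np.var(np.concatenate([fA, fB]))
    S = np.empty(dim)
    for i in range(dim):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                     # replace the i-th column of A with that of B
        S[i] = np.mean(fB * (f(ABi) - fA)) / var_y
    return S

# Stand-in toy function (NOT the borehole model): the last input mimics a weakly
# influential calibration parameter, so its first-order index is near zero.
toy = lambda X: np.sin(2 * np.pi * X[:, 0]) + 0.5 * X[:, 1] + 0.01 * X[:, 2]
print(first_order_sobol(toy, dim=3))
```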

Fig. 26: Calibration performance: (a) θ1 for nh = 25, , σ2 = 0: KOH's approach produces biased estimates, while LMGP's estimates are centered on the correct parameter with high variance. (b) θ1 for nh = 25, , σ2 = 100: KOH's approach again produces biased estimates, with the caveat that it finds nearly the correct parameter almost half the time. (c) θ2 for nh = 25, , σ2 = 0: All methods find the correct parameter consistently; one strategy yields the most accurate and consistent estimates, while another has some outliers. (d) θ2 for nh = 25, , σ2 = 100: Both of KOH's fits have outliers but estimate the correct parameter more consistently than LMGP.
We examine one case with very small noisy data sets in which we set nh = 15, nl = 50, and σ2 = 16. We fit only LMGPs All as it is generally the best-performing model. LMGP consistently outperforms KOH's method in high-fidelity emulation (Fig. 27(a)). Additionally, the latent space learned by LMGP shows model form error for all three low-fidelity sources, with the relative distances between the sources roughly matching their relative accuracies (Fig. 27(b)). Both KOH's method and LMGP perform poorly in calibration for all four parameters. We explain this by noting that this problem suffers from identifiability issues and that the calibration parameters have both small Sobol sensitivities and low interactions (i.e., even increasing the number of data points will not resolve the issue). Notably, while LMGP displays inconsistent calibration estimates for each parameter, KOH's method shows misleadingly consistent but biased estimates which are often quite far from the true calibration parameter (Fig. 28(c)). LMGP shows more uncertainty than KOH's method, which more accurately reflects the nature of the problem, and its learned latent space can help the analyst detect identifiability issues.

Fig. 27: Analysis for the wing-weight problem: (a) High-fidelity emulation performance: LMGP displays much more consistency than KOH's method. (b) Latent space: The latent space accurately reflects the relative accuracies of the data sources.

Fig. 28: Calibration performance for the wing-weight problem: We do not show the calibration results obtained with the least accurate low-fidelity source as they are very poor due to its low accuracy.
5 Conclusion
In this paper, we present a novel latent-space-based approach for data fusion (i.e., multi-fidelity modeling and calibration) via latent-map Gaussian processes or LMGPs. Our approach offers unique advantages that can benefit engineering design in a number of ways such as improved accuracy and consistency compared to competing methods for data fusion. Additionally, LMGP learns a latent space where data sources are embedded with points whose distances can shed light on not only the relations among data sources but also potential model form discrepancies. These insights can guide diagnostics or determine which data sources cannot be trusted.
Implementation and use of our data fusion approach are quite straightforward as it primarily relies on modifying the correlation function of traditional GPs and assigning appropriate priors to the datasets. LMGP-based data fusion is also quite flexible in terms of the number of data sources. In particular, since we can assimilate multiple data sets simultaneously, we improve prediction performance and decrease non-identifiability issues that typically arise in calibration problems.
Since LMGPs are extensions of GPs, they are not directly applicable to extrapolation or big/high-dimensional data. However, extensions of GPs that address these limitations [27,38,41–44,49] can be incorporated into LMGPs. In our examples, we assumed all data sources are noisy and hence used a single parameter to estimate the noise. To accommodate different (unknown) noise levels, we would need a separate noise parameter for each data source; a minimal sketch of this extension is given below. We also note that the performance of LMGP in fusing small data can be greatly improved by endowing its parameters with priors and using Bayes' rule for inference. In this case, the latent space will have a probabilistic nature, the trained model will be more robust to overfitting, and the prediction uncertainties will be more accurate. Lastly, we have studied small-data scenarios and have not explored the effects of large data sets on the consistency of hyperparameter estimation. A detailed convergence study is needed to determine how the hyperparameters and the learned manifold are affected as the data set sizes grow. These directions will be investigated in our future works.
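The per-source noise extension can be sketched as follows (our own illustration, not the paper's implementation): instead of adding a single nugget σ² to the diagonal of the covariance matrix, each training point receives the noise variance of the data source it came from.

```python
import numpy as np

def add_per_source_nugget(K, source_idx, noise_vars):
    """Add a source-specific noise variance to a GP covariance matrix.

    Minimal sketch of the extension discussed above (not the paper's code).
    K          : (n, n) noise-free covariance matrix
    source_idx : length-n array mapping each training point to its source id
    noise_vars : array of per-source noise variances, one entry per data source
    """
    K = np.asarray(K, dtype=float).copy()
    nugget = np.asarray(noise_vars, dtype=float)[np.asarray(source_idx)]
    K[np.diag_indices_from(K)] += nugget   # per-point nugget instead of a single sigma^2
    return K

# Usage: three training points from sources 0, 0, and 1 with different noise levels.
K = np.eye(3)
print(add_per_source_nugget(K, source_idx=[0, 0, 1], noise_vars=[0.1, 2.0]))
```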
Lastly, we note that the proposed method can, in principle, be applied to multi-response data sets with no modifications: we would treat each response as if it were a data source and then apply our data fusion method directly. However, with this strategy, each "data source" would have the exact same set of input points, which will most likely cause numerical issues. While LMGP can be applied to multi-response data sets with some modifications (which may be presented in a future paper), the user should bear in mind that we do not necessarily expect, a priori, any level of correlation between the responses, whereas in multi-fidelity problems we expect (but do not necessarily have) some correlation since all sources model the same system. Thus, we recommend fitting LMGP to all responses and examining the latent space to see which responses are well correlated, and then fitting individual emulators to the uncorrelated responses while fitting an LMGP to whichever groups of responses are correlated with each other.
Footnotes
Multiplicative terms have also been introduced to KOH’s approach but are seldom adopted as they increase the identifiability issues and computational costs while negligibly improving the mean prediction accuracy.
We have tried a binary encoding version of this strategy where each data source is assigned its own categorical variable with two levels, with 0 indicating that the source is inactive and 1 that it is active. We found the results of this case to be similar to those of strategy 2 presented in the paper.
Generally built via computer simulations.
Acknowledgment
This work was supported by the Early Career Faculty grant from NASA’s Space Technology Research Grants Program (Award Number 80NSSC21K1809).
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The data sets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.
Nomenclature
- t =
matrix or vector encoding of the categorical combinations used in LMGP
- A =
matrix of hyperparameters of LMGP which determine the latent positions of the categorical combinations
- R =
correlation matrix for LMGP
- nh, =
respectively, the number of training data for the high-fidelity source and ith low-fidelity source. When all low-fidelity sources have the same number of training data, we simply use nl
- yh, , yh(x), yh =
respectively, a vector containing training outputs, the ith training output, the underlying data source, and the output of the underlying data source
- Xh, , xh =
respectively, the matrix of training inputs, the ith training input, and the input to the data source. In the case that the input is one-dimensional, these become xh, , and xh, respectively
- =
distance between, e.g., yh(x) and in the latent space. In the case that there are only two points in the latent space, we shorten this to just Δz
- θ, θ*, =
respectively, the calibration inputs, true calibration parameters, and estimated calibration parameters. In general, we use an asterisk to denote the true value of a parameter and a hat to denote an estimate
- σ2 =
noise variance
- , =
matrix of roughness parameters ωi for the numerical and calibration inputs, respectively
Subscripts
- h =
high-fidelity source
- li =
ith low-fidelity source. We use this and the above subscript to denote data sources and their corresponding latent points, e.g., yh(x) or . We also use this subscript to refer to strategies of KOH's approach or LMGP which are fit to only yh and the ith low-fidelity source
- s, m =
respectively, strategy 1 and strategy 2 for categorical variable assignment during preprocessing of data for LMGP. We combine this with the above subscripts to fully describe a fitting strategy, e.g., denotes LMGP fit to only yh and using strategy 1 for preprocessing the data
- All =
a strategy of LMGP fit to data from all available sources