Theory – Tips and Tricks

UptimAI is a technology start-up company focused on software development and data analysis in engineering.

Software development:

  • Developing a breakthrough algorithm for propagation of uncertainty and surrogate modelling
  • Provides unique capabilities that other tools cannot offer
  • Makes engineering simulations more reliable and closer to reality

Data analysis in engineering:

  • Creating mathematical models using sparse data
  • Deeper insight into the design process
  • Optimization of design based on measurements

Theory behind #

Mathematical model #

The UptimAI algorithm is based on our unique evolution of the HDMR (High Dimensional Model Representation) method. The model consists of a set of independent so-called increment functions. Each increment function is treated independently, i.e. an independent model is created for each increment function.
These increments are additive, so their sum together with the nominal solution gives the model of the output values.
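The additive structure described above can be sketched in a few lines of Python. This is an illustrative toy model, not the UptimAI implementation: the function name, the dictionary layout, and the toy increments are all assumptions made for the example. Note that each increment is zero at the nominal input, so the model reduces to the nominal solution there.

```python
def hdmr_output(nominal, first_order, second_order, x):
    """Evaluate an additive increment model at input vector x.

    nominal      -- scalar output of the nominal solution
    first_order  -- dict {i: f_i}, f_i(x_i) is zero at the nominal input
    second_order -- dict {(i, j): f_ij}, zero when either input is nominal
    """
    y = nominal
    for i, f in first_order.items():
        y += f(x[i])                 # first-order increments are additive
    for (i, j), f in second_order.items():
        y += f(x[i], x[j])           # interaction increments add on top
    return y

# Toy model with nominal input (0, 0); every increment vanishes there.
model = hdmr_output(
    nominal=1.0,
    first_order={0: lambda a: 2.0 * a, 1: lambda b: b ** 2},
    second_order={(0, 1): lambda a, b: 0.5 * a * b},
    x=(1.0, 2.0),
)
print(model)  # 1 + 2*1 + 2**2 + 0.5*1*2 = 8.0
```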

Nominal Solution #

The nominal solution represents the standard deterministic solution. The combination of inputs for this solution is usually set to the assumed most common values from the ranges of possible input values.

First Order Increment Functions #

They represent the increment of one variable to the nominal solution considering only the influence of the given variable cleared of all other effects. Models with only first-order increments have no interactions of input parameters and can be easily understood and optimised.

The model's behaviour can also be well predicted when first-order increments are the most influential according to the sensitivity analysis.

Second Order Increment Functions #

They represent the increment of an interaction of two input variables to the nominal solution, considering only the interaction influence of the given variables cleared of all other effects. A model with second-order increments becomes more complex, so software assistance is needed to visualize and analyze its behaviour.

Interactions of input variables tend to affect extreme values of outputs and increase the variance of results.

Third and higher Order Increment Functions #

They represent the increment of an interaction of three or more input variables to the nominal solution, clearing all other effects. Models with higher-order increments are too complicated to be processed without specialized software.

Higher-order interactions usually affect the variance of results strongly and require special attention to extreme values.

Total Increment Functions #

Total Increment Functions are the sum of all lower-order increment functions of all input variables involved in an interaction. Each total increment is independent and represents the total increment to the nominal solution of its input variables.
They are very useful for visualization of high-dimensional spaces and, in general, can be summed to obtain a better design. However, their addition does not give the output function directly.
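A short numeric sketch shows why summing total increments does not reproduce the output function. The increments below are hypothetical toy functions chosen for the example: the totals of the pairs (x1, x2) and (x1, x3) both contain the shared first-order increment f1, so their sum counts it twice.

```python
# Toy increment functions (all zero at the nominal input 0):
f1 = lambda a: 3.0 * a           # first-order increment of x1
f2 = lambda b: -b                # first-order increment of x2
f3 = lambda c: c                 # first-order increment of x3
f12 = lambda a, b: a * b         # interaction increment of (x1, x2)
f13 = lambda a, c: 2.0 * a * c   # interaction increment of (x1, x3)

def total_12(a, b):
    """Total increment of the pair (x1, x2): all lower-order parts summed."""
    return f1(a) + f2(b) + f12(a, b)

def total_13(a, c):
    return f1(a) + f3(c) + f13(a, c)

a, b, c = 1.0, 1.0, 1.0
exact = f1(a) + f2(b) + f3(c) + f12(a, b) + f13(a, c)  # each increment once
print(total_12(a, b) + total_13(a, c))  # 9.0 - f1 counted twice
print(exact)                            # 6.0 - the correct model increment
```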

Inputs #

Different distribution shapes can be created using the Input Preprocessor tool and then stored in the XDis.txt file. The distribution works as a weight function: the code samples mainly the regions with large values of the probability density function, and the uncertainty quantification is propagated using the samples from the XDis.txt file. However, the core solver starts to sample the domain uniformly when it finds that the function is complicated.

The type of input distribution is chosen according to its real-world purpose – material characteristics or manufacturing tolerances (normal distribution), designed geometry (uniform distribution), environmental conditions (Weibull distribution for wind speed, etc.), …

To run the uncertainty propagation correctly, there is a set of general rules and recommendations to be followed:

Boundary values:

  • Outer samples of each input distribution
  • Sampling of inputs is allowed inside boundaries only. Extrapolation outside the domain is prohibited

Nominal Sample:

  • Represents the standardly used deterministic solution
  • Must be inside boundaries of the domain
  • The Core Solver is designed to work most effectively when the nominal sample is at the mean value of the input distribution. The nominal sample should not be further than 10% of the variable's range from the mean value of the distribution
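The two rules above can be condensed into a simple check. This is a sketch of the recommendation, not UptimAI's actual validation logic; the function name and arguments are illustrative.

```python
def nominal_sample_ok(nominal, lower, upper, mean):
    """True if the nominal value lies strictly inside the domain and
    within 10% of the variable's range from the distribution mean."""
    inside = lower < nominal < upper                         # boundary rule
    close_to_mean = abs(nominal - mean) <= 0.10 * (upper - lower)
    return inside and close_to_mean

print(nominal_sample_ok(0.52, 0.0, 1.0, 0.5))  # True
print(nominal_sample_ok(0.75, 0.0, 1.0, 0.5))  # False: 0.25 > 10% of range
```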

Coupling the Code #

The Core Solver itself can be started via the GUI or simply from the console.
First, the Core Solver has to be coupled with the “black box” (CFD, FEM, in-house code) by the coupling script (bash for Linux systems, batch for Windows), a set of commands controlling the exchange of data between the two codes.
The file with input combinations generated by the Core Solver and the expected file with outputs from the “black box” are plain text files with sets of comma-separated values (DataToProcess.txt and DataProcessOut.txt).

DataToProcess.txt #

A set of input combinations to be processed with the coupled program. Each row represents one input combination consisting of comma-separated values of the input variables. The maximum expected precision of inputs is 8 digits.

DataProcessOut.txt #

A set of results of the output evaluation. Each row represents a set of outputs corresponding to one line of inputs in the DataToProcess.txt file. Each column represents one type of output (the Core Solver processes one output at a time; the others are used to feed the database *_it.json and can be processed later).

Coupling Script #

Its main purpose is to interconnect the Core Solver with the “black box” code. It passes a matrix of input data to the “black box” and receives the matrix of outputs in the proper format.
The bash/batch file “organizes” how the coupled program is started; for example, parallelization is defined here. There is no mandatory form for this script.

The Core Solver Setup program includes a feature for coupling script creation (this can be used as is or serve as a baseline for your own coupling environments).
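The document describes bash/batch coupling scripts; as an illustration only, a minimal Python stand-in for the same data exchange might look like the sketch below. The `black_box` function is a placeholder for your CFD/FEM/in-house code, and the 8-digit output formatting is an assumption, not a documented requirement.

```python
def black_box(inputs):
    # Placeholder: replace with a call to the real simulation code.
    return [sum(x * x for x in inputs)]

def couple(in_path="DataToProcess.txt", out_path="DataProcessOut.txt"):
    """Read comma-separated input rows, evaluate each with the black box,
    and write one comma-separated output row per input row."""
    with open(in_path) as f:
        rows = [[float(v) for v in line.split(",")]
                for line in f if line.strip()]
    with open(out_path, "w") as f:
        for row in rows:
            f.write(",".join(f"{y:.8f}" for y in black_box(row)) + "\n")

# couple()  # run after the Core Solver has written DataToProcess.txt
```

A real coupling script would additionally organize how the solver is launched (working directories, licenses, parallel runs), as the text above notes.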

Iterative File #

During the uncertainty propagation process, all results from each iteration are saved into the dedicated *.json file specified in the Core Solver Setup. Since the Core Solver deals with one output at a time, the results of the other outputs are stored here for the next run!
The iterative file can be edited (with caution) in the Post Process tool to correct results from unconverged solutions, or to import additional sets of results or even a whole new output column.
The iterative file is checked in each iteration of the Core Solver to find whether it is corrupted – in such a case the Core Solver exits and the file needs to be fixed manually or replaced.
The iterative file is stored after the coupling script is processed – if the coupling script crashes before it finishes, the results are not stored in the iterative file!

The iterative file is the most important file: it holds all the simulation results, and with it all the outputs can be restored without re-running the simulations.

Other important notes #

The Core Solver checks whether results have a correct numeric form; otherwise, the result is replaced with a NaN value and the solver tries to use an interpolated value instead. However, when interpolation is not possible (for example, if the sample is on a boundary), the Core Solver raises an error and exits!
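A simplified sketch of this behaviour, under the assumption that invalid values are patched by interpolating between neighbouring samples (the actual Core Solver interpolation is more sophisticated and these function names are illustrative):

```python
import math

def parse_output(token):
    """Values that do not parse as numbers become NaN."""
    try:
        return float(token)
    except ValueError:
        return math.nan

def patch_nans(values):
    """Linearly interpolate interior NaNs; a NaN at a boundary sample
    cannot be interpolated, mirroring the solver's error-and-exit case."""
    vals = list(values)
    for i, v in enumerate(vals):
        if math.isnan(v):
            if i == 0 or i == len(vals) - 1:
                raise ValueError("NaN at a boundary sample - cannot interpolate")
            vals[i] = 0.5 * (vals[i - 1] + vals[i + 1])
    return vals

print(patch_nans([1.0, parse_output("garbage"), 3.0]))  # [1.0, 2.0, 3.0]
```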

Solving the outputs of a problem in a different order may lead to a slightly different convergence process, due to the priority of samples already stored in the database of the iterative *.json file. Differences you may notice:

  • Different convergence process: Samples taken from the database can have different positions from those called by the solver in a separate run. This may lead to a different convergence process.
  • Different number of samples required: This can occur in case the increment function converges slightly below the threshold. With the different ordering of outputs, sampling of the increment function may change as well and an additional sample may be required to ensure that the convergence criteria are met.

Setting the Core Solver #

Global Residual #

Sets the general level of the desired accuracy of the mathematical model. It represents the threshold of change in the mean value or variance of results between two iterations.

It is rigidly set for the first iteration of each increment function; however, the residual scheme can adaptively adjust the preset accuracy to obtain the right convergence and the correct model, depending on the currently solved problem.

Keep in mind that the algorithm always selects the important variables and their increments; however, those with too low a sensitivity can be neglected based on the defined residual value.
Some general recommendations for the global residual value, based on extensive testing on various benchmark cases:

  • 0.1: 10% of relative change in mean value or variance of results from iteration to iteration. Usually for a preliminary analysis where the simulation is costly and the main interest is in the general process. It is advised to combine it with conservative settings of sampling.
  • 0.03 – 0.05: 3 to 5% of relative change in mean value or variance of results. Suits well for most of the engineering problems as it provides a fairly accurate model and requires only a reasonable number of samples. Works also with a problem where a moderate level of randomness/noise is expected.
  • 0.015: About 1.5% of relative change in mean value or variance from iteration to iteration. Results in an accurate model with only slight differences on “tails” of the final probability distribution of results – far extremes of the output function.
  • <0.003: Lower than 0.3% of relative change in mean value or variance from iteration to iteration. A model with accuracy beyond the usual needs of engineering cases having closely described “tails”. May require a large number of samples for evaluation.
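The criterion the residual values above refer to can be sketched as a relative-change test between two consecutive iterations. This is a simplified illustration of the idea, not the Core Solver's actual residual scheme:

```python
from statistics import mean, pvariance

def converged(prev, curr, residual=0.03):
    """True when the relative change of both the mean and the variance of
    the results between two iterations is below the residual threshold."""
    dm = abs(mean(curr) - mean(prev)) / abs(mean(prev))
    dv = abs(pvariance(curr) - pvariance(prev)) / pvariance(prev)
    return dm < residual and dv < residual

# Small change in mean and variance -> converged at the 3% level.
print(converged([1.0, 2.0, 3.0], [1.01, 2.0, 2.99]))  # True
# Variance more than doubled -> not converged.
print(converged([1.0, 2.0, 3.0], [0.5, 2.0, 3.5]))    # False
```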

Sampling Scheme for Prediction Algorithm #

The domain is covered with a few samples according to a specific scheme; other sample positions are based on function values and responses at these points.

Practical advice for sampling #

  • The choice of sampling scheme changes the ratio of accuracy to the number of samples needed
  • For sampling of variables leading to periodic or symmetrical functions (typically, but not only, angles of any kind), set the nominal sample off the symmetry axis of the input distribution!

Sampling scheme 1 #

  • Uses 2 positions to estimate the influence of variables; the samples lie on opposite sides of the domain
  • The mean of the samples is used to estimate the influence of the increment function, which is useful when the model has some degree of randomness
  • Considered the standard and mostly suggested scheme

Sampling scheme 2 #

  • Uses only 1 position to estimate the influence of variables
  • Used only for problems that are linear in nature, where speed is preferred over precision
  • This scheme is not suitable for complex problems and should be used with caution!

Sampling scheme 3 #

  • Uses 3 positions to estimate the influence of variables; an extension of scheme 1 with an additional randomly selected sample
  • The mean of the samples is used to estimate the influence of the increment function, which is useful when the model has some degree of randomness
  • Very robust scheme, yet sample demanding!
  • Recommended for problems with results bounded from one or both sides
  • Especially in cases where results are expected to be greater than one, it is suggested to work with the logarithm of the output function

The prediction scheme works with increment functions, which are natively zero at the nominal solution. This is the basic property of the increment function: dF_i(x_i) = 0 at the nominal value of x_i.

From the integral formulation one can easily deduce that the increment function grows larger the further the sample is positioned from the nominal solution. Moreover, increment functions grow according to their influence (sensitivity). Therefore, the algorithm always selects the important increment functions/variables, while increment functions/variables with low importance do not have to be selected. The algorithm is, however, designed with a safety margin, so it can happen that increment functions with low importance are selected as well.
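The zero-at-nominal property is easy to demonstrate numerically. The toy model below is an assumption made for the example; the first-order increment of x1 is formed by varying only x1 while holding the other input at its nominal value, then subtracting the nominal solution:

```python
def model(x1, x2):
    """Toy stand-in for the black-box output function."""
    return 3.0 + 2.0 * x1 + x2 ** 2 + 0.5 * x1 * x2

X_NOM = (1.0, 2.0)          # nominal input combination
f_nom = model(*X_NOM)       # nominal solution

def increment_x1(x1):
    """First-order increment of x1: vary x1 only, subtract the nominal."""
    return model(x1, X_NOM[1]) - f_nom

print(increment_x1(X_NOM[0]))  # 0.0 - zero at the nominal by construction
print(increment_x1(2.0))       # 3.0 - grows with distance from the nominal
```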

Prediction residual scheme #

Computes the residual which decides whether the increment gets neglected.
The scheme defines the strategy for how the residual is evaluated.

Percent scheme #

  • Measures the percentage influence of each increment function
  • It is suited for interpolation purposes as it does not depend on the input distribution
  • This type of residual scheme is a very conservative approach for non-standard distributions

Distance scheme #

  • Measures the statistical influence of each increment function, propagated distributions are considered to be input distribution shapes
  • Distribution of each neglected function is propagated by considering a linear model

Maximum increment order #

Empirical evidence shows that mostly low-order increment functions play an important role in high-dimensional spaces. Higher orders are necessary to reach the correct function approximation under the given threshold and to catch all details.
Higher-order increment functions are tested during the final steps of the adaptive UQ process, and if they are found, the Core Solver can temporarily lower the residual settings to ensure the final result is correct.
In some cases, for practical reasons, it is convenient to restrict the order of increment functions (for example, if the user's main interest is in the sensitivity analysis only).

Limiting the increment order may shorten the computational time significantly, but an evaluation of the model precision needs to follow.

Practical notes to increment order settings: #

  • Setting the maximum increment order to 0 leads to full Uncertainty Quantification – this is the most suggested setting!
  • When a limit to the increment order (up to 8) is set, the code automatically finishes once all increment functions up to the given order have converged
  • It is strongly recommended to check the Residual model in the Postprocess method of the UQ for the predicted quality of the surrogate model, especially when the maximum increment order is restricted!

For certain problems, limiting the order of increment functions may cause a huge bias in results. In general, problems with a large number of inputs are less sensitive to the increment function order; the residual value becomes more important.

Usually, setting the maximum increment order to 3 creates a fair compromise between precision and the number of computations. If the result consists of a large number of higher-order increments, consider increasing the limit.
For problems with a large number of inputs, try to lower the global residual when only a few inputs are identified as influential.

Additional options #

Some features of the core solver need to be turned on separately to take effect. They often provide additional tuning of the simulation process and are more suitable for experienced users.

Test scheme for higher-order increment functions #

  • Forces higher-order increment functions to be sampled if “by coincidence” they are passed to the approximation phase while having low influence, i.e. are not worth approximating
  • The solver may select increment functions with low overall influence for safety reasons – switching this option on may further reduce the required samples of the expensive function

Regions of Preference and Avoidance #

  • Enables our optimization technique called “regions of preference and avoidance”
  • Although switching the function off leads to faster finalizing steps of the process, without this option the method is not available in the Post Process tool!

Multiprocessing #

  • The Core Solver will use multiple cores of the processor to speed up the computation
  • Core Number option sets the number of cores reserved for other programs than the Core Solver

Progress iterative store file #

  • A file containing the current state of the surrogate model
  • Saving the progress iterative file may significantly reduce the computation time when rerunning a complex problem with a large number of inputs and a high order of their interactions
  • Always delete the file after changing the setup of the UQ process – the progress of the computation may differ with an altered test or residual scheme, making the remaining progress iterative file incompatible

Max parallelism #

  • Sets the maximum number of processes of the coupled program running simultaneously
  • Using this setting will affect the automatically generated coupling script, allowing multiple runs of the coupled program
  • Especially with software allocating a fixed number of cores for its operation, be aware that exceeding the total number of cores available may lead to a system crash!
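As a hypothetical illustration of what a parallelism cap does, the sketch below runs at most `MAX_PARALLELISM` instances of a coupled program at once. The dummy `echo` command stands in for a real solver invocation; none of this reflects UptimAI's generated script internals:

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

def run_case(row_id):
    # Stand-in for one run of the coupled program (replace with your solver).
    result = subprocess.run(["echo", f"case {row_id}"],
                            capture_output=True, text=True)
    return result.stdout.strip()

MAX_PARALLELISM = 4   # keep below the number of cores/licenses available

# The pool never runs more than MAX_PARALLELISM cases at the same time.
with ThreadPoolExecutor(max_workers=MAX_PARALLELISM) as pool:
    results = list(pool.map(run_case, range(8)))

print(results[0])  # case 0
```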

Example of results evaluation #

The case study used for giving some examples of results evaluation is the aerodynamic study of an airfoil (case study from the Tutorial II).

This case consists of the estimation of the aerodynamic characteristics of a wing section (airfoil) based on its geometry, using the panel method to emulate the software used for numerical simulations in engineering.

In the following section, results of the glide ratio of the airfoil are used mostly, also marked as Output 1 (O1). The other outputs are the coefficient of pitching moment for the maximal glide ratio – Output 2 (O2) – and the minimal drag coefficient – Output 3 (O3).

Variance sensitivity #

Variance sensitivity is the common approach to evaluate the sensitivity of results to uncertainties in inputs, but:

  • It should be kept in mind that the number alone is not sufficient – always check the increment functions!
  • Variance sensitivity reflects how much the problem is influenced by the uncertainty of a given variable
  • It can be used as a guideline to which variables to focus on first
  • Sensitivities sum up to 1, so each value represents a percentage
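The normalization behind the last bullet can be sketched as follows, assuming (as the theory section states) that the increments are independent so their variance contributions can simply be divided by the total. The variable names and values are illustrative:

```python
def variance_sensitivities(increment_variances):
    """Normalize per-increment variance contributions so they sum to 1."""
    total = sum(increment_variances.values())
    return {k: v / total for k, v in increment_variances.items()}

# Toy contributions: x1 dominates, x2 and the x1*x2 interaction are equal.
s = variance_sensitivities({"x1": 3.0, "x2": 1.0, "x1*x2": 1.0})
print(s)                          # {'x1': 0.6, 'x2': 0.2, 'x1*x2': 0.2}
print(round(sum(s.values()), 12)) # 1.0 - readable directly as percentages
```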

Mean sensitivity #

Represents the influence of a variable on the mean value and is very useful for optimization purposes. Variables with a large influence on the mean should be optimized for better performance.

  • Variables with low sensitivity for variance but large sensitivity for the mean value are very worthy of investigation – their optimization usually leads to a stable improvement of results!
  • Functions shifting the mean usually have a valley/mountain-top shape

Increment Function #

Visualization of results helps to understand the physical phenomena behind the problem. To see the relation between increment functions of sole input variables and increment functions of their interactions, use the Total Increment method.

Higher-order increment functions are often the more influential ones.
By examining the shapes of increment functions you can understand the physical phenomena behind the solved problem.
Especially in the case of an FEM or CFD solution, always go through the increment functions to look for irregular points – you can also reveal inconsistencies in results suggesting a convergence failure of the simulation.

Regions of Preference and Avoidance #

A preliminary statistical optimization tool which identifies ranges of input values leading to preferable/avoidable results.
It gives quick insight into ways to change the design to obtain a specific series of results – it provides optimization of regions instead of single points.

  • Regions of input variables are additive in nature – the more optimized regions you use, the closer to the fully optimized result you get
  • Using input regions of preference leads only to results from the region of preference, and vice versa!
  • Regions of preference/avoidance may overlap due to interactions of input variables – overlapping of input regions leads to overlapping of regions of results

Case 1 #

Since some input variables represent outer conditions with mostly unavoidable uncertainty, these cannot be used for the optimization. This first case shows the influence of the Reynolds number on the optimized results of the minimum drag coefficient.

  • Reynolds number affects only the resulting region of preference
  • Reynolds number is not responsible for upper extremities in the drag coefficient
  • With a reduced number of optimized input variables, the resulting regions of preference/avoidance tend to overlap more

Case 2 #

This second case shows how to look for input regions leading to maximized results in Output 1 and minimized results in Output 3.

There is no preference region below the input value of 0.4 for Output 1

  • To reach optimal values for both outputs, the inputs of this input variable have to be over 0.4
  • For Output 3, the preference region of the input is available for the full possible range of inputs – look for the best ratio of regions of preference and avoidance to get the best result
  • The optimal region of the input variable is 0.4 – 0.47

Regions of preference are not overlapping for both outputs

  • Due to counteracting regions of preference in two outputs of interest, it is difficult to use this variable for effective optimization
  • To increase probability of preferable result use the comparison across outputs – cut from regions of avoidance of both outputs to create a compromise solution
  • Desired range of input variable is 0.087 – 0.107

Case 3 #

This third case shows how to look for input regions leading to maximized results in Output 1 while avoiding extremities in the results of Output 2.

Regions of preference are not overlapping for both outputs

  • For Output 1, proceed as in Case 2: search for ranges without the region of avoidance or ranges with the best ratio between regions
  • For the other input variables, choose a range where the probability of both regions is low
  • The desired range of the input variable is -0.065 – -0.055
  • For both outputs it is necessary to set ranges of inputs close to the point where the ratio of input regions switches – results will be less likely to be an extreme of one of the output regions
  • The desired range of the input variable is 0.38 – 0.46

Frequently Asked Questions (FAQ) #

What is the expensive function?

The expensive function, also called the black-box function, processes given inputs and provides the desired output of interest. It usually has the form of a script computing the output directly or calling the other necessary programs involved (like FEM or CFD solvers, custom in-house software, etc.). It is coupled with the Core Solver via the batch script.

What should the batch script contain?

The only mandatory part of the batch script is the command running the expensive function. The example script seen on slide 34 is also able to prepare inputs from the DataToProcess.txt file and write the expensive function output into the DataProcessOut.txt file. This input/output manipulation does not have to be a part of the batch script in case the expensive function is able to read/write these files on its own.

What if the coupled program does not work properly?

The UptimAI Core Solver waits until the coupled software generates the DataProcessOut.txt file. There is no deadline for the coupled software to do so; thus, in case of any failure, the program can be started again while the UptimAI tool still awaits the output file.

Situations where the output file is filled with NaN values are detected automatically. The UptimAI Core Solver interpolates output values from the available samples when possible. However, outputs resulting from input combinations involving the coordinates of the nominal solution or the edges of the computational domain must be computed properly. When there is no way to deal with error values, the Core Solver will stop with an alert.

If the whole process of uncertainty propagation ends prematurely for any reason (e.g. power shortage, user's decision, …), the iterative storage .json file contains the present progress, which can be restored / reviewed. Therefore, there is no need to re-solve the output values of previously investigated input combinations and excessively increase the wall-clock time of the uncertainty propagation.

Why is it not possible to set the domain's boundaries freely?

It is mandatory for the boundaries of the computational domain to be a part of the input distribution. Therefore, the boundaries suggested by the Input Preprocessor are the minimum and maximum values of each input distribution. The range of each input distribution can be cropped by the user but cannot be extended. When reducing the range of a discrete input distribution, at least three unique values of the distribution have to remain.

All buttons and controls are covered by the text form of the sensitivity analysis!

Maximize the program window as much as possible, save the data, and leave the text form mode of the sensitivity analysis tool.

Increment functions of higher order are labelled with numbers only, e.g. dF1.2. Is there any legend to these?

Any time you move the mouse pointer over the label of an increment function, a frame with a description of variables involved will appear in the bottom left corner of the window.

How to set input values of the nominal sample?

Increment values are differences of the function output value from the output value of the nominal sample. Input values (or coordinates) of the nominal sample must be within the range of each input variable and must not equal its boundaries. It is recommended for the nominal sample to have inputs close to the mean value of the variable's input distribution. A nominal sample too far away from the variable means may lead to increased interpolation errors on the edges of the computational domain. Therefore, if a user-defined value of the nominal sample is more than 10% of the variable's range away from the mean value, an alert is raised to inform the user that special caution will be needed when processing results.

For input variables having a symmetrical input distribution, extra caution is required. When their output function is expected to be of a symmetrical shape as well, it is highly recommended not to set their nominal value exactly to the location of the mean value of the corresponding input distribution! A typical example can be the angular position of a crankshaft, a wave phase, etc.

What if the Core Solver cannot be started via the UptimAI Main Interface program?

When clicking the Run Core Solver button does not start the terminal window with the Core Solver, the most common reason for this behaviour is local firewall settings preventing the terminal from being run by other software. For this reason, it is possible to run each program of the UptimAI package separately. First, start the cmd window manually using your user rights. Then, the Core Solver can be started as described in the Uncertainty propagation section of this document.

Why is the generated batch file not working for my project?

Automatic generation of the batch script under a user-defined name works with the example Function.bat file. It is identical to the one from page 34 of this document. The Core Solver Setup tool copies this example file under the new name into the project folder. Because of the individuality of each solved task, it is not yet possible to create a direct link to the expensive function with such a simple program as the Core Solver Setup. However, the presented example file can give you an idea of how to create the link between the programs.

In general, UptimAI software requires a connection to a program which is able to load the DataToProcess.txt file and provide the DataProcessOut.txt file. Many users prefer to build their own infrastructure in their “native” programming language. In that case, the BATCH script only runs this code and does not process input and output files on its own. The use of absolute paths is necessary for correct operation of the batch file.

Why can't the Core Solver read the DataProcessOut.txt file when it's ready?

The formatting of DataProcessOut.txt is strict: the output values have to be comma-separated, and the number of lines has to be the same as in the DataToProcess.txt file. On some systems, surplus blank spaces around commas, and especially hidden ones at the ends of lines, may cause trouble. Also, make sure there is an end-of-line character after the last line as well.
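The checks above can be automated before handing the file back to the Core Solver. This is a hypothetical helper written for this guide, not an UptimAI tool; the exact messages and function name are illustrative:

```python
def validate_output(out_text, expected_rows):
    """Check the DataProcessOut.txt constraints: final newline, row count
    matching DataToProcess.txt, and clean comma-separated numbers."""
    if not out_text.endswith("\n"):
        return "missing end-of-line character on the last line"
    lines = out_text.split("\n")[:-1]
    if len(lines) != expected_rows:
        return f"expected {expected_rows} rows, found {len(lines)}"
    for n, line in enumerate(lines, 1):
        for token in line.split(","):
            if token != token.strip():
                return f"stray whitespace around a value on line {n}"
            try:
                float(token)
            except ValueError:
                return f"non-numeric value {token!r} on line {n}"
    return "ok"

print(validate_output("1.0,2.0\n3.0,4.0\n", 2))   # ok
print(validate_output("1.0, 2.0\n3.0,4.0\n", 2))  # stray whitespace around a value on line 1
```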