Parametric Fitting with Library Models
Parametric fitting involves finding coefficients (parameters) for one or more models that you fit to data. The data is assumed to be statistical in nature and is divided into two components:
data = deterministic component + random component
The deterministic component is given by a parametric model and the random component is often described as error associated with the data:
data = parametric model + error
The model is a function of the independent (predictor) variable and one or more coefficients. The error represents random variations in the data that follow a specific probability distribution (usually Gaussian). The variations can come from many different sources, but are always present at some level when you are dealing with measured data. Systematic variations can also exist, but they can lead to a fitted model that does not represent the data well.
The model coefficients often have physical significance. For example, suppose you collected data that corresponds to a single decay mode of a radioactive nuclide, and you want to estimate the half-life ($T_{1/2}$) of the decay. The law of radioactive decay states that the activity of a radioactive substance decays exponentially in time. Therefore, the model to use in the fit is given by
$$y = y_0 e^{-\lambda t}$$
where $y_0$ is the number of nuclei at time $t = 0$, and $\lambda$ is the decay constant. The data can be described by
$$\text{data} = y_0 e^{-\lambda t} + \text{error}$$
Both $y_0$ and $\lambda$ are coefficients that are estimated by the fit. Because $T_{1/2} = \ln(2)/\lambda$, the fitted value of the decay constant yields the fitted half-life. However, because the data contains some error, the deterministic component of the equation cannot be determined exactly from the data. Therefore, the coefficients and the half-life calculation have some uncertainty associated with them. If the uncertainty is acceptable, then you are done fitting the data. If the uncertainty is not acceptable, then you might have to reduce it, either by collecting more data, or by reducing measurement error, collecting new data, and repeating the model fit.
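To make the half-life arithmetic concrete, here is a minimal sketch in base MATLAB using synthetic, noise-free decay data (the values of y0 and λ are made up). It estimates λ with a log-linear fit, that is, a straight line through log(y) versus t. This is a swapped-in technique chosen so the example runs without the toolbox; it is not the nonlinear least-squares fit the Curve Fitting app performs. With noisy data, the log transform also changes how errors are weighted.

```matlab
% Synthetic, noise-free decay data (illustration only; values are made up)
t = (0:10)';                  % time
y0_true = 1000;               % number of nuclei at t = 0
lambda_true = 0.3;            % decay constant
y = y0_true * exp(-lambda_true * t);

% Log-linear fit: log(y) = log(y0) - lambda*t
p = polyfit(t, log(y), 1);    % p(1) is the slope, p(2) the intercept
lambda = -p(1);
y0 = exp(p(2));

% Half-life from the fitted decay constant: T1/2 = ln(2)/lambda
T_half = log(2) / lambda;
fprintf('lambda = %.4f, y0 = %.1f, T1/2 = %.4f\n', lambda, y0, T_half);
```

Because the data here is noise-free, the fit recovers λ exactly; with real measurements, the uncertainty in λ carries directly into the half-life.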
For other problems, where no theory dictates a model, you might also modify the model by adding or removing terms, or substitute an entirely different model.
The Curve Fitting Toolbox™ parametric library models are described in the following sections.
Selecting a Model Type Interactively
Select a model type to fit from the drop-down list in the Curve Fitting app.
What fit types can you use for curves or surfaces? Based on your selected data, the fit category list shows either curve or surface categories. The following table describes the options for curves and surfaces.
|Fit Category|Curves|Surfaces|
|Polynomial|Yes (up to degree 9)|Yes (up to degree 5)|
|Sum of Sine|Yes||
|Custom Linear Fitting|Yes||
For all fit categories, look in the Results pane to see the model terms, the values of the coefficients, and the goodness-of-fit statistics.
If your fit has problems, messages in the Results pane help you identify better settings.
Selecting Fit Settings
The Curve Fitting app provides a selection of fit types and settings that you can change to try to improve your fit. Try the defaults first, then experiment with other settings.
For an overview of how to use the available fit options, see Specifying Fit Options and Optimized Starting Points.
You can try a variety of settings within a single fit figure, and you can also create multiple fits to compare. When you create multiple fits, you can compare different fit types and settings side by side in the Curve Fitting app. See Create Multiple Fits in Curve Fitting App.
Selecting Model Type Programmatically
You can specify a library model name as a character vector or string scalar when you call the fit function. For example, to specify a quadratic:
f = fit( x, y, 'poly2' )
See List of Library Models for Curve and Surface Fitting to view all available library model names.
You can also construct a fittype object for a library model, and then use that fittype as an input to find out which parameters you can set.
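As a hedged sketch, the snippet below assumes the Curve Fitting Toolbox functions fittype, coeffnames, and fitoptions. It builds a fittype for the two-term Gaussian library model 'gauss2' and passes it to fitoptions to display the options you can set for that model. The choice of 'gauss2' is arbitrary; any library model name works the same way.

```matlab
% Build a fittype object for the two-term Gaussian library model
ft = fittype('gauss2');

% List the coefficients the fit will estimate
coeffnames(ft)

% Pass the fittype to fitoptions to display the settable options
opts = fitoptions(ft)
```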
For examples, see the sections for each model type, listed in the table in Selecting a Model Type Interactively. For details on all the functions for creating and analyzing models, see Curve and Surface Fitting.
Using Normalize or Center and Scale
Many model types in the Curve Fitting app share the Center and scale option. When you select this option, the tool refits with the data centered and scaled, by applying the Normalize setting to the variables. At the command line, you can use Normalize as an input argument to the fitoptions function. See the fitoptions reference page.
Generally, it is a good idea to normalize inputs (also known as predictor data), which can alleviate numerical problems with variables of different scales. For example, suppose your surface fit inputs are engine speed with a range of 500–4500 r/min and engine load percentage with a range of 0–1. Then, Center and scale generally improves the fit because of the great difference in scale between the two inputs. However, if your inputs are in the same units or similar scale (e.g., eastings and northings for geographic data), then Center and scale is less useful. When you normalize inputs with this option, the fitted coefficient values differ from the values you would obtain by fitting the original data.
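The effect of centering and scaling can be sketched in base MATLAB with synthetic, noise-free data that mimics the engine speed and load example (the values and the names speed and engLoad are made up for illustration). This sketch assumes that normalizing means subtracting each input's mean and dividing by its standard deviation, and it uses a plain linear least-squares solve (`\`) rather than the toolbox fit function.

```matlab
% Synthetic, noise-free inputs on very different scales (illustration only)
speed   = [500 1000 1500 2000 2500 3000 3500 4000 4500]';  % r/min
engLoad = [0.1 0.9 0.4 0.7 0.2 0.5 1.0 0.3 0.8]';          % fraction, 0-1
y = 1 + 0.002*speed + 5*engLoad;                            % linear response
n = numel(y);

% Fit on the original inputs
X = [ones(n,1), speed, engLoad];
b_raw = X \ y;

% Center and scale each input: zero mean, unit standard deviation
Z = [ones(n,1), ...
     (speed - mean(speed)) / std(speed), ...
     (engLoad - mean(engLoad)) / std(engLoad)];
b_norm = Z \ y;

% Coefficients differ, but the fitted values agree
yhat_raw  = X * b_raw;
yhat_norm = Z * b_norm;
disp([b_raw, b_norm])

% Normalizing improves the conditioning of the least-squares problem
fprintf('cond(X) = %.3g, cond(Z) = %.3g\n', cond(X), cond(Z));
```

The fitted values match while the coefficients change, which is why you should clear Center and scale when the coefficients themselves carry physical meaning.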
If you are fitting a curve or surface to estimate coefficients, or the coefficients have physical significance, clear the Center and scale check box. The Curve Fitting app plots use the original scale with or without the Center and scale option.
At the command line, to set the option to center and scale the data before fitting, create the default fit options structure, set Normalize to 'on', and then fit with the options:
options = fitoptions;
options.Normalize = 'on';
options

options = 
    Normalize: 'on'
      Exclude: [1x0 double]
      Weights: [1x0 double]
       Method: 'None'

load census
f1 = fit(cdate,pop,'poly3',options)
Specifying Fit Options and Optimized Starting Points
About Fit Options
Interactive fit options are described in the following sections. To specify the same fit options programmatically, see Specifying Fit Options at the Command Line.
To specify fit options interactively in the Curve Fitting app, click the Fit Options button to open the Fit Options dialog box. All fit categories except interpolants and smoothing splines have configurable fit options.
The available options depend on whether you are fitting your data using a linear model, a nonlinear model, or a nonparametric fit type:
All the options described next are available for nonlinear models.
Lower and Upper coefficient constraints are the only fit options available in the dialog box for polynomial linear models. For polynomials, you can set Robust in the Curve Fitting app without opening the Fit Options dialog box.
Nonparametric fit types (interpolant, smoothing spline, and lowess) have no additional fit options dialog box.
The fit options for the single-term exponential are shown next. The coefficient starting values and constraints are for the census data.
Fitting Method and Algorithm
Method — The fitting method.
The method is automatically selected based on the library or custom model you use. For linear models, the method is LinearLeastSquares. For nonlinear models, the method is NonlinearLeastSquares.
Robust — Specify whether to use the robust least-squares fitting method.
Off — Do not use robust fitting (default).
On — Fit with the default robust method (bisquare weights).
LAR — Fit by minimizing the least absolute residuals (LAR).
Bisquare — Fit by minimizing the summed square of the residuals, and reduce the weight of outliers using bisquare weights. In most cases, this is the best choice for robust fitting.
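To show how bisquare weighting reduces the influence of outliers, here is a minimal sketch of iteratively reweighted least squares for a straight line, in base MATLAB. It assumes the commonly used tuning constant 4.685 and a scale estimate based on the median absolute deviation, and it omits the leverage adjustment that a full implementation would apply. It is an illustration of the technique, not the toolbox's internal code. The data is synthetic, with one injected outlier.

```matlab
% Synthetic straight-line data with one gross outlier (illustration only)
x = (1:10)';
y = 3 + 2*x + 0.1*(-1).^x;     % small deterministic "noise"
y(7) = y(7) + 30;              % inject an outlier

X = [ones(size(x)) x];         % design matrix for y = b1 + b2*x
bOLS = X \ y;                  % ordinary least squares, pulled by the outlier
b = bOLS;
K = 4.685;                     % bisquare tuning constant (assumed)

for iter = 1:50
    r = y - X*b;                              % current residuals
    s = median(abs(r - median(r))) / 0.6745;  % robust scale estimate (MAD)
    u = r / (K*s);                            % standardized residuals
    w = (abs(u) < 1) .* (1 - u.^2).^2;        % bisquare weights, 0 beyond K*s
    sw = sqrt(w);
    bNew = (X .* sw) \ (y .* sw);             % weighted least-squares step
    converged = norm(bNew - b) < 1e-10;
    b = bNew;
    if converged
        break
    end
end

disp([bOLS, b])                % columns: ordinary fit, robust fit
```

The outlier ends up with zero weight, so the robust coefficients stay close to the underlying line while the ordinary least-squares slope is pulled upward.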
Algorithm — Algorithm used for the fitting procedure:
Trust-Region — This is the default algorithm and must be used if you specify Lower or Upper coefficient constraints.
Levenberg-Marquardt — If the trust-region algorithm does not produce a reasonable fit, and you do not have coefficient constraints, try the Levenberg-Marquardt algorithm.
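At the command line, the Robust and Algorithm settings can be passed to fitoptions as name-value pairs. The sketch below assumes the Curve Fitting Toolbox and the census sample data used elsewhere on this page, and fits the single-term exponential 'exp1' three ways. Normalize is turned on only to keep the exponential well conditioned for year-valued inputs.

```matlab
load census   % loads cdate (census years) and pop (population)

% Trust-region (default): the algorithm that honors Lower/Upper bounds
optsTR = fitoptions('exp1', 'Normalize', 'on');

% Levenberg-Marquardt: leave Lower and Upper unset when you use it
optsLM = fitoptions('exp1', 'Normalize', 'on', ...
    'Algorithm', 'Levenberg-Marquardt');

% Robust fitting with bisquare weights and the default algorithm
optsRob = fitoptions('exp1', 'Normalize', 'on', 'Robust', 'Bisquare');

fTR  = fit(cdate, pop, 'exp1', optsTR);
fLM  = fit(cdate, pop, 'exp1', optsLM);
fRob = fit(cdate, pop, 'exp1', optsRob);

% Compare fitted coefficients (rows: trust-region, LM, robust)
disp([coeffvalues(fTR); coeffvalues(fLM); coeffvalues(fRob)])
```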
Finite Differencing Parameters
DiffMinChange — Minimum change in coefficients for finite difference Jacobians. The default value is $10^{-8}$.
DiffMaxChange — Maximum change in coefficients for finite difference Jacobians. The default value is 0.1.
Note that DiffMinChange and DiffMaxChange apply to:
Any nonlinear custom equation, that is, a nonlinear equation that you write
Some of the nonlinear equations provided with Curve Fitting Toolbox software
However, DiffMinChange and DiffMaxChange do not apply to any linear equations.
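To show where DiffMinChange and DiffMaxChange enter, here is a minimal forward-difference Jacobian sketch in base MATLAB for the model y = a*exp(b*x). The step rule (square root of machine epsilon times the coefficient magnitude, then clamped to the interval [DiffMinChange, DiffMaxChange]) is an assumption for illustration, not necessarily the exact rule the solver uses. The coefficient values and x grid are made up.

```matlab
% Hypothetical nonlinear model: y = a*exp(b*x), coefficients c = [a; b]
model = @(c, x) c(1) * exp(c(2) * x);

x = (0:0.5:2)';
c = [2; -0.7];                 % current coefficient values (made up)

DiffMinChange = 1e-8;          % default values quoted in the text
DiffMaxChange = 0.1;

f0 = model(c, x);
J = zeros(numel(x), numel(c));
for k = 1:numel(c)
    % Step proportional to coefficient size (assumed rule), then clamped
    h = sqrt(eps) * max(abs(c(k)), 1);
    h = min(max(h, DiffMinChange), DiffMaxChange);
    cStep = c;
    cStep(k) = cStep(k) + h;
    J(:, k) = (model(cStep, x) - f0) / h;   % forward difference
end

% Analytic Jacobian for comparison: [dy/da, dy/db]
Jexact = [exp(c(2)*x), c(1) * x .* exp(c(2)*x)];
disp(max(abs(J(:) - Jexact(:))))
```

A linear model has a Jacobian that does not depend on the coefficients, so no finite-difference step is needed, which is consistent with these options not applying to linear equations.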
Fit Convergence Criteria
MaxFunEvals — Maximum number of function (model) evaluations allowed. The default value is 600.
MaxIter — Maximum number of fit iterations allowed. The default value is 400.
TolFun — Termination tolerance used on stopping conditions involving the function (model) value. The default value is $10^{-6}$.
TolX — Termination tolerance used on stopping conditions involving the coefficients. The default value is $10^{-6}$.
Coefficients — Symbols for the unknown coefficients to be fitted.
StartPoint — The coefficient starting values. The default values depend on the model. For rational, Weibull, and custom models, default values are randomly selected within the range [0,1]. For all other nonlinear library models, the starting values depend on the data set and are calculated heuristically. See optimized starting points below.
Lower — Lower bounds on the fitted coefficients. The tool only uses the bounds with the trust region fitting algorithm. The default lower bounds for most library models are -Inf, which indicates that the coefficients are unconstrained. However, a few models have finite default lower bounds. For example, Gaussians have the width parameter constrained so that it cannot be less than 0. See default constraints below.
Upper — Upper bounds on the fitted coefficients. The tool only uses the bounds with the trust region fitting algorithm. The default upper bounds for all library models are Inf, which indicates that the coefficients are unconstrained.
For more information about these fit options, see the Optimization Toolbox™ documentation.
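The convergence criteria and coefficient bounds can also be set as fields of a fit options object. The sketch below assumes the Curve Fitting Toolbox and the census sample data; the specific tolerance and limit values are arbitrary choices for illustration. It also assumes that the third output of fit carries the fields exitflag, iterations, and funcCount; check the fit reference page for your release.

```matlab
load census   % loads cdate and pop

opts = fitoptions('exp1', 'Normalize', 'on');

% Fit convergence criteria (defaults quoted in the text)
opts.MaxFunEvals = 2000;   % default 600
opts.MaxIter     = 1000;   % default 400
opts.TolFun      = 1e-8;   % default 1e-6
opts.TolX        = 1e-8;   % default 1e-6

% Bounds for y = a*exp(b*x): keep a nonnegative.
% Bounds are honored only by the Trust-Region algorithm (the default).
opts.Lower = [0 -Inf];
opts.Upper = [Inf Inf];

[f, gof, output] = fit(cdate, pop, 'exp1', opts);

% Inspect how the solver stopped
fprintf('exitflag = %d, iterations = %d, function evaluations = %d\n', ...
    output.exitflag, output.iterations, output.funcCount);
```

If the solver stops because it reached MaxIter or MaxFunEvals rather than meeting TolFun or TolX, raising those limits or supplying a better StartPoint is usually the next thing to try.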
Optimized Starting Points and Default Constraints
The default coefficient starting points and constraints for library and custom models are shown in the next table. If the starting points are optimized, then they are calculated heuristically based on the current data set. Random starting points are defined on the interval [0,1] and linear models do not require starting points.
If a model does not have constraints, the coefficients have neither a lower bound nor an upper bound. You can override the default starting points and constraints by providing your own values using the Fit Options dialog box.
Default Starting Points and Constraints
|Model|Starting Points|Constraints|
|Gaussian|Optimized|$c_i > 0$|
|Sum of Sine|Optimized|$b_i > 0$|
|Weibull|Random|$a, b > 0$|
Note that the sum of sines and Fourier series models are particularly sensitive to starting points, and the optimized values might be accurate for only a few terms in the associated equations.
Specifying Fit Options at the Command Line
Create the default fit options structure and set the option to center and scale the data before fitting:
options = fitoptions;
options.Normalize = 'on';
options

options = 
    Normalize: 'on'
      Exclude: [1x0 double]
      Weights: [1x0 double]
       Method: 'None'
Modifying the default fit options structure is useful when you want to set the Normalize, Exclude, or Weights fields, and then fit your data using the same options with different fitting methods. For example:
load census
f1 = fit(cdate,pop,'poly3',options);
f2 = fit(cdate,pop,'exp1',options);
f3 = fit(cdate,pop,'cubicsp',options);
Data-dependent fit options are returned in the third output argument of the fit function. For example, the smoothing parameter for a smoothing spline is data-dependent:
[f,gof,out] = fit(cdate,pop,'smooth');
smoothparam = out.p

smoothparam =
    0.0089
Use fit options to modify the default smoothing parameter for a new fit:
options = fitoptions('Method','Smooth','SmoothingParam',0.0098);
[f,gof,out] = fit(cdate,pop,'smooth',options);
For more details on using fit options, see the fitoptions reference page.