A finite mixture model is an effective way to accommodate both multi-modality and over-dispersion. Multi-modality refers to the case where the distribution has more than one peak. Over-dispersion refers to the case where the dispersion of the data exceeds what the assumed model implies. McLachlan and Peel (2000) provide a good introduction to finite mixture models.
Consider again the loss given default (LGD) case discussed in the paper by Matt Flynn. In this section, a modeling process is presented that uses a finite mixture model, leveraging the computing capability of PROC FMM in SAS 9.3, to model the variation in the data that a single unified model does not capture. Because the dependent variable is bounded between 0 and 1, there are several modeling options. One can use a beta model, as discussed above, or one can first apply a logit transformation to the dependent variable (lgd) and then model the transformed data, usually assuming a normal distribution. This demonstration uses the second approach.
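The logit transformation applied later in the data step (lgd2 = log(lgd/(1-lgd))) and its inverse, which maps predictions back to the LGD scale, can be sketched in Python as follows. This illustrates the formula only; it is not SAS code, and the function names are our own.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) to the real line: log(p / (1 - p))."""
    if not 0.0 < p < 1.0:
        # An LGD of exactly 0 or 1 has no finite logit
        raise ValueError("logit is undefined outside the open interval (0, 1)")
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Map a real number back to (0, 1); the two branches avoid overflow in exp()."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# One LGD value from the sample data in the appendix
print(round(logit(0.99), 4))  # 4.5951
```

The ValueError branch is a reminder that an observation with lgd of exactly 0 or 1 cannot be transformed; in the SAS data step such a value would yield a missing lgd2 rather than an error.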
First, a complete exploratory data analysis (EDA) is conducted to study the distributions of the dependent variable and the covariates and the relationships among them. Here, EDA consists of two exercises. In the first step, the distribution of each variable and the pairwise relationships between variables are studied. Then, a non-parametric model is fitted for each covariate to explore the possible functional form of its relationship with the dependent variable.
Figure 1 shows the distribution of the logit-transformed loss given default. It is much closer to a normal distribution than the raw data, but it is still skewed to the left rather than symmetric. Figure 2 shows a scatter-plot matrix of the pairwise relationships among the four variables in the data.
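The visual impression of left skew can be backed by a number: a negative sample skewness indicates a longer left tail. Below is a minimal Python sketch of the moment-based estimator g1 = m3 / m2^1.5. As we understand it, SAS procedures such as PROC UNIVARIATE report a bias-adjusted version by default, which differs from g1 by a positive factor and therefore has the same sign; the function name is our own.

```python
def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (negative means a longer left tail)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# A single low value far below the rest produces a long left tail
print(sample_skewness([0.0, 9.0, 10.0, 10.0, 10.0]) < 0)  # True
```

Applied to the lgd2 column, a clearly negative value would confirm the left skew seen in Figure 1.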
Figure 1. Histogram of Logit Transformed Loss Given Default Variable
Figure 2. Matrix Scatter Plot of All Variables.
In addition to a simple descriptive analysis of the raw variables, it is often useful to run a non-parametric univariate regression so that the possible functional relationship between the dependent variable and each covariate can be studied. Figure 3 shows the LOESS regression results of logit-transformed LGD against the three covariates, namely the annual average default rate, the leverage coefficient by firm, and the industry average default rate. There appears to be a slightly quadratic relationship between logit(LGD) and the average default rate, both by year and by industry, whereas the linear relationship between logit(LGD) and the leverage coefficient is robust. On the other hand, we suspect that there is over-dispersion that these variables cannot adequately capture.
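To show what a LOESS fit computes at each point, here is a minimal Python sketch of local linear regression with tricube weights and a fixed span (the fraction of nearest observations used at each location). It is a simplification and not a reimplementation of PROC LOESS, which, among other things, can select the smoothing parameter automatically; the function names and the default span of 0.5 are our assumptions.

```python
def loess_at(xs, ys, x0, frac=0.5):
    """Local linear fit at x0 using tricube weights on the nearest frac * n points."""
    n = len(xs)
    k = max(3, min(n, int(round(frac * n))))
    # Bandwidth: distance from x0 to its k-th nearest observation
    h = sorted(abs(x - x0) for x in xs)[k - 1]
    if h == 0:
        raise ValueError("span too small: neighbours coincide with x0")
    # Tricube weights: (1 - d^3)^3 inside the bandwidth, 0 outside
    w = []
    for x in xs:
        d = abs(x - x0) / h
        w.append((1 - d ** 3) ** 3 if d < 1 else 0.0)
    sw = sum(w)
    if sw == 0:
        raise ValueError("no observations with positive weight")
    # Weighted least-squares line through the local neighbourhood
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def loess_curve(xs, ys, grid, frac=0.5):
    """Evaluate the local fit at every point of a grid, e.g. to draw a smooth curve."""
    return [loess_at(xs, ys, g, frac) for g in grid]
```

Plotting loess_curve of lgd2 against a sorted grid of, say, lgd_a values gives a curve of the kind shown in Figure 3; a visible bend suggests a quadratic term, while a straight curve supports a linear one.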
Figure 3. LOESS regression of logit(LGD) against the average default rate by year, the leverage coefficient by firm, and the industry average default rate.
To examine the sample and these relationships more closely, an OLS model is built in which the average default rates by year and by industry enter as quadratic terms and the leverage coefficient by firm enters as a linear term.
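The regressors of this baseline model mirror the data step in the appendix (lgd_a2 = lgd_a**2 and i_def2 = i_def**2/10). The Python sketch below builds one row of that design matrix and solves the least-squares normal equations by Gaussian elimination. PROC REG does this job in SAS; the sketch only illustrates the computation, and the helper names are hypothetical.

```python
def design_row(lev, lgd_a, i_def):
    """Regressors of the baseline OLS model, mirroring the SAS data step:
    intercept, lev, lgd_a, i_def, lgd_a2 = lgd_a**2, i_def2 = i_def**2/10."""
    return [1.0, lev, lgd_a, i_def, lgd_a ** 2, i_def ** 2 / 10.0]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)
```

Usage would look like ols([design_row(lev, lgd_a, i_def) for each observation], lgd2_values), returning the intercept followed by the five slope coefficients in the order of design_row.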
Figure 4. Diagnostics from polynomial OLS
The main model-fitting diagnostics are shown in Figure 4. Key observations are:
1. The fit is reasonably good, with residuals approximately normally distributed.
2. There is noticeable over-dispersion when the residuals are plotted against the predicted values: the residual variance is smaller at higher predicted values and larger at lower ones.
3. The Q-Q plot also indicates over-dispersion.
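Observation 2 can be quantified by comparing the residual variance in the lower and upper halves of the predicted values; a ratio far from 1 points to non-constant variance. This Python sketch is loosely in the spirit of the Goldfeld-Quandt test but omits its formal F test; the function names are hypothetical, and the inputs would be the predicted values and residuals saved from the OLS fit.

```python
def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def residual_variance_by_half(predicted, residuals):
    """Residual variance for the lower and upper halves of the predicted values."""
    if len(predicted) != len(residuals) or len(predicted) < 4:
        raise ValueError("need at least four paired predictions and residuals")
    pairs = sorted(zip(predicted, residuals))  # order observations by prediction
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return sample_variance(low), sample_variance(high)


# Toy example: residuals spread widely at low predictions, narrowly at high ones
pred = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
resid = [4.0, -4.0, 3.0, -3.0, 0.1, -0.1, 0.2, -0.2]
var_low, var_high = residual_variance_by_half(pred, resid)
print(var_low / var_high)
```

A large ratio, as in this toy example, matches the pattern described for Figure 4: more variance at the low end of the predicted range.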
As an alternative to the polynomial OLS, a finite mixture model is proposed. For demonstration purposes we do not go through the complete modeling process, in particular the testing needed to determine the appropriate number of latent components; instead, a 2-component finite mixture model is used in which only linear terms of the three covariates enter the model. The purpose of this design is to show that even with a simpler, and hence easier to interpret, structure for each component, a finite mixture model can produce a better model that captures the major stochastic effects in the data.
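To make the model concrete: with two normal components and linear terms only, the density of y = logit(LGD) given covariates x is pi * phi(y; b0_1 + x'b_1, s_1) + (1 - pi) * phi(y; b0_2 + x'b_2, s_2), where phi is the normal density. The Python sketch below evaluates this density, the log-likelihood that a fitting procedure maximizes, and the posterior probability that an observation belongs to component 1. The parameter values are placeholders, not estimates from this data, and the code says nothing about how PROC FMM is implemented internally.

```python
import math


def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def linear_predictor(beta, x):
    """beta[0] is the intercept; beta[1:] multiply the covariates in x."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def component_densities(y, x, comps):
    """comps is a list of (beta, sd) pairs, one per normal component."""
    return [normal_pdf(y, linear_predictor(beta, x), sd) for beta, sd in comps]


def mixture_density(y, x, pi1, comps):
    f1, f2 = component_densities(y, x, comps)
    return pi1 * f1 + (1.0 - pi1) * f2


def posterior_component1(y, x, pi1, comps):
    """Probability that (y, x) came from component 1, by Bayes' rule."""
    f1, f2 = component_densities(y, x, comps)
    return pi1 * f1 / (pi1 * f1 + (1.0 - pi1) * f2)


def log_likelihood(data, pi1, comps):
    """data is a list of (y, x) pairs."""
    return sum(math.log(mixture_density(y, x, pi1, comps)) for y, x in data)


# Placeholder parameters; covariates ordered as (lev, lgd_a, i_def)
comps = [([-2.0, 1.0, 0.5, 0.3], 0.8), ([0.5, 0.5, 0.2, 0.1], 1.5)]
x = [0.41, 0.63, 1.42]
print(posterior_component1(-1.0, x, 0.6, comps))
```

Each component keeps the simple linear structure described above; the extra flexibility comes entirely from mixing the two.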
The following code builds a 2-component mixture model and examines the distribution of residuals.
ods html;
ods graphics on;
/* 2-component normal mixture with linear terms only */
proc fmm data=lgddata2 plots(unpack);
   model lgd2 = lev lgd_a i_def / k=2;
   output out=fmm_out pred resid(component) resid(overall);
run;
quit;
ods graphics off;
/* component residual densities against the overall residual histogram */
proc sgplot data=fmm_out;
   density resid_1 / type=kernel;
   density resid_2 / type=kernel;
   histogram resid;
   density resid / type=normal;
run;
Figure 5. Distribution of residuals from a 2-component normal mixture model
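Because the mean of a finite mixture is the probability-weighted average of its component means, an overall residual can be formed as the observed value minus that mixture mean, and a component residual as the observed value minus each component's mean. The Python sketch below shows this for one observation. We assume, but have not verified here, that this is what the pred, resid(component) and resid(overall) keywords of the OUTPUT statement compute; check the SAS/STAT User's Guide before relying on it. The function names are our own.

```python
def mixture_mean(weights, means):
    """Mean of a finite mixture: sum of mixing probability times component mean."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing probabilities must sum to 1")
    return sum(w * m for w, m in zip(weights, means))


def mixture_residuals(y, weights, means):
    """Return (overall residual, list of component residuals) for one observation."""
    overall = y - mixture_mean(weights, means)
    by_component = [y - m for m in means]
    return overall, by_component


print(mixture_residuals(5.0, [0.25, 0.75], [0.0, 4.0]))  # (2.0, [5.0, 1.0])
```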
Two features of PROC FMM are worth mentioning. First, the component distributions can come from different families; for example, a normal distribution can be mixed with a t distribution. Second, the probability that each data point belongs to a latent class can be modeled with covariates, which enhances the interpretability of the model.
The following code demonstrates a 2-component mixture model in which one component is normally distributed and the other follows a t distribution.
/* mixture of heterogeneous distributions */
ods graphics on;
proc fmm data=lgddata2;
   model lgd2 = lev lgd_a i_def / dist=normal;
   model + lev lgd_a i_def / dist=t;
   output out=fmm_out2 pred resid(component) resid(overall);
run;
ods graphics off;
The idea is simple. List the component models that share one distribution family in one MODEL statement, and the components from another family in another MODEL statement. The additional MODEL statement omits the dependent variable and uses a '+' sign instead, indicating that its components are added on top of those already specified. Each MODEL statement can itself describe a mixture of several components. For example, if K=2 were specified in the first MODEL statement of the code above, SAS would model the data as a mixture of two normal components and one t component. This capability of modeling data with heterogeneous mixture distributions is a powerful tool in predictive modeling and advanced analytics. Of course, for this particular sample data there is no immediate benefit from using a heterogeneous mixture.
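Mixing a normal component with a Student's t component lets the heavier-tailed t absorb outlying observations. The Python sketch below writes out the density such a two-family mixture implies for a single observation. The location-scale parameterization of the t density and the treatment of the degrees of freedom as a fixed input are our assumptions and may differ from PROC FMM's parameterization; for brevity the component locations are constants here, whereas in the model above they would be linear predictors in lev, lgd_a and i_def.

```python
import math


def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def t_pdf(y, loc, scale, df):
    """Location-scale Student's t density with df degrees of freedom."""
    z = (y - loc) / scale
    log_const = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                 - 0.5 * math.log(df * math.pi) - math.log(scale))
    return math.exp(log_const - (df + 1) / 2.0 * math.log1p(z * z / df))


def normal_t_mixture_pdf(y, pi_normal, normal_params, t_params):
    """pi_normal * Normal(mean, sd) + (1 - pi_normal) * t(loc, scale, df)."""
    mean, sd = normal_params
    loc, scale, df = t_params
    return (pi_normal * normal_pdf(y, mean, sd)
            + (1.0 - pi_normal) * t_pdf(y, loc, scale, df))


# Far from the centre, the t component dominates because of its heavier tails
print(t_pdf(6.0, 0.0, 1.0, 3.0) > normal_pdf(6.0, 0.0, 1.0))  # True
```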
The second feature that makes PROC FMM especially powerful in advanced analytics is the ability to model the probabilities of the latent classes with covariates, so that analysts can study which factors contribute to classifying data points into the different latent classes. This lets the analyst interpret the latent classes in terms of meaningful factors, and it can even improve the model fit. All an analyst needs to do is add a PROBMODEL statement to the PROC FMM step, for example:
probmodel lev lgd_a i_def;
The probability model can use one of four link functions: logit, probit, log-log, and complementary log-log. Please refer to the SAS/STAT User's Guide for details.
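With a probability model, the mixing probability is no longer a single constant but depends on covariates through a link function. For the two-component case, the Python sketch below maps a linear predictor to the probability of component 1 under each of the four links. The link conventions in the comments (for example, log-log as g(p) = -log(-log p)) follow common generalized linear model usage and are our assumption about how PROC FMM defines them; the coefficient vector gamma is a hypothetical placeholder.

```python
import math


def inverse_link(eta, link="logit"):
    """Map a linear predictor eta to a probability in (0, 1)."""
    if link == "logit":  # g(p) = log(p / (1 - p))
        if eta >= 0:
            return 1.0 / (1.0 + math.exp(-eta))
        e = math.exp(eta)
        return e / (1.0 + e)
    if link == "probit":  # g(p) = inverse standard normal CDF
        return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
    if link == "loglog":  # g(p) = -log(-log(p))
        return math.exp(-math.exp(-eta))
    if link == "cloglog":  # g(p) = log(-log(1 - p))
        return 1.0 - math.exp(-math.exp(eta))
    raise ValueError(f"unknown link: {link}")


def mixing_probabilities(gamma, x, link="logit"):
    """Two components: P(component 1 | x) from gamma[0] + gamma[1:] . x, and its complement."""
    eta = gamma[0] + sum(g * xi for g, xi in zip(gamma[1:], x))
    p1 = inverse_link(eta, link)
    return [p1, 1.0 - p1]


# Hypothetical coefficients for (intercept, lev, lgd_a, i_def)
for link in ("logit", "probit", "loglog", "cloglog"):
    print(link, mixing_probabilities([-0.5, 1.2, 0.8, -0.3], [0.41, 0.63, 1.42], link))
```

Using these observation-specific probabilities in place of a constant mixing weight is what turns the mixture into one whose class membership can be explained by covariates.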
Matt Flynn, http://www.casact.org/education/spring/2011/handouts/C11-Flynn.pdf
SAS Institute Inc., SAS/STAT 9.3 User's Guide
Geoffrey McLachlan and David Peel, Finite Mixture Models, John Wiley & Sons, 2000
data lgddata;
   informat lgd lev 12.9 lgd_a 6.4 i_def 4.3;
   input lgd lev lgd_a i_def;
   label lgd   = 'Real loss given default'
         lev   = 'Leverage coefficient by firm'
         lgd_a = 'Mean default rate by year'
         i_def = 'Mean default rate by industry';
   cards;
0.747573451 0.413989786 0.6261 1.415
0.99 0.413989786 0.6849 1.415
0.06581075 0.230361142 0.4566 1.183
0.351287992 0.541339309 0.6261 2.353
0.25844921 0.541339309 0.4566 2.353
0.01968009 0.812 0.6715 0.743
0.931035513 0.546732229 0.6715 2.353
0.341254925 0.71 0.6715 1.183
0.35075456 0.855339361 0.6715 2.353
0.045826764 0.313983237 0.6261 0.743
0.025754193 0.190648237 0.4566 0
0.759732568 0.490953756 0.6261 2.353
0.757989999 0.910788759 0.6261 1.415
0.6 0.336071518 0.6261 1.183
0.374480256 0.414862374 0.4566 0.967
0.168726407 0.612063995 0.6261 1.183
0.283909643 0.693928717 0.6261 2.353
0.747018382 0.937072431 0.6261 1.415
0.686300059 0.801162532 0.6715 2.353
0.050051313 0.365725066 0.4566 0
/* More data is not shown here to save space */
;
run;
/* dep var is between (0, 1), so use a logit transformation */
data lgddata2;
   set lgddata;
   lgd2 = log(lgd/(1-lgd));
run;
/* check the distribution of the transformed dep var */
proc sgplot data=lgddata2;
   histogram lgd2;
   density lgd2 / type=normal;
run;
/* check relationships between the dep var and the covariates */
proc sgscatter data=lgddata2;
   matrix lgd2 lev lgd_a i_def;
run;
/* study the functional form of each relationship */
ods graphics on;
proc loess data=lgddata2;
   model lgd2 = lev;
run;
proc loess data=lgddata2;
   model lgd2 = lgd_a;
run;
proc loess data=lgddata2;
   model lgd2 = i_def;
run;
ods graphics off;
/* quadratic terms for the default rates by year and by industry */
data lgddata2;
   set lgddata2;
   lgd_a2 = lgd_a**2;
   i_def2 = i_def**2/10;
run;
/* build baseline model; PROC REG needs keyword=name in OUTPUT */
ods graphics on;
proc reg data=lgddata2;
   model lgd2 = lev lgd_a i_def lgd_a2 i_def2;
   output out=reg_out p=pred r=resid;
run;
quit;
ods graphics off;
/* 2-component normal mixture */
ods html;
ods graphics on;
proc fmm data=lgddata2 plots(unpack);
   model lgd2 = lev lgd_a i_def / k=2;
   *probmodel lev lgd_a i_def;
   output out=fmm_out pred resid(component) resid(overall);
   *bayes;
run;
quit;
ods graphics off;
proc sgplot data=fmm_out;
   density resid_1 / type=kernel;
   density resid_2 / type=kernel;
   *density resid_3 / type=kernel;
   histogram resid;
   density resid / type=normal;
run;
/* mixture of heterogeneous distributions */
ods graphics on;
proc fmm data=lgddata2;
   model lgd2 = lev lgd_a i_def / dist=normal;
   model + lev lgd_a i_def / dist=t;
   output out=fmm_out2 pred resid(component) resid(overall);
   /* model the mixing probabilities with the covariates */
   probmodel lev lgd_a i_def;
run;
ods graphics off;
proc sgplot data=fmm_out2;
   density resid_1 / type=kernel;
   density resid_2 / type=kernel;
   *density resid_3 / type=kernel;
   histogram resid;
   density resid / type=normal;
run;
Please comment on the article here: SAS Programming for Data Mining