Response to Alexander Shen's note "On the likelihood for finite mixture models and Kirill Kalinin’s paper “Validation of the Finite Mixture Model Using Quasi-Experimental Data and Geography”"

Keywords:
finite mixture model

Alexander Shen writes, "the expression being maximized, considered as a function of \(f_0, f_i, f_e\), is a linear function on the triangle \(f_0 + f_i + f_e = 1, f_0, f_i, f_e \geq 0\)." The expression being maximized (from [5]) is not a function of \(f_0\), \(f_i\) and \(f_e\). These "probabilities" are functions of the likelihood and so depend on all the other parameter estimates. For example, when both "incremental" and "extreme" frauds are included, the **R** code that implements the method, which follows [2, 1, 6, 4], iterates the following updates until it finds stable values that maximize the likelihood:

\(F = (1-f_{\mathrm{i}}-f_{\mathrm{e}})F_0 + f_{\mathrm{i}}F_I + f_{\mathrm{e}}F_E \)

\(h_0 = (1-f_{\mathrm{i}}-f_{\mathrm{e}})F_0/F \)

\(h_I = f_{\mathrm{i}}F_I/F \)

\(h_E = f_{\mathrm{e}}F_E/F \)

\(f_{\mathrm{i}} = \text{mean}(h_I) \)

\(f_{\mathrm{e}} = \text{mean}(h_E)\)

where \(F_0\), \(F_I\) and \(F_E\) are vectors of length \(n\) (the number of observations) whose elements are the observation-specific likelihoods. \(h_0\), \(h_I\) and \(h_E\) are also vectors of length \(n\), and the ratios \(F_0/F\), \(F_I/F\) and \(F_E/F\) are evaluated elementwise. The likelihood value actually maximized is \(\sum_{i=1}^n \log(h_0F_0 + h_IF_I + h_EF_E)\), where \(h_0F_0\), \(h_IF_I\) and \(h_EF_E\) are elementwise products and the sum runs over the \(n\) elements of the resulting vector. Shen's "triangle" argument therefore does not apply.
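The updates above can be sketched in a few lines of code. The following is a minimal Python illustration, not the **R** code referred to in the text. It makes one strong simplifying assumption: the vectors \(F_0\), \(F_I\) and \(F_E\) are held fixed, whereas in the method itself they depend on the other parameters, which are re-estimated as well. The function names, starting values, tolerance and example numbers are all hypothetical.

```python
import math


def update_step(F0, FI, FE, fi, fe):
    """Apply the updates for F, h_0, h_I, h_E, f_i and f_e once."""
    n = len(F0)
    f0 = 1.0 - fi - fe
    # Mixture likelihood of each observation (elementwise).
    F = [f0 * a + fi * b + fe * c for a, b, c in zip(F0, FI, FE)]
    # Observation-level probabilities (elementwise ratios).
    h0 = [f0 * a / m for a, m in zip(F0, F)]
    hI = [fi * b / m for b, m in zip(FI, F)]
    hE = [fe * c / m for c, m in zip(FE, F)]
    # New f_i and f_e are the means of h_I and h_E.
    return sum(hI) / n, sum(hE) / n, (h0, hI, hE)


def mixture_loglik(F0, FI, FE, fi, fe):
    """Sum over observations of log F for the given f_i and f_e."""
    f0 = 1.0 - fi - fe
    return sum(math.log(f0 * a + fi * b + fe * c)
               for a, b, c in zip(F0, FI, FE))


def iterate_until_stable(F0, FI, FE, fi=0.1, fe=0.1, tol=1e-10, max_iter=10000):
    """Repeat update_step until f_i and f_e change by less than tol."""
    h = None
    for _ in range(max_iter):
        new_fi, new_fe, h = update_step(F0, FI, FE, fi, fe)
        converged = abs(new_fi - fi) < tol and abs(new_fe - fe) < tol
        fi, fe = new_fi, new_fe
        if converged:
            break
    # h holds the observation-level probabilities from the last pass.
    return fi, fe, h


# Made-up likelihood values for four observations (illustration only).
F0 = [0.9, 0.8, 0.1, 0.05]
FI = [0.1, 0.3, 0.7, 0.2]
FE = [0.01, 0.02, 0.1, 0.9]

fi, fe, (h0, hI, hE) = iterate_until_stable(F0, FI, FE)
print("f_i =", fi, "f_e =", fe)
print("log-likelihood =", mixture_loglik(F0, FI, FE, fi, fe))
```

Even in this reduced setting, \(h_0\), \(h_I\) and \(h_E\) are recomputed from the current likelihood values on every pass, so they are not free coordinates on a triangle.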

Results from the model of [5] differ from results produced by the algorithm of [3] in part because [3] describes a Monte Carlo simulation method, not a statistical estimation method based on any kind of likelihood or probability specification.

*Received 02.07.2018.*

1. Dempster A.P., Laird N.M., Rubin D.B. Maximum likelihood from incomplete data via the EM algorithm. – Journal of the Royal Statistical Society. Series B (Methodological). 1977. V. 39. No. 1. P. 1–38.
2. Hasselblad V. Estimation of finite mixtures of distributions from the exponential family. – Journal of the American Statistical Association. 1969. V. 64. No. 328. P. 1459–1471.
3. Klimek P., Yegorov Y., Hanel R., Thurner S. Statistical detection of systematic election irregularities. – Proceedings of the National Academy of Sciences. 2012. V. 109. No. 41. P. 16469–16473.
4. McLachlan G., Peel D. Finite Mixture Models. New York: Wiley, 2000.
5. Mebane W.R., Jr. Election forensics: Frauds tests and observation-level frauds probabilities. – Paper presented at the 2015 Annual Meeting of the Midwest Political Science Association, Chicago, April 7–10, 2016.
6. Wu C.F.J. On the convergence properties of the EM algorithm. – Annals of Statistics. 1983. V. 11. No. 1. P. 95–103.