Response to Alexander Shen's note

Mebane W.

Abstract

Response to Alexander Shen's note "On the likelihood for finite mixture models and Kirill Kalinin’s paper “Validation of the Finite Mixture Model Using Quasi-Experimental Data and Geography”"


Alexander Shen writes, "the expression being maximized, considered as a function of \(f_0, f_i, f_e\), is a linear function on the triangle \(f_0 + f_i + f_e = 1, f_0; f_i; f_e \geq 0\)." The expression being maximized (from [5]) is not a function of \(f_0\), \(f_i\) and \(f_e\). These "probabilities" are functions of the likelihood and so depend on all the other parameter estimates. For example, when both "incremental" and "extreme" frauds are included, the R code that implements the method, which follows [2, 1, 6, 4], iteratively evaluates the following until stable, likelihood-maximizing values are found:

\(F = (1-f_{\mathrm{i}}-f_{\mathrm{e}})F_0 + f_{\mathrm{i}}F_I + f_{\mathrm{e}}F_E \)

\(h_0 = (1-f_{\mathrm{i}}-f_{\mathrm{e}})F_0/F \)

\( h_I = f_{\mathrm{i}}F_I/F \)

\(h_E = f_{\mathrm{e}}F_E/F \)

\(f_{\mathrm{i}} = \text{mean}(h_I) \)

\(f_{\mathrm{e}} = \text{mean}(h_E)\)

where \(F_0\), \(F_I\) and \(F_E\) are vectors of length \(n\) (the number of observations) that have the observation-specific likelihoods as elements. \(h_0\), \(h_I\) and \(h_E\) are also vectors of length \(n\), and \(F_0/F\), \(F_I/F\) and \(F_E/F\) are evaluated elementwise. The likelihood value actually maximized is \(\sum_{i=1}^n(\log(h_0F_0 + h_IF_I + h_EF_E))\) where \(h_0F_0\), \(h_IF_I\) and \(h_EF_E\) are elementwise products. Shen's "triangle" argument does not apply.
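The iteration displayed above can be illustrated with a minimal sketch. This is not the R code referred to in the text: it is written in Python, the function names (`em_update`, `iterate`, `mixture_loglik`) and the toy likelihood values are hypothetical, and it makes the simplifying assumption that \(F_0\), \(F_I\) and \(F_E\) are held fixed, whereas in the method of [5] they depend on the other parameters being estimated. The function `mixture_loglik` evaluates \(\sum \log F\), the standard mixture log-likelihood of [1], and is included only to monitor the iteration.

```python
import math


def em_update(F0, FI, FE, f_i, f_e):
    """One pass of the displayed updates, with F0, FI, FE held fixed (assumption)."""
    n = len(F0)
    h_I, h_E = [], []
    for a, b, c in zip(F0, FI, FE):
        F = (1 - f_i - f_e) * a + f_i * b + f_e * c  # F, elementwise
        h_I.append(f_i * b / F)                      # h_I = f_i F_I / F
        h_E.append(f_e * c / F)                      # h_E = f_e F_E / F
    return sum(h_I) / n, sum(h_E) / n                # f_i = mean(h_I), f_e = mean(h_E)


def mixture_loglik(F0, FI, FE, f_i, f_e):
    """Sum over observations of log F (used here only as a convergence monitor)."""
    return sum(
        math.log((1 - f_i - f_e) * a + f_i * b + f_e * c)
        for a, b, c in zip(F0, FI, FE)
    )


def iterate(F0, FI, FE, f_i=0.1, f_e=0.1, tol=1e-10, max_iter=10000):
    """Repeat the update until f_i and f_e are stable (hypothetical stopping rule)."""
    for _ in range(max_iter):
        new_i, new_e = em_update(F0, FI, FE, f_i, f_e)
        if abs(new_i - f_i) + abs(new_e - f_e) < tol:
            return new_i, new_e
        f_i, f_e = new_i, new_e
    return f_i, f_e


if __name__ == "__main__":
    # Toy observation-specific likelihoods, invented purely for illustration.
    F0 = [0.90, 0.80, 0.10, 0.05]
    FI = [0.10, 0.20, 0.70, 0.20]
    FE = [0.05, 0.05, 0.20, 0.90]
    f_i, f_e = iterate(F0, FI, FE)
    print(f_i, f_e, mixture_loglik(F0, FI, FE, f_i, f_e))
```

In the full method the vectors \(F_0\), \(F_I\) and \(F_E\) would be recomputed whenever the other parameter estimates change, so the stable values of \(f_{\mathrm{i}}\) and \(f_{\mathrm{e}}\) depend on those estimates as well.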

Results from the model of [5] differ from results produced by the algorithm of [3] in part because [3] describes a Monte Carlo simulation method, not a statistical estimation method based on any kind of likelihood or probability specification.

Received 02.07.2018.


References

  1. Dempster A.P., Laird N.M., Rubin D.B. Maximum likelihood from incomplete data via the EM algorithm. – Journal of the Royal Statistical Society. Series B (Methodological). 1977. V. 39. No. 1. P. 1–38.
  2. Hasselblad V. Estimation of finite mixtures of distributions from the exponential family. – Journal of the American Statistical Association. 1969. V. 64. No. 328. P. 1459–1471.
  3. Klimek P., Yegorov Y., Hanel R., Thurner S. Statistical detection of systematic election irregularities. – Proceedings of the National Academy of Sciences. 2012. V. 109. No. 41. P. 16469–16473.
  4. McLachlan G., Peel D. Finite Mixture Models. New York: Wiley, 2000.
  5. Mebane W.R., Jr. Election forensics: Frauds tests and observation-level frauds probabilities. – Paper presented at the 2015 Annual Meeting of the Midwest Political Science Association, Chicago, April 7–10, 2016.
  6. Wu C.F.J. On the convergence properties of the EM algorithm. – Annals of Statistics. 1983. V. 11. No. 1. P. 95–103.