The core idea of Empirical Likelihood (EL) is to use a maximum entropy discrete distribution supported on the data points and constrained by estimating equations related with the parameters of interest. As such, it is a non-parametric approach in the sense that the distribution of the data does not need to be specified, only some of its characteristics usually via moments. In short, it’s a non-parametric likelihood, which is fundamental for the likelihood-based statistical methodology.

Bayesian Analysis is a very popular and useful method in applications. As we discussed in the last post, it’s essentially an belief updating procedure through data, which is very natural in modeling. Last time, I said I did not get why there is a severe debate between Frequentist and Bayesian. Yesterday, I had a nice talk with Professor Xuming He from University of Michigan. When we talked about the Bayesian analysis, he made a nice point that in Frequentist analysis, the model mis-specification can be addressed in a very rigorous way to conduct valid statistical inference; while in Bayesian analysis, it is very sensitive to the likelihood as well as the prior, but how to do the adjustment is a big problem (here is a paper discussing model misspecification problems under Bayesian framework).

Before the discussion with Dr. Xuming He, intuitively, I thought it’s very natural and potentially very useful to combine empirical likelihood with Bayesian analysis by regarding empirical likelihood as the likelihood used in the Bayesian framework. But now I got to understand the importance of why Professor Nicole Lazar from University of Georgia had a paper on “Bayesian Empirical Likelihood” to discuss the validity of posterior inference: “…can likelihoods other than the density from which the data are assumed to be generated be used as the likelihood portion in a Bayesian analysis?” And the paper concluded that “…while they indicate that it is feasible to consider a Bayesian inferential procedure based on replacing the data likelihood with empirical likelihood, the validity of the posterior inference needs to be established for each case individually.”

But Professor Xuming He made a nice comment that Bayesian framework can be used to avoid the calculation of maximum empirical likelihood estimator by proving of the asymptotically normal posterior distribution with mean around the maximum empirical likelihood estimator. The original idea of their AOS paper was indeed to use the computational advantage from Bayesian side to solve the optimization difficulty in getting maximum empirical likelihood estimator. This reminded me of another paper about “Approximate Bayesian Computation (ABC) via Empirical Likelihood“, which used empirical likelihood to get improvement in the approximation at an overall computing cost that is negligible against ABC.

We know that for general Bayesian analysis, the goal is to be able to simulate from the posterior distribution by using MCMC for example. But in order to use MCMC to simulate from the posterior distribution, we need to be able to evaluate the likelihood. But sometimes, it’s hard to evaluate the likelihood due to the complexity of the model. For example, recently laundry socks problem is a hit online. Since it’s not that trivial to figure out the the likelihood of the process although we have a simple generative model from which we can easily simulate samples, Professor Rasmus Bååth presented a Bayesian analysis by using ABC. Later Professor Christian Robert presented exact ptobability calculations pointing out that Feller had posed a similar problem. And here is another post from Professor Saunak Sen. The basic idea for ABC approximation is to accept values provided the simulated sample is sufficiently close to the observed data point:

1. Simulate $\theta\sim\pi(\cdot)$, where $\pi$ is the prior;
2. Simulate $x$ from the generative model;
3. If $\|x-x^*\|$ is small, keep $\theta$, where $x^*$ is observed data point. Otherwise reject.

Now how to use empirical likelihood to help ABC? Actually although the original motivation is the same, that is to approximate the likelihood (ABC approximate the likelihood via simulation and empirical likelihood version of ABC is to use empirical likelihood to approximate the true likelihood), it’s more natural to start from the original Bayesian computation (this is also why Professor Christian Robert changed the title of their paper). For the posterior sample, we can generate as the following from the importance sampling perspective:

1. Simulate $\theta\sim\pi(\cdot)$, where $\pi$ is the prior;
2. Get the corresponding importance weight as $w=f(\theta|x)$ where $f(\theta|x)$ is the likelihood.

Now if we do not know the likelihood, we can do the following:

1. Simulate $\theta\sim\pi(\cdot)$, where $\pi$ is the prior;
2. Get the corresponding importance weight as $w=EL(\theta|x)$ where $EL(\theta|x)$ is the empirical likelihood.

This is the way of doing Bayesian computation via empirical likelihood.

The main difference between Bayesian computation via empirical likelihood and Empirical likelihood Bayesian is that the first one use empirical likelihood to approximate the likelihood in the Bayesian computation and followed by Bayesian inference, while the second one is that use Bayesian computation to overcome the optimization difficulty and followed by studying of the frequentist property.

updated[4/28/2015]: Here is a nice post talking about the issue for the Bayesian, especially for the model misspecification.