Thursday, December 9, 2021

Estimating the parameter of a Binomial Distribution

Let's assume we observe the following sample:

[X1, X2, X3, ..., Xm]

where each Xi is a run of n Bernoulli trials (e.g. 0011111000...), so the number of trials per observation is fixed at n, and the observed numbers of successes are [k1, k2, k3, ..., km].

Now we know that pmf(n, p, k) = nCk p^k (1-p)^(n-k)

{

pmf derivation

k successes out of n trials can occur in nCk ways/patterns.

And each of those patterns occurs with probability p^k (1-p)^(n-k), so

pmf = nCk p^k (1-p)^(n-k)

}
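The pmf formula above can be checked numerically. A minimal sketch using only the Python standard library (the function name `binom_pmf` is my own, not from the post):

```python
from math import comb

def binom_pmf(n, p, k):
    # nCk * p^k * (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# sanity check: the pmf sums to 1 over k = 0..n
total = sum(binom_pmf(10, 0.3, k) for k in range(11))
print(total)  # ≈ 1.0
```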


So the likelihood of observing the data, i.e. the joint probability of the m independent observations, can be written as

L = pmf(n, p, k1) · pmf(n, p, k2) · ... · pmf(n, p, km)

log L = log pmf(n, p, k1) + log pmf(n, p, k2) + ... + log pmf(n, p, km)

= K + [k1 + k2 + k3 + ... + km] log(p) + [mn - (k1 + k2 + k3 + ... + km)] log(1-p)

where K = log(nCk1) + ... + log(nCkm) collects the binomial coefficients and does not depend on p.

To maximize the likelihood, set the derivative of log L with respect to p to zero:

d(log L)/dp = [k1 + k2 + k3 + ... + km]/p - [mn - (k1 + k2 + k3 + ... + km)]/(1-p) = 0
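We can confirm numerically that this gradient vanishes at p = (Σki)/(mn). A small sketch with toy success counts (the counts here are made up for illustration):

```python
ks = [3, 5, 4, 6, 2]   # toy success counts (assumed, not from real data)
n, m = 10, len(ks)
S = sum(ks)
p_hat = S / (m * n)    # candidate maximizer

def grad(p):
    # d(log L)/dp = S/p - (mn - S)/(1 - p)
    return S / p - (m * n - S) / (1 - p)

print(grad(p_hat))  # ≈ 0
```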

Solving for p gives the maximum likelihood estimate:

p̂ = (k1 + k2 + k3 + ... + km) / (mn)
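A quick simulation sketch showing that this estimator recovers the true p (the sample here is generated with a hypothetical p_true = 0.3):

```python
import random

random.seed(0)
n, m, p_true = 20, 5000, 0.3

# each observation: number of successes in n Bernoulli(p_true) trials
ks = [sum(random.random() < p_true for _ in range(n)) for _ in range(m)]

# closed-form MLE: total successes over total trials
p_hat = sum(ks) / (m * n)
print(p_hat)  # close to 0.3
```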
