Let's assume we observe data as the following sample:
[X1, X2, X3, ..., Xm]
where each observation Xi is a sequence of n Bernoulli trials (e.g. [0,0,1,1,1,1,1,0,0,0,...]), the number of trials n is fixed for every observation, and the observed numbers of successes are [k1, k2, k3, ..., km].
Now we know that pmf(n, p, k) = nCk p^k (1-p)^(n-k)
{
pmf derivation
k success out of n trials can occur in nCk ways/patterns.
And each of those patterns can happen with probability p^k (1-p)^(n-k), so
pmf = nCk p^k (1-p)^(n-k)
}
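As a quick check of the pmf formula above, a minimal Python sketch (the function name and the example numbers are just for illustration):

```python
from math import comb

def binom_pmf(n, p, k):
    # P(k successes in n trials) = nCk * p^k * (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# example: 3 successes out of 10 trials with p = 0.4
print(binom_pmf(10, 0.4, 3))  # ≈ 0.215
```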
So, the likelihood of observing the data, i.e. the joint probability (assuming the m observations are independent), can be written as
L = (pmf-1)(pmf-2)(pmf-3).......(pmf-m)
log L = log(pmf-1) + log(pmf-2) + log(pmf-3) + ... + log(pmf-m)
= K + [k1 + k2 + k3 + ... + km] log(p) + [mn - (k1 + k2 + k3 + ... + km)] log(1-p)
where K = log(nCk1) + log(nCk2) + ... + log(nCkm) collects the terms that do not depend on p.
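The log-likelihood above can be written directly in code; a sketch assuming the same setup (fixed n, observed success counts k1...km; function name is illustrative):

```python
import math
from math import comb

def log_likelihood(p, n, ks):
    # K: sum of log binomial coefficients, independent of p
    K = sum(math.log(comb(n, k)) for k in ks)
    s = sum(ks)    # k1 + k2 + ... + km
    m = len(ks)
    return K + s * math.log(p) + (m * n - s) * math.log(1 - p)

# example: m = 2 observations with n = 10 trials each, 3 and 4 successes
print(log_likelihood(0.35, 10, [3, 4]))
```

Because log is monotone, the p that maximizes log L also maximizes L, which is why the next step differentiates log L rather than L.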
To maximize the likelihood, set the derivative of log L with respect to p to zero:
d(log L)/dp = [k1 + k2 + k3 + ... + km]/p - [mn - (k1 + k2 + k3 + ... + km)]/(1-p) = 0
Cross-multiplying gives [k1 + ... + km](1-p) = [mn - (k1 + ... + km)]p, i.e. k1 + ... + km = mn·p, so
p^ = [k1 + k2 + k3 + ... + km]/(mn)
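The closed-form estimate can be sanity-checked numerically; a sketch with assumed example values (n = 20 trials per observation, m = 500 observations, true p = 0.3), comparing p^ against a grid search over the log-likelihood:

```python
import math
import random

random.seed(0)
n, m, p_true = 20, 500, 0.3   # assumed example values
# simulate m observations: each is the success count out of n Bernoulli trials
ks = [sum(random.random() < p_true for _ in range(n)) for _ in range(m)]

p_hat = sum(ks) / (m * n)     # closed-form MLE: sum(k_i) / (m n)

def ll(p):
    # log-likelihood up to the constant K, which does not affect the argmax
    s = sum(ks)
    return s * math.log(p) + (m * n - s) * math.log(1 - p)

# the closed form should match the best p on a fine grid
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=ll)
print(p_hat, p_grid)
```

Since the log-likelihood is concave in p, the grid maximizer lands within one grid step of p^, and both sit near the true p = 0.3.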