Wednesday, December 29, 2021

Estimate parameters of Negative Binomial Distribution - NB?

NB models the number of trials n (e.g. coin tosses) required to obtain k successes (heads).

p = probability of success (the fairness of the coin), which is the parameter to be estimated.


{

pmf Derivation

The trial sequence ends on observing the kth success, so the nth trial is the kth success.

The first n-1 trials therefore contain exactly k-1 successes, and those successes can be distributed in all possible ways among the n-1 positions, i.e.

C(n-1, k-1) ways, each way/pattern having probability p^(k-1) (1-p)^(n-k).

And we know that the probability of success at the nth position is p.

The first n-1 trials and the nth trial are independent, so we can multiply them to get the joint probability:

PMF = p * C(n-1, k-1) p^(k-1) (1-p)^(n-k)

    = C(n-1, k-1) p^k (1-p)^(n-k)

----------------------------------------------------------------------

Compounding the probability distribution

Let's say p ~ Beta(α, β), i.e. the success probability is itself a random parameter.

P(no. of trials = n, prob of success = p)

 = NBpmf(n, k, p) * Betapdf(p | α, β)

For each value of p between 0 and 1 we weight the NB pmf by the Beta density and integrate, and thus get rid of the variable p. [This is the posterior predictive distribution.]

Beta Compounded Negative Binomial PMF = ∫₀¹ NBpmf(n, k, p) * Betapdf(p | α, β) dp

BNBpmf = ∫₀¹ C(n-1, k-1) p^k (1-p)^(n-k) * p^(α-1) (1-p)^(β-1) / B(α, β) dp

       = C(n-1, k-1) / B(α, β) ∫₀¹ p^(k+α-1) (1-p)^(n-k+β-1) dp

       = C(n-1, k-1) B(k+α, n-k+β) / B(α, β)


}
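The derivation above can be sanity-checked numerically. Below is a minimal sketch in Python; the parameter values (n = 10, k = 3, α = 2, β = 5) are arbitrary illustrations, not from the post. The closed form C(n-1, k-1) B(k+α, n-k+β)/B(α, β) should match direct numerical integration of NBpmf × Beta density, and the NB pmf itself should sum to 1 over n.

```python
from math import comb, gamma

def beta_fn(a, b):
    """Beta function B(a, b) = Γ(a)Γ(b) / Γ(a+b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def nb_pmf(n, k, p):
    """P(exactly n trials are needed for the kth success)."""
    return comb(n - 1, k - 1) * p**k * (1 - p)**(n - k)

def beta_pdf(p, a, b):
    """Beta(a, b) density at p."""
    return p**(a - 1) * (1 - p)**(b - 1) / beta_fn(a, b)

def bnb_pmf(n, k, a, b):
    """Closed form: C(n-1, k-1) B(k+a, n-k+b) / B(a, b)."""
    return comb(n - 1, k - 1) * beta_fn(k + a, n - k + b) / beta_fn(a, b)

def bnb_numeric(n, k, a, b, steps=200_000):
    """Midpoint-rule integration of NBpmf * Beta density over p in (0, 1)."""
    h = 1.0 / steps
    return sum(nb_pmf(n, k, (i + 0.5) * h) * beta_pdf((i + 0.5) * h, a, b)
               for i in range(steps)) * h
```

For example, `bnb_pmf(10, 3, 2.0, 5.0)` agrees with `bnb_numeric(10, 3, 2.0, 5.0)` to several decimal places.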


Now assume the following sample of n, the number of trials needed to achieve k successes, is observed, and we want to estimate p, the probability of success:

[n1, n2, n3, n4, ..., nm]


pmf1 = C(n1 - 1, k - 1) p^k (1-p)^(n1 - k)


Since the observed samples are independent, their joint distribution (the likelihood) is the product of the individual pmfs:

L = ∏ (i=1 to m) pmf_i

Take the log to make taking the derivative simple:

LL = Σ (i=1 to m) log(pmf_i)

   = Σ (i=1 to m) log( C(n_i - 1, k - 1) p^k (1-p)^(n_i - k) )

To maximize the log likelihood, take the derivative with respect to p and equate it to zero:

dLL/dp = Σ (i=1 to m) [ k/p - (n_i - k)/(1-p) ] = mk/p - (Σ n_i - mk)/(1-p) = 0

Cross-multiplying gives mk(1-p) = p(Σ n_i - mk), i.e. mk = p Σ n_i, so

p̂ = mk / Σ n_i
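The closed-form estimate can be checked against a brute-force grid search over p. A minimal sketch, assuming a made-up sample of trial counts (the numbers below are illustrative, not from the post):

```python
from math import comb, log

def nb_logpmf(n, k, p):
    """Log of the NB pmf: log C(n-1, k-1) + k log p + (n-k) log(1-p)."""
    return log(comb(n - 1, k - 1)) + k * log(p) + (n - k) * log(1 - p)

def log_likelihood(sample, k, p):
    """LL = sum of log pmfs over the observed trial counts."""
    return sum(nb_logpmf(n, k, p) for n in sample)

# Hypothetical sample: trial counts, each run continued until k = 3 successes.
sample = [8, 12, 7, 10, 15, 9]
k, m = 3, len(sample)

# Closed-form MLE derived above: p-hat = mk / sum(n_i)
p_hat = m * k / sum(sample)

# Brute-force check: maximize LL over a fine grid of candidate p values.
best = max((i / 1000 for i in range(1, 1000)),
           key=lambda p: log_likelihood(sample, k, p))
```

The grid maximizer agrees with mk / Σ n_i up to the grid resolution.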




Thursday, December 9, 2021

Estimate parameters of the Binomial Distribution?

Let's assume we observe data as the following sample:

[X1, X2, X3, X4, ..., Xm]

where each Xi is a sequence of Bernoulli trials, e.g. [0 0 1 1 1 1 1 0 0 0 ...], the number of trials is fixed at n, and the number of successes observed in each sequence is [k1, k2, k3, ...].

Now we know that pmf(n, p, k) = C(n, k) p^k (1-p)^(n-k)

{

pmf derivation

k successes out of n trials can occur in C(n, k) ways/patterns.

And each of those patterns can happen with probability p^k (1-p)^(n-k), so

pmf = C(n, k) p^k (1-p)^(n-k)

}


So the likelihood function of observing the data, i.e. the joint probability, can be written as

L = pmf_1 · pmf_2 · pmf_3 · ... · pmf_m

Log L = log(pmf_1) + log(pmf_2) + log(pmf_3) + ... + log(pmf_m)

      = K + [k1 + k2 + k3 + ... + km] log(p) + [mn - (k1 + k2 + k3 + ... + km)] log(1-p)

where K = Σ log C(n, k_i) does not depend on p.

Setting the gradient to zero to maximize the likelihood:

dLL/dp = [k1 + k2 + k3 + ... + km]/p - [mn - (k1 + k2 + k3 + ... + km)]/(1-p) = 0

p̂ = [k1 + k2 + k3 + ... + km] / mn
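As with the negative binomial case, the closed form can be checked against a grid search over p. A minimal sketch, assuming hypothetical success counts (the numbers are illustrative, not from the post):

```python
from math import comb, log

def bin_logpmf(n, k, p):
    """Log of the Binomial pmf: log C(n, k) + k log p + (n-k) log(1-p)."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

# Hypothetical data: m = 5 experiments of n = 10 trials each,
# with observed success counts k_i.
n = 10
ks = [3, 4, 2, 5, 4]
m = len(ks)

# Closed-form MLE derived above: p-hat = sum(k_i) / (m * n)
p_hat = sum(ks) / (m * n)

# Brute-force check: maximize the log likelihood over a grid of p values.
best = max((i / 1000 for i in range(1, 1000)),
           key=lambda p: sum(bin_logpmf(n, k, p) for k in ks))
```

Here Σ k_i = 18 over mn = 50 trials, so both the closed form and the grid search land on p̂ = 0.36.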

Tuesday, December 7, 2021

Is the coin fair? Given that the sample observed is X = [..............]

1st Method, CLT:

We calculate E[X] = the observed proportion of successes.

H0: the coin is fair, i.e. the true proportion = 0.5.

Calculate the p-value with a proportion test.
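The CLT-based proportion test can be sketched with only the standard library, computing the normal CDF via erf. The sample (60 heads in 100 tosses) is a hypothetical example, not from the post:

```python
from math import erf, sqrt

def proportion_z_test(successes, n, p0=0.5):
    """One-sample, two-sided proportion z-test via the CLT normal approximation."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)                   # standard error under H0
    z = (p_hat - p0) / se
    phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))   # standard normal CDF
    p_value = 2 * (1 - phi(abs(z)))
    return z, p_value

# Hypothetical sample: 60 heads in 100 tosses.
z, p_value = proportion_z_test(60, 100)
```

For this sample z ≈ 2.0, giving a two-sided p-value of about 0.046, so at α = 0.05 we would reject fairness.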


2nd Method, Binomial pmf:

pmf = Bin(n_trials, k_success, p = 0.5) (fair-coin null hypothesis)

Compute the CDF: if CDF(≤ k_success) < α (significance level), reject the null hypothesis.
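The exact tail test can be sketched directly from the Binomial pmf. The sample (2 heads in 20 tosses) and α = 0.05 are hypothetical values for illustration:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Hypothetical sample: 2 heads in 20 tosses; H0: fair coin, p = 0.5.
alpha = 0.05
cdf = binom_cdf(2, 20, 0.5)
reject_null = cdf < alpha   # lower-tail test at significance level alpha
```

Here CDF(≤ 2) = 211/2^20 ≈ 0.0002, far below α, so the null hypothesis of a fair coin is rejected.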


Unboxing a blackbox logistic regression (MLE)

Imagine we have a blackbox executable of logistic regression, and the two tuned hyperparameters are regularisation and the probability threshold.
How can we extract the beta coefficients of the model?

Self Attention

  x → Embedding → MultiHeadAttention → Concat → Project to lower dim → Add(x) → LayerNorm → FFN → Add → LayerNorm Vocab to embedding t...