Tuesday, January 18, 2022

Yet another probability Puzzle - HHT vs HTT

Flip a coin until either HHT or HTT appears. Is one more likely to appear first? If so, which one and with what probability?

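Both sequences start with H, so the race only begins once the first Heads comes up.
From the diagram we can see that there are more ways of reaching HHT than HTT. Conditioning on the next two flips after that first H (HT gives HHT straight away; HH leaves us waiting for a T, which eventually arrives; TH puts us back at H; TT gives HTT), it can be calculated from the diagram as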

p(HHT | H)  = 0.25 + 0.25[0.5 + 0.5^2 + 0.5^3 + 0.5^4 + ...] + 0.25*p(HHT | H)

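The geometric series in the bracket sums to 1, so p(HHT | H) = 0.5 + 0.25*p(HHT | H), which gives

p(HHT | H) = 0.5/0.75 = 2/3

p(HTT | H) = 0.25 +  0.25*p(HTT | H)

p(HTT | H)  = 0.25/0.75 = 1/3

So HHT is 2 times more likely than HTT to appear first.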
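A quick Monte Carlo check of the 2/3 vs 1/3 result. This is only a sketch: the function names, the seed and the number of trials are my own choices, and a fair coin is assumed.

```python
import random

def first_pattern(rng):
    """Flip a fair coin until HHT or HTT shows up; return whichever came first."""
    last3 = ""
    while True:
        # Keep only the last three flips.
        last3 = (last3 + rng.choice("HT"))[-3:]
        if last3 in ("HHT", "HTT"):
            return last3

def estimate_hht_first(trials=100_000, seed=0):
    """Fraction of simulated games in which HHT appears before HTT."""
    rng = random.Random(seed)
    wins = sum(first_pattern(rng) == "HHT" for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    print(estimate_hht_first())  # should land close to 2/3
```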



Simulating the above transition matrix / Markov chain:

H --> 0, HH --> 1, HHT --> 2, HTT --> 3 (states 2 and 3 are absorbing; column j of A holds the probabilities of leaving state j)

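import numpy as np

# Column j holds the transition probabilities out of state j.
A = np.array([[0.25, 0.0, 0.0, 0.0],
              [0.25, 0.5, 0.0, 0.0],
              [0.25, 0.5, 1.0, 0.0],
              [0.25, 0.0, 0.0, 1.0]])

# Start with all probability mass in state 0.
x = np.array([1, 0, 0, 0])
for i in range(10):
    print(x)
    # Rounding every step keeps the printout short, but the rounding error
    # accumulates: the exact limits of the last two entries are 2/3 and 1/3.
    x = np.round(np.matmul(A, x), 2)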
    
Output>> [1 0 0 0]
[0.25 (0.25) 0.25 (0.25)]
[0.06 (0.19) 0.44 (0.31)]
[0.02 (0.11) 0.55 (0.32)]
[0.   (0.06) 0.61 (0.32)]
[0.   (0.03) 0.64 (0.32)]
[0.   (0.02) 0.66 (0.32)]
[0.   (0.01) 0.67 (0.32)]
[0.   (0.  ) 0.68 (0.32)]
[0.   (0.  ) 0.68 (0.32)]
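The same limits can be obtained without iterating or rounding. A minimal sketch, assuming states 2 and 3 are the absorbing ones (their columns in A are unit vectors) and states 0 and 1 are transient; the names Q, R and B are mine.

```python
import numpy as np

A = np.array([[0.25, 0.0, 0.0, 0.0],
              [0.25, 0.5, 0.0, 0.0],
              [0.25, 0.5, 1.0, 0.0],
              [0.25, 0.0, 0.0, 1.0]])

Q = A[:2, :2]   # transient -> transient block (columns are "from" states)
R = A[2:, :2]   # transient -> absorbing block

# B[i, j] = probability of ending in absorbing state i when starting in transient state j.
B = R @ np.linalg.inv(np.eye(2) - Q)

# Column 0 is the start in state 0.
p_state2, p_state3 = B[:, 0]
print(p_state2, p_state3)  # approximately 0.6667 and 0.3333
```

So the 0.68 / 0.32 in the printout above is the rounded version of 2/3 and 1/3.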

Sunday, January 16, 2022

Basics of Hypothesis testing

 

Parametric

α = Significance level = p(H0 is rejected | H0 is true) = probability of making a type-1 error when rejecting at this α. Before doing the statistical test, one writes down this number as: I am OK with an α chance of rejecting the null hypothesis H0 even if H0 should not have been rejected.

p-value = p(observing the sampled parameter estimate, or something more extreme | H0 is true)

β = p(failing to reject H0 | Ha is true) = probability of making a type-2 error

1-β = Power  =  p(rejecting H0 | Ha is true)

    •   Power can be increased by increasing the sample size, the difference to be observed (effect size), and α
    •   Take the α-percentile value under the null hypothesis as the cut-off; then, based on the effect size to be observed, decide on n (n sets the shape of the sampling distribution)
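The definitions above can be checked by simulation. A rough sketch, assuming a one-sided z-test of H0: mean = 0 with known standard deviation 1; the effect size 0.5, the sample sizes and the function name are made up for illustration.

```python
import numpy as np
from statistics import NormalDist

def rejection_rate(true_mean, n, alpha=0.05, trials=20_000, seed=0):
    """Fraction of simulated one-sided z-tests (H0: mean = 0, sigma = 1 known) that reject H0."""
    rng = np.random.default_rng(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)           # cut-off under H0
    samples = rng.normal(loc=true_mean, scale=1.0, size=(trials, n))
    z = samples.mean(axis=1) * np.sqrt(n)              # test statistic per experiment
    return float((z > z_crit).mean())

if __name__ == "__main__":
    # H0 true: the rejection rate estimates the type-1 error, so it should be near alpha.
    print("type-1 error:", rejection_rate(true_mean=0.0, n=30))
    # Ha true: the rejection rate estimates the power, which grows with n.
    print("power (n=30):", rejection_rate(true_mean=0.5, n=30))
    print("power (n=60):", rejection_rate(true_mean=0.5, n=60))
```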


Confidence Interval (say 95%)  = if the sampling and the interval construction were repeated many times, 95% of the resulting intervals would contain the true parameter


Bootstrap confidence Interval

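  1. x1, x2, x3, ..., xn is a data sample drawn from distribution F, with sample mean avg(x)
  2. Draw many bootstrap samples, each of size n, from the data with replacement
  3. For each bootstrap sample calculate δ = avg(bootstrap sample) - avg(x)
  4. Find the required quantiles of δ (e.g. q1 = 97.5% and q2 = 2.5% for a 95% interval) and make the range around avg(x)
  5. [avg(x) - δ_q1 ,  avg(x) - δ_q2]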
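The steps above can be sketched in NumPy as follows. This assumes the statistic of interest is the mean and reads δ as (bootstrap sample mean - original sample mean); the function name and defaults are hypothetical.

```python
import numpy as np

def bootstrap_mean_ci(x, level=0.95, n_boot=5_000, seed=0):
    """Empirical bootstrap confidence interval for the mean of x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    x_bar = x.mean()
    # Resample with replacement: one row of indices per bootstrap sample.
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    deltas = x[idx].mean(axis=1) - x_bar
    tail = (1 - level) / 2
    d_lo, d_hi = np.quantile(deltas, [tail, 1 - tail])
    # The upper quantile of delta sets the lower end of the interval, and vice versa.
    return x_bar - d_hi, x_bar - d_lo

if __name__ == "__main__":
    data = np.random.default_rng(1).exponential(scale=2.0, size=200)
    print(bootstrap_mean_ci(data))
```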

Bayesian Hypothesis Testing

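Choose H0 if P(H0 | Y=y)  > P(H1 | Y=y), i.e. pick the hypothesis with the larger posterior probability given the observed data y; otherwise choose H1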
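A small worked example of the posterior comparison. Everything specific here is an assumption for illustration: two simple hypotheses about a coin (H0: p = 0.5, H1: p = 0.7), equal priors, and y = number of heads in n flips.

```python
from math import comb

def posterior_h0(k, n, p0=0.5, p1=0.7, prior_h0=0.5):
    """P(H0 | k heads in n flips) for two simple hypotheses about the heads probability."""
    like0 = comb(n, k) * p0**k * (1 - p0) ** (n - k)   # P(Y=y | H0)
    like1 = comb(n, k) * p1**k * (1 - p1) ** (n - k)   # P(Y=y | H1)
    evidence = prior_h0 * like0 + (1 - prior_h0) * like1
    return prior_h0 * like0 / evidence

if __name__ == "__main__":
    for heads in (5, 8):
        p = posterior_h0(heads, 10)
        print(heads, "heads:", "H0" if p > 1 - p else "H1", round(p, 3))
```

With 5 heads out of 10 the posterior favours H0; with 8 heads out of 10 it favours H1.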





Tuesday, January 11, 2022

Zero Sum Game

Red Bus – Blue Bus problem 





The red bus - blue bus problem states that demand transfer between products happens in proportion to the attributes of the products, as different attributes give different utility to customers.

This demand transfer can also be seen as a zero-sum game, e.g. customer share shifting from one product to another.


Thus a choice model can be created by modelling the choice probability with a probit or logit model based on product attributes, to understand which attribute influences the customer and by how much.
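A minimal sketch of the logit version of such a choice model, assuming each product already has a single utility number (in practice it would be a weighted sum of attributes). The utilities below are invented; the point is only that in a plain logit model a new alternative takes share from the existing ones in proportion to their current shares, and the shares always sum to 1.

```python
import numpy as np

def logit_shares(utilities):
    """Multinomial logit choice probabilities: exp(u_i) / sum_j exp(u_j)."""
    u = np.asarray(utilities, dtype=float)
    e = np.exp(u - u.max())   # subtract the max for numerical stability
    return e / e.sum()

if __name__ == "__main__":
    print(logit_shares([1.0, 1.0]))        # car, red bus
    print(logit_shares([1.0, 1.0, 1.0]))   # car, red bus, blue bus
```

With equal utilities the shares go from 1/2 each to 1/3 each once the blue bus is added, so the car loses share even though the new bus is just a repainted copy of the old one.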


Utility Theory:





Thursday, January 6, 2022

Coxian Distribution

The Coxian distribution can be used to model:
Service time at a service center that provides a sequence of services in one visit, with the option of saying yes or no to continuing at each stage.

The basic building block of a Coxian distribution is the
Exponential Distribution

and its pdf is given as
f(x) = μ e^(-μx) ; x >= 0

It is generally used to model the time span between 2 events that arrive according to a Poisson process, e.g. the time between 2 calls in a call center, or the time it takes to cut hair in a barber shop.

Now, if 2 tasks whose service time / dwell time is exponentially distributed are placed in
1. Series, the total time is called Hypo-exponential
e.g.

μ1(hair-cut)   ---->    μ2(shampoo) 
X1    ------>   X2

Now if we need to find the distribution of the sum of 2 independent random variables, we do a convolution

pdf(X1 + X2 = x) = ∫ μ1 e^(-μ1 t) * μ2 e^(-μ2 (x-t)) dt,  with t running from 0 to x
Hint: 1.  t + (x-t) = x; in general the integral is over all possible values of "t" from -infinity to infinity
      2.  The exponential pdf is zero for negative arguments, so both t and (x-t) must be greater than or equal to 0, and thus "t" is limited to values between 0 and x

Hypo-exponential p(x) = ......................
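For the case μ1 ≠ μ2, working the convolution integral through gives the standard closed form μ1 μ2 / (μ2 - μ1) * (e^(-μ1 x) - e^(-μ2 x)). The sketch below compares that closed form with a numerical evaluation of the integral; the rates 1 and 2 and the function names are arbitrary choices of mine.

```python
import numpy as np

def hypoexp_pdf(x, mu1, mu2):
    """Closed-form pdf of X1 + X2 for independent exponentials with rates mu1 != mu2."""
    return mu1 * mu2 / (mu2 - mu1) * (np.exp(-mu1 * x) - np.exp(-mu2 * x))

def convolution_pdf(x, mu1, mu2, steps=20_000):
    """Trapezoid-rule value of the convolution integral over t in [0, x]."""
    t = np.linspace(0.0, x, steps + 1)
    y = mu1 * np.exp(-mu1 * t) * mu2 * np.exp(-mu2 * (x - t))
    dt = x / steps
    return dt * (y.sum() - 0.5 * (y[0] + y[-1]))

if __name__ == "__main__":
    for x in (0.5, 1.0, 3.0):
        print(x, hypoexp_pdf(x, 1.0, 2.0), convolution_pdf(x, 1.0, 2.0))
```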

2. Parallel, they are called Hyper-exponential

{
Imagine there is a bag with 2 coins {A, B} inside it. An experimenter randomly picks a coin and tosses it to observe {Heads, Tails}.
So the probability of observing Heads can be written as
P(outcome = Head) = (probability of selecting coin A)(probability of getting head with coin A) + (probability of selecting coin B)(probability of getting head with coin B)
}
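Carrying the coin analogy over to service times: pick branch A with probability p (else branch B), then draw an exponential time with that branch's rate. The pdf is the same kind of weighted sum, p μ1 e^(-μ1 x) + (1-p) μ2 e^(-μ2 x). A sketch with made-up values p = 0.3, μ1 = 1, μ2 = 4:

```python
import numpy as np

def hyperexp_pdf(x, p, mu1, mu2):
    """Mixture pdf: branch A (rate mu1) with probability p, branch B (rate mu2) otherwise."""
    return p * mu1 * np.exp(-mu1 * x) + (1 - p) * mu2 * np.exp(-mu2 * x)

def sample_hyperexp(size, p, mu1, mu2, seed=0):
    """Draw service times by first picking a branch, then an exponential time."""
    rng = np.random.default_rng(seed)
    pick_a = rng.random(size) < p
    # NumPy's exponential() takes the scale 1/rate, not the rate.
    return np.where(pick_a,
                    rng.exponential(1 / mu1, size),
                    rng.exponential(1 / mu2, size))

if __name__ == "__main__":
    times = sample_hyperexp(200_000, p=0.3, mu1=1.0, mu2=4.0)
    # Theoretical mean is p/mu1 + (1-p)/mu2 = 0.475 for these values.
    print(times.mean())
```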

to be continued

Self Attention

  x → Embedding → MultiHeadAttention → Concat → Project to lower dim → → Add(x) → LayerNorm → FFN → Add → LayerNorm Vocab to embedding t...
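A shape-level sketch of the pipeline above in NumPy, starting from already-embedded inputs. All weights are random, LayerNorm has no learned gain or bias, the FFN is assumed to be a two-layer ReLU network, and each head is assumed to keep the full model width so that the concatenation really is projected down to a lower dimension. The function names and sizes are mine.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalise each row to zero mean and unit variance (no learned parameters)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def encoder_block(x, n_heads=2, seed=0):
    """x: (seq_len, d_model) embeddings. Random weights, for shape-checking only."""
    rng = np.random.default_rng(seed)
    seq_len, d_model = x.shape
    scale = d_model ** -0.5
    heads = []
    for _ in range(n_heads):
        wq, wk, wv = (rng.normal(scale=scale, size=(d_model, d_model)) for _ in range(3))
        q, k, v = x @ wq, x @ wk, x @ wv
        weights = softmax(q @ k.T / np.sqrt(d_model))    # (seq_len, seq_len)
        heads.append(weights @ v)
    concat = np.concatenate(heads, axis=-1)              # (seq_len, n_heads * d_model)
    w_o = rng.normal(scale=scale, size=(n_heads * d_model, d_model))
    h = layer_norm(x + concat @ w_o)                     # project down, Add(x), LayerNorm
    w1 = rng.normal(scale=scale, size=(d_model, 4 * d_model))
    w2 = rng.normal(scale=(4 * d_model) ** -0.5, size=(4 * d_model, d_model))
    ffn = np.maximum(h @ w1, 0.0) @ w2                   # FFN
    return layer_norm(h + ffn)                           # Add, LayerNorm

if __name__ == "__main__":
    x = np.random.default_rng(1).normal(size=(5, 8))
    print(encoder_block(x).shape)
```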