Divided Differences Exponential Models A Tutorial on Divided Differences and the Linear Exponential Model Peder Olsen, Vaibhava Goel, John Hershey and Charles Micchelli HLT Thursday Seminar IBM TJ Watson Research Center September 2, 2010 Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models Outline 1 Divided Differences What are they? Examples Applications Opitz’ formula 2 Exponential Models Examples Linear Exponential Family Fused Features Experiments Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models The Beginning T eλ x Z (λ) P(x|λ) = where (1) x is a posterior ) n X x ∈ Rn xi = 1, xi ≥ 0 ( x ∈ Pn = (2) i=1 and log Z is the normalizer aka partition function Z Z (λ) = T eλ φ(x) dx = Pn Peder, Vaibhava, Charles and John n X Q i=1 e λi . j6=i (λi − λj ) Divided Differences 101 (3) Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Definition (Divided differences) [λ] f [λ1 , . . . , λn+1 ] f = f (λ) [λ2 , . . . , λn+1 ]f − [λ1 , . . . , λn ]f . = λn+1 − λ1 Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Definition (Divided differences) [λ] f [λ1 , . . . , λn+1 ] f = f (λ) [λ2 , . . . , λn+1 ]f − [λ1 , . . . , λn ]f . = λn+1 − λ1 Example (Two Arguments) Divided difference for two arguments: [λ1 , λ2 ]f = Peder, Vaibhava, Charles and John f (λ2 ) − f (λ1 ) . λ2 − λ1 Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Computing [0, 1, 2, 3]f for f (x) = 2/3x 3 − 3x 2 + 10/3x f(x)=2/3 x3−3 x2+10/3 x 2 1.5 1 f(x) 0.5 0 −0.5 −1 −1.5 −0.5 0 0.5 1 Peder, Vaibhava, Charles and John 1.5 x 2 2.5 Divided Differences 101 3 3.5 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Computing [0, 1, 2, 3]f for f (x) = 2/3x 3 − 3x 2 + 10/3x f(x)=2/3 x3−3 x2+10/3 x 2 1.5 1 f(x) 0.5 [0,1]f=1 0 −0.5 −1 −1.5 −0.5 0 0.5 1 Peder, Vaibhava, Charles and John 1.5 x 2 2.5 Divided Differences 101 3 3.5 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Computing [0, 1, 2, 3]f for f (x) = 2/3x 3 − 3x 2 + 10/3x f(x)=2/3 x3−3 x2+10/3 x 2 1.5 [0,1]f 1 [2,3]f [0,1,2]f=−1 f(x) 0.5 0 −0.5 −1 [1,2]f −1.5 −0.5 0 0.5 1 Peder, Vaibhava, Charles and John 1.5 x 2 2.5 Divided Differences 101 3 3.5 What are they? Examples Applications Opitz’ formula Divided Differences Exponential Models Computing [0, 1, 2, 3]f for f (x) = 2/3x 3 − 3x 2 + 10/3x f(x)=2/3 x3−3 x2+10/3 x 2 1.5 [1,2,3]f 1 f(x) 0.5 0 [0,1,2,3]f=2/3 −0.5 −1 [0,1,2]f −1.5 −0.5 0 0.5 1 Peder, Vaibhava, Charles and John 1.5 x 2 2.5 Divided Differences 101 3 3.5 What are they? Examples Applications Opitz’ formula Divided Differences Exponential Models Three Arguments Example (3 arguments) [λ1 , λ2 , λ3 ]f = = [λ2 , λ3 ]f − [λ1 , λ2 ]f λ3 − λ1 f (λ3 )−f (λ2 ) λ3 −λ2 − f (λ2 )−f (λ1 ) λ2 −λ1 λ3 − λ1 = some magic f (λ1 ) f (λ2 ) = + (λ1 − λ2 )(λ1 − λ3 ) (λ2 − λ1 )(λ2 − λ3 ) f (λ3 ) + (λ3 − λ1 )(λ3 − λ1 ) Peder, Vaibhava, Charles and John Divided Differences 101 What are they? Examples Applications Opitz’ formula Divided Differences Exponential Models Theorem The general explicit formula is [λ1 , . . . , λn ] f = n X Q i=1 f (λi ) . j6=i (λi − λj ) The proof requires use of the Vandermonde determinant. Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula What is [λ, λ, . . . , λ]f ? What is [λ, λ, . . . , λ]f ? The definition is unclear. Let’s compute the limit. Singularity or not? [x, x + ]f f (x + ) − f (x) −→ f 0 (x). = →0 For more variables we can do a similar computation and see that [λ, λ, . . . , λ]f = Peder, Vaibhava, Charles and John f (n−1) (λ) . (n − 1)! Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Power Functions Define the i’th power function pi (x) = x i , then [λ1 ]p0 [λ1 ]p1 [λ1 ]p2 [λ1 ]p3 [λ1 ]p4 [λ1 ]p5 =1 = λ1 = λ21 = λ31 = λ41 = λ51 [λ1 , λ2 ]p0 [λ1 , λ2 ]p1 [λ1 , λ2 ]p2 [λ1 , λ2 ]p3 [λ1 , λ2 ]p4 [λ1 , λ2 ]p5 Peder, Vaibhava, Charles and John =0 =1 = λ1 + λ2 = λ21 + λ1 λ2 + λ22 = λ31 + λ21 λ2 + λ1 λ22 + λ32 P = i+j=4 λi1 λj2 Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Power Functions Define the i’th power function pi (x) = x i , then [λ1 , λ2 ]p0 [λ1 , λ2 ]p1 [λ1 , λ2 ]p2 [λ1 , λ2 ]p3 [λ1 , λ2 ]p4 =0 =1 = λ1 + λ2 = λ21 + λ1 λ2 + λ22 P = i+j=3 λi1 λj2 Peder, Vaibhava, Charles and John [λ1 , λ2 , λ3 ]p0 [λ1 , λ2 , λ3 ]p1 [λ1 , λ2 , λ3 ]p2 [λ1 , λ2 , λ3 ]p3 [λ1 , λ2 , λ3 ]p4 =0 =0 =1 = λ1 + λ2 + λ3 P = i+j+k=2 λi1 λj2 λk3 Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Power Functions Define the i’th power function pi (x) = x i , then [λ1 , λ2 , . . . , λn ]pn+k−1 = X n Y i λjj i1 +···+in =k j=1 The formula can be computed recursively and efficiently using dynamic programming Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula The Exponential Function def Z (λ1 , . . . , λn ) = [λ1 , . . . , λn ] exp . No slick formula for computing Z (yet) Z (λ1 ) = e λ1 e λ1 − e λ2 Z (λ1 , λ2 ) = λ1 − λ2 n X e λi Q Z (λ1 , . . . , λn ) = . j6=i (λi − λj ) i=1 Can’t compute Z (x, x) with this formula, but is it good otherwise? Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Derivatives of Z The special case f = exp satisifies the remarkable property that its own derivative can be computed from itself: Theorem (Partition Function Derivative) ∂ Z (λ1 , . . . , λn ) = Z (λi , λ1 , . . . , λi , . . . , λn ) ∂λi for i = 1, 2, . . . , n. Proof. Compute LHS and RHS and verify that they are the same. Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula A Simple Formula We can compute the formula for Z exactly when λ has a special form: nn 1 2 Z (0, , , . . . , 1) = (e 1/n − 1)n n n n! Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula A Simple Formula We can compute the formula for Z exactly when λ has a special form: nn 1 2 Z (0, , , . . . , 1) = (e 1/n − 1)n n n n! >> n = 10; >> logz10 = n*log(n)-gammaln(n+1)+n*log(exp(1/n)-1) logz10 = -14.6002 Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula A Simple Formula We can compute the formula for Z exactly when λ has a special form: nn 1 2 Z (0, , , . . . , 1) = (e 1/n − 1)n n n n! >> n = 10; >> logz10 = n*log(n)-gammaln(n+1)+n*log(exp(1/n)-1) logz10 = -14.6002 >> n = 100; >> logz100 = n*log(n)-gammaln(n+1)+n*log(exp(1/n)-1) logz100 = -363.2390 Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Numerical Instability Galore We can also compute Z (λ) using the direct formula Z (λ1 , . . . , λn ) = n X Q i=1 Peder, Vaibhava, Charles and John e λi j6=i (λi − λj ) Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Numerical Instability Galore We can also compute Z (λ) using the direct formula Z (λ1 , . . . , λn ) = n X Q i=1 e λi j6=i (λi − λj ) function [y] = logz(lambda) n = length(lambda); y = 0; for i=1:n, y = y + exp(lambda(i)) / ... ( prod(lambda(1:i-1)-lambda(i)) ... * prod(lambda(i+1:n)-lambda(i)) ); end y = log(y); Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Numerical Instability Galore We can also compute Z (λ) using the direct formula Z (λ1 , . . . , λn ) = n X Q i=1 e λi j6=i (λi − λj ) >> y10 = logz([0:0.1:1]) y10 = -14.6014 Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Numerical Instability Galore We can also compute Z (λ) using the direct formula Z (λ1 , . . . , λn ) = n X Q i=1 e λi j6=i (λi − λj ) >> y10 = logz([0:0.1:1]) y10 = -14.6014 >> logz10 logz10 = -14.6002 Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Numerical Instability Galore We can also compute Z (λ) using the direct formula Z (λ1 , . . . , λn ) = n X Q i=1 e λi j6=i (λi − λj ) >> y100 = logz([0:0.01:1]) y100 = 128.5899 Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Numerical Instability Galore We can also compute Z (λ) using the direct formula Z (λ1 , . . . , λn ) = n X Q i=1 e λi j6=i (λi − λj ) >> y100 = logz([0:0.01:1]) y100 = 128.5899 >> logz100 logz100 = -363.2390 Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models Direct divided difference computation 5 6 What are they? Examples Applications Opitz’ formula x 10 eλi / Y (λi − λj ) j6=i total sum 4 Value 2 0 −2 −4 −6 1 2 3 Figure: Plot showing e λi / Z (λ) = 4.56 ∗ 10−7 4 Q 5 j6=i (λi Peder, Vaibhava, Charles and John 6 Index i 7 8 9 10 11 − λj ). Largest term: 6 ∗ 105 . Total: Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Lagrange Interpolation Given: Points (yi , xi ) for unknown function f . Find: Polynomial p(x) s.t. yi = p(xi ). Lagrange interpolation: p(λ) = n X Q `i (λ)f (λi ), where i=1 Peder, Vaibhava, Charles and John j6=i (λ − λj ) `i (λ) = Q . j6=i (λi − λj ) Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Newton’s interpolation formula uses the divided differences: p(λ) = [λ1 ]f + (λ − λ1 ) [λ1 , λ2 ]f +(λ − λ1 )(λ − λ2 ) [λ1 , λ2 , λ3 ]f + . . . +(λ − λ1 ) · · · (λ − λn−1 ) [λ1 , . . . , λn ]f . The special case λ1 = λ2 = · · · = λn gives the Taylor series. General case is a generalization of Taylor expansion. Peder, Vaibhava, Charles and John Divided Differences 101 What are they? Examples Applications Opitz’ formula Divided Differences Exponential Models Hermite–Genocchi Formula The divided differences can also be used to compute a certain class of integrals. Theorem (Hermite–Genocchi) Z f (n−1) (λT x) dx, [λ1 , λ2 , . . . , λn ]f = (4) Pn Proof. Verify for f = exp and use Laplace transform to write f in terms of exp. Peder, Vaibhava, Charles and John Divided Differences 101 What are they? Examples Applications Opitz’ formula Divided Differences Exponential Models Theorem (Peano Kernel identity) Let λmin = min {λ1 , . . . , λn } and λmax = max {λ1 , . . . , λn } then 1 [λ1 , λ2 , . . . , λn ]f = (n − 1)! Z λmax f (n−1) (t)M(t; λ1 , . . . , λn ) dt, λmin where M(t; λ1 , . . . , λn ) is the Peano kernel, which is the B-spline of degree n − 2 at the points λ1 , . . . , λn . Proof. Make a variable substitution y1 = λT x, and y2 , . . . , yn something else in Hermite–Genocchi formula. Charles Micchelli generalized B-splines to higher dimensions! This allowed new applications in 2 and 3 dimensional modeling. Peder, Vaibhava, Charles and John Divided Differences 101 What are they? Examples Applications Opitz’ formula Divided Differences Exponential Models plot of M(t; −1.0, 1.0) 0.5 0.4 M(t) 0.3 0.2 0.1 0 −0.1 −2 −1.5 −1 −0.5 0 t 0.5 1 1.5 Figure: Plot of M(t; −1, 1) Peder, Vaibhava, Charles and John Divided Differences 101 2 What are they? Examples Applications Opitz’ formula Divided Differences Exponential Models plot of M(t; −1.0, 0.0, 1.0) 1 0.8 M(t) 0.6 0.4 0.2 0 −2 −1.5 −1 −0.5 0 t 0.5 1 1.5 Figure: Plot of M(t; −1, 0, 1) Peder, Vaibhava, Charles and John Divided Differences 101 2 What are they? Examples Applications Opitz’ formula Divided Differences Exponential Models plot of M(t; −1.0, −0.5, 1.0) 1 0.8 M(t) 0.6 0.4 0.2 0 −2 −1.5 −1 −0.5 0 t 0.5 1 1.5 Figure: Plot of M(t; −1, −0.5, 1) Peder, Vaibhava, Charles and John Divided Differences 101 2 What are they? Examples Applications Opitz’ formula Divided Differences Exponential Models plot of M(t; −1.0, −0.3, 0.3, 1.0) 1.2 1 M(t) 0.8 0.6 0.4 0.2 0 −2 −1.5 −1 −0.5 0 t 0.5 1 1.5 Figure: Plot of M(t; −1, −1/3, 1/3, 1) Peder, Vaibhava, Charles and John Divided Differences 101 2 What are they? Examples Applications Opitz’ formula Divided Differences Exponential Models plot of M(t; −1.0, 0.0, 0.5, 1.0) 1.2 1 M(t) 0.8 0.6 0.4 0.2 0 −2 −1.5 −1 −0.5 0 t 0.5 1 1.5 Figure: Plot of M(t; −1, 0, 0.5, 1) Peder, Vaibhava, Charles and John Divided Differences 101 2 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Leibniz’ rule Derivatives:(fg )0 = fg 0 + f 0 g Divided differences: [λ0 , λ1 ](fg ) = [λ0 ]f [λ0 , λ1 ]g + [λ0 , λ1 ]f [λ1 ]g . More generally: [λ0 , . . . , λn ](fg ) = [λ0 ]f [λ0 , . . . , λn ]g +[λ0 , λ1 ]f [λ1 , . . . , λn ]g + · · · + [λ0 , . . . , λn ]f [λn ]g . Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Definition (Divided Difference Matrix) [λ0 ]f [λ0 , λ1 ]f . . . [λ0 , . . . , λn ]f 0 [λ1 ]f . . . [λ1 , . . . , λn ]f Tf (λ) = . .. . .. .. .. . . 0 ... 0 [λn ]f Example (Identity Function) λ0 1 0 0 λ1 1 def J = Tp1 (λ) = 0 0 λ2 .. .. . . 0 0 0 Peder, Vaibhava, Charles and John 0 ··· 0 0 ··· 0 1 0 . .. .. . . 0 λn Divided Differences 101 (5) (6) Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula DDM for exp Example (Exponential Function) E (λ) = Texp (λ) = Z (λ0 ) Z (λ0 , λ1 ) . . . Z(λ) 0 Z (λ1 ) . . . Z (λ1 , . . . , λn ) .. .. .. .. . . . . 0 ... 0 Z (λn ) Peder, Vaibhava, Charles and John Divided Differences 101 . What are they? Examples Applications Opitz’ formula Divided Differences Exponential Models Opitz’ Formula Theorem (Opitz) The divided difference [λ1 , . . . , λn ]f is the (1, n)’th element of the matrix ∞ X f (i) (0) i Tf (λ) = J (7) i! i=0 if the Taylor series of f is everywhere convergent. Proof. Trivially: Tf +g (λ) = Tf (λ) + Tg (λ). By Leibniz’ rule: Tf ·g (λ) = Tf (λ) · Tg (λ). The rest follows by Taylor expansion Peder, Vaibhava, Charles and John Divided Differences 101 What are they? Examples Applications Opitz’ formula Divided Differences Exponential Models A Monster Formula We can compute Z (λ) from E (λ) = e J . ∇Z (λ) can be computed from 0 E (λ, λ) = B B B B B B B B B B B B B B B B B B B B B B B B B B B @ Z (λ1 ) 0 0 ... ... ... Z(λ) Z (λ2 , . . . , λn ) Z (λ3 , . . . , λn ) Z(λ, λ1 ) Z(λ) Z (λ1 , λ3 , . . . , λn ) Z (λ, λ1 , λ2 ) Z(λ, λ2 ) Z(λ) ... ... ... Z (λ, λ) Z (λ, λ2 , . . . , λn ) Z (λ, λ3 , . . . , λn ) . . . 0 0 . . . ... ... . . . Z (λn ) 0 . . . Z (λ1 , λn ) Z (λ1 ) . . . Z (λ1 , λ2 , λn ) Z (λ1 , λ2 ) . . . ... ... . . . Z(λ, λn ) Z(λ) . . . 0 . . . ... . . . 0 . . . 0 . . . 0 . . . ... . . . Z (λn ) Peder, Vaibhava, Charles and John Divided Differences 101 1 C C C C C C C C C C C C C C C C C C C C C C C C C C C A . Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula >> n=10; >> lambda=0:1/n:1; >> J=diag(lambda); >> for i=1:n, J(i,i+1)=1; end >> E=expm(J); >> log(E(1,n+1)) ans = -14.6002 Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula >> n=10; >> lambda=0:1/n:1; >> J=diag(lambda); >> for i=1:n, J(i,i+1)=1; end >> E=expm(J); >> log(E(1,n+1)) ans = -14.6002 >> logz10 logz10 = -14.6002 Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula >> n=100; >> lambda=0:1/n:1; >> J=diag(lambda); >> for i=1:n, J(i,i+1)=1; end >> E=expm(J); >> log(E(1,n+1)) ans = -239.4318 Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula >> n=100; >> lambda=0:1/n:1; >> J=diag(lambda); >> for i=1:n, J(i,i+1)=1; end >> E=expm(J); >> log(E(1,n+1)) ans = -239.4318 >> logz100 logz100 = -363.2390 Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Computing the Matrix Exponential is Hard Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula >> n=100; >> lambda=0:1/n:1; >> J=diag(lambda); >> for i=1:n, J(i,i+1)=1; end >> E=expm(J/32); >> E=E*E; E=E*E; E=E*E; E=E*E; E=E*E; >> log(E(1,n+1)) ans = -363.2383 Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula >> n=100; >> lambda=0:1/n:1; >> J=diag(lambda); >> for i=1:n, J(i,i+1)=1; end >> E=expm(J/32); >> E=E*E; E=E*E; E=E*E; E=E*E; E=E*E; >> log(E(1,n+1)) ans = -363.2383 >> logz100 logz100 = -363.2390 Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models What are they? Examples Applications Opitz’ formula Computing log Z (λ) correctly We computed E (λ) by doing our own matrix exponential that takes advantage of J being sparse upper triangular. We also did all computations in the log domain and by using m m 2 e J = e J/2 and m e J/2 = I + J/2m + Peder, Vaibhava, Charles and John 1 (J/2m )2 + · · · 2! Divided Differences 101 Divided Differences Exponential Models Examples Linear Exponential Family Fused Features Experiments Definition (Exponential Family) T e λ φ(x) P(x|λ) = Z (λ) Z where Z (λ) = D D is the domain for x. Peder, Vaibhava, Charles and John T eλ Divided Differences 101 φ(x) dx (8) Examples Linear Exponential Family Fused Features Experiments Divided Differences Exponential Models Example (Normal Distribution) For diagonal covariance models: x d D = R , φN (x) = , x2 λ= µ/v −1/(2v) (9) The partition function is n log ZN (λ) = − 1 X µ2i + log(2πvi ). 2 vi i=1 Peder, Vaibhava, Charles and John Divided Differences 101 (10) Divided Differences Exponential Models Examples Linear Exponential Family Fused Features Experiments The Probability Simplex Definition (Probability Simplex) ( ) n X Pn = x ∈ Rn xi = 1, xi ≥ 0 . i=1 Peder, Vaibhava, Charles and John Divided Differences 101 (11) Divided Differences Exponential Models Examples Linear Exponential Family Fused Features Experiments Example (Dirichlet Distribution) Q D = Pn , φD (x) = log(x), Peder, Vaibhava, Charles and John ZD (λ) = Γ(λi + 1) P . Γ(d + i λi ) i Divided Differences 101 (12) Divided Differences Exponential Models Examples Linear Exponential Family Fused Features Experiments Example (Linear Exponential Family) D = Pn , φ(x) = x, Z (λ) = n X Q i=1 e λi . j6=i (λi − λj ) Z (λ) is the divided difference with respect to the exponential function. Peder, Vaibhava, Charles and John Divided Differences 101 (13) Divided Differences Exponential Models Examples Linear Exponential Family Fused Features Experiments Linear Exponential Family Plot Figure: Plot of a density for the linear exponential family Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models Examples Linear Exponential Family Fused Features Experiments Dirichlet Distribution Plot Figure: Plot of a Dirichlet distribution Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models Examples Linear Exponential Family Fused Features Experiments Cepstra linear discriminant analysis (LDA) based features: x ∈ R40 SPIF posterior features: p ∈ R44 x Normal + Linear : φ(x) = x2 p x Normal + Dirichlet : φ(x) = x2 . log(p) Peder, Vaibhava, Charles and John Divided Differences 101 Divided Differences Exponential Models Examples Linear Exponential Family Fused Features Experiments Results on BN50 fMMI only fMMI + linear exp family fMMI + Dirichlet Peder, Vaibhava, Charles and John WER 19.4% 18.8% 20.2% Divided Differences 101