Divided differences and the linear exponential family.

Divided Differences
Exponential Models
A Tutorial on Divided Differences and the Linear
Exponential Model
Peder Olsen, Vaibhava Goel, John Hershey and Charles
Micchelli
HLT Thursday Seminar
IBM TJ Watson Research Center
September 2, 2010
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
Outline
1
Divided Differences
What are they?
Examples
Applications
Opitz’ formula
2
Exponential Models
Examples
Linear Exponential Family
Fused Features
Experiments
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
The Beginning
T
eλ x
Z (λ)
P(x|λ) =
where
(1)
x is a posterior
)
n
X
x ∈ Rn xi = 1, xi ≥ 0
(
x ∈ Pn =
(2)
i=1
and log Z is the normalizer aka partition function
Z
Z (λ) =
T
eλ
φ(x)
dx =
Pn
Peder, Vaibhava, Charles and John
n
X
Q
i=1
e λi
.
j6=i (λi − λj )
Divided Differences 101
(3)
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Definition (Divided differences)
[λ] f
[λ1 , . . . , λn+1 ] f
= f (λ)
[λ2 , . . . , λn+1 ]f − [λ1 , . . . , λn ]f
.
=
λn+1 − λ1
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Definition (Divided differences)
[λ] f
[λ1 , . . . , λn+1 ] f
= f (λ)
[λ2 , . . . , λn+1 ]f − [λ1 , . . . , λn ]f
.
=
λn+1 − λ1
Example (Two Arguments)
Divided difference for two arguments:
[λ1 , λ2 ]f =
Peder, Vaibhava, Charles and John
f (λ2 ) − f (λ1 )
.
λ2 − λ1
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Computing [0, 1, 2, 3]f for f (x) = 2/3x 3 − 3x 2 + 10/3x
f(x)=2/3 x3−3 x2+10/3 x
2
1.5
1
f(x)
0.5
0
−0.5
−1
−1.5
−0.5
0
0.5
1
Peder, Vaibhava, Charles and John
1.5
x
2
2.5
Divided Differences 101
3
3.5
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Computing [0, 1, 2, 3]f for f (x) = 2/3x 3 − 3x 2 + 10/3x
f(x)=2/3 x3−3 x2+10/3 x
2
1.5
1
f(x)
0.5
[0,1]f=1
0
−0.5
−1
−1.5
−0.5
0
0.5
1
Peder, Vaibhava, Charles and John
1.5
x
2
2.5
Divided Differences 101
3
3.5
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Computing [0, 1, 2, 3]f for f (x) = 2/3x 3 − 3x 2 + 10/3x
f(x)=2/3 x3−3 x2+10/3 x
2
1.5
[0,1]f
1
[2,3]f
[0,1,2]f=−1
f(x)
0.5
0
−0.5
−1
[1,2]f
−1.5
−0.5
0
0.5
1
Peder, Vaibhava, Charles and John
1.5
x
2
2.5
Divided Differences 101
3
3.5
What are they?
Examples
Applications
Opitz’ formula
Divided Differences
Exponential Models
Computing [0, 1, 2, 3]f for f (x) = 2/3x 3 − 3x 2 + 10/3x
f(x)=2/3 x3−3 x2+10/3 x
2
1.5
[1,2,3]f
1
f(x)
0.5
0
[0,1,2,3]f=2/3
−0.5
−1
[0,1,2]f
−1.5
−0.5
0
0.5
1
Peder, Vaibhava, Charles and John
1.5
x
2
2.5
Divided Differences 101
3
3.5
What are they?
Examples
Applications
Opitz’ formula
Divided Differences
Exponential Models
Three Arguments
Example (3 arguments)
[λ1 , λ2 , λ3 ]f
=
=
[λ2 , λ3 ]f − [λ1 , λ2 ]f
λ3 − λ1
f (λ3 )−f (λ2 )
λ3 −λ2
−
f (λ2 )−f (λ1 )
λ2 −λ1
λ3 − λ1
= some magic
f (λ1 )
f (λ2 )
=
+
(λ1 − λ2 )(λ1 − λ3 ) (λ2 − λ1 )(λ2 − λ3 )
f (λ3 )
+
(λ3 − λ1 )(λ3 − λ1 )
Peder, Vaibhava, Charles and John
Divided Differences 101
What are they?
Examples
Applications
Opitz’ formula
Divided Differences
Exponential Models
Theorem
The general explicit formula is
[λ1 , . . . , λn ] f =
n
X
Q
i=1
f (λi )
.
j6=i (λi − λj )
The proof requires use of the Vandermonde determinant.
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
What is [λ, λ, . . . , λ]f ?
What is [λ, λ, . . . , λ]f ? The definition is unclear. Let’s compute
the limit.
Singularity or not?
[x, x + ]f
f (x + ) − f (x)
−→ f 0 (x).
=
→0
For more variables we can do a similar computation and see that
[λ, λ, . . . , λ]f =
Peder, Vaibhava, Charles and John
f (n−1) (λ)
.
(n − 1)!
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Power Functions
Define the i’th power function pi (x) = x i , then
[λ1 ]p0
[λ1 ]p1
[λ1 ]p2
[λ1 ]p3
[λ1 ]p4
[λ1 ]p5
=1
= λ1
= λ21
= λ31
= λ41
= λ51
[λ1 , λ2 ]p0
[λ1 , λ2 ]p1
[λ1 , λ2 ]p2
[λ1 , λ2 ]p3
[λ1 , λ2 ]p4
[λ1 , λ2 ]p5
Peder, Vaibhava, Charles and John
=0
=1
= λ1 + λ2
= λ21 + λ1 λ2 + λ22
= λ31 + λ21 λ2 + λ1 λ22 + λ32
P
= i+j=4 λi1 λj2
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Power Functions
Define the i’th power function pi (x) = x i , then
[λ1 , λ2 ]p0
[λ1 , λ2 ]p1
[λ1 , λ2 ]p2
[λ1 , λ2 ]p3
[λ1 , λ2 ]p4
=0
=1
= λ1 + λ2
= λ21 + λ1 λ2 + λ22
P
= i+j=3 λi1 λj2
Peder, Vaibhava, Charles and John
[λ1 , λ2 , λ3 ]p0
[λ1 , λ2 , λ3 ]p1
[λ1 , λ2 , λ3 ]p2
[λ1 , λ2 , λ3 ]p3
[λ1 , λ2 , λ3 ]p4
=0
=0
=1
= λ1 + λ2 + λ3
P
= i+j+k=2 λi1 λj2 λk3
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Power Functions
Define the i’th power function pi (x) = x i , then
[λ1 , λ2 , . . . , λn ]pn+k−1 =
X
n
Y
i
λjj
i1 +···+in =k j=1
The formula can be computed recursively and efficiently using
dynamic programming
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
The Exponential Function
def
Z (λ1 , . . . , λn ) = [λ1 , . . . , λn ] exp .
No slick formula for computing Z (yet)
Z (λ1 ) = e λ1
e λ1 − e λ2
Z (λ1 , λ2 ) =
λ1 − λ2
n
X
e λi
Q
Z (λ1 , . . . , λn ) =
.
j6=i (λi − λj )
i=1
Can’t compute Z (x, x) with this formula, but is it good otherwise?
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Derivatives of Z
The special case f = exp satisifies the remarkable property that its
own derivative can be computed from itself:
Theorem (Partition Function Derivative)
∂
Z (λ1 , . . . , λn ) = Z (λi , λ1 , . . . , λi , . . . , λn )
∂λi
for i = 1, 2, . . . , n.
Proof.
Compute LHS and RHS and verify that they are the same.
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
A Simple Formula
We can compute the formula for Z exactly when λ has a special
form:
nn
1 2
Z (0, , , . . . , 1) = (e 1/n − 1)n
n n
n!
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
A Simple Formula
We can compute the formula for Z exactly when λ has a special
form:
nn
1 2
Z (0, , , . . . , 1) = (e 1/n − 1)n
n n
n!
>> n = 10;
>> logz10 = n*log(n)-gammaln(n+1)+n*log(exp(1/n)-1)
logz10 = -14.6002
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
A Simple Formula
We can compute the formula for Z exactly when λ has a special
form:
nn
1 2
Z (0, , , . . . , 1) = (e 1/n − 1)n
n n
n!
>> n = 10;
>> logz10 = n*log(n)-gammaln(n+1)+n*log(exp(1/n)-1)
logz10 = -14.6002
>> n = 100;
>> logz100 = n*log(n)-gammaln(n+1)+n*log(exp(1/n)-1)
logz100 = -363.2390
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Numerical Instability Galore
We can also compute Z (λ) using the direct formula
Z (λ1 , . . . , λn ) =
n
X
Q
i=1
Peder, Vaibhava, Charles and John
e λi
j6=i (λi − λj )
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Numerical Instability Galore
We can also compute Z (λ) using the direct formula
Z (λ1 , . . . , λn ) =
n
X
Q
i=1
e λi
j6=i (λi − λj )
function [y] = logz(lambda)
n = length(lambda);
y = 0;
for i=1:n,
y = y + exp(lambda(i)) / ...
( prod(lambda(1:i-1)-lambda(i)) ...
* prod(lambda(i+1:n)-lambda(i)) );
end
y = log(y);
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Numerical Instability Galore
We can also compute Z (λ) using the direct formula
Z (λ1 , . . . , λn ) =
n
X
Q
i=1
e λi
j6=i (λi − λj )
>> y10 = logz([0:0.1:1])
y10 = -14.6014
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Numerical Instability Galore
We can also compute Z (λ) using the direct formula
Z (λ1 , . . . , λn ) =
n
X
Q
i=1
e λi
j6=i (λi − λj )
>> y10 = logz([0:0.1:1])
y10 = -14.6014
>> logz10
logz10 = -14.6002
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Numerical Instability Galore
We can also compute Z (λ) using the direct formula
Z (λ1 , . . . , λn ) =
n
X
Q
i=1
e λi
j6=i (λi − λj )
>> y100 = logz([0:0.01:1])
y100 = 128.5899
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Numerical Instability Galore
We can also compute Z (λ) using the direct formula
Z (λ1 , . . . , λn ) =
n
X
Q
i=1
e λi
j6=i (λi − λj )
>> y100 = logz([0:0.01:1])
y100 = 128.5899
>> logz100
logz100 = -363.2390
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
Direct divided difference computation
5
6
What are they?
Examples
Applications
Opitz’ formula
x 10
eλi /
Y
(λi −
λj )
j6=i
total sum
4
Value
2
0
−2
−4
−6
1
2
3
Figure: Plot showing e λi /
Z (λ) = 4.56 ∗ 10−7
4
Q
5
j6=i (λi
Peder, Vaibhava, Charles and John
6
Index i
7
8
9
10
11
− λj ). Largest term: 6 ∗ 105 . Total:
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Lagrange Interpolation
Given: Points (yi , xi ) for unknown function f .
Find: Polynomial p(x) s.t. yi = p(xi ).
Lagrange interpolation:
p(λ) =
n
X
Q
`i (λ)f (λi ),
where
i=1
Peder, Vaibhava, Charles and John
j6=i (λ − λj )
`i (λ) = Q
.
j6=i (λi − λj )
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Newton’s interpolation formula uses the divided differences:
p(λ) = [λ1 ]f + (λ − λ1 ) [λ1 , λ2 ]f
+(λ − λ1 )(λ − λ2 ) [λ1 , λ2 , λ3 ]f + . . .
+(λ − λ1 ) · · · (λ − λn−1 ) [λ1 , . . . , λn ]f .
The special case λ1 = λ2 = · · · = λn gives the Taylor series.
General case is a generalization of Taylor expansion.
Peder, Vaibhava, Charles and John
Divided Differences 101
What are they?
Examples
Applications
Opitz’ formula
Divided Differences
Exponential Models
Hermite–Genocchi Formula
The divided differences can also be used to compute a certain class
of integrals.
Theorem (Hermite–Genocchi)
Z
f (n−1) (λT x) dx,
[λ1 , λ2 , . . . , λn ]f =
(4)
Pn
Proof.
Verify for f = exp and use Laplace transform to write f in terms of
exp.
Peder, Vaibhava, Charles and John
Divided Differences 101
What are they?
Examples
Applications
Opitz’ formula
Divided Differences
Exponential Models
Theorem (Peano Kernel identity)
Let λmin = min {λ1 , . . . , λn } and λmax = max {λ1 , . . . , λn } then
1
[λ1 , λ2 , . . . , λn ]f =
(n − 1)!
Z
λmax
f (n−1) (t)M(t; λ1 , . . . , λn ) dt,
λmin
where M(t; λ1 , . . . , λn ) is the Peano kernel, which is the B-spline
of degree n − 2 at the points λ1 , . . . , λn .
Proof.
Make a variable substitution y1 = λT x, and y2 , . . . , yn something
else in Hermite–Genocchi formula.
Charles Micchelli generalized B-splines to higher dimensions! This
allowed new applications in 2 and 3 dimensional modeling.
Peder, Vaibhava, Charles and John
Divided Differences 101
What are they?
Examples
Applications
Opitz’ formula
Divided Differences
Exponential Models
plot of M(t; −1.0, 1.0)
0.5
0.4
M(t)
0.3
0.2
0.1
0
−0.1
−2
−1.5
−1
−0.5
0
t
0.5
1
1.5
Figure: Plot of M(t; −1, 1)
Peder, Vaibhava, Charles and John
Divided Differences 101
2
What are they?
Examples
Applications
Opitz’ formula
Divided Differences
Exponential Models
plot of M(t; −1.0, 0.0, 1.0)
1
0.8
M(t)
0.6
0.4
0.2
0
−2
−1.5
−1
−0.5
0
t
0.5
1
1.5
Figure: Plot of M(t; −1, 0, 1)
Peder, Vaibhava, Charles and John
Divided Differences 101
2
What are they?
Examples
Applications
Opitz’ formula
Divided Differences
Exponential Models
plot of M(t; −1.0, −0.5, 1.0)
1
0.8
M(t)
0.6
0.4
0.2
0
−2
−1.5
−1
−0.5
0
t
0.5
1
1.5
Figure: Plot of M(t; −1, −0.5, 1)
Peder, Vaibhava, Charles and John
Divided Differences 101
2
What are they?
Examples
Applications
Opitz’ formula
Divided Differences
Exponential Models
plot of M(t; −1.0, −0.3, 0.3, 1.0)
1.2
1
M(t)
0.8
0.6
0.4
0.2
0
−2
−1.5
−1
−0.5
0
t
0.5
1
1.5
Figure: Plot of M(t; −1, −1/3, 1/3, 1)
Peder, Vaibhava, Charles and John
Divided Differences 101
2
What are they?
Examples
Applications
Opitz’ formula
Divided Differences
Exponential Models
plot of M(t; −1.0, 0.0, 0.5, 1.0)
1.2
1
M(t)
0.8
0.6
0.4
0.2
0
−2
−1.5
−1
−0.5
0
t
0.5
1
1.5
Figure: Plot of M(t; −1, 0, 0.5, 1)
Peder, Vaibhava, Charles and John
Divided Differences 101
2
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Leibniz’ rule
Derivatives:(fg )0 = fg 0 + f 0 g
Divided differences: [λ0 , λ1 ](fg ) = [λ0 ]f [λ0 , λ1 ]g + [λ0 , λ1 ]f [λ1 ]g .
More generally:
[λ0 , . . . , λn ](fg ) = [λ0 ]f [λ0 , . . . , λn ]g
+[λ0 , λ1 ]f [λ1 , . . . , λn ]g
+ · · · + [λ0 , . . . , λn ]f [λn ]g .
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Definition (Divided Difference Matrix)

[λ0 ]f [λ0 , λ1 ]f . . . [λ0 , . . . , λn ]f
 0
[λ1 ]f
. . . [λ1 , . . . , λn ]f

Tf (λ) =  .
..
.
..
..
 ..
.
.
0
...
0
[λn ]f
Example (Identity Function)

λ0 1 0
 0 λ1 1

def

J = Tp1 (λ) =  0 0 λ2
 ..
..
.
.
0 0 0
Peder, Vaibhava, Charles and John

0 ··· 0
0 ··· 0 

1
0
.

.. ..

.
.
0
λn
Divided Differences 101





(5)
(6)
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
DDM for exp
Example (Exponential Function)



E (λ) = Texp (λ) = 

Z (λ0 ) Z (λ0 , λ1 ) . . .
Z(λ)
0
Z (λ1 )
. . . Z (λ1 , . . . , λn )
..
..
..
..
.
.
.
.
0
...
0
Z (λn )
Peder, Vaibhava, Charles and John
Divided Differences 101



.

What are they?
Examples
Applications
Opitz’ formula
Divided Differences
Exponential Models
Opitz’ Formula
Theorem (Opitz)
The divided difference [λ1 , . . . , λn ]f is the (1, n)’th element of the
matrix
∞
X
f (i) (0) i
Tf (λ) =
J
(7)
i!
i=0
if the Taylor series of f is everywhere convergent.
Proof.
Trivially: Tf +g (λ) = Tf (λ) + Tg (λ).
By Leibniz’ rule: Tf ·g (λ) = Tf (λ) · Tg (λ).
The rest follows by Taylor expansion
Peder, Vaibhava, Charles and John
Divided Differences 101
What are they?
Examples
Applications
Opitz’ formula
Divided Differences
Exponential Models
A Monster Formula
We can compute Z (λ) from E (λ) = e J . ∇Z (λ) can be computed
from
0
E (λ, λ) =
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
@
Z (λ1 )
0
0
...
...
...
Z(λ)
Z (λ2 , . . . , λn )
Z (λ3 , . . . , λn )
Z(λ, λ1 )
Z(λ)
Z (λ1 , λ3 , . . . , λn )
Z (λ, λ1 , λ2 )
Z(λ, λ2 )
Z(λ)
...
...
...
Z (λ, λ)
Z (λ, λ2 , . . . , λn )
Z (λ, λ3 , . . . , λn )
.
.
.
0
0
.
.
.
...
...
.
.
.
Z (λn )
0
.
.
.
Z (λ1 , λn )
Z (λ1 )
.
.
.
Z (λ1 , λ2 , λn )
Z (λ1 , λ2 )
.
.
.
...
...
.
.
.
Z(λ, λn )
Z(λ)
.
.
.
0
.
.
.
...
.
.
.
0
.
.
.
0
.
.
.
0
.
.
.
...
.
.
.
Z (λn )
Peder, Vaibhava, Charles and John
Divided Differences 101
1
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
A
.
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
>> n=10;
>> lambda=0:1/n:1;
>> J=diag(lambda);
>> for i=1:n,
J(i,i+1)=1;
end
>> E=expm(J);
>> log(E(1,n+1))
ans = -14.6002
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
>> n=10;
>> lambda=0:1/n:1;
>> J=diag(lambda);
>> for i=1:n,
J(i,i+1)=1;
end
>> E=expm(J);
>> log(E(1,n+1))
ans = -14.6002
>> logz10
logz10 = -14.6002
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
>> n=100;
>> lambda=0:1/n:1;
>> J=diag(lambda);
>> for i=1:n,
J(i,i+1)=1;
end
>> E=expm(J);
>> log(E(1,n+1))
ans = -239.4318
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
>> n=100;
>> lambda=0:1/n:1;
>> J=diag(lambda);
>> for i=1:n,
J(i,i+1)=1;
end
>> E=expm(J);
>> log(E(1,n+1))
ans = -239.4318
>> logz100
logz100 = -363.2390
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Computing the Matrix Exponential is Hard
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
>> n=100;
>> lambda=0:1/n:1;
>> J=diag(lambda);
>> for i=1:n,
J(i,i+1)=1;
end
>> E=expm(J/32);
>> E=E*E; E=E*E; E=E*E; E=E*E; E=E*E;
>> log(E(1,n+1))
ans = -363.2383
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
>> n=100;
>> lambda=0:1/n:1;
>> J=diag(lambda);
>> for i=1:n,
J(i,i+1)=1;
end
>> E=expm(J/32);
>> E=E*E; E=E*E; E=E*E; E=E*E; E=E*E;
>> log(E(1,n+1))
ans = -363.2383
>> logz100
logz100 = -363.2390
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
What are they?
Examples
Applications
Opitz’ formula
Computing log Z (λ) correctly
We computed E (λ) by doing our own matrix exponential that
takes advantage of J being sparse upper triangular. We also did all
computations in the log domain and by using
m
m 2
e J = e J/2
and
m
e J/2 = I + J/2m +
Peder, Vaibhava, Charles and John
1
(J/2m )2 + · · ·
2!
Divided Differences 101
Divided Differences
Exponential Models
Examples
Linear Exponential Family
Fused Features
Experiments
Definition (Exponential Family)
T
e λ φ(x)
P(x|λ) =
Z (λ)
Z
where
Z (λ) =
D
D is the domain for x.
Peder, Vaibhava, Charles and John
T
eλ
Divided Differences 101
φ(x)
dx
(8)
Examples
Linear Exponential Family
Fused Features
Experiments
Divided Differences
Exponential Models
Example (Normal Distribution)
For diagonal covariance models:
x
d
D = R , φN (x) =
,
x2
λ=
µ/v
−1/(2v)
(9)
The partition function is
n
log ZN (λ) = −
1 X µ2i
+ log(2πvi ).
2
vi
i=1
Peder, Vaibhava, Charles and John
Divided Differences 101
(10)
Divided Differences
Exponential Models
Examples
Linear Exponential Family
Fused Features
Experiments
The Probability Simplex
Definition (Probability Simplex)
(
)
n
X
Pn = x ∈ Rn xi = 1, xi ≥ 0 .
i=1
Peder, Vaibhava, Charles and John
Divided Differences 101
(11)
Divided Differences
Exponential Models
Examples
Linear Exponential Family
Fused Features
Experiments
Example (Dirichlet Distribution)
Q
D = Pn ,
φD (x) = log(x),
Peder, Vaibhava, Charles and John
ZD (λ) =
Γ(λi + 1)
P
.
Γ(d + i λi )
i
Divided Differences 101
(12)
Divided Differences
Exponential Models
Examples
Linear Exponential Family
Fused Features
Experiments
Example (Linear Exponential Family)
D = Pn ,
φ(x) = x, Z (λ) =
n
X
Q
i=1
e λi
.
j6=i (λi − λj )
Z (λ) is the divided difference with respect to the exponential
function.
Peder, Vaibhava, Charles and John
Divided Differences 101
(13)
Divided Differences
Exponential Models
Examples
Linear Exponential Family
Fused Features
Experiments
Linear Exponential Family Plot
Figure: Plot of a density for the linear exponential family
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
Examples
Linear Exponential Family
Fused Features
Experiments
Dirichlet Distribution Plot
Figure: Plot of a Dirichlet distribution
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
Examples
Linear Exponential Family
Fused Features
Experiments
Cepstra linear discriminant analysis (LDA) based features: x ∈ R40
SPIF posterior features: p ∈ R44


x
Normal + Linear : φ(x) =  x2 
p


x
Normal + Dirichlet : φ(x) =  x2 .
log(p)
Peder, Vaibhava, Charles and John
Divided Differences 101
Divided Differences
Exponential Models
Examples
Linear Exponential Family
Fused Features
Experiments
Results on BN50
fMMI only
fMMI + linear exp family
fMMI + Dirichlet
Peder, Vaibhava, Charles and John
WER
19.4%
18.8%
20.2%
Divided Differences 101