arXiv:0904.1907v1 [cs.IT] 13 Apr 2009

Average Entropy Functions

Qi Chen, Chen He, Lingge Jiang, Qingchuan Wang
Dept. of Electronic Eng., Shanghai Jiao Tong Univ., Shanghai, China 200240
Email: {cq094, chenhe, lgjiang, r6144}@sjtu.edu.cn

Abstract— THIS PAPER IS ELIGIBLE FOR THE STUDENT PAPER AWARD. The closure of the set of entropy functions associated with n discrete variables, Γ̄*_n, is a convex cone in (2^n − 1)-dimensional space, but its full characterization remains an open problem. In this paper, we map Γ̄*_n to an n-dimensional region Φ̄*_n by averaging the joint entropies with the same number of variables, and show that the simpler Φ̄*_n can be characterized solely by the Shannon-type information inequalities.

I. INTRODUCTION

Given an n-dimensional discrete random vector X = (X_1, ..., X_n), for each non-empty subset α of N = {1, 2, ..., n} there is a joint entropy H(X_α) with X_α = (X_i)_{i∈α}, and the 2^n − 1 joint entropies form the entropy function (H(X_α))_{α⊆N, α≠∅} of X. We can then define Γ*_n ⊆ R^(2^n−1) as the set of all possible entropy functions of n discrete random variables, and Γ̄*_n as its closure. A vector H ∈ R^(2^n−1) is called entropic if H ∈ Γ*_n, and almost entropic if H ∈ Γ̄*_n [1]. All H = (H_α)_{α⊆N, α≠∅} ∈ Γ̄*_n satisfy the following Shannon-type information inequalities for any subsets α, β of N (we let H_∅ = 0 for convenience):

  H_α ≥ 0,                              (1)
  H_α ≤ H_β,  α ⊆ β,                    (2)
  H_α + H_β ≥ H_{α∪β} + H_{α∩β}.        (3)

However, (1)–(3) are not sufficient conditions for an H ∈ R^(2^n−1) to be almost entropic when n ≥ 4 [2]. In other words, denoting by Γ_n the set of vectors in R^(2^n−1) satisfying (1)–(3), we have

  Γ̄*_n ⊊ Γ_n,  n ≥ 4.                  (4)

A number of non-Shannon-type information inequalities satisfied by the members of Γ̄*_n have subsequently been found in [2]–[4], but the full characterization of Γ̄*_n remains an open problem. In this paper, we will show that an averaged version of Γ̄*_n can be more easily characterized.
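As a concrete illustration of the Shannon-type inequalities (1)–(3), the following Python sketch (an illustrative check, not part of the paper) computes all joint entropies of a small three-variable distribution and verifies non-negativity, monotonicity, and submodularity. The particular distribution (X_3 the XOR of two uniform bits X_1, X_2) is an arbitrary choice for the demo.

```python
import itertools, math
from collections import defaultdict

# Example joint pmf of three binary variables: X3 = X1 XOR X2, X1, X2 uniform.
# (Arbitrary demo distribution; any pmf satisfies (1)-(3).)
pmf = {
    (0, 0, 0): 0.25, (0, 1, 1): 0.25,
    (1, 0, 1): 0.25, (1, 1, 0): 0.25,
}

def H(alpha):
    """Joint entropy H_alpha in bits; H of the empty set is 0."""
    marg = defaultdict(float)
    for x, p in pmf.items():
        marg[tuple(x[i] for i in sorted(alpha))] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

subsets = [frozenset(s) for r in range(4)
           for s in itertools.combinations(range(3), r)]
for a in subsets:
    for b in subsets:
        assert H(a) >= 0                                   # (1) non-negativity
        if a <= b:
            assert H(a) <= H(b) + 1e-12                    # (2) monotonicity
        assert H(a) + H(b) >= H(a | b) + H(a & b) - 1e-12  # (3) submodularity
print("Shannon-type inequalities (1)-(3) hold for this distribution")
```

For n = 3 such checks are also sufficient for almost entropic vectors; as noted above, the paper's point is that for n ≥ 4 they are not.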
Definition 1: For a vector H = (H_α)_{α⊆N, α≠∅} ∈ R^(2^n−1), we define its average as

  Ψ(H) ≜ (h_1, ..., h_n),               (5)

where h_k = (n choose k)^(−1) Σ_{|α|=k} H_α. If H is the entropy function of a random vector X, we call h = Ψ(H) the average entropy function of X.

Ψ then maps Γ*_n to the set Φ*_n ≜ Ψ(Γ*_n) of all average entropy functions, Γ̄*_n to the closure Φ̄*_n, and Γ_n to Φ_n ≜ Ψ(Γ_n). From the definition (1)–(3) of Γ_n, Φ_n can be given by

  Φ_n = {(h_1, ..., h_n) | h_{k−1} − 2h_k + h_{k+1} ≤ 0, k = 1, ..., n},   (6)

where we let h_0 = 0 and h_{n+1} = h_n for convenience. Φ̄*_n is obviously a subset of Φ_n since Γ̄*_n ⊆ Γ_n, but we will show that they are actually equal. In other words, Φ̄*_n is characterizable solely with the Shannon-type information inequalities.

Theorem 1: Φ̄*_n = Φ_n.

This theorem will be proved in the next section.

II. PROOF OF THE THEOREM

It is only necessary to prove that Φ_n ⊆ Φ̄*_n. We first introduce a one-to-one transform to give Φ_n a simpler form.

Definition 2: For a vector h = (h_1, ..., h_n) ∈ R^n, we define its second-order difference as

  Θ(h) = (g_1, ..., g_n),               (7)

where g_k = h_{k−1} − 2h_k + h_{k+1}, k = 1, ..., n, with h_0 = 0 and h_{n+1} = h_n.

Θ maps Φ*_n to Λ*_n ≜ Θ(Φ*_n), Φ̄*_n to Λ̄*_n, and Φ_n to Λ_n ≜ Θ(Φ_n). From (6), we have

  Λ_n = {(g_1, ..., g_n) | g_k ≤ 0, k = 1, ..., n}.   (8)

As Ψ and Θ are both linear maps, and Γ̄*_n is a convex cone [5], Φ̄*_n and Λ̄*_n are both convex cones as well. Therefore, to prove that Φ_n ⊆ Φ̄*_n, or equivalently Λ_n ⊆ Λ̄*_n, it is sufficient to prove that

  g^(k) ≜ (0, ..., 0, −a, 0, ..., 0) ∈ Λ*_n,          (9)

with the −a in the k-th position, for k = 1, ..., n and some a > 0. In other words, for each k we need to find a random vector X whose average entropy function is

  h^(k) ≜ Θ^(−1)(g^(k)) = a · (1, 2, ..., k, k, ..., k).   (10)

This X can be constructed from a Reed-Solomon code. Specifically, let q be a power of two larger than n, C be the codeword set of an (n, k) Reed-Solomon code over GF(q), and X = (X_1, ..., X_n) be a random codeword uniformly distributed over C; then the average entropy function of X is (10) with a = log q, as shown below.

Let j_1, ..., j_n be distinct indices in 1, ..., n. According to the properties of Reed-Solomon codes, given any x*_{j_1}, ..., x*_{j_k} ∈ GF(q), there exists a unique x = (x_1, ..., x_n) ∈ C with x_{j_l} = x*_{j_l}, l = 1, ..., k. For any x*_{j_1} ∈ GF(q), there are thus q^(k−1) codewords x ∈ C with x_{j_1} = x*_{j_1}, one for each value combination on k − 1 other positions, and since X is equal to each codeword with probability q^(−k), we have p(X_{j_1} = x*_{j_1}) = q^(−1), so H(X_{j_1}) = log q. Similarly, H(X_{j_1}, X_{j_2}) = 2 log q, ..., H(X_{j_1}, ..., X_{j_k}) = k log q. For l = k+1, ..., n, given x_{j_1}, ..., x_{j_l}, there is either one matching codeword in C or none; therefore p(X_{j_1} = x_{j_1}, ..., X_{j_l} = x_{j_l}) is q^(−k) on its support, and H(X_{j_1}, ..., X_{j_l}) = k log q. Consequently, the average entropy function of X is (10) with a = log q as desired, and for each l, all (n choose l) l-variable joint entropies of X that are being averaged actually have the same value.

III. DISCUSSION

Determination of Γ̄*_n is important due to its close connection to the capacity region of general multi-source multi-sink wired networks [6], [7], but this seems to be a difficult problem, and even if a full characterization is found, computational difficulties due to Γ̄*_n's high dimensionality and complex structure might reduce its usefulness in practice [8]. What we have shown is that the region Φ̄*_n obtained by averaging the k-variable joint entropies has a much simpler structure: it is not affected by the non-Shannon-type information inequalities, and the linear Reed-Solomon codes used in the proof suggest that the suboptimality of linear network coding is also hidden by this averaging. On one hand, this means that further work on the characterization of Γ̄*_n must focus on the variation among the k-variable entropies, not just their averages.
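The Reed-Solomon construction in the proof above can be checked numerically. The sketch below (an illustrative verification with assumed small parameters n = 5, k = 2) enumerates the codewords, computes the average entropy function h = Ψ(H), and confirms that it matches (10) and that its second-order difference Θ(h) is −a at position k and zero elsewhere. One simplification relative to the paper: q is taken to be a prime (q = 7) rather than a power of two, so that field arithmetic is plain mod-q; the entropy argument is the same for any field with more than n elements.

```python
import itertools, math
from collections import Counter

q, n, k = 7, 5, 2  # assumed demo parameters: prime field size, length, dimension

# Codewords of an (n, k) Reed-Solomon code over GF(q): evaluations of every
# polynomial of degree < k at n distinct field points 0, 1, ..., n-1.
codewords = [
    tuple(sum(c * x**i for i, c in enumerate(coeffs)) % q for x in range(n))
    for coeffs in itertools.product(range(q), repeat=k)
]

def joint_entropy(positions):
    """H(X_alpha) in bits for a codeword X drawn uniformly from C."""
    counts = Counter(tuple(cw[j] for j in positions) for cw in codewords)
    m = len(codewords)
    return -sum((c / m) * math.log2(c / m) for c in counts.values())

# Average entropy function h = Psi(H): mean of all size-l joint entropies.
h = [
    sum(joint_entropy(a) for a in itertools.combinations(range(n), l))
    / math.comb(n, l)
    for l in range(1, n + 1)
]

# Equation (10): h should equal a * (1, 2, ..., k, k, ..., k) with a = log q.
a = math.log2(q)
print([round(x / a, 6) for x in h])  # -> [1.0, 2.0, 2.0, 2.0, 2.0]

# Second-order difference g = Theta(h), with h_0 = 0 and h_{n+1} = h_n:
# all entries are <= 0, with the single -a at position k, as in (9).
h_ext = [0.0] + h + [h[-1]]
g = [h_ext[j - 1] - 2 * h_ext[j] + h_ext[j + 1] for j in range(1, n + 1)]
print([round(x / a, 6) for x in g])  # -> [0.0, -1.0, 0.0, 0.0, 0.0]
```

As the proof notes, all size-l joint entropies here are in fact equal, so the averaging in Ψ is trivial for this particular X; the construction's role is only to realize the extreme rays of Λ_n.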
On the other hand, many practically interesting networks have a somewhat symmetric structure, possibly in a statistical sense, and an appropriately averaged version of Γ̄*_n (not necessarily as simplistic as Φ̄*_n) might provide a tractable method for the determination of their capacity regions. Average entropy functions are also closely related to the MAP EXIT functions discussed in e.g. [9] for large n.

ACKNOWLEDGMENT

This work was supported by National Natural Science Foundation of China Grants No. 60772100 and No. 60872017.

REFERENCES

[1] R. W. Yeung, "A framework for linear information inequalities," IEEE Trans. Inf. Theory, vol. 43, pp. 1924–1934, Nov. 1997.
[2] Z. Zhang and R. W. Yeung, "On characterization of entropy function via information inequalities," IEEE Trans. Inf. Theory, vol. 44, pp. 1440–1452, Nov. 1998.
[3] X. Yan, R. W. Yeung, and Z. Zhang, "A class of non-Shannon-type information inequalities and their applications," in Proc. IEEE Int. Symp. Inf. Theory, Washington, DC, June 2001.
[4] R. Dougherty, C. Freiling, and K. Zeger, "Six new non-Shannon information inequalities," in Proc. IEEE Int. Symp. Inf. Theory, Seattle, WA, June 2006.
[5] Z. Zhang and R. W. Yeung, "A non-Shannon-type conditional inequality of information quantities," IEEE Trans. Inf. Theory, vol. 43, pp. 1982–1986, Nov. 1997.
[6] X. Yan, R. W. Yeung, and Z. Zhang, "The capacity region for multi-source multi-sink network coding," in Proc. IEEE Int. Symp. Inf. Theory, Nice, France, June 2007.
[7] T. Chan and A. Grant, "Dualities between entropy functions and network codes," IEEE Trans. Inf. Theory, vol. 54, no. 10, pp. 4470–4487, Oct. 2008.
[8] F. Matúš, "Infinitely many information inequalities," in Proc. IEEE Int. Symp. Inf. Theory, Nice, France, June 2007.
[9] C. Méasson, A. Montanari, and R. Urbanke, "Maxwell construction: The hidden bridge between iterative and maximum a posteriori decoding," Jun. 2005, arXiv:cs.IT/0506083.