arXiv:0904.1907v1 [cs.IT] 13 Apr 2009

Average Entropy Functions

Qi Chen, Chen He, Lingge Jiang, Qingchuan Wang
Dept. of Electronic Eng., Shanghai Jiao Tong Univ., Shanghai, China 200240
Email: {cq094, chenhe, lgjiang, r6144}@sjtu.edu.cn

Abstract— THIS PAPER IS ELIGIBLE FOR THE STUDENT PAPER AWARD. The closure of the set of entropy functions associated with n discrete variables, Γ̄*_n, is a convex cone in (2^n − 1)-dimensional space, but its full characterization remains an open problem. In this paper, we map Γ̄*_n to an n-dimensional region Φ̄*_n by averaging the joint entropies with the same number of variables, and show that the simpler Φ̄*_n can be characterized solely by the Shannon-type information inequalities.

I. INTRODUCTION

Given an n-dimensional discrete random vector X = (X_1, ..., X_n), for each non-empty subset α of N = {1, 2, ..., n} there is a joint entropy H(X_α) with X_α = (X_i)_{i∈α}, and the 2^n − 1 joint entropies form the entropy function (H(X_α))_{α⊆N, α≠∅} of X. We can then define Γ*_n ⊆ R^(2^n−1) as the set of all possible entropy functions of n discrete random variables, and Γ̄*_n as its closure. A vector H ∈ R^(2^n−1) is called entropic if H ∈ Γ*_n, and almost entropic if H ∈ Γ̄*_n [1]. All H = (H_α)_{α⊆N, α≠∅} ∈ Γ̄*_n satisfy the following Shannon-type information inequalities for any subsets α, β of N (we let H_∅ = 0 for convenience):

  H_α ≥ 0,                              (1)
  H_α ≤ H_β,  α ⊆ β,                    (2)
  H_α + H_β ≥ H_{α∪β} + H_{α∩β}.        (3)

However, (1)–(3) are not sufficient conditions for an H ∈ R^(2^n−1) to be almost entropic when n ≥ 4 [2]. In other words, denoting by Γ_n the set of vectors in R^(2^n−1) satisfying (1)–(3), we have

  Γ̄*_n ⊊ Γ_n,  n ≥ 4.                  (4)

A number of non-Shannon-type information inequalities satisfied by the members of Γ̄*_n have subsequently been found in [2]–[4], but the full characterization of Γ̄*_n remains an open problem. In this paper, we will show that an averaged version of Γ̄*_n can be more easily characterized.
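As a concrete illustration of the Shannon-type inequalities (1)–(3), the following Python sketch (an illustrative check, not part of the paper) computes all joint entropies of a small three-variable distribution and verifies non-negativity, monotonicity, and submodularity. The particular distribution (X_3 the XOR of two uniform bits X_1, X_2) is an arbitrary choice for the demo.

```python
import itertools, math
from collections import defaultdict

# Example joint pmf of three binary variables: X3 = X1 XOR X2, X1, X2 uniform.
# (Arbitrary demo distribution; any pmf satisfies (1)-(3).)
pmf = {
    (0, 0, 0): 0.25, (0, 1, 1): 0.25,
    (1, 0, 1): 0.25, (1, 1, 0): 0.25,
}

def H(alpha):
    """Joint entropy H_alpha in bits; H of the empty set is 0."""
    marg = defaultdict(float)
    for x, p in pmf.items():
        marg[tuple(x[i] for i in sorted(alpha))] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

subsets = [frozenset(s) for r in range(4)
           for s in itertools.combinations(range(3), r)]
for a in subsets:
    for b in subsets:
        assert H(a) >= 0                                   # (1) non-negativity
        if a <= b:
            assert H(a) <= H(b) + 1e-12                    # (2) monotonicity
        assert H(a) + H(b) >= H(a | b) + H(a & b) - 1e-12  # (3) submodularity
print("Shannon-type inequalities (1)-(3) hold for this distribution")
```

For n = 3 such checks are also sufficient for almost entropic vectors; as noted above, the paper's point is that for n ≥ 4 they are not.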
Definition 1: For a vector H = (H_α)_{α⊆N, α≠∅} ∈ R^(2^n−1), we define its average as

  Ψ(H) ≜ (h_1, ..., h_n),               (5)

where h_k = (n choose k)^(−1) Σ_{|α|=k} H_α. If H is the entropy function of a random vector X, we call h = Ψ(H) the average entropy function of X.

Ψ then maps Γ*_n to the set Φ*_n ≜ Ψ(Γ*_n) of all average entropy functions, Γ̄*_n to the closure Φ̄*_n, and Γ_n to Φ_n ≜ Ψ(Γ_n). From the definition (1)–(3) of Γ_n, Φ_n can be given by

  Φ_n = {(h_1, ..., h_n) | h_{k−1} − 2h_k + h_{k+1} ≤ 0, k = 1, ..., n},   (6)

where we let h_0 = 0 and h_{n+1} = h_n for convenience. Φ̄*_n is obviously a subset of Φ_n since Γ̄*_n ⊆ Γ_n, but we will show that they are actually equal. In other words, Φ̄*_n is characterizable solely with the Shannon-type information inequalities.

Theorem 1: Φ̄*_n = Φ_n.

This theorem will be proved in the next section.

II. PROOF OF THE THEOREM

It is only necessary to prove that Φ_n ⊆ Φ̄*_n. We first introduce a one-to-one transform to give Φ_n a simpler form.

Definition 2: For a vector h = (h_1, ..., h_n) ∈ R^n, we define its second-order difference as

  Θ(h) = (g_1, ..., g_n),               (7)

where g_k = h_{k−1} − 2h_k + h_{k+1}, k = 1, ..., n, with h_0 = 0 and h_{n+1} = h_n.

Θ maps Φ*_n to Λ*_n ≜ Θ(Φ*_n), Φ̄*_n to Λ̄*_n, and Φ_n to Λ_n ≜ Θ(Φ_n). From (6), we have

  Λ_n = {(g_1, ..., g_n) | g_k ≤ 0, k = 1, ..., n}.   (8)

As Ψ and Θ are both linear maps, and Γ̄*_n is a convex cone [5], Φ̄*_n and Λ̄*_n are both convex cones as well. Therefore, to prove that Φ_n ⊆ Φ̄*_n, or equivalently Λ_n ⊆ Λ̄*_n, it is sufficient to prove that

  g^(k) ≜ (0, ..., 0, −a, 0, ..., 0) ∈ Λ*_n,          (9)

with the −a in the k-th position, for k = 1, ..., n and some a > 0. In other words, for each k we need to find a random vector X whose average entropy function is

  h^(k) ≜ Θ^(−1)(g^(k)) = a · (1, 2, ..., k, k, ..., k).   (10)

This X can be constructed from a Reed-Solomon code. Specifically, let q be a power of two larger than n, C be the codeword set of an (n, k) Reed-Solomon code over GF(q), and X = (X_1, ..., X_n) be a random codeword uniformly distributed over C; then the average entropy function of X is (10) with a = log q, as shown below.

Let j_1, ..., j_n be distinct indices in 1, ..., n. According to the properties of Reed-Solomon codes, given any x*_{j_1}, ..., x*_{j_k} ∈ GF(q), there exists a unique x = (x_1, ..., x_n) ∈ C with x_{j_l} = x*_{j_l}, l = 1, ..., k. For any x*_{j_1} ∈ GF(q), there are thus q^(k−1) codewords x ∈ C with x_{j_1} = x*_{j_1}, one for each value combination on k − 1 other positions, and since X is equal to each codeword with probability q^(−k), we have p(X_{j_1} = x*_{j_1}) = q^(−1), so H(X_{j_1}) = log q. Similarly, H(X_{j_1}, X_{j_2}) = 2 log q, ..., H(X_{j_1}, ..., X_{j_k}) = k log q. For l = k+1, ..., n, given x_{j_1}, ..., x_{j_l}, there is either one matching codeword in C or none; therefore p(X_{j_1} = x_{j_1}, ..., X_{j_l} = x_{j_l}) is q^(−k) on its support, and H(X_{j_1}, ..., X_{j_l}) = k log q. Consequently, the average entropy function of X is (10) with a = log q as desired, and for each l, all (n choose l) l-variable joint entropies of X that are being averaged actually have the same value.

III. DISCUSSION

Determination of Γ̄*_n is important due to its close connection to the capacity region of general multi-source multi-sink wired networks [6], [7], but this seems to be a difficult problem, and even if a full characterization is found, computational difficulties due to Γ̄*_n's high dimensionality and complex structure might reduce its usefulness in practice [8]. What we have shown is that the region Φ̄*_n obtained by averaging the k-variable joint entropies has a much simpler structure: it is not affected by the non-Shannon-type information inequalities, and the linear Reed-Solomon codes used in the proof suggest that the suboptimality of linear network coding is also hidden by this averaging. On one hand, this means that further work on the characterization of Γ̄*_n must focus on the variation among the k-variable entropies, not just their averages.
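The Reed-Solomon construction in the proof above can be checked numerically. The sketch below (an illustrative verification with assumed small parameters n = 5, k = 2) enumerates the codewords, computes the average entropy function h = Ψ(H), and confirms that it matches (10) and that its second-order difference Θ(h) is −a at position k and zero elsewhere. One simplification relative to the paper: q is taken to be a prime (q = 7) rather than a power of two, so that field arithmetic is plain mod-q; the entropy argument is the same for any field with more than n elements.

```python
import itertools, math
from collections import Counter

q, n, k = 7, 5, 2  # assumed demo parameters: prime field size, length, dimension

# Codewords of an (n, k) Reed-Solomon code over GF(q): evaluations of every
# polynomial of degree < k at n distinct field points 0, 1, ..., n-1.
codewords = [
    tuple(sum(c * x**i for i, c in enumerate(coeffs)) % q for x in range(n))
    for coeffs in itertools.product(range(q), repeat=k)
]

def joint_entropy(positions):
    """H(X_alpha) in bits for a codeword X drawn uniformly from C."""
    counts = Counter(tuple(cw[j] for j in positions) for cw in codewords)
    m = len(codewords)
    return -sum((c / m) * math.log2(c / m) for c in counts.values())

# Average entropy function h = Psi(H): mean of all size-l joint entropies.
h = [
    sum(joint_entropy(a) for a in itertools.combinations(range(n), l))
    / math.comb(n, l)
    for l in range(1, n + 1)
]

# Equation (10): h should equal a * (1, 2, ..., k, k, ..., k) with a = log q.
a = math.log2(q)
print([round(x / a, 6) for x in h])  # -> [1.0, 2.0, 2.0, 2.0, 2.0]

# Second-order difference g = Theta(h), with h_0 = 0 and h_{n+1} = h_n:
# all entries are <= 0, with the single -a at position k, as in (9).
h_ext = [0.0] + h + [h[-1]]
g = [h_ext[j - 1] - 2 * h_ext[j] + h_ext[j + 1] for j in range(1, n + 1)]
print([round(x / a, 6) for x in g])  # -> [0.0, -1.0, 0.0, 0.0, 0.0]
```

As the proof notes, all size-l joint entropies here are in fact equal, so the averaging in Ψ is trivial for this particular X; the construction's role is only to realize the extreme rays of Λ_n.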
On the other hand, many practically interesting networks have a somewhat symmetric structure, possibly in a statistical sense, and an appropriately averaged version of Γ̄*_n (not necessarily as simplistic as Φ̄*_n) might provide a tractable method for the determination of their capacity regions. Average entropy functions are also closely related to the MAP EXIT functions discussed in e.g. [9] for large n.

ACKNOWLEDGMENT

This work was supported by National Natural Science Foundation of China Grants No. 60772100 and No. 60872017.

REFERENCES

[1] R. W. Yeung, "A framework for linear information inequalities," IEEE Trans. Inf. Theory, vol. 43, pp. 1924–1934, Nov. 1997.
[2] Z. Zhang and R. W. Yeung, "On characterization of entropy function via information inequalities," IEEE Trans. Inf. Theory, vol. 44, pp. 1440–1452, Nov. 1998.
[3] X. Yan, R. W. Yeung, and Z. Zhang, "A class of non-Shannon-type information inequalities and their applications," in Proc. IEEE Int. Symp. Inf. Theory, Washington, DC, June 2001.
[4] R. Dougherty, C. Freiling, and K. Zeger, "Six new non-Shannon information inequalities," in Proc. IEEE Int. Symp. Inf. Theory, Seattle, WA, June 2006.
[5] Z. Zhang and R. W. Yeung, "A non-Shannon-type conditional inequality of information quantities," IEEE Trans. Inf. Theory, vol. 43, pp. 1982–1986, Nov. 1997.
[6] X. Yan, R. W. Yeung, and Z. Zhang, "The capacity region for multi-source multi-sink network coding," in Proc. IEEE Int. Symp. Inf. Theory, Nice, France, June 2007.
[7] T. Chan and A. Grant, "Dualities between entropy functions and network codes," IEEE Trans. Inf. Theory, vol. 54, no. 10, pp. 4470–4487, Oct. 2008.
[8] F. Matúš, "Infinitely many information inequalities," in Proc. IEEE Int. Symp. Inf. Theory, Nice, France, June 2007.
[9] C. Méasson, A. Montanari, and R. Urbanke, "Maxwell construction: The hidden bridge between iterative and maximum a posteriori decoding," Jun. 2005, arXiv:cs.IT/0506083.