4 - arXiv

1
Joint source-channel coding with feedback
arXiv:1501.07640v1 [cs.IT] 30 Jan 2015
(extended)
Victoria Kostina
Yury Polyanskiy
Sergio Verd´u
California Institute of Technology
Massachusetts Institute of Technology
Princeton University
Pasadena, CA 91125
Cambridge, MA 02139
Princeton, NJ 08544
vkostina@caltech.edu
yp@mit.edu
verdu@princeton.edu
Abstract
This paper quantifies the fundamental limits of variable-length transmission of a general (possibly
analog) source over a memoryless channel with noiseless feedback, under a distortion constraint. We
consider excess distortion, average distortion and guaranteed distortion (d-semifaithful codes). In contrast
to the asymptotic fundamental limit, a general conclusion is that allowing variable-length codes and
feedback leads to a sizable improvement in the fundamental delay-distortion tradeoff. In addition, we
investigate the minimum energy required to reproduce k source samples with a given fidelity after
transmission over a memoryless Gaussian channel, and we show that the required minimum energy is
reduced with feedback and an average (rather than maximal) power constraint.
Index Terms
Variable-length coding, joint source-channel coding, lossy compression, single-shot method, finiteblocklength regime, rate-distortion theory, feedback, memoryless channels, Gaussian channels, energydistortion tradeoff, Shannon theory.
I. I NTRODUCTION
A famous result of Shannon [1] states that feedback cannot increase the capacity of memoryless
channels. Burnashev [2] showed that feedback improves the error exponent in a variable-length
This work was supported in part by the National Science Foundation (NSF) under Grant CCF-1016625, by the NSF CAREER
award under Grant CCF-1253205, and by the Center for Science of Information (CSoI), an NSF Science and Technology Center,
under Grant CCF-0939370.
February 2, 2015
DRAFT
2
setting. Polyanskiy et al. [3] showed that allowing variable-length coding and non-vanishing
error-probability ǫ boosts the ǫ-capacity of the discrete memoryless channel (DMC) by a factor
of 1 − ǫ. Furthermore, as shown in [3], if both feedback and variable-length coding are allowed,
C
is approached at a fast speed O logℓ ℓ as the average allowable delay
the asymptotic limit 1−ǫ
ℓ increases:1
(1 − ǫ) log M ⋆ (ℓ, ǫ) = ℓC + O (log ℓ)
(1)
where M ⋆ (ℓ, ǫ) is the maximum number of messages that can be distinguished with error
probability ǫ at the average delay ℓ, and C is the channel capacity. This is in contrast to
channel coding at fixed blocklength n where in most cases only a O √1n convergence rate
is attainable, even when feedback is available, see [3], [4]. Thus, variable-length coding with
feedback (VLF) not only boosts the ǫ-capacity of the channel, but also markedly accelerates
the speed of approach to it. Moreover, zero-error communication is possible at an average rate
arbitrarily close to capacity via VLFT codes, a class of codes that employs a special termination
symbol to signal the end of transmission, which always recognized error-free by the receiver
[3]. As discussed in [3], the availability of zero-error termination symbols models that common
situation in which timing information is managed by a higher layer whose reliability is much
higher than that of the payload.
In [5], we treated variable-length data compression with nonzero excess distortion probability.
In particular, we showed that in fixed-to-variable-length compression of a block of k i.i.d. source
outcomes, the minimum average encoded length ℓ⋆ (k, d, ǫ) compatible with probability ǫ of
exceeding distortion threshold d satisfies, under regularity assumptions,
r
kV(d) − (Q−1 (ǫ))2
2
ℓ⋆ (k, d, ǫ) = (1 − ǫ)kR(d) −
e
+ O (log k)
2π
(2)
where R(d) and V(d) are the rate-distortion and the rate-dispersion functions, and Q is the
standard normal complementary cumulative distribution function. The second term in the expansion (2) becomes more natural if one notices
Q−1 (ǫ)2
1
E [Z · 1{Z > Q−1 (ǫ)] = √ e− 2 .
2π
1
Unless explicitly noted, all log and exp in this paper are to arbitrary matching base, which also defines units of entropy and
mutual information.
February 2, 2015
DRAFT
3
As elaborated in [5], the expansion (2) has an unusual feature: the asymptotic fundamental limit
is approached from the “wrong” side, e.g. larger dispersions and shorter blocklengths reduce the
average compression rate.
In this paper, we consider variable-length transmission of a general (possibly analog) source
over a DMC with feedback, under a distortion constraint. This variable-length joint sourcechannel coding (JSCC) setting can be viewed as a generalization of the setups in [3], [5].
Related work includes an assessment of nonasymptotic fundamental limits of fixed-length JSCC
in [6]–[8], a dynamic programming formulation of zero-delay JSCC with feedback in [9], and a
practical variable-length almost lossless joint compression/transmission scheme in [10]. Various
feedback coding strategies are discussed in [11]–[17]. Practical feedback communication schemes
in the literature that implement VLF include [18]–[21].
We treat several scenarios that differ in how the distortion is evaluated and whether a termination symbol is allowed. In all cases, we analyze the average delay required to achieve the
objective. The summary of our results in Section III, where as before, C, R(d), V(d) denote the
channel capacity, and the source rate-distortion and rate-dispersion functions, is:
•
Under average distortion criterion, E [d(S k , Sˆk )] ≤ d, the minimal average delay ℓ⋆ (k, d)
attainable via VLF codes transmitting k source symbols satisfies
ℓ⋆ (k, d)C = kR(d) + O (log k) .
(3)
•
Under excess distortion criterion, P[d(S k , Sˆk ) > d] ≤ ǫ, the minimal average delay attainable
•
via VLF codes transmitting k source symbols satisfies
r
kV(d) − (Q−1 (ǫ))2
2
e
+ O (log k) .
(4)
ℓ⋆ (k, d, ǫ) C = (1 − ǫ)kR(d) −
2π
Under guaranteed distortion criterion, P[d(S k , Sˆk ) > d] = 0, the minimal average delay
attainable via VLFT codes transmitting k source symbols satisfies
ℓ⋆t (k, d, 0) C = kR(d) + O (log k) .
(5)
Similar to (1), approaching the limits in (3), (4) and (5) only requires an extremely thin
feedback link, namely, the decoder sends just a single acknowledgement signal once it is
ready to decode (stop-feedback)2 . Note that (4) exhibits significant similarities with (2): the
2
Stop-feedback is not to be confused with the termination symbol, which is a special symbol that the encoder can transmit
error-free in order to tell the decoder that the transmission has ended and it is time to decode.
February 2, 2015
DRAFT
4
asymptotic limit is approached from below, i.e. in contrast to the results in [6], [22], [23],
smaller blocklengths and larger source dispersions are beneficial. Note also that the first term of
the expansion in (4) can be attained with variable-length codes without feedback.
Interestingly, naive separated source/channel coding fails to attain any of the limits mentioned.
For example, even the sign of the second term in (4) is not attainable. This observation led us to
believe, initially, that competitive schemes in this setting should be of successive refinement and
adaptation sort such as in [24], [25], or dynamic programming-like as in [9], [26]. It turns out,
however, that like the fixed-length JSCC achievability schemes in [6], [7], attaining limits (3)(5) requires a rather simple variation on the separation architecture: one only needs to allow a
variable-length interface between the source coder and the channel coder. Note that typically,
separation is understood in the sense that the output of the source coder (compressor) is treated
as pure bits, which can be arbitrarily permuted without affecting performance of the concatenated
scheme [8], [27]. Similarly, the performance of a variable-length separated scheme is insensitive
to permutations (but not additions or deletions) of the bits at the output of the source coder.
These semi-joint achievability schemes are the subject of Section II.
Energy-limited codes are the subject of Section IV. The optimal energy-distortion tradeoff
achievable in the transmission of a general source over the AWGN channel is studied in Section
V. In that setting, disposing of the restriction on the number of channel uses per source sample,
we limit the total available transmitter energy E and we study the tradeoff between the source
dimension k, the total energy E and the fidelity of reproduction. Related prior work includes
a study of asymptotic energy-distortion tradeoffs [28] and a nonasymptotic analysis of energy
per bit required to reliably send k bits through an AWGN channel [29]. The main results in
Section V are the following:
•
Under average distortion constraint, the total minimum energy required to transmit k source
samples over an AWGN channel with feedback satisfies
log e
Ef⋆ (k, d) ·
= kR(d) + O(log k) ,
N0
where N20 is the noise power per degree of freedom.
•
(6)
Under excess distortion constraint, the total minimum energy required to transmit k source
samples over an AWGN channel without feedback satisfies
p
log e
E ⋆ (k, d, ǫ) ·
= kR(d) + k (2R(d) log e + V(d))Q−1 (ǫ) + O (log k)
N0
February 2, 2015
(7)
DRAFT
5
•
Under excess distortion constraint, the total minimum average energy3 required to transmit
k source samples over an AWGN channel with feedback satisfies
r
kV(d) − (Q−1 (ǫ))2
log
e
2
= kR(d)(1 − ǫ) −
e
Ef⋆ (k, d, ǫ) ·
+ O (log k)
N0
2π
(8)
Particularizing (8) to ǫ = 0 also covers the case of guaranteed distortion. The first term in
the expansion (8) can be achieved even without feedback, as long as ǫ > 0 and the power
constraint is understood on the average over the codebook.
We point out the following parallels between the variable-length codes and energy-limitedcodes.
•
Under average distortion, in both cases the fundamental limit is approached at speed O
(3), (6)).
•
log k
k
(cf.
Allowing a non-vanishing excess-distortion probability and variable length (or variable
power) boosts the asymptotic fundamental limit by a factor of 1 − ǫ.
•
Allowing both feedback and variable length (or power) leads to expansions (4), (8), in which
shorter blocklengths are beneficial.
•
In both variable length coding with termination and average energy-limited coding, guaranteed distortion (ǫ = 0) can be attained, even though the channel is noisy, as long as feedback
is available.
II. F EEDBACK
CODES FOR NON - EQUIPROBABLE MESSAGES
In this section we consider joint source-channel coding assessing reliability by the probability
that the (possibly non-equiprobable) message is reproduced correctly. Our key tool will be two
extensions of the channel coding bounds for the DMC with feedback from [3]. VLF and VLFT
codes are formally defined as follows.
Definition 1. A variable-length feedback code (VLF) transmitting message W (taking values in
W) over the channel {PYi |X i Y i−1 }∞
i=1 with input/output alphabets A/B is defined by:
1) A random variable U ∈ U revealed to the encoder and decoder before the start of the
transmission.
3
The energy constraint in (7) is understood on a per-codeword basis. The energy constraint in (8) is understood as average
over the codebook.
February 2, 2015
DRAFT
6
2) A sequence of encoding functions fn : U × W × Bn−1 7→ A, defining the channel inputs
Xn = fn U, W, Y n−1
(9)
3) A sequence of decoding functions gn : U × Bn 7→ W, n = 1, 2, . . .
4) A non-negative integer-valued random variable τ , a stopping time of the filtration Fn =
σ {U, Y1, . . . , Yn }. which satisfies4
c is computed at the time instant τ
The final decision W
c = gτ (U, Y τ )
W
(10)
The value E [τ ] is the average transmission length of the given code.
A very similar concept is that of an VLFT code:
Definition 2. A variable-length feedback code with termination (VLFT) transmitting W ∈ W
over the channel {PYi |X i Y i−1 }∞
i=1 with input/output alphabets A/B is defined similarly to VLF
codes with the exception that condition 4) in the Definition 1 is replaced by
4’) A non-negative integer-valued random variable τ , a stopping time of the filtration Gn =
σ{W, U, Y1 , . . . , Yn }.
The idea of allowing τ to depend on the true message W models the practical scenarios
(see [3]) where there is a highly reliable control layer operating in parallel with the data channel,
which notifies the decoder when it is time to make a decision.
The two special cases of the above definitions are stop-feedback and fixed-to-variable codes:
1) stop-feedback codes are a special case of VLF codes where the encoder functions {fn }∞
n=1
satisfy:
fn (U, W, Y n−1 ) = fn (U, W ) .
(11)
Such codes require very limited communication over feedback: only a single signal to stop
the transmission once the decoder is ready to decode.
4
Note that ℓ need not be an integer.
February 2, 2015
DRAFT
7
2) fixed-to-variable codes, defined in [30], are also required to satisfy (11), while the stopping
time is5
τ = inf{n ≥ 1 : gn (U, Y n ) = W } ,
(12)
and therefore, such codes are zero-error VLFT codes.
h
i
c ≤
For both VLF and VLFT codes, we say that a code that satisfies E [τ ] ≤ ℓ and P W 6= W
ǫ, when averaged over U, message and channel, is an (ℓ, ǫ) code for source/channel W, {PYi |X i Y i−1 }∞
i=1 .
The random variable U serves as the common randomness shared by both transmitter and
receiver, which is used to initialize the codebook. As a consequence of Caratheodory’s theorem,
the amount of this common randomness can always be reduced to just a few bits: as shown in
[3, Theorem 19], if there exists an (ℓ, ǫ) code with |U| = ∞, then there exists an (ℓ, ǫ) code
with |U| ≤ 3. Allowing for common randomness does not affect the asymptotic expansions, but
leads to more concise expressions for non-asymptotic achievability bounds. Furthermore, if the
channel is symmetric no common randomness is needed at all [3, Theorem 3].
First, we quote an achievability result [3, (107)-(118)]:
Theorem 1 ([3]). For every DMC with capacity C, any M and probability of error ǫ there exists
an (ℓ, ǫ) stop-feedback code for the message W taking M values6 such that
Cℓ ≤ log M + log
1
+ a0 ,
ǫ
where
a0 ,
max log
x1 ,x2 ,y1 ,y2
PY |X (y1 |x1 )
,
PY |X (y2 |x2 )
(13)
and the maximum is taken over all pairs (xi , yi ) with PY |X (yi |xi ) > 0, i = 1, 2.
We note that a similar result is also contained in many other works, starting from [2] and
later in [31], [32].
Next, we tighten Theorem 1 in the case of non-equiprobable messages.
5
As explained in [30], this model encompasses fountain codes in which the decoder can get a highly reliable estimate of τ
autonomously without the need for a termination symbol.
6
Although the result in [3] is stated for average (over equiprobable messages) error probability, it in fact applies to arbitrary
distribution of messages, so W need not be equiprobable. Furthermore, if we do not insist on |U| ≤ 3, Theorem 1 even applies
to maximal probability of error.
February 2, 2015
DRAFT
8
Theorem 2. For every DMC with capacity C and random variable W there exists an (ℓ, ǫ)
stop-feedback code for W with
Cℓ ≤ H(W ) + log
1
+ a0
ǫ
(14)
where a0 is defined in (30).
Proof: If H(W ) = ∞ then there is nothing to prove, so we assume otherwise. Let PY∗ be
the capacity achieving output distribution of the DMC. We define information density as usual:
△
ıX;Y (a; b) = log
n
n
ıX n ;Y n (a ; b ) =
n
X
PY |X (b|a)
PY∗ (b)
(15)
ıX;Y (ai ; bi )
(16)
1
.
PW (m)
(17)
i=1
△
ıW (m) = log
where we used the memorylessness assumption in (16).
In the achievability scheme of [3, Theorem 3], at time n the decoder observes Y n , computes M
information densities ıX n ;Y n (Cn1 ; Y n ), . . . , ıX n ;Y n (CnM ; Y n ), where Cn1 , . . . , CnM are the codewords,
and stops once any of them exceeds a threshold. Instead of one common threshold, we assign
lower thresholds to the more likely messages.
Code construction: We define the common randomness (revealed to the encoder and decoder
before the transmission starts) to be a random variable U as follows:
PU , PX ∞ × . . . × PX ∞
{z
}
|
(18)
|W|
where X ∞ consists of i.i.d. copies drawn from (any) capacity-achieving input distribution. A
∞
realization of U defines |W| infinite dimensional vectors C∞
m ∈ A , m ∈ W. Having observed
that the value of W equals m ∈ W, the encoder transmits channel symbols Xn as follows:
△
Xn = fn (m) = Cnm
(19)
At time n, the decoder computes the values
In (m) = ıX n ;Y n (fn (m); Y n ) − ıW (m),
(20)
The decoder defines the stopping times:
τm = inf{n ≥ 0 : In (m) ≥ γ}
February 2, 2015
(21)
DRAFT
9
c is made by the decoder at the
where γ > 0 is an arbitrary constant. The final decision W
stopping time τ ⋆ :
τ ⋆ = min τm
(22)
m∈W
c = g(Y τ ⋆ ) = argmin τm .
W
(23)
m∈W
where the tie-breaking rule is immaterial.
Analysis: We claim that, averaged over U, we have:
c ] ≤ exp{−γ}
P[W 6= W
CE [τ ⋆ ] ≤ H(W ) + γ + a0 .
(24)
(25)
Indeed, let us set for convenience
△
τ = τW .
Then from the union bound we have, cf. [3, Theorem 3]:
X
c |W = m0 ] ≤
P[W 6= W
P[τm ≤ τ |W = m0 ]
(26)
(27)
m∈W\{m0 }
(28)
Due to memorylessness of the channel, ıX n ;Y n (X n ; Y n )−nC is a martingale. By Doob’s optional
stopping theorem (e.g. [33]) we have
E [ıX τ ;Y τ (X τ ; Y τ )] = CE [τ ]
(29)
CE [τ ] − H(W ) = E [Iτ (S)] ≤ γ + a0
(30)
so
where a0 is an upper bound to the maximum jump size of the process {ıX n ;Y n (X n ; Y n )}∞
n=1 .
Next, as in [3, (111)-(118)] we have by a change of measure argument for every m 6= m0 :
P[τm = n|W = m0 ] ≤ exp{−ıW (m) − γ}P[τm = n|W = m] .
(31)
P[τm ≤ τ |W = m0 ] ≤ P[τm < ∞|W = m0 ]
(32)
≤ exp{−ıW (m) − γ} ,
(33)
Consequently,
February 2, 2015
DRAFT
10
where we used that P [τ < ∞] = P [τm < ∞|W = m] = 1, which in turn follows from (30).
Summing (33) over all m 6= m0 and using (27) we get (24).
The estimate of average length (25) follows from τ ⋆ ≤ τ and (30). The result (14) follows by
taking γ = log 1ǫ .
Remark 1. A slightly less sharp bound could also be derived via a variable-length separated
scheme: compress W losslessly into a variable-length string {0, 1}∗ with average length less
than H(W ), cf. [34], then send the length via O(log H(W )) channel symbols and very small
probability of error and finally send the data bits.
Next, we extend the zero-error bound in [3, Theorem 10] to the case of non-equiprobable
messages:
Theorem 3. For every DMC with capacity C there exists a constant a1 such that for every
discrete random variable W there exists an (ℓ, 0) VLFT code with
Cℓ ≤ H(W ) + a1
(34)
Proof: Without loss of generality, we assume that H(W ) < ∞ and W takes values in
positive integers. The codebook is a countable collection of infinite strings C∞
m , m = 1, 2, . . ..
Given the codebook and the realization of W = m, the encoder sends
△
Xn = f n (m) = Cnm
The decoder outputs m error-free at the stopping time τ ⋆ given by:
⋆
τ = inf In (m) > max In (j) ,
n
j<m
(35)
(36)
where
In (m) = ıX n ;Y n (Cnm ; Y n ) − ıW (m).
(37)
According to (36), if the true message is m the transmission stops at the first instant n when
In (m) exceeds all In (j), j < m. Note that τ depends on the transmitted message m, as permitted
by the paradigm of VLFT codes.
February 2, 2015
DRAFT
11
Analysis:
P [τ ⋆ > n|W = m] = P
"
[
j<m
#
{In (j) > In (m)} |W = m
(38)
∞
Applying the random coding argument, we now assume that the codebook strings C∞
1 , C2 , . . .
are drawn i.i.d. from PX ∞ = PX × PX × . . ., where PX is the capacity-achieving channel input
¯ n an independent copy of X n and taking the expectation of the right
distribution. Denoting by X
side of (38) with respect to the codebook, we have
"
#
[ PW (j)PY |X (Y |Xj )
P
≥1
P
(m)P
(Y
|X
)
W
Y
|X
m
j<m
"
( m−1 )#
¯ n)
X
PW (j)PY n |X n (Y |X
≥ 1 | X n, Y n
≤ E min 1,
P
n |X n )
n
n
(Y
P
(m)P
W
Y |X
j=1
"
( m−1
)#
¯ n )|Y n
X PW (j) E PY n |X n (Y n |X
≤ E min 1,
PW (m)
PY n |X n (Y n |X n )
j=1
(39)
(40)
(41)
= E exp(−|ıX n ;Y n (X n ; Y n ) − ıW (m)|+ ) ,
+ where we used the union bound and min{1, a} = exp − log a1 . Applying (41) we get
⋆
E [τ ] =
∞
X
P [τ ⋆ > n]
(42)
n=0
≤
∞
X
n=0
E exp(−|ıX n ;Y n (X n ; Y n ) − ıW (W )|+ ) .
(43)
Finally, (43) implies (34) by applying the result [3, (162)-(179)]:
∞
X
γ
E exp(−|ıX n ;Y n (X n ; Y n ) − γ|+ ) ≤ + a1 .
C
n=0
III. A SYMPTOTIC
(44)
EXPANSIONS OF THE RATE - DISTORTION TRADEOFF
A. Definitions
We move from the setup of Section II where a discrete message is transmitted over the
feedback channel to a more general scenario of Section III, in which a possibly analog signal
is transmitted over a channel with feedback, under a fidelity constraint. We will consider the
following scenarios:
February 2, 2015
DRAFT
12
1) excess distortion: A VLF code transmitting memoryless source (S k , S k ) with reproduction
alphabet Sˆk and separable distortion measure d : S k × Sˆk 7→ [0, +∞] is called a (k, ℓ, d, ǫ)
excess-distortion code if
E [τ ] ≤ ℓ
(45)
P[d(S k , Sˆk ) ≤ d] ≤ ǫ
(46)
The corresponding fundamental limit is
△
ℓ⋆ (k, d, ǫ) = inf{ℓ : ∃ an (k, ℓ, d, ǫ) VLF code} .
(47)
2) average distortion: A VLF code satisfying, instead of (46), an average constraint
E [d(S k , Sˆk )] ≤ d
(48)
is called a (k, ℓ, d) average-distortion code. The corresponding fundamental limit is
△
ℓ⋆ (k, d) = inf{ℓ : ∃ an (k, ℓ, d) VLF code} .
(49)
3) guaranteed distortion: A VLFT code transmitting memoryless source (S k , S k ) with reproduction alphabet Sˆk and separable distortion metric d is called a (k, ℓ, d, 0) guaranteeddistortion code if it achieves ǫ = 0 in (46). The corresponding fundamental limit is
△
ℓ⋆t (k, d, 0) = inf{ℓ : ∃ an (k, ℓ, d, 0) VLFT code} .
(50)
We will use the following notation for the various minimizations of information measures:
RS (d) ,
RS (d, ǫ) ,
Hd,ǫ (S) ,
min
PZ|S : S7→Sˆ :
E[d(S,Z)]≤d
min
I(S; Z)
PZ|S : S7→Sˆ :
P[d(S,Z)>d]≤ǫ
I(S; Z)
min
c : S7→Sˆ :
P[d(S,c(S))>d]≤ǫ
H(c(S)).
(51)
(52)
(53)
The quantity in (53) is referred to as the (d, ǫ)-entropy of the source S [35]. The (ǫ, 0)-entropy
is also known as just ǫ-entropy [35]:
Hd,0 (S) ,
February 2, 2015
min
c : S7→Sˆ :
d(S,c(S))≤d a.s.
H(c(S)).
(54)
DRAFT
13
Provided that the infimum in (51) is achieved by some transition probability kernel PZ ⋆ |S , the
d-tilted information in s ∈ S is defined as [23]
S (s, d) , − log E [exp (−λ⋆ d(s, Z ⋆ ) + λ⋆ d)]
(55)
λ⋆ = −R′S (d).
(56)
where
In the almost-lossless compression, Z ⋆ = S and
S (s, d) , ıS (s)
, log
1
.
PS (s)
(57)
(58)
B. Regularity assumptions on the source
We assume that the source, together with its distortion measure, satisfies the following assumptions:
A1 The source {Si } is stationary and memoryless, PS k = PS × . . . × PS .
P
A2 The distortion measure is separable, d(sk , z k ) = k1 ki=1 d(si , zi ).
A3 The distortion level satisfies dmin < d < dmax , where dmin is the infimum of values at which
the minimal mutual information quantity RS (d) is finite, and dmax = inf z∈M
c E [d(S, z)],
where the expectation is with respect to the unconditional distribution of S.
A4 The rate-distotortion function is achieved by a unique PZ⋆ |S : RS (d) = I(S; Z⋆).
A5 E [d12 (S, Z⋆ )] < ∞ where the expectation is with respect to PS × PZ⋆ .
The rate-dispersion function of the source satisfying assumptions A1–A5 is given by [23]
V(d) = Var (S (S, d)) .
We showed in [5] that under assumptions A1–A5 for all 0 ≤ ǫ ≤ 1

r
RS k (d, ǫ)
kV(d) − (Q−1 (ǫ))2
2
+ O (log k) .
= (1 − ǫ)kR(d) −
e
2π
k 
Hd,ǫ (S )
February 2, 2015
(59)
(60)
DRAFT
14
C. Average distortion
Theorem 4. Under assumptions A1–A5 we have
Cℓ∗ (k, d) = kR(d) + O(log k) .
(61)
Proof:
Achievability: fix 1 < p ≤ 12, ǫ > 0, source codebook c1 , . . . , cM and consider a separated
h
i
c
c
c
scheme S − f(S) − X − Y − W − c(W ) such that P f(S) 6= W ≤ ǫ and
f(sk ) = arg
min
m∈{1,...,M }
d(sk , cm )
c(m) = cm
sk ∈ S k
(62)
m ∈ {1, . . . , M}
(63)
The average distortion is bounded by
h
n
oi
k
k
k
c
≤ E d(S , cW ) + E d(S , cW
E d S , cW
c )1 W 6= W
c
p1 1− 1
≤E
min d(S k , cm ) + E dp (S k , cW
ǫ p
c)
m∈{1,...,M }
(64)
(65)
where (65) is by H¨older’s inequality. Taking an expectation of both sides over c1 , . . . , cM
drawn i.i.d. from PZk⋆ , we conclude via a random coding argument that there exists a separate
source/channel code with average distortion bounded by
1
d ≤ d¯ + E dp (S k , Z1 ) ǫ1− p ,
where we denoted
d¯ , E
min
m∈{1,...,M }
k
d(S , Zm ) ,
(66)
(67)
and PS k Z1 ...ZM = PS k PZk⋆ . . . PZk⋆ . The work of Pilc [36] (finite alphabet) and Yang and Zhang
[37] (abstract alphabet) shows that under assumptions A1–A5,
¯ + O(log k).
log M = kR(d)
(68)
1
Letting ǫ = k p −2 , we conclude by assumption A5 that the second term in (66) is bounded by
1
, i.e. d ≤ d¯ + k1 . Finally, by Theorem 1, Cℓ ≤ log M + O (log k), and the ‘≤’ direction in (61)
k
follows by the differentiability of R(d) in the region of assumption A3.
Converse: By data-processing inequality and [2, Lemma 1-2] we have
kR(d) ≤ ℓC
(69)
for any (k, ℓ, d) VLF code.
February 2, 2015
DRAFT
15
D. Excess distortion
Theorem 5. Under assumptions A1–A5 and any ǫ > 0 we have
r
kV(d) − (Q−1 (ǫ))2
2
+ O (log k)
ℓ⋆ (k, d, ǫ)C = (1 − ǫ)kR(d) −
e
2π
(70)
Proof:
Achievability: Pair a lossy compressor S k → W with excess-distortion probability ǫ′ = ǫ − √1k
and H(W ) = Hd,ǫ′ (S k )7 with a VLF code from Theorem 2 transmitting W with probability of
√1 .
k
error
Apply (60) to (14)8 .
Converse: Apply the data-processing inequality and [2, Lemma 1-2] to get:
ℓC ≥ RS k (d, ǫ)
(71)
for every (k, ℓ, d, ǫ) VLF code.
E. Guaranteed distortion
Theorem 6. Under assumptions A1–A5, we have
ℓ⋆t (k, d, 0)C = kR(d) + O (log k)
(72)
Proof: For the achievability we note that the estimate of the Hd,ǫ (S k ) in (60) applies with
ǫ = 0 and thus
Hd,0 (S k ) = kR(d) + O(log k) .
(73)
Then, we can pair the mapping achieving Hd,0 (S k ) with the zero-error VLFT code from Theorem 3.
7
Although the optimal mapping c⋆ that achieves (d, ǫ)-entropy is not known in general, the existence of good approximations
satisfying the constraint in (53) that approach Hd,ǫ (S k ) to within log2 Hd,ǫ (S k ) bits is guaranteed by a random coding argument,
see [5].
8
Note that (60) also holds if ǫ in the left side is replaced by ǫ + O
February 2, 2015
√1
k
DRAFT
16
Conversely, repeating the argument of [3, Theorem 4], with the replacement of the right side
of [3, (67)] by RS (d, ǫ) we conclude that any (ℓ, d, ǫ) VLFT code must satisfy
RS (d, ǫ) ≤ Cℓ + log(ℓ + 1) + log e .
(74)
F. Discussion
We make several remarks regarding the rate-distortion tradeoff in all three settings considered
above:
1) The case d = dmin is special and is excluded in the assumptions of Theorems 4–6. However,
in the most important special case of a distortion measure that satisfies


dmin ,
a=b
d(a, b) =
,

> dmin, a 6= b
(75)
d = dmin corresponds to almost-lossless transmission, and both Theorems 4 and 5 apply
with R(d) and V(d) equal to the entropy and the varentropy of the source, respectively, as
long as the source is stationary and memoryless and the third moment of ıS (S) is finite.
2) For almost-lossless transmission of finite alphabet sources, the asymptotic expansion (70)
can be achieved by reliably (i.e. with probability of error ∼ k1 ) sending through the channel
the type of the source outcome first, and then reliably sending each message whose type is
one the most likely types with total mass 1 − ǫ.
3) Even if the channel is not symmetric, the asymptotic expansions in Theorems 4–6 can be
achieved without common randomness U, by using constant composition channel codebooks. For instance, consider the scheme in Theorem 2 with PX ∞ drawn from the distribution equiprobable over the capacity-achieving type. Since E [τ |X ∞ ] = E [τ ] a.s., almost
every such codebook attains the bound in (30) up to logarithmic terms, resulting in a
deterministic construction attaining (70).
4) Note that (70) is achieved by a stop-feedback code. We can further show that even without
any feedback one can still achieve the optimal first order performance
p
ℓC ≤ (1 − ǫ)kR(d) + O( k log k),
February 2, 2015
(76)
DRAFT
17
provided variable-length channel coding is allowed. Indeed, one can first use the variablelength excess-distortion compressor from [5] on S k to get a binary string of average length
√
(1−ǫ)kR(d)+O( k), see (2). Then, truncating the length at k 2 and transmitting 2 log k data
1
,
k2
we can reliably inform the encoder about the total number of data
√
bits b to be sent next. We may then use a capacity-achieving code of length Cb +O( b log b)
bits with reliability
to send the data bits with reliability
1
k
[22].
5) The naive separation achieves at most:
p
ℓC ≥ (1 − ǫ)kR(d) + a k log k, a > 0.
(77)
Indeed, according to Theorem 1, the number of messages M that can be transmitted via
VLF code and error probability η satisfies
log M ≥
ℓC
+ O (log ℓ) .
1−η
(78)
On the other hand, the number of codewords of a source code with probability of exceeding
distortion d no greater than ζ satisfies [23]
log M ≤ kR(d) +
Optimizing over η + ζ ≤ ǫ yields (77).
p
kV(d)Q−1 (ζ) + O (1) .
(79)
6) The Schalkwijk-Bluestein [38] (see also [39]) elegant linear feedback scheme for the transmission of a single Gaussian sample S ∼ N (0, σ 2) over the AWGN channel achieves the
mean-square error
σ2
,
(1+P )n
after n channel uses, where P is the average transmit SNR. In
other words, the minimum delay in transmitting a Gaussian sample over a Gaussian channel
with feedback is given by
ℓ⋆ (1, d) =
as long as
R(d)
C
R(d)
,
C
(80)
is integer.9 Note that (80) is achieved with fixed, not variable length,
and average, not maximal, power constraint. If there are k Gaussian samples to transmit,
repeating the scheme for each of the samples achieves
ℓ⋆ (k, d) = k
9
If
R(d)
C
R(d)
,
C
(81)
= 1, no feedback is needed.
February 2, 2015
DRAFT
18
which implies, in particular, that in general our estimate of O (log k) in (61) is too conservative. Beyond Gaussian sources and channels, a sufficient condition for a fixed-length
JSCC feedback scheme to achieve (81) is provided in [16].
IV. E NERGY- LIMITED
FEEDBACK CODES FOR NON - EQUIPROBABLE MESSAGES
In this section, we study the transmission of a message over an AWGN channel under an
energy constraint. We would like to know how much information can be pushed through the
channel, if a total of E units of energy is available to accomplish the task. Formally, the codes
studied in this section are defined as follows.
Definition 3. An energy-limited code without feedback for the transmission of a random variable
W taking values in W over an AWGN channel is defined by:
1) A sequence of encoders fn : W 7→ A, defining the channel inputs
Xn = fn (W )
satisfying
P
"∞
X
j=1
2) A decoder g : B
∞
7→ W.
(82)
#
Xj2 ≤ E = 1
(83)
Definition 4. An energy-limited feedback code for the transmission of a random variable W
taking values in W over an AWGN channel is defined by:
1) A sequence of encoders fn : W × Bn−1 7→ A, defining the channel inputs
Xn = fn W, Y n−1
satisfying
∞
X
j=1
2) A decoder g : B∞ 7→ W.
E Xj2 ≤ E
(84)
(85)
An (E, ǫ) code for the transmission of random variable W over the Gaussian channel is a
h
i
c ≤ ǫ.
code with energy bounded by E and P W 6= W
February 2, 2015
DRAFT
19
Definitions 3–4 do not impose any restrictions on the number of degrees of freedom n, restricting instead the total available energy. The problem of transmitting a message with minimum
energy was posed by Shannon [40], who showed that the minimum energy per information
bit compatible with vanishing block error probability converges to N0 loge 2 as the number of
information bits goes to infinity, where
N0
2
is the noise power per degree of freedom. Recently,
Polyanskiy et al. [29, Theorem 7] showed a dynamic programming algorithm for the error-free
transmission of a single bit over an AWGN channel with feedback that attains exactly Shannon’s
optimal energy per bit tradeoff
E = N0 loge 2.
(86)
The next non-asymptotic achievability result leverages that algorithm to transmit error-free a
binary representation of a random variable over the AWGN channel by means of a variablelength separate compression/transmission scheme.
Theorem 7. There exists a zero-error feedback code for the transmission of a random variable
W over the AWGN channel with energy
E < N0 loge 2 (H(W ) + 1)
(87)
Proof: The encoder converts the source into a variable-length string using the Huffman
code, so that the codebook is prefix-free and the expectation of the encoded length ℓ(W ) is
bounded as
E [ℓ(W )] < H(W ) + 1 .
(88)
Next each bit (out of ℓ(W )) is transmitted at the optimal energy per bit tradeoff N0 loge 2 using
the zero-error feedback scheme in [29, Theorem 7]. Transmissions corresponding to different bits
are interleaved diagonally (see Fig. 1): the first bit is transmitted in time slots 1, 2, 4, 7, 11, . . ., the
second one in 3, 5, 8, 12, . . ., and so on. The channel encoder is silent at those indices allocated
to source bits ℓ(W ) + 1, ℓ(W ) + 2, . . . For example, if the codeword has length 2 nothing is
transmitted in time slots 6, 9, 13, . . .. The receiver decodes the first transmitted bit focusing on
the time slots 1, 2, 4, 7, 11, . . . It proceeds successively with the second bit, etc., until it forms
a codeword of the Huffman code, at which point it halts. Thus, it does not need to examine
February 2, 2015
DRAFT
20
Bit number
Sequence of time slots
1
1
2
4
7
2
3
5
8
···
3
..
.
6
9 ···
···
ℓ(W )
Fig. 1.
Illustration of the diagonal numbering of channel uses in Theorem 7.
the outputs of the time slots corresponding to information bits that were not transmitted, and in
which the encoder was silent.
Since the scheme spends N0 loge 2 energy per bit, The total energy to transmit the codeword
representing W is
ℓ(W )N0 loge 2.
(89)
Taking the expectation of (89) over W and applying (88), (87) follows.
For concreteness, in the proof of Theorem 7 we have referred to the degrees of freedom as
time slots. In practice, decoding in finite time is feasible if the channel is not bandlimited.
In the converse direction, we have
H(W ) ≤
E
log e.
N0
(90)
Indeed, due to the zero-error requirement and data processing, H(W ) = I(W ; g(Y ∞ )) ≤
I(X ∞ ; Y ∞ ), which in turn is bounded by the right side of (90) [40].
Our next achievability result studies the performance of a variable-length separated scheme.
Theorem 8. Fix positive E1 and E2 such that
E1 + E2 ≤ E.
Denote
ε(E, m) , 1 − √
February 2, 2015
1
πN0
Z
∞
−∞
(91)
√ m−1
2
+ E 
−x
1 − Q  x q
e N0 dx.


(92)
N0
2
DRAFT
21
Assume that W takes values in {1, 2, . . . , M}. There exists an (E, ǫ) non-feedback code for the
transmission of random variable W over an AWGN channel without feedback such that
ǫ ≤ E [ε (E1 , W )] + ε(E2 , ⌊log2 M⌋ + 1)
(93)
where ℓ(W ) denotes the length of W .
Proof: Assume that the outcomes of W are ordered in decreasing probabilities. Consider
the following variable-length separated achievability scheme: the source outcome m is first
losslessly represented as a binary string of length ⌊log2 m⌋ by assigning it to m-th binary string
in {∅, 0, 1, 00, 01, . . .} (the most likely outcome is represented by the empty string). Then, all
binary strings are grouped according to their encoded lengths. A channel codebook is generated
for each group of sequences. The encoded length is sent over the channel with high reliability, so
the decoder almost never makes an error in determining that length. Then the encoder makes an
ML decision only between sequences of that length. A formal description and an error analysis
follow.
Codebook: the collection of M + ⌊log M⌋ + 1 codewords
p
cj = E1 ej , j = 1, 2, . . . , M
p
cj = E2 ej , j = M + 1, M + ⌊log2 M⌋ + 1
(94)
(95)
where {ej , j = 1, 2, . . .} is an orthonormal basis of L2 (R∞ ).
Encoder: The encoder sends the pair (m, ⌊log2 m⌋) by transmitting cm + cm+⌊log2 m⌋+1 .
Decoder: Having received the infinite string corrupted by i.i.d. Gaussian noise z, the decoder
first (reliably) decides between ⌊log2 M⌋ + 1 possible values of ⌊log2 m⌋ based on the minimum
distance:
ℓˆ , argmin kz − cj k, j = M + 1, . . . , M + ⌊log2 M⌋ + 1
(96)
As shown in [41, p. 258]), [42], [29, Theorem 3], the probability of error of such a decision
is given by ε(E, ⌊log2 M⌋ + 1). This accounts for the second term in (93). The decoder then
ˆ
ˆ
decides between 2ℓ messages10 j with ⌊log j⌋ = ℓ:
ˆ
ˆ
cˆ , argmin kz − cj k, j = 2ℓ , . . . , min{2ℓ+1 − 1, M}
10
(97)
ˆ
More precisely, 2ℓ messages if ℓˆ ≤ ⌊log 2 M ⌋ − 1 and M − 2⌊log2 M ⌋ + 1 ≤ 2⌊log2 M ⌋ messages if ℓˆ = ⌊log 2 M ⌋.
February 2, 2015
DRAFT
22
The probability of error of this decision rule is similarly upper bounded by ε (E, m), provided
that the value of ⌊log2 m⌋ was decoded correctly: ℓˆ = ⌊log2 m⌋. Since 2⌊log2 m⌋ ≤ m, this
accounts for the first term in (93).
Normally, one would choose 1 ≪ E2 ≪ E1 so that the second term in (93), which corresponds
to the probability of decoding the length incorrectly, is negligible compared to the first term,
and the total energy E ≈ E1 . Moreover, if W takes values in a countably infinite alphabet,
one can truncate it so that the tail is negligible with respect to the first term in (93). To ease
the evaluation of the first term in (93), one might use i ≤
1
.
PW (i)
In the equiprobable case, this
weakening leads to E [ε (E1 , W )] ≤ ε (E1 , M).
If the power constraint is average rather than maximal, a straightforward extension of Theorem
8 ensures the existence of an (E, ǫ) code (average power constraint) for the AWGN channel with
ǫ ≤ E [ε (E1 (⌊log2 W ⌋), W )] + ε(E2 , ⌊log M⌋ + 1),
(98)
where E1 : {0, 1, . . . , ⌊log2 M⌋} 7→ R+ and E2 ∈ R+ are such that
E [E1 (⌊log2 W ⌋)] + E2 ≤ E.
V. A SYMPTOTIC
(99)
EXPANSIONS OF THE ENERGY- DISTORTION TRADEOFF
A. Problem setup
This section focuses on the energy-distortion tradeoff in the JSCC problem. Like in Section
IV, we limit the total available transmitter energy E without any restriction on the (average)
number of channel uses per source sample. Unlike Section IV, we allow general (not neccesarily
discrete) sources, and we study the tradeoff between the source dimension k, the total energy E
and the fidelity of reproduction. Thus, we identify the minimum energy compatible with target
distortion without any restriction on the time-bandwidth product (number of degrees of freedom).
As in Section III-A, we consider both the average and the excess distortion criteria.
Formally, we let the source be a k-dimensional vector S k in the alphabet S k . A (k, E, d, ǫ)
energy-limited code is an energy-limited code for (S k , S k ) with total energy E and probability
≤ ǫ of exceeding d (see (46)). Similarly, a (k, E, d) energy-limited code is an energy-limited
code for (S k , S k ) with total energy E and average distortion not exceeding d (see (48)). The
February 2, 2015
DRAFT
23
goal of Section V is to characterize the minimum energy required to transmit k source samples
at a given fidelity, i.e. to characterize the following fundamental limits:
Ef⋆ (k, d) , {inf E : ∃ a (k, E, d) feedback code} ,
(100)
Ef⋆ (k, d, ǫ) , {inf E : ∃ a (k, E, d, ǫ) feedback code}
(101)
as well as the corresponding limits E ⋆ (k, d) and E ⋆ (k, d, ǫ) of the energy-limited non-feedback
codes.
B. Literature review
If the source produces equiprobable binary strings of length k, Shannon [40] showed that the
minimum energy per information bit compatible with vanishing block error probability converges
to
E ⋆ (k, 0, ǫ)
→ loge 2 = −1.59 dB
kN0
(102)
as k → ∞, ǫ → 0. The fundamental limit in (102) holds regardless of whether feedback is
available. Moreover, this fundamental limit is known to be the same regardless of whether the
channel is subject to fading or whether the receiver is coherent or not [43]. Polyanskiy et al.
refined (102) as [29, Theorem 3]
E ⋆ (k, 0, ǫ)
p
1
log e
= k + 2k log e Q−1 (ǫ) − log k + O (1)
N0
2
(103)
log e
= (1 − ǫ)k + O (1)
N0
(104)
for transmission without feedback, and as [29, Theorem 8]
Ef⋆ (k, 0, ǫ)
for transmission with feedback. Moreover, [29, Theorem 7] (see also (86)) shows that in fact
E ⋆ (k, 0, 0) log e = kN0 ,
(105)
i.e. in the presence of full noiseless feedback, Shannon’s limit (102) can be achieved with equality
already at k = 1 and ǫ = 0.
For the finite blocklength behavior of energy per bit in fading channels, see [44].
For the transmission of a memoryless source over the AWGN channel under an average
distortion criterion, Jain et al. [28, Theorem 1] pointed out that as k → ∞,
E ⋆ (k, d) log e
→ R(d).
k
N0
February 2, 2015
(106)
DRAFT
24
Note that (106) still holds even if noiseless feedback is available.
Unlike Polyanskiy et al. [29], we allow analog sources and arbitrary distortion criteria, and
unlike Jain et al. [28], we are interested in a nonasymptotic analysis of the minimum energy per
sample.
C. Energy-limited feedback codes
Our first result in this section is a refinement of (106).
Theorem 9. Let the source and its distortion measure satisfy assumptions A1–A5. The minimum
energy required to transmit k source symbols with average distortion ≤ d over an AWGN channel
with feedback satisfies
Ef⋆ (k, d) ·
log e
= kR(d) + O(log k)
N0
(107)
Proof: Achievability. The expansion in (107) is achieved by the following separated source/channel
scheme. For the source code, we use the code of Yang and Zhang [37] (abstract alphabet) that
compresses the source down to M representation points with average distortion d such that
log M = kR(d) + O(log k).
(108)
For the channel code, we transmit the binary representation of M error-free using the optimal
scheme of Polyanskiy et al. [29, Theorem 7], so that
log M =
E
log e.
N0
(109)
Converse. By data processing, similar to (90),
kR(d) ≤
E
log e.
N0
(110)
For the transmission of a Gaussian sample over the feedback AWGN channel, the SchalkwijkBluestein scheme [38], [39] attains exactly the fundamental limit
Ef⋆ (k, d) ·
log e
= kR(d)
N0
(111)
for k = 1. For k Gaussian samples, transmitting the Schalkwijk-Bluestein codewords corresponding to i-th source sample in time slots i, k + i, 2k + i, . . . attains (111) exactly for all
k = 1, 2, . . ..
February 2, 2015
DRAFT
25
Theorem 10. In the transmission of a source satisfying the assumptions A1–A5 over an AWGN
channel with feedback, the minimum average energy required for the transmission of k source
samples under the requirement that the probability of exceeding distortion d is no greater than
0 ≤ ǫ < 1 satisfies, as k → ∞,
log e
Ef⋆ (k, d, ǫ)
= (1 − ǫ)kR(d) −
N0
r
kV(d) − (Q−1 (ǫ))2
2
+ O (log k)
e
2π
(112)
Proof: Achievability. Pair a lossy compressor S k → W with excess-distortion probability ǫ
and H(W ) = Hd,ǫ (S k ) with the achievability scheme in Theorem 7 and apply (87) and (60).
Converse. Again, the converse result follows proceeding as in (90), invoking (60).
Comparing (112) and (70), we observe that, similar to the setup in Section III, allowing
feedback and average power constraint reduces the asymptotically achievable minimum energy
per sample by a factor of 1 − ǫ. As in Section III, that limit is approached from below rather
than from above, i.e. finite blocklength helps.
D. Energy-limited non-feedback codes
Our next result generalizes [29, Theorem 3]. Loosely speaking, it shows that the energy E,
probability of error ǫ and distortion d of the best non-feedback code satisfy
r
E
2E
log e − kR(d) ≈ kV(d) +
log e · Q−1 (ǫ) .
N0
N0
I.e. source and channel dispersions add up, as in the usual (non-feedback) joint source-channel
coding problem [6], [8]. More precisely, we have the following:
Theorem 11 (Asymptotics, no feedback). In the transmission of a stationary memoryless source
(satisfying the assumptions A1–A5) over the AWGN channel, the minimum energy necessary for
achieving probability 0 < ǫ < 1 of exceeding distortion d satisfies, as k → ∞,
E ⋆ (k, d, ǫ)
p
log e
= kR(d) + k (2R(d) log e + V(d))Q−1 (ǫ) + O (log k)
N0
(113)
Proof:
February 2, 2015
DRAFT
26
Achievability: We let total energy E be such that
p
log e
a
−1
E
+ b log k,
= kR(d) + k (2R(d) log e + V(d))Q
ǫ− √
N0
k
(114)
and we show that a > 0 and b can be chosen so that the excess distortion probability is bounded
by ǫ.
We consider a good lossy code with M = exp(2kR(d)) representation points, so that the
probability that the source is not represented within distortion d is exponentially small. We show
that a combination of that code with the variable-length separated scheme in Theorem 8 achieves
(114). First, we prove the following generalization of Theorem 8 to the lossy case: for any M,
there exists an (k, E, d, ǫ′ ) code for the AWGN channel (without feedback) such that
h
M i
1
k
k
′
(B
(S
)))
,
+
ε(E
,
⌊log
M⌋
+
1)
+
E
1
−
P
ǫ ≤ E ε E1 , k
d
2
Z⋆
PZ⋆ (Bd (S k ))
(115)
where
Bd (sk ) , {z k ∈ Sˆk : d(sk , z k ) ≤ d}.
(116)
Towards that end, let the representation points Z1 , Z2 , . . . , ZM be drawn i.i.d. from PZk⋆ . The
source encoder goes down the list of the representation points and outputs the index of the first
d-close match to S k :
W , min{i : d(S k , Zi ) ≤ d}
(117)
(if there is no such index, it outputs 1). Averaged over Z1 , . . . , ZM , the probability that no dclose match is found is upper bounded by the third term in (115) (e.g. [23, Theorem 10]). The
index W is then transmitted over the channel using the scheme in Theorem 8, with the total
probability of error averaged over all lossy codebooks given by
h
M i
ǫ′ ≤ E [ε (E1 , W )] + ε(E2 , ⌊log M⌋ + 1) + E 1 − PZk⋆ (Bd (S k )))
(118)
Since conditioned on S = s, W is geometrically distributed with success probability PZk⋆ (Bd (sk )),
we have
E W |S k = sk =
1
PZk⋆ (Bd (sk ))
(119)
Noting that ε(E, m) is a concave function of m, we have by Jensen’s inequality
E [ε (E, W )] ≤ E ε E, E W |S k } ,
(120)
which gives the first term in (93).
February 2, 2015
DRAFT
27
We proceed to show that with the choice of
E1 = E − c log E,
(121)
for an appropriate c > 0, and M = exp(2kR(d)), the right side of (115) is upper bounded by ǫ.
A reasoning similar to [23, (108)–(111)] and the Cram´er-Chernoff bound yield
h
M i
E 1 − PZ k⋆ (Bd (S k ))
≤ exp(−ka1 )
for some a1 > 0. On the other hand, (103) [29, Theorem 3] implies
1
.
ε (E, m) = P [log m > G(E)] + O √
k
D
2
E
1
E 2E
where G(E) = N N0 log e − 2 log N0 , N0 log e
(122)
(123)
Applying (122) and (123) to (93), we conclude that the excess-distortion probability is bounded
above by
P log
1
1
> G(E − c log E) + P [log (log M + 1) > G (c log E)] + O √
.
PZ k⋆ (Bd (S k ))
E
(124)
The second term in (124) can be made to decay as fast as O √1E for large enough c. To
evaluate the first term in (124), we recall [23, Lemma 2], which states that for k large enough,
"
#
k
X
1
K
P log
(125)
≤
S (Si , d) + C log k + c0 ≥ 1 − √
k
PZ k⋆ (Bd (S ))
k
i=1
where c0 and K are constants, and C is a constant explicitly identified in [23, Lemma 2]. Letting
b = c+C−
1
2
and applying (125) to upper-bound the first term in (124), we conclude by the
Berry-Ess´een theorem that
a
1
1
> G (E − c log E) ≤ ǫ − √ + O √
P log
,
PZ k⋆ (Bd (S k ))
k
k
(126)
Since a can be chosen so that (124) is upper bounded by ǫ for k large enough, this concludes
the proof of the achievability part.
Converse: The result in [6, Theorem 1] states that:
k
inf2 P S k (S , d) − ıX;Y¯ (x; Y) ≥ γ | S − exp (−γ)
ǫ ≥ sup sup E
γ>0
February 2, 2015
PY
¯
x : kxk ≤E
(127)
DRAFT
28
where S k (sk , d) is the d-tilted information in sk ∈ S k defined in (55), ıX;Y¯ (x; y) , log
dPY|X=x
(y),
dPY
¯
and PY|X=x and PY¯ are specialized to
N0
N 0,
PY¯ =
(128)
2
k=1
∞
Y
N0
D
(129)
N xk ,
PY|X=x =
2
k=1
kxk2
2kxk2
D
ıX;Y¯ (x; Y) = N
log e,
log e
(130)
N0
N0
P
Next, we let in (127) γ = 21 log NE0 . Since S k (S k , d) = ki=1 S (Si , d) is a sum of independent
D
∞
Y
random variables, the Berry-Ess´een bound applies to the probability in (127), and the converse
direction of (114) follows since kxk2 ≤ E.
If the maximal power constraint in (83) is relaxed to (85), then Ea⋆ (k, d, ǫ), the minimum
average power required for transmitting k source samples over an AWGN channel with the
probability of exceeding distortion d smaller than or equal to 0 < ǫ < 1 satisfies, under
assumptions A1–A5:
Ea⋆ (k, d, ǫ)
p
log e
k log k ,
= R(d)(1 − ǫ)k + O
N0
(131)
i.e. the asymptotically achievable minimum energy per sample is reduced a factor of 1 − ǫ if a
maximal power constraint is relaxed to an average one. This parallels the result in (76), which
shows that variable-length coding over a channel reduces the asymptotic fundamental limit by
a factor of 1 − ǫ compared to fixed-length coding, even without feedback.
Proof of (131): Observe that Theorem 10 ensures that a smaller average energy than that
in (131) is not attainable even with full noiseless feedback. In the achievability direction, let
(f ⋆ , g⋆ ) be the optimal variable-length source code achieving the probability of exceeding d equal
to ǫ′ (see [5, Section III.B]). Denote by ℓ(f ⋆ (s)) the length of f ⋆ (s). Let M be the size of that
code. Set the energy to transmit the codeword of length ℓ(f ⋆ (S k )) to
p
ℓ(f ⋆ (S k ))N0 loge 2 + k log k
(132)
As shown in [5], E [ℓ(f ⋆ (S))] is equal to the right side of (70) (with ǫ replaced by ǫ′ ). Choosing
ǫ′ = ǫ −
√a
k
for some a, we conclude that indeed the average energy satisfies (131). Moreover,
(123) implies that the expression inside the expectation in (98) is O √1k . It follows that for a
large enough a, the excess distortion probability is bounded by ǫ, and the proof is complete.
February 2, 2015
DRAFT
29
VI. C ONCLUSION
We have considered several scenarios for joint source-channel coding with and without feedback. Our main conclusions are:
1) The average delay vs. distortion tradeoff with feedback, as well as the average energy vs.
distortion tradeoff with feedback, is governed by channel capacity, and the source ratedistortion and rate-dispersion functions. In particular, the channel dispersion plays no role.
2) In variable-length coding with feedback, the asymptotically achievable minimum average
length is reduced by a factor of 1 − ǫ, where ǫ is the excess distortion probability. This
asymptotic fundamental limit is approached from below, i.e., counter-intuitively, smaller
source blocklengths may lead to smaller attainable average delays.
3) Introducing a termination symbol that is always decoded error-free allows for transmission
over noisy channels with guaranteed distortion.
4) Variable-length transmission without feedback still improves the asymptotic fundamental
limit by a factor of 1 − ǫ, where ǫ is the excess distortion probability.
5) In all the cases we have analyzed the approach to the fundamental limits is very fast:
O logk k , where k is the source blocklength. This behavior is attained, under average
distortion, by a separated scheme with stop-feedback.
6) The setting of a wideband Gaussian channel with an energy constraint exhibits many
interesting parallels with the variable-length coding setting. Allowing a non-vanishing excess
distortion probability ǫ and an average (rather than maximal) energy constraint reduces the
asymptotically achievable minimum energy by a factor of 1−ǫ. In the presence of feedback,
just as in the variable-length coding, this asymptotic fundamental limit is approached from
below.
7) Error-free transmission of a random variable W over the AWGN channel with feedback,
with almost optimal energy consumption, is possible via a variable-length separated scheme.
8) More generally, variable-length separated schemes perform remarkably well in all considered scenarios.
R EFERENCES
[1] C. E. Shannon, “The zero error capacity of a noisy channel,” IRE Transactions on Information Theory, vol. 2, no. 3, pp.
8–19, Sep. 1956.
February 2, 2015
DRAFT
30
[2] M. V. Burnashev, “Data transmission over a discrete channel with feedback: Random transmission time,” Problems of
Information Transmission, vol. 12, no. 4, pp. 10–30, 1976.
[3] Y. Polyanskiy, H. V. Poor, and S. Verd´u, “Feedback in the non-asymptotic regime,” IEEE Transactions on Information
Theory, vol. 57, no. 8, pp. 4903–4925, 2011.
[4] S. L. Fong and V. Y. F. Tan, “Asymptotic expansions for Gaussian channels with feedback under a peak power constraint
asymptotic expansions for gaussian channels with feedback under a peak power constraint,” ArXiv preprint arXiv:1410.2390,
Oct. 2014.
[5] V. Kostina, Y. Polyanskiy, and S. Verd´u, “Variable-length compression allowing errors (extended),” ArXiv preprint,
arXiv:1402.0608, Jan. 2014.
[6] V. Kostina and S. Verd´u, “Lossy joint source-channel coding in the finite blocklength regime,” IEEE Transactions on
Information Theory, vol. 59, no. 5, pp. 2545–2575, May 2013.
[7] A. Tauste Campo, G. Vazquez-Vilar, A. Guill´en i F`abregas, and A. Martinez, “Random-coding joint source-channel bounds,”
in Proceedings 2011 IEEE International Symposium on Information Theory, Saint-Petersburg, Russia, Aug. 2011, pp. 899–
902.
[8] D. Wang, A. Ingber, and Y. Kochman, “The dispersion of joint source-channel coding,” in Proceedings 49th Annual Allerton
Conference on Communication, Control and Computing, Monticello, IL, Sep. 2011.
[9] T. Javidi and A. Goldsmith, “Dynamic joint source-channel coding with feedback,” in IEEE International Symposium on
Information Theory, Istanbul, Turkey, July 2013, pp. 16–20.
[10] G. Caire, S. Shamai, and S. Verd´u, “Almost-noiseless joint source-channel coding-decoding of sources with memory,” in
Proc. Fifth International ITG Conference on Source and Channel Coding (SCC), Jan. 2004, pp. 295–304.
[11] I. A. Ovseevich, “Capacity of a randomized channel with feedback and matching of the source,” Problemy Peredachi
Informatsii, vol. 4, no. 1, pp. 52–59, 1968.
[12] R. Fang, “Optimum linear feedback scheme for matching colored Gaussian sources with white Gaussian channels,” IEEE
Transactions on Information Theory, vol. 16, no. 6, pp. 789–791, 1970.
[13] V. Chande, H. Jafarkhani, and N. Farvardin, “Joint source-channel coding of images for channels with feedback,” in
Proceedings 1998 IEEE Information Theory Workshop, Feb. 1998.
[14] V. Kafedziski, “Joint source channel coding of images over frequency selective fading channels with feedback using dct and
multicarrier block pulse amplitude modulation,” in Proceedings Asilomar Conf. Signals, Systems and Computers, vol. 1,
1998, pp. 37–41.
[15] J. Lu, A. Nosratinia, and B. Aazhang, “Progressive joint source-channel coding in feedback channels,” in Proceedings
1999 Data Compression Conference, 1999, pp. 140–148.
[16] M. Gastpar and B. Rimoldi, “Source-channel communication with feedback,” in Proceedings 2003 IEEE Information
Theory Workshop, Paris, France, 2003, pp. 279–282.
[17] R. Manakkal and B. Rimoldi, “A source-channel coding scheme for discrete memoryless channels with feedback,” in
Proceedings 2005 IEEE International Symposium on Information Theory, Adelaide, Australia,, 2005, pp. 1511–1515.
[18] M. Horstein, “Sequential transmission using noiseless feedback,” IEEE Transactions on Information Theory, vol. 9, no. 3,
pp. 136–143, 1963.
[19] J. M. Ooi and G. W. Wornell, “Fast iterative coding techniques for feedback channels,” IEEE Transactions on Information
Theory, vol. 44, no. 7, pp. 2960–2976, 1998.
[20] R. Ahlswede, “A constructive proof of the coding theorem for discrete memoryless channels with feedback,” in Transactions
February 2, 2015
DRAFT
31
of the Sixth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, Prague, Sep.
1971.
[21] O. Shayevitz and M. Feder, “Optimal feedback communication via posterior matching,” IEEE Transactions on Information
Theory, vol. 57, no. 3, pp. 1186–1222, 2011.
[22] Y. Polyanskiy, H. V. Poor, and S. Verd´u, “Channel coding rate in finite blocklength regime,” IEEE Transactions on
Information Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
[23] V. Kostina and S. Verd´u, “Fixed-length lossy compression in the finite blocklength regime,” IEEE Transactions on
Information Theory, vol. 58, no. 6, pp. 3309–3338, June 2012.
[24] N. Farvardin and V. Vaishampayan, “Optimal quantizer design for noisy channels: An approach to combined source-channel
coding,” IEEE Trans. on Information Theory, vol. 33, no. 6, pp. 827–838, 1987.
[25] A. Amanullah and M. Salehi, “Joint source-channel coding in the presence of feedback,” in Proc. 27th Asilomar Conf.
Sig. Syst. Comp., 1993, pp. 930–934.
[26] S. Gorantla and T. Coleman, “Information-theoretic viewpoints on optimal causal coding-decoding problems,” arXiv
preprint arXiv:1102.0250, 2011.
[27] B. Hochwald and K. Zeger, “Tradeoff between source and channel coding,” IEEE Transactions on Information Theory,
vol. 43, no. 5, pp. 1412–1424, 1997.
[28] A. Jain, D. Gunduz, S. R. Kulkarni, H. V. Poor, and S. Verd´u, “Energy-distortion tradeoffs in Gaussian joint source-channel
coding problems,” IEEE Transactions on Information Theory, vol. 58, no. 5, pp. 3153–3168, 2012.
[29] Y. Polyanskiy, H. V. Poor, and S. Verd´u, “Minimum energy to send k bits through the Gaussian channel with and without
feedback,” IEEE Transactions on Information Theory, vol. 57, no. 8, pp. 4880–4902, 2011.
[30] S. Verd´u and S. Shamai, “Variable-rate channel capacity,” IEEE Transactions on Information Theory, vol. 56, no. 6, pp.
2651–2667, 2010.
[31] A. Tchamkerten and E. Telatar, “Variable length coding over an unknown channel,” vol. 52, no. 5, pp. 2126–2145, May
2006.
[32] M. Naghshvar and T. Javidi, “Active sequential hypothesis testing,” The Annals of Statistics, vol. 41, no. 6, pp. 2703–2738,
2013.
[33] E. C
¸ inlar, Probability and Stochastics.
Springer, 2011.
[34] A. Wyner, “An upper bound on the entropy series,” Information and Control, vol. 20, no. 2, pp. 176–181, 1972.
[35] E. C. Posner, E. R. Rodemich, and H. Rumsey, “Epsilon-entropy of stochastic processes,” The Annals of Mathematical
Statistics, vol. 38, no. 4, pp. 1000–1020, 1967.
[36] R. Pilc, “The transmission distortion of a discrete source as a function of the encoding block length,” Bell Syst. Tech. J.,
vol. 47, no. 6, pp. 827–885, July/August 1968.
[37] E. Yang and Z. Zhang, “On the redundancy of lossy source coding with abstract alphabets,” IEEE Transactions on
Information Theory, vol. 45, no. 4, pp. 1092–1110, May 1999.
[38] J. Schalkwijk and L. Bluestein, “Transmission of analog waveforms through channels with feedback,” IEEE Transactions
on Information Theory, vol. 13, no. 4, pp. 617–619, 1967.
[39] T. J. Cruise, “Achievement of rate-distortion bound over additive white noise channel utilizing a noiseless feedback channel,”
Proceedings of the IEEE, vol. 55, no. 4, pp. 583–584, Apr. 1967.
[40] C. E. Shannon, “Communication in the presence of noise,” Proceedings of the IRE, vol. 37, no. 1, pp. 10–21, 1949.
[41] J. M. Wozencraft and I. M. Jacobs, Principles of communication engineering.
February 2, 2015
John Wiley & Sons, 1965.
DRAFT
32
[42] Y. P. Pyatoshin, “Some properties of m-ary communication systems with coding,” Problemy Peredachi Informatsii, vol. 4,
no. 1, pp. 45–51, 1968.
[43] S. Verd´u, “Spectral efficiency in the wideband regime,” (Invited Paper) IEEE Trans. Information Theory, Special Issue on
Shannon Theory: Perspective, Trends and Applications, vol. 48, no. 6, pp. 1319–1343, June 2002.
[44] W. Yang, G. Durisi, and Y. Polyanskiy, “Minimum energy to send k bits over Rayleigh-fading channels,” in IEEE
International Symposium on Information Theory, submitted, 2015.
February 2, 2015
DRAFT