# Slepian-Wolf coding

Jack K. Wolf and Brian M. Kurkoski (2008), Scholarpedia, 3(11):6789. | doi:10.4249/scholarpedia.6789 | revision #136682 [link to/cite this article] |

The **Slepian-Wolf theorem** deals with the lossless compression of two or more correlated data streams (Slepian and Wolf, 1973). In the best-known variation, each of the correlated streams is encoded separately and the compressed data from all these encoders are jointly decoded by a single decoder as shown in Figure 1 for two correlated streams. Such systems are said to employ **Slepian-Wolf coding**, which is a form of **distributed source coding**. Lossless compression means that the source outputs can be constructed from the compression version with arbitrary small error probability by suitable choice of a parameter in the compression scheme.

An important aspect of the Slepian-Wolf theorem is that, compared to an encoder that assumes the data streams are independent, the separate encoders can achieve better compression rates by exploiting the fact that the data streams are correlated. The surprising result is that Slepian-Wolf coding can in fact achieve the same compression rate as the optimal single encoder that has all correlated data streams as inputs.

The Slepian-Wolf theorem has practical consequences for systems where the correlated data streams are physically separated or where the encoder has limited computational ability. It can be applied to sensor networks such as those for monitoring temperature or seismic activity where wireless transmitters, distributed over some environment, collect data and transmit it to a central location (Chong and Kumar, 2003). Two transmitters that are near each other sense similar values and thus produce correlated outputs. Because transmitter resources such as battery power are limited, transmitting at higher compression rates improves the system’s performance. The Slepian-Wolf theorem has practical application even when the encoder has access to the multiple correlated data streams. For example, to reduce complexity for image and video compression for mobile telephones, the streams may be encoded separately without reducing the compression rate (Girod et al. 2005).

## Contents |

## Example

Suppose the weather in two neighboring towns, Janestown and Thomasville, is correlated, and that a resident of one town, Jane, wants to send the weather history for a year to a resident of the other town, Thomas. Suppose further that each day the weather in each town is equally likely to be either “good” or “bad” independently of the past weather, but that the weather in the two towns each day is different with probability \(p\ .\)

Assume that Jane records the daily weather in Janestown for a year. Similarly, Thomas does the same for the weather in Thomasville. At the end of the year, Jane wants to send a message to Thomas that will allow Thomas to determine the weather for the year in Janestown. Jane is frugal and wants to send the shortest message possible. It is clear that all that Jane needs to transmit is the difference between the two daily weather sequences, that is, a “1” if the weather is different, and a “0” if the weather is the same. This difference sequence has 365 digits, of which 365\(p\) are “1”s and 365\((1-p)\) are “0”, on average. If Jane knew the weather in both towns, she could use a fundamental result from source coding theory that states that the difference sequence can be losslessly compressed to a transmitted binary message of length approximately \[ (365)h_2(p) \] where \(h_2(p) = -[p \log_2(p) + (1-p)\log_2(1-p)]\) is the entropy of the difference sequence. What is surprising is that Jane can use this same length message even if she doesn't know the weather in Thomasville and hence doesn't know the difference sequence. This follows from the Slepian-Wolf theorem.

## Slepian-Wolf Theorem

The efficiency of the system is measured by the rates in encoded bits per source symbol of the compressed data streams that are output by the encoders. The Slepian-Wolf Theorem specifies the set of rates that allow the decoder to reconstruct these correlated data streams with arbitrarily small error probability.

Although the theorem holds for much more general classes of inputs, the special case stated below is for two correlated data streams, \(\underline X_1\ ,\) and \(\underline X_2\ ,\) which are formed from making \(n\) independent drawings from a joint probability distribution \(P(X_1=x_1,X_2=x_2)\ .\)

As shown in Figure 1, encoder 1, observes \(\underline X_1\) and then sends a message to the decoder which is a number from the set \(\{1, 2,\cdots, 2^{nR_1} \}\ .\) Similarly, encoder 2, observes \(\underline X_2\) and then sends a message to the decoder which is a number from the set \(\{1, 2,\cdots, 2^{nR_2}\}\ .\) The outputs from the two encoders are the inputs to the single decoder. The decoder, upon receiving these two inputs, outputs two \(n\)-vectors \(\underline X^*_1\) and \(\underline X^*_2\) which are estimates of \(\underline X_1\) and \(\underline X_2\ ,\) respectively.

The systems of interest are those for which the probability that \(\underline X^*_1\) does not equal \(\underline X_1\) or \(\underline X^*_2\) does not equal \(\underline X_2\) can be made as small as desired by choosing \(n\) sufficiently large. Such a system is said to be an **admissible system** and the rate pair \((R_1,R_2)\) for an admissible system is said to be an **admissible rate pair**. The **admissible rate region** is the closure of the set of all admissible rate pairs.

The following entropies can be calculated for the pair of random variables \(X_1\) and \(X_2\) with joint probability distribution \(P(X_1=x_1,X_2=x_2)\ :\) \[ H(X_1,X_2), H(X_1|X_2), H(X_2|X_1), H(X_1)\] and \( H(X_2). \) When calculating entropies, all logarithms will be taken to the base 2.

**Slepian-Wolf Theorem** The admissible rate region for the pair of rates, \((R_1,R_2)\ ,\) is the set of points that satisfy the three inequalities:
\[
\begin{array}{rcl}
R_1 &\geq& H(X_1|X_2),\\
R_2 &\geq& H(X_2|X_1),\\
R_1 + R_2 &\geq& H(X_1,X_2).
\end{array}
\]
This admissible rate region is shown in Figure 2.

The significance of the Slepian-Wolf theorem is seen by comparing it with the entropy bound for compression of single sources. Separate encoders which ignore the source correlation can achieve rates only \(R_1 + R_2 \geq H(X_1) + H(X_2)\ .\) However, for Slepian-Wolf coding, the separate encoders exploit their knowledge of the correlation to achieve the same rates as an optimal joint encoder, namely, \(R_1 + R_2 \geq H(X_1,X_2)\ .\)

## Proof of Slepian-Wolf Theorem

The necessity of the three inequalities follows by considering a modified system in which the pair of source sequences, \(\underline X_1\) and \(\underline X_2\ ,\) are input to a single encoder. The output of this single encoder must have a rate at least \(H(X_1,X_2)\ .\) Thus, \(R_1 + R_2 \geq H(X_1,X_2)\ .\) Furthermore, if the single encoder knows both \(\underline X_1\) and \(\underline X_2\) and the decoder somehow already knows \(\underline X_2\ ,\) the single encoder would require a code whose rate is at least \(H(X_1|X_2)\ .\) Thus, \(R_1 \geq H(X_1|X_2)\ .\) The remaining inequality \(R_2 \geq H(X_2|X_1)\) follows from symmetry.

To show the sufficiency of the inequalities, first consider the particular rate pair \(R_1= H(X_1|X_2)\ ,\) \(R_2=H(X_2)\) on the boundary of the admissible rate region. Note that if \(R_2=H(X_2)\ ,\) then the output of Encoder 2 suffices to reconstruct \(\underline X_2\) so that the block diagram shown in Figure 3 reduces to that shown in Figure 4.

For \(R_2=H(X_2)\ ,\) an admissible system will be demonstrated for which \(R_1 = H(X_1|X_2)\ .\) Once the admissibility of the rate pair \(R_1=H(X_1|X_2),\) \(R_2=H(X_2)\) has been demonstrated, then the admissibility of the entire rate region is established because: 1) the admissibility of the rate pair \(R_1=H(X_1), \) \(R_2=H(X_2|X_1)\) follows from symmetry; 2) the admissibility of all rate pairs on the straight line connecting the rate pair \(R_1=H(X_1|X_2),\) \(R_2=H(X_2)\) with the rate pair \(R_1=H(X_1),\) \(R_2=H(X_2|X_1)\) follows from time-sharing; and 3) the admissibility of all rates within the admissibility region but not on the boundaries follows from wasting bits.

The original construction of an admissible system at the rate point \(R_1=H(X_1|X_2),\) \(R_2=H(X_2)\) was found for a particular model of the statistics of the correlated source pair called the twin binary symmetric source. This is the construction that will be sketched here. The paper that was published (Slepian and Wolf, 1973) used a different construction that applies to much more general sources.

A twin binary symmetric source is a memoryless source whose outputs \(X_1\) and \(X_2\) are binary random variables (taking on the values 0 and 1) described by \[ \begin{array}{rcl} P(X_1=0) = P(X_1=1)&=&1/2,\\ P(X_2=0|X_1=1) = P(X_2=1|X_1=0) &=& p,\\ P(X_2=0|X_1=0) = P(X_2=1|X_1=1) &=& (1-p) \end{array} \] where \(p\) is a parameter satisfying \(0 \leq p \leq 1\ .\) Note that \[ P(X_2=0) = P(X_2=1)=1/2. \]Define \(h_2(p) = -[p\log_2(p) + (1-p)\log_2(1-p)]\ .\) For the twin binary symmetric source: \[ \begin{array}{rcl} H(X_1) &=& 1,\\ H(X_2) &=& 1,\\ H(X_2|X_1) &=& h_2(p),\\ H(X_1|X_2) &=& h_2(p),\\ H(X_1, X_2) &=& 1 + h_2(p). \end{array} \] For the twin binary symmetric source, the rate point of interest has \(R_1= H(X_1|X_2) = h_2(p)\) and \(R_2= H(X_2) = 1\ .\)

The key to the problem of compressing \(\underline X_1\) is to think of the model of the twin binary symmetric source as if \(\underline X_1\) were passed through a binary symmetric channel (BSC) with bit error probability \(p\) to obtain \(\underline X_2\ .\) It is well known that for large \(n\ ,\) a parity-check code (or linear code) exists for the BSC with approximately \(2^{n(1-h_2(p))}\) codewords such that a decoder that sees the output of the channel, \(\underline X_2\ ,\) would be able to tell reliably which code word was at the input of the channel. The problem with applying this idea directly to this source coding problem is that the input to the channel, \(\underline X_1\ ,\) is not necessarily one of the \(2^{n(1-h_2(p))}\) codewords of the parity check code since \(\underline X_1\) can in fact be any of \(2^{n}\) binary \(n\)-vectors. Thus, another idea is needed.

The missing idea comes from the so-called coset decomposition of the group of \(2^n\) binary \(n\)-vectors in terms of the subgroup of the codewords. A coset of this subgroup is formed by taking any binary \(n\)-vector that is not in the subgroup and adding it bit-by-bit-modulo-2 to each vector in the subgroup to form a new set of \(2^{n(1-h_2(p))}\) vectors. This process is repeated, each time choosing as the vector to be added, a binary vector that has not appeared before, either in the original subgroup or any of the previously constructed cosets. The process ceases when all \(2^{n}\) binary vectors have appeared in either the original subgroup or in one of the cosets. It is easy to show that cosets are either identical or disjoint and thus that every binary \(n\)-vector appears in one and only one coset when one considers the subgroup itself also to be a coset.

For a code of block length \(n\) with \(2^{n(1-h_2(p))}\) codewords, there will be \(2^{nh_2(p)}\) cosets since \[ 2^{nh_2(p)} = 2^n/ 2^{n(1-h_2(p))}. \] Furthermore, the set of vectors in each coset have the same error correction properties as the original linear code because the vectors in any coset are just translated versions of the code words in the original code. Such a translated code is called a coset code.

Applying this construction to the problem at hand, \(\underline X_1\) must be in one of the cosets of the group code. If the source encoder transmits to the decoder the identity of the coset containing \(\underline X_1\ ,\) the decoder can determine \(\underline X_1\) from this knowledge and its knowledge of \(\underline X_2\) by using a decoder for the coset code that operated on the "received" word \(\underline X_2\ .\) Since there are \(2^{nh_2(p)}\) cosets, the encoder must transmit \(nh_2(p)\) binary digits. The rate of transmission is \(h_2(p) = H(X_1|X_2)\ .\) This establishes the admissibility of the rate point \[ R_1= H(X_1|X_2) = h_2(p) \ \textrm{ and } \ R_2= H(X_2) = 1. \] The entire admissible rate region then follows from symmetry, time-sharing and wasting bits.

The Slepian-Wolf theorem applies to a much wider class of problems than that described here. The statistical description of the source sequences can be of a much more general nature, there can be more than two correlated source sequences, and the configuration of the encoders and decoder can be of many different types.

## References

- Slepian, D and Wolf, J K (1973). Noiseless coding of correlated information sources.
*IEEE Transactions on information Theory*19: 471-480. doi:10.1109/tit.1973.1055037.

- Wyner, A D (1974). Recent results in the Shannon theory.
*IEEE Transactions on information Theory*20: 2-10.

- Chong, C-Y and Kumar, S P (2003). Sensor Networks: Evolution, Opportunities, and Challenges.
*Proceedings of the IEEE*91(8): 1247-1256. doi:10.1109/jproc.2003.814918.

- Cover, T M (1975) A Proof of the Data Compression Theorem of Slepian and Wolf for Ergodic Sources,
*IEEE Transactions on Inform. Theory*, IT-21, No.3, 226-228

- Girod, B; Aaron, A M; Rane, S and Rebollo-Monedero, D (2005). Distributed video coding.
*Proceedings of the IEEE*93(1): 71-83. doi:10.1109/jproc.2004.839619.

**Internal references**

- Tomasz Downarowicz (2007) Entropy. Scholarpedia, 2(11):3901.