Learning outcomes of the course unit
Objective of the course is to provide the student with the ability to understand and apply the basic rules of information theory, and in particular:
- the physical meaning of the main information quantities, namely, entropy and mutual information and their inter-relationships
- the concept of typical sequences and their role in data compression
- the concept of channel capacity
The abilities in applying the above-mentioned knowledge are in particular in the:
- design and analysis of data compression codes
- calculation of the capacity of a given transmission channel
A first course in basic probability theory
Course contents summary
1.1 Definition of basic information theory quantities: entropy, mutual information
1.2 Basic theorems, relations and inequalities
1.3 Sufficient statistics
2.1 Asymptotic equipartition property (AEP) and typical set
2.2 Entropy rate
2.3 Discrete-time Markov chains (DTMC)
2.4 AEP theorem for stationary ergodic sources
3.1 Data Compression: definitions, examples,
3.2 Kraft inequality, Noiseless coding theorem
3.3 Huffman and Lempel-Ziv codes
4.1 Channel capacity: definitions, examples
4.2 Typical-sequence dcoding. Jointly typical set and its properties.
4.2 Proof Channel Coding Theorem
4.3 Joint source-channel coding theorem
5.1 Differential entropy: definition, examples
6.1 Mutual information for discrete X and continuous Y: examples
6.2 Capacity of additive Gaussian channel, Shannon Capacity formula
6.3 Parallel Gaussian channels
6.4 Capacity of discrete-time additive Gaussian channel with memory.
6.5 Capacity of continuous-time additive Gaussian channel with memory. Water pouring.
Syllabus (every class = 2 hours)
Course organization, objectives, textbooks, exam details. Sneaky preview of the course, motivations, applications. Assigned Reading of Ch.1 of textbook. Physical justification and definition of entropy. Examples of entropy calculation. Up to sec. 2.1.
Definition of joint and conditional entropy, eample 2.2.1. Relative entropy, mutual information and their relation. Chain rules for PMFs and entropy.
Relative conditional entropy, conditional mutual information, chain rules for D and I. Inequalities for D and I. max and min of H, H(X|Y)<=H(X) and generalizations. Convex functions. Jensen's inequality, examples.
first hour: logsum inequality, convexity of D, concavity of H. concavity of I in p(x) and convexity in p(y|x). Exercise: mixing increases entropy. second hour: Definition of Markov chain and first properties for 3 random variables (RV) X,Y,Z. Data processesng inequality. Counterexample.
first hour: sufficient statistics: definition in terms of mutual information. Examples: number of successes in repeated trials; sample mean in estimation of common mean in a vector of independent gaussian RVs. Sufficient statistics and hypothesis testing: factorization theorem. Second hour: Fano inequality. Exercise 2.32.
Exercises 2.5, 2.4, 2.27, 2.30 (after brief introduction to the method of Lagrange multipliers), 2.21.
Ch 3 asymptotic equipartition property (AEP): introduction. Probability theory refresher: i.p. convergence, Chebychev inequality, Weak law of large numbers, AEP. Typical set and properties. Example with binary sequences.
first hour: relation among Typical set and high-probability sets. Theorem 3.3.1. second hour: Problem solving: Exercises 3.8, 3.9. Ch 4: Entropy rates: introduction. Definition of discrete-time stochastic process and stationarity.
first hour: introduction to discrete-time Markov chains (DTMC): transition matrix, updatge law (Chapman-Kolmogorov), stationary distribution. Two-state example: state diagram, evolution towards limit distribution. Evaluation with flux balancing. second hour: Entropy rates H and H'. H=H' for stationary processes. Statement of AEP theorem for stationary ergodic sources (Shannon/Breiman/McMillan). Explicit evaluation of H for DTMC. Examples.
Doubly-stochastic matrices and uniform steady-state distribution. Connections with entropy as defined in statistical thermodynamics: DTMC on microstates with doubly-stochastic transition matrix. Entropy increases towards steady-state distribution entropy. Example 4 (eq 4.50-4.52). Hidden Markov models (HMM): entropy rate.
Problem solving: Ex. 4.1 mixing increases entropy. Conditions for observable Y in a HMM sian una DTMC. examples where Y is not a DTMC. Point a. of Ex. 4.18 on Entropy Rate of stationary but not ergodic process.
Problem solving: first hour: points b, c of Ex. 4.18. second hour: Es 4.10 entropy rate of a second order markov process: study of hidden markov chain. Ex. 4.6.
Ch 5: Data compression. Examples of codes. Kraft inequality. Search of optimal codes with Lagrange multipliers method. Noiseless coding theorem.
Comments on first Shannon Theorem: when p not dyadic. Quasi-optimal Shannon Codes. Shannon super-codes are asymptotically optimal. Extra cost on minimal code length when using a PMF that differs from true PMF. McMillan Theorem: every uniquely decodable theorem satisfies Kraft inequality. Introduction to Huffman codes: examples 1, 2.
Huffman codes: example 3 (dummy symbols), Exercise 5.32, example 5.73 (set of different optimal codelengths). Competitive optimality of Shannon code. Proof of optimality of Huffman code.
first hour: Optimal compression of Markov sources. Description of Lempel-Ziv algorithm for universal compression. second hour: Channel capacity: introduction, definition of discrete memoryless channel (DM
 T. M. Cover, J. A. Thomas, "Elements of Information Theory". John Wiley and Sons, 1991.
 R. Blahut, "Principles and Practice of Information Theory". Addison-Wesley, 1988.
 J. Cioffi, "Ch. 8: Fundamental Limits of Coding and Sequences", http://www.stanford.edu/~cioffi
Classroom teaching, 42 hours. In-class problem solving, 6 hours.
Homeworks assigned weekly.
Assessment methods and criteria
Oral only, to be scheduled on an individual basis. When ready, please contact the instructor by email at alberto.bononi[AT]unipr.it by specifying the requested date.
The exam consists of solving some proposed exercises and explaining theoretical details connected with them, for a total time of about 1 hour.
The exam may be split into two distinct parts and scheduled on different days at the student's request: Part 1: up to end of Data Compression; Part 2: from Channel coding till the end.
At the exam, you can bring your summary of important formulas in an A4 sheet to consult if you so wish. Some sample exercises can be found on the course website.