# Algorithmic Information Dynamics

Algorithmic Information Dynamics (AID) is an algorithmic probabilistic framework for model generation alternative and complementary to other fields and methods of experimental inference and causal discovery such as statistical methods and classical information theory and can be related to areas such as computational mechanics, and program synthesis. Unlike other methods such as Bayesian networks, AID does not rely on graphical models or classical probability distributions, at least not as the result of the model generation.

AID is the result of combining Algorithmic information theory (AIT) and Perturbation/Intervention Analysis where interventions are induced or naturally occurring in an open system subject to external influences. AID is the body of the foundations and methods that makes Algorithmic information theory more suitable for scientific application in areas of science from genetics to cognition and constitutes a true alternative to estimating algorithmic complexity by popular lossless compression algorithms such as LZW which are entropy estimators, and to other probabilistic approaches of scientific discovery and model inference. AID connects various fields of science and areas of active research such as logical inference, reasoning, causality, complexity, probability, and dynamics. AID provides the foundations, framework and numerical methods to mine and systematically produce computable hypotheses as candidates generative (and mechanistic) models for natural or artificial phenomena. AID provides the tools to explore the space of computable models. AID and the theory on which it relies upon is the theory of algorithmic complexity, and, more relevant, Algorithmic probability.

## Foundations and Origins

**Algorithmic Probability**

Introduced by Ray Solomonoff, Leonid Levin and related works by Andrey Nikolaevich Kolmogorov and Gregory Chaitin, Algorithmic probability considers the probability of a (discrete) object to be produced by a computable mechanistic set of rules that is Turing universal and imposes a distribution of input-output mappings established by Levin and called the Universal Distribution. Formally, a computable process that produces a string \(s\) is a program \(p\) that when executed on a universal Turing machine \(U\) produces the string \(s\) as output.

As \(p\) is itself a binary string, we can define the discrete universal a priori probability, \(m(s)\ ,\) to be the probability that the output of a universal prefix Turing machine \(U\) is \(s\) when provided with fair coin flips on the input tape. Formally,

\[ m(s) \;:=\; \sum_{p\;:\;U(p)=s} 2^{-\ell(p)}, \]

where the sum is over all halting programs \(p\) for which \(U\) outputs the string \(s\ .\) As \(U\) is a *prefix* universal Turing machine the set of valid programs forms a prefix-free set and thus the sum is bounded due to Kraft's inequality.

A formulation of AID is a Bayesian approach where the agnostic prior distribution is always assumed to be the so-called Universal Distribution as introduced by Levin, and any updates to the specific distribution of the problem are updated according to also to the Universal Distribution according to the theory of Algorithmic probability that favors short descriptions. Introduced by Ray Solomonoff, the theory of universal inductive inference as he also called it, is a theory of prediction based on observations. An example is the prediction of the next digit in the sequence \(s=1234567...\) According to Algorithmic probability, the next digit should be 8 because the simplest (shortest computable) model able to generate that sequence is the successor function \(x=1;x:=x+1\) and such a model would generate 8 as a predictor of the next digit given the data. The only assumption is that the prior probability distribution follows a computable probability distribution even if such distribution is unknown, because as proven by Levin, all computable distributions converge into what is known the Universal Distribution, a result that has been called 'miraculous' in the scientific literature and was praised by the late Marvin Minsky as the most important theory in science relevant to AI. Algorithmic probability captures (not originally on purpose) longstanding principles on which science has been founded, the principle (also known as Occam's razor) that the simplest explanation (in this case the most algorithmically probably which turns out to be the shortest algorithmic description) is more likely the correct one, and the principle of multiple explanations (Epicurus) to keep all explanations consistent with the data and Bayes's Rule (transform the a priori distribution to a posterior distribution according to the evidence), keep all hypothesis consistent with data.

**Numerical Methods**

AID was conceived and introduced by Zenil, Kiani, and Tegnér in the early 2010s and is currently an active area of research but the methods leading to AID were introduced in the mid-2000s. While it was not known, and not experiment had been conceived to test, if Algorithmic probability would empirically capture the intuition of Occam's principle other than as established in the theory, Zenil and colleagues produced empirical probability distributions from various models of computation motivated by Algorithmic probability in order to sample the Universal Distribution. The results showed that the output distribution were compatible among them and with the theoretical and empirical expectations to conform with Occam's principle as those elements with the highest probability produced by computable models were found to be the most algorithmic simple according to various complementary measures, including computable ones which converge in results while the least frequent elements the more random according to all other measures.

The numerical application of AID beyond its theoretical formulation strongly relies upon such numerical methods developed by Zenil et al. designed to combine and expand classical information theory at characterizing randomness with measures different to the popular use of compression algorithms such as Lempev-Ziv widely used to claim approximations to algorithmic complexity. These methods on which AID relies upon (but it is also independent of) consist of the so-called Coding Theorem (CTM) and Block Decomposition (BDM) methods.

## The Coding Theorem Method (CTM)

Most of the attempts to approximate algorithmic complexity have been made by way of popular lossless compression algorithms such as LZ or LZW. However, if we take the example of the sequence \(s=1234567...\), an algorithm such as LZ or LZW will fail at compressing \(s\) despite its obvious compressed form (\(x=1;x:=x+1\)), this is because algorithms such as LZ and LZW are entropy estimators and they build a dictionary of most frequent words to assign them shorter codes, yet because of the nature of \(s\) as a Borel normal number no subsequence is represented more than any other and the resulting compressed file will be of about the same size of </math>s</math> itself (modulo change of data type transformation which is only a transliteration between e.g. ASCII to binary). Because of these limitations of popular lossless compression algorithms, and other reasons, Zenil et al. introduced a method based on the so-called Coding theorem. The Coding Theorem in the context of Algorithmic probability due L.A. Levin (1974) establishes equality between Algorithmic probability and Algorithmic complexity, formally:

\[ m(s) = 2^{-K(s)} + c \] or equivalently,

\[ - \log \; m(s) \;=\; K(s)+ c. \] where \( c\) is a constant.

Based on this fundamental theorem, the Coding Theorem Method or CTM is a method that provides an alternative path to the widespread use of popular compression algorithms such as Lempev-Ziv to approximate Algorithmic complexity by way of Algorithmic probability via the Coding theorem. CTM does not rely upon purely statistical principles such as those on which popular lossless compression algorithms such as LZ and LZW are based (designed to find statistical regularities and thus more related more to classical information theory than to algorithmic complexity). The aim of CTM is to take the probabilistic content that other approaches pump into their generative models out of the generative models supporting a natural or artificial phenomenon as usually done in science. As an example, traditionally the boundary of what part of the probabilistic content of a belongs to the model itself or to the explanation of the possible explanatory models has traditionally been conflated in statistical and probabilistic approaches. For example, when modelling the outcome of throwing a dice what a statistical model ends up quantifying is the uncertainty external to the process itself, the degree of uncertainty as an observer hence quantifying a property of the observer and not the process related to the dice itself. This is because the process of throwing the dice is deterministic according to the laws of classical mechanics that are known to govern the dice trajectory (if quantum fluctuations are believed to be probabilistic, at short distances wouldn't have any effect), the probabilistic content of the model describing the dice outcome is, therefore, strange to the actual process. What an Algorithmic probability-like approach like CTM would do in principle and its major difference is to produce a set of deterministic models describing the dice trajectory and outcome without making any recourse to probability. It is then the distribution of possible computable models explaining the dice where a probability is imposed but is not at the core of the individual models themselves which in turn cannot only be tested beyond their outcome predictions as mechanistic (step-by-step) descriptions of the process but also offer a non-black-box approach where each model can be followed step-by-step and model states correspond to constructive (e.g. physical) states as opposed to random variables with no state-to-state correspondence between model and phenomenological data.

Unlike popular lossless compression algorithms that are guaranteed not to be able to characterize objects such as \(s\), CTM can, in principle, because when running all possible computer programs up to the size of \(s\) in bits there is a non-zero probability that CTM will find such a program if such a program exists (i.e. if \(s\) is not algorithmic random as we know it is not for this particular \(s\)).

## The Block Decomposition Method (BDM)

One way to see BDM is as a weighted version of Shannon's entropy that introduces algorithmic randomness into the original formulation of classical information, one that is able to help tell apart statistical randomness from algorithmic randomness. To illustrate this let's take the sequence \(s = 1234567... \) again. While the digits of \(s\) are distributed with maximal entropy because it can be seen as the digital expansion of the so-called Champernowne constant that has been proven to be Borel normal (because it does not show any statistical patterns in the sense of overrepresentations of any digit subsequence repeated any more times than any other subsequence), when considering the set of all possible sequences some of which do show statistical patterns, the appearance of each digit in \(s\) would be characterized as unexpected when we may have liked to say that the most salient feature of \(s\) is, in reality, its not random nature as it can be generated by a simple deterministic generative process. This is a key methodological distinction because (1) the sequence would be characterized as maximally 'disordered' by statistical approaches such as Shannon Entropy unless upon its application one already has the information that \(s\) has been generated by a deterministic process thereby self-defeating the use of the measure in the first place (in science this is the rule rather than the exception when observing data: one wants to investigate the nature of an object when the nature of such object is not yet known), and (2) it is of high empirical value in practice and application to science. For example, if the purpose is to quantify human memory, a person would not need to learn \(s\) digit by digit in order to generate it illustrating the algorithmic nature of human cognitive processes beyond statistical patterns. The sequence \(s\) may look very special but in fact, most sequences are of this type, they won't have any statistical regularity, and in fact not even any short description (low algorithmic randomness) when looking at the set of all possible sequences, but the difference between those that do not have a short algorithmic description versus those that appear statistically random but have a short description is divergent, we only chose \(s\) because it was the most obvious for illustration (another example are the digits of a mathematical constant such as \(\pi\)). Indeed, CTM and BDM have found many applications in psychometrics and cognition due to this advantage (see Applications of AID).

What BDM does is to extend the power gained by CTM to quantify algorithmic randomness by implementing a divide-and-conquer approach whereby the data is decomposed in pieces small enough that an exhaustive computational search finds the set of all computable models able to generate that piece of data. The sequence of small generators then supports the larger piece of data from which insight is gained as to their algorithmic properties. Small pieces of data supported by even smaller programs than those pieces are called causal patches in the context of AID as they are the result of a causal relationship between the set of these computable models and the observed data whereas models of about the same length than the data are not causally explained by a shorter generating mechanism and are therefore considered random and not causally supported, A sequence of computer programs smaller than their matched patches constitutes a sufficient test for non-randomness therefore informative of its causal content as its components can be explained by underlying small computable models shorter than the original whole data itself.

## Algorithmic Intervention Analysis

Algorithmic Information Dynamics consists of automatic discovering of primary causal computable models and provides the tools to explore the algorithmic effects that perturbations to systems and data (such as strings or networks) have on their underlying computable models. For example, when a random bit-flip is performed to the following string pattern \(0000000...\) in position \(N\) such perturbation will have a significant effect with respect to the length of the original shortest program \(P\) consisting of a loop printing \(N\) number of 0s because \(P\) would need to account for the 1 inserted at random position \(X\) from \(1\) to \(N\). If, however, the string is random the effect of any random change will have no effect because every bit required already its own description. When it comes to deletion as a perturbation, random deletions have similar effects in both simple and random objects amounting to about \(\log(X)\) with \(X\) the number of elements deleted.

AID can substitute, or complement, approaches to causal analysis. For example, Judea Pearl et al. work on causality assumes a starting causal model but cannot provide the means to primary causal discovery in the first place. One can formulate a cause-effect question in the language of the do-calculus as a probability between a random variable and an intervention \(P(L|do(D))\) and one can also do so in the context of AID. In one of the typical examples by Pearl himself, if \(L\) means human lifespan and \(do(D)\) the use of some drug \(D\) the probability \(P\) of \(D\) having an effect on \(L\) is quantified by classical probability to calculate \(P\). What AID does is to substitute \(P\) by AP, the algorithmic probability that \(D\) exerts an effect on \(L\) with the chief advantage that not only one obtains a probability for the dependency (and direction) between \(D\) and \(L\) but also a set of candidate models explaining such a connection where no classical probability is involved in any of the inference or description of each individual model. AP would then impose a natural non-uniform algorithmic probability distribution over the space of candidate models connecting \(D\) and \(L\) based on estimations of the Universal Distribution which in effect introduces a simplicity bias favoring shorter models as opposed to algorithmic random all of which have already been found to explain the causal connection between \(D\) and \(L\). AID thus removes the need for classical probability distributions and, more important, produces a set of generative models no longer derived from traditional statistics (e.g. regression, correlation) which is used even in the do-calculus to determine the probability of a dependency thereby falling back to the methods the do-calculus itself was circumventing. While the do-calculus is a major improvement in the area of causal discover, AID complements it offering a path to leave classical probability behind and take the next step towards departing from correlation confounding methods. Another key difference between the original do-calculus and the algorithmic probability calculus is that the do-calculus makes a distinction between \(P(L|do(D))\) and \(P(L|D)\) but in AID both hypothesis, \(AP(L|D)\) and \(AP(D|L)\), can be tested independently as they have different meanings under algorithmic probability. \(AP(L|D)\) means that \(L\) is the generative mechanism of \(D\) and \(AP(D|L)\) means that \(D\) is the generative mechanism of \(L\). Each of them would trigger a different explorative process under AID, the former would look for the set of smaller to larger computable models denoted by \({L}\) that can explain \(D\) while the latter would look for all the generative models \(\lbrace D \rbrace\) that generate \(L\). Intuitively, this means that, for example, to explain how a barometer falling (\( X \)) could be the cause for a storm would likely not find many shorter generative mechanisms to causally explain the occurrence of a storm (\( Y \)), while the other direction would find shorter computable models for the barometer fall to be the effect and not the cause. Clearly, disregarding the result, \(AP(X|Y)\) and \(AP(Y|X)\) are not equivalent. In the language of AID, conforming more to a typical notation in graph and set theory, the first example would be written as \(AP(L \backslash D)\) or \(C(L \backslash D)\) where \(C\) is the algorithmic complexity of \(L\) with intervention \(D\) (in the context of AID that is usually a deletion to a model that includes D) and the second example as \(AP(X \backslash Y)\) or \(C(X \backslash Y)\). Alternatively, \(AP(L \backslash \lbrace D \rbrace )\) would be \(L\) with a set of interventions \(\lbrace D \rbrace\). Multivariate causal analysis can be achieved by conditional versions of algorithmic complexity and is currectly an area of active research.

In Pearls account to causal analysis, the so-called causal ladder up to human-grade reasoning consists of 3 levels, with the first being that of pattern observation covered by traditional statistics long-based on regression and correlation, interventions as the one his do-calculus suggests and AID allows, and, the 3rd level, that of counterfactuals or the power to imagine what would happen if conditions were different. AID also provides the most fundamental step in the causal analysis which is causal discovery under the same framework without having to make recourse to different approaches for different purposes (discovery vs analysis). While the do-calculus comes with no first guidelines for how to come up with a first testable model, AID can generate a set of computable models ranked by likeliness and offers a full path towards the full replacement and substitution of regression and correlation in the description of a model, taking out the statistical nature out of the causal description. In addition, AID can also potentially cover all three levels of causality in Judea's ladder and provides the methodological framework to address each without any making recourse to classical probability, regression or correlation. While interventions in AID can be conducted both interventions and counterfactuals in AID are covered by the underlying simulation of all possible computable hypotheses (hence all 'if' questions) up to the length of the original observation, and methods such as CTM and BDM allow the estimation of processes at all these 3 levels by decomposing the original problem into smaller problems with the additional advantage that AID ranks the counterfactuals naturally by algorithmic probability likeliness based on the Universal Distribution as a prior.

## Connection to Dynamical Systems

While some theoretical results have connected algorithmic complexity and dynamical systems not many applications have connected them. One of the main results in AID is that if a chain of causally connected processes is unaffected and generated from the same generative mechanism then its algorithmic complexity remains constant up to a (double) logarithmic term accounting for the time step when the system is to be reproduced up to a certain observable time, anything departing from such quantity indicates that the process has been subject to an external perturbation or interaction which can be pinpointed by AID. Numerical experiments quantifying the change in the number of attractors in a dynamical system (and therefore the average depth or shallowness of their attractors) demonstrates that, on the one hand, when removing elements that reduce the algorithmic complexity of a dynamical system (with respect to the average length of the models explaining its original description) systematically reduces the number of attractors. On the other hand, when removing elements that increase the dynamical system algorithmic complexity (hence makes it more algorithmic random), the number of attractors increases.

## Other Related Concepts

**Resource-bounded complexity**

**Minimal Description Length**

While AID and MDL are ultimately related and based on the principles of Algorithmic probability, Minimum description length departs from AID and AIT when

**Computational Mechanics**

AID can be considered as partially or fully connected to the area of Computational Mechanics but (1) it differs from the formulation of Crutchfield et al in that there is no probabilistic content at the core of the candidate generative models and (2) AID by way of CTM and BDM provide the means to implement other computational mechanics approaches including potentially that of Crutchfield's as methods to mine the space of computable automata (but unlike Crutchfield's approach, CTM finds sets of non-probabilistic automata).

**Measures of sophistication**

Because AID has this property to distinguish simple and random and assign them similar values AID is considered a measure of sophistication similar to Bennett's Logical Depth.

**Granger causality and transfer entropy**

**Machine Learning and Artificial Intelligence **

More recently AID has been introduced to machine intelligence as an inference engine and has shown (1) that it can outperform statistical approaches at difficult, if not impossible for statistical approaches, at tasks such as feature selection, model generation and dimension reduction, and (2) unlike previous beliefs, landscape differentiation for numerical optimization is not necessary or sufficient for machine learning approaches to produce results similar if not better than deep learning.

## Challenges and Limitations

The computational resources needed to compute CTM at the core of AID limit the size of its application but BDM extends and combines CTM with other computable measures such as Shannon entropy itself. One of the most active areas of AID research is to find ways to make the algorithmic measures to be more relevant and to find possible shortcuts towards the computation of the computable region in the uncomputable space that is relevant for applications in science.

## Applications of AID

Biology, genetics, finance, cognition, psychometrics, complexity, networks, artificial intelligence, data science

## References

- G. J. Chaitin. On the length of programs for computing finite binary sequences.
*Journal of the ACM*, 13(4):547--569, 1966.

- P. Gàcs. On the symmetry of algorithmic information.
*Soviet Mathematics Doklady*, 15:1477--1480, 1974.

- R. G. Gallager.
*Information Theory and Reliable Communication*. Wiley, New York, 1968.

- N. Gauvrit, F. Soler-Toscano, H. Zenil, Natural Scene Statistics Mediate the Perception of Image Complexity, Visual Cognition, Volume 22, Issue 8, pp. 1084-1091, 2014.

- N. Gauvrit, H. Singmann, F. Soler-Toscano, H. Zenil, Algorithmic complexity for psychology: A user-friendly implementation of the coding theorem method, Behavior Research Methods, Volume 48, Issue 1, pp. 1-16, 2015.

- N. Gauvrit, H. Zenil, F. Soler-Toscano and J.-P. Delahaye, Algorithmic complexity for short binary strings applied to psychology: a primer, Behavior Research Methods, vol. 46-3, pp 732-744, 2014.

- S. Hernández-Orozco, N.A. Kiani, H. Zenil, Algorithmically Probable Mutations Reproduce Aspects of Evolution, such as Convergence Rate, Genetic Memory, and Modularity, Royal Society Open Science 5:180399, 2018.

- M. Hutter. On Universal Prediction and Bayesian Confirmation.
*Theoretical Computer Science*, 384:1 (2007) 33-48

- A. N. Kolmogorov. Three approaches to the quantitative definition of information.
*Problems of Information and Transmission*, 1(1):1--7, 1965.

- A. N. Kolmogorov. Combinatorial foundations of information theory and the calculus of probabilities.
*Russian Mathematical Surveys*, 38(4):27--36, 1983.

- L. A. Levin. Laws of information conservation (non-growth) and aspects of the foundation of probability theory.
*Problems of Information Transmission*, 10(3):206--210, 1974.

- M. Li and P. M. B. Vitanyi.
*An Introduction to Kolmogorov Complexity and its Applications*. Springer, New York, 2nd edition, 1997.

- F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines, PLoS ONE 9(5): e96223, 2014.

- F. Soler-Toscano, H. Zenil, A Computable Measure of Algorithmic Probability by Finite Approximations with an Application to Integer Sequences, Complexity vol. 2017 (2017), Article ID 7208216

- F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit , Correspondence and Independence of Numerical Evaluations of Algorithmic Information Measures, Computability, vol. 2, no. 2, pp 125-140, 2013.

- R. J. Solomonoff. A formal theory of inductive inference: Parts 1 and 2.
*Information and Control*, 7:1--22 and 224--254, 1964.

- G. Terrazas, H. Zenil and N. Krasnogor, Exploring Programmable Self-Assembly in Non DNA-based Computing, Natural Computing, vol 12(4): 499--515, 2013.

- H. Zenil and J-P. Delahaye, “On the Kolmogorov-Chaitin complexity for short sequences”. In C. Calude (ed) Randomness and Complexity: From Leibniz to Chaitin, World Scientific Publishing Company, 2007.

- H. Zenil and J-P. Delahaye, Numerical Evaluation of the Complexity of Short Strings: A Glance Into the Innermost Structure of Algorithmic Randomness, Applied Mathematics and Computation 219, pp. 63-77, 2012.

- H. Zenil, G. Ball and J. Tegnér, Testing Biological Models for Non-linear Sensitivity with a Programmability Test. In P. Liò, O. Miglino, G. Nicosia, S. Nolfi and M. Pavone (eds), Advances in Artificial Intelligence, ECAL 2013, pp. 1222-1223, MIT Press, 2013.

- H. Zenil, Programmability: A Turing Test Approach to Computation. In L. De Mol and G. Primiero (eds.), Turing in Context, Koninklijke Vlaamse Academie van België voor Wetenschappen en Kunsten (Belgian Academy of Sciences and Arts), Contactforum, 2014.

- H. Zenil, F. Soler-Toscano, K. Dingle and A. Louis, Correlation of Automorphism Group Size and Topological Properties with Program-size Complexity Evaluations of Graphs and Complex Networks, Physica A: Statistical Mechanics and its Applications, vol. 404, pp. 341–358, 2014.

- H. Zenil, F. Soler-Toscano, J.-P. Delahaye and N. Gauvrit, Two-Dimensional Kolmogorov Complexity and Validation of the Coding Theorem Method by Compressibility, PeerJ Computer Science, 1:e23, 2015.

- H. Zenil, Algorithmicity and Programmability in Natural Computing with the Game of Life as an In Silico Case Study, Journal of Experimental & Theoretical Artificial Intelligence, Volume 27, Issue 1, pp. 109-121, 2015.

- H. Zenil, N.A. Kiani, J. Tegnér, Numerical Investigation of Graph Spectra and Information Interpretability of Eigenvalues, In F. Ortuño & I. Rojas (Eds.): 3rd International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO) 2015, Part II, LNCS 9044, pp. 395--405. Springer, 2015.

- H. Zenil, N.A. Kiani and J. Tegnér, Methods of Information Theory and Algorithmic Complexity for Network Biology, Seminars in Cell and Developmental Biology, vol. 51, pp. 32-43, 2016.

- H. Zenil, Approximations to Algorithmic Probability. In Robert A. Meyers (ed), 2nd. Edition of the Springer Encyclopedia of Complexity and Systems Science, 2017

- H. Zenil, N.A. Kiani and J. Tegnér, Low Algorithmic Complexity Entropy-deceiving Graphs, Physical Review E 96, 012308, 2017.

- H. Zenil, N.A. Kiani and J. Tegnér, Algorithmic Information Dynamics of Emergent, Persistent, and Colliding Particles in the Game of Life. In A. Adamatzky (ed), From Parallel to Emergent Computing, Taylor & Francis / CRC Press, pp.367--383, 2019.

- H. Zenil, S. Hernández-Orozco, N.A. Kiani, F. Soler-Toscano, A. Rueda-Toicen, A Decomposition Method for Global Evaluation of Shannon Entropy and Local Estimations of Algorithmic Complexity, Entropy 20(8), 605, 2018.

- H. Zenil, N.A. Kiani, A. Zea, J. Tegnér, Causal Deconvolution by Algorithmic Generative Models, Nature Machine Intelligence, vol 1, pages 58–66, 2019.

- H. Zenil, P. Minary, Training-free Measures Based on Algorithmic Probability Identify High Nucleosome Occupancy in DNA Sequences, Nucleic Acids Research, gkz750, 2019.

- A. K. Zvonkin and L. A. Levin. The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms.
*Russian Mathematical Surveys*, 25(6):83--124, 1970.

**Internal references**

- Marcus Hutter (2007) Algorithmic information theory. Scholarpedia, 2(3):2519.

- Marcus Hutter, Shane Legg, Paul M.B. Vitanyi (2007) Algorithmic probability. Scholarpedia, 2(8):2572.

- Rodney G. Downey and Jan Reimann (2007) Algorithmic randomness. Scholarpedia, 2(10):2574.

- Paul M.B. Vitanyi (2007) Andrey Nikolaevich Kolmogorov. Scholarpedia, 2(2):2798.

- Ming Li and Paul M.B. Vitanyi (2007) Applications of algorithmic information theory. Scholarpedia, 2(5):2658.

- Olaf Sporns (2007) Complexity. Scholarpedia, 2(10):1623.

- Tomasz Downarowicz (2007) Entropy. Scholarpedia, 2(11):3901.

- Matteo Gagliolo (2007) Universal search. Scholarpedia, 2(11):2575.

## Recommended reading

For an introduction to Kolmogorov complexity, history and more references:

- An Introduction to Kolmogorov Complexity and Its Applications
- Methods and Applications of Algorithmic Complexity, Beyond Lossless Compression Algorithms, forthcoming (2020), Springer Verlag
- Algorithmic Information Dynamics, forthcoming (2020), Cambridge University Press

**Foundational Papers**

**Applications Papers**

**Media and Popular Science Publications**

## External links

- Algorithmic Information Dynamics course at the Sta Fe Institute platform ComplexityExplorer
- Video produced by Nature to explain an application of AID to machine intelligence

## See also

Algorithmic "Kolmogorov" Complexity, Algorithmic "Solomonoff" Probability, Universal "Levin" Search, Algorithmic "Martin-Loef" Randomness, Recursion Theory, Applications of AIT.