# Universal search


Curator: Matteo Gagliolo

Universal or Levin Search is an algorithm for solving inversion problems, devised by Leonid A. Levin (1973, 1984). It is related to the concept of $$Kt$$ or Levin complexity, a computable, time-bounded version of Algorithmic Complexity.

The method consists in interleaving the execution of all possible programs on a universal Turing machine, sharing computation time equally among them, until one of the executed programs manages to solve the given inversion problem.

If there exists a program $$p$$, of length $$l(p)$$, that can solve the problem in time $$\mbox{time}(p)$$, then Universal Search will solve the problem in time at most $$2^{l(p)+1}\mbox{time}(p)$$. This exponential growth of the computational cost in the algorithmic complexity $$l(p)$$ of the fastest solver makes practical applications of Universal Search problematic; nonetheless, the algorithm has inspired various others, including Hutter Search, the Optimal Ordered Problem Solver (OOPS) and the Gödel Machine.

## Levin complexity

The concept of algorithmic complexity quantifies the amount of information contained in an object as the size of the shortest program that can produce that object, without taking into account the resources that its execution requires. In the context of a Turing machine, these resources are space (e.g., the maximum number of cells of the work tape used) and time (the number of execution steps, assuming that each step takes a fixed amount of time).

A time-bounded version of the Kolmogorov complexity of a binary string $$x$$

$\tag{1} K(x) := \min_p\{\ell(p): U(p)=x\}$

can be obtained by penalizing execution time:

$\tag{2} Kt(x) \;=\; \min_p \{\ell(p)+\log(\mbox{time}(p)) : U(p)=x \}$

Equation (2) corresponds to the logarithm of the age of the binary string $$x$$ (Li and Vitanyi, Section 7.4), i.e., the time it would take to generate $$x$$ by executing all possible programs in lexicographic order, in a "dovetailing" style. If we assume universal Turing machines that can simulate each other in linear time, then an invariance theorem similar to the one for standard Kolmogorov complexity can be proved. This allows us to speak of the $$Kt$$ complexity of a string, without explicit reference to the particular universal Turing machine $$U$$ used in Equation (2).

More precisely, consider a Kolmogorov-Uspensky universal Turing machine $$U$$, interpreting input programs according to a prefix code (prefix Turing machine, Li and Vitanyi, Chapter 3). Assume all possible programs $$p$$ are listed in lexicographic order, and executed in parallel, in a sequence of phases $$i=1,2,\ldots$$: in phase $$i$$, the execution of each of the first $$i$$ programs is resumed for one step. When a program $$p$$ halts the machine, the result on the output tape is compared to $$x$$: if they are equal, $$Kt(x)$$ can be evaluated as in Equation (2). This dovetailing execution scheme ensures that there is no other program that can generate $$x$$ with a lower $$Kt(x)$$: unlike Kolmogorov complexity, Levin complexity is computable, even though its computation is intractable in practice.
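
The dovetailing scheme just described can be sketched in Python. This is a minimal sketch, assuming a toy "machine" in which a program copies its own code to the output tape in $$l(p)+1$$ steps, standing in for a real universal Turing machine; `all_programs`, `toy_run` and `levin_kt` are illustrative names:

```python
import math

def all_programs():
    """Enumerate binary strings in lexicographic (shortlex) order."""
    yield ""
    n = 1
    while True:
        for k in range(2 ** n):
            yield format(k, "0{}b".format(n))
        n += 1

def toy_run(program):
    """Toy stand-in for U(p): one generator step = one execution step.
    The program copies its own code to the output tape, then halts."""
    for _ in program:
        yield None              # an intermediate step, not yet halted
    yield program               # final step: halt with the output

def levin_kt(x, max_phases=100_000):
    """Dovetailed search for Kt(x) = min_p l(p) + log2(time(p)):
    in phase i, each of the first i programs is advanced by one step."""
    enum = all_programs()
    slots = []                  # each slot: [program, generator, steps]
    for i in range(1, max_phases + 1):
        slots.append([next(enum), None, 0])
        for slot in slots:
            prog, gen, steps = slot
            if gen is None:
                gen = slot[1] = toy_run(prog)
            try:
                out = next(gen)
            except StopIteration:
                continue        # this program already halted
            slot[2] = steps + 1
            if out is not None and out == x:
                return len(prog) + math.log2(slot[2])
    return None                 # not found within the phase budget
```

On this toy machine only the program $$x$$ itself outputs $$x$$, so `levin_kt("101")` evaluates to $$3+\log_2 4 = 5$$.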

Levin complexity and other resource-bounded generalizations of algorithmic complexity are considered in Chapter 7 of (Li and Vitanyi, 1997).

## Speed prior

An alternative to the universal prior distribution $$M(x)$$ that takes computation time into account is presented in (Schmidhuber, 2002). Writing $$p\rightarrow_i x$$ if a program $$p$$ generates an output with prefix $$x$$ after $$2^{i-l(p)}$$ instructions, the Speed Prior probability of $$x$$ can be defined as:

$S(x) \;=\; \sum_{i=1}^{\infty} 2^{-i}\sum_{p\rightarrow_i x}2^{-l(p)}$

It is easy to see that the largest contribution to $$S(x)$$ is $$2^{-Kt(x)}$$, analogous to $$2^{-K(x)}$$ for $$M(x)$$. However, it is unknown which class of distributions is dominated by $$S$$.

## Universal search

An inversion problem consists in finding, for a given function $$\phi$$ and a given value $$y$$, a value $$x$$ such that $$\phi(x)=y$$. Without loss of generality, consider a function $$\phi$$ mapping binary strings to binary strings, $$\phi:\{0,1\}^*\rightarrow\{0,1\}^*$$.

Universal Search is an iterative search algorithm for solving inversion problems. Given a universal Turing machine, at iteration $$i$$ it executes each possible program $$p$$ of length $$l(p)\leq i$$ for $$2^{i-l(p)}$$ steps. The algorithm halts as soon as one of the programs writes a candidate solution $$x$$ for which $$\phi(x)=y$$ is verified; otherwise it moves to the next iteration, incrementing $$i:=i+1$$.
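
A minimal sketch of this iteration, assuming a toy machine in which a program outputs its own code in $$l(p)+1$$ steps (so the search degenerates to an enumeration of candidate strings, while preserving the phase structure and the $$2^{i-l(p)}$$ step budgets); `run_for` and `universal_search` are illustrative names:

```python
def run_for(p, budget):
    """Toy machine: copying p to the output takes len(p) + 1 steps.
    Return the output if the program halts within the step budget."""
    return p if budget >= len(p) + 1 else None

def universal_search(phi, y, max_phase=20):
    """Sketch of Universal Search: in phase i, every program p with
    l(p) <= i is run for 2**(i - l(p)) steps; a candidate x is
    accepted as soon as phi(x) == y is verified."""
    for i in range(1, max_phase + 1):
        for n in range(i + 1):                   # program lengths 0..i
            for k in range(2 ** n):
                p = format(k, "0{}b".format(n)) if n else ""
                x = run_for(p, 2 ** (i - n))
                if x is not None and phi(x) == y:
                    return x
    return None
```

For example, inverting string reversal: `universal_search(lambda s: s[::-1], "011")` returns `"110"`.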

## History

Universal Search was first briefly mentioned in (Levin, 1973), and later discussed in (Levin, 1984). Related ideas were proposed by (Adleman, 1979). Descriptions of the algorithm can be found in (Solomonoff, 1984; Gurevich, 1988; Li and Vitanyi, 1997, sec. 7.4). Its first practical applications were presented in (Schmidhuber, 1995; Schmidhuber, 1997).

## Hutter search

Hutter search (Hutter 2002, see also Hutter 2005, Chapter 7) is an algorithm that evaluates an arbitrary function with asymptotically optimal computational complexity, by restricting the computation to algorithms that are provably equivalent to the given function and have a provable, quickly computable bound on execution time. The function itself can also be a solver for some problem class, mapping a parametric representation of a problem instance to a parametric representation of its solution: in this sense, Hutter search can be considered a general method for accelerating algorithms.

Let $$U$$ be a reference universal Turing machine, and $$\mbox{time}_p(x)$$ the time taken by $$U$$ to execute a program $$p$$ on an input string $$x$$ and halt. Given an algorithmic representation $$p^*$$ of a function, Hutter search conducts three computations in parallel. The first enumerates all possible proofs in a formal logic system: whenever a proof is found stating that a program $$p$$ is functionally equivalent to $$p^*$$ and has an input-dependent time bound $$t$$ (i.e., $$\forall x : p(x)=p^*(x) \mbox{ and time}_p(x) \leq t(x)$$), the pair $$(p,t)$$ is added to a list of candidates $$L$$. The second computation evaluates, in parallel, all the time bounds $$t$$ currently on the list, with computation time allocated proportionally to $$2^{-l(p)-l(t)}$$: this is analogous to Levin search, but takes into account the complexity of both the program and the time bound. The third computation executes the program $$p$$ with the current best time bound $$t(x)$$: whenever the second computation finds a program with a better time bound (accounting for the time already spent by the current best), this execution is aborted, and the new fastest candidate is started instead. The three parts are stopped when the third halts, producing a solution which is, by construction, equivalent to $$p^*(x)$$.
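
The switching rule of the third computation can be sketched in isolation. Here the proof search is replaced by a hypothetical, precomputed list of "proved" candidates, each with a discovery time and a step bound on the given input; this is an assumption made only to keep the sketch short and runnable:

```python
def third_component(events, max_steps=10_000):
    """Toy sketch of the third computation of Hutter search.  events is
    a hypothetical stand-in for the proof search: a list of tuples
    (discovery_step, result, bound), where bound is the number of steps
    the candidate needs on the given input and result is its output
    (all candidates compute the same function, so any result is valid).
    The current candidate is aborted whenever a newly 'proved' one has
    a full bound smaller than the steps the current one still needs."""
    pending = sorted(events)
    current = None                        # [result, steps_remaining]
    for t in range(max_steps):
        while pending and pending[0][0] <= t:
            _, res, bound = pending.pop(0)
            if current is None or bound < current[1]:
                current = [res, bound]    # switch to the faster candidate
        if current is not None:
            current[1] -= 1               # execute one step
            if current[1] == 0:
                return current[0]
    return None                           # no candidate halted in time
```

For instance, a candidate with bound 100 discovered at step 0 is abandoned in favor of a bound-10 candidate discovered at step 5, since 10 is less than the 95 steps still owed to the first.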

Hutter proves that if the three parts share computational resources in proportions $$\epsilon, \epsilon, 1-2\epsilon$$ respectively, then the algorithm has a time bound $$(1+\epsilon)$$ times the bound of the fastest algorithm, plus additive terms proportional to $$1/\epsilon$$. Unfortunately, these additive terms are large enough to render the algorithm impractical, mainly due to the process of enumerating the proofs. An interesting theoretical implication of Hutter search is that the fastest implementation of a function is also among the shortest programs that provably compute it.

Section 7.2 of (Hutter, 2005) describes AIXI$$tl$$, a computable version of the optimal reinforcement learning agent AIXI, based on a similar idea, limiting the length of proofs and programs, as well as their runtime.

## OOPS and other incremental variations

Universal search allocates computation time according to a uniform distribution over programs. A number of more practical variations are adaptive, in the sense that the distribution used for allocating time to candidate solvers is updated in an incremental fashion, based on the results over a sequence of related and increasingly difficult problems.

'Adaptive Levin Search' (ALS - Wiering and Schmidhuber, 1996) updates the probabilities at every input cell, in order to increase the probability of the solution found for the last problem. The rationale behind this heuristic is that it should speed up the search whenever the current problem has a solution similar to those of the previous ones.

A similar approach is taken in Probabilistic Incremental Program Evolution (PIPE - Salustowicz, 1997), in which the distribution over programs is stored as a tree; and in Self-Modifying Probabilistic Learning Algorithms (SMPLA - Schmidhuber et al., 1997), in which a meta-learning level is added, undoing unsuccessful probability modifications (Success Story Algorithm).

(Solomonoff, 2003) presents a collection of ideas for solving sequences of time-limited optimization problems by searching in a space of problem solving techniques, allocating time to them according to their probabilities, and updating the probabilities according to positive and negative results.

The Optimal Ordered Problem Solver (OOPS - Schmidhuber, 2004) is also aimed at incremental learning. Every time a problem instance is solved, the successful program is stored on the input tape. For the following problem instance, two searches are run in parallel: half of the time is allocated to testing prolongations of the latest solver, and the other half to fresh programs, starting from scratch. Both searches are conducted using backtracking to simulate a parallel execution of all candidate solvers. Time is allocated according to a distribution over programs, obtained by multiplying the probabilities of the individual instructions. If the instruction set includes primitives that can modify these probabilities, then OOPS can in principle display meta-learning capabilities by finding programs that speed up the search for future solutions.
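
This probability-proportional time allocation can be illustrated with a short sketch; the two-instruction alphabet and the function names are hypothetical, and the real OOPS allocates time through backtracking over a typed instruction set rather than by normalizing over an explicit candidate list:

```python
def program_prob(prog, instr_probs):
    """Probability of an instruction sequence as the product of the
    probabilities of its individual instructions (sketch)."""
    p = 1.0
    for ins in prog:
        p *= instr_probs[ins]
    return p

def time_budgets(programs, instr_probs, total_steps):
    """Share a total step budget among candidate programs in proportion
    to their probability, mirroring Levin-style time allocation."""
    probs = {p: program_prob(p, instr_probs) for p in programs}
    z = sum(probs.values())
    return {p: total_steps * probs[p] / z for p in programs}
```

With two equiprobable instructions, a two-instruction program receives half the budget of a one-instruction program, so raising the probability of instructions used by past solutions shifts search time toward similar programs.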

## Gödel machine

The Gödel Machine (Schmidhuber, 2006) is a general paradigm for solving arbitrary problems, including optimization problems and reinforcement learning. Inspired by the work of Kurt Gödel, it is based on a set of axioms and a programming language for encoding and deriving formal proofs.

The machine interacts with an environment, through a dedicated hardware and software system, and its aim is to maximize a reward over a possibly limited lifetime.

The axioms include a detailed formal description of the machine's software and hardware, including both the components interacting with the environment and those dealing with the formal proofs, and a possibly partial description of the environment.

The Gödel machine starts its interaction with the environment according to some initial program; in the meantime, a proof search algorithm is used to find provably optimal modifications of the machine's software. In other words, a Gödel machine can rewrite any part of its software, but only after proving that the modification will increase the expected reward for the remaining lifetime: in this way, meta-learning can influence any aspect of the machine's behavior, including the proof search itself, by rewriting the code controlling it.

(Schmidhuber, 2006) also describes example instantiations of a Gödel Machine, in which the initial proof search algorithms are variants of universal search and OOPS, respectively.

## References

1. L. Adleman, Time, space, and randomness. Tech. rep. MIT/LCS/79/TM-131, MIT, Lab. for Computer Science, March 1979.
2. Yu. Gurevich, The logic of computer science column. EATCS Bulletin, 35:71-82, 1988.
3. M. Hutter, The fastest and shortest algorithm for all well-defined problems. International Journal of Foundations of Computer Science, 13(3):431--443, 2002. (online)
4. M. Hutter, Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, Berlin, 2005. (online)
5. L. A. Levin, Universal sequential search problems. Problems of Information Transmission, 9(3):265--266, 1973.
6. L. A. Levin, Randomness Conservation Inequalities: Information and Independence in Mathematical Theories. Information and Control, 61:15-37, 1984.
7. M. Li and P. M. B. Vitanyi, An Introduction to Kolmogorov Complexity and its Applications. Springer, New York, 2nd edition, 1997, and 3rd edition 2007. (online)
8. R. P. Salustowicz and J. Schmidhuber, Probabilistic Incremental Program Evolution. Evolutionary Computation, 5(2):123-141, 1997.
9. J. Schmidhuber, Discovering solutions with low Kolmogorov complexity and high generalization capability. In A. Prieditis and S. Russell, eds, Machine Learning: Proceedings of the Twelfth International Conference, 488-496, 1995.
10. J. Schmidhuber, Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Networks, 10(5):857-873, 1997.
11. J. Schmidhuber, J. Zhao and M. Wiering Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning, 28:105-130, 1997.
12. J. Schmidhuber, The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions. In J. Kivinen and R. H. Sloan, eds, Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), 216--228. Springer, 2002. (online)
13. J. Schmidhuber, Optimal Ordered Problem Solver, Machine Learning, 54:211-254, 2004. (online)
14. J. Schmidhuber, Gödel machines: self-referential universal problem solvers making provably optimal self-improvements. In B. Goertzel and C. Pennachin, eds.: Artificial General Intelligence, p. 119-226, 2006. (online)
15. R. J. Solomonoff, Optimum sequential search. Memorandum, Oxbridge Research, Cambridge, Mass., June 1984.
16. R. J. Solomonoff, Progress in Incremental Machine Learning. Tech. rep., IDSIA-16-03, IDSIA, October 2003.
