At the forefront of Artificial Intelligence
  Home Articles Reviews Interviews JDK Glossary Features Discussion Search
Home » Articles » Neural Networks » Advanced

Universal Meta Optimization

By Forrest Briggs

Abstract

Many problems in artificial intelligence such as training neural networks are optimization problems. A universal optimizer is an optimization algorithm that performs well on a large number of optimization problems. Evolution is a powerful universal optimization algorithm that already exists. The process of using an optimization algorithm to search all computable optimizers is meta optimization. Optimizers found this way could be cable of optimizing themselves.

Universal Optimization

A universal optimizer is an algorithm that performs well on a large set of optimization problems. This section will define the terms universal optimizer and optimization problem and show how the performance of a universal optimizer can be quantified with respect to a set of optimization problems.

An optimization problem is defined by a domain and a fitness function. Given a domain set X and a fitness function f : X → |R, we wish to find the elements x ∈ X such that f is maximized. For example, training a neural network consists of optimizing a fitness function from the weights of the neural net to the reals. This function could take the form f : |Rn → |R if the neural network has n real weight parameters. An optimizer such as an evolutionary algorithm repeatedly guesses elements x ∈ X, then receives the evaluation of f(x). A good optimizer uses the information it gains from previous guesses to make better subsequent guesses. In this way, an evolutionary algorithm makes new guesses that are mutations and/or matings of previous guesses with high fitness.

In order to use existing optimization techniques to optimize the performance of an optimizer, it is necessary to quantify the performance of the optimizer. We quantify exactly how good a particular optimizer is with respect to a particular optimization problem in the following way. For k iterations, we tell the optimizer to make a guess, then evaluate the guess' fitness and return the fitness to the optimizer. Let xi be the ith guess made by the optimizer. The fitness of the optimizer could be the maximum, average, or sum of f(x1) ... f(xk).

We call an optimizer universal if it has a good fitness with respect to a large set of optimization problems. The universal fitness of an optimizer with respect to a set of functions is the average of its fitness with respect to each optimization problem in the set.

Evolution is a powerful universal optimizer because it is applicable to many different domains, including |Rn and {0,1}* (binary strings). Particle swarm optimization can outperform evolution, but is limited to the domain |Rn [1]. There are also hybrids between evolution and particle swarm optimization [2].

Universal Models

The Turing-Church Thesis states that Turing Machines can compute any computable function. A universal model is a computational system that can compute the same functions as a Turing Machine, and is said to be Turing-complete. Models have parameters that determine what function they compute. Parameters are like the source code to a program, and the model defines how the source is to be interpreted. Turing Machines are a model and their parameters are their states, alphabet and transitions. Many different kinds of neural networks have been shown to have capabilities approaching or equal to Turing-completeness [3][4]. The parameters of neural net models tend to be weights and delays that represent the properties of connections between neurons.

There are two uses of universal models in universal meta optimization. We would like to use existing algorithms to search the space of all computable optimizers. With a universal model, each possible set of parameters specifies an algorithm that the model computes. Thus each individual set of parameters for a model can be used as the source code to an optimizer program. In order to search the space of all computable optimizers, we search the space of the parameters of a universal model.

Given a domain X and a universal model, let P be the set of all parameters for the model that deterministically specify O, the set of all optimizers that run for k iterations, repeatedly choosing a guess x ∈ X, then receiving the evaluation of f(x). The dynamics of the model define a map from elements of P to elements of O. Since for any optimizer in O, we can measure its universal fitness, we shall also define the function u : P → |R, which maps parameters to the universal fitness of their corresponding optimizers. u is the function that we will optimize in meta optimization.

In order to establish the universality of an optimizer, we measure its fitness with respect to a large set of optimization problems. An ideal set of optimization problems would be all of the computable functions on a particular domain. Since this set is potentially infinite, in practice we must settle for a randomly chosen subset of it. By randomly choosing the parameters of instances of a universal model, we can effectively chose random functions from the set of all computable functions. To establish the approximate universal fitness of an optimizer, we calculate its fitness with respect to randomly generated functions drawn from the pool of all computable functions.

Given a domain X and a universal model, let P be the set of all parameters for the model that deterministically specify F, the set of all functions f : X → |R that the model can compute. The dynamics of the model define a map from elements of P to elements of F.

Note the parameter set of a single model can be mapped to both optimizers and optimization problems. In practice, it is probably easier to have a single universal model that serves both purposes.

In order to use evolution to optimize in the parameter space of the model, it must be possible to define evolutionary operations that act on sets of parameters to produce new parameters with mutations or mating. Particle swarm optimization has the more stringent requirement that the parameters of the model be real numbers. These considerations need to be taken into account when choosing a universal model. Neural networks, which often have a matrix of real valued weights as parameters are a natural choice as a universal model for meta optimization. C++ programs, which have text source, would not be a good universal model for meta optimization, because it would be difficult to write an optimization algorithm that acts on C++ source code.

Universal Meta Optimization

In the search for a better universal optimizer, the search space is all computable optimizers. There is a bijection between this set and the parameters of a universal model. We can measure the universal fitness of any given optimizer. Thus we can frame the task of finding a computable optimizer with a high universal fitness as an optimization problem in which the domain is the set of all parameters for a universal model and the function being optimized quantifies how good the optimizer corresponding to a particular set of parameters is at optimizing a well distributed random subset of the computable functions. Given a particular domain X, a universal model with possible parameters P that map to optimizers O and a set of randomly generated functions F, we wish to find elements p ∈ P that maximize u : P → |R. The function u can be optimized using already existing optimization algorithms such as evolution. Doing so could theoretically produce more efficient optimization algorithms.

If by maximizing u, we can find an optimizer that can optimize functions on the same domain as its parameters, then it can be used to optimize its own performance. To do so, simply present it with the optimization problem u. For example if our model is a neural network with parameters in |Rn and we find a set of parameters that corresponds to a good universal optimizer on the domain |Rn, then the optimizer can optimize in the domain of its own parameters and can therefore be used to find other optimizers.

Calculating u(p) for some p ∈ P can be a computationally intensive process. In order to get an accurate measure of the performance of the optimizer corresponding to p, it may be necessary to test it against a large number of different optimization problems and to the test long term performance of the optimizer with many guesses on each problem. If the number of guesses optimizers are allowed to make and the number of optimization problems each optimizer is tested against are arbitrarily chosen and remain fixed, then the process of finding optimizer truly is equivalent to optimizing u : P → |R. This is desirable when we want to use optimizers to optimize themselves. However, there is no good way of knowing in advance how many tests will need to be done in order to calculate the fitness of optimizers well enough to facilitate evolution. As evolution progresses and slows, it becomes necessary to know the fitness of each optimizer with more precision. An estimation of the fitness of each optimizer can be kept and dynamically refined. For example, the first time evolution guesses an optimizer, it could be evaluated against 100 optimization problems. Every subsequent time that that optimizer is used as a parent for another optimizer, it is evaluated against an additional 10 optimization problems. In this way, optimizers that are used as the basis of future optimizers are more thoroughly evaluated than optimizers that are identified as having low fitness.

Universal meta optimization is a kind of meta learning, which Jurgen Schmidhuber has been researching since 1987. Meta learning is the process of using learning algorithms to create other learning algorithms. Working with Jurgen Schmidhuber, Sepp Hochreiter used LSTM [5] neural nets as a model that enabled meta learning [6].

Conclusion

A Universal optimizer maximizes the output of a set of functions. Universal models can be used to represent any computable optimizer or optimization problem. Universal meta optimization offers a way to find better universal optimization algorithms including optimizers that optimize themselves. Since optimization is used in a large variety of AI problems, faster optimization algorithms are very useful.

References

[1] Neural Network Learning using Particle Swarm Optimizers by Matt Settles and Bart Rylander.

[2] Particle Swarm, Genetic Algorithm, and their Hybrids: Optimization of a Profiled Corrugated Horn Antenna by Jacob Robinson, Seelig Sinton, and Yahya Rahmat-Samii.

[3] General-Purpose Computation with Neural Networks: A Survey of Complexity Theoretic Results by Siri Sima and Pekka Orponen 2003.

[4] Turing Universality of Neural Nets (Revisited) by J. Pedro Neto, Hava T. Siegelmann, J. Felix Costa, and C.P.Suarez Araujo 1999.

[5] Long Short Term Memory by Sepp Hochreiter and Jurgen Schmidhuber 1995.

[6] A. S. Younger, S. Hochreiter, and P. R. Conwell. Meta-Learning with Backpropagation. Proceedings of the International Joint Conference on Neural Networks (IEEE-2001). pp. 2001-2006, 2001

Submitted: 07/07/2004

Article content copyright © Forrest Briggs, 2004.
 Article Toolbar
Print
BibTeX entry

Search

Latest News
- Generation5 10-year Anniversary (03/09/2008)
- New Generation5 Design! (09/04/2007)
- Happy New Year 2007 (02/01/2007)
- Where has Generation5 Gone?! (04/11/2005)
- NeuroEvolving Robotic Operatives (NERO) (25/06/2005)

What's New?
- Back-propagation using the Generation5 JDK (07/04/2008)
- Hough Transforms (02/01/2008)
- Kohonen-based Image Analysis using the Generation5 JDK (11/12/2007)
- Modelling Bacterium using the JDK (19/03/2007)
- Modelling Bacterium using the JDK (19/03/2007)


All content copyright © 1998-2007, Generation5 unless otherwise noted.
- Privacy Policy - Legal - Terms of Use -