by A/Prof. Peter Tino, University of Birmingham, U.K.
State space modelling is a popular approach to processing data with temporal dependencies. There is a huge variety of state space models, but they all share the same common underlying modelling principle: Collapse all the ‘necessary’ information about the past into an ‘abstract’ information processing state, which can be updated recursively as more and more data arrives. Outputs of such models are then determined based on the state information (representing the past) and the current data.
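The shared principle above can be sketched in a few lines. This is an illustrative sketch only; the function names (run_model, step, readout) are made up for the example and do not come from any particular model class:

```python
# A state space model collapses the past into a state that is updated
# recursively; each output depends on the state and the current input.

def run_model(inputs, state0, step, readout):
    """Drive a generic state space model over an input sequence."""
    state, outputs = state0, []
    for u in inputs:
        state = step(state, u)          # fold the new input into the state
        outputs.append(readout(state, u))
    return outputs

# Toy instantiation: the state is a running sum of the inputs (a summary
# of the past); the output compares the current input with that summary.
outs = run_model([1, 2, 3], 0,
                 step=lambda s, u: s + u,
                 readout=lambda s, u: s - u)
print(outs)  # [0, 1, 3]
```

All the model classes discussed below can be read as choices of state space, step map and readout in this scheme.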
In machine learning, state space models take on many forms in various contexts, depending on e.g.
- state space nature and cardinality (e.g. finite number of states, discrete state space, continuous state space)
- whether temporal mapping (inputs → outputs) or input series modelling is considered (supervised vs. unsupervised learning)
- model formulation framework – e.g. probabilistic vs. non-probabilistic

Some well-known model classes include:
- finite state machines – whether with probabilistic or non-probabilistic transitions. These models have a finite number of possible states and finite alphabets of possible inputs/outputs.
- recurrent neural networks and Echo State Networks – parameterized state space models using model structures of artificial neural networks. The state space is infinite (uncountable), and inputs/outputs can be continuous or discrete.
- hidden Markov models – probabilistic formulations of state space models with a finite number of states and discrete/continuous observations.
- recursive extensions of self-organizing maps – topographic maps of data operating with additional state information representing the past.
- (extended) Kalman filters – probabilistic formulations of state space models with continuous state space and typically continuous inputs/outputs.
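As one concrete instance of the taxonomy above — a probabilistic model with continuous state space and linear-Gaussian dynamics — a single Kalman filter predict/update step can be sketched as follows. The matrix names (A, C, Q, R) follow common textbook convention and the numbers are made up for illustration:

```python
import numpy as np

def kalman_step(x, P, y, A, C, Q, R):
    """One predict/update step of a Kalman filter.

    x, P : previous state mean and covariance
    y    : current observation
    A, C : state transition and observation matrices
    Q, R : process and observation noise covariances
    """
    # Predict: propagate the state estimate through the linear dynamics.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update: correct the prediction using the new observation.
    S = C @ P_pred @ C.T + R                # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new

# 1-D random walk observed with noise: the filtered estimate moves
# from the prior mean (0) towards the observation (1).
x, P = kalman_step(np.array([0.0]), np.eye(1), np.array([1.0]),
                   A=np.eye(1), C=np.eye(1),
                   Q=0.1 * np.eye(1), R=0.1 * np.eye(1))
```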
In the tutorial I will first present a unified general view of all such systems as non-autonomous input-driven dynamical systems. I will present several possible taxonomies of state space models along the lines outlined above.
The distinction between observable and unobservable state spaces will be drawn, and its implications for parameter adaptation will be explained. Models with unobservable states (e.g. recurrent neural networks, hidden Markov models) are more difficult to learn, and it is important to clarify whether, given the application at hand, one really needs the full power of general state space modelling, or whether observable state space models such as AR models, Markov chains, prediction suffix trees, etc. would suffice.
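The practical payoff of an observable state space can be made concrete: for a first-order Markov chain over symbols, maximum-likelihood fitting reduces to counting transitions, with no latent-variable machinery needed. A minimal sketch, in which the additive (Laplace-style) smoothing constant alpha is one common regularization for insufficient data:

```python
from collections import defaultdict

def fit_markov_chain(sequence, alphabet, alpha=1.0):
    """ML estimate of first-order transition probabilities from counts,
    with additive (Laplace) smoothing controlled by alpha."""
    counts = defaultdict(lambda: defaultdict(float))
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1.0
    probs = {}
    for s in alphabet:
        total = sum(counts[s].values()) + alpha * len(alphabet)
        probs[s] = {t: (counts[s][t] + alpha) / total for t in alphabet}
    return probs

P = fit_markov_chain("ababb", alphabet="ab", alpha=0.5)
# In the training sequence 'a' is always followed by 'b',
# so P['a']['b'] dominates P['a']['a'].
```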
Having covered the general part of the tutorial, I will explain in detail the different model formulations and the adaptation of their free parameters, always making links to the general principles outlined in the first part of the tutorial. To cover as broad a spectrum of modelling choices as possible, I will concentrate on:
- fixed order Markov chains and variable memory length Markov models – examples of probabilistic models with finite observable state space. I will describe their fitting to the data, smoothing regularizations in cases of insufficient data and efficient implementations in terms of fractal prediction machines.
- hidden Markov models – simple examples of probabilistic models with unobservable finite state space. I will explain how the problem of unobservable state space manifests itself in parameter estimation – latent variable modelling with e.g. EM-style maximum likelihood estimation and an exponential explosion of possible state paths – and will show how this can be dealt with efficiently.
- recurrent neural networks – non-probabilistic parameterized state space models with continuous unobservable state space. I will explain how the problem of continuous unobservable state space manifests itself in parameter estimation – the information latching problem. I will also introduce basic training procedures for recurrent nets (real-time recurrent learning and back-propagation through time) with their recent extensions, such as Kalman-filter-based smoothing of parameter updates. Time permitting, I will also introduce the LSTM model.
- reservoir models – an important subclass of recurrent networks with a non-trainable dynamic part (the reservoir). I will explain the motivation (dealing with the information latching problem of recurrent nets) and theoretical advances, as well as successful applications in this (still very active) area of research.
- recSOM – a recursive extension of the self-organizing map. I will explain theory shedding light on the limitations of this kind of approach. These models will be contrasted with probabilistic options, such as GTM Through Time.
- Kalman filters – probabilistic models with linear state transitions and state readouts under the assumption of Gaussian noise. The model formulation and fitting will be presented in the light of parameter fitting of hidden Markov models and recurrent networks, emphasizing the differences stemming from the model formulation.

The tutorial will be concluded by reviewing some applications in:
- computational finance – prediction suffix trees and recurrent networks used in options trading through volatility forecasts.
- topographic mapping of chorales by J.S. Bach and fluxes from binary star complexes – probabilistic formulations of topographic maps for data with temporal dependencies.
- language modelling – variable length Markov models and fractal prediction machines as probabilistic models over symbol streams.
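To illustrate one of the points above — the exponential explosion of possible state paths in hidden Markov models — the standard remedy is the forward recursion, which sums over states at every step rather than enumerating paths. A schematic sketch with made-up two-state parameters (pi, A, B are illustrative, not from the tutorial):

```python
import numpy as np

def hmm_likelihood(obs, pi, A, B):
    """Forward algorithm: P(observation sequence) for a discrete HMM.

    pi : initial state distribution, shape (K,)
    A  : transition matrix, shape (K, K), A[i, j] = P(state j | state i)
    B  : emission matrix, shape (K, M), B[i, o] = P(symbol o | state i)

    Summing over states at each step costs O(T * K^2) in total,
    instead of enumerating all K**T state paths.
    """
    alpha = pi * B[:, obs[0]]           # joint prob. of state and 1st symbol
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then absorb next symbol
    return alpha.sum()

# Made-up two-state HMM over a binary observation alphabet.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
p = hmm_likelihood([0, 1, 0], pi, A, B)  # a probability in (0, 1)
```

Summing this likelihood over all possible observation sequences of a fixed length gives exactly 1, a useful sanity check on any implementation.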
We expect that the tutorial will be of interest to a wide variety of AI-2013 attendees, in particular
- theoreticians and practitioners working with data exhibiting temporal dependencies,
- researchers interested in dynamical systems and their applications as modelling tools, and
- young researchers (PhD students, fresh post-docs) possibly perplexed by the variety of modelling frameworks for temporally dependent data, who would benefit from the unifying viewpoint on which this tutorial is based.

NOTE: No prior knowledge is required. Key concepts will be illustrated in an accessible way.
Peter Tino’s Biography
Peter Tino has direct and extensive research experience with theoretical and practical issues related to (probabilistic and non-probabilistic) modelling and learning of temporal data, in the context of both supervised and unsupervised learning. He has published in Neural Computation, IEEE Trans. on Pattern Analysis and Machine Intelligence, IEEE Trans. on Neural Networks, IEEE Trans. on Evolutionary Computation and Machine Learning. (See full list of publications.)
Peter is an associate editor of Scientific Reports (Nature Publishing), IEEE Transactions on Neural Networks and Learning Systems (IEEE), Pattern Analysis and Applications (Springer) and Neural Processing Letters (Springer). He has co-chaired several program committees of international conferences, and has served as area chair for recurrent networks at ICANN 2011, as Vice Chair of the Neural Networks Technical Committee of the IEEE Computational Intelligence Society, and as a member of several other IEEE Computational Intelligence Society Technical Committees.
Peter has received several research awards such as IEEE Transactions on Neural Networks Outstanding Paper Award (1998, 2011), IEEE Transactions on Evolutionary Computation Outstanding Paper Award (2010) and UK-Hong Kong Fellowship for Excellence.