Biological modelling with P systems
- Francisco J. Romero Campero*, School of Computer Science and IT, University of Nottingham, UK
- Daniela Besozzi, Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano
Introduction
Multi-cellular organisations constitute complex systems that function at different spatio-temporal scales. Typical networks of protein-protein and protein-gene interactions, such as signal transduction pathways and transcriptional networks, operate at the molecular level within a time scale of seconds or minutes. Single-cell processes such as cellular growth and division take minutes to hours to complete [ Harvey Lodish et al (2008) ], whereas developmental processes such as the emergence of territories of gene expression in a developing embryo work at the level of the entire population of cells, on time scales ranging from hours to days [ E.H. Davidson (2006) ]. In this respect, two recently introduced disciplines, systems biology [ T. Ideker et al (2001) ] and synthetic biology [ S.A. Benner, A.M. Sismour (2005) ], follow an integrative approach to the study, design and construction of biological, in particular cellular, phenomena as complex multi-scale systems. Because of the complexity of multi-cellular systems, human intuition is not sufficient to comprehend their functioning. This gives mathematical and computational modelling a central role in the study of cellular systems, from two complementary aspects [ Z. Szallasi et al. (2006) ]. On the one hand, in systems biology, modelling assists in the generation of new hypotheses about the functioning of cellular systems and in the design of experiments to validate or refute them. On the other hand, in synthetic biology, models are used as blueprints of candidate artificial cellular systems, helping to assess their risks, costs and benefits prior to their construction in the wet lab.
The applicability to multi-cellular systems of the classical modelling methodology for complex systems based on differential equations (DE) is questionable [ D.T. Gillespie (1977), D.T. Gillespie (2007) ], because cellular systems are inherently discrete and stochastic whereas DE models assume continuity and determinism. Additionally, DE models are not easily integrated, extended or modified, which hinders the collaborative development of complex models. In this respect, novel modelling frameworks based on computational paradigms such as process calculi, Petri nets and membrane computing are emerging. These approaches provide formal machine-readable specifications of cellular systems that naturally represent discrete cellular entities as abstract computational structures and the interactions between them as instructions or rules. Models of this type are simulated by applying operational semantics or algorithms to the specifications that compose them. Accordingly, they are termed executable or algorithmic biology models, see [ J. Fisher and T.A. Henzinger (2007) ] and [ C. Priami (2009) ].
In this article we discuss a modelling framework within executable/algorithmic biology based on membrane computing, termed stochastic P systems. Our approach allows the specification of the compartmentalised structure of individual cells as well as the spatial arrangement of populations of cells in colonies or tissues. The specifications in this framework are modular, which facilitates parsimonious model development and easy integration of partial models.
Stochastic P systems
The abstract computational structure used in our modelling framework to represent the structure and functioning of a single cell is a variant of membrane computing called stochastic P systems [ G. Paun and F.J. Romero-Campero (2008) ]. More specifically, a stochastic P system, SP system for short, is a rule-based specification of a compartmentalised discrete and stochastic dynamical system. Formally, an SP system is given by a tuple of the following form:
\(\tag{1} \mathcal{SP} = (M,\mu,L,I_{l_1},\dots,I_{l_n},R_{l_1},\dots,R_{l_n}) \)
where:
- \(M\) is a finite set of strings identifying the molecular species involved in the system such as genes, RNAs, proteins and molecular signals.
- \(\mu\) is a membrane structure defining the compartments of the system. Compartments can contain other compartments. There must exist an outermost membrane which defines the compartment that contains all the others, acting as the boundary of the system. The membrane structure can be represented graphically using a Venn diagram as in Figure 1.
- \(L = \{ l_1, \dots, l_n \}\) is a finite set of labels identifying the compartments defined by \(\mu\ ,\) e.g. nucleus, mitochondrion, cytoplasm, etc.
- \(I_{l_k}\) for each \(1 \leq k \leq n\) is the initial condition of the compartment identified by the label \(l_k\ .\) It consists of a multiset of objects over \(M\) describing the initial number of molecules of each molecular species present in the corresponding compartment at the initial state of the computation or evolution of the system.
- \(R_{l_k} = \{ r_1^{l_k}, \dots, r_{m_{l_k}}^{l_k} \}\) for each \(1 \leq k \leq n\) is a set of multiset rewriting rules describing the molecular interactions, such as complex formation, protein binding to a gene and signal diffusion, taking place inside and between the compartments of the system. Each set of rewriting rules \(R_{l_k}\) is specifically associated with the compartment identified by the label \(l_k\ .\) These multiset rewriting rules are of the following form:
\(\tag{2} r_i^{l_k} \; : \; o_1 \; [ \; o_2 \; ]_l \stackrel{c_i^{l_k}}{\rightarrow} o_1' \; [ \; o_2' \; ]_l \)
where \(o_1,o_2\) and \(o_1',o_2'\) are multisets of objects (possibly empty) over \(M\) representing the molecular species consumed and produced in the corresponding molecular interaction. The square brackets and the label \(l\) describe the compartment involved in the interaction. An application of a rule of this form changes the content of the membrane with label \(l\) by replacing the multiset \(o_2\) with \(o_2'\) and the content of the membrane outside by replacing the objects \(o_1\) with \(o_1'\ .\) A stochastic constant \(c_i^{l_k}\) is specifically associated with each rule in order to determine the probability of applying the rule and the time elapsed between rule applications according to our stochastic semantics.
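The effect of a rule of form (2) can be illustrated with a minimal Python sketch. It covers only a single membrane and its immediate surroundings rather than a full membrane structure \(\mu\ ,\) and the names `Rule`, `apply_rule`, `propensity` and `gillespie_step` are hypothetical. The source does not spell out the stochastic semantics, so the mass-action propensity and the Gillespie-style selection step below are assumptions in the spirit of [ D.T. Gillespie (1977) ], not the authors' algorithm.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Rule:
    """One rule of form (2): o1 [ o2 ]_l -> o1' [ o2' ]_l with constant c."""
    label: str          # l, the compartment named by the brackets
    outer_in: Counter   # o1, consumed outside membrane l
    inner_in: Counter   # o2, consumed inside membrane l
    outer_out: Counter  # o1', produced outside membrane l
    inner_out: Counter  # o2', produced inside membrane l
    constant: float     # stochastic constant c


def contains(content, needed):
    """True if the multiset `content` includes the multiset `needed`."""
    return all(content[obj] >= n for obj, n in needed.items())


def apply_rule(rule, outer, inner):
    """Replace o1 by o1' outside and o2 by o2' inside, in place."""
    if not (contains(outer, rule.outer_in) and contains(inner, rule.inner_in)):
        raise ValueError("rule is not applicable to the current contents")
    outer.subtract(rule.outer_in)
    outer.update(rule.outer_out)
    inner.subtract(rule.inner_in)
    inner.update(rule.inner_out)


def propensity(rule, outer, inner):
    """ASSUMPTION: mass-action propensity, c times the number of ways
    of choosing the reactant molecules from the current contents."""
    ways = 1
    for content, needed in ((outer, rule.outer_in), (inner, rule.inner_in)):
        for obj, n in needed.items():
            ways *= math.comb(content[obj], n)
    return rule.constant * ways


def gillespie_step(rules, outer, inner, rng=random):
    """ASSUMPTION: one Gillespie-style step. Returns the waiting time,
    or None when no rule can be applied."""
    weights = [propensity(r, outer, inner) for r in rules]
    total = sum(weights)
    if total == 0:
        return None
    waiting_time = rng.expovariate(total)
    chosen = rng.choices(rules, weights=weights, k=1)[0]
    apply_rule(chosen, outer, inner)
    return waiting_time


# Hypothetical example: a signal s diffusing out of a cell, [ s ]_cell -> s [ ]_cell
diffuse_out = Rule("cell", Counter(), Counter(s=1), Counter(s=1), Counter(), 0.5)
environment, cell = Counter(), Counter(s=3)
apply_rule(diffuse_out, environment, cell)
```

After the single application above, one copy of `s` has moved from `cell` to `environment`. Calling `gillespie_step` repeatedly would generate a stochastic trajectory under the stated assumptions.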
Stochastic P systems have been used to model signal transduction pathways [ A. Paun et al. (2006) ], bacterial gene regulation [ F.J. Romero-Campero, M.J. Perez-Jimenez (2008a) ] and bacterial populations [ F.J. Romero-Campero, M.J. Perez-Jimenez (2008b) ].
Modularity in Stochastic P systems
A P system module is identified with a name, \(Mod\ ,\) and three finite ordered sets of variables \(O = \{O_1, ..., O_x \}\ ,\) \(C = \{C_1, ..., C_y \}\) and \(Lab = \{L_1, ..., L_z \}\) and it consists of a finite set of rewriting rules of the form in (2):
\(\tag{3} Mod(O, C, Lab) = \{r_1, ..., r_n \} \)
The objects, stochastic constants and labels of the rules in module \(Mod\) can contain variables from \(O\ ,\) \(C\) and \(Lab\) respectively. These variables can be instantiated with specific molecular species names, numerical values for the stochastic constants and compartment names. The instantiation of a module \(Mod(O, C, Lab)\) with specific values \(o = \{o_1, ..., o_x \}\ ,\) \(c = \{c_1, ..., c_y \}\) and \(lab = \{l_1, ..., l_z \}\) for \(O\ ,\) \(C\) and \(Lab\) respectively is represented as:
\(\tag{4} Mod(\{o_1, ..., o_x \}, \{c_1, ..., c_y \}, \{l_1, ..., l_z \}) \)
the rules are obtained by applying the corresponding substitutions \(O_1 = o_1, ..., O_x = o_x , C_1 = c_1, ..., C_y = c_y\) and \(L_1 = l_1, ..., L_z = l_z\ .\)
Our definition of P system module allows the hierarchical description of a complex module, \(M(O,C,Lab)\ ,\) by obtaining its rules as the set union of simpler modules, \( M(O,C,Lab) = M_1(O_1,C_1,Lab_1) \cup ... \cup M_n(O_n,C_n,Lab_n) \) with \(O=O_1 \cup ... \cup O_n\ ,\) \(C=C_1 \cup ... \cup C_n\) and \(Lab = Lab_1 \cup ... \cup Lab_n\ .\) Finally, the set of rules, \(R_{l_k}\ ,\) in SP systems can be specified in a modular way as the set union of several instantiated P system modules, \(R_{l_k} = M_1(o_1, c_1, lab_1) \cup ... \cup M_{n_k}(o_{n_k}, c_{n_k}, lab_{n_k})\ .\)
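Module instantiation and the set union of instantiated modules can be sketched in Python as follows. This is an illustration under simplifying assumptions: multisets are written as tuples with repetition, variable names in \(O\ ,\) \(C\) and \(Lab\) are assumed to be pairwise distinct, and the modules `Prod` and `Deg` together with all species names and constants are hypothetical examples that do not come from any published library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A rule of form (2); entries are variable names or concrete values."""
    outer_in: tuple   # o1
    inner_in: tuple   # o2
    outer_out: tuple  # o1'
    inner_out: tuple  # o2'
    label: str        # l
    constant: object  # a variable name before instantiation, a number after


@dataclass(frozen=True)
class Module:
    name: str
    objects: tuple    # O, ordered object variables
    constants: tuple  # C, ordered stochastic constant variables
    labels: tuple     # Lab, ordered label variables
    rules: tuple      # rule templates written over those variables

    def instantiate(self, o, c, lab):
        """Apply the substitutions O_i = o_i, C_i = c_i and L_i = l_i."""
        expected = (len(self.objects), len(self.constants), len(self.labels))
        if (len(o), len(c), len(lab)) != expected:
            raise ValueError("wrong number of values for module " + self.name)
        formal = self.objects + self.constants + self.labels
        actual = tuple(o) + tuple(c) + tuple(lab)
        substitution = dict(zip(formal, actual))

        def sub(x):
            # Anything that is not a declared variable is left unchanged.
            return substitution.get(x, x)

        return frozenset(
            Rule(
                tuple(map(sub, r.outer_in)),
                tuple(map(sub, r.inner_in)),
                tuple(map(sub, r.outer_out)),
                tuple(map(sub, r.inner_out)),
                sub(r.label),
                sub(r.constant),
            )
            for r in self.rules
        )


# Hypothetical modules: production [ G ]_l -> [ G P ]_l, degradation [ X ]_l -> [ ]_l
production = Module(
    "Prod", ("G", "P"), ("c_prod",), ("l",),
    (Rule((), ("G",), (), ("G", "P"), "l", "c_prod"),),
)
degradation = Module(
    "Deg", ("X",), ("c_deg",), ("l",),
    (Rule((), ("X",), (), (), "l", "c_deg"),),
)

# R_cytoplasm as the set union of two instantiated modules.
rules_cytoplasm = (
    production.instantiate(("geneA", "proteinA"), (2.0,), ("cytoplasm",))
    | degradation.instantiate(("proteinA",), (0.1,), ("cytoplasm",))
)
```

Because instantiated modules are plain sets of rules, the same module can be instantiated several times with different species, constants or compartments and combined with `|`, mirroring the set union used in the text.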
The use of modularity allows us to define libraries or collections of modules:
\(\tag{5} Lib = \{Mod_1(O_1, C_1, Lab_1), ..., Mod_m(O_m, C_m, Lab_m)\} \)
Modules from different libraries can be instantiated with multiple specific molecular species names, stochastic constants and compartment names, so they can be reused in different SP system models of cellular systems. In systems biology, a module is reused where the behaviour it produces is known to be present in the system. In synthetic biology, it is reused where we want to engineer the system to exhibit the specific function performed by that module. Modularity also allows us to develop our models in a parsimonious way by specifying sub-models independently and easily integrating them subsequently.
Specification of Spatial Distribution: Lattice Population P systems
Finite Point Lattice:
Given \(\mathcal{B} = \{v_1, ..., v_n \}\) a list of linearly independent basis vectors and a list of integer bounds \((\alpha_1^{min}, \alpha_1^{max}, ..., \alpha_n^{min} , \alpha_n^{max})\) a finite point lattice, or lattice for short, \(Lat\) in \(R^n\) denoted as:
\(\tag{6} Lat = (\mathcal{B}, (\alpha_1^{min}, \alpha_1^{max}, ..., \alpha_n^{min} , \alpha_n^{max})) \)
is the collection of regularly distributed points, \(P(Lat)\ ,\) obtained as follows:
\(\tag{7} P(Lat)=\{ \sum_{i=1}^{n} \alpha_{i} v_{i} : \forall i=1, ..., n (\alpha_i \in Z \wedge \alpha_i^{min} \leq \alpha_i \leq \alpha_i^{max} ) \} \)
Given a finite point lattice, \(Lat\ ,\) each point \(x =\sum_{i=1}^{n} \alpha_{i} v_{i} \in P(Lat)\) is uniquely identified by the coefficients \(\{\alpha_i : i=1, ..., n\}\) and consequently it will be denoted as \(x=(\alpha_1, ..., \alpha_n)\ .\) In order to represent a population of cells, each cell type with its compartmentalised structure, characteristic molecular species and molecular processes is represented using an SP system as defined in (1). The rules of each SP system are possibly specified in a modular way using P system modules as defined in (3).
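The point set \(P(Lat)\) in (7) can be enumerated directly from the basis and the integer bounds. The following Python sketch is one way to do so. The function name `lattice_points` is hypothetical, and the bounds are passed as a list of (min, max) pairs rather than the flat list used in (6).

```python
from itertools import product


def lattice_points(basis, bounds):
    """Enumerate P(Lat) for basis vectors v_1..v_n and integer bounds.

    basis  : list of n vectors, each a tuple of coordinates
    bounds : list of n pairs (alpha_min, alpha_max), one per basis vector
    Returns a dict mapping the coefficients (alpha_1, ..., alpha_n) to the
    coordinates of the point sum_i alpha_i * v_i.
    """
    if len(basis) != len(bounds):
        raise ValueError("one (min, max) pair is needed per basis vector")
    dim = len(basis[0])
    ranges = [range(lo, hi + 1) for lo, hi in bounds]  # bounds are inclusive
    points = {}
    for alphas in product(*ranges):
        points[alphas] = tuple(
            sum(a * v[d] for a, v in zip(alphas, basis)) for d in range(dim)
        )
    return points


# A 3 x 2 rectangular lattice in the plane.
square = lattice_points([(1, 0), (0, 1)], [(0, 2), (0, 1)])
```

Keying the result by the coefficient tuple matches the convention in the text of denoting each point by \((\alpha_1, ..., \alpha_n)\ .\) A non-orthogonal basis, for instance one producing a hexagonal arrangement, only changes the coordinates stored as values.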
Lattice Population P system
A lattice population P system, LPP system for short, is a formal specification of an ensemble of cells distributed according to a specific geometric disposition given by the following tuple:
\(\tag{8} LPP = (Lat, (SP_1, ..., SP_p), Pos, (T_1, ..., T_p )) \)
where
- \(Lat\) is a finite point lattice in \(R^n\) (typically n=1, 2, 3) as in (6) describing the geometry of the ensemble of cells.
- \( SP_1, ..., SP_p \) are SP systems as in (1) specifying the different cell types in the population.
- \( Pos : P (Lat) \rightarrow \{ SP_1, ..., SP_p \}\) is a function assigning to each lattice position a clone of one of the SP systems \(SP_1, ..., SP_p\ .\) Please note that the mapping can be deterministic or randomised.
- \(T_k = \{r_1, ..., r_{n_k} \}\) for each \(1 \leq k \leq p\) is a finite set of rewriting rules termed translocation rules that are added to the skin membrane of the respective SP system \(SP_k\) in order to allow the interchange of objects between SP systems located at different points in the lattice. These rules are of the following form:
\( r_i^k \; : \; [ \; obj \; ]_l \stackrel{\mathbf{v}}{\bowtie} [ \; \; ] \stackrel{c_i^k}{\rightarrow} [ \; \; ]_l \stackrel{\mathbf{v}}{\bowtie} [ \; obj \; ] \)
where \(obj\) is a multiset of objects, \(\mathbf{v}\) is a vector in \(R^n\) and \(c_i^k\) is the stochastic constant used in our algorithm to determine the dynamics of rule applications. The application of a rule of this form in the skin membrane with label \(l\) of the SP system \(SP_k\) located at the point \(p\ ,\) \(Pos(p) = SP_k\ ,\) removes the objects \(obj\) from this membrane and places them in the skin membrane of the SP system \(SP_{k'}\) located at the point \(p + \mathbf{v}\ ,\) \(Pos(p + \mathbf{v}) = SP_{k'}\ .\)
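The effect of a single translocation rule application can be sketched in Python as follows. Several assumptions are made for illustration: lattice points are identified by their integer coefficient tuples, the vector \(\mathbf{v}\) is expressed in the same coefficients, only the skin membrane contents are tracked, and a rule pointing outside the lattice is treated as not applicable, a boundary behaviour the text does not specify. The function name `translocate` is hypothetical, and the stochastic timing governed by \(c_i^k\) is omitted.

```python
from collections import Counter


def translocate(skins, p, v, obj):
    """Move the multiset `obj` from the skin at point p to the skin at p + v.

    skins : dict mapping lattice coefficients to the Counter holding the
            contents of the skin membrane of the cell placed there
    Returns True if the rule was applied, False if it was not applicable.
    """
    target = tuple(a + b for a, b in zip(p, v))
    if p not in skins or target not in skins:
        return False  # ASSUMPTION: no cell at p or p + v means no move
    source = skins[p]
    if any(source[o] < n for o, n in obj.items()):
        return False  # not enough objects in the source skin membrane
    source.subtract(obj)
    skins[target].update(obj)
    return True


# Two neighbouring cells on a one-row lattice; `signal` is a hypothetical species.
skins = {(0, 0): Counter(signal=2), (1, 0): Counter()}
moved = translocate(skins, (0, 0), (1, 0), Counter(signal=1))
blocked = translocate(skins, (0, 0), (0, 1), Counter(signal=1))
```

Here the first call moves one `signal` object to the neighbouring cell, while the second call targets a position with no cell and leaves both multisets unchanged.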
An example: Waves of gene expression
References
- S.A. Benner, A.M. Sismour (2005). Synthetic biology. Nature Reviews Genetics, 6:533–543.
- E.H. Davidson (2006). The Regulatory Genome: Gene Regulatory Networks In Development And Evolution. Academic Press; 2nd Revised edition.
- H. Lodish et al (2008). Molecular Cell Biology. W. H. Freeman; 6th edition.
- J. Fisher and T.A. Henzinger (2007). Executable cell biology. Nature Biotechnology, 25(11):1239–1249.
- D.T. Gillespie (1977). Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry, 81(25):2340–2361.
- D.T. Gillespie (2007). Stochastic simulation of chemical kinetics. Annual Review of Physical Chemistry, 58:35–55.
- T. Ideker et al (2001). A new approach to decoding life: systems biology. Annual Review of Genomics and Human Genetics, 2:343–372.
- G. Paun and F.J. Romero-Campero (2008). Membrane computing as a modeling framework: cellular systems case studies. Lecture Notes in Computer Science, 5016:168–214.
- A. Paun et al. (2006). Modeling signal transduction using P Systems. Lecture Notes in Computer Science, 4361:100–122.
- C. Priami (2009). Algorithmic systems biology. Communications of the ACM, 52(5):80–88.
- F.J. Romero-Campero, M.J. Perez-Jimenez (2008a). Modelling gene expression control using P systems: The lac operon, a case study. BioSystems, 91(3):438–457.
- F.J. Romero-Campero, M.J. Perez-Jimenez (2008b). A model of the quorum sensing system in Vibrio fischeri using P systems. Artificial Life, 14(1):95–109.
- Z. Szallasi et al. (2006). System Modelling in Cellular Biology: From Concepts to Nuts and Bolts. MIT Press.