Bayesian networks (BBNs) [139], also known as causal probabilistic networks, represent one of the most successful advances of artificial intelligence (AI). Born at the confluence of statistics and AI, BBNs provide a sound way to discover models of interaction among variables. When applied to functional genomics, they offer the opportunity to discover complex pathways of activation and interaction among genes. Although still at an early stage, the applications of BBNs to functional genomics already display the common problems statistical methods encounter in this realm: a large number of variables measured over a small number of cases, and the consequent underdetermination of the models learnable from the data. Nonetheless, because BBNs provide at least the potential of determining genetic networks on the basis of probability theory, and therefore all the desiderata that flow from such a grounding, they are attracting increasing interest in this field. This section first reviews the basic concepts underpinning BBNs and then describes the early attempts to apply this methodology to functional genomics.

Bayesian networks Bayesian methods provide a sound and flexible framework to reason about the interplay between conditions and effects under uncertainty. The theoretical core of bayesian theory, the well-known Bayes' theorem, is built around the intuition that the path leading to an effect can be reversed, so that a probability can be assigned to the set of possible conditions responsible for that effect [148]. The resulting approach also provides a straightforward way to integrate human-derived domain knowledge with domain information coming from field operations (i.e., empirical biology) or simulated data.

The bayesian approach shares, with all sound knowledge representation formalisms based on probability theory, two main drawbacks. The first is representational in nature and arises from the fact that a naïve encoding of knowledge as a joint probability distribution over all the variables in the domain is very difficult to acquire from human experts. The second is computational, but stems from the same naïve interpretation of the probabilistic representation: the probability distribution over the domain variables, so difficult to acquire from experts, can grow so large as to be very costly to store and very expensive to use for reasoning.[15]

Blending together probability and graph theory, BBNs provide a way to render the probabilistic representation of knowledge representationally amenable and computationally feasible. A BBN uses conditional independence assumptions encoded by a Directed Acyclic Graph (DAG) to break down an otherwise unmanageable joint probability distribution over the domain variables into a set of smaller components, easier to define and cheaper to use. In a BBN, the nodes of the graph represent variables and the directed arcs represent stochastic dependencies among the variables. As a simple example, consider the scenario displayed in figure 4.19, composed of three variables, Initial_Condition, External_Action, and Effect, each having the two states "True" and "False." The joint effect of the two parent variables on Effect is represented by the DAG in figure 4.19, with the directed links from the variables Initial_Condition and External_Action pointing to the variable Effect. Following the direction of the arrows, we call the variable Effect a child of Initial_Condition and External_Action, which are its parents.

Figure 4.19: Example of a bayesian network. This network describes the impact of Initial_Condition and External_Action on Effect. This network is equivalent to the underlying joint probability distribution over the three variables, shown in table 4.1. In particular, figure 4.19 shows a BBN representing the combined action of a quite frequent Initial_Condition (90% chance of being true) and a rarely taken External_Action (20% chance of being true) on an Effect. The probability table associated with the node Effect shapes the interplay between Initial_Condition and External_Action so that, when only the Initial_Condition is in place, the Effect will be present for sure, unless the External_Action is deployed to dilute the impact of the Initial_Condition. This mechanism is described by the fact that, when Initial_Condition is true and External_Action is false, the probability that the Effect is true is 1, whereas when Initial_Condition is true and External_Action is also true, the probability that the variable Effect is true is only 0.2. On the other hand, when the Initial_Condition is not in place, the Effect is always absent, regardless of the presence or absence of the External_Action. The BBN in figure 4.19 decomposes the joint probability distribution of the three variables in table 4.1 into three probability distributions, one for each variable in the domain. This decomposition is the key factor both for providing a verbal, human-understandable description of the system, like the one used for the BBN in figure 4.19, and for efficiently storing and handling the joint distribution, which grows exponentially with the number of variables in the domain.

Table 4.1: The joint probability distribution over the three variables of figure 4.19.

| Initial_Condition | External_Action | Effect | Probability |
|-------------------|-----------------|--------|-------------|
| True  | True  | True  | 0.036 |
| True  | True  | False | 0.144 |
| True  | False | True  | 0.720 |
| True  | False | False | 0.000 |
| False | True  | True  | 0.000 |
| False | True  | False | 0.020 |
| False | False | True  | 0.000 |
| False | False | False | 0.080 |

Notwithstanding these differences, both the BBN and the joint probability distribution represent the same stochastic model of the combined effect of Initial_Condition and External_Action. The BBN can therefore be used to perform all the kinds of reasoning operations provided by a probabilistic representation of the system. As mentioned before, bayesian theory naturally provides a variety of methods to reason about the behavior of a system and to understand the interplay between its conditions and its effects. Assuming sufficient microarray data (a difficult assumption at present, but likely to become less so in the near future), this is sufficient to model cellular physiology.[16] Given a set of initial conditions, a BBN can forecast their effects (prediction; particularly useful for hypothesis testing). Conversely, a BBN can walk backward along a chain of dependencies and determine the probability of the Initial_Condition given the observation of an Effect (explanation; essential in understanding complex genetic networks). Even more interesting, a BBN can identify the best configuration of initial conditions to achieve a particular effect (optimization; again useful for generating testable hypotheses as well as for identifying pharmacological targets), and can identify the individual contribution of an Initial_Condition to the realization of an effect. This ability to propagate information along the network without a prespecification of the inputs allows the analyst to "drop" evidence and conjectures into the network and assess the effect of different initial conditions by "flickering" the input of the system or demanding different outcomes (sensitivity analysis).
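The "explanation" operation can be sketched by direct enumeration over the joint distribution of table 4.1. This is a toy illustration only; real BBN engines exploit the factorization rather than enumerating the full joint table:

```python
# Explanation by enumeration: condition the joint distribution of
# table 4.1 on an observed effect and read off the posterior.
joint = {  # (Initial_Condition, External_Action, Effect) -> probability
    (True, True, True): 0.036,  (True, True, False): 0.144,
    (True, False, True): 0.720, (True, False, False): 0.000,
    (False, True, True): 0.000, (False, True, False): 0.020,
    (False, False, True): 0.000, (False, False, False): 0.080,
}

def posterior(query_index, evidence_index, evidence_value):
    """p(variable[query_index] = True | variable[evidence_index] = evidence_value)."""
    consistent = {k: v for k, v in joint.items() if k[evidence_index] == evidence_value}
    norm = sum(consistent.values())
    return sum(v for k, v in consistent.items() if k[query_index]) / norm

# Observing the Effect makes the Initial_Condition certain, because the
# Effect never occurs when the Initial_Condition is absent:
print(posterior(0, 2, True))            # 1.0
# The rarely taken External_Action, which dilutes the Effect, becomes
# even less probable once the Effect is observed:
print(round(posterior(1, 2, True), 3))  # 0.048
```

Swapping the roles of query and evidence gives prediction instead of explanation, using exactly the same table.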

Although, in their original conception, BBNs were designed to encode the knowledge of human experts, their statistical roots soon prompted the development of methods able to learn them directly from databases. The advantages of modularity and compactness did not go unnoticed in the statistical community, so that BBNs and, more generally, graphical models enjoy a growing popularity in statistical practice, disclosing new directions for research and applications. The process of learning a BBN from a database consists of the induction of its two components:

1. The graphical structure of conditional dependencies (model selection)

2. The conditional probabilities quantifying the BBN (probability estimation)

Here, we take a bayesian approach to "learning" BBNs, so that the process of learning probabilistic models is regarded as the process of updating prior information on the basis of the available evidence. This framework fosters the integration of expert domain knowledge and data analysis: when available, expert knowledge can be integrated into the discovery process, but, unlike in traditional knowledge-based systems, it is not required.

On the other hand, a lack of human expert knowledge can be represented by uniform prior probability distributions, and the discovery process is then left entirely to the data. As far as AI is concerned, this line of research was pioneered in [168, 52] and further developed in [36, 88]. A parallel line of research is going on in statistics, in the field of graphical models [190, 115]. The deep theoretical connection between probability theory (the foundation of BBNs) and statistics creates a powerful link able to provide scientific meaning and development methods.
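For discrete variables, this prior-updating view has a simple conjugate form: a Dirichlet prior over each conditional distribution combines with observed counts to yield the posterior in closed form. The sketch below is illustrative only (the gene names and counts are invented); a uniform prior corresponds to the absence of expert knowledge:

```python
# A hedged sketch of conjugate Dirichlet-multinomial updating; the gene
# names and counts are invented for illustration.
from collections import Counter

def cpt_posterior_means(records, child, parents, arity=2, alpha=1.0):
    """Posterior mean of p(child = k | parent configuration) under
    independent Dirichlet(alpha, ..., alpha) priors."""
    counts = Counter((tuple(r[p] for p in parents), r[child]) for r in records)
    table = {}
    for cfg in {cfg for cfg, _ in counts}:
        n_j = sum(counts[(cfg, k)] for k in range(arity))
        table[cfg] = [(counts[(cfg, k)] + alpha) / (n_j + arity * alpha)
                      for k in range(arity)]
    return table

# Ten observations of gene g3 while its putative parent g1 is active:
records = [{"g1": 1, "g3": 1}] * 8 + [{"g1": 1, "g3": 0}] * 2
print(cpt_posterior_means(records, "g3", ("g1",)))  # {(1,): [0.25, 0.75]}
```

With alpha = 1 (uniform prior) the estimate is driven almost entirely by the data; a stronger prior would encode expert belief about the dependency.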

Current bayesian techniques for learning the graphical structure of a BBN from data are based on the evaluation of the posterior probability of a network structure, i.e., the probability of the graphical model conditional on the data. This probability is used as a scoring metric to discriminate among a set of possible models of interaction, e.g., to identify the most probable set of variables that are parents of a phenotype. We are therefore looking for a way to compute the probability p(M|D) of a model of interaction M given the data D. By Bayes' theorem we have

p(M|D) = p(D|M) p(M) / p(D).    (4.13.3)

As we are using this quantity to compare models, and all models are compared over the same data, p(D) is a constant and can be removed. If we further assume that all models are a priori equally likely, as is appropriate for an unsupervised method, then the quantity p(M|D) becomes proportional to the quantity p(D|M), known as the marginal likelihood, which can be computed efficiently in closed form.

The decomposition of the joint probability distribution induced by the graphical model decomposes the marginal likelihood p(D|M) of the entire model into the product of the marginal likelihoods of each node and its parents, so that we can learn a model locally, by maximizing the marginal likelihood node by node. The space of the possible sets of parents of an effect still grows exponentially with the number of parents involved, but successful heuristic search procedures (both deterministic and stochastic) exist to render the task feasible. The decomposition of the marginal likelihood allows the user to assess locally the strength of the dependency of an effect upon its parents as a comparative measure against rival models (including the model in which the effect is not influenced by any variable reported in the database). While the marginal likelihood per se is difficult to interpret, it can be used to compute a more intuitive measure, known as the Bayes factor, to assess the strength of evidence for one model against another. Since the Bayes factor is the ratio

p(M1|D) / p(M2|D)

between the posterior probabilities of two alternative models M1 and M2, under the same assumptions used in the reduction of Eq. (4.13.3), the Bayes factor can be computed as the ratio of the marginal likelihoods

of the two alternative models. This more intuitive measure tells the analyst how many times more probable the model M1 is than the model M2, and the confidence in the selected model will be a function of the distance between the selected model and the most probable alternative models [105]. Once the graphical model is known, the decomposition of the underlying joint probability distribution induced by the graphical model provides, once again, efficient methods to learn the conditional distributions that quantify the dependencies in the network, by taking advantage of local computations and conjugate analysis.
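The local scoring just described can be sketched in a few lines: the closed-form Dirichlet-multinomial marginal likelihood is computed node by node, and the difference of two log-scores gives the log Bayes factor between rival parent sets. The toy data, gene names, and uniform Dirichlet prior below are our illustrative assumptions, not a prescription from the text:

```python
# A hedged sketch of the local marginal-likelihood score under uniform
# Dirichlet(1, ..., 1) priors; the toy data and gene names are invented.
from collections import Counter
from itertools import product
from math import lgamma

def log_marginal_likelihood(records, child, parents, arity=2):
    """log p(data for child | parent set); records is a list of dicts
    mapping variable name -> state in {0, 1}."""
    counts = Counter((tuple(r[p] for p in parents), r[child]) for r in records)
    score = 0.0
    for cfg in product(range(arity), repeat=len(parents)):
        n_jk = [counts[(cfg, k)] for k in range(arity)]
        n_j = sum(n_jk)
        score += lgamma(arity) - lgamma(arity + n_j)  # prior normalizer
        score += sum(lgamma(1 + n) for n in n_jk)     # posterior counts
    return score

# Toy data in which g3 simply copies g1, while g2 varies independently:
records = [{"g1": a, "g2": b, "g3": a} for a in (0, 1) for b in (0, 1)] * 5
log_bf = (log_marginal_likelihood(records, "g3", ("g1",))
          - log_marginal_likelihood(records, "g3", ()))
print(log_bf > 0)  # True: the data favor g1 as a parent of g3
```

Because the score decomposes node by node, a search procedure only needs to re-score the nodes whose parent sets change, which is what makes heuristic search over structures feasible.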

Bayesian networks applications in functional genomics BBNs are not new to genetic research. In fact, networks based on directed acyclic graphs actually originated from the genetic studies of Sewall Wright [194], who developed a method called path analysis [195, 196], a recognized ancestor of BBNs. The application of BBNs to functional genomics is, on the other hand, very recent. BBNs hold the promise of answering very interesting questions in functional genomics and, in principle, they seem to be the right technology to take advantage of the massively parallel analysis of whole-genome data to discover how genes interact, control each other, and align themselves in pathways of activation.

BBNs offer a different view than do the more popular clustering algorithms currently used for the analysis of massively parallel gene expression data [63, 167]. While these algorithms attempt to locate groups of genes that have similar expression patterns over a set of experiments to discover genes that are co-regulated, BBNs dive into the regulatory circuitry of genetic expression to discover the web of dependencies among genes.

The promise of BBNs in functional genomics goes even further, as intensive research efforts have been devoted, during the past decade, to defining the conditions under which BBNs actually uncover the causal model underlying the data [140, 89]. The most ambitious question is therefore the following: given a set of microarray data, can we discover a causal model of interaction among different genes? The challenge is the common problem of sound statistical methods when faced with microarray data: a large number of variables with a small number of measurements. In the context of BBNs, this situation results in the inability to discriminate among the set of possible models, as the small amount of data is not sufficient to identify a single most probable model.

Friedman et al. [71] address these problems using partial models of BBNs and a measure of confidence in a learned model. The strategy they follow is to search a space of underspecified models, each comprising a set of BBNs, and to select a class of models rather than a single one. They also adopt a measure of confidence based on bootstrapping to evaluate the reliability of each discovered dependency in the database, to avoid the risk of ascribing a causal role to a gene when not enough information is actually available to support the claim. Hartemink et al. [85] tackle the underdetermination problem by tuning the unsupervised search for the most probable network structure. They leverage established biological knowledge to select a small number of networks and then limit their comparisons to these networks only.

The use of BBNs in functional genomics is still in its infancy: the common problems of functional genomics data have yet to be solved, and much work remains to render the technology fit for the task. Nonetheless, even these early efforts make clear the potential of these methods to dissect the inner structure of the regulatory circuitry of life.

[14]In practice, sufficiently high-resolution time series of microarray measurements performed over an adequate interval are rare if not absent.

[15]See the discussion of the appetite for hard-to-get probabilities in Szolovits and Parker [173].

[16]With the caveats about false reductionisms articulated in section 1.4.1.
