The Markov Chain, Probabilities, New ML Approaches

A Markov Chain (MC) is a mathematical model used to describe transitions from one state to another in accordance with a specific set of probabilistic rules. In the context of artificial intelligence and machine learning applications, MCs are a form of Probabilistic Graphical Model (PGM), a powerful framework that can be used to represent complex domains in conjunction with probability distributions. The defining feature of an MC is that the probability of transitioning to a particular state depends only on the current state (and the time that has elapsed since that state was reached), not on the full history of states that came before it. To illustrate this point, consider an individual who flips a two-sided coin 100 times.

Because a two-sided coin can only land in one of two states, heads or tails, each flip of the coin has the same probability of landing on either state, regardless of which side the coin showed on the previous flip. If the individual in question were to record every instance in which the coin landed on heads or tails, those collective observations would constitute a simple Markov Chain, one in which every transition probability is the same. Analyzing these observations, the proportions of heads and tails across the 100 flips would each come out close to 50%.
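
To make this concrete, the following is a minimal Python sketch (the seed, variable names, and probabilities are illustrative assumptions) that treats the coin as a two-state Markov chain driven by a transition matrix; for a fair coin, every row of the matrix is (0.5, 0.5), so the next flip does not depend on the current one.

import numpy as np

# States: 0 = heads, 1 = tails. Each row gives the probabilities of the
# next state given the current state; a fair coin has identical rows.
transition_matrix = np.array([
    [0.5, 0.5],   # P(next state | current state = heads)
    [0.5, 0.5],   # P(next state | current state = tails)
])

rng = np.random.default_rng(seed=0)
state = 0          # arbitrary starting state (heads)
observations = []

for _ in range(100):
    state = rng.choice(2, p=transition_matrix[state])
    observations.append(state)

# Empirical frequencies should both land close to 50% over many flips.
heads_fraction = observations.count(0) / len(observations)
print(f"heads: {heads_fraction:.2%}, tails: {1 - heads_fraction:.2%}")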

Probabilistic Graphical Models

A PGM represents one of the many ways in which software developers can describe a probability distribution over random variables for a particular machine learning or deep learning problem. More specifically, PGMs use graphs to describe which variables within a probability distribution interact directly with one another: each node in the model represents a variable, while each edge represents a direct interaction between two variables. Through this configuration, these models can be specified using far fewer parameters than many other models within the realm of machine learning and artificial intelligence require. In turn, PGMs can make effective predictions using smaller amounts of data.
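
As a rough illustration of that parameter savings, the sketch below (the chain structure and function names are assumptions made for this example) compares the number of free parameters in a full joint table over n binary variables with those needed by a chain-structured graphical model X1 -> X2 -> ... -> Xn.

def full_joint_params(n: int) -> int:
    # A full joint table over n binary variables has 2**n entries,
    # one of which is fixed by normalization.
    return 2 ** n - 1

def chain_model_params(n: int) -> int:
    # 1 free parameter for P(X1), plus 2 per edge for P(X_{i+1} | X_i)
    # (one parameter for each value of the parent variable).
    return 1 + 2 * (n - 1)

for n in (5, 10, 20):
    print(n, full_joint_params(n), chain_model_params(n))

For 20 binary variables, the full table requires over a million free parameters, while the chain-structured model needs only 39.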

On top of this, these smaller models also allow software developers to cut costs with respect to computational power, as PGMs require fewer inference operations and samples to operate effectively. To this end, a PGM will generally contain both a graphical representation of the model and a generative process that outlines the manner in which the random variables within the model are generated. PGMs are typically divided into two different types: directed PGMs, otherwise known as Bayesian Networks, and undirected PGMs, also known as Markov random fields or Markov networks. (Markov Chain Monte Carlo, discussed below, is a closely related sampling technique rather than a type of model.)
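
To show what the generative process of a directed PGM can look like in practice, here is a hedged sketch of ancestral sampling from a small, made-up Rain/Sprinkler/WetGrass network; every probability and variable name below is an assumption chosen for illustration, not part of any standard library or dataset.

import random

def sample_once():
    # Sample parents before children, following the direction of the edges
    # (ancestral sampling). All probabilities are illustrative assumptions.
    rain = random.random() < 0.2                             # P(Rain)
    sprinkler = random.random() < (0.01 if rain else 0.4)    # P(Sprinkler | Rain)
    p_wet = 0.99 if (rain and sprinkler) else 0.9 if rain else 0.8 if sprinkler else 0.0
    wet_grass = random.random() < p_wet                      # P(WetGrass | Rain, Sprinkler)
    return rain, sprinkler, wet_grass

samples = [sample_once() for _ in range(10_000)]
print("estimated P(WetGrass) ~", sum(w for _, _, w in samples) / len(samples))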

Markov Chain Monte Carlo

Because exact inference with probabilistic models is often infeasible in practice, software developers instead use approximation methods to estimate the quantities of interest within their models. Markov Chain Monte Carlo (MCMC) sampling is one such method, used to draw random samples from high-dimensional probability distributions in a systematic way. This approach combines the Markov chain concept with the Monte Carlo technique, a method for approximating a particular quantity by randomly sampling from a probability distribution. Through the application of these two methods, machine learning algorithms can home in on the specific quantity being approximated with respect to a probability distribution, even with an expansive number of random variables involved, effectively facilitating accurate and efficient predictions.
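
As a concrete, deliberately simplified example, the sketch below implements the Metropolis-Hastings algorithm, one common MCMC method, to draw samples from a density known only up to a normalizing constant; the standard-normal target, the step size, and the sample count are all assumptions made for illustration.

import math
import random

def unnormalized_target(x: float) -> float:
    # Target density known only up to a constant; here an unnormalized standard normal.
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n_samples: int, step: float = 1.0) -> list:
    x = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = x + random.gauss(0.0, step)   # symmetric Gaussian proposal
        # Accept with probability min(1, target(proposal) / target(current)).
        if random.random() < unnormalized_target(proposal) / unnormalized_target(x):
            x = proposal
        samples.append(x)
    return samples

draws = metropolis_hastings(50_000)
print("sample mean ~", sum(draws) / len(draws))   # should be close to 0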

Despite the complex mathematical concepts associated with predictive algorithms, the idea behind these models is relatively simple: enabling machines to grasp human tasks that involve making predictions. Through the application of the Markov chain concept, software engineers have been able to create applications that predict baseball scores, stock market performance, and future weather, among a host of other use cases. In this way, consumers in our current digital age have been able to leverage new products and services in their everyday lives in intuitive ways, as predicting future occurrences such as the weather within a given region of the world has historically been a painstaking and arduous process that required a high degree of understanding, training, and specialization.
