mchmm

A Python package for statistical modeling with Markov chains and Hidden Markov models. Built on NumPy and SciPy, mchmm provides efficient implementations of core algorithms including Viterbi decoding and Baum-Welch parameter estimation. The package also includes visualization capabilities for understanding model structure and behavior.

Key Features

  • Discrete Markov Chains: Build transition models from sequence data with automatic state inference
  • Hidden Markov Models: Implement HMMs with customizable observation and state spaces
  • Viterbi Algorithm: Find most likely state sequences for new observations
  • Baum-Welch Algorithm: Learn HMM parameters from unlabeled sequence data
  • Statistical Testing: Built-in chi-squared tests for model validation
  • Visualization: Generate directed graphs of Markov models using Graphviz
  • Simulation: Generate synthetic sequences from trained models

Donate

If you find this package useful, please consider donating any amount. This will help me spend more time supporting open-source software.

Buy Me A Coffee

Installation

Install from PyPI:

pip install mchmm

Or install from source:

git clone https://github.com/maximtrp/mchmm.git
cd mchmm
pip install . --user

Dependencies

  • NumPy - Numerical computing
  • SciPy - Scientific computing and statistics
  • Graphviz - Graph visualization (optional)

Quick Start

Discrete Markov Chains

Initializing a Markov chain from sequence data:

import mchmm as mc

a = mc.MarkovChain().from_data('AABCABCBAAAACBCBACBABCABCBACBACBABABCBACBBCBBCBCBCBACBABABCBCBAAACABABCBBCBCBCBCBCBAABCBBCBCBCCCBABCBCBBABCBABCABCCABABCBABC')
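
The unique states are inferred automatically and exposed via the states attribute (for this sequence they should be A, B, and C):

a.states
# array(['A', 'B', 'C'], dtype='<U1')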

Now, we can look at the observed transition frequency matrix:

a.observed_matrix
# array([[ 7., 18.,  7.],
#        [19.,  5., 29.],
#        [ 5., 30.,  3.]])

And the observed transition probability matrix:

a.observed_p_matrix
# array([[0.21875   , 0.5625    , 0.21875   ],
#        [0.35849057, 0.09433962, 0.54716981],
#        [0.13157895, 0.78947368, 0.07894737]])
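
The probability matrix is simply the frequency matrix normalized row by row. A minimal NumPy sketch verifying this relationship:

import numpy as np

# dividing each row of frequencies by its row total gives the transition probabilities
p = a.observed_matrix / a.observed_matrix.sum(axis=1, keepdims=True)
np.allclose(p, a.observed_p_matrix)
# True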

You can visualize your Markov chain. First, build a directed graph with the graph_make() method of a MarkovChain object. Then render() it.

graph = a.graph_make(
    format="png",
    graph_attr=[("rankdir", "LR")],
    node_attr=[("fontname", "Roboto bold"), ("fontsize", "20")],
    edge_attr=[("fontname", "Iosevka"), ("fontsize", "12")]
)
graph.render()

Here is the result:

[Markov chain graph]

Pandas can help us annotate columns and rows:

import pandas as pd

pd.DataFrame(a.observed_matrix, index=a.states, columns=a.states, dtype=int)
#     A   B   C
# A   7  18   7
# B  19   5  29
# C   5  30   3

Viewing the expected transition frequency matrix:

a.expected_matrix
# array([[ 8.06504065, 13.78861789, 10.14634146],
#        [13.35772358, 22.83739837, 16.80487805],
#        [ 9.57723577, 16.37398374, 12.04878049]])
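
The expected frequencies are the standard contingency-table values (row total × column total / grand total), i.e. the counts you would expect if consecutive states were independent. A sketch reproducing them:

import numpy as np

obs = a.observed_matrix
row_totals = obs.sum(axis=1, keepdims=True)  # transitions out of each state
col_totals = obs.sum(axis=0, keepdims=True)  # transitions into each state
np.allclose(row_totals * col_totals / obs.sum(), a.expected_matrix)
# True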

Calculating the Nth order transition probability matrix:

a.n_order_matrix(a.observed_p_matrix, order=2)
# array([[0.2782854 , 0.34881028, 0.37290432],
#        [0.1842357 , 0.64252707, 0.17323722],
#        [0.32218957, 0.21081868, 0.46699175]])
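
For a first-order chain, the Nth order matrix coincides with the one-step matrix raised to the Nth power, so the same result can be checked with np.linalg.matrix_power:

import numpy as np

# the 2-step transition probabilities are the square of the 1-step matrix
np.allclose(
    np.linalg.matrix_power(a.observed_p_matrix, 2),
    a.n_order_matrix(a.observed_p_matrix, order=2),
)
# True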

Carrying out a chi-squared test:

a.chisquare(a.observed_matrix, a.expected_matrix, axis=None)
# Power_divergenceResult(statistic=47.89038802624337, pvalue=1.0367838347591701e-07)
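
The result object is SciPy's Power_divergenceResult, so an equivalent test can be run directly with scipy.stats.chisquare (a sketch, assuming the method is a thin wrapper around it):

from scipy.stats import chisquare

# compare observed and expected transition counts across the flattened matrices
chisquare(a.observed_matrix, a.expected_matrix, axis=None)
# Power_divergenceResult(statistic=47.89038802624337, pvalue=1.0367838347591701e-07)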

Finally, let's simulate a Markov chain given our data.

import numpy as np

ids, states = a.simulate(10, start='A', seed=np.random.randint(0, 10, 10))
ids
# array([0, 2, 1, 0, 2, 1, 0, 2, 1, 0])
states
# array(['A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A'], dtype='<U1')
"".join(states)
# 'ACBACBACBA'

Hidden Markov Models

Build HMMs from paired observation and state sequences. This example uses a DNA fragment with TATA box annotation:

import mchmm as mc

obs_seq = 'AGACTGCATATATAAGGGGCAGGCTG'
sts_seq = '00000000111111100000000000'
a = mc.HiddenMarkovModel().from_seq(obs_seq, sts_seq)

Unique states and observations are automatically inferred:

a.states
# ['0' '1']
a.observations
# ['A' 'C' 'G' 'T']

The transition probability matrix for all states can be accessed using the tp attribute:

a.tp
# [[0.94444444 0.05555556]
#  [0.14285714 0.85714286]]

There is also an ep attribute holding the emission probability matrix for all states and observations.

a.ep
# [[0.21052632 0.21052632 0.47368421 0.10526316]
#  [0.57142857 0.         0.         0.42857143]]
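
Both matrices are row-stochastic: each row of tp is a distribution over next states, and each row of ep is a distribution over observed symbols, so every row sums to one (up to floating-point rounding):

a.tp.sum(axis=1)
# array([1., 1.])
a.ep.sum(axis=1)
# array([1., 1.])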

Converting the emission matrix to a Pandas DataFrame:

import pandas as pd

pd.DataFrame(a.ep, index=a.states, columns=a.observations)
#           A         C         G         T
# 0  0.210526  0.210526  0.473684  0.105263
# 1  0.571429  0.000000  0.000000  0.428571

Directed graph of the hidden Markov model:

[Hidden Markov model graph]

The graph can be built using the graph_make() method of a HiddenMarkovModel object:

graph = a.graph_make(
    format="png",
    graph_attr=[("rankdir", "LR"), ("ranksep", "1"), ("rank", "same")]
)
graph.render()

Viterbi Algorithm

Decode the most likely state sequence for new observations:

new_obs = "GGCATTGGGCTATAAGAGGAGCTTG"
vs, vsi = a.viterbi(new_obs)
# states sequence
print("VI", "".join(vs))
# observations
print("NO", new_obs)
# VI 0000000001111100000000000
# NO GGCATTGGGCTATAAGAGGAGCTTG
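
As a quick sanity check, you can also decode the training sequence itself and count how many positions agree with the original annotation (a sketch; perfect agreement is not guaranteed):

# decode the training observations and compare with the known state labels
vs_train, _ = a.viterbi(obs_seq)
matches = sum(s == t for s, t in zip(vs_train, sts_seq))
print(f"{matches}/{len(sts_seq)} positions agree with the annotation")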

Baum-Welch Algorithm

Learn HMM parameters from unlabeled sequence data using expectation-maximization:

obs_seq = 'AGACTGCATATATAAGGGGCAGGCTG'
a = mc.HiddenMarkovModel().from_baum_welch(obs_seq, states=['0', '1'])
# training log: KL divergence values for all iterations
a.log
# {
#     'tp': [0.008646969455670256, 0.0012397829805491124, 0.0003950986109761759],
#     'ep': [0.09078874423746826, 0.0022734816599056084, 0.0010118204023946836],
#     'pi': [0.009030829793043593, 0.016658391248503462, 0.0038894983546756065]
# }

The inferred transition (tp) and emission (ep) probability matrices and the initial state distribution (pi) can be accessed as attributes:

a.ep, a.tp, a.pi
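
Each list in a.log holds one KL divergence per iteration between consecutive parameter estimates, so small final values indicate the EM updates have stabilized; a quick convergence check:

# smaller final divergences mean the estimates changed little in the last iteration
for name, divergences in a.log.items():
    print(name, "final KL divergence:", divergences[-1])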

This model can be decoded using the Viterbi algorithm:

new_obs = "GGCATTGGGCTATAAGAGGAGCTTG"
vs, vsi = a.viterbi(new_obs)
print("VI", "".join(vs))
print("NO", new_obs)
# VI 0011100001111100000001100
# NO GGCATTGGGCTATAAGAGGAGCTTG