Dense Subgraph Clustering

Overview

The repository contains the implementation of density-based community detection methods, including the recommended DSC-Flow-Iter and other methods such as DSC-FISTA(int), DSC-Flow, and DSC-FISTA-Iter.

The repository also contains the script to run a recommended pipeline, which consists of four stages:

1. Running DSC-Flow-Iter on the input network
2. Running Leiden-Mod, RTRex, and IKC(5) on the input network
3. Constructing an unweighted consensus network using the constrained voting strategy at the majority rule consensus level
4. Running Leiden-CPM(0.01) on the network obtained in Stage 3 and post-processing the result with WCC
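To give a rough intuition for Stage 3, here is a minimal sketch of a plain majority-rule vote over the edges of the input network. It is not the ClusterMerger implementation: the function name, the restriction to input edges, the treatment of unclustered nodes as "not co-clustered", and the reading of "majority" as strictly more than half are all assumptions made for illustration.

```python
def majority_consensus_edges(edges, clusterings):
    """Keep an edge when a strict majority of clusterings co-cluster its endpoints.

    edges: iterable of (u, v) pairs from the input network.
    clusterings: list of dicts mapping node -> cluster id; a node that a method
    left unclustered is simply absent from that method's dict.
    """
    kept = []
    for u, v in edges:
        # One vote per clustering that puts both endpoints in the same cluster.
        votes = sum(
            1 for c in clusterings
            if u in c and v in c and c[u] == c[v]
        )
        # Assumption: "majority" means strictly more than half of the inputs.
        if 2 * votes > len(clusterings):
            kept.append((u, v))
    return kept
```

With four input clusterings, as in the pipeline above, this sketch keeps an edge only when at least three of them place both endpoints in the same cluster.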

The preprint of the conference version of this work (which describes a different pipeline) and its supplementary materials are available on arXiv (https://arxiv.org/abs/2508.17013v2). If you use our work, please cite it with the following BibTeX entry:

@misc{vule2025dsc-conf-arxiv,
  title={Dense Subgraph Clustering and a New Cluster Ensemble Method},
  author={The-Anh Vu-Le and João Alfredo Cardoso Lamy and Tomás Alessi and Ian Chen and Minhyuk Park and Elfarouk Harb and George Chacko and Tandy Warnow},
  year={2025},
  eprint={2508.17013},
  archivePrefix={arXiv},
  primaryClass={cs.SI},
  url={https://arxiv.org/abs/2508.17013v2},
}

The extended version of the work with the new recommended pipeline (implemented here) is being prepared for journal submission. We will update the README with the link to the preprint of the journal version once it is available.

Usage

Running a DSC technique

Command

We can run DSC-Flow-Iter using the following command:

./bin/flow-iter <edgelist> <com> <density>

where

<edgelist> is the path to the input edgelist file (CSV format with header source,target)
<com> is the path to the output community file (CSV format with header node_id,cluster_id)
<density> is the path to the output density file (CSV format with header node_id,value)

For DSC-Flow, replace ./bin/flow-iter with ./bin/flow.
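Because the binaries expect a specific header, it can help to check an edgelist before running them. The following is a minimal sketch using only the Python standard library. The function name read_edgelist is hypothetical, and rejecting any header other than exactly source,target is an assumption about how strict the tools are.

```python
import csv


def read_edgelist(path):
    """Read an edgelist CSV and return a list of (source, target) pairs.

    Raises ValueError if the header is not exactly source,target.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file.
        if reader.fieldnames != ["source", "target"]:
            raise ValueError(
                f"{path}: expected header 'source,target', got {reader.fieldnames}"
            )
        return [(row["source"], row["target"]) for row in reader]
```

Node identifiers are returned as strings, exactly as they appear in the file.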

We can run DSC-FISTA(int)-Iter using the following command:

./bin/fista-int-iter <niters> <edgelist> <com> <density>

where

<niters> is the number of iterations to run (recommended: 200)
<edgelist> is the path to the input edgelist file (CSV format with header source,target)
<com> is the path to the output community file (CSV format with header node_id,cluster_id)
<density> is the path to the output density file (CSV format with header node_id,value)

For DSC-FISTA-Iter, replace ./bin/fista-int-iter with ./bin/fista-frac-iter. For DSC-FISTA(int), replace ./bin/fista-int-iter with ./bin/fista-int.

Note: Make sure the parent directories of <com> and <density> exist before running the command. Otherwise, the command still runs but does not produce the output files.

Example

./bin/flow-iter test/input/bitcoin_alpha.csv test/output/dsc-flow-iter/bitcoin_alpha/com.csv test/output/dsc-flow-iter/bitcoin_alpha/density.csv

./bin/fista-int-iter 200 test/input/bitcoin_alpha.csv test/output/dsc-fista-int/bitcoin_alpha/com.csv test/output/dsc-fista-int/bitcoin_alpha/density.csv
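Since a missing output directory leads to a run that produces no files, a small wrapper can create the directories first and verify the outputs afterwards. This is a minimal sketch, not part of the repository: ensure_parent_dirs, build_command, and run_dsc are hypothetical helper names, and the argument order simply mirrors the commands shown above.

```python
import subprocess
from pathlib import Path


def ensure_parent_dirs(*paths):
    """Create the parent directory of each output path if it is missing."""
    for p in paths:
        Path(p).parent.mkdir(parents=True, exist_ok=True)


def build_command(binary, edgelist, com, density, niters=None):
    """Assemble the argument list; niters is only passed to the FISTA binaries."""
    cmd = [str(binary)]
    if niters is not None:
        cmd.append(str(niters))
    cmd += [str(edgelist), str(com), str(density)]
    return cmd


def run_dsc(binary, edgelist, com, density, niters=None):
    """Hypothetical wrapper: prepare output directories, then run the binary."""
    ensure_parent_dirs(com, density)
    subprocess.run(build_command(binary, edgelist, com, density, niters), check=True)
    # A run can finish without writing its outputs, so check for them explicitly.
    missing = [str(p) for p in (com, density) if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"expected output not written: {missing}")
```

For the first example above, this would be called as run_dsc("./bin/flow-iter", "test/input/bitcoin_alpha.csv", "test/output/dsc-flow-iter/bitcoin_alpha/com.csv", "test/output/dsc-flow-iter/bitcoin_alpha/density.csv"). For the FISTA variants, pass niters=200.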

Running the recommended pipeline

Command

We can run the recommended pipeline using the following command:

bash pipeline.sh <edgelist> <output_directory>

where

<edgelist> is the path to the input edgelist file (CSV format with header source,target)
<output_directory> is the path to the output directory where the results will be saved

Output

The output will be saved in the specified <output_directory>. The main results are:

Stage 1:
- dsc-flow-iter/com.csv: The community detection result of DSC-Flow-Iter
Stage 2:
- leiden-mod/com.csv: The community detection result of Leiden-Mod
- RTRex/com.csv: The community detection result of RTRex
- ikc-5/com.csv: The community detection result of IKC(5)
Stage 3:
- merged/edge.csv: The edgelist of the weighted network obtained by combining the results of Stages 1 and 2.
- unweighted/edge.csv: The edgelist of the unweighted network, obtained by dropping the weights from the merged network.
Stage 4:
- final/com.csv: The community detection result obtained by running Leiden-CPM(0.01) on the consensus network from Stage 3.
- final+wcc/com.csv: The community detection result after post-processing the final result with WCC.

The final community detection result is therefore available at <output_directory>/final+wcc/com.csv.
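As a quick sanity check on the final result, the cluster sizes can be summarized with a few lines of standard-library Python. This is a minimal sketch: cluster_sizes and summarize are hypothetical helper names, and it assumes the pipeline's com.csv uses the same node_id,cluster_id header described for the DSC binaries.

```python
import csv
from collections import Counter


def cluster_sizes(com_path):
    """Count nodes per cluster in a community file (header node_id,cluster_id)."""
    with open(com_path, newline="") as f:
        return Counter(row["cluster_id"] for row in csv.DictReader(f))


def summarize(com_path):
    """Return the number of clusters, clustered nodes, and the largest cluster size."""
    sizes = cluster_sizes(com_path)
    return {
        "clusters": len(sizes),
        "clustered_nodes": sum(sizes.values()),
        "largest": max(sizes.values(), default=0),
    }
```

Nodes that do not appear in the file are not counted, so clustered_nodes can be smaller than the number of nodes in the input network.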

Installation

Setup external dependencies

DSC methods

To build and compile DSC methods, run the following command:

bash build.sh

Python dependencies

To install additional Python dependencies, run:

pip install leidenalg networkit pandas

leidenalg and networkit are required to run the Leiden algorithms and IKC. pandas is a common dependency that all methods use to process CSV files.

RTRex

Edit amazon-RTRExtractor/RTRex/Escape/Nucleus.h to fix a compiler warning: replace free(stack) with delete[] stack on lines 241 and 301. Because stack is allocated with new[], it must be deallocated with delete[] rather than free(). Compilers normally report this mismatch only as a warning, but RTRex is compiled with -Werror, which treats warnings as errors, so the build fails without this fix.
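The edit can also be scripted. The following is a minimal sketch, with patch_free_to_delete as a hypothetical helper name. It only rewrites the listed lines and refuses to proceed if one of them does not contain free(stack), which guards against the line numbers having shifted in a different revision of the file.

```python
def patch_free_to_delete(source_text, line_numbers=(241, 301)):
    """Replace free(stack) with delete[] stack on the given 1-indexed lines."""
    lines = source_text.splitlines(keepends=True)
    for n in line_numbers:
        line = lines[n - 1]
        if "free(stack)" not in line:
            raise ValueError(f"line {n} does not contain free(stack): {line!r}")
        lines[n - 1] = line.replace("free(stack)", "delete[] stack")
    return "".join(lines)
```

To apply it, read Nucleus.h into a string, pass it through the function, and write the result back to the same path.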

Then, set up RTRex by running the following commands from the root directory of the repository:

cd amazon-RTRExtractor/RTRex/clustering
make clean
cd ..
make clean
make

The binary file will be available at amazon-RTRExtractor/RTRex/clustering/RTRex. It is recommended to move the binary file to bin/RTRex to run the wrapper script (i.e., run mv amazon-RTRExtractor/RTRex/clustering/RTRex bin/RTRex). Also make sure to give execute permission to the binary file (i.e., run chmod +x bin/RTRex).

Cluster ensemble

Set up ClusterMerger by running the following commands from the root directory of the repository (requires cmake, bison, and flex):

cd ClusterMerger
./setup.sh
./easy_build_and_compile.sh

Post-processing

Set up constrained-clustering by running the following commands from the root directory of the repository (requires cmake, bison, and flex):

cd constrained-clustering
./setup.sh
./easy_build_and_compile.sh
