The repository contains the implementation of density-based community detection methods, including the recommended DSC-Flow-Iter and other methods such as DSC-Flow, DSC-FISTA(int), DSC-FISTA(int)-Iter, and DSC-FISTA-Iter.
The repository also contains the script to run a recommended pipeline, which consists of four stages:
- Running DSC-Flow-Iter on the input network
- Running Leiden-Mod, RTRex, IKC(5) on the input network
- Constructing an unweighted consensus network using the constrained voting strategy at the majority rule consensus level
- Running Leiden-CPM(0.01) on the obtained network from Stage 3 and post-processing the result with WCC.
The preprint of the conference version of this work (which described a different pipeline) and the supplementary materials are available on arXiv (link). If you use our work, please cite it with the following BibTeX entry:
```bibtex
@misc{vule2025dsc-conf-arxiv,
  title={Dense Subgraph Clustering and a New Cluster Ensemble Method},
  author={The-Anh Vu-Le and João Alfredo Cardoso Lamy and Tomás Alessi and Ian Chen and Minhyuk Park and Elfarouk Harb and George Chacko and Tandy Warnow},
  year={2025},
  eprint={2508.17013},
  archivePrefix={arXiv},
  primaryClass={cs.SI},
  url={https://arxiv.org/abs/2508.17013v2},
}
```

The extended version of the work with the new recommended pipeline (implemented here) is being prepared for journal submission. We will update the README with a link to the preprint of the journal version once it is available.
Command

We can run DSC-Flow-Iter using the following command:
```
./bin/flow-iter <edgelist> <com> <density>
```

where

- `<edgelist>` is the path to the input edgelist file (CSV format with header `source,target`)
- `<com>` is the path to the output community file (CSV format with header `node_id,cluster_id`)
- `<density>` is the path to the output density file (CSV format with header `node_id,value`)
For DSC-Flow, replace `./bin/flow-iter` with `./bin/flow`.
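As a concrete illustration of the input format described above, the following sketch writes a toy edgelist in the expected CSV layout (header `source,target`, one edge per row). The graph and file name are made up for the example:

```python
# Sketch: prepare a toy input edgelist in the CSV format the DSC binaries
# expect (header "source,target"). The graph and path are illustrative only.
import csv
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
edgelist = tmp / "toy_edgelist.csv"

with edgelist.open("w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["source", "target"])              # required header
    w.writerows([(0, 1), (1, 2), (0, 2), (2, 3)])

print(edgelist.read_text().splitlines()[0])       # -> source,target
```

The resulting file can then be passed as the `<edgelist>` argument to any of the commands above.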
We can run DSC-FISTA(int)-Iter using the following command:
```
./bin/fista-int-iter <niters> <edgelist> <com> <density>
```

where

- `<niters>` is the number of iterations to run (recommended: 200)
- `<edgelist>` is the path to the input edgelist file (CSV format with header `source,target`)
- `<com>` is the path to the output community file (CSV format with header `node_id,cluster_id`)
- `<density>` is the path to the output density file (CSV format with header `node_id,value`)
For DSC-FISTA-Iter, replace `./bin/fista-int-iter` with `./bin/fista-frac-iter`. For DSC-FISTA(int), replace `./bin/fista-int-iter` with `./bin/fista-int`.
Note

Please make sure the parent directories of `<com>` and `<density>` exist before running the command. Otherwise, the command will still run, but the output files will not be produced.
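One way to guard against this is to create the output directories up front. The sketch below does this in Python; the base directory is a temporary stand-in for the repository root, and the paths mirror the example below:

```python
# Sketch: create the parent directories of the output files before running
# the binaries, so the outputs are actually written. "base" is a temporary
# stand-in for the repository root in this illustration.
import os
import tempfile

base = tempfile.mkdtemp()
for out_path in [
    os.path.join(base, "test/output/dsc-flow-iter/bitcoin_alpha/com.csv"),
    os.path.join(base, "test/output/dsc-flow-iter/bitcoin_alpha/density.csv"),
]:
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
```

The shell equivalent is `mkdir -p test/output/dsc-flow-iter/bitcoin_alpha`.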
Example
```
./bin/flow-iter test/input/bitcoin_alpha.csv test/output/dsc-flow-iter/bitcoin_alpha/com.csv test/output/dsc-flow-iter/bitcoin_alpha/density.csv
./bin/fista-int-iter 200 test/input/bitcoin_alpha.csv test/output/dsc-fista-int/bitcoin_alpha/com.csv test/output/dsc-fista-int/bitcoin_alpha/density.csv
```

Command

We can run the recommended pipeline using the following command:
```
bash pipeline.sh <edgelist> <output_directory>
```

where

- `<edgelist>` is the path to the input edgelist file (CSV format with header `source,target`)
- `<output_directory>` is the path to the output directory where the results will be saved
Output
The output will be saved in the specified `<output_directory>`. The main results are:
- Stage 1:
  - `dsc-flow-iter/com.csv`: The community detection result of DSC-Flow-Iter
- Stage 2:
  - `leiden-mod/com.csv`: The community detection result of Leiden-Mod
  - `RTRex/com.csv`: The community detection result of RTRex
  - `ikc-5/com.csv`: The community detection result of IKC(5)
- Stage 3:
  - `merged/edge.csv`: The edgelist of the network obtained by combining the results of Stages 1 and 2. This is a weighted network.
  - `unweighted/edge.csv`: The edgelist of the network obtained by removing the weights from the weighted network.
- Stage 4:
  - `final/com.csv`: The community detection result obtained by running Leiden-CPM(0.01) on the merged network.
  - `final+wcc/com.csv`: The community detection result after post-processing the final result with WCC.
Hence, the output community detection results will be available in `<output_directory>/final+wcc/com.csv`.
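The exact constrained-voting construction of Stage 3 is implemented by ClusterMerger; the toy sketch below only illustrates the general idea of a majority-rule consensus network: an edge of the input network is kept when a majority of the clusterings place both endpoints in the same cluster. The graph, the clusterings, and the choice of three (rather than four) input clusterings are made up for the example:

```python
# Toy illustration (NOT the ClusterMerger implementation) of a majority-rule
# consensus network. Each clustering maps node -> cluster_id; an edge's weight
# is the number of clusterings that co-cluster its endpoints.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
clusterings = [
    {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"},
    {0: "x", 1: "x", 2: "y", 3: "y", 4: "y"},
    {0: "p", 1: "p", 2: "q", 3: "q", 4: "q"},
]

weighted = {}                                  # analogous to merged/edge.csv
for u, v in edges:
    votes = sum(c[u] == c[v] for c in clusterings)
    if votes > 0:
        weighted[(u, v)] = votes

majority = len(clusterings) / 2                # majority-rule threshold
unweighted = [e for e, w in weighted.items() if w > majority]
print(sorted(unweighted))                      # -> [(0, 1), (2, 3), (3, 4)]
```

Dropping the vote counts from the surviving edges corresponds to the weighted-to-unweighted step between `merged/edge.csv` and `unweighted/edge.csv`.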
To build and compile DSC methods, run the following command:
```
bash build.sh
```

To install additional Python dependencies, run:
```
pip install leidenalg networkit pandas
```

`leidenalg` and `networkit` are required to run the Leiden algorithms and IKC. `pandas` is the common dependency for all methods to process CSV files.
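As an example of the kind of CSV processing `pandas` is used for, the following sketch inspects a community file in the `com.csv` format (header `node_id,cluster_id`) and computes the cluster sizes. The data is made up:

```python
# Sketch: inspect a community file in the com.csv format
# (header "node_id,cluster_id") with pandas. The data is illustrative only.
import io
import pandas as pd

com_csv = io.StringIO("node_id,cluster_id\n0,1\n1,1\n2,1\n3,2\n4,2\n")
com = pd.read_csv(com_csv)

cluster_sizes = com.groupby("cluster_id").size()
print(cluster_sizes.to_dict())        # -> {1: 3, 2: 2}
```

For a real run, replace the in-memory buffer with the path to a produced `com.csv` file.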
Make edits to `amazon-RTRExtractor/RTRex/Escape/Nucleus.h` to fix a warning: replace `free(stack)` with `delete[] stack` on lines 241 and 301. Because `stack` is allocated with `new[]`, it must be deallocated with `delete[]` rather than `free()`. This mismatch normally produces only a compiler warning, but RTRex is compiled with `-Werror`, which treats warnings as errors, so it will fail to compile without this fix.
Then, set up RTRex using the following commands from the root directory of the repository:
```
cd amazon-RTRExtractor/RTRex/clustering
make clean
cd ..
make clean
make
```

The binary file will be available at `amazon-RTRExtractor/RTRex/clustering/RTRex`. It is recommended to move the binary file to `bin/RTRex` to run the wrapper script (i.e., run `mv amazon-RTRExtractor/RTRex/clustering/RTRex bin/RTRex`). Also make sure to give execute permission to the binary file (i.e., run `chmod +x bin/RTRex`).
Set up ClusterMerger using the following commands from the root directory of the repository (requires cmake, bison, and flex):
```
cd ClusterMerger
./setup.sh
./easy_build_and_compile.sh
```

Set up constrained-clustering using the following commands from the root directory of the repository (requires cmake, bison, and flex):
```
cd constrained-clustering
./setup.sh
./easy_build_and_compile.sh
```