This blog post announces and describes the Raku package “ML::ROCFunctions”, [AAp0], which facilitates the use of Receiver Operating Characteristic (ROC) functions.
The ROC framework is used for the analysis and tuning of binary classifiers, [Wk1]. (The classifiers are assumed to classify into a positive/true label or a negative/false label.)
For a computational introduction to ROC utilization (in Mathematica), see the article “Basic example of using ROC with Linear regression”, [AA1].
This package has counterparts in Mathematica, Python, and R. See [AAp1, AAp2, AAp3].
The examples below use the packages “Data::Generators”, [AAp4, AA3], “Data::Reshapers”, [AAp5], and “Data::Summarizers”, [AAp6], described in the article “Introduction to data wrangling with Raku”, [AA2].
Installation
Via zef-ecosystem:
```shell
zef install ML::ROCFunctions
```

From GitHub:

```shell
zef install https://github.com/antononcube/Raku-ML-ROCFunctions
```

Usage examples
Properties
Here are some retrieval functions:
```raku
use ML::ROCFunctions;

say roc-functions('properties');
# (FunctionInterpretations FunctionNames Functions Methods Properties)

say roc-functions('FunctionInterpretations');
# {ACC => accuracy, AUROC => area under the ROC curve, Accuracy => same as ACC, F1 => F1 score,
#  FDR => false discovery rate, FNR => false negative rate, FOR => false omission rate,
#  FPR => false positive rate, MCC => Matthews correlation coefficient,
#  NPV => negative predictive value, PPV => positive predictive value, Precision => same as PPV,
#  Recall => same as TPR, SPC => specificity, Sensitivity => same as TPR,
#  TNR => true negative rate, TPR => true positive rate}

say roc-functions('FPR');
# &FPR
```

Single ROC record
Definition: A ROC record (ROC-hash or ROC-hash-map) is an object of type Associative that has the keys: “FalseNegative”, “FalsePositive”, “TrueNegative”, “TruePositive”. Here is an example:
```raku
{FalseNegative => 50, FalsePositive => 51, TrueNegative => 60, TruePositive => 39}
```
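To make the ROC functions concrete, here is a minimal sketch that evaluates several of them on the example record above with the standard textbook formulas (see [Wk1]). It does not use the package: the subroutines `tpr`, `fpr`, `spc`, `ppv`, `npv`, `acc`, and `f1` are hypothetical helpers written for illustration, and the package's own functions (e.g. `&TPR`, `&SPC`) are not guaranteed to be implemented this way.

```raku
# Hypothetical helpers (NOT the ML::ROCFunctions API) implementing textbook
# formulas on a ROC record with keys TruePositive, FalsePositive, TrueNegative, FalseNegative.
sub tpr(%r) { %r<TruePositive> / (%r<TruePositive> + %r<FalseNegative>) }   # recall / sensitivity
sub fpr(%r) { %r<FalsePositive> / (%r<FalsePositive> + %r<TrueNegative>) }  # false positive rate
sub spc(%r) { %r<TrueNegative> / (%r<TrueNegative> + %r<FalsePositive>) }   # specificity = 1 - FPR
sub ppv(%r) { %r<TruePositive> / (%r<TruePositive> + %r<FalsePositive>) }   # precision
sub npv(%r) { %r<TrueNegative> / (%r<TrueNegative> + %r<FalseNegative>) }   # negative predictive value
sub acc(%r) { (%r<TruePositive> + %r<TrueNegative>) / %r.values.sum }       # accuracy
sub f1(%r)  { 2 * ppv(%r) * tpr(%r) / (ppv(%r) + tpr(%r)) }                 # harmonic mean of PPV and TPR

# The example ROC record from above.
my %roc = FalseNegative => 50, FalsePositive => 51, TrueNegative => 60, TruePositive => 39;

# Evaluate each measure and print it rounded to four decimals.
my @measures = TPR => &tpr, FPR => &fpr, SPC => &spc, PPV => &ppv, NPV => &npv, ACC => &acc, F1 => &f1;
for @measures -> $m {
    say $m.key, ' => ', $m.value.(%roc).round(0.0001);
}
```

Because the counts are integers, Raku computes each measure as an exact `Rat` (e.g. TPR = 39/89 here); rounding is applied only for display.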
Here we generate a random “dataset” with columns “Actual” and “Predicted” that have the values “true” and “false” and show the summary:
```raku
use Data::Generators;
use Data::Summarizers;

my @dfRandomLabels = random-tabular-dataset(200, <Actual Predicted>,
        generators => {Actual => <true false>, Predicted => <true false>});

records-summary(@dfRandomLabels);
# +--------------+--------------+
# | Predicted    | Actual       |
# +--------------+--------------+
# | true => 103  | false => 106 |
# | false => 97  | true => 94   |
# +--------------+--------------+
```

Here is a sample of the dataset:
```raku
use Data::Reshapers;

say to-pretty-table(@dfRandomLabels.pick(6));
# +-----------+--------+
# | Predicted | Actual |
# +-----------+--------+
# | false     | false  |
# | true      | false  |
# | false     | true   |
# | false     | false  |
# | true      | false  |
# | false     | true   |
# +-----------+--------+
```

Here we make the corresponding ROC hash-map:
```raku
say to-roc-hash('true', 'false',
        @dfRandomLabels.map({ $_<Actual> }),
        @dfRandomLabels.map({ $_<Predicted> }));
# {FalseNegative => 49, FalsePositive => 58, TrueNegative => 48, TruePositive => 45}
```

Multiple ROC records
Here we make a random dataset whose entries are associated with a threshold parameter that has three unique values:
```raku
my @dfRandomLabels2 = random-tabular-dataset(200, <Threshold Actual Predicted>,
        generators => {Threshold => (0.2, 0.4, 0.6),
                       Actual => <true false>,
                       Predicted => <true false>});

records-summary(@dfRandomLabels2);
# +--------------+-----------------+--------------+
# | Predicted    | Threshold       | Actual       |
# +--------------+-----------------+--------------+
# | true => 105  | Min => 0.2      | false => 107 |
# | false => 95  | 1st-Qu => 0.2   | true => 93   |
# |              | Mean => 0.402   |              |
# |              | Median => 0.4   |              |
# |              | 3rd-Qu => 0.6   |              |
# |              | Max => 0.6      |              |
# +--------------+-----------------+--------------+
```

Remark: Threshold parameters are typically used while tuning Machine Learning (ML) classifiers.
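The remark above can be made concrete with a small sketch. It assumes a hypothetical classifier that gives each record a score in [0, 1] and predicts “true” when the score is at or above the threshold; the scores, the labels, and the helper `roc-record` are all made up for illustration. Each threshold yields its own set of predicted labels, and tallying the actual/predicted pairs gives one ROC record per threshold (in the examples in this post, `to-roc-hash` performs that tallying step).

```raku
# Made-up scores from a hypothetical classifier, with the corresponding actual labels.
my @scores = 0.9, 0.8, 0.7, 0.55, 0.45, 0.35, 0.3, 0.1;
my @actual = <true true false true false true false false>;

# Hypothetical helper (not part of ML::ROCFunctions): tally label pairs into a ROC record.
sub roc-record(@actual, @predicted, :$true-label = 'true') {
    my %roc = TruePositive => 0, FalsePositive => 0, TrueNegative => 0, FalseNegative => 0;
    for @actual Z @predicted -> ($act, $pred) {
        my $key = $pred eq $true-label
                ?? ($act eq $true-label ?? 'TruePositive'  !! 'FalsePositive')
                !! ($act eq $true-label ?? 'FalseNegative' !! 'TrueNegative');
        %roc{$key}++;
    }
    %roc
}

# Each threshold turns the scores into predicted labels, giving one ROC record per threshold.
for 0.2, 0.4, 0.6 -> $th {
    my @predicted = @scores.map({ $_ >= $th ?? 'true' !! 'false' });
    say "$th => ", roc-record(@actual, @predicted);
}
```

Raising the threshold trades positive predictions for negative ones: with these made-up scores, the record at 0.2 has no false negatives, while the record at 0.6 has the fewest false positives.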
Here we group the rows of the dataset by the unique threshold values:
```raku
my %groups = group-by(@dfRandomLabels2, 'Threshold');

records-summary(%groups);
# summary of 0.4 =>
# +---------------+-------------+-------------+
# | Threshold     | Actual      | Predicted   |
# +---------------+-------------+-------------+
# | Min => 0.4    | false => 37 | true => 36  |
# | 1st-Qu => 0.4 | true => 35  | false => 36 |
# | Mean => 0.4   |             |             |
# | Median => 0.4 |             |             |
# | 3rd-Qu => 0.4 |             |             |
# | Max => 0.4    |             |             |
# +---------------+-------------+-------------+
# summary of 0.6 =>
# +-------------+---------------+-------------+
# | Actual      | Threshold     | Predicted   |
# +-------------+---------------+-------------+
# | true => 33  | Min => 0.6    | false => 33 |
# | false => 32 | 1st-Qu => 0.6 | true => 32  |
# |             | Mean => 0.6   |             |
# |             | Median => 0.6 |             |
# |             | 3rd-Qu => 0.6 |             |
# |             | Max => 0.6    |             |
# +-------------+---------------+-------------+
# summary of 0.2 =>
# +---------------+-------------+-------------+
# | Threshold     | Actual      | Predicted   |
# +---------------+-------------+-------------+
# | Min => 0.2    | false => 38 | true => 37  |
# | 1st-Qu => 0.2 | true => 25  | false => 26 |
# | Mean => 0.2   |             |             |
# | Median => 0.2 |             |             |
# | 3rd-Qu => 0.2 |             |             |
# | Max => 0.2    |             |             |
# +---------------+-------------+-------------+
```

Here we find and print the ROC records (hash-maps) for each unique threshold value:
```raku
my @rocs = do for %groups.kv -> $k, $v {
    to-roc-hash('true', 'false', $v.map({ $_<Actual> }), $v.map({ $_<Predicted> }))
};

.say for @rocs;
# {FalseNegative => 19, FalsePositive => 20, TrueNegative => 17, TruePositive => 16}
# {FalseNegative => 15, FalsePositive => 14, TrueNegative => 18, TruePositive => 18}
# {FalseNegative => 11, FalsePositive => 23, TrueNegative => 15, TruePositive => 14}
```

Application of ROC functions
Here we define a list of ROC functions:
```raku
my @funcs = (&PPV, &NPV, &TPR, &ACC, &SPC, &MCC);
# [&PPV &NPV &TPR &ACC &SPC &MCC]
```

Here we apply each ROC function to each of the ROC records obtained above:
```raku
my @rocRes = @rocs.map(-> $r { @funcs.map({ $_.name => $_($r) }).Hash });

say to-pretty-table(@rocRes);
# +----------+-----------+----------+----------+----------+----------+
# | ACC      | MCC       | NPV      | PPV      | TPR      | SPC      |
# +----------+-----------+----------+----------+----------+----------+
# | 0.458333 | -0.083398 | 0.472222 | 0.444444 | 0.457143 | 0.459459 |
# | 0.553846 | 0.107970  | 0.545455 | 0.562500 | 0.545455 | 0.562500 |
# | 0.460317 | -0.045894 | 0.576923 | 0.378378 | 0.560000 | 0.394737 |
# +----------+-----------+----------+----------+----------+----------+
```

ROC plots
Classifiers are often evaluated using ROC curves, which plot TPR (y-axis) against FPR (x-axis). Here is such a plot, made with Mathematica using the Mathematica-to-Raku connection described in [AA4]:

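The points of such a plot are (FPR, TPR) pairs, one per ROC record. As a minimal sketch in plain Raku (no package functions; the helpers `tpr` and `fpr` are hypothetical), the code below computes those pairs for the three ROC records printed in the “Multiple ROC records” section, adds the corner points (0, 0) and (1, 1), and estimates the area under the resulting piecewise-linear curve with the trapezoidal rule. The trapezoidal estimate is a common approximation and is not necessarily how the package computes AUROC.

```raku
# The three ROC records printed in the "Multiple ROC records" section.
my @rocs =
    %(FalseNegative => 19, FalsePositive => 20, TrueNegative => 17, TruePositive => 16),
    %(FalseNegative => 15, FalsePositive => 14, TrueNegative => 18, TruePositive => 18),
    %(FalseNegative => 11, FalsePositive => 23, TrueNegative => 15, TruePositive => 14);

# Hypothetical helpers (not the ML::ROCFunctions API).
sub tpr(%r) { %r<TruePositive> / (%r<TruePositive> + %r<FalseNegative>) }
sub fpr(%r) { %r<FalsePositive> / (%r<FalsePositive> + %r<TrueNegative>) }

# ROC points (FPR, TPR) plus the corner points, sorted by FPR.
my @points = ((0, 0), |@rocs.map({ (fpr($_), tpr($_)) }), (1, 1)).sort(*[0]);

# Trapezoidal rule over consecutive points: width * average height.
my $auc = [+] (@points Z @points[1..*]).map(-> ($p, $q) { ($q[0] - $p[0]) * ($p[1] + $q[1]) / 2 });

for @points -> ($x, $y) { say "FPR = {$x.round(0.001)}, TPR = {$y.round(0.001)}" }
say "Trapezoidal AUROC estimate: {$auc.round(0.001)}";
```

Since the “Actual” and “Predicted” labels were generated independently at random, the points lie close to the diagonal TPR = FPR and the estimate comes out close to 0.5, as expected for an uninformative classifier.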
References
Articles
[Wk1] Wikipedia entry, “Receiver operating characteristic”.
[AA1] Anton Antonov, “Basic example of using ROC with Linear regression”, (2016), MathematicaForPrediction at WordPress.
[AA2] Anton Antonov, “Introduction to data wrangling with Raku”, (2021), RakuForPrediction at WordPress.
[AA3] Anton Antonov, “Data::Reshapers”, (2022), RakuForPrediction at WordPress.
[AA4] Anton Antonov, “Connecting Raku to Mathematica”, (2021), RakuForPrediction-book at GitHub.
Packages
[AAp0] Anton Antonov, ML::ROCFunctions Raku package, (2022), GitHub/antononcube.
[AAp1] Anton Antonov, ROCFunctions Mathematica package, (2016-2022), MathematicaForPrediction at GitHub/antononcube.
[AAp2] Anton Antonov, ROCFunctions Python package, (2022), Python-packages at GitHub/antononcube.
[AAp3] Anton Antonov, ROCFunctions R package, (2021), R-packages at GitHub/antononcube.
[AAp4] Anton Antonov, Data::Generators Raku package, (2021), GitHub/antononcube.
[AAp5] Anton Antonov, Data::Reshapers Raku package, (2021), GitHub/antononcube.
[AAp6] Anton Antonov, Data::Summarizers Raku package, (2021), GitHub/antononcube.