I’m working through the runtime analysis of scikit-learn’s OneVsRestClassifier for two cases:
- LogisticRegression (`solver="lbfgs"`, `C=2.0`, `max_iter=1000`)
- RidgeClassifier (`alpha=1.0`)
So far I’ve derived:
OVR Logistic (LBFGS)
--------------------

For each of K classes and T inner iterations:

– Forward pass (X·w): O(n·c)
– Batch gradient (Xᵀ·…): O(n·c)
– LBFGS update: O(c² + n·c)

⇒ fit cost = O(K · T · n · c) (assuming n ≫ c)

OVR Ridge (Cholesky)
--------------------

– Build Gram matrix XᵀX once: O(n·c²)
– For each of K classes:
  – Solve (G + λI)w = b via Cholesky: O(c³)

⇒ fit cost = O(n·c² + K·c³)

My questions:

- Are there any scikit-learn implementation details (e.g. caching, sparse optimizations) I’ve overlooked?
- Is it valid to simply multiply the per-class cost by K for One-vs-Rest, or have I misapplied the additive-then-multiplicative rule?
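To make the Ridge derivation concrete, here is a minimal NumPy sketch of the cost model I derived above (my own toy helper, `ovr_ridge_fit`, not scikit-learn's actual code path): the Gram matrix is built once for O(n·c²), then each class gets its own O(c³) solve.

```python
import numpy as np

def ovr_ridge_fit(X, Y, alpha=1.0):
    """Toy helper mirroring the derived cost model (not sklearn internals).

    X: (n, c) design matrix; Y: (n, K) one-vs-rest indicator targets.
    """
    n, c = X.shape
    G = X.T @ X                       # Gram matrix, built once: O(n * c^2)
    A = G + alpha * np.eye(c)         # regularized SPD system (G + lambda*I)
    B = X.T @ Y                       # K right-hand sides
    # One O(c^3) solve per class, matching the O(K * c^3) term
    W = np.column_stack([np.linalg.solve(A, B[:, k]) for k in range(Y.shape[1])])
    return W                          # (c, K) weights, one column per class

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))              # n=200, c=5
Y = (rng.random((200, 3)) > 0.5).astype(float) # K=3 binary targets
W = ovr_ridge_fit(X, Y)
print(W.shape)  # prints (5, 3)
```

(A single Cholesky factorization could of course be reused across the K back-substitutions, which is part of what I'm unsure scikit-learn actually does.)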
I’d really appreciate any feedback or pointers to gotchas in the actual code, since I’m still fairly new to runtime-complexity analysis.
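P.S. For the second question, here is the toy flop-count comparison I'm using to frame it (my own bookkeeping model, not measured sklearn behavior): multiplying by K is only valid for work inside the per-class loop, so the answer hinges on whether the O(n·c²) Gram precompute is shared or rebuilt by each cloned binary estimator.

```python
def ridge_cost_shared_gram(n, c, K):
    # Additive-then-multiplicative: Gram built once, then K Cholesky solves
    return n * c**2 + K * c**3

def ridge_cost_independent_fits(n, c, K):
    # Pure "multiply by K": each binary fit rebuilds its own Gram, which
    # is what would happen if the OVR wrapper fits K independent clones
    # with no shared precomputation
    return K * (n * c**2 + c**3)

n, c, K = 100_000, 200, 10
print(ridge_cost_shared_gram(n, c, K))       # prints 4080000000
print(ridge_cost_independent_fits(n, c, K))  # prints 40080000000
```

The two models differ by nearly a factor of K when n ≫ c, so which one applies matters a lot for my analysis.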