I have 3 months of categorized bank transaction data and need to identify recurring cash inflows and outflows for lending risk modeling.
Complications:
1. Income dates shift earlier when payday falls on a weekend (paid on the preceding Friday).
2. Some individuals have multiple income sources with different periodicities.
3. Amounts may vary around the mean (bonuses, allowances, side gigs).
4. There are many one-off outliers in both inflows and outflows.
5. Recurrence should be defined as at least one instance per month.
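For items 3 and 4, one common way to separate normal variation from one-off outliers is a robust score built from the median and the median absolute deviation (MAD) instead of the mean and standard deviation, so the outliers do not inflate the scale they are judged against. A minimal sketch, assuming a stream's amounts are roughly symmetric around a typical value, using the modified z-score with the frequently cited 3.5 cutoff; `robust_scores`, `split_outliers` and the threshold are illustrative. With three months of data a monthly stream has only about three observations, so these scores will be noisy.

```python
from statistics import median


def robust_scores(amounts):
    """Absolute modified z-scores: |0.6745 * (x - median) / MAD| (Iglewicz-Hoaglin)."""
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # Most amounts equal the median: treat any other value as extreme.
        return [0.0 if a == med else float("inf") for a in amounts]
    return [abs(0.6745 * (a - med) / mad) for a in amounts]


def split_outliers(amounts, threshold=3.5):
    """Split amounts into (typical, outliers) using the robust score threshold."""
    scores = robust_scores(amounts)
    typical = [a for a, s in zip(amounts, scores) if s <= threshold]
    outliers = [a for a, s in zip(amounts, scores) if s > threshold]
    return typical, outliers


typical, outliers = split_outliers([3000, 3050, 2980, 3020, 9000])
print(typical, outliers)  # [3000, 3050, 2980, 3020] [9000]
```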
I’m considering two approaches:
Rule-based temporal recurrence detection
• Detect events that occur near the same calendar day (± k days)
• Include adjustments for weekend/holiday pay behavior
• Model amount variance as small perturbations
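The rule-based approach can be sketched as a greedy search, assuming transactions have already been split by category and direction (inflow/outflow): for each candidate nominal day, look for one transaction per month within ± k days of that day after rolling weekend dates back to Friday, accept the stream if the matched amounts stay within a relative tolerance of their median, then remove it and search again so a second income stream can still be found. `expected_date`, `find_recurring`, `k=3` and `amount_tol=0.2` are hypothetical names and values, and holidays are not modeled.

```python
import calendar
from collections import Counter
from datetime import date, timedelta
from statistics import median


def expected_date(year: int, month: int, nominal_day: int) -> date:
    """Nominal day clamped to the month's length, rolled back to Friday on weekends.

    Holidays are ignored (assumption); a holiday calendar would add further shifts.
    """
    last = calendar.monthrange(year, month)[1]
    d = date(year, month, min(nominal_day, last))
    shift = {5: 1, 6: 2}.get(d.weekday(), 0)  # Saturday -> Friday, Sunday -> Friday
    return d - timedelta(days=shift)


def find_recurring(txns, months, k=3, amount_tol=0.2):
    """Greedy search for monthly streams among (date, amount) pairs of one category.

    A stream with nominal day D is accepted when every (year, month) in `months`
    has a transaction within +/- k days of expected_date(year, month, D) and all
    matched amounts lie within amount_tol (relative) of their median. Matched
    transactions are removed and the search repeats, so several streams can be found.
    Returns (streams, leftovers); streams are (nominal_day, matched_txns) pairs.
    """
    remaining = dict(enumerate(txns))  # index -> (date, amount); keeps duplicates distinct
    streams = []
    found = True
    while found:
        found = False
        # Try the most frequent observed days first as candidate nominal days.
        day_counts = Counter(d.day for d, _ in remaining.values())
        for day, _ in day_counts.most_common():
            matched = []
            for year, month in months:
                target = expected_date(year, month, day)
                cands = [i for i, (d, _) in remaining.items() if abs((d - target).days) <= k]
                if not cands:
                    break  # no instance this month: not recurrent at this nominal day
                matched.append(min(cands, key=lambda i: abs((remaining[i][0] - target).days)))
            else:
                amounts = [remaining[i][1] for i in matched]
                med = median(amounts)
                if all(abs(a - med) <= amount_tol * abs(med) for a in amounts):
                    streams.append((day, [remaining[i] for i in matched]))
                    for i in matched:
                        del remaining[i]
                    found = True
                    break  # restart the search on the reduced transaction set
    return streams, list(remaining.values())
```

Trying the most frequent observed day first is one way to anchor a stream at its usual payday; because removal is greedy, the candidate order can matter when two streams with similar amounts sit only a few days apart.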
DBSCAN or other density-based clustering, using a feature space combining:
• day-of-month modulo 30
• amount
• transaction category
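A density-based version can be sketched with a minimal DBSCAN written out in plain Python, so the distance function is explicit. Assumptions: category is handled by making cross-category distance infinite rather than encoding it numerically, and `day_scale`, `amount_scale`, `eps` and `min_pts` are hypothetical values chosen only so that a few days and a few hundred currency units are comparable. `dbscan`, `make_features` and `feature_distance` are illustrative names.

```python
import math
from datetime import date


def dbscan(points, eps, min_pts, dist):
    """Minimal DBSCAN: one label per point (cluster id >= 0, or -1 for noise)."""
    n = len(points)
    labels = [None] * n

    def region(i):
        # The neighbourhood includes the point itself, as in the usual definition.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: reachable but not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def make_features(txns):
    """(category, day-of-month modulo 30, amount) for each (date, amount, category)."""
    return [(cat, d.day % 30, amt) for d, amt, cat in txns]


def feature_distance(a, b, day_scale=3.0, amount_scale=100.0):
    """Scaled Euclidean distance; transactions in different categories never cluster."""
    if a[0] != b[0]:
        return math.inf
    return math.hypot((a[1] - b[1]) / day_scale, (a[2] - b[2]) / amount_scale)


txns = [
    (date(2024, 6, 25), 3000.0, "salary"), (date(2024, 7, 25), 3050.0, "salary"),
    (date(2024, 8, 23), 2980.0, "salary"),  # the 25th fell on a Sunday
    (date(2024, 6, 10), 400.0, "transfer_in"), (date(2024, 7, 10), 420.0, "transfer_in"),
    (date(2024, 8, 9), 390.0, "transfer_in"), (date(2024, 7, 2), 5000.0, "transfer_in"),
]
labels = dbscan(make_features(txns), eps=1.0, min_pts=3, dist=feature_distance)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

With `min_pts=3` over three months, a cluster loosely corresponds to one instance per month, but nothing in DBSCAN enforces that: three transactions in the same month (e.g. weekly groceries) can also form a cluster, so a month-coverage check on each cluster is still needed.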
My concern is that DBSCAN may not perform well with shifted pay dates (e.g., the 25th moving to the 23rd when the 25th falls on a Sunday), whereas rule-based models might overfit or break down when there are multiple income streams.
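The size of the weekend shift in the day-of-month feature can be checked directly. A within-month shift (25th → 23rd) moves the feature by only 2, which a density radius of a few days could absorb; the harder case is a payday near the start of the month that rolls back into the previous month, where the linear gap jumps to 28. A circular (wrap-around) distance is one common remedy; it is an alternative to the modulo-30 feature as stated, not part of it, and `linear_day_gap` and `circular_day_gap` are hypothetical helpers.

```python
from datetime import date


def linear_day_gap(a: date, b: date, period: int = 30) -> int:
    """Gap between day-of-month features (day modulo `period`), no wrap-around."""
    return abs(a.day % period - b.day % period)


def circular_day_gap(a: date, b: date, period: int = 30) -> int:
    """Same gap measured on a circle, so the 29th and the 1st are 2 days apart."""
    diff = linear_day_gap(a, b, period)
    return min(diff, period - diff)


# Within-month shift: 25 August 2024 was a Sunday, so pay arrives Friday the 23rd.
print(linear_day_gap(date(2024, 7, 25), date(2024, 8, 23)))    # 2
print(circular_day_gap(date(2024, 7, 25), date(2024, 8, 23)))  # 2
# Cross-month shift: 1 December 2024 was a Sunday, so pay arrives Friday 29 November.
print(linear_day_gap(date(2024, 10, 1), date(2024, 11, 29)))    # 28
print(circular_day_gap(date(2024, 10, 1), date(2024, 11, 29)))  # 2
```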
Question: What statistical approach is most appropriate for identifying recurring financial transactions in this setting?