Add CT ingestion planning groundwork #1173
PrzemyslawKlys merged 28 commits into codex/dependency-isolation-pr from
Conversation
Code Review

Overall, this is well-structured foundational work. The XML documentation is thorough, the saturating arithmetic is handled carefully, the test coverage is meaningful (not just happy-path), and the immutable init-only design is consistent throughout. A few items worth addressing before follow-up work builds on these contracts:

Bug:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4bd077a6f9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```csharp
state.TransientFailureRatio.HasValue &&
state.TransientFailureRatio.Value >= 0.5d &&
state.LastFailureUtc.HasValue)
{
    DateTimeOffset deferUntilUtc = state.LastFailureUtc.Value + rateLimit.CooldownAfterRateLimit;
```
Honor recovery success before applying failure-ratio deferral
This deferral branch only checks TransientFailureRatio and LastFailureUtc, so a provider can still be deferred immediately after a successful request. CtProviderRuntimeStateUpdater.Apply explicitly clears CooldownUntilUtc on success, but Decide can re-defer until LastFailureUtc + CooldownAfterRateLimit whenever the ratio remains >= 0.5, which is common after a fail-then-success sequence (ratio 0.5). That blocks recovered providers and can unnecessarily stall ingestion scheduling.
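The fix suggested above can be sketched in a language-neutral way. This Python sketch uses hypothetical names (`decide_deferral`, `last_success`) rather than the PR's actual C# API, and shows the ratio-based deferral yielding to a more recent success:

```python
from datetime import datetime, timedelta

def decide_deferral(last_failure, last_success, failure_ratio, cooldown):
    # Only defer on the failure-ratio path when the most recent outcome
    # was a failure; a success newer than the last failure means the
    # provider has recovered and should not be re-deferred.
    ratio_exceeded = failure_ratio is not None and failure_ratio >= 0.5
    recovered = last_success is not None and (
        last_failure is None or last_success > last_failure)
    if ratio_exceeded and last_failure is not None and not recovered:
        return last_failure + cooldown  # defer until the cooldown elapses
    return None  # no deferral
```

With a fail-then-success sequence the ratio stays at 0.5, but the recovery check prevents the spurious deferral described above.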
```csharp
NotAfterUtc = new DateTimeOffset(certificate.NotAfter.ToUniversalTime()),
DnsNames = ExtractDnsNames(certificate),
IsSelfSigned = IsSelfSignedCertificate(certificate),
WeakKey = keySize > 0 && keySize < 2048,
```
Use key-type-aware weak-key threshold in DER parsing
The WeakKey flag is computed as keySize < 2048 for all certificates, which misclassifies normal ECDSA certificates (for example P-256/P-384) as weak. Because this model is used for normalized CT records, that false positive propagates into risk analysis and filtering based on WeakKey. The threshold should depend on key algorithm (e.g., EC vs RSA/DSA/DH) rather than a single 2048-bit cutoff.
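A minimal sketch of the algorithm-aware threshold idea, in Python for neutrality. The OIDs are the standard public-key algorithm OIDs; the thresholds shown (RSA/DSA 2048 bits, EC 224 bits) are illustrative choices, not values taken from the PR:

```python
# Hypothetical OID-to-threshold mapping illustrating the review point:
# minimum acceptable key sizes differ by algorithm, so a flat 2048-bit
# cutoff misclassifies normal ECDSA keys as weak.
WEAK_KEY_THRESHOLDS = {
    "1.2.840.113549.1.1.1": 2048,  # rsaEncryption
    "1.2.840.10040.4.1": 2048,     # id-dsa
    "1.2.840.10045.2.1": 224,      # id-ecPublicKey
}

def is_weak_key(algorithm_oid, key_size):
    # Unknown algorithms are not flagged; zero/negative sizes are ignored.
    threshold = WEAK_KEY_THRESHOLDS.get(algorithm_oid)
    return threshold is not None and 0 < key_size < threshold
```

Under this scheme a P-256 certificate (256-bit EC key) is no longer flagged, while a 1024-bit RSA key still is.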
💡 Codex Review (Reviewed commit: d78ffbab70)
```csharp
bool requiresFullCertificateFromProvider = requiresFullCertificateMaterial && hydrationRequests == 0;
if (!SupportsOperation(profile, workload.Operations, requiresFullCertificateFromProvider))
```
Treat hydration-capable providers as supported
When RequireFullCertificate is true, this branch treats hydrationRequests == 0 as proof that the provider must return DER directly, so SupportsOperation rejects metadata-only providers even if they advertise CertificateHydration. With default workload values (EstimatedCertificatesToHydrate/HydrationRequestsPerCertificate are 0), a provider like crt.sh HTTP is marked unsupported for full-certificate workloads despite being able to satisfy them via follow-up hydration calls, which can incorrectly remove viable providers from scheduling.
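The suggested behavior reduces to a small predicate. A Python sketch with hypothetical flag names (the PR's actual capability flags may differ):

```python
def supports_full_certificate(returns_der, supports_hydration, require_full_certificate):
    # A provider satisfies a full-certificate workload if it either returns
    # DER directly or advertises certificate hydration, since hydration can
    # fetch the DER in follow-up calls.
    if not require_full_certificate:
        return True
    return returns_der or supports_hydration
```

A metadata-only provider advertising hydration (like the crt.sh HTTP case described above) then stays eligible for full-certificate workloads.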
```csharp
{
    var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    AddIfNotEmpty(names, SafeGetNameInfo(certificate, X509NameType.DnsName));
    AddIfNotEmpty(names, SafeGetNameInfo(certificate, X509NameType.SimpleName));
```
Drop non-DNS subject names from DnsNames
Adding X509NameType.SimpleName directly into DnsNames can inject organization/person display names for certificates whose subject simple name is not a hostname (for example S/MIME or code-signing certs). Those non-DNS strings are then normalized as discovered names, which can corrupt CT-derived hostname analytics and downstream filtering that assumes DnsNames are actual DNS candidates.
Code Review: Add CT ingestion planning groundwork

Overall this is well-structured foundational work. Immutable value-object design, thorough XML docs, careful overflow protection, and a solid test suite. The self-signed detection using raw byte comparison and the algorithm-specific weak key thresholds (RSA < 2048, ECDSA < 224, DSA < 2048) are both correct choices that are easy to get wrong. A few things worth addressing.

ISSUES
In CtCertificateRecord.cs, TryExtractCommonName splits certificate.Subject on bare commas. Distinguished names can contain escaped commas per RFC 4514 (e.g. CN=Smith\, John), so this misparses CN values in those cases. certificate.GetNameInfo(X509NameType.SimpleName, false) is already called in ExtractDnsNames and is a safer alternative.
Only totalRequestCount uses checked. The other counters (successfulRequestCount, transientFailureCount, rateLimitedCount, timeoutCount) do not, and will silently wrap on overflow if state is persisted across a very long-lived service. Either apply checked consistently or document that callers are expected to periodically reset state.
A 401 (Invalid API token) sets IsPermanentlyFailed = true, but a subsequent successful request clears it. If credentials are rotated that is the right behavior, but the name 'permanent' implies explicit caller reset. Either rename (e.g. IsNonRetryableFailed) or add a doc comment clarifying that success clears this flag — callers implementing ICtProviderStateStore need to know this.
Rate-limit events already set CooldownUntilUtc via ResolveCooldownUntilUtc, but they also increment TransientFailureRatio. This can trigger a second deferral path in Decide() based on the ratio threshold even after the cooldown expires. For an API with a burst quota the accumulated ratio can defer the provider longer than the actual cooldown intended. Consider whether rate-limit events should contribute to the failure ratio or be tracked separately.
(Already flagged by the prior reviewer, confirming the finding.) The raw ASN.1 parser runs first and is sound, but edge-case malformed extensions could fall through to locale-dependent san.Format(false) output. A comment documenting this constraint would help future maintainers.

MINOR NOTES
SUMMARY

The architecture is clean and the planning logic is well-tested. The main items to address before merging are (1) the fragile CN splitting, (2) the checked inconsistency, and (3) documenting the IsPermanentlyFailed clearing behavior. The rate-limit/ratio interaction is worth a discussion about intended semantics.
DomainDetective/CertificateTransparency/CtProviderRuntimeStateUpdater.cs (Fixed)
PR Review: Add CT Ingestion Planning Groundwork

This is a well-structured addition. The planning/estimation architecture is clean, arithmetic overflow protection is thoughtful, and the test coverage is solid. A few items worth addressing before merge:

Bug: Missing SHA-1 ECDSA OID in
💡 Codex Review (Reviewed commit: b705ac03d3)
```csharp
EffectiveRequestSpacing = effectiveSpacing,
EstimatedRequestDuration = rateLimit.EstimatedRequestDuration,
EstimatedMinimumDuration = estimatedDuration,
EstimatedCompletionUtc = startUtc + estimatedDuration,
```
Clamp completion timestamp when duration saturates
EstimateCapacity intentionally saturates very large durations to TimeSpan.MaxValue (via MultiplyTimeSpan), but then unconditionally computes startUtc + estimatedDuration and startUtc + estimatedFirstRunDuration. For large workloads or aggressive spacing values, this can throw ArgumentOutOfRangeException instead of returning an estimate, which breaks planning on exactly the oversized inputs this method is trying to handle safely.
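The clamping idea is straightforward to sketch. This Python analogue (where `datetime` overflows much as `DateTimeOffset` would throw `ArgumentOutOfRangeException`) saturates to the maximum representable timestamp instead of propagating the exception; names are illustrative:

```python
from datetime import datetime, timedelta

def add_clamped(start, duration):
    # Saturate instead of raising when a saturated duration no longer
    # fits in the representable datetime range.
    try:
        return start + duration
    except OverflowError:
        return datetime.max
```

With this shape, `EstimateCapacity`-style code can keep returning an estimate even for the oversized inputs that saturated the duration in the first place.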
PR Review: Add CT ingestion planning groundwork

Overall this is a well-designed, well-documented addition. The separation of concerns is clean, the use of

1. Silent exception swallowing without diagnostics In
2. Missing test data fixture in the diff
3. Hand-rolled ASN.1 parser edge cases
4. The four OID strings (
5. Nested ternary in
The concurrency calculation uses a nested ternary. A plain
6.
7. On .NET 5+

Minor nits

What is working well
💡 Codex Review (Reviewed commit: 9e59289afc)
```csharp
double transientFailureRatioCooldownThreshold =
    double.IsNaN(TransientFailureRatioCooldownThreshold) ||
    double.IsInfinity(TransientFailureRatioCooldownThreshold) ||
    TransientFailureRatioCooldownThreshold <= 0d
```
Preserve zero transient-failure threshold in normalization
CtProviderRateLimitProfile documents the cooldown threshold as a value from 0 to 1, but Normalize() treats 0 as invalid and silently resets it to 0.5. A caller that intentionally sets TransientFailureRatioCooldownThreshold = 0 (to defer immediately after any transient failure) will instead allow retries until failures reach 50%, which changes scheduling behavior and weakens operator-configured safeguards.
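A sketch of the normalization the comment argues for, in Python with hypothetical names: only genuinely invalid values (NaN, infinity, negatives, values above 1) fall back to the default, so an intentional 0 survives:

```python
import math

def normalize_threshold(value, default=0.5):
    # Treat only NaN, infinity, and out-of-range values as invalid.
    # A literal 0 is a meaningful operator choice: defer immediately
    # after any transient failure.
    if value is None or math.isnan(value) or math.isinf(value):
        return default
    if value < 0.0 or value > 1.0:
        return default
    return value
```

The key difference from a `<= 0` guard is that `normalize_threshold(0.0)` returns 0.0 rather than silently resetting to the 0.5 default.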
PR Review posted via Claude - see full review in next steps

Full review follows. Posting in parts due to formatting constraints.

Bug 1 - Silent exception swallowing: In CtCertificateRecord.cs ExtractDnsNames, the caught exception ex is completely discarded. If SAN parsing silently fails in production there is no way to learn about it. Consider storing a truncated diagnostic message in the record or returning it via the Diagnostics list on CtCertificateQueryResult. The same silent-discard pattern appears in SafeGetNameInfo.

Bug 2 - IsPermanentlyFailed propagation (undocumented): In CtProviderRuntimeStateUpdater.Apply, the expression IsPermanentlyFailed = !success and (permanentFailure or previous IsPermanentlyFailed == true) means a RateLimited or TransientFailure outcome on an already-permanently-failed provider keeps the flag set. This is defensible (only a success clears it), but contradicts the property XML doc. Add a comment or update the doc to make this explicit.

Bug 3 - Checked arithmetic inconsistent with saturation strategy: In CtProviderRuntimeStateUpdater.Apply, counters like totalRequestCount use checked(...) arithmetic which throws OverflowException after long.MaxValue requests. CtIngestionPlanner saturates at int.MaxValue instead of throwing. Either add overflow guards or document the deliberate choice to throw vs. saturate for long-lived service deployments.
PR Review — CT Ingestion Planning Groundwork

Overall this is a well-structured addition. The immutable data models (all

Security — Hand-rolled ASN.1/BER parser
```csharp
// Instead of the hand-rolled walker:
var reader = new AsnReader(sanExtension.RawData, AsnEncodingRules.DER);
var seqReader = reader.ReadSequence();
while (seqReader.HasData)
{
    var tag = seqReader.PeekTag();
    if (tag == new Asn1Tag(TagClass.ContextSpecific, 2)) // dNSName
        names.Add(seqReader.ReadCharacterString(UniversalTagNumber.IA5String, tag));
    else
        seqReader.ReadEncodedValue();
}
```

The comment says "pre-.NET 8 compatibility" but

Code smell — Exception-as-flow-control for overflow detection
```csharp
// CtIngestionPlanner — current pattern
private static long SaturatingMultiply(long a, long b)
{
    try { return checked(a * b); }
    catch (OverflowException) { return long.MaxValue; }
}
```

Consider using

```csharp
private static long SaturatingMultiply(long a, long b)
{
    if (a == 0 || b == 0) return 0;
    if (a > 0 && b > 0 && a > long.MaxValue / b) return long.MaxValue;
    if (a < 0 || b < 0) return 0; // negatives shouldn't appear here
    return a * b;
}
```

These helpers are in a planning hot-path called once per provider per planning cycle so the practical impact is low, but the pattern is worth addressing.

Design — Static
| Area | Assessment |
|---|---|
| Immutability / data models | Good — all init-only, Normalize() clamping |
| Interface contracts | Clean separation of query, state, and planning |
| Hand-rolled ASN.1 parser | Needs review — System.Formats.Asn1 is available |
| Overflow saturation helpers | Works but exception-as-flow-control is a smell |
| Test coverage | Strong; add malformed-DER edge cases |
| Documentation | Thorough and accurate |
Happy to discuss any of these points further.
Code Review — Add CT ingestion planning groundwork

Overall this is a well-structured PR. The immutable value-object design, thorough XML documentation, and intent-revealing factory methods are all good patterns. The test suite covers the planning logic well. Below are issues ranging from bugs to design observations.

Bugs

1. Missing SHA-1 ECDSA OID

The OID

```csharp
// current — only covers RSA and DSA SHA-1 variants
private static bool IsSha1Signature(string? algorithmOid) =>
    algorithmOid is "1.2.840.113549.1.1.5" // sha1WithRSAEncryption
        or "1.2.840.10040.4.3";            // id-dsa-with-sha1

// should also include
// "1.2.840.10045.4.1" — ecdsa-with-SHA1
```

2. The code uses a single

```csharp
// problematic — flags ECDSA-256 as weak
IsWeakKey = keySize is > 0 and < 2048 ? ...
```

A minimal fix branches on the public key algorithm OID:

```csharp
IsWeakKey = algorithm switch
{
    "1.2.840.113549.1.1.1" => keySize is > 0 and < 2048, // RSA
    "1.2.840.10040.4.1"    => keySize is > 0 and < 1024, // DSA
    "1.2.840.10045.2.1"    => keySize is > 0 and < 224,  // EC
    _ => false
};
```

3. RFC 4514 escaped commas break

Splitting on bare

4. Fragile self-signed detection uses formatted string comparison
```csharp
IsSelfSigned = certificate.SubjectName.RawData.SequenceEqual(certificate.IssuerName.RawData);
```

Design / Consistency

5. Unreachable guard in capacity estimation

The

6.
7. The flag is cleared by any successful outcome. "Permanent" conventionally implies the caller must explicitly reset the state. Consider renaming to

8. Rate-limit outcomes contribute to

If rate-limit requests increment both

9. The same property controls cooldown duration for both explicit rate-limit responses and transient failure ratio exceedance. These may need different durations in practice. Adding a dedicated

10. Hardcoded 50 % transient failure threshold

The ratio

Test Gaps
Positives
💡 Codex Review (Reviewed commit: 6097efa342)
```csharp
EstimatedCompletionUtc = AddTimeSpanSaturating(startUtc, estimatedMinimumDuration),
MaxConcurrentRequests = rateLimit.MaxConcurrentRequests.GetValueOrDefault(1),
MaximumRequestsPerRun = rateLimit.MaximumRequestsPerRun,
EstimatedRunCount = capacities.Max(static estimate => Math.Max(0, estimate.EstimatedRunCount)),
```
Recompute run budget from aggregate request count
When operation-specific budgets are enabled, aggregate capacity sets EstimatedRunCount to the max of per-segment run counts, but FirstRunRequestCount is the sum of segment first-run counts. If MaximumRequestsPerRun is configured, this can under-report run slicing: e.g. two segments with 80 requests each and MaximumRequestsPerRun=100 produce EstimatedRunCount=1 even though 160 requests cannot fit in one run. That makes planner output claim the workload fits a single run and can cause callers to overschedule work past the configured run budget.
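The arithmetic behind the finding is simple ceiling division over the aggregate request count. A Python sketch with illustrative names:

```python
import math

def estimated_run_count(total_requests, max_per_run):
    # Derive the run budget from the aggregate request count rather than
    # taking the max of per-segment run counts.
    if total_requests <= 0:
        return 0
    if not max_per_run or max_per_run <= 0:
        return 1  # no per-run cap: everything fits in one run
    return math.ceil(total_requests / max_per_run)
```

For the example above, `estimated_run_count(160, 100)` is 2, whereas taking the max of the two per-segment counts (each 1) under-reports it as 1.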
PR Review: Add CT ingestion planning groundwork

This is a well-structured PR with solid planning contracts for the CT ingestion pipeline. Code quality is high. Here are my findings:

STRENGTHS
ISSUES

[1] EstimateDuration is a dead-weight wrapper

In CtIngestionPlanner.cs, EstimateDuration is a pure pass-through to EstimateScheduledDuration with identical parameters and no added behaviour. Call EstimateScheduledDuration directly at the call site.

[2] QueryPagesAsync has no unit tests

The pagination extension method has four distinct stop conditions: HasMore==false, null/empty continuation token, a repeated token (the loop guard), and the maxPages cap. None are tested. Since ICtCertificateTransparencyProvider has no real implementation yet, all four branches can be exercised with a simple Func-based stub. This is the most complex untested logic in the PR and should be covered before real implementations arrive.

[3] Default TransientFailureRatioCooldownThreshold is undocumented

The test RateLimitProfileNormalizePreservesZeroFailureRatioThreshold implies Normalize() would otherwise change a 0d threshold, suggesting the C# default carries a meaning. In Decide(), the condition is ratio >= threshold. If the threshold defaults to 0d, any provider with a transient failure and no subsequent success enters cooldown immediately after the first failure. This may be intentional but is undocumented. Recommend: (a) an XML doc comment clarifying what 0d means, and (b) a test pinning the expected default-profile behaviour.

[4] ICtProviderStateStore has no contract documentation

The interface exists but there is no documentation on expected consistency guarantees (optimistic concurrency? last-write-wins?), thread-safety requirements, or the expected key structure (ProviderId only? ProviderId + operation type?). Implementations will follow in later PRs; a doc comment explaining the expected contract now would prevent divergent implementations.

[5] NormalizeHex over-allocates

char[] output = new char[value.Length] allocates at the full input length, but only hex digits are written (count characters). For formatted fingerprints with colons the output array is about twice the needed size. Minor, but worth a comment noting the over-allocation is intentional.

[6] SplitDistinguishedNameParts does not handle semicolons

RFC 4514 allows semicolons as an alternative RDN separator. The parser handles only commas. This affects TryExtractCommonName, which is a last-resort fallback, so the impact is low. Worth a comment noting the limitation.

[7] null tolerance inconsistency in CtCertificateRecord.FromDer

The method normalizes null to string.Empty for providerId via null-coalescing, but throws ArgumentNullException for certificateDer. Consider documenting why null is tolerated for providerId but not certificateDer, or unify by throwing for both.

MINOR OBSERVATIONS (no action required)
SUMMARY

The planning contracts are sound and the implementation is careful. The two items to address before merging are: (1) missing tests for QueryPagesAsync -- the only non-trivially complex logic without coverage -- and (2) documenting the zero-threshold behaviour for TransientFailureRatioCooldownThreshold. The remaining items are minor improvements or notes for follow-up PRs.
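Item [2]'s four stop conditions are easy to pin down with a stub. This Python sketch mirrors the described loop shape with hypothetical names (`query_pages`, `fetch`); it is not the PR's C# extension method:

```python
def query_pages(fetch, max_pages=None):
    # fetch(token) returns (items, has_more, next_token).
    # Stops on: has_more False, empty token, repeated token, or page cap.
    pages, token, seen = [], None, set()
    remaining = max_pages if (max_pages and max_pages > 0) else float("inf")
    while remaining > 0:
        items, has_more, next_token = fetch(token)
        pages.append(items)
        remaining -= 1
        if not has_more or not next_token or next_token in seen:
            break
        seen.add(next_token)
        token = next_token
    return pages
```

Each stop condition can then be exercised with a tiny stub: a fetch that repeats its token, one that returns `has_more=False`, and one with unique tokens plus a `max_pages` cap.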
PR Review: Add CT ingestion planning groundwork

This is a well-structured PR that establishes solid foundational abstractions for CT provider planning. The immutable design, overflow protection, and comprehensive test coverage make it a strong foundation for the follow-up execution work.

Strengths

- Immutable value objects with
- Saturating arithmetic: The
- Separate budget lanes for Cert Spotter: Distinguishing
- Pre-NET8 ASN.1 fallback parser:
- Test coverage: 809 lines of tests for ~3800 lines of production code is a strong ratio. The boundary cases (zero requests,

Issues and Observations

1.
A mock

2.

```csharp
long totalRequestCount = checked((previous?.TotalRequestCount ?? 0L) + 1L);
long successfulRequestCount = checked((previous?.SuccessfulRequestCount ?? 0L) + (success ? 1L : 0L));
```

These use

3.

```csharp
IsPermanentlyFailed = !success && (permanentFailure || previous?.IsPermanentlyFailed == true),
```

A transient failure or timeout after a permanent failure will preserve

4.

```csharp
bool transientFailure = outcome.OutcomeKind == CtProviderOutcomeKind.TransientFailure ||
    outcome.OutcomeKind == CtProviderOutcomeKind.Timeout;
bool timeout = outcome.OutcomeKind == CtProviderOutcomeKind.Timeout;
```

A

5. In

6.

```csharp
CtProviderCapacityEstimate estimate = CtIngestionPlanner.EstimateCapacity(profile, 20);
```

The test does not assert

7. Minor:
8. Docs: The table still references specific timing observations ("468 ms", "514 ms", "2.37 s") that came from a specific domain run. The

Summary

The abstractions are well-designed and the implementation is solid. The main asks before follow-up provider execution work builds on this:
Items 2 and 3 of the issues around
PR Review: Add CT ingestion planning groundwork

Overall this is a well-structured PR with clear separation of concerns, comprehensive test coverage, and good defensive coding. The immutable

Issues

1. The private helper is copy-pasted verbatim in both

2.

```csharp
int remainingPages = maxPages.HasValue && maxPages.Value > 0 ? maxPages.Value : int.MaxValue;
```

Callers who pass

3. Both outcome-kind branches are identical in body. They can be collapsed:

```csharp
if ((outcome.OutcomeKind == CtProviderOutcomeKind.TransientFailure ||
     outcome.OutcomeKind == CtProviderOutcomeKind.Timeout) &&
    outcome.CooldownOnTransientFailure)
{
    return Max(previous?.CooldownUntilUtc, AddTimeSpanSaturating(occurredAtUtc, rateLimit.CooldownAfterRateLimit));
}
```

4. The private

Minor Observations

5. Custom ASN.1 SAN parser

The pre-.NET 8 fallback parser is correctly length-bounded. One unhandled shape: if the SAN extension SEQUENCE is wrapped in an explicit context tag

6.
7. The

Test Coverage

The suite is thorough — 30+ cases covering profile defaults, zero-request edge cases,

One gap worth filling: no test exercises

Summary

The architecture is sound and the code is production-quality groundwork. The main asks are: extract the duplicated
💡 Codex Review (Reviewed commit: a3c80c3669)
💡 Codex Review
TryBuildExactCtMetadataEntryFromCrtShPostgreSqlRows computes matchingRows for the current host, but CertificateObservationCount is set from rows.Count (the full domain query result). In the domain-batched flow (QueryDomainMetadataAsync), that gives each host the same inflated count, which overstates per-host CT evidence and propagates into downstream totals/caps that aggregate CertificateObservationCount.
This weak-key check uses a single < 2048 threshold for all key types, so normal ECDSA certs (for example P-256/P-384) are flagged as weak in LatestCertificateWeakKey. That creates false positives in risk metadata for backfilled CT entries; the threshold should depend on key algorithm (as done in the DER normalization path) rather than raw key size alone.
DomainDetective/CertificateInventoryCapture.CtMetadataBackfill.cs (Outdated)
PR Review: Add CT ingestion planning groundwork

This is a solid architectural step — moving the PostgreSQL dependency out of the core library into

Here are my observations, grouped by severity:

Medium concerns
```csharp
ObservedP95LatencyMilliseconds = previous?.ObservedP95LatencyMilliseconds,
```

This field is always carried forward from the previous state, meaning it will be

Minor concerns
```csharp
private static CtCertificateQuery WithContinuationToken(this CtCertificateQuery query, string continuationToken)
{
    return new CtCertificateQuery
    {
        Name = query.Name,
        QueryKind = query.QueryKind,
        Operations = query.Operations,
        RequireFullCertificate = query.RequireFullCertificate,
        ContinuationToken = continuationToken,
        PageSize = query.PageSize,
        Timeout = query.Timeout
    };
}
```

Since

Silent swallowing of SAN parsing errors in

```csharp
catch (Exception ex) when (!ExceptionHelper.IsFatal(ex))
{
    // Keep certificate parsing resilient; callers still receive core certificate fields.
}
```

Swallowing all non-fatal exceptions here means that a bug in the custom pre-NET8 ASN.1 SAN parser would produce silently empty
This is an awkward location for an interface. Since it's

Connection string construction appends Npgsql-style timeout parameters

```csharp
return PostgreSql.BuildConnectionString(
    host: "crt.sh",
    database: "certwatch",
    username: "guest",
    password: string.Empty,
    port: 5432,
    ssl: true) + $";Timeout={timeoutSeconds};Command Timeout={timeoutSeconds}";
```

The appended

No test for

The loop guard in

Observations (not issues)
Good work on the dependency isolation — the direction is right.
PR Review: CT Ingestion Planning Groundwork

Overall this is a well-structured PR. The contracts and abstractions are cleanly separated, the test coverage is thorough (880+ lines in

Bug / Correctness

Broad

The five specific catch clauses (

```csharp
// Other places in CtCertificateRecord.cs already do this:
catch (Exception ex) when (!ExceptionHelper.IsFatal(ex))
```

The collapsed catch should apply the same guard:

```csharp
}
catch (Exception ex) when (!ExceptionHelper.IsFatal(ex))
{
    domainPostgreSqlLookupFailed = true;
    ...
```

Code Quality
```csharp
if (outcome.OutcomeKind == CtProviderOutcomeKind.TransientFailure && outcome.CooldownOnTransientFailure) { ... }
if (outcome.OutcomeKind == CtProviderOutcomeKind.Timeout && outcome.CooldownOnTransientFailure) { ... }
```

These can be merged since

```csharp
bool isTransientOrTimeout = outcome.OutcomeKind == CtProviderOutcomeKind.TransientFailure ||
    outcome.OutcomeKind == CtProviderOutcomeKind.Timeout;
if (isTransientOrTimeout && outcome.CooldownOnTransientFailure) { ... }
```

Potential Confusion
Minor / Nits
Test Coverage

Test coverage is comprehensive. Specific things covered well:
No missing coverage concerns beyond the normal integration tests for the actual PostgreSQL path.
💡 Codex Review (Reviewed commit: bcd27cf458)
DomainDetective/CertificateTransparency/CtCertificateTransparencyProviderExtensions.cs (Resolved)
```csharp
TimeSpan estimatedFirstRunDuration = capacities
    .Select(static estimate => estimate.EstimatedFirstRunDuration)
    .Aggregate(TimeSpan.Zero, AddTimeSpan);
int firstRunRequestCount = EstimateFirstRunRequestCount(
```
Recompute first-run duration after aggregate run-budget slicing
When operation-specific budgets are enabled, the aggregate planner sums each segment’s EstimatedFirstRunDuration before applying the provider-wide MaximumRequestsPerRun cap (firstRunRequestCount). If domain and host segments are each below the cap but exceed it in total, the plan reports a capped first-run request count while still using duration for all segment requests, overstating first-run completion time and skewing run-window planning.
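The arithmetic in the finding can be sketched directly: the first-run duration should be derived from the capped first-run request count, not the sum of uncapped segment durations. A Python sketch with illustrative names:

```python
def first_run_duration(total_requests, max_per_run, spacing_seconds):
    # Once the provider-wide per-run cap slices the first run, its duration
    # should cover only the capped request count.
    if max_per_run and max_per_run > 0:
        first_run_requests = min(total_requests, max_per_run)
    else:
        first_run_requests = total_requests
    return first_run_requests * spacing_seconds
```

With two 80-request segments, a cap of 100, and 2 s spacing, this yields 200 s for the first run rather than the 320 s obtained by summing both segments' uncapped durations.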
Code Review: CT Ingestion Planning Groundwork

Overall this is a well-structured PR. The abstraction layer is clean, the immutability patterns are consistent, the test coverage (1000+ lines for planning logic alone) is impressive, and the documentation in CT_PROVIDERS.MD is genuinely useful as a service-design guide. A few things worth discussing before merging:

Issues

1. Broad catch (Exception) swallows all errors in the backfill path

File: The PR replaces the narrow list of specific exception types (

Consider filtering with a guard clause to re-throw fatal exceptions, or catching a DBAClientX-specific base exception type if one is available.

2. Ordinal-based column mapping in MapRow is fragile

File: Columns are accessed by index (0-7). If the query in

3. ShouldUseCrtShPostgreSqlMetadataFallback activates without explicit opt-in

File: The new

4. QueryPagesAsync final runtime state is not accessible to callers

File: The runtime state is threaded through page calls and embedded in each

Minor Notes

Success clears cooldown immediately in CtProviderRuntimeStateUpdater

A single success immediately clears any cooldown even if 49 of the last 50 requests failed. This is fine for groundwork but consider a configurable recovery threshold (e.g., N consecutive successes) before the cooldown clears, depending on future usage patterns.

Connection string concatenation in BuildCrtShPostgreSqlConnectionString

The connection string is built then concatenated with timeout keys as a raw string suffix. If

Removed warning assertion in BackfillMissingCtCertificateMetadataAsync_RetainsExactPassiveFallback

The removed assertion checked for a "domain PostgreSQL lookup failed" warning that is now emitted as "skipped" in the new code path. The removal is intentional and correct.
Test Coverage

The
Summary

The design is solid and a good foundation for the planned "DD Next" service. The main actionable items are the broad exception catch (a safety concern worth addressing before merge) and documenting the activation-by-registration behavior of the PostgreSQL fallback. The rest are refinements that can be addressed in follow-up PRs.
test
PR Review part 1/2 - Bugs and Design Issues

This is a well-designed PR. The dependency isolation (removing Npgsql from the base package), the immutable/normalizable value-object pattern, and the saturating arithmetic throughout are all solid. Test coverage for the planner is thorough. Below are items worth addressing before merge.

Bugs / Correctness
CtCertificateRecord.SplitDistinguishedNameParts handles backslash-escaped commas but not RFC 4514 double-quoted values. A certificate with a quoted comma in the CN (e.g. CN="Smith, John",O=Example) will produce a garbled result. This is an edge case in the wild, but worth noting since the method already handles escape sequences.
**2.** In `CrtShPostgreSqlMetadataProvider.QueryDomainMetadataAsync` the limit calculation uses unchecked `int` multiplication: `Math.Min(Math.Max(32, normalizedHosts.Count * 8), 512)`. The 512 cap makes overflow unlikely in practice, but casting `Count` to `long` first would make the intent explicit.
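The explicit widening is a one-liner; `ComputeLimit` here is an illustrative stand-in for the inline expression:

```csharp
using System;

static int ComputeLimit(int hostCount) =>
    (int)Math.Min(Math.Max(32L, (long)hostCount * 8L), 512L);

Console.WriteLine(ComputeLimit(2));            // 32 (floor applies)
Console.WriteLine(ComputeLimit(10));           // 80
Console.WriteLine(ComputeLimit(int.MaxValue)); // 512 (cap applies, no overflow)
```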
**3.** `CertificateDer = row.IsDBNull(0) ? null : (byte[])row.GetValue(0)`: `GetValue` returns `object`, and the direct cast throws `InvalidCastException` if the driver returns a string hex encoding instead of a byte array. An `as` cast with a null fallback, or a try/catch, would keep parsing resilient to schema surprises from a shared public replica.
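A lenient read along those lines, assuming the textual form would arrive as PostgreSQL's `\x`-prefixed hex; `ReadBytesLenient` is a hypothetical helper, not the project's code:

```csharp
using System;

// Accepts either a raw byte[] or a hex string in PostgreSQL's textual bytea form.
static byte[]? ReadBytesLenient(object? raw) => raw switch {
    byte[] bytes => bytes,
    string hex when hex.StartsWith("\\x", StringComparison.Ordinal) =>
        Convert.FromHexString(hex.AsSpan(2)),
    _ => null,
};

Console.WriteLine(Convert.ToHexString(ReadBytesLenient("\\x0a0b")!)); // 0A0B
Console.WriteLine(ReadBytesLenient(42) is null);                      // True
```

`Convert.FromHexString` still throws on malformed hex, so a production version might wrap that case as well.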
### Design / Maintainability

**1.** The private static helper is copy-pasted identically into `CtIngestionPlanner` and `CtProviderRuntimeStateUpdater`. A shared `internal static class CtTimeSpanHelper` (or similar) would eliminate the duplication, since both callers are in the same assembly.
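If the duplicated helper is hoisted into a shared class, the saturating addition itself stays small. A sketch of the idea (the naming follows the review's `CtTimeSpanHelper` suggestion, not necessarily the actual code):

```csharp
using System;

// Clamp to the DateTimeOffset range instead of throwing on overflow.
static DateTimeOffset AddTimeSpanSaturating(DateTimeOffset value, TimeSpan delta) {
    if (delta > DateTimeOffset.MaxValue - value) return DateTimeOffset.MaxValue;
    if (delta < DateTimeOffset.MinValue - value) return DateTimeOffset.MinValue;
    return value + delta;
}

Console.WriteLine(AddTimeSpanSaturating(DateTimeOffset.MaxValue, TimeSpan.FromDays(1)) == DateTimeOffset.MaxValue); // True
```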
**2.** `BuildCrtShPostgreSqlConnectionString` appends timeout parameters as a raw string suffix (`PostgreSql.BuildConnectionString(...) + ";Timeout=N;Command Timeout=N"`). If `PostgreSql.BuildConnectionString` ever emits a trailing semicolon, or if a future maintainer changes the timeout key names, this silently produces a malformed or ignored connection string. Prefer the connection string builder API of the underlying driver.
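The BCL's `DbConnectionStringBuilder` avoids raw concatenation without reintroducing a driver dependency. A sketch with hypothetical host and database values; the real keys and values would come from the CrtSh options:

```csharp
using System;
using System.Data.Common;

int timeoutSeconds = 30;
var builder = new DbConnectionStringBuilder {
    ["Host"] = "example-replica.invalid", // hypothetical host
    ["Database"] = "certwatch",
    ["Timeout"] = timeoutSeconds,
    ["Command Timeout"] = timeoutSeconds,
};
// Keys are de-duplicated: assigning "Timeout" again overwrites, it never appends a second entry.
Console.WriteLine(builder.ConnectionString);
```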
**PR Review part 2/2: More Design Issues, Minor Notes, and Positive Callouts**
**3.** The test `BackfillMissingCtCertificateMetadataAsync_RetainsExactPassiveFallback...` previously asserted that the warnings contain the string "domain PostgreSQL lookup failed". This assertion was removed without explanation. If the warning is no longer emitted because the domain path now goes through the new `ICtSqlMetadataProvider` (which handles failures differently), that should be noted in the PR description or a code comment. If it was removed inadvertently, it should be restored.
**4.** The method returns `[crt.sh HTTP, crt.sh SQL, CertSpotter, NativeCT]`. Since `ChooseFirstReadyProvider` picks the first Ready plan, callers who pair this factory with `PlanProviders` will silently prefer crt.sh HTTP. The factory doc comment says nothing about ordering; a brief doc note would help callers avoid surprises.
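The ordering consequence can be stated in two lines. An illustrative model of first-Ready selection, not the actual `ChooseFirstReadyProvider` implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static string? ChooseFirstReady(IEnumerable<(string Name, bool Ready)> plans) =>
    plans.Where(p => p.Ready).Select(p => p.Name).FirstOrDefault();

// Factory order is effectively priority order: crt.sh HTTP wins whenever it is Ready.
var plans = new[] { ("crt.sh HTTP", true), ("crt.sh SQL", true), ("CertSpotter", true), ("NativeCT", true) };
Console.WriteLine(ChooseFirstReady(plans)); // crt.sh HTTP
```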
### Minor / Style

`ParseDnsNamesFromSanExtension` checks `if (rawData[offset] == 0x04)` before looking for the SEQUENCE. `X509Extension.RawData` in .NET holds the extension value bytes directly (starting with 0x30), not an outer OCTET STRING wrapper. The guard is defensive, but a comment explaining when the 0x04 wrapper is expected would prevent future confusion.
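A comment plus a tiny unwrap helper would make that intent explicit. A sketch that assumes short definite-length encoding only; `UnwrapOctetString` is illustrative, not the project's code:

```csharp
using System;

// SAN extension values normally start at the SEQUENCE (0x30). Some sources hand back
// the value still wrapped in its OCTET STRING (tag 0x04); unwrap one layer if so.
static byte[] UnwrapOctetString(byte[] data) {
    bool wrapped = data.Length >= 2 && data[0] == 0x04 && data[1] < 0x80 && data[1] == data.Length - 2;
    return wrapped ? data[2..] : data;
}

Console.WriteLine(UnwrapOctetString(new byte[] { 0x04, 0x02, 0x30, 0x00 })[0]); // 48 (0x30)
```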
`ICtSqlMetadataProvider` is internal while the new `ICtCertificateTransparencyProvider` is public. The asymmetry appears intentional (the SQL interface is a legacy bridge, the CT interface is the new surface), but a doc comment on `ICtSqlMetadataProvider` noting that it is a transitional implementation detail of the backfill path would make the intent explicit.

### Positive callouts
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8ad97e39b7
```diff
  return options != null &&
-     (options.EnableCrtShPostgreSqlMetadataFallback || exactMetadataProvider != null);
+     (options.EnableCrtShPostgreSqlMetadataFallback || exactMetadataProvider != null || sqlMetadataProvider != null);
```
**Honor PostgreSQL fallback flag when provider is registered**

`ShouldUseCrtShPostgreSqlMetadataFallback` treats a registered SQL provider as sufficient to enable fallback even when `EnableCrtShPostgreSqlMetadataFallback` is false. After calling `DomainDetectiveCtSqlRegistration.Register()`, runs that intentionally disable the PostgreSQL fallback will still execute crt.sh DB lookup paths, which can introduce unexpected network/database traffic and timeouts in captures that were configured to avoid that source.
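One way to keep the two signals orthogonal, sketched as a pure predicate (names are illustrative; whether registration should imply opt-in is ultimately a maintainer decision):

```csharp
using System;

// Registration makes the SQL fallback *possible*; the option flag decides whether it is *used*.
static bool ShouldUseSqlFallback(bool flagEnabled, bool anyProviderRegistered) =>
    flagEnabled && anyProviderRegistered;

Console.WriteLine(ShouldUseSqlFallback(false, true)); // False: registered but explicitly disabled
Console.WriteLine(ShouldUseSqlFallback(true, true));  // True
```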
**PR Review: Add CT ingestion planning groundwork**

Overall this is a well-structured, well-tested addition. The domain model is coherent — immutable state, capability flags, saturating arithmetic, and the provider profile/planner separation are all solid design choices.

### Style / AGENTS.MD compliance

**Brace style inconsistency** — `CtIngestionPlanner.cs` uses Allman braces where AGENTS.MD expects K&R:

```csharp
// Current (Allman) in CtIngestionPlanner.cs:
public static CtProviderCapacityEstimate EstimateCapacity(...)
{
    ...
}

// AGENTS.MD expected (K&R):
public static CtProviderCapacityEstimate EstimateCapacity(...) {
    ...
}
```

**Large files**

### Potential bugs / correctness
```csharp
if (outcome.OutcomeKind == CtProviderOutcomeKind.TransientFailure && outcome.CooldownOnTransientFailure) {
    return Max(previous?.CooldownUntilUtc, AddTimeSpanSaturating(occurredAtUtc, rateLimit.CooldownAfterRateLimit));
}
```

Rate-limit cooldowns and transient-failure backoffs typically have different characteristics (rate limits are server-imposed; transient failures warrant shorter exponential backoff). Sharing the same duration could either over-penalize on transient failures or under-penalize on rate limits. Consider adding a separate backoff duration for transient failures.
```csharp
IsPermanentlyFailed = !success && (permanentFailure || previous?.IsPermanentlyFailed == true),
```

…means a single successful request will clear `IsPermanentlyFailed`.

**Connection string duplication risk** —

```csharp
return PostgreSql.BuildConnectionString(...) + $";Timeout={timeoutSeconds};Command Timeout={timeoutSeconds}";
```

If `PostgreSql.BuildConnectionString` ever emits these keys itself, the appended suffix produces duplicate entries.

**Timeout counted in both** —

```csharp
bool transientFailure = outcome.OutcomeKind == CtProviderOutcomeKind.TransientFailure || outcome.OutcomeKind == CtProviderOutcomeKind.Timeout;
bool timeout = outcome.OutcomeKind == CtProviderOutcomeKind.Timeout;
```

This is fine if intentional (Timeout is-a kind of transient failure), but the overlap is non-obvious. A comment on the field, or here, would help callers avoid double-counting when aggregating metrics.
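The recovery-threshold idea mentioned in earlier review comments can be modeled in a few lines. A hypothetical policy, not the project's `CtProviderRuntimeStateUpdater`:

```csharp
using System;

// Hypothetical recovery policy: a cooldown only clears after N consecutive successes.
static (int Streak, bool InCooldown) Apply((int Streak, bool InCooldown) previous, bool success, int recoveryThreshold = 3) {
    if (!success) return (0, true); // any failure resets the streak and re-enters cooldown
    int streak = previous.Streak + 1;
    return (streak, previous.InCooldown && streak < recoveryThreshold);
}

var health = (Streak: 0, InCooldown: true);
foreach (bool success in new[] { true, true, true }) {
    health = Apply(health, success);
}
Console.WriteLine(health.InCooldown); // False: cleared only after the third consecutive success
```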
### Security / robustness

**Raw ASN.1 BER fallback parser** — the hand-rolled SAN parser in the BER fallback path operates on untrusted certificate bytes and deserves close review.

### Missing test coverage
### Minor
Good foundation for the follow-up execution work. The planning contracts are cleanly separated from implementation, which will make the provider-specific work much easier to add incrementally.
Merged commit 5fd2ecb into codex/dependency-isolation-pr.
### Summary
### Stacking
This PR is stacked on #1172 (`codex/dependency-isolation-pr`) so the dependency split stays isolated. It does not move or cut out the CtSql implementation; it only prepares reusable CT planning contracts for the follow-up provider execution work.

### Validation
- `dotnet test DomainDetective.Tests\DomainDetective.Tests.csproj -c Debug --framework net8.0 --no-restore --filter "TestCtCertificateQuery|TestCtIngestionPlanner"`
- `dotnet build DomainDetective.Tests\DomainDetective.Tests.csproj -c Debug --no-restore`