Skip to content

Add case-insensitive uniqueness enforcement for project identifier#22406

Open
akabiru wants to merge 22 commits intodevfrom
open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage
Open

Add case-insensitive uniqueness enforcement for project identifier#22406
akabiru wants to merge 22 commits intodevfrom
open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage

Conversation

@akabiru
Copy link
Member

@akabiru akabiru commented Mar 18, 2026

Ticket

https://community.openproject.org/wp/73205

What are you trying to accomplish?

Add case-insensitive uniqueness enforcement for Project.identifier, ensuring no two projects can have identifiers that differ only in case (e.g. "proj" and "PROJ").

Includes:

What approach did you choose and why?

The core of this change is a LOWER(identifier) functional index on the projects table, paired with a uniqueness: { case_sensitive: false } validation.

The migration includes a deduplication step that renames any existing case-colliding identifiers (e.g., "myproject" and "MyProject") before creating the unique index, preventing index creation failures on instances with legacy data. The deduplication hardens against secondary collisions (e.g., where the renamed identifier would itself collide with another existing identifier). The index is built concurrently to avoid write-locking the projects table on larger instances, consistent with the existing pattern used across migrations in this codebase.

The identifier_not_historically_reserved validation uses LOWER(slug) = LOWER(?) to prevent a project from claiming an identifier that was historically used by another project under a different casing — guarding against confusing URL routing scenarios (e.g., /projects/proj resolving to one project while /projects/PROJ would resolve to another). This is a validation-only check and does not affect URL resolution itself.

The ProblematicIdentifiers service and PreviewQuery autofix pipeline are also updated: reserved slug identifiers are now excluded from identifier suggestions using a case-insensitive comparison, ensuring suggestions don't propose identifiers that are case-insensitively reserved in the slug history.

Merge checklist

  • Added/updated tests
  • Added/updated documentation in Lookbook (patterns, previews, etc)
  • Tested major browsers (Chrome, Firefox, Edge, ...)
@akabiru akabiru self-assigned this Mar 19, 2026
@akabiru akabiru changed the title [#73149] Support setting-aware identifier casing with case-insensitive uniqueness [#73149] Support setting-aware project identifier casing with case-insensitive uniqueness Mar 19, 2026
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from dc0a497 to dbed01e Compare March 19, 2026 06:11
Base automatically changed from implementation/72917-save-the-previous-value-when-a-project-identifier-changes to dev March 19, 2026 08:56
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch 4 times, most recently from 855b75c to 36bacc6 Compare March 20, 2026 11:25
@akabiru akabiru changed the base branch from dev to refactor/extract-projects-identifier-concern March 20, 2026 11:27
@akabiru akabiru force-pushed the refactor/extract-projects-identifier-concern branch from 54a420a to e55e2c6 Compare March 20, 2026 12:48
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from 36bacc6 to df753f1 Compare March 20, 2026 13:06
@akabiru akabiru changed the base branch from refactor/extract-projects-identifier-concern to refactor/strip-ascii-control-chars-from-identifier March 20, 2026 13:06
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from df753f1 to 128ae0a Compare March 20, 2026 13:19
@akabiru akabiru force-pushed the refactor/strip-ascii-control-chars-from-identifier branch from 7bb5773 to 1babb4b Compare March 20, 2026 13:54
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from 128ae0a to 6541f9b Compare March 20, 2026 14:11
@akabiru akabiru marked this pull request as ready for review March 20, 2026 14:13
@akabiru akabiru requested a review from a team March 20, 2026 14:13
@akabiru akabiru changed the title [#73149] Support setting-aware project identifier casing with case-insensitive uniqueness Support setting-aware project identifier casing with case-insensitive uniqueness Mar 20, 2026
@opf opf deleted a comment from github-actions bot Mar 20, 2026
@akabiru akabiru requested a review from Copilot March 20, 2026 14:27
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds setting-aware normalization and case-insensitive uniqueness for Project.identifier, aligning storage casing with Setting::WorkPackageIdentifier and updating related autofix preview and MCP search behaviors.

Changes:

  • Normalize Project.identifier to uppercase (alphanumeric mode) or lowercase (numeric mode) and make uniqueness case-insensitive.
  • Add a LOWER(identifier) unique index (including a concurrent migration) and update initial table definition for fresh installs.
  • Extend identifier-autofix preview classification + translations, and update MCP tool schemas/specs for case-insensitive identifier searches.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
app/models/projects/identifier.rb Implements casing normalization, case-insensitive uniqueness, FriendlyId input normalization, and updates historical identifier checks.
lib/open_project/acts_as_url/adapter/op_active_record.rb Adds optional post_process support to apply controlled casing transforms to generated slugs.
db/migrate/20260319120000_add_case_insensitive_uniqueness_for_project_identifiers.rb Adds concurrent migration switching to a LOWER(identifier) unique index.
db/migrate/tables/projects.rb Updates initial schema definition to use the LOWER(identifier) unique index.
app/services/work_packages/identifier_autofix/preview_query.rb Expands “problematic identifier” detection and error classification for semantic identifiers.
config/locales/en.yml Adds new autofix preview error labels for additional classifications.
spec/services/work_packages/identifier_autofix/preview_query_spec.rb Updates preview query specs to bypass normalization and cover new classifications.
spec/models/projects/identifier_spec.rb Adds normalization and case-insensitive collision specs; switches to rails_helper.
spec/features/projects/create_spec.rb Strengthens feature coverage for alphanumeric identifier auto-suggestion behavior.
app/services/mcp_tools/search_projects.rb Updates tool schema description to case-insensitive identifier matching.
app/services/mcp_tools/search_programs.rb Updates tool schema description to case-insensitive identifier matching.
app/services/mcp_tools/search_portfolios.rb Updates tool schema description to case-insensitive identifier matching.
spec/requests/mcp/mcp_tools/search_projects_spec.rb Updates request spec to expect case-insensitive identifier matching.
spec/requests/mcp/mcp_tools/search_programs_spec.rb Updates request spec to expect case-insensitive identifier matching.
spec/requests/mcp/mcp_tools/search_portfolios_spec.rb Updates request spec to expect case-insensitive identifier matching.
spec/models/project_spec.rb Removes a stray blank line at EOF.
Base automatically changed from refactor/strip-ascii-control-chars-from-identifier to dev March 20, 2026 14:56
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from 6541f9b to 6aedcab Compare March 20, 2026 15:01
@opf opf deleted a comment from github-actions bot Mar 20, 2026
Introduce case-insensitive handling for project identifiers: - Add `normalizes :identifier` to automatically upcase (alphanumeric mode) or downcase (numeric mode) identifiers on assignment - Add `parse_friendly_id` to normalize FriendlyId lookups for case-insensitive finder queries - Switch uniqueness validation to `case_sensitive: false` - Replace inline `exclusion:` validator with explicit `identifier_not_reserved` that checks case-insensitively - Consolidate alphanumeric format validators into a single `identifier_alphanumeric_format` method with early return to prevent cascading error messages - Use case-insensitive LOWER() comparison in historical identifier reservation check - Add `post_process` support to the OpActiveRecord acts_as_url adapter with an allowlist of safe transforms (upcase/downcase) - Add migration to replace the unique index on projects.identifier with a case-insensitive LOWER(identifier) index - Update table definition to match the new index Includes corresponding test additions for normalization, case- insensitive uniqueness, reserved identifier rejection, and the create_spec fixture fix for alphanumeric mode.
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch 2 times, most recently from a1d5166 to 2ca7187 Compare March 20, 2026 17:33
Expand PreviewQuery to detect additional alphanumeric identifier format violations: case-mismatch, underscores, leading digits, and purely numerical identifiers. Restructure error classification with FORMAT_RULES for clarity and add corresponding locale strings. Update MCP tool descriptions for search_projects, search_programs, and search_portfolios to reflect case-insensitive identifier matching, with updated test expectations.
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from 2ca7187 to 91362fb Compare March 20, 2026 18:35
akabiru added 2 commits March 23, 2026 17:22
These were built on the normalizes premise which has been removed. In alphanumeric mode, SuggestionGenerator handles identifier generation rather than acts_as_url. The API PR (22417) will handle the proper wiring between modes.
The underscore fix (allowing _ in identifiers per spec) needs to target ProblematicIdentifiers instead of PreviewQuery after the extraction refactor on the target branch.
@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from 0738f98 to 5c7bed1 Compare March 23, 2026 14:25
@akabiru akabiru requested review from judithroth and thykel March 23, 2026 14:38
length: { maximum: SEMANTIC_IDENTIFIER_MAX_LENGTH },
if: ->(p) { p.identifier_changed? && p.identifier.present? && Setting::WorkPackageIdentifier.alphanumeric? }
# Validators for the alphanumeric identifier format (e.g. "PROJ1")
validate :identifier_alphanumeric_format,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we already switch to semantic across the code? (We can still discuss the name further, but for now we should at least remain consistent.)

Of course this is probably better via a follow-up PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we haven't found naming scheme that would sit 100% well, but I would definitely at least get rid of "numeric/alphanumeric", since the "numeric" part does not really click within the context of this file.

Copy link
Member Author

@akabiru akabiru Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the "numeric/alphanumeric" wording to be consistent with Setting::WorkPackageIdentifier.{numeric?,alphanumeric?}- which should make it easier to rename once we decide/adopt new names. I definitely prefer to keep the renaming isolated into a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. We should try to be consistent somehow. But renaming the setting probably requires a database migration 😅 so maybe it's easier if we settle on numeric / alphanumeric even if I don't like it as much 🥲

@akabiru akabiru force-pushed the open-point/73149-how-do-we-handle-project-identifiers-case-insensitive-storage branch from 7295e82 to 2bf5740 Compare March 23, 2026 15:21
akabiru added 2 commits March 23, 2026 18:53
Adds declarative specs for case-insensitive uniqueness, max length validation, and the LOWER(identifier) unique database index.
PostgreSQL enforces uniqueness constraints at index build time — when CREATE INDEX encounters duplicate key values, the entire operation fails. For concurrent index builds (algorithm: :concurrently), this is particularly problematic: a failed build leaves behind an "invalid" index entry in pg_class that is ignored for queries but still incurs write overhead on every INSERT/UPDATE, and must be explicitly dropped before retrying. Since this migration introduces a unique index on LOWER(identifier), any pre-existing case collisions (e.g. "MyProject" and "myproject") would cause the index build to fail. This adds a deduplication step that resolves collisions before the index is created: the oldest project (lowest id) keeps its identifier unchanged while newer duplicates receive a "-N" suffix via a window function partitioned over LOWER(identifier).
@akabiru akabiru requested review from judithroth and thykel March 24, 2026 04:32
@akabiru akabiru changed the title Support setting-aware project identifier casing with case-insensitive uniqueness Add case-insensitive uniqueness enforcement for Project.identifier Mar 24, 2026
akabiru added 3 commits March 24, 2026 08:04
Switch suffix from dash to underscore so deduplicated identifiers remain valid in both numeric and alphanumeric modes. Add a NOT EXISTS guard to skip rows where the suffixed identifier would itself collide with an existing LOWER(identifier). In the astronomically unlikely event of a secondary collision, the subsequent CREATE UNIQUE INDEX will fail loudly rather than silently produce bad data.
After the background autofix job converts identifiers from lowercase to uppercase, historical slugs in friendly_id_slugs will differ in case from current project identifiers. Without LOWER() in the NOT IN subquery, those slugs are incorrectly included in the reserved set, causing the suggestion generator to over-reserve. Example: a project created in numeric mode gets identifier "proj" and FriendlyId records slug "proj". The autofix job later uppercases it to "PROJ". Now the case-sensitive `slug NOT IN (SELECT identifier ...)` sees "proj" != "PROJ" and incorrectly treats "proj" as reserved, preventing the suggestion generator from offering it.
Unify test helpers into a single create_project_with_raw_identifier method that better communicates intent (bypasses validations to set an exact identifier). Add project.reload after update_all so the in-memory object stays consistent with the database. Also documents that the migration rollback is intentionally lossy — deduplicated identifiers keep their suffixes under the restored case-sensitive index.
@akabiru akabiru changed the title Add case-insensitive uniqueness enforcement for Project.identifier Add case-insensitive uniqueness enforcement for project identifier Mar 24, 2026
@thykel thykel requested a review from Copilot March 24, 2026 08:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

akabiru added 4 commits March 25, 2026 12:16
The suggestion generator produces uppercase candidates and checks exclusions with case-sensitive `include?`. Historical slugs stored in lowercase (e.g. "proj") would not block the generator from suggesting "PROJ", even though `identifier_not_historically_reserved` would reject it on save via LOWER(). Upcasing the exclusion set at the consumption point keeps ProblematicIdentifiers#exclusion_set casing-neutral for other consumers while ensuring the preview never proposes identifiers that collide with historical slugs.
…dator The numeric (legacy) format validation was inline as a `validates :format` declaration, while the alphanumeric validator was already a dedicated method. Extract it into `identifier_numeric_format` so both mode-specific validators follow the same pattern, and fix a misleading comment on the shared validators.
Copy link
Contributor

@judithroth judithroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one thing that could be improved but as far as I see you only moved that code around without changing it much - so if you don't want to change that I can live with it.

return
end

errors.add(:identifier, :no_special_characters) unless identifier.match?(/\A[A-Z][A-Z0-9_]*\z/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: You could change /\A[A-Z][A-Z0-9_]*\z/) to /\A[A-Z0-9_]*\z/) because the requirement to have the first letter be a character is already covered by the check in line 136.

And then you could remove the return, so the user who tries to change an identifier to 123! gets both messages (not starting with character and no special characters) at once.
I usually find it quite annoying if systems only tell me parts of the errors just to complain again after I changed something 😅

But oh, I see you just moved that code around and you're not the author.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Sure, happy to address this 👍🏾

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

5 participants