Add case-insensitive uniqueness enforcement for project identifier#22406
Add case-insensitive uniqueness enforcement for project identifier#22406
Conversation
dc0a497 to dbed01e Compare 855b75c to 36bacc6 Compare 54a420a to e55e2c6 Compare 36bacc6 to df753f1 Compare df753f1 to 128ae0a Compare 7bb5773 to 1babb4b Compare 128ae0a to 6541f9b Compare There was a problem hiding this comment.
Pull request overview
Adds setting-aware normalization and case-insensitive uniqueness for Project.identifier, aligning storage casing with Setting::WorkPackageIdentifier and updating related autofix preview and MCP search behaviors.
Changes:
- Normalize
Project.identifierto uppercase (alphanumeric mode) or lowercase (numeric mode) and make uniqueness case-insensitive. - Add a
LOWER(identifier)unique index (including a concurrent migration) and update initial table definition for fresh installs. - Extend identifier-autofix preview classification + translations, and update MCP tool schemas/specs for case-insensitive identifier searches.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| app/models/projects/identifier.rb | Implements casing normalization, case-insensitive uniqueness, FriendlyId input normalization, and updates historical identifier checks. |
| lib/open_project/acts_as_url/adapter/op_active_record.rb | Adds optional post_process support to apply controlled casing transforms to generated slugs. |
| db/migrate/20260319120000_add_case_insensitive_uniqueness_for_project_identifiers.rb | Adds concurrent migration switching to a LOWER(identifier) unique index. |
| db/migrate/tables/projects.rb | Updates initial schema definition to use the LOWER(identifier) unique index. |
| app/services/work_packages/identifier_autofix/preview_query.rb | Expands “problematic identifier” detection and error classification for semantic identifiers. |
| config/locales/en.yml | Adds new autofix preview error labels for additional classifications. |
| spec/services/work_packages/identifier_autofix/preview_query_spec.rb | Updates preview query specs to bypass normalization and cover new classifications. |
| spec/models/projects/identifier_spec.rb | Adds normalization and case-insensitive collision specs; switches to rails_helper. |
| spec/features/projects/create_spec.rb | Strengthens feature coverage for alphanumeric identifier auto-suggestion behavior. |
| app/services/mcp_tools/search_projects.rb | Updates tool schema description to case-insensitive identifier matching. |
| app/services/mcp_tools/search_programs.rb | Updates tool schema description to case-insensitive identifier matching. |
| app/services/mcp_tools/search_portfolios.rb | Updates tool schema description to case-insensitive identifier matching. |
| spec/requests/mcp/mcp_tools/search_projects_spec.rb | Updates request spec to expect case-insensitive identifier matching. |
| spec/requests/mcp/mcp_tools/search_programs_spec.rb | Updates request spec to expect case-insensitive identifier matching. |
| spec/requests/mcp/mcp_tools/search_portfolios_spec.rb | Updates request spec to expect case-insensitive identifier matching. |
| spec/models/project_spec.rb | Removes a stray blank line at EOF. |
spec/services/work_packages/identifier_autofix/preview_query_spec.rb Outdated Show resolved Hide resolved
spec/services/work_packages/identifier_autofix/preview_query_spec.rb Outdated Show resolved Hide resolved
6541f9b to 6aedcab Compare Introduce case-insensitive handling for project identifiers: - Add `normalizes :identifier` to automatically upcase (alphanumeric mode) or downcase (numeric mode) identifiers on assignment - Add `parse_friendly_id` to normalize FriendlyId lookups for case-insensitive finder queries - Switch uniqueness validation to `case_sensitive: false` - Replace inline `exclusion:` validator with explicit `identifier_not_reserved` that checks case-insensitively - Consolidate alphanumeric format validators into a single `identifier_alphanumeric_format` method with early return to prevent cascading error messages - Use case-insensitive LOWER() comparison in historical identifier reservation check - Add `post_process` support to the OpActiveRecord acts_as_url adapter with an allowlist of safe transforms (upcase/downcase) - Add migration to replace the unique index on projects.identifier with a case-insensitive LOWER(identifier) index - Update table definition to match the new index Includes corresponding test additions for normalization, case- insensitive uniqueness, reserved identifier rejection, and the create_spec fixture fix for alphanumeric mode.
a1d5166 to 2ca7187 Compare Expand PreviewQuery to detect additional alphanumeric identifier format violations: case-mismatch, underscores, leading digits, and purely numerical identifiers. Restructure error classification with FORMAT_RULES for clarity and add corresponding locale strings. Update MCP tool descriptions for search_projects, search_programs, and search_portfolios to reflect case-insensitive identifier matching, with updated test expectations.
2ca7187 to 91362fb Compare These were built on the normalizes premise which has been removed. In alphanumeric mode, SuggestionGenerator handles identifier generation rather than acts_as_url. The API PR (22417) will handle the proper wiring between modes.
The underscore fix (allowing _ in identifiers per spec) needs to target ProblematicIdentifiers instead of PreviewQuery after the extraction refactor on the target branch.
0738f98 to 5c7bed1 Compare db/migrate/20260319120000_add_case_insensitive_uniqueness_for_project_identifiers.rb Show resolved Hide resolved
| length: { maximum: SEMANTIC_IDENTIFIER_MAX_LENGTH }, | ||
| if: ->(p) { p.identifier_changed? && p.identifier.present? && Setting::WorkPackageIdentifier.alphanumeric? } | ||
| # Validators for the alphanumeric identifier format (e.g. "PROJ1") | ||
| validate :identifier_alphanumeric_format, |
There was a problem hiding this comment.
Shall we already switch to semantic across the code? (We can still discuss the name further, but for now we should at least remain consistent.)
Of course this is probably better via a follow-up PR.
There was a problem hiding this comment.
I know we haven't found naming scheme that would sit 100% well, but I would definitely at least get rid of "numeric/alphanumeric", since the "numeric" part does not really click within the context of this file.
There was a problem hiding this comment.
I took the "numeric/alphanumeric" wording to be consistent with Setting::WorkPackageIdentifier.{numeric?,alphanumeric?}- which should make it easier to rename once we decide/adopt new names. I definitely prefer to keep the renaming isolated into a separate PR.
There was a problem hiding this comment.
I agree. We should try to be consistent somehow. But renaming the setting probably requires a database migration 😅 so maybe it's easier if we settle on numeric / alphanumeric even if I don't like it as much 🥲
7295e82 to 2bf5740 Compare Adds declarative specs for case-insensitive uniqueness, max length validation, and the LOWER(identifier) unique database index.
PostgreSQL enforces uniqueness constraints at index build time — when CREATE INDEX encounters duplicate key values, the entire operation fails. For concurrent index builds (algorithm: :concurrently), this is particularly problematic: a failed build leaves behind an "invalid" index entry in pg_class that is ignored for queries but still incurs write overhead on every INSERT/UPDATE, and must be explicitly dropped before retrying. Since this migration introduces a unique index on LOWER(identifier), any pre-existing case collisions (e.g. "MyProject" and "myproject") would cause the index build to fail. This adds a deduplication step that resolves collisions before the index is created: the oldest project (lowest id) keeps its identifier unchanged while newer duplicates receive a "-N" suffix via a window function partitioned over LOWER(identifier).
Project.identifier Switch suffix from dash to underscore so deduplicated identifiers remain valid in both numeric and alphanumeric modes. Add a NOT EXISTS guard to skip rows where the suffixed identifier would itself collide with an existing LOWER(identifier). In the astronomically unlikely event of a secondary collision, the subsequent CREATE UNIQUE INDEX will fail loudly rather than silently produce bad data.
After the background autofix job converts identifiers from lowercase to uppercase, historical slugs in friendly_id_slugs will differ in case from current project identifiers. Without LOWER() in the NOT IN subquery, those slugs are incorrectly included in the reserved set, causing the suggestion generator to over-reserve. Example: a project created in numeric mode gets identifier "proj" and FriendlyId records slug "proj". The autofix job later uppercases it to "PROJ". Now the case-sensitive `slug NOT IN (SELECT identifier ...)` sees "proj" != "PROJ" and incorrectly treats "proj" as reserved, preventing the suggestion generator from offering it.
Unify test helpers into a single create_project_with_raw_identifier method that better communicates intent (bypasses validations to set an exact identifier). Add project.reload after update_all so the in-memory object stays consistent with the database. Also documents that the migration rollback is intentionally lossy — deduplicated identifiers keep their suffixes under the restored case-sensitive index.
Project.identifier db/migrate/20260319120000_add_case_insensitive_uniqueness_for_project_identifiers.rb Show resolved Hide resolved
app/services/work_packages/identifier_autofix/problematic_identifiers.rb Show resolved Hide resolved
The suggestion generator produces uppercase candidates and checks exclusions with case-sensitive `include?`. Historical slugs stored in lowercase (e.g. "proj") would not block the generator from suggesting "PROJ", even though `identifier_not_historically_reserved` would reject it on save via LOWER(). Upcasing the exclusion set at the consumption point keeps ProblematicIdentifiers#exclusion_set casing-neutral for other consumers while ensuring the preview never proposes identifiers that collide with historical slugs.
…ntifiers-case-insensitive-storage
…dator The numeric (legacy) format validation was inline as a `validates :format` declaration, while the alphanumeric validator was already a dedicated method. Extract it into `identifier_numeric_format` so both mode-specific validators follow the same pattern, and fix a misleading comment on the shared validators.
judithroth left a comment
There was a problem hiding this comment.
I found one thing that could be improved but as far as I see you only moved that code around without changing it much - so if you don't want to change that I can live with it.
| return | ||
| end | ||
| | ||
| errors.add(:identifier, :no_special_characters) unless identifier.match?(/\A[A-Z][A-Z0-9_]*\z/) |
There was a problem hiding this comment.
Nitpick: You could change /\A[A-Z][A-Z0-9_]*\z/) to /\A[A-Z0-9_]*\z/) because the requirement to have the first letter be a character is already covered by the check in line 136.
And then you could remove the return, so the user who tries to change an identifier to 123! gets both messages (not starting with character and no special characters) at once.
I usually find it quite annoying if systems only tell me parts of the errors just to complain again after I changed something 😅
But oh, I see you just moved that code around and you're not the author.
There was a problem hiding this comment.
Nice catch! Sure, happy to address this 👍🏾
Ticket
https://community.openproject.org/wp/73205
What are you trying to accomplish?
Add case-insensitive uniqueness enforcement for
Project.identifier, ensuring no two projects can have identifiers that differ only in case (e.g. "proj" and "PROJ").Includes:
What approach did you choose and why?
The core of this change is a
LOWER(identifier)functional index on theprojectstable, paired with auniqueness: { case_sensitive: false }validation.The migration includes a deduplication step that renames any existing case-colliding identifiers (e.g., "myproject" and "MyProject") before creating the unique index, preventing index creation failures on instances with legacy data. The deduplication hardens against secondary collisions (e.g., where the renamed identifier would itself collide with another existing identifier). The index is built concurrently to avoid write-locking the projects table on larger instances, consistent with the existing pattern used across migrations in this codebase.
The
identifier_not_historically_reservedvalidation usesLOWER(slug) = LOWER(?)to prevent a project from claiming an identifier that was historically used by another project under a different casing — guarding against confusing URL routing scenarios (e.g.,/projects/projresolving to one project while/projects/PROJwould resolve to another). This is a validation-only check and does not affect URL resolution itself.The
ProblematicIdentifiersservice andPreviewQueryautofix pipeline are also updated: reserved slug identifiers are now excluded from identifier suggestions using a case-insensitive comparison, ensuring suggestions don't propose identifiers that are case-insensitively reserved in the slug history.Merge checklist