Skip to content

Conversation

@cbuescher
Copy link
Member

For missing values on date fields we use Long.MIN_VALUE by default. This is okay
when the resolution of the field is milliseconds. For nanoseconds though,
negative values can lead to IllegalArgumentExpections when we try to format them
internally.
This change fixes this by explicitely setting the minimum value to 0L (which
corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other
explicit missing value is defined and the target numeric type is a nanosecond
type (this is true for nanosecond fields and when "numeric_type" is explicitely
set). This way we correct the behaviour for single typed indices and cases where
we are sorting across multiple indices with mixed "date" and "date_nanos" type
where "numeric_type" is set in the sort definition.

Closes #73763

For missing values on date fields we use Long.MIN_VALUE by default. This is okay when the resolution of the field is milliseconds. For nanoseconds though, negative values can lead to IllegalArgumentExpections when we try to format them internally. This change fixes this by explicitely setting the minimum value to 0L (which corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other explicit missing value is defined and the target numeric type is a nanosecond type (this is true for nanosecond fields and when "numeric_type" is explicitely set). This way we correct the behaviour for single typed indices and cases where we are sorting across multiple indices with mixed "date" and "date_nanos" type where "numeric_type" is set in the sort definition. Closes elastic#73763
@cbuescher cbuescher added >bug :Search/Search Search-related issues that do not fall into other categories v8.0.0 v7.14.0 labels Jun 30, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jun 30, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@cbuescher
Copy link
Member Author

@elasticmachine update branch

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments.

final boolean min = sortMissingFirst(missingValue) ^ reverse;
missingValue = min ? 0L : Long.MAX_VALUE;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic could move to IndexNumericFieldData#missingObject ? IndexFieldData#missingObject doesn't need to be final, we need to pick the value depending on the targetNumericType.

Copy link
Member Author

@cbuescher cbuescher Jul 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this a bit and one problem I couldn't solve with this is that "missingObject" is part of XFieldComparatorSource, not IndexFieldData. That probably woudl mean overwriting it in special versions of LongValuesComparatorSource, and for the things I tried it wasn't always clear how to ther the "targetNumericType" from the sort (e.g. reduceType() returns SortField.Type that doesn't even differentiate between Long, Data and Datenanos like IndexNumericFieldData.NumericType does). Maybe I'm missing something but from what I tried this current location seems less complex to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The targetNumericType could be added to the LongValuesSource and checked in missingObject ? It's not really about complexity here, missingObject is called in several contexts so it needs a fix.

SearchResponse searchResponse = client().prepareSearch(index)
.addSort(SortBuilders.fieldSort("mydate").order(SortOrder.ASC).setFormat(format).setNumericType("date_nanos"))
.get();
assertHitsInOrder(searchResponse, new String[] { "1", "2", "4", "5", "3", "6" });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A tiebreaker should be used to ensure that 3 is always sorted before 6 (same value otherwise).

.setNumericType("date_nanos")
)
.get();
assertHitsInOrder(searchResponse, new String[] { "3", "6", "1", "2", "4", "5" });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@cbuescher
Copy link
Member Author

@jimczi thanks for the review, I improved the tests and left a comment regarding moving the implementation logic elsewhere and the problems I encountered. If you have other suggestions please let me know, otherwise I hope its also okay if we leave it where it is.

@cbuescher
Copy link
Member Author

@jimczi I pushed another commit that adresses your latest comment, should be ready for another look I think

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor comments, LGTM otherwise

@Nullable Object missingValue,
MultiValueMode sortMode,
Nested nested,
NumericType targetNumericType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The targetNumericType is implicit here ?

@Nullable Object missingValue,
MultiValueMode sortMode,
Nested nested,
NumericType targetNumericType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The targetNumericType is implicit here, no need to add it.

// special case to prevent negative values that would cause invalid nanosecond ranges
if (sortMissingFirst(missingValue) || sortMissingLast(missingValue)) {
final boolean min = sortMissingFirst(missingValue) ^ reversed;
return min ? 0L : Long.MAX_VALUE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use DateUtils#MAX_NANOSECOND_INSTANT as the max

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the converted nanoseconds value of that Instant though I think, which should be the same as Long.MAX_VALUE? I introduced another constant (8c6df48#diff-09ba5f62fc8de8923f01213874ef48657920b2cc1428ff4f27f5f06169ee8f1aR207) in my last commit to not have to do that coversion every time, just if you want to check.

@cbuescher
Copy link
Member Author

@elasticmachine update branch

@cbuescher cbuescher merged commit bf952a4 into elastic:master Jul 7, 2021
@cbuescher cbuescher added backport pending auto-backport Automatically create backport pull requests when merged labels Jul 7, 2021
elasticsearchmachine pushed a commit to elasticsearchmachine/elasticsearch that referenced this pull request Jul 7, 2021
For missing values on date fields we use Long.MIN_VALUE by default. This is okay when the resolution of the field is milliseconds. For nanoseconds though, negative values can lead to IllegalArgumentExpections when we try to format them internally. This change fixes this by explicitely setting the minimum value to 0L (which corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other explicit missing value is defined and the target numeric type is a nanosecond type (this is true for nanosecond fields and when "numeric_type" is explicitely set). This way we correct the behaviour for single typed indices and cases where we are sorting across multiple indices with mixed "date" and "date_nanos" type where "numeric_type" is set in the sort definition. Closes elastic#73763
elasticsearchmachine pushed a commit to elasticsearchmachine/elasticsearch that referenced this pull request Jul 7, 2021
For missing values on date fields we use Long.MIN_VALUE by default. This is okay when the resolution of the field is milliseconds. For nanoseconds though, negative values can lead to IllegalArgumentExpections when we try to format them internally. This change fixes this by explicitely setting the minimum value to 0L (which corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other explicit missing value is defined and the target numeric type is a nanosecond type (this is true for nanosecond fields and when "numeric_type" is explicitely set). This way we correct the behaviour for single typed indices and cases where we are sorting across multiple indices with mixed "date" and "date_nanos" type where "numeric_type" is set in the sort definition. Closes elastic#73763
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
7.14
7.x
cbuescher pushed a commit that referenced this pull request Jul 7, 2021
#75064) For missing values on date fields we use Long.MIN_VALUE by default. This is okay when the resolution of the field is milliseconds. For nanoseconds though, negative values can lead to IllegalArgumentExpections when we try to format them internally. This change fixes this by explicitely setting the minimum value to 0L (which corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other explicit missing value is defined and the target numeric type is a nanosecond type (this is true for nanosecond fields and when "numeric_type" is explicitely set). This way we correct the behaviour for single typed indices and cases where we are sorting across multiple indices with mixed "date" and "date_nanos" type where "numeric_type" is set in the sort definition. Closes #73763
@cbuescher cbuescher added v7.14.0 and removed v7.14.1 labels Jul 7, 2021
cbuescher pushed a commit that referenced this pull request Jul 7, 2021
…#75065) For missing values on date fields we use Long.MIN_VALUE by default. This is okay when the resolution of the field is milliseconds. For nanoseconds though, negative values can lead to IllegalArgumentExpections when we try to format them internally. This change fixes this by explicitely setting the minimum value to 0L (which corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other explicit missing value is defined and the target numeric type is a nanosecond type (this is true for nanosecond fields and when "numeric_type" is explicitely set). This way we correct the behaviour for single typed indices and cases where we are sorting across multiple indices with mixed "date" and "date_nanos" type where "numeric_type" is set in the sort definition. Closes #73763
ChrisHegarty added a commit that referenced this pull request Aug 1, 2025
This commit fixes an issue whereby indices created prior to 7.14 that have an index sort on a date_nanos field can no longer be opened (with versions >= 7.14). When opening an index the configured index sort, derived from the index settings, must match exactly that of the sort encoded in the index itself. A change to fix a bug back in 7.14 changed this for date_nanos fields whose index.sort.missing value is absent, see #74760. Specifically, the default minimum value changed from Long.MIN_VALUE to 0L. The change in this commit restores the default minimum value for indices prior to 7.14.
ChrisHegarty added a commit that referenced this pull request Aug 1, 2025
…132171) This commit fixes an issue whereby indices created prior to 7.14 that have an index sort on a date_nanos field can no longer be opened (with versions >= 7.14). When opening an index the configured index sort, derived from the index settings, must match exactly that of the sort encoded in the index itself. A change to fix a bug back in 7.14 changed this for date_nanos fields whose index.sort.missing value is absent, see #74760. Specifically, the default minimum value changed from Long.MIN_VALUE to 0L. The change in this commit restores the default minimum value for indices prior to 7.14.
ChrisHegarty added a commit that referenced this pull request Aug 1, 2025
…132172) This commit fixes an issue whereby indices created prior to 7.14 that have an index sort on a date_nanos field can no longer be opened (with versions >= 7.14). When opening an index the configured index sort, derived from the index settings, must match exactly that of the sort encoded in the index itself. A change to fix a bug back in 7.14 changed this for date_nanos fields whose index.sort.missing value is absent, see #74760. Specifically, the default minimum value changed from Long.MIN_VALUE to 0L. The change in this commit restores the default minimum value for indices prior to 7.14.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v7.14.0 v7.15.0 v8.0.0-alpha1

7 participants