19
$\begingroup$

In object-oriented languages, there is typically a notion of object identity which is distinct from object value, such that two objects can have the same values but different identities. For example, in Python this is the distinction between comparisons using is and comparisons using ==; in Java it's the distinction between == and .equals().
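For concreteness, here is a minimal Python sketch of the distinction described above. The variable names are only illustrative.

```python
# Two lists built separately hold the same values but are distinct objects.
a = [1, 2, 3]
b = [1, 2, 3]

same_value = a == b    # value comparison
same_object = a is b   # identity comparison

print(same_value, same_object)  # True False
```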

However, in pure functional languages, there tends not to be a notion of object identity. Even when a type must necessarily be implemented using references (e.g. a recursive type), the language doesn't have a sense in which two values of that type can be equal but "not the same object".

Immutability definitely has something to do with it, because two different mutable objects equal by value might not stay equal by value. But object identity is still sometimes a useful concept in object-oriented languages, even for classes that are deeply immutable; for example, "fresh objects" can be used as unique identifiers, or as sentinel values. Are such use-cases just not useful enough in the functional paradigm to add this as a language feature, or are there concrete reasons why it cannot or should not be implemented? Are there any examples of pure functional languages which do have object identity as a concept?
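As a sketch of the sentinel use-case mentioned above: a fresh object is identical only to itself, so no value a caller supplies can collide with it. The names `_MISSING` and `lookup` are hypothetical, chosen just for this illustration.

```python
# A fresh object that no caller-supplied value can be identical to.
_MISSING = object()

def lookup(table, key, default=_MISSING):
    """Return table[key]; raise KeyError only if no default was supplied."""
    value = table.get(key, _MISSING)
    if value is not _MISSING:
        return value
    if default is _MISSING:
        raise KeyError(key)
    return default
```

Because the check uses identity, `None` remains usable both as a stored value and as an explicit default.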

$\endgroup$
2
  • $\begingroup$ Reference equality is also often used as a short-circuit when implementing value equality, especially for collections where it’s useful to avoid iteration if possible. $\endgroup$ Commented Jun 1, 2023 at 23:40
  • 3
    $\begingroup$ @Bbrk24 True, but the language implementation can do that without exposing it as a language feature. $\endgroup$ Commented Jun 1, 2023 at 23:44

2 Answers

35
$\begingroup$

In a pure functional language, functions are referentially transparent — that is, any function call should be able to be replaced by its value, and vice-versa, without affecting the semantics of the program, and two calls with the same arguments produce the same result. Allowing (inspectable) object identity breaks this property.

Suppose we have a function that creates an object:

makeObj x = object { x = x } 

In a pure system it must be the case that

makeObj 5 == makeObj 5 

is true, and that it remains true even if these two instantiations were widely separated in time or location. If each object creation has a detectably different identity, the program can branch based on those differences, and that property doesn't hold. The same is true for less-direct ways of creating an object.

Visible object identity is just inherently impure, so you won't find pure languages that have it.


This is assuming an identity-comparing == operation, but it could also be a key in a map, a piecewise function definition, etc — anything that distinguishes two value-equal objects by identity is going to break transparency. You could construct a function like this:

sentinel = makeObj 5
choose v = case v of
    sentinel -> 4.0
    _ -> 5.0
choose sentinel / choose (makeObj 5) == 0.8
choose (makeObj 5) / choose sentinel == 1.25

with different return values based on whether you pass in makeObj 5 or sentinel, but these ought to be equivalent. As soon as identity differences are exposed to the programmer somehow we have this issue unless they're unable to have any effect on behaviour... and then we've lost object identity again.
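The same situation can be reproduced in an impure language that does expose identity. The following is a rough Python transcription of the pseudocode above, not a claim about any pure language; `Obj`, `make_obj` and `choose` are illustrative names, and a frozen dataclass stands in for an immutable object with value equality.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Obj:
    x: int

def make_obj(x):
    # Every call allocates a new object with its own identity.
    return Obj(x)

sentinel = make_obj(5)

def choose(v):
    # Branches on identity, not on value.
    return 4.0 if v is sentinel else 5.0

values_equal = make_obj(5) == sentinel             # True: same value
ratio_a = choose(sentinel) / choose(make_obj(5))   # 4.0 / 5.0
ratio_b = choose(make_obj(5)) / choose(sentinel)   # 5.0 / 4.0

print(values_equal, ratio_a, ratio_b)  # True 0.8 1.25
```

Replacing `sentinel` with the value-equal expression `make_obj(5)` changes the result, which is exactly the substitution that referential transparency is supposed to allow.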


As far as use cases go, using fresh objects for sentinel values in object-oriented languages is really reinventing a functional-programming feature within the object-oriented paradigm. The conventional functional approach to situations where you need sentinel values is an unparameterised algebraic data type value, and a unique singleton object fills somewhat the same niche. The better functional design probably doesn't use objects, so this probably wouldn't be a reason for including identity.
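To illustrate the algebraic-data-type approach, here is one way it might look when emulated in Python with frozen dataclasses. `Found`, `Missing`, `find` and `describe` are hypothetical names for this sketch. The "no result" case is an ordinary value with no fields, so any two instances of it are interchangeable and identity never comes into it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Found:
    value: object

@dataclass(frozen=True)
class Missing:
    pass  # unparameterised case: carries no data

def find(table, key):
    # Returns one of the two cases instead of a magic sentinel object.
    return Found(table[key]) if key in table else Missing()

def describe(result):
    # Branches on which case it is, never on object identity.
    if isinstance(result, Missing):
        return "absent"
    return f"present: {result.value!r}"
```

Here `Missing() == Missing()` holds by value, so it does not matter where or when the "missing" result was constructed.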

Unique identifiers could be useful, perhaps as something like a capability, but there's not a lot of reason to tie that to identity of other values. Unique identity by creation site in the source code is possible in a pure system and would accommodate that sort of usage, but other ways to obtain them probably fall foul of referential transparency again.

$\endgroup$
5
  • $\begingroup$ I was thinking of referential transparency but thought that it would still hold, as long as the == in the definition means value equality rather than reference equality. Perhaps this still causes problems once you start branching based on reference equality ─ could you give a concrete example of this, e.g. a "pure" function which would return 1 the first time you call it and 2 the second time? $\endgroup$ Commented Jun 2, 2023 at 0:05
  • 3
    $\begingroup$ Here == is standing in for any way of distinguishing different identities - maybe it's a key in a map or something, but anything you can do that has two value-equal objects able to branch differently has a breach following on from it. I added a function that behaves visibly differently at the value level with two should-be-identical arguments. $\endgroup$ Commented Jun 2, 2023 at 1:00
  • $\begingroup$ "anything that distinguishes two value-equal objects by identity is going to break transparency" – coincidentally (or maybe not so coincidentally), this is also true for object-oriented programming. Objects that respond to the same messages in the same way are indistinguishable, which makes it possible to simulate objects. Inspectable Identity breaks this. This means, for example, any Java program that uses == is not object-oriented – an important lesson that is often forgotten when teaching OO using Java. $\endgroup$ Commented Feb 25, 2024 at 16:58
  • $\begingroup$ "Wow! So no now() in pure functional languages!?" So it seems. $\endgroup$ Commented Mar 17 at 13:51
  • 2
    $\begingroup$ @PabloH Of course now() is possible in pure functional languages, but it is not a pure function with no arguments and one return value that is just a plain old number. Similar story in pure logic languages where there is the potential for backtracking over now(). Linear types, monads, and linear modes really can change the world. $\endgroup$ Commented Mar 18 at 2:05
1
$\begingroup$

I think that fundamentally, the reason "pure functional" languages don't have object identity is that object identity is typically a synonym for address identity or storage-location identity, and pure functional languages have no such concept.

The reason functional languages don't have this concept is that they are primarily for the use of those whose background is mathematics and whose application of the computer will be primarily mathematical, and who as mathematicians usually don't have (and don't need to have) a strong systematic concept of storage. The compilers of such languages manage local storage automatically, so the programmer need not even know that this is happening.

What I mean by storage is the concept of data having a physical placement and organisation, and of the existence of operations that are designed to alter the physical placement of data (or alternatively put, operations that are designed to alter the data present in a physical place).

There is no assignment operator in conventional mathematics, in the sense of an operator that purely commands a value to be moved or copied from one place to another.

Mathematicians tend to think of values as abstract objects that exist in no particular place (because in truth they exist as ideas in the mind), and whose occasional physical manifestations or representations can spring in and out of reality without much consideration. Two representations of the number 1 are still the same number 1.

Computer programmers however think of a value as being the setting of a field in storage. In a hypothetically static environment, the distinction may not seem obvious. It becomes more obvious when you are engaged in the process of making changes to physical storage in a machine or system that has moving parts and change - when you are engaged in programming how those changes are to be done.

Programmers do understand and use (on the appropriate occasions) the mathematical notion of values and value equality, but programming allows an additional comparison: not just whether the values of two different variables are the same, but whether two references to storage locations refer to the same location. The same values in different locations often mean different things, too.

When values arise from the same storage location with two aliases, the value in such a case is not just the same, but connected by means of a reference to a common physical place of storage, in the sense that alteration via one reference will change the value subsequently retrieved by the other reference because they are both working via the same storage location.
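A small Python sketch of this aliasing behaviour, with illustrative variable names: two names bound to one list share its storage, while an equal-valued copy does not.

```python
balances = [100, 250]
alias = balances        # a second name for the same storage location
copy = list(balances)   # equal values, separate storage

alias[0] = 0            # write through one reference

print(balances)  # [0, 250]   (the change is visible via the other name)
print(copy)      # [100, 250] (the copy is unaffected)
```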

In most industrial programming languages, you see, a named variable is primarily an alias for a storage location, not an alias for a particular value or an algebraic placeholder which is later bound to a value.

These industrial languages are more popular because they proceed from conceptualisations appropriate to data processing, where the physical movement and placement of data, and the changing of data in particular places in response to ongoing events and control actions of all kinds, is an important aspect of what the machine is being programmed to do.

Summing it all up, pure functional languages are special purpose languages and therefore do not expose certain facilities and concepts of real physical computers, such as storage, and such as identity of storage locations.

$\endgroup$
2
  • $\begingroup$ "they are primarily for the use of those whose background is mathematics and whose application of the computer will be primarily mathematical, and who as mathematicians usually don't have (and don't need to have) a strong systematic concept of storage" << The Erlang community might strongly disagree with that statement. Erlang is a functional language developed for telecommunications. Functional programming comes with guarantees of safety, robustness and fault-tolerance that are very useful in practical engineering. $\endgroup$ Commented May 16 at 11:06
  • $\begingroup$ @Stef, but Erlang is not a pure functional language (in fact it was designed to respond to control and to manage the evolution in the state of a system of telephone exchanges), and is highly obscure anyway. The kind of "safety, robustness, and fault-tolerance" that you mean has a narrow, highly-circumscribed definition that does not in fact provide useful guarantees for the vast majority of computer applications, despite the generality of how you state the claim. $\endgroup$ Commented May 16 at 13:12
