Two programs in the same programming language are called observationally equivalent whenever they are interchangeable in all observable contexts.
The same notion is also known by the name of (Morris-style) contextual equivalence. In fact, there may a subtle distinction between observational and contextual equivalence depending on the definition, but they often coincide.
A denotational semantics whose equality captures observational equivalence is called fully abstract.
The above definition is intentionally underspecified: it fundamentally depends on the notion of observable context. Roughly, observable contexts are programs of the same language in which a term has been 'punched' out, and replaced by a hole. This hole can be 'filled' by a program fragment - possibly subject to certain subject restrictions - to obtain a complete program.
An observable context can be considered as an experimental apparatus: one may plug a program into that hole to obtain an actual program, and then evaluate that program to observe the result of the computation. Thus, the shape and the type of the observable contexts determine the kind of 'experiments' we are allowed to run on programs, which in turn determine what is observable.
Filling the hole in a context is crucially distinct from substituting for a variable in an open term for at least two reasons:
To give a concrete definition of observational equivalence, we need to fix the programming language/calculus, and the notions of a context and of an observation. Below we give several examples.
Consider a typed -calculus system with the type of booleans (for example, STLC).
A (typed) context is a program with a hole, such that if , then (by we denote the 'filling' of the hole in with a term ).
For example, is a context that can be typed as .
Then, two term and are observationally equivalent at type in a context iff
for any it is the case
Under this definition, the context may observe the final value of the program (whether it evaluates to or ), and the termination behavior of the program (whether it terminates or diverges).
In particular, this definition equates all diverging programs.
We can also formulate a different version of observational equivalence, that does not take termination behavior into account:
Leibniz equality can also be seen as an instance of observational equivalence.
In this setting, contexts are predicates, and observations are validity of those predicates.
Two terms are Leibniz equivalent if for any predicate it is the case that
Please expand.
Observational equivalence is normally defined through contexts, as explained above. However, there is an alternative approach to defining it, namely as the largest type-respecting binary relation that is a congruence, and also adequate. The observational aspect is encoded in adequacy: the relation is adequate whenever and implies .
This approach is due to [Gordon 1988], and was used intensively by [Lassen 1998]. See also §7.5 of the tutorial chapter [Pitts 2005].
The concept of observational equivalence is thought to have first arisen in James Morris' thesis. It was first applied to -normal forms of the untyped -calculus, and called extensional equivalence.
Morris, James Hiram. 1969. ‘Lambda-Calculus Models of Programming Languages’. PhD thesis, Massachusetts Institute of Technology. http://hdl.handle.net/1721.1/64850.
@phdthesis{morris_lambda-calculus_1969,
type = {{PhD} thesis},
title = {Lambda-calculus models of programming languages},
url = {http://hdl.handle.net/1721.1/64850},
school = {Massachusetts Institute of Technology},
author = {Morris, James Hiram},
year = {1969}
}
Pitts, A. M. 2005. ‘Typed Operational Reasoning’. In Advanced Topics in Types and Programming Languages, edited by B. C. Pierce, 245–89. The MIT Press. [pdf]
@incollection{pitts_2005,
title = {Typed {Operational} {Reasoning}},
booktitle = {Advanced {Topics} in {Types} and {Programming} {Languages}},
publisher = {The MIT Press},
author = {Pitts, A. M.},
editor = {Pierce, B. C.},
year = {2005},
pages = {245--289}
}
Lassen, Søren. 1998. ‘Relational Reasoning about Functions and Nondeterminism’. PhD dissertation, Aarhus University. [pdf]
@phdthesis{lassen_1998,
type = {{PhD} dissertation},
title = {Relational {Reasoning} about {Functions} and {Nondeterminism}},
school = {Aarhus University},
author = {Lassen, Søren},
year = {1998},
note = {BRICS Dissertation Series BRICS DS-98-2}
}
Gordon, Andrew D. 1998. ‘Operational Equivalences for Untyped and Polymorphic Object Calculi’. In Higher Order Operational Techniques in Semantics, edited by A. D. Gordon and A. M. Pitts, 9–54. Publications of the Newton Institute. Cambridge University Press.
@incollection{gordon_1998,
series = {Publications of the {Newton} {Institute}},
title = {Operational equivalences for untyped and polymorphic object calculi},
booktitle = {Higher {Order} {Operational} {Techniques} in {Semantics}},
publisher = {Cambridge University Press},
author = {Gordon, Andrew D.},
editor = {Gordon, A. D. and Pitts, A. M.},
year = {1998},
pages = {9--54},
}