Should Distributed Objects Be Stateless?
Source: Distributed Objects mailing list
Should objects/components that are used in distributed environments have state (data members) or just function?
Does having state make them non-scalable?
Is Microsoft Transaction Server (MTS) the solution to the scalability issue?
Roger Sessions wrote in his book COM and DCOM: Microsoft's Vision for Distributed Objects, page 259:
If you think you don't care about statelessness in component objects, then either you don't understand distributed objects or you don't understand statelessness, or both. The concept is absolutely fundamental to efficient distributed objects. In the future, designing an object with state will seem as anachronistic as filling a program with goto statements does today.
In later posts, he added:
Object state is today's equivalent of the goto. This is unfortunate. After all, we spent the last five years explaining to everybody how objects consist of state and behavior. Now we will have to spend the next five years explaining to everybody that we were wrong. I am just as guilty of this as everybody else. In our defense, I point out that we were thinking of non-distributed objects back then. We weren't thinking of moving the objects out of our address space and having to share them with other processes.
Let's say I have an Elvis object, and I want to ask him to shake, rattle, and roll, in that order. [...] each method is dependent on the state changes of the others.
From the client's perspective, we have this scenario:
Instantiate Elvis
Now assume we are using MTS and its automatic transaction capability. In this case, MTS will automatically begin a transaction at the start of each method invocation, and commit it at the end of the method invocation. Elvis then sees the scenario like this:
Somebody instantiated me
The question is, while Elvis is hanging around doing nothing, can he do something for some other client?
If he starts methods by reading his state and ends them by writing his state, then he is free to work for somebody else during the hang around times, which are likely to be very considerable. This is the meaning of statelessness and object pooling.
If he doesn't start methods by reading his state and end by writing his state, then he is responsible for keeping his state in order so it is ready for the rattle invocation. In this case, he can't do anything for anybody else while he is hanging around doing nothing. If he did, then his state wouldn't be as the first client left it after he completed the first client's shake, and his rattle would fail. This is the meaning of statefulness.
Actually, Microsoft doesn't really support this today. Instead, it deinstantiates Elvis after the method is finished, so that the resources can be reused by another Elvis. It then instantiates a new Elvis just in time for the rattle invocation. The client never sees any of this. From the client's perspective, Elvis is only instantiated once (at the beginning of the program) and deinstantiated once (at the end of the program). This instantiation/deinstantiation may sound less efficient than allowing Elvis himself to work for another client, but actually, there isn't much difference between these two techniques from a client perspective (or from an Elvis implementor's perspective). Both require the same statelessness to work. There may (or may not) be a significant performance impact of the extra instantiations. I haven't seen any performance measurements in this area.
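The two techniques described above (one shared Elvis serving several clients, and a fresh Elvis instantiated just in time for each call) can be sketched roughly as follows. This is an illustration in Python, not COM or MTS code: the names StateStore, StatelessElvis, and ElvisProxy are hypothetical, a dictionary stands in for the database, and transactions are left out entirely.

```python
class StateStore:
    """Stands in for the back-end database (an assumption of this sketch)."""

    def __init__(self):
        self._rows = {}

    def load(self, client_id):
        # Return a copy so callers cannot change stored state by accident.
        return list(self._rows.get(client_id, []))

    def save(self, client_id, moves):
        self._rows[client_id] = list(moves)


class StatelessElvis:
    """Holds no per-client data between method invocations."""

    def __init__(self, store):
        self._store = store

    def _perform(self, client_id, move, required):
        moves = self._store.load(client_id)   # read state at method start
        if moves != required:
            raise RuntimeError(f"cannot {move} after {moves}")
        moves.append(move)
        self._store.save(client_id, moves)    # write state at method end

    def shake(self, client_id):
        self._perform(client_id, "shake", [])

    def rattle(self, client_id):
        self._perform(client_id, "rattle", ["shake"])

    def roll(self, client_id):
        self._perform(client_id, "roll", ["shake", "rattle"])


class ElvisProxy:
    """What a client holds: it looks like one long-lived Elvis."""

    def __init__(self, store, client_id):
        self._store = store
        self._client_id = client_id
        self.activations = 0

    def _call(self, method_name):
        elvis = StatelessElvis(self._store)   # instantiated just in time
        self.activations += 1
        getattr(elvis, method_name)(self._client_id)
        # elvis is dropped here, i.e. "deinstantiated" after the method

    def shake(self):
        self._call("shake")

    def rattle(self):
        self._call("rattle")

    def roll(self):
        self._call("roll")


store = StateStore()

# Technique 1: a single Elvis works for two clients whose calls interleave.
shared = StatelessElvis(store)
shared.shake("alice")
shared.shake("bob")
shared.rattle("alice")
shared.rattle("bob")
shared.roll("alice")

# Technique 2: the client sees one Elvis, but each call gets a new instance.
proxy = ElvisProxy(store, "carol")
proxy.shake()
proxy.rattle()
proxy.roll()
```

In both cases the ordering constraint (shake, then rattle, then roll) survives only because each method reloads and rewrites the client's state, which is the point that both techniques require the same statelessness.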
Ron Resnick commented:
[...] I realized that it wasn't just any arbitrary distributed objects that Roger was suggesting ought to be stateless, but rather objects in the middle tier of a three-tier architecture [client-server-database -YS]. The backend, it later became clear, is certainly stateful in Roger's view - it's a traditional database.
In the example above, where Elvis is a supposedly 'stateless' middle-tier object, peculiar how it is described as "I better read in my state from the database...". Um, which state? My state?
In effect, the object isn't really stateless at all - it has state, only there is an implementation decision to store that state external to the object itself, in a database. But from a modeling perspective, the object is most certainly stateful - otherwise it would really be very hard to call it an 'object' [...].
As Roger writes: "The concept [statelessness] is absolutely fundamental to efficient distributed objects."
By 'efficient', I'm understanding Roger to mean that the implementation of state in distributed objects should be done externally to the object from performance considerations, and not that he thinks that objects should be modeled (in the OOA&D sense) as stateless. [...]
In fact, the database is being used here as just a dumb store, an extension of the logical, modeled object. It's there to inflate the object with state at the moment of use, and to deflate it the moment after. To this end, it's functionally no different than any other persistency mechanism, including a file system or raw disk. Oh sure, the db gives you value-added row/table operations, and transactional stuff that you don't have to implement yourself, but it's not being used as an active part of the logic of the system. The logic is all in tier 2, and the db is just an externalization mechanism.
To illustrate what I'm getting at here, consider a Java framework in which objects serialize themselves to a stream, and then unserialize later. Most people would consider these objects to be stateful, and to merely be manipulating the physical location of that state in the runtime setting.
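A small sketch of that serialization idea, written in Python with JSON rather than in Java. The Counter class and its write_to and read_from methods are hypothetical names chosen for this illustration.

```python
import io
import json


class Counter:
    """A plainly stateful object: its count lives in the instance."""

    def __init__(self, count=0):
        self.count = count

    def increment(self):
        self.count += 1
        return self.count

    def write_to(self, stream):
        # Deflate: move the state out of the object and into the stream.
        json.dump({"count": self.count}, stream)

    @classmethod
    def read_from(cls, stream):
        # Inflate: build a new instance from the externalized state.
        return cls(**json.load(stream))


counter = Counter()
counter.increment()
counter.increment()

stream = io.StringIO()      # stands in for a file, socket, or database row
counter.write_to(stream)
del counter                 # the in-memory object is gone

stream.seek(0)
revived = Counter.read_from(stream)
revived.increment()         # carries on from the saved count
```

The count survives the deletion of the original instance, so the object behaves as stateful even though its state spent some time outside any live object.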
Or, consider a CORBA framework in which the server objects are activated and deactivated as needed by the ORB runtime. When the client disconnects from the server object, eventually the ORB 'collects' the object and deactivates it. That deactivation (at least in Orbix, to my recollection) typically involves the removal of the server process that was housing the object, meaning that the server object had better persistently write out any values it wants to survive the death of its process, and read them back in on re-instantiation. Consider also what happens if the server process isn't killed by the ORB, but is just left to run, idling, with no client connections for a while. On a sufficiently busy system with virtual memory - say a typical Unix or NT environment, eventually the OS will say 'hey! this process is just idling and gobbling precious RAM. Time to swap it out'. Net result? The server process, which contains the server object, which contains the server object state, is now externalized to disk as a virtual memory paging operation, transparent to itself and the ORB. Virtual memory serves here as a (non-transactional) 3rd tier. When a client does eventually make an invocation on this server object, a page fault will occur on the upcall causing the process to be reloaded. Do we call such an object stateless? Not typically.
Roger Sessions also said:
One more point I need to make. I have no problem with non-distributed objects containing state. When I say objects can't have state, I am really talking about distributed objects, or what would probably be better to describe as components. Components can, and should be, made up of objects, and often stateful objects. Those objects can maintain their state for many method invocations, as long as they eventually flush it out sometime before the component level method finishes its execution.
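One way to read this distinction is sketched below in Python. The bank-transfer scenario and the names Account and TransferComponent are invented for illustration, and a dictionary stands in for the database: the component method builds ordinary stateful objects, works with them across several calls, and flushes their state before it returns.

```python
class Account:
    """An ordinary stateful, non-distributed object."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

    def deposit(self, amount):
        self.balance += amount


class TransferComponent:
    """Component-level object: keeps no account data between calls."""

    def __init__(self, store):
        self._store = store   # dictionary standing in for the database

    def transfer(self, source_id, target_id, amount):
        # Inner objects hold state for several invocations...
        source = Account(self._store[source_id])
        target = Account(self._store[target_id])
        source.withdraw(amount)
        target.deposit(amount)
        # ...and that state is flushed before the component method ends.
        self._store[source_id] = source.balance
        self._store[target_id] = target.balance


store = {"a": 100, "b": 20}
component = TransferComponent(store)
component.transfer("a", "b", 30)
```

After transfer returns, the component holds only its reference to the store, so the same instance could serve an unrelated client next.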
Ron Resnick replied:
Hmm. Ok, so I think you have a different definition of 'component' than I do - to you, it seems to be a composite medium/large grain object, composed of finer grain objects. Here's how Mark Baker & I defined component:
"A component is an object which has a certain specification autonomy associated with it. Components can thus be aggregated together into a collection at a much later stage than can language objects which are not components. The degree of this autonomy and dynamic aggregation varies with the component system. In some cases, components are aggregated by a programmer who either visually assembles them, or writes program scripts to connect them, or both. In these cases, there is still a programming activity associated with the bindings between the objects. Tools which support this style of component usage thus make clear distinctions between their creation and execution environments. Visual assembly of OLE objects or Java Beans is an example of this form of component system. In more extreme autonomous component systems, the components may be assembled literally at the time of use, with no explicit programming required. This effectively blurs the creation and execution phases into one. Clearly, component aggregations performed dynamically in this fashion place greater responsibility and additional semantics on the components themselves. Hence, they simplify the tasks of the humans using the components at the expense of making the component creation and specification effort proportionately more complex."
I.e. we consider a component to be an object, regardless of size or granularity or compositeness, which exposes enough of itself to permit some form of later-than-compile-time aggregation.
How Microsoft Transaction Server Changes the COM Programming Model, Microsoft Systems Journal, January 1998
ActiveX Questions and Answers, Microsoft Systems Journal, March 1998