Tuesday, October 05, 2004

Always a bigger scope

In the context of a DBMS, there is some collection of data that is persistent, shared, transacted, etc., which constitutes the database. The simplest complete logic system that has been presented for defining and managing this collection of data is the Relational Model (RM). In the RM, all data is ultimately represented by a set of named relation (table) variables. In both traditional 3GL and relational languages, there are also other, more narrow, layers of scope, which describe the context available to a particular portion of logic. Within an expression, there might be identifiers available from the expression, identifiers available from the containing block, as well as the set of global identifiers.
Let's now look at a scoping issue that commonly arises. In this example, let's say we have constructed an accounting system in either a 3GL or DBMS logic system (it doesn't matter which for this analogy, though it certainly would matter in actuality!). Assume that our original implementation encompasses accounting for a single company, and we are subsequently requested to provide accounting for multiple companies. One obvious solution would be to construct multiple instances of the accounting system. By doing this we are in essence defining another implicit level of context to which all data in the system is tied. Another approach would be to refactor the existing system, essentially pairing a company identifier with each existing predicate of the system. This would make the company dimension an explicit part of the system's propositions.
There seem to be advantages to each approach. The multi-instance approach has the advantage of simplicity. Not only does the system not have to be redesigned and reimplemented, but the logic of the system does not have to deal with the complexity of multiple companies. A redesign of the system would be far-reaching and would require nearly every facet of the system to change to address the additional company attribute. This is so because the company in question would no longer be part of the implicit context of the executing logic, but an explicit facet of the data. We can draw from this and other examples that increasing implicit context narrows the scope and reduces the domain of explicit data. This, in turn, allows more assumptions to be made by the logic and therefore nets a simplification. On the other hand, the fact that the data is implicit to the instance is also potentially a disadvantage. For instance, what can be done if a report spanning companies is requested? Also, does the set of postal codes really need to be per-company?
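The trade-off above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the original example: the `entry` table, the company names, and the amounts are all hypothetical, and in-memory SQLite databases stand in for "instances" of the accounting system.

```python
import sqlite3

def make_instance(entries):
    """One database per company: the company is implicit context."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE entry (account TEXT, amount INTEGER)")
    db.executemany("INSERT INTO entry VALUES (?, ?)", entries)
    return db

def make_shared(entries):
    """One database for all companies: the company is explicit data."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE entry (company TEXT, account TEXT, amount INTEGER)")
    db.executemany("INSERT INTO entry VALUES (?, ?, ?)", entries)
    return db

# Multi-instance: the query never has to mention a company.
acme = make_instance([("cash", 100), ("cash", -30)])
globex = make_instance([("cash", 500)])
acme_cash = acme.execute(
    "SELECT SUM(amount) FROM entry WHERE account = 'cash'"
).fetchone()[0]

# ...but a cross-company report must be stitched together outside
# the database, in the host environment.
stitched_total = sum(
    db.execute("SELECT SUM(amount) FROM entry").fetchone()[0]
    for db in (acme, globex)
)

# Explicit company attribute: every row carries the company, and the
# cross-company report is an ordinary query.
shared = make_shared([
    ("acme", "cash", 100),
    ("acme", "cash", -30),
    ("globex", "cash", 500),
])
shared_total = shared.execute("SELECT SUM(amount) FROM entry").fetchone()[0]
per_company = dict(
    shared.execute("SELECT company, SUM(amount) FROM entry GROUP BY company")
)
```

Both routes arrive at the same totals; what differs is where the company lives. In the first it is a property of which connection you happen to hold, and in the second it is a column that every statement has to account for.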
In the current example, the multi-instancing was provided at the border between logic systems (between the accounting system's logic system and that of its host environment, presumably an operating system). This change in logic systems is problematic at best, especially when one of the logic systems is as dramatically low-level as an operating system. It is not the fact that we have contextualized the accounting system that makes it potentially difficult for us to construct a report that spans companies; it is the fact that we have done so in a low-level system. Can instance-type scoping then be accomplished within a single logic system? Object-Oriented Programming (OOP) languages provide one example of just such a logic system. In OOP languages, instances ("objects") contextualize units of logic ("classes") such that maximal contextualizing ("encapsulation") occurs. This allows reduction of explicit data and simplification of code. For instance, I might have logic that manipulates accounts. If that logic can assume a particular company and even a particular account, then the job of creating debit or credit transactions becomes simpler. It also insulates logic from more general changes. On the other hand, there are times when the very nature of the logic at hand is to deal with broader levels of scope, such as multiple accounts or companies. This task might prove difficult in a logic system that assumes maximally narrow scoping (such as OOP languages).
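A minimal Python sketch of the OOP case, with a hypothetical `Account` class and made-up companies and amounts: the debit and credit logic can assume its company and account, while the cross-company report has to step outside the objects to collect them.

```python
class Account:
    """Logic that can assume one particular company and account."""

    def __init__(self, company, name):
        self.company = company
        self.name = name
        self.balance = 0

    def credit(self, amount):
        # No company or account parameter needed: both are implicit in self.
        self.balance += amount

    def debit(self, amount):
        self.balance -= amount


accounts = [Account("acme", "cash"), Account("globex", "cash")]
accounts[0].credit(100)
accounts[0].debit(30)
accounts[1].credit(500)


def total_by_company(accounts):
    """Broader-scope logic: it must walk the instances and re-expose
    the company that each object otherwise keeps as implicit context."""
    totals = {}
    for account in accounts:
        totals[account.company] = totals.get(account.company, 0) + account.balance
    return totals


report = total_by_company(accounts)
```

Note that `total_by_company` only works because someone kept a list of every account; the language itself offers no way to ask for "all accounts" as a collection.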
Can we then extrapolate from all this that appropriate scoping is a facet of each portion of logic, not an explicit facet of the data? This assumption would allow logic to exist at any level of scoping. A ramification of this is that, logically, all data is ultimately part of one giant collection (sound familiar?) within which we are taking some limited perspective. Otherwise, the scoping would also be defining limits on the availability of data and would suffer from problems as a result (like not being able to build cross-company reports).
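One way to picture scoping as a facet of the logic rather than of the data is a sketch like the following. It assumes a single global collection (`ENTRY`, a hypothetical list of rows) and a hypothetical `narrow` helper that fixes some attributes and hides them from the logic that runs inside that scope.

```python
def narrow(relation, **fixed):
    """Restrict to rows matching `fixed`, then drop those attributes,
    so logic using the result cannot even see them."""
    return [
        {k: v for k, v in row.items() if k not in fixed}
        for row in relation
        if all(row[k] == v for k, v in fixed.items())
    ]


def cash_balance(entries):
    """Logic written with no knowledge of companies at all."""
    return sum(e["amount"] for e in entries if e["account"] == "cash")


# The one "giant collection": every row carries its company explicitly.
ENTRY = [
    {"company": "acme", "account": "cash", "amount": 100},
    {"company": "acme", "account": "cash", "amount": -30},
    {"company": "globex", "account": "cash", "amount": 500},
]

# Narrow scope: the same logic, run from a single-company perspective.
acme_cash = cash_balance(narrow(ENTRY, company="acme"))

# Wide scope: the same logic, run across all companies.
all_cash = cash_balance(ENTRY)
```

The data never changes shape here; only the perspective from which `cash_balance` is invoked does, which is why the cross-company total stays available.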
Can all data be considered global then? Let's take another multi-instancing example, but this time let's assume that the instances are on separate computers. Let's say we have a system that analyzes a collection of interstellar "noise" looking for patterns, and we wish to run this system on multiple machines in order to increase the overall capacity. How can we maintain a single logic system across all instance boundaries without including virtually everything that the machine can possibly access in that collection of global data? We surely couldn't copy every available digital fact onto a machine and then keep it in sync!
A powerful concept, "physical data independence", suggests that copying the internet to our machine isn't necessary in order to maintain a single logic system. I am suggesting that perhaps all data accessible to the machine could be part of the database logically, though not physically (we are talking about logical topics anyway). As a naive example, imagine a query like this:

select WebPage where URI = 'http://www.alphora.com/';
Now I must point out that we are treading on dangerous turf and must watch our step. There are quite a few restrictions and assumptions made for good reason in the context of a DBMS, and so we should carefully consider each of them before naively throwing a mishmash of non-deterministic mayhem into the definition of the database. Nevertheless, I am going to ignore these considerations for the present time in order to remain in the realm of broad strokes.
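Staying with broad strokes, the `WebPage` query above can be imagined as a relation whose rows are produced on demand rather than stored. The Python below is only a sketch of that idea: `VirtualRelation`, the `Content` attribute, and the stub fetcher are all hypothetical, and the stub returns placeholder text rather than touching the network.

```python
class VirtualRelation:
    """A relation that is logically present but physically fetched on demand."""

    def __init__(self, key, fetch):
        self.key = key      # the attribute a query must restrict on, e.g. "URI"
        self.fetch = fetch  # callable: key value -> dict of other attributes, or None

    def where(self, **restriction):
        # Only an equality restriction on the key is supported; without
        # one, the relation would have to enumerate everything.
        if set(restriction) != {self.key}:
            raise ValueError(f"restrict on {self.key!r} only")
        value = restriction[self.key]
        row = self.fetch(value)
        return [] if row is None else [{self.key: value, **row}]


# Stand-in for the web: placeholder content, not the real page.
FAKE_WEB = {"http://www.alphora.com/": {"Content": "<html>stub</html>"}}


def fake_fetch(uri):
    return FAKE_WEB.get(uri)


WebPage = VirtualRelation("URI", fake_fetch)
rows = WebPage.where(URI="http://www.alphora.com/")
missing = WebPage.where(URI="http://example.invalid/")
```

Swapping `fake_fetch` for a real HTTP call is exactly where the non-determinism mentioned above would enter: the same restriction could return different rows from one moment to the next.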
To end this post, my deduction thus far is that a logic system that provides a complete computing environment (an operating system) would need to provide at least the following:

  • A mechanism whereby the scope for a given portion of logic can be explicitly narrowed or widened. I see this as more than just providing local scopes within operators and expressions. This item may not technically be a logical need, but I do suspect it would be a practical one for several reasons.
  • A global set of relation variables logically encompassing every possible fact available to the system.

What fun!
