Fantastic Data

Distributed Data Cache

The data cache is a small, memory-resident distributed database system that shares data among the many nodes in the network. Data records are distributed and cached on multiple independent nodes to provide system robustness, fault tolerance, and low query latency. The data cache filters data flow to conserve bandwidth and storage: data records are distributed based on attributes within the data in order to satisfy the queries expressed on the individual nodes. The data cache distributes data efficiently using an innovative dynamic multicast protocol. It guarantees data delivery as long as the data remains valid, and it ensures consistency of the multiple copies.

Data Distribution and Filters
Data messages are disseminated by matching attributes of the data records against the interests of the nodes. Attribute-based distribution has two main advantages for a sensor network. First, it removes any need for application programs to know the nodes or their addresses. Second, it establishes the primacy of the data over the source or destination node, which is a more natural means of specifying which data is of interest to an application program.

The data needs of each node are determined by collecting and analyzing the transactions and queries that application programs perform against the database, forming a “data interest” for the node. Data interest statements are expressed as SQL WHERE clauses, giving a powerful and flexible means to tailor the data flow to what is needed. As data interests are dynamically altered, the data stored on the nodes and the data flowing between the nodes are adjusted to reflect the new rules.
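As an illustration of matching a record against a data interest, the following minimal sketch evaluates a SQL WHERE clause against a single record. It uses Python's standard sqlite3 module purely as a stand-in SQL evaluator; the function name matches_interest, the scratch table rec, and the sample attributes are assumptions made for this example and are not part of the data cache interface.

```python
import sqlite3

def matches_interest(record: dict, where_clause: str) -> bool:
    """Return True if `record` satisfies the SQL WHERE clause `where_clause`.

    Hypothetical helper: sqlite3 stands in for the cache's own SQL evaluator.
    """
    columns = list(record)
    conn = sqlite3.connect(":memory:")
    try:
        # Build a one-row scratch table whose columns are the record's attributes.
        column_list = ", ".join(f'"{name}"' for name in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(f"CREATE TABLE rec ({column_list})")
        conn.execute(
            f"INSERT INTO rec VALUES ({placeholders})",
            [record[name] for name in columns],
        )
        # The record matches if the WHERE clause selects the single row.
        row = conn.execute(f"SELECT COUNT(*) FROM rec WHERE {where_clause}").fetchone()
        return row[0] == 1
    finally:
        conn.close()

# Sample record and interest (illustrative values only).
detection = {"kind": "vehicle", "x": 120.0, "y": 45.0, "confidence": 0.9}
interest = "kind = 'vehicle' AND x BETWEEN 100 AND 200 AND confidence > 0.5"
print(matches_interest(detection, interest))  # True
```

Because the interest is an ordinary WHERE clause, narrowing or widening the data flow to a node amounts to changing one predicate string.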

The data cache supports two fundamentally different data distribution methods: ooze and route. Both move data records to the nodes that have expressed interest in receiving them through queries. The methods are targeted at different application needs and operate very differently. Both methods are generally required for a system.

Data Ooze
The ooze technique efficiently spreads data over many nodes in an area with common interests. It is an excellent technique for use when data interests correspond roughly with location, as for situation awareness data. The distribution of data interest statements is constrained to neighbors. Each node, upon receiving a new data record, evaluates that record against the data interests of its neighbors. If a match is found, the record is forwarded to that neighbor. No cross-network message traffic is ever generated, and there is no need for the expensive maintenance of routing tables. As long as the interested nodes form a connected set, all nodes receive all of the required data. The ooze technique is the most efficient method to distribute data in a mesh network. It is exclusively available in the Fantastic Data distributed data cache.
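The forwarding rule described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the product's implementation: the Node class, the "id" primary-key attribute, and the use of Python callables in place of SQL data interests are all hypothetical.

```python
from typing import Callable, Dict, List

Record = dict
Interest = Callable[[Record], bool]  # stand-in for a SQL data interest

class Node:
    """Hypothetical node that knows only its neighbors and their interests."""

    def __init__(self, name: str, interest: Interest):
        self.name = name
        self.interest = interest
        self.neighbors: List["Node"] = []
        self.store: Dict[str, Record] = {}  # keyed by the record's "id" attribute

    def receive(self, record: Record) -> None:
        key = record["id"]
        if self.store.get(key) == record:
            return  # already held; do not forward the same record again
        self.store[key] = record
        # Evaluate the record against each neighbor's interest and forward on a match.
        for neighbor in self.neighbors:
            if neighbor.interest(record):
                neighbor.receive(record)

def link(a: Node, b: Node) -> None:
    a.neighbors.append(b)
    b.neighbors.append(a)

def nearby(record: Record) -> bool:
    return record["x"] < 100

def nothing(record: Record) -> bool:
    return False

# Chain a - b - c - d: the first three nodes share an interest, d has none.
a, b, c, d = Node("a", nearby), Node("b", nearby), Node("c", nearby), Node("d", nothing)
link(a, b)
link(b, c)
link(c, d)

a.receive({"id": "r1", "x": 50})
print([n.name for n in (a, b, c, d) if "r1" in n.store])  # ['a', 'b', 'c']
```

The record reaches every interested node without any node knowing more than its immediate neighbors. If b had no interest, c would never see the record, which is why the interested nodes must form a connected set.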

Data Routing
The route technique moves data efficiently across the network from a source node to one or more destination nodes. This technique is best used when there is little commonality in the interests of neighboring nodes. Data interest statements must be distributed to the entire network. Data records flow back to the destination node along multiple redundant routes. These routes are built and maintained only as needed to satisfy a global query. The initial route is developed by backtracking along the paths that the query took to reach the source. Redundant routes may be used to increase reliability. After the initial routes are in place, they are maintained at each intermediary node using a traditional distance vector cost function. When the data interest is dropped, the routes are also dropped, ending the need for expensive route maintenance operations.
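The backtracking step can be illustrated with a small sketch. It floods a query outward from the destination, has each node remember the neighbor from which it first heard the query, and then follows those remembered hops from the source back to the destination. The function names, the sample topology, and the restriction to a single route per node are assumptions for this example; the redundant routes and the distance vector maintenance described above are not modeled.

```python
from collections import deque
from typing import Dict, List, Optional

def flood_query(links: Dict[str, List[str]], destination: str) -> Dict[str, Optional[str]]:
    """Flood a query from `destination`; return each node's next hop back toward it."""
    next_hop: Dict[str, Optional[str]] = {destination: None}
    queue = deque([destination])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in next_hop:  # the first copy of the query to arrive wins
                next_hop[neighbor] = node
                queue.append(neighbor)
    return next_hop

def route_back(next_hop: Dict[str, Optional[str]], source: str) -> List[str]:
    """Follow the remembered hops from `source` back to the querying node."""
    path = [source]
    while next_hop[path[-1]] is not None:
        path.append(next_hop[path[-1]])
    return path

# Hypothetical topology: A issues the query, E holds the matching data.
links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
next_hop = flood_query(links, destination="A")
print(route_back(next_hop, source="E"))  # ['E', 'D', 'B', 'A']
```

In this topology the path through C is an equally short alternative; keeping both upstream neighbors at D instead of only the first would give the kind of redundant route the text mentions.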

Addressing
The data cache outputs messages addressed for delivery to a subset of its neighbors, allowing the lower communication layers to select the most efficient means of delivery. In practice, many messages are sent to many neighbors; this can sometimes be performed more efficiently with a local broadcast if that capability is available. In a sensor network, node interests naturally cluster in a neighborhood, allowing the system to take advantage of the natural broadcast properties of radio communication to reach many nodes with few messages. At each node, filters are reevaluated to determine whether, and to which nodes, the data should be forwarded. Global network addresses are not required.

Guaranteed Delivery and Consistency
The data cache provides reliable delivery of data messages among the network nodes and guarantees consistency of the multiple copies. Records are accounted for and delivered independently. Delivery of an individual data record is initiated when that record changes. In this manner, the data cache delivers smaller quantities of data: only the records that have changed, and only when needed.

Attempts to deliver the record to all interested neighbors continue until the record is successfully delivered, is changed again, expires, or is deleted. The delivery status of each record is retained and may be queried by application programs. New nodes, or nodes that have been offline for an extended period, are provided with all currently valid data records (that match their expressed interest and that they don't already have) when they reconnect.
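The retry rule above can be sketched as a small per-neighbor delivery tracker. Everything named here (Record, DeliveryTracker, the send callback that returns True on acknowledgement, and the version and expires_at fields) is a hypothetical illustration of the rule, not the data cache's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

@dataclass
class Record:
    key: str
    version: int
    expires_at: float
    deleted: bool = False

class DeliveryTracker:
    """Hypothetical tracker: retry each (record, neighbor) pair until it is settled."""

    def __init__(self, send: Callable[[str, Record], bool]):
        self.send = send  # assumed to return True when the neighbor acknowledges
        self.pending: Dict[Tuple[str, str], Record] = {}
        self.delivered: Dict[Tuple[str, str], int] = {}  # last acknowledged version

    def record_changed(self, record: Record, interested_neighbors: Iterable[str]) -> None:
        for neighbor in interested_neighbors:
            # A newer change replaces any older version still awaiting delivery.
            self.pending[(record.key, neighbor)] = record

    def attempt_round(self, now: float) -> None:
        for (key, neighbor), record in list(self.pending.items()):
            if record.deleted or record.expires_at <= now:
                del self.pending[(key, neighbor)]  # no longer valid, stop trying
            elif self.send(neighbor, record):
                self.delivered[(key, neighbor)] = record.version
                del self.pending[(key, neighbor)]

# Simulated link: neighbor n2 misses the first transmission, then acknowledges.
attempts = []

def flaky_send(neighbor: str, record: Record) -> bool:
    attempts.append(neighbor)
    return not (neighbor == "n2" and attempts.count("n2") == 1)

tracker = DeliveryTracker(flaky_send)
tracker.record_changed(Record("r1", version=1, expires_at=100.0), ["n1", "n2"])
tracker.attempt_round(now=0.0)  # n1 acknowledges, n2 does not
tracker.attempt_round(now=1.0)  # n2 acknowledges on the retry
print(tracker.delivered)  # {('r1', 'n1'): 1, ('r1', 'n2'): 1}
```

The delivered map plays the role of the queryable delivery status: an application could inspect it to see which neighbors hold which version of a record.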

The data cache allows records to be updated from any node with no application coordination or record locking. The network continues to operate when partitioned and correctly reestablishes data consistency when the partition is repaired. Following a “last change wins” rule provides consistency of copies. This rule produces consistent results at all nodes even if updates are delivered out of order or if intermediate results are missed. Conflicting updates originating at different nodes are consolidated as they flow across the network based on primary key comparison. Data flowing along redundant data paths is consolidated when the paths intersect and subsequent repeated transmissions are suppressed.
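A short sketch shows why a “last change wins” rule gives the same result at every node regardless of delivery order. It assumes each update carries the time of the change and the originating node's identifier, with the identifier used only to break timestamp ties; these fields, and the Update and apply_update names, are assumptions for illustration and are not taken from the product.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, Iterable

@dataclass(frozen=True)
class Update:
    key: str           # primary key of the record
    value: str
    changed_at: float  # time of the change at the originating node (assumed comparable)
    origin: str        # originating node id, used only to break timestamp ties

def apply_update(store: Dict[str, Update], update: Update) -> None:
    """Keep the update only if it is newer than the copy already held."""
    current = store.get(update.key)
    if current is None or (update.changed_at, update.origin) > (current.changed_at, current.origin):
        store[update.key] = update

def replay(updates: Iterable[Update]) -> Dict[str, Update]:
    store: Dict[str, Update] = {}
    for update in updates:
        apply_update(store, update)
    return store

updates = [
    Update("track-7", "heading north", 10.0, "node-a"),
    Update("track-7", "heading east", 12.0, "node-b"),
    Update("track-7", "heading south", 11.0, "node-c"),
]

# Every delivery order converges on the same final value.
results = {replay(order)["track-7"].value for order in permutations(updates)}
print(results)  # {'heading east'}
```

A node that never sees the intermediate update from node-c still ends up with the same final value, which matches the claim that missed intermediate results do not affect consistency.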

Interface Language
The data cache provides object-relational database flexibility (transactions, queries, extended data types). Its interface language is SQL (not the full language, but the most useful parts: create table, delete table, update, insert, delete, and select) to promote easy integration with application programs. The data cache is available for small embedded systems and for Linux systems.

Copyright © 2006, 2007 by Fantastic Data. All rights reserved.