|
OTHER APPROACHES
Data warehouse, federated database and (to a lesser extent) enterprise search
systems are all conventional approaches to the same challenge of dealing with
data: where it resides; how to access it; how to cope with "dirty data", typos,
etc.; and how to handle different types, standards, formats, security,
locations, owners and systems. Each conventional approach has its advantages
and disadvantages. WhamTech combined technologies from these three approaches to
provide dataless hybrid products that retain the advantages and overcome the
disadvantages of each approach.
For structured database-type queries, query success is the same
as, if not better than, that of data warehouses, and a lot better than that of
conventional adapters in federated database systems.
For unstructured text search, query
success is similar to that of search engines.
The reason for the high query success is
that WhamTech products uniquely clean up and standardize the data they read,
from data source logs or by some other means, in order to build indexes; the
data itself is then discarded. As a result, queries submitted to the indexes
are highly successful. Further, WhamTech
products enable advanced indexing, including text search, fuzzy, aggregation,
calculation and compound indexes.
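The clean-then-index idea can be illustrated with a minimal Python sketch. It is not WhamTech's implementation, which the text does not describe; the function names (`standardize`, `build_index`, `lookup`), the sample rows and the cleaning rules (whitespace collapsing and case folding) are all assumptions made for illustration. The point it shows is that only standardized keys and record numbers are kept, and that queries are standardized in the same way before being matched.

```python
import re
from collections import defaultdict

def standardize(value: str) -> str:
    """Collapse runs of whitespace, trim the ends, and case-fold a raw value."""
    return re.sub(r"\s+", " ", value).strip().casefold()

def build_index(records):
    """Map each standardized value to the sorted record numbers that hold it.

    Only standardized keys and record numbers (pointers) are kept;
    the raw values themselves are not stored in the index.
    """
    index = defaultdict(list)
    for record_number, raw_value in records:
        index[standardize(raw_value)].append(record_number)
    return {key: sorted(numbers) for key, numbers in index.items()}

def lookup(index, query: str):
    """Standardize the query the same way, then return matching record numbers."""
    return index.get(standardize(query), [])

# Hypothetical "dirty" source rows: (record number, city field).
source_rows = [(1, "  New York"), (2, "new  york "), (3, "Dallas"), (4, "NEW YORK")]
city_index = build_index(source_rows)

print(lookup(city_index, "New   York"))  # [1, 2, 4]
```

All three spellings of the city resolve to the same index key, so a query succeeds even though the source rows were never corrected in place.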
WhamTech products retain the advantages of data warehouse systems:
 | Clean data |
 | Indexes: multiple types |
 | Query processing: multiple options |
 | Security: data and access |
…and at the same time, retain the primary advantage of federated database systems:
 | Data remains at source |
EIQ Products derive their unique index
and query processing technologies from previously marketed database and search
products.
WhamTech
and its predecessors developed a relational very large database (VLDB)
technology, first called D the Data Language and later Thunderbolt, that was extremely fast compared to other
similarly configured database systems. This appealed to a niche,
data-processing-intensive market; however, the real technology
differentiator was the unique index and query processing technologies used in the
current product, EIQ Server®, which indexes and processes queries against almost
any and all data sources. EIQ stands for External Index and Query.
From an operational and structural point of view, the relational index and query
management system (RIQMS) embedded in EIQ Server is a conventional
relational database technology, with tables (virtual, in the case of EIQ
Server), indexes, and typical database operation commands and
queries. It is NOT a memory-resident system; NOT a read-only
system; NOT a fully inverted database system; and NOT a
retrieval-based storage system. There are, however, significant
performance and capability differences that distance EIQ Server's
RIQMS from other database and related technologies:
Unique Method of Query Execution
WhamTech's RIQMS has a unique method of isolating, connecting,
arranging (sorting), processing (updating), and presenting (displaying) data.
This unique technology enables real-time data isolation and access, no matter
the database size, number of concurrent users, or query complexity.
Unique Combination of Technologies
WhamTech's RIQMS derives its speed and advanced capabilities from
a unique combination of three binary-level methodologies, the three "Bs"
of computing. These are normally associated with static data warehousing rather
than with live data sources, which WhamTech supports, and would be very difficult to
improve on:
 | Balanced binary trees that scale to billions of records, terabytes of data |
 | Virtual bitmap representations of intermediate and final query result-sets (known as Collections) - these can be integer lists or actual bitmaps, depending on field data node-level "data density" |
 | Boolean operations on Collections |
Balanced binary trees
are a technology from the 1960s, and the attraction then, as now, is that
binary searches are considered to be the fastest method of searching ordered lists[1].
However, there are a number of practical problems associated with balanced
binary trees, all of which WhamTech has solved (and that is the main secret
to our success):
 | PROBLEM: Levels tend to get very deep; a binary tree consisting of a billion nodes, for example, needs 30 levels, and this translates into time to traverse
WHAMTECH SOLUTION: WhamTech's implementation does not conform to the conventional n = log2(x+1) balanced binary tree rule, where n = number of levels and x = number of nodes. Instead, WhamTech's implementation "leaps" levels to make binary tree traverse-time measurable in microseconds rather than milliseconds or seconds |
 | PROBLEM: Rebalancing and rotation after an insert or delete can take considerable time, and a very large number of nodes can be affected
WHAMTECH SOLUTION: The maximum number of nodes that need to be rotated is just over 100; however, typically only 10 to 15 nodes are rotated after an insert or delete |
 | PROBLEM: A worst-case scenario of deletion of a top node, which is faced by almost all tree structures
WHAMTECH SOLUTION: This is one of the easiest operations for WhamTech's RIQMS to deal with, as it fits the technology well |
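The 30-level figure quoted for a billion nodes follows directly from the conventional n = log2(x+1) rule. The short Python sketch below checks that arithmetic; it covers only the conventional rule, not WhamTech's level-leaping implementation, which the text does not describe. The function name `levels_needed` is an assumption made for illustration.

```python
def levels_needed(node_count: int) -> int:
    """Minimum number of levels n in a binary tree holding node_count nodes.

    A completely filled tree with n levels holds 2**n - 1 nodes, so
    n = ceil(log2(x + 1)). For x >= 1 this equals the bit length of x,
    which avoids floating-point rounding on large values.
    """
    if node_count < 1:
        return 0
    return node_count.bit_length()

for x in (1_000, 1_000_000, 1_000_000_000):
    print(f"{x:>13,} nodes -> {levels_needed(x)} levels")
```

This prints 10, 20 and 30 levels respectively, so each thousandfold increase in nodes adds roughly ten levels to a conventional traversal.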
With WhamTech's RIQMS, when a balanced binary tree is
subjected to a query, the result is a set of pointers, held as an integer list, to data in the data
source. For subsequent query-related operations, this set is either used directly or is
converted to a Collection; an example is shown in the following diagram:

Complex queries are rendered simple by treating them as combinations of
lightning-fast queries on multiple balanced binary tree indexes. Once single-field Collections are isolated, they are combined using the full range of
Boolean arithmetic operations to provide a complex query result set, as shown in
the following example:

The bits in the final complex query result set represent the
final result-set record numbers in the data source or multiple data sources.
Remember that Collections need not be bitmaps and are in most cases integer
lists. Boolean operations can be performed on actual bitmaps, on virtual bitmaps
(integer lists), or on both in combination.
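A minimal Python sketch of Boolean operations on Collections is given here under stated assumptions; it is not WhamTech's code. A Python integer stands in for an actual bitmap (bit i set means record i is in the result), a sorted list stands in for an integer-list Collection, and the 0.25 density threshold, the function names and the sample record numbers are all hypothetical.

```python
def to_bitmap(record_numbers):
    """Pack record numbers into a Python int used as a bitmap (bit i = record i)."""
    bitmap = 0
    for number in record_numbers:
        bitmap |= 1 << number
    return bitmap

def to_integer_list(bitmap):
    """Unpack a non-negative bitmap into a sorted list of record numbers."""
    numbers, position = [], 0
    while bitmap:
        if bitmap & 1:
            numbers.append(position)
        bitmap >>= 1
        position += 1
    return numbers

def make_collection(record_numbers, total_records, density_threshold=0.25):
    """Keep sparse results as integer lists and dense ones as bitmaps.

    The threshold is an arbitrary value chosen for this sketch.
    """
    density = len(record_numbers) / total_records
    if density >= density_threshold:
        return to_bitmap(record_numbers)
    return sorted(record_numbers)

def as_bitmap(collection):
    """Accept either representation: an int bitmap or a list of record numbers."""
    return collection if isinstance(collection, int) else to_bitmap(collection)

def collection_and(a, b):
    return to_integer_list(as_bitmap(a) & as_bitmap(b))

def collection_or(a, b):
    return to_integer_list(as_bitmap(a) | as_bitmap(b))

def collection_and_not(a, b):
    return to_integer_list(as_bitmap(a) & ~as_bitmap(b))

# Hypothetical single-field Collections over a 16-record source.
TOTAL_RECORDS = 16
in_texas = make_collection([1, 2, 3, 5, 8, 9, 12, 13], TOTAL_RECORDS)  # dense: bitmap
over_60 = make_collection([3, 9], TOTAL_RECORDS)                       # sparse: list
closed = make_collection([9, 14], TOTAL_RECORDS)                       # sparse: list

# state = TX AND age > 60 AND NOT status = closed
result = collection_and_not(collection_and(in_texas, over_60), closed)
print(result)  # [3]
```

The final list holds the record numbers to fetch from the data source. Mixing a bitmap operand with integer-list operands works because each operation normalizes its inputs first.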
WhamTech's RIQMS allows extremely fast queries and updates involving a minimal
number of nodes, regardless of the size of the data source or the
cardinality of
the data. WhamTech solved problems that have perplexed balanced binary
tree researchers for decades and that pushed most companies towards
non-binary index tree structures, such as B+ trees, or other forms of indexes, where
each node has more than two branches. These non-binary tree structures do not allow for a
simple 0 or 1 decision, but impose more complex decision-making algorithms on
query processing at every node encountered, which increases traverse-time.
The core indexing technology code has remained untouched for
over 15 years, as the algorithms and code are stable and bug-free. As an example,
WhamTech's legacy database product, Thunderbolt, was used for over 12 years by ACS (www.acs-inc.com),
an NYSE-listed IT services company with revenue of over $2 billion per year,
for mission-critical 24/7 operations support, generating considerable revenue for ACS.
The IP associated with WhamTech's RIQMS is indirectly protected through
awarded patents.
Real-time Indexes
WhamTech's
real-time indexes achieve rates ranging from hundreds or thousands to tens of thousands of records
inserted/updated per second on low-end servers; a high-end example achieved a
query and insert rate of 80,000 records per second on a dual-933 MHz server. Real-time indexes
establish a new method for dealing with large-scale data and information issues,
from active (or real-time) data warehousing to near real-time database
performance thought only possible with memory-resident databases. Many
applications are tending towards real-time, e.g., interactive customer
relationship management (iCRM), inventory management, supply chain management
(SCM), and decision support systems (DSS).
The largest problem faced by database vendors, in general, is enabling
simultaneous queries (simple and complex) and data changes (inserts, deletes, and
updates) on data and indexes for VLDBs
such that they remain synchronized.
Most database systems are designed for transactions, a large
number of users, and simple queries; for such systems, updates are mainly
insertions and sequential. Most data warehouses are designed to be normally static, with
reduced subsets of data, a small number of users, and complex queries; for such systems,
updates are typically performed in regular batches. Complex queries
on most database systems can
cripple performance, particularly on VLDBs, but not
WhamTech's...
... WhamTech's unique
RIQMS allows for an entirely new approach to near real-time data and information
integration and sharing systems.
Complementary versus replacement technology
WhamTech's EIQ Server is intended to complement, not replace, existing
operations database systems, which are usually transaction-oriented, to
enable:
 | Growth to accommodate a larger number of users |
 | Additional capabilities for existing systems |
 | Advanced capabilities not available with existing systems |
EIQ Server is the ultimate non-intrusive system,
as it externally indexes and processes queries against existing
data sources, including most databases. EIQ Server also
allows unstructured text search on databases as well as
unstructured data sources such as files, documents and e-mail, and
semi-structured data sources such as spreadsheets and XML.
EIQ Server has a unique approach to querying semi-structured data
sources using structured queries. EIQ Server can also
execute structured queries on unstructured data if entity
extraction tools are used to build indexes.
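As a rough illustration of structured queries over unstructured text, the Python sketch below extracts two kinds of entity with regular expressions, indexes them by field, and then answers a field = value query. It is a simplified stand-in rather than EIQ Server's mechanism: the patterns, the field names (`email`, `date`), the sample documents and the `query` helper are all assumptions, and real entity extraction tools are far more capable.

```python
import re
from collections import defaultdict

# Deliberately simple patterns for this sketch only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def build_entity_index(documents):
    """Return {field: {value: set of document numbers}} built from free text."""
    index = {field: defaultdict(set) for field in PATTERNS}
    for doc_number, text in documents.items():
        for field, pattern in PATTERNS.items():
            for value in pattern.findall(text):
                index[field][value.lower()].add(doc_number)
    return index

def query(index, **conditions):
    """AND together field = value conditions, much like a WHERE clause."""
    matches = [index[field].get(value.lower(), set())
               for field, value in conditions.items()]
    return sorted(set.intersection(*matches)) if matches else []

# Hypothetical unstructured documents keyed by document number.
documents = {
    1: "Invoice sent to ann@example.com on 2004-03-15.",
    2: "Meeting notes, 2004-03-15. Follow up with bob@example.com.",
    3: "ann@example.com confirmed delivery on 2004-04-02.",
}
entity_index = build_entity_index(documents)

print(query(entity_index, email="ann@example.com", date="2004-03-15"))  # [1]
```

Once entities are indexed as fields, the free-text documents can be filtered with the same kind of conditions used on database columns.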
A great combination - database and search technologies
Many professionals in the database industry are predicting the convergence
of database and search technologies. Some database vendors already bundle
limited search capabilities with their products, and some search engine
companies bundle limited database capabilities with their products. EIQ
Server is
a much more seamless integration of database and search technologies, because
the search technology itself is based on the same RIQMS as the database query
technology.
It is estimated that 85% of
all enterprise data exists outside structured databases, and with
company mergers and internal reorganizations, the remaining 15% could be in separate databases, with different
structures and field names. EIQ Server potentially offers one-stop access to all
corporate data, regardless of where it resides, through structured queries,
unstructured text searches or BOTH on the same data sources. EIQ Server is
truly a great combination of database and search technologies.
Reference: 1. C. William Gear, Applications and
Algorithms in Computer Science, Science Research Associates, Inc, 1978,
p. A107. |