A unique approach to providing data and information integration, sharing and interoperability solutions
Unique Approach
- External index and query (EIQ) Products are based on index and query processing layers built and maintained externally to the data sources. The indexes use cleansed, formatted and standardized data that is read from the data sources and then discarded; the data itself remains stored in the data sources.
- EIQ Product layers reside behind or in front of corporate firewalls, can be located locally, regionally or centrally, and absorb the heavy load of maintaining indexes and processing queries.
- At least twelve change data capture (CDC) options, ranging from real-time to batch, are used to read source data to maintain the indexes.
- Indexes typically have the same structure or schema as their respective data sources, whether relational, non-relational or unstructured; there is no structure or schema transformation.
- EIQ Products are accessed through standard drivers (ODBC, JDBC, OLE-DB, etc.) or Web Services, and queries are submitted in standard Structured Query Language (SQL).
- Query processing yields pointers back to the raw data in the data sources: record numbers, indexed key(s), URLs (with or without locations), RDFs, file positions, etc.
- Raw data is retrieved from the data sources through standard or proprietary user-level access, then cleansed, formatted and standardized before being provided to the calling applications.
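The index-then-fetch flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not WhamTech's implementation: the in-memory `SOURCE` dictionary stands in for a real data source, and `standardize`, `build_index`, `query` and `fetch` are hypothetical helpers. The index keeps only standardized values and record numbers; a query returns pointers, and raw rows are read from the source only when they are needed.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical data source: record number -> raw row. In practice this would be
# a database table, file store or other system reached through its own access.
SOURCE: Dict[int, Dict[str, str]] = {
    0: {"name": "  Smith ", "city": "dallas"},
    1: {"name": "JONES", "city": "Austin"},
    2: {"name": "smith", "city": "Houston "},
}


def standardize(value: str) -> str:
    """Toy cleansing rule (assumed): trim whitespace and upper-case."""
    return value.strip().upper()


def build_index(source: Dict[int, Dict[str, str]], field: str) -> Dict[str, List[int]]:
    """Read each record once and keep only (standardized value -> record numbers).

    The rows themselves are not copied into the index; they stay in the source.
    """
    index: Dict[str, List[int]] = defaultdict(list)
    for recno in sorted(source):
        index[standardize(source[recno][field])].append(recno)
    return dict(index)


def query(index: Dict[str, List[int]], value: str) -> List[int]:
    """Query processing returns pointers (record numbers), not data."""
    return index.get(standardize(value), [])


def fetch(source: Dict[int, Dict[str, str]], pointers: List[int]) -> List[Dict[str, str]]:
    """Retrieve the raw rows from the source and cleanse them for the caller."""
    return [{k: standardize(v) for k, v in source[p].items()} for p in pointers]


name_index = build_index(SOURCE, "name")
pointers = query(name_index, "smith")
rows = fetch(SOURCE, pointers)
```

Here `pointers` is `[0, 2]` even though the two Smith rows are spelled differently at source, and `rows` holds the two cleansed records read back from the source.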
Other Approaches
Data warehouses, conventional federated adapters and, to a lesser extent, enterprise search systems are all conventional approaches to the same challenge of dealing with data: where it resides; how to access it; how to cope with "dirty data", typos, etc.; and different types, standards, formats, security, locations, owners and systems.
Each conventional approach has its advantages and disadvantages. WhamTech combined technologies from these three approaches to
provide dataless hybrid products, retaining the advantages and overcoming the
disadvantages of each approach.
For structured database-type queries, query success is the same as, if not better than, that of data warehouses, and considerably better than that of conventional federated adapters.
For unstructured text search, query success is similar to that of search engines.
The reason for the high query success is that WhamTech products uniquely have indexes comprising cleansed, formatted and standardized data that is read from data source logs, or by some other means, and then discarded. Because queries are submitted against this cleansed data rather than the raw data, they are highly successful. Further, WhamTech products have advanced indexing, including text search, fuzzy, aggregation, calculation, compound, join, denormalized, embedded value and link indexes.
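Why cleansed indexes raise query success can be illustrated with a small Python sketch. It assumes a toy standardization rule (upper-casing, stripping punctuation and expanding a hypothetical `ABBREVIATIONS` table) and uses the standard library's `difflib.get_close_matches` as a stand-in for a fuzzy index; WhamTech's actual cleansing and fuzzy-matching techniques are not described here.

```python
import difflib
import re
from typing import Dict, List

# Hypothetical standardization table; real cleansing would rely on much richer
# reference data (postal tables, name dictionaries, etc.).
ABBREVIATIONS = {"ST": "STREET", "AVE": "AVENUE", "RD": "ROAD"}


def standardize(text: str) -> str:
    """Upper-case, drop punctuation and extra spaces, expand abbreviations."""
    words = re.findall(r"[A-Z0-9]+", text.upper())
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


# Raw source values, keyed by record number, as they might arrive from a log.
raw_addresses = {10: "12 Main St.", 11: "12  main street", 12: "7 Oak Ave"}

# The index stores only standardized keys and record numbers.
address_index: Dict[str, List[int]] = {}
for recno, addr in raw_addresses.items():
    address_index.setdefault(standardize(addr), []).append(recno)


def exact_lookup(text: str) -> List[int]:
    return address_index.get(standardize(text), [])


def fuzzy_lookup(text: str, cutoff: float = 0.85) -> List[int]:
    """Tolerate typos by matching against the nearest standardized keys."""
    keys = difflib.get_close_matches(standardize(text), address_index.keys(), n=3, cutoff=cutoff)
    return sorted(r for k in keys for r in address_index[k])
```

Both raw spellings of the Main Street address land on the same index entry, so an exact lookup for "12 Main Street" finds records 10 and 11, and the fuzzy lookup still finds them for the mistyped "12 Mian St".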
WhamTech products retain the advantages of data warehouse systems:
- Clean data (indexes and results data)
- Multiple indexes and types
- Query processing: multiple options
- Security: data and access
and, at the same time, retain the primary advantage of conventional federated adapters:
- Data remains at source
Legacy Development
EIQ Products derive their unique index
and query processing technologies from previously marketed database and search
products.
WhamTech and its predecessors developed a relational very large database (VLDB) technology, called D the Data Language and, later, Thunderbolt, that was extremely fast compared to other similarly configured database systems. This appealed to a niche, data-processing-intensive market; however, the real technology differentiators were the unique index and query processing technologies, which are also used in the current EIQ Products to index and process queries against almost any data source.
From
an operational and structural point of view, the relational index and query
management system (RIQMS) embedded in EIQ Products is a conventional
relational database technology, with tables (virtual, in the case of EIQ
Products), indexes, and typical database operation commands and
queries. It is NOT a memory-resident system; NOT a read-only
system; NOT a fully inverted database system; and NOT a
retrieval-based storage system. There are, however, significant
performance and capability differences that distance EIQ Products' RIQMS from other database and related technologies:
Unique Method of Query Execution
WhamTech's RIQMS has a unique method of isolating, connecting,
arranging (sorting), processing (updating), and presenting (displaying) data.
This unique technology enables real-time data isolation and access, regardless of database size, number of concurrent users or query complexity.
Unique Combination of Technologies
The speed and advanced capabilities of WhamTech's RIQMS arise from a unique combination of three binary-level methodologies involving the three "Bs" of computing. These methodologies are normally associated with static data warehousing rather than with live data sources, which WhamTech supports, and the combination would be very difficult to improve on:
- Balanced binary trees that scale to billions of records and terabytes of data
- Virtual bitmap representations of intermediate and final query result sets (known as Collections); these can be integer lists or actual bitmaps, depending on field data node-level "data density"
- Boolean operations on Collections
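The list-or-bitmap choice for Collections can be sketched in Python. The `Collection` type, `make_collection` helper and `DENSITY_THRESHOLD` value are hypothetical: the threshold simply assumes 32-bit record numbers, so a list costs 32 bits per member while a bitmap costs 1 bit per record in the source, putting the break-even density at 1/32. The rule RIQMS actually applies to node-level "data density" is not published.

```python
from dataclasses import dataclass
from typing import Iterable, List, Union

# Assumed break-even point: with 32-bit record numbers an integer list costs
# 32 bits per member, while a bitmap costs 1 bit per record in the source.
# The threshold actually used by WhamTech's RIQMS is not published.
DENSITY_THRESHOLD = 1 / 32


@dataclass
class Collection:
    """Hypothetical result set over record numbers 0 .. universe - 1."""
    universe: int
    data: Union[List[int], int]  # sorted integer list, or a Python int used as a bitmap

    @property
    def is_bitmap(self) -> bool:
        return isinstance(self.data, int)

    def records(self) -> List[int]:
        """Record numbers in ascending order, whichever form is stored."""
        if self.is_bitmap:
            return [r for r in range(self.universe) if (self.data >> r) & 1]
        return list(self.data)


def make_collection(record_numbers: Iterable[int], universe: int) -> Collection:
    """Store sparse results as an integer list and dense results as a bitmap."""
    recs = sorted(set(record_numbers))
    if len(recs) > universe * DENSITY_THRESHOLD:
        bits = 0
        for r in recs:
            bits |= 1 << r
        return Collection(universe, bits)
    return Collection(universe, recs)


sparse = make_collection([900, 3, 42], universe=1_000)       # 0.3% density
dense = make_collection(range(0, 1_000, 2), universe=1_000)  # 50% density
```

`sparse` is kept as the integer list `[3, 42, 900]`, while `dense` becomes a bitmap; callers see the same ascending record numbers from `records()` either way, which is what lets later Boolean steps treat the two forms as interchangeable.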
Balanced binary trees are a technology from the 1960s. Their attraction, then as now, is that binary searches are considered to be the fastest method of searching ordered lists [1]. However, there are a number of practical problems associated with balanced binary trees, all of which WhamTech has solved (and that is the main secret to our success):
- PROBLEM: Levels tend to get very deep; for example, a balanced binary tree of a billion nodes needs 30 levels, and every level adds traversal time.
  WHAMTECH SOLUTION: WhamTech's implementation does not conform to the conventional balanced binary tree rule n = ⌈log2(x + 1)⌉, where n is the number of levels and x is the number of nodes. Instead, WhamTech's implementation "leaps" levels, making binary tree traversal time measurable in microseconds rather than milliseconds or seconds.
- PROBLEM: Rebalancing and rotation after an insert or delete can take considerable time, and a very large number of nodes can be affected.
  WHAMTECH SOLUTION: At most just over 100 nodes need to be rotated; typically, only 10 to 15 nodes are rotated after an insert or delete.
- PROBLEM: Deleting a top node is a worst-case scenario faced by almost all tree structures.
  WHAMTECH SOLUTION: This is one of the easiest operations for WhamTech's RIQMS to deal with, as it fits the technology well.
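The conventional rule quoted in the first problem can be checked with a short Python sketch. It computes the minimum number of levels for a balanced binary tree of x nodes, n = ⌈log2(x + 1)⌉, both with floating-point `math.log2` and exactly with `int.bit_length`. It illustrates only the conventional rule; WhamTech's level-leaping implementation is not described here.

```python
import math


def levels_float(nodes: int) -> int:
    """n = ceil(log2(x + 1)) computed with floating point."""
    return math.ceil(math.log2(nodes + 1))


def levels_exact(nodes: int) -> int:
    """Same rule without rounding risk: the smallest n with 2**n > x is x.bit_length()."""
    return nodes.bit_length()


for x in (1_000, 1_000_000, 1_000_000_000):
    print(f"{x:>13,} nodes -> {levels_exact(x)} levels")
```

This prints 10, 20 and 30 levels respectively, so a conventional lookup in a billion-node tree may visit up to 30 nodes. The integer form is preferred because floating-point `log2` can round across an integer boundary for very large x.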
With WhamTech's RIQMS, querying a balanced binary tree index yields an integer list: a set of pointers to data in the data source. For subsequent query-related operations, this set is either used directly or converted to a Collection; an example is shown in the following diagram:

Complex queries are rendered simple by treating them as combinations of
lightning-fast queries on multiple balanced binary tree indexes. Once single-field Collections are isolated, they are combined using the full range of
Boolean arithmetic operations to provide a complex query result set, as shown in
the following example:

The set bits in the final complex query result set represent the record numbers of the final result set in one or more data sources. Remember that Collections need not be bitmaps and are in most cases integer lists. Boolean operations can be performed on actual bitmaps, on virtual bitmaps (integer lists), or on both in combination.
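The combination of single-field Collections described above can be sketched in Python. The query, field names and record numbers are hypothetical, and the strategy shown (a linear merge to AND two lists, bit probing to AND a list with a bitmap, native OR and NOT on bitmaps) is one straightforward way to mix the two forms, not necessarily the one RIQMS uses.

```python
from typing import List


def list_to_bitmap(recs: List[int]) -> int:
    """Set bit r for every record number r."""
    bits = 0
    for r in recs:
        bits |= 1 << r
    return bits


def bitmap_to_list(bits: int) -> List[int]:
    """Return the positions of the set bits, i.e. the record numbers."""
    out, r = [], 0
    while bits:
        if bits & 1:
            out.append(r)
        bits >>= 1
        r += 1
    return out


def and_lists(a: List[int], b: List[int]) -> List[int]:
    """AND two sorted integer lists with a linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_list_bitmap(a: List[int], bits: int) -> List[int]:
    """AND a sparse integer list with a bitmap by testing one bit per member."""
    return [r for r in a if (bits >> r) & 1]


def not_bitmap(bits: int, universe: int) -> int:
    """NOT within record numbers 0 .. universe - 1."""
    return ~bits & ((1 << universe) - 1)


UNIVERSE = 16
# Hypothetical single-field Collections for:
#   state = 'TX' AND recent_order AND status IN ('A', 'B') AND NOT balance > 1000
state_tx = [1, 4, 6, 9, 12]                     # integer list
recent = [1, 4, 9, 15]                          # integer list
status_a = list_to_bitmap([0, 1, 2, 3, 4, 5])   # bitmap
status_b = list_to_bitmap([9, 10, 11])          # bitmap
high_balance = list_to_bitmap([4, 12])          # bitmap

status_ab = status_a | status_b                  # OR on bitmaps
result = and_lists(state_tx, recent)             # list AND list -> [1, 4, 9]
result = and_list_bitmap(result, status_ab)      # list AND bitmap -> [1, 4, 9]
result = and_list_bitmap(result, not_bitmap(high_balance, UNIVERSE))  # -> [1, 9]
```

Each intermediate result is itself a Collection, so the same small set of operations composes into arbitrarily complex predicates; the final list `[1, 9]` is exactly the set of record numbers to fetch from the data source.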
WhamTech's RIQMS allows extremely fast queries and updates involving a minimal number of nodes, regardless of the size of the data source or the cardinality of the data. WhamTech solved problems that have perplexed balanced binary tree researchers for decades and that pushed most companies towards non-binary index tree structures, such as B+ trees, or other forms of indexes in which each node has more than two branches. These non-binary tree structures do not allow a simple 0 or 1 decision at each node; instead, they impose more complex decision-making on query processing at every node encountered, which increases traversal time.
The core indexing technology code has remained untouched for over 20 years, as the algorithms and code are stable and bug-free. For example, WhamTech's legacy database product, Thunderbolt, was used for over 12 years by ACS (www.acs-inc.com), an NYSE-listed IT services company with more than $2 billion in annual revenue, for mission-critical 24/7 operations support, generating considerable revenue for ACS.
The IP associated with WhamTech's RIQMS is indirectly protected through
awarded patents (see Patents).
Real-time Indexes
WhamTech's real-time indexes achieve rates ranging from hundreds or thousands to tens of thousands of records inserted or updated per second on low-end servers; a high-end example achieved a combined query and insert rate of 80,000 records per second on a dual 933 MHz server. Real-time indexes establish a new method for dealing with large-scale data and information issues, from active (or real-time) data warehousing to near real-time database performance previously thought possible only with memory-resident databases. Many applications are moving towards real-time operation, e.g., interactive customer relationship management (iCRM), inventory management, supply chain management (SCM), and decision support systems (DSS).
The largest problem faced by database vendors, in general, is enabling simultaneous queries (simple and complex) and data changes (inserts, deletes, and updates) on the data and indexes of VLDBs, such that data and indexes remain synchronized.
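The synchronization requirement can be illustrated with a minimal Python sketch, assuming change events arrive as hypothetical `(op, record_number, value)` tuples, for example from one of the CDC options mentioned earlier. A single coarse lock is the simplest way to make each change atomic with respect to queries; it is an illustrative choice, not a description of how RIQMS achieves concurrency at VLDB scale, where much finer-grained techniques would be needed.

```python
import threading
from collections import defaultdict
from typing import Dict, List, Optional, Set


class SyncedFieldIndex:
    """Hypothetical single-field index maintained from source change events.

    Each insert, update or delete is applied under one lock, so a concurrent
    query sees the index either before or after a change, never half-way.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._postings: Dict[str, Set[int]] = defaultdict(set)  # value -> record numbers
        self._current: Dict[int, str] = {}                      # record number -> indexed value

    def apply(self, op: str, recno: int, value: Optional[str] = None) -> None:
        if op not in ("insert", "update", "delete"):
            raise ValueError(f"unknown change type: {op}")
        if op != "delete" and value is None:
            raise ValueError(f"{op} needs a value")
        with self._lock:
            # An update is a delete of the old index entry plus an insert of the new one.
            old = self._current.pop(recno, None)
            if old is not None:
                self._postings[old].discard(recno)
                if not self._postings[old]:
                    del self._postings[old]
            if op != "delete":
                self._postings[value].add(recno)
                self._current[recno] = value

    def query(self, value: str) -> List[int]:
        with self._lock:
            return sorted(self._postings.get(value, ()))


index = SyncedFieldIndex()
events = [("insert", 1, "OPEN"), ("insert", 2, "OPEN"),
          ("update", 1, "CLOSED"), ("delete", 2, None)]
for op, recno, value in events:
    index.apply(op, recno, value)
```

After the four events, a query for "OPEN" returns no records and a query for "CLOSED" returns record 1, matching the state of the source.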
Most database systems are designed for transactions, a large number of users, and simple queries; for such systems, updates are mainly sequential insertions. Most data warehouses are designed to be largely static, with reduced subsets of data, a small number of users, and complex queries; for such systems, updates are typically performed in regular batches. Complex queries can cripple the performance of most database systems, particularly VLDBs, but not that of WhamTech's RIQMS, which allows for an entirely new approach to near real-time data and information integration and sharing systems.
Complementary Versus Replacement Technology
WhamTech's EIQ Products are intended to complement, not replace, existing operational database systems, which are usually transaction-oriented, to enable:
- Growth to accommodate a larger number of users
- Additional capabilities for existing systems
- Advanced capabilities not available with existing systems
EIQ Products are the ultimate non-intrusive system, as they externally index and process queries against existing data sources, including most databases.
EIQ Products also allow unstructured text search on databases, as well as on unstructured data sources such as files, documents and e-mail, and on semi-structured data sources such as spreadsheets and XML.
EIQ Products have a unique approach to querying semi-structured data
sources using structured queries. EIQ Products can also
execute structured queries on unstructured data if entity
extraction tools are used to build indexes.
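One generic way to let structured queries reach semi-structured data is to flatten each XML record into named columns. The Python sketch below does this with the standard library's `xml.etree.ElementTree`; the sample document, the path-style column names and the `flatten` and `table_from_xml` helpers are assumptions for illustration, and WhamTech's own approach is not described in this document.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

DOC = """<orders>
  <order id="A1"><customer>Smith</customer><total>120.50</total></order>
  <order id="A2"><customer>Jones</customer><total>75.00</total></order>
</orders>"""


def flatten(elem: ET.Element, prefix: str = "") -> Dict[str, str]:
    """Turn one XML record into {column: value}, using element paths as column names."""
    row = {f"{prefix}@{k}": v for k, v in elem.attrib.items()}
    for child in elem:
        path = prefix + child.tag
        if len(child):  # nested element: recurse with a longer path
            row.update(flatten(child, path + "/"))
        else:           # leaf element: its text becomes a column value
            row[path] = (child.text or "").strip()
            row.update({f"{path}@{k}": v for k, v in child.attrib.items()})
    return row


def table_from_xml(xml_text: str, record_tag: str) -> List[Dict[str, str]]:
    """Each <record_tag> element becomes one row; its list position is the record number."""
    root = ET.fromstring(xml_text)
    return [flatten(e) for e in root.iter(record_tag)]


rows = table_from_xml(DOC, "order")
# Roughly what a structured query such as
#   SELECT "@id" FROM order WHERE total > 100
# would return once the rows are indexed like any relational table.
ids = [r["@id"] for r in rows if float(r["total"]) > 100]
```

Once records are flattened this way, the same field indexes and Collections used for relational sources can be built over the XML columns.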
A Great Combination - Database and Search Technologies
Many professionals in the database industry are predicting the convergence of database and search technologies. Some database vendors already bundle limited search capabilities with their products, and some search engine companies bundle limited database capabilities with theirs. EIQ Products offer a much more seamless integration of database and search technologies, because the search technology itself is based on the same RIQMS as the database query technology.
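Because both kinds of index return record numbers, a structured predicate and a text search can be combined with the same set operation. The Python sketch below shows the idea with a plain field index and a simple word (inverted) index; the records, the tokenizing regular expression and the `intersect` helper are hypothetical and far simpler than a real text search engine.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical records, each with a structured field and a free-text field.
RECORDS = {
    0: {"region": "WEST", "notes": "Customer reported a billing error"},
    1: {"region": "EAST", "notes": "Billing address updated"},
    2: {"region": "WEST", "notes": "Shipping delay, billing unaffected"},
    3: {"region": "WEST", "notes": "Requested new catalog"},
}

field_index: Dict[str, List[int]] = defaultdict(list)  # field value -> record numbers
word_index: Dict[str, List[int]] = defaultdict(list)   # word -> record numbers

for recno in sorted(RECORDS):  # ascending record numbers keep every list sorted
    row = RECORDS[recno]
    field_index[row["region"]].append(recno)
    for word in set(re.findall(r"[a-z]+", row["notes"].lower())):
        word_index[word].append(recno)


def intersect(a: List[int], b: List[int]) -> List[int]:
    """AND two record-number lists, keeping the order of the first."""
    members = set(b)
    return [r for r in a if r in members]


# Structured predicate AND text search on the same records:
#   region = 'WEST' AND notes CONTAINS 'billing'
hits = intersect(field_index.get("WEST", []), word_index.get("billing", []))
```

`hits` is `[0, 2]`: the two western records whose notes mention billing. The point of the sketch is that the text hit list and the field hit list are the same kind of object, so no separate search engine is needed to combine them.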
It is estimated that 85% of all enterprise data exists outside structured databases and, with company mergers and internal reorganizations, the remaining 15% could be spread across separate databases with different structures and field names. EIQ Products potentially offer one-stop access to all corporate data, regardless of where it resides, through structured queries, unstructured text searches or BOTH on the same data sources. EIQ Products are truly a great combination of database and search technologies.
Reference:
1. C. William Gear, Applications and Algorithms in Computer Science, Science Research Associates, Inc., 1978, p. A107.