WhamTech EIQ
Products™ provide the virtual data access, integration, sharing and
interoperability capabilities for a number of solutions.
In many cases, EIQ Products enable solutions that would either
be too difficult or too expensive using conventional approaches.
Organizations are awash with data management challenges, and
almost all of them stand to benefit from what EIQ
Products offer. Most solutions combine EIQ
Product capabilities and can be broadly categorized as
follows:
The above
solutions are expanded on below:
The basic EIQ Products configuration provides cleansed, transformed and
standardized indexes, and high-quality query processing, without additional
loads on data source systems. EIQ Products overcome one of the major
disadvantages of conventional federated adapters: the lack of access to
data when data source systems are unavailable. EIQ
Products can optionally retrieve and assemble query results data by inverting
indexes.
Typically, middleware or other
applications manage the interaction with adapter products
through schema mapping to reference data models.
EIQ Products are no different in that respect. WhamTech also offers a sub-middleware/simple
middleware product called EIQ Federated Server™ to manage the interaction with,
and among, EIQ
Products, including other EIQ Federated Servers.
Physical
operational data stores (ODSs) are typically copies of
operational/transactional data
in original data formats and schemas, and
stored separately from
operational/transactional systems. ODSs are typically used
to archive data for regulatory purposes and as a source for data
warehouses, which in turn are sources for data marts. The two major advantages
that EIQ Product-based virtual ODSs have over physical ODSs are:
1. DATA QUALITY - EIQ Products can clean, transform and standardize
data in indexes and result sets - the original data
remains untouched, thus allowing regulatory compliance,
but also achieving high quality queries and results.
2. UNIVERSAL ACCESS SCHEMA - EIQ Products can
present a reference data model to an application
regardless of how the indexes (or data sources) are physically configured.
Indexes can be mapped to more than one reference data
model.
Cloud data services make
data available regardless of its location. EIQ
Products have a universal access schema that removes the
need to directly specify individual data sources or their
location. Data access
occurs in the background or in the cloud. Various
rules reflecting preferences can be imposed on results data,
including de-duplication, ranking and LIFO/FIFO in the case
that data sources receive the same queries from a different
EIQ Product.
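As a rough illustration of the result rules mentioned above, the following sketch de-duplicates and ranks rows gathered from several sources. It is not EIQ code: the row layout, the `score` field and the `apply_result_rules` function are hypothetical, and treating the first/last choice as the FIFO/LIFO preference is only one possible reading.

```python
from typing import Iterable


def apply_result_rules(rows: Iterable[dict], key_fields: tuple, keep: str = "first") -> list:
    """De-duplicate rows on key_fields, then rank by descending score.

    keep="first" retains the earliest-arriving duplicate (FIFO-like);
    keep="last" retains the latest-arriving duplicate (LIFO-like).
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    chosen = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if keep == "last" or key not in chosen:
            chosen[key] = row
    # sorted() is stable, so rows with equal scores keep their relative order.
    return sorted(chosen.values(), key=lambda r: r["score"], reverse=True)


# Hypothetical rows returned by three different data sources.
rows = [
    {"customer_id": 7, "name": "Acme Corp", "source": "crm", "score": 0.80},
    {"customer_id": 9, "name": "Bolt Ltd", "source": "erp", "score": 0.95},
    {"customer_id": 7, "name": "Acme Corp", "source": "billing", "score": 0.60},
]
ranked = apply_result_rules(rows, key_fields=("customer_id",), keep="first")
```

With `keep="first"` the duplicate from the billing source is dropped; with `keep="last"` it replaces the earlier CRM row instead.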
Physical
data warehouses (DWs) and data marts (DMs) have two additional
major advantages over ODSs in general, whether virtual or physical
(note that numbering is continued from above):
3. DERIVED DATA - Pre-aggregated, pre-calculated, fuzzy, text and other
indexes in addition to indexes that mirror data sources.
4. ARCHIVED DATA - Historic data can be retained along with current
data.
EIQ Products also have
virtual (index) versions of the above two capabilities (3 and 4) enabling virtual DW and
DM solutions that directly compare with physical DWs
and DMs. WhamTech virtual DW and DM solutions enable
applications such as business intelligence, data mining and
logistics that were thought to be the sole domain of
physical DW and DM solutions.
For an interesting and
controversial discussion
on virtual data warehouses,
read an article published by Information Management (also
linked in the Articles
section).
Conventional federated
adapters for master data management (MDM) are
difficult to implement
either partially or in full because of three main problems that
centralized databases/data warehouses solve:
- Data in data sources is unclean and, in many cases, unusable
- Data in data sources may not have indexes available for querying, or
data sources cannot accommodate improved indexes such as
fuzzy matching, address clean-up, name variation, and
pre-aggregated and pre-calculated fields
- Query performance, where data sources are not capable of executing
more advanced queries, queries are slow, or queries are blocked due
to load considerations and operational/transactional
activity on data sources
However,
copying large amounts of data into centralized databases/data
warehouses can be expensive and time-consuming to establish and
maintain, and raises data-related responsibility,
accountability, security, privacy and legal issues. But
perhaps the most difficult problem is keeping everything
up-to-date.
WhamTech has
worked with several MDM vendors to design a hybrid solution
whereby more frequently accessed master data is copied to a
centralized database or MDM hub, and less frequently accessed
data stays in data sources. EIQ Products cleanse,
transform and standardize data used to build indexes, and
provide pointers as well as cross-references to data based on
global IDs. Identification of the same customer, vendor or
product, for example, in multiple data sources can be performed
when indexes are built or once master data is in the MDM hub.
EIQ Products' fuzzy matching and link mapping can help in this
process by finding similar data across multiple heterogeneous
systems. Finally, a decision has to be made whether the
data source systems are responsible for updating the master data
or the master data updates data sources; this is called
harmonization.
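The cross-referencing step described above can be sketched as follows, assuming a simple cleansing function and using Python's standard `difflib.SequenceMatcher` as a stand-in for a real fuzzy-matching product. The record layout, the 0.85 threshold and the function names are hypothetical; this is an illustration of the idea, not the EIQ implementation.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace (a simple cleansing step)."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def assign_global_ids(records: list, threshold: float = 0.85) -> dict:
    """Map (source, local_id) -> global ID by greedy fuzzy matching on names."""
    representatives = []  # one (global_id, normalized_name) pair per group
    cross_reference = {}
    for source, local_id, name in records:
        cleaned = normalize(name)
        match = None
        for global_id, rep_name in representatives:
            if SequenceMatcher(None, cleaned, rep_name).ratio() >= threshold:
                match = global_id
                break
        if match is None:
            # No similar entity seen yet: start a new group with a new global ID.
            match = len(representatives) + 1
            representatives.append((match, cleaned))
        cross_reference[(source, local_id)] = match
    return cross_reference


# Hypothetical customer records from three data sources.
records = [
    ("crm", "C-001", "Acme Corporation"),
    ("erp", "9912", "ACME Corporation, Inc."),
    ("billing", "B77", "Zenith Supplies"),
]
xref = assign_global_ids(records)
```

Here the CRM and ERP records end up under the same global ID while the billing record gets its own. A production matcher would normally compare more than one field and handle ambiguous matches explicitly.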
More information will be available
soon on WhamTech Virtual or Hybrid Master Data Management
solutions in a separate document.
As a variation
of the virtual ODS, WhamTech developed a solution to virtually
access archived mainframe data files. An example of the
solution approach is a recently completed project for a
large company to enable standard driver and SQL access to their
archived IBM mainframe, COBOL-generated VMS data files:
WhamTech used the original archived very large data files as the
data source and developed a VMS reader, a parser to build both
hierarchical and relational indexes, and a light version of an
ODBC driver. SQL is submitted to an EIQ SuperAdapter that
resolves queries and in turn
retrieves results data from the VMS data files. The EIQ
SuperAdapter uses internal file pointer data to access specific
sections of the very large data files to avoid having to read
the entire file to retrieve results data. In this case,
the customer company is in the process of moving all files from,
and shutting down, the mainframe, with the goal of saving
significant maintenance fees.
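The file-pointer idea can be illustrated with a minimal sketch: scan a file once to record where each record starts, then seek directly to a stored offset instead of reading the whole file. The record layout, key length and function names below are hypothetical and are not taken from the project described above.

```python
import os
import tempfile


def build_offset_index(path: str, key_length: int) -> dict:
    """Scan the file once and record the byte offset where each record starts."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            index[line[:key_length].decode("ascii")] = offset
    return index


def fetch_record(path: str, index: dict, key: str) -> str:
    """Seek straight to the stored offset and read only that record."""
    with open(path, "rb") as handle:
        handle.seek(index[key])
        return handle.readline().decode("ascii").rstrip("\n")


# Build a small stand-in for a very large archived data file
# (4-character key, 10-character name, 8-digit amount per record).
sample = [("0001", "SMITH", 12050), ("0002", "JONES", 975), ("0003", "GARCIA", 104400)]
with tempfile.NamedTemporaryFile("wb", suffix=".dat", delete=False) as tmp:
    for key, name, amount in sample:
        tmp.write(f"{key}{name:<10}{amount:08d}\n".encode("ascii"))
    data_path = tmp.name

offsets = build_offset_index(data_path, key_length=4)
record = fetch_record(data_path, offsets, "0002")
os.remove(data_path)
```

The index is built once, so each later lookup costs a single seek and one record read regardless of file size.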
The solution
could be extended to access live data files on mainframes
similar to EIQ Products accessing other data sources.
Read more on Virtual Access to Archived Mainframe
Data Files.
Most link
analysis applications have the following limitations:
- Scalability: Data is moved in entirety into a single database
- Federated access to data sources using conventional adapters with
associated poor results due to data quality
- Difficult to combine structured and unstructured data
- No near-real-time updates
- No fuzzy matching, probabilities or favorability/threat scores
WhamTech
developed an option that works in conjunction with its federated
data access solution based on EIQ
SuperAdapters to overcome the above-mentioned limitations.
This option takes advantage of content indexes and allows the capture of links in near-real-time among the same or
similar entities in specialized indexes, called Link Indexes™. Similar to content indexes, Link Indexes are
created and maintained at the data source level, but not on data
source systems, and can scale through distributed parallel
processing across multiple disparate data sources. The
combination of content indexes and Link Indexes enables virtual
views of entity data and links between entity data.
User-driven, interactive visualization displays only
entity data and links of interest, as needed. Link Indexes capture
the following five types of links for structured data
sources:
- Internal data source, multiple table joins using primary key (PK) -
foreign key (FK) relationships
- Internal data source, single table self-joins using PK-FK relationships
- Internal data source, single table self-joins using same or similar data
- Internal data source, multiple table joins using same or similar data
- External data source, multiple table joins using same or similar data
...and two
types for unstructured data in either structured or unstructured
sources:
- Structured data captured through entity extraction using same or
similar data
- Unstructured text search using same or similar data
For all the
above types of links, EIQ Products can be selective about the entities used to
establish links, as it may not prove of value to link all
entities (types and data). Also, EIQ Products can apply fuzzy matching
using a product from a third-party vendor for structured data
and text matching algorithms for unstructured data.
Probabilities of match can be calculated and stored with links,
or calculated on-the-fly as links are analyzed. In addition,
when threat/favorability scores for specific entities are combined
with probabilities, threat/favorability link analysis networks
can be displayed.
WhamTech
recognized through its work on Web search engines that all
networks and relationships can be represented through a
combination of links (one-to-one). WhamTech uses its
binary tree indexes to capture and maintain the links as link
maps (one-to-many) in Link Indexes, and Boolean operations on
bitmap representations to combine link maps for link analysis
(many-to-many).
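To make the bitmap idea concrete, here is a minimal sketch that stores each link map as an integer bitmap and combines link maps with Boolean operators. Using Python integers as bitmaps, the small integer entity IDs and the helper names are all assumptions made for illustration; the actual index structures are not described in this document.

```python
def to_bitmap(entity_ids) -> int:
    """Encode a set of entity IDs as an integer bitmap (bit i set = entity i is linked)."""
    bitmap = 0
    for entity_id in entity_ids:
        bitmap |= 1 << entity_id
    return bitmap


def from_bitmap(bitmap: int) -> list:
    """Decode a bitmap back into a sorted list of entity IDs."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


# Link maps (one-to-many): entity ID -> bitmap of directly linked entity IDs.
link_maps = {
    0: to_bitmap([1, 2]),
    1: to_bitmap([0, 2, 3]),
    4: to_bitmap([2, 3, 5]),
}

shared_by_0_and_1 = link_maps[0] & link_maps[1]    # linked to both 0 and 1
reached_from_1_or_4 = link_maps[1] | link_maps[4]  # linked to 1 or to 4
only_from_4 = link_maps[4] & ~link_maps[1]         # linked to 4 but not to 1
```

Each Boolean operation works on whole bitmaps at once, which is why combining many link maps can stay cheap even when the maps are large.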
In their basic
form, Link Indexes are join indexes, which are pre-formed joins
- both internal and external to data sources. Link Indexes can significantly
cut down on the computing time and resources needed to execute
queries involving joins.
More complex
queries, like nested selects, can take advantage of Link Indexes
to execute n degrees of separation queries, again, without much
computing time and resources needed.
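An n degrees of separation query over pre-formed links can be sketched as a breadth-first expansion. The dictionary below is a hypothetical stand-in for a Link Index, and `within_n_degrees` is an illustrative name rather than an EIQ function.

```python
def within_n_degrees(link_index: dict, start: str, n: int) -> dict:
    """Return {entity: degree} for every entity reachable from start in at most n links."""
    degrees = {start: 0}
    frontier = [start]
    for degree in range(1, n + 1):
        next_frontier = []
        for entity in frontier:
            # Each lookup reads a pre-formed link list; no join is computed here.
            for neighbor in link_index.get(entity, ()):
                if neighbor not in degrees:
                    degrees[neighbor] = degree
                    next_frontier.append(neighbor)
        if not next_frontier:
            break
        frontier = next_frontier
    return degrees


# Hypothetical pre-formed links between entities.
link_index = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
}
two_degrees = within_n_degrees(link_index, "alice", 2)
```

Because the links are already stored, each additional degree costs only lookups over the current frontier rather than another join against the data sources.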
The more
obvious use of Link Indexes is for link analysis applications,
where one-to-many, many-to-one and many-to-many relationships
between predefined entities can be visually represented and
subjected to social network analysis.
EIQ Products
also provide federated data access in conjunction with Link
Indexes, whereby an analyst can interact with data sources,
without being aware of it, through link
analysis visualization. EIQ Products can work with both
commercial and open source link visualization tools. EIQ
Products keep a log of all interactions within link analysis for
subsequent use in legal proceedings or for probable cause.
Analysts can retain link analysis networks and resume/retrieve
them in subsequent sessions.
More information will be available
soon on
WhamTech Virtual
Link Mapping and Virtual Link Analysis solutions in a separate document.
Once an initial link analysis is performed,
analysts typically save the file for subsequent retrieval.
However, when retrieved, it contains dated information.
The options usually involve completely refreshing the link
analysis or, with more advanced systems, incrementally updating
it with data from queries to a database or federated queries to
multiple data sources.
Living Networks are an extension
of WhamTech's virtual link mapping and link analysis that allows
an analyst to subscribe to any updates occurring in
near-real-time to a link analysis network. These
updates are for entities or links that are either in the network
or are within n degrees of separation from the network.
When an update is identified for a particular retained network
as a whole, in part, or specific entities and/or links, updates
are automatically made available to the analyst for updating the
network and/or the analyst is notified of the updates.
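The subscription behavior described above might look roughly like the following sketch, in which an incoming update is queued for every retained network whose entities, or entities within n degrees of them, are affected. The `RetainedNetwork` class, the update format and the link data are hypothetical and only illustrate the matching logic.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedNetwork:
    entities: set                       # entities currently in the saved network
    degrees: int = 1                    # how far beyond the network to watch
    pending: list = field(default_factory=list)  # updates awaiting the analyst


def watch_set(network: RetainedNetwork, link_index: dict) -> set:
    """Entities in the network plus those within network.degrees links of it."""
    watched = set(network.entities)
    frontier = set(network.entities)
    for _ in range(network.degrees):
        frontier = {n for e in frontier for n in link_index.get(e, ())} - watched
        watched |= frontier
    return watched


def publish(update: dict, networks: dict, link_index: dict) -> list:
    """Queue the update on every subscribed network it touches; return their names."""
    notified = []
    for name, network in networks.items():
        if update["entity"] in watch_set(network, link_index):
            network.pending.append(update)
            notified.append(name)
    return notified


# Hypothetical links and two retained networks with different watch distances.
link_index = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": ["bob", "dave"], "dave": ["carol"]}
networks = {
    "case-12": RetainedNetwork({"alice", "bob"}, degrees=1),
    "case-40": RetainedNetwork({"dave"}, degrees=0),
}
notified = publish({"entity": "carol", "change": "new address"}, networks, link_index)
```

In this example only "case-12" is notified, because carol is one link away from bob, while "case-40" watches dave alone.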
Living Networks are a culmination
of most of the capabilities that WhamTech's EIQ Products provide
and could significantly improve an analyst's ability to find,
represent, monitor and present complex information. This
is particularly true when probabilities of both entities and
links are combined with threat/favorability scores. More information will be available
soon on
WhamTech Living Network solutions in a separate document.
WhamTech OEMs
an information geometry categorization tool from a large
system integrator that uses it primarily with intelligence
agencies. The tool provides a powerful way to search for
documents and emails that are relevant to a legal case.
This capability complements WhamTech's advanced text search,
and link mapping and link analysis.
Together, these tools comprise WhamTech's eDiscovery tool,
called Teracase.
Most eDiscovery
tools rely heavily on text search and some include concept
searches with varying degrees of success. WhamTech
recognized the leading-edge capabilities of the information
geometry tool, which allows the end-user to quickly converge
on a category model with relatively little input and time
compared to other categorization tools. The category model
is used as a filter, either on its own or in combination with other
filters, against the entire set of emails and documents.
WhamTech's
primary market is Early Case Assessment, both after a case has
been identified and for corporate preemption. WhamTech's
secondary market is full eDiscovery.
Other WhamTech
solutions will benefit from either Teracase itself for
information discovery or the information geometry categorization
tool.
- Simple user access
- Supported data sources:
  - Emails and attachments
  - Documents (text, compressed, PDF, Microsoft Office
documents, Microsoft Works, HTML and variants,
WordPerfect and others)
- De-duplication using "same as" algorithm, "similar as" coming soon
- Case Management
- Category Management
- Search/Analysis with various filters (include/exclude; keywords with
various options - stemming, synonym, phrase, etc.; data source type;
email filters - sent/received date-time, sender/recipient
domain, sender/recipient email address, with/without
attachments; document filters - type, last modified
date-time, author, attachment/free-standing)
- Save filters, apply saved filters
- Results with options (list of matching text body id, path-link to open,
data source type, ordered by keyword relevancy and category model score,
n results per page, next n/previous n, view email/document
text, tag selected, mark for a category model)
- Create, save, refine category models
- Score entire corpus for selected category models
- Predefined tags (Hot - highly relevant email/documents that are found
early and will most likely be shown at trial, Responsive -
maybe, Non-responsive - irrelevant and Privileged
- these are files governed by attorney/client privilege and
will be presented to a judge (if the matter goes to trial),
but can initially be set aside for review by the lawyer)
- User-assigned tag to any email/document
- Teracase Administration
- Audit log - log of all user actions at the web server level
Near-future features:
- User management and role assignment
- Billing Information - ingest volume size (before and after unroll
and unzipped) and de-duplication by volume
- Statistics and reports
- Link mapping and link analysis for emails
- Interactive visualization for both emails and documents
More information will be available
soon on
WhamTech eDiscovery solutions in a separate document.