This list not only defines basic elements of database
technology, but it also illustrates how WhamTech database and search technology is redefining
them.
5GL
Stands for fifth-generation language; a category of programming languages that use
concise English-like statements to generate complex, detailed code. SQL is an
example of a 4GL, or fourth-generation language.
Active Data Warehousing
A relatively new term used to describe a new and rare breed of data warehousing
that allows simultaneous real-time (continuous) updating and querying.
aggregation
Performance of a DBMS operation on an entire set of data records at the same
time.
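As a minimal sketch of aggregation, the following applies count, sum, and mean to one field across a whole set of records in a single operation. The record layout and the `aggregate` helper are assumptions made for illustration; they are not part of any WhamTech product.

```python
from statistics import mean

# Hypothetical sales records; the field names are illustrative only.
records = [
    {"state": "TX", "amount": 120.0},
    {"state": "TX", "amount": 80.0},
    {"state": "CA", "amount": 200.0},
]

def aggregate(rows, field):
    """Apply count, sum, and mean to one field across the whole record set."""
    values = [row[field] for row in rows]
    return {"count": len(values), "sum": sum(values), "mean": mean(values)}

summary = aggregate(records, "amount")
```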
API
Stands for Application Programming Interface; the set of commands by which application
programs make requests to the operating system or to another software component.
balanced binary tree A data structure composed of
pairs of branches, beginning at a single point, or root. Each point of
branching is called a node, or parent. In
a binary tree, each node can have at most two leaves, or children.
A binary tree index can be used for
extremely fast searches. To
construct such a tree, beginning with the root at the top and growing
downward, at each branch the lower-valued record number (or other key value)
goes on the left, the higher-valued on the right. New branches
are added to the bottommost nodes.
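The construction rule above can be sketched as a plain, unbalanced binary search tree. This is a textbook illustration, not WhamTech's index format; the names `Node`, `insert`, and `contains` are chosen here for the example.

```python
class Node:
    """One node of a binary tree: a key plus optional left and right children."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Add a key below the bottommost node on its search path; return the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)    # lower values go left
    elif key > root.key:
        root.right = insert(root.right, key)  # higher values go right
    return root                               # duplicate keys are ignored

def contains(root, key):
    """Search by walking down one branch per comparison."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

root = None
for record_number in [50, 30, 70, 20, 40]:
    root = insert(root, record_number)
```

Each comparison discards one whole subtree, which is why a search is fast when the tree is not lopsided.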
A binary tree index is said to be balanced when,
even after repeated insertions and deletions of nodes, the tree remains symmetrical
(the same number of nodes on the left
as on the right). WhamTech considers a
binary tree to be balanced when the number of nodes is the same on each
side of the root or top node, or one side has one extra node. Two or more
extra nodes is considered out of balance. The better balanced the tree, the faster and more
efficient the search.
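The balance criterion stated above (node counts on either side of the root differ by at most one) can be checked with a short sketch. The `Node` class and helper names are assumptions for illustration, and this check looks only at the root, as the definition above does.

```python
class Node:
    """Minimal binary tree node used only for this illustration."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def size(node):
    """Count the nodes in the subtree rooted at node."""
    if node is None:
        return 0
    return 1 + size(node.left) + size(node.right)

def is_balanced_at_root(root):
    """True when the two sides of the root differ by at most one node."""
    if root is None:
        return True
    return abs(size(root.left) - size(root.right)) <= 1

even_tree = Node(2, Node(1), Node(3))   # one node on each side
chain = Node(1, None, Node(2, None, Node(3)))  # two extra nodes on the right
```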
Asymmetrical, or unbalanced, binary trees can be
made symmetrical (i.e. more efficient for searching) by rearranging the links
between nodes, without changing their sort order, in a process called rotation. However,
the more processor cycles are used for performing rotation, the less
computational capacity will be available for searching. That
is, the more rotation required to keep a binary tree in balance, the lower the
performance of the overall search scheme. The node
rotation problem increases geometrically with the number of nodes, which in
traditional database index systems is directly related to the size of the
database. WhamTech has solved the problem that faces core database technology
developers, in two ways, as follows:
1. Thunderbolt VLDB's balanced binary trees depend on the cardinality of the
data rather than the size of the database.
As database sizes increase, high cardinality
data tends to increase the number of nodes, e.g., Last Name.
Low cardinality data tends to remain unaffected, since cardinality is
finite, e.g., Gender (2 possible values), State (52 possible values,
including DC and "unknown").
2. WhamTech has a proprietary method of building and maintaining balanced binary tree
indexes that minimizes the number of rotations required REGARDLESS OF THE
SIZE OF THE TREE. Query, update, addition and deletion operations are not
significantly affected.
WhamTech has
the capability to simultaneously query, update (including additions and
deletions), and maintain balance of a binary tree in real-time. As far as
WhamTech is aware, no one else has solved this problem, making this technology
uniquely able to manage massive volumes of data - a true real-time VLDB. See
Benefits.
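For readers unfamiliar with rotation, the following is a sketch of a standard textbook left rotation. It is not WhamTech's proprietary method, which is not described here; `Node`, `rotate_left`, and `in_order` are names assumed for the example.

```python
class Node:
    """Minimal binary tree node used only for this illustration."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rotate_left(node):
    """Promote the right child to the top of this subtree; return the new top."""
    pivot = node.right
    node.right = pivot.left   # the pivot's left subtree moves under the old top
    pivot.left = node         # the old top becomes the pivot's left child
    return pivot

def in_order(node):
    """List keys in sorted order (left subtree, node, right subtree)."""
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)

# A right-leaning chain 1 -> 2 -> 3 becomes a symmetrical tree rooted at 2.
chain = Node(1, None, Node(2, None, Node(3)))
balanced = rotate_left(chain)
```

The rotation changes only three links and leaves the sorted order of the keys untouched, so searches keep working while the tree is being rebalanced.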
bitmap Collection
A WhamTech VLDB technology result set consisting of record numbers in the database,
represented as an array of binary ones and zeros. See
Technologies.
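The idea of a result set held as binary ones and zeros can be sketched with a Python integer used as a bit array, where bit n is 1 when record number n is in the result. This is a generic illustration under that assumption, not WhamTech's actual storage layout; the function names are hypothetical.

```python
def to_bitmap(record_numbers):
    """Set bit n for every record number n in the result set."""
    bitmap = 0
    for n in record_numbers:
        bitmap |= 1 << n
    return bitmap

def to_record_numbers(bitmap):
    """Recover the record numbers whose bits are set, in ascending order."""
    return [n for n in range(bitmap.bit_length()) if (bitmap >> n) & 1]

# Two hypothetical query results over the same table.
state_is_tx = to_bitmap([1, 4, 7])
name_is_smith = to_bitmap([4, 7, 9])

# Combining conditions becomes a single bitwise operation.
both = state_is_tx & name_is_smith
either = state_is_tx | name_is_smith
```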
cardinality In database terminology,
cardinality is the total number of unique occurrences of an entity (e.g. person,
organization, or transaction) that can participate in a relationship. An
extremely high cardinality field would be a Web site address, e-mail address or
phone number; in fact, these should be unique. A high cardinality field would be
street address - not unique, but close - there may be more than one "1234
Morningside Drive" in a particular State or in the US. A low cardinality field
would be State - only 52 variations, including DC and "unknown". An extremely
low cardinality field would be a check box - only two variations.
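As a rough illustration, the cardinality of a field can be measured by counting its distinct values. The sample rows and the `cardinality` helper below are assumptions for the example.

```python
# Hypothetical customer records; values are invented for illustration.
rows = [
    {"email": "a@example.com", "state": "TX", "opted_in": True},
    {"email": "b@example.com", "state": "TX", "opted_in": False},
    {"email": "c@example.com", "state": "CA", "opted_in": True},
    {"email": "d@example.com", "state": "TX", "opted_in": True},
]

def cardinality(records, field):
    """Count the distinct values that appear in one field."""
    return len({record[field] for record in records})

email_cardinality = cardinality(rows, "email")        # unique per record
state_cardinality = cardinality(rows, "state")        # a few repeated values
checkbox_cardinality = cardinality(rows, "opted_in")  # at most two values
```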
Collection
Proprietary WhamTech term for a set of record numbers that result from a
WhamTech VLDB technology query. Record numbers may be represented either in binary or in integer
form. See Technologies.
COM
Stands for Component Object Model; a Microsoft standard for building and connecting software components.
crawler
See spider.
CRC
Stands for Cyclic Redundancy Check; a type of algorithm used to verify that no
errors have occurred in copying or transferring blocks of data. WhamTech uses
its own CRC algorithm to reduce storage and accelerate queries, and to provide extremely
fast, basic 64-bit one-way encryption.
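The error-checking use of a CRC can be sketched with the standard CRC-32 from Python's `zlib` module. This is the common 32-bit algorithm, not WhamTech's own 64-bit variant; the sample data is invented.

```python
import zlib

def checksum(data: bytes) -> int:
    """Return the standard CRC-32 of a block of data."""
    return zlib.crc32(data)

original = b"1234 Morningside Drive"
good_copy = bytes(original)            # transferred without error
bad_copy = b"1234 Morningside Drove"   # one byte changed in transit

# The sender stores or transmits the checksum alongside the data;
# the receiver recomputes it and compares.
copy_is_intact = checksum(good_copy) == checksum(original)
corruption_detected = checksum(bad_copy) != checksum(original)
```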
data mining
Category of DBMS applications that seek to find new information and
relationships within multiple, often heterogeneous, legacy data stores; for
example, searching and analyzing customer sales transaction detail to determine
buying habits by ZIP code or other demographic criteria.
database A file management system that is usually considered
relational. See also relational.
DBMS Database Management System. See also RDBMS.
See Technologies page.
DNS
Stands for Domain Name System; the Internet service, provided by domain name
servers, that translates host names into specific numeric IP (Internet
Protocol) addresses on
specific hosts for purposes of routing access requests. Translation of Web site
addresses (e.g., www.whamtech.com) into numeric IP addresses is called DNS
resolution.
ETL
Acronym for the database operations of Extract, Transform, and Load, representing
the processing overhead required to copy data from an external DBMS or file. Operations
performed entirely within a given DBMS require no ETL and therefore are more
efficient.
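The three steps of an ETL pipeline can be sketched with the Python standard library. The CSV text stands in for an external file, and an in-memory SQLite database stands in for the target DBMS; both, along with the table and function names, are assumptions for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an external file or DBMS export.
SOURCE_CSV = "customer_id,last_name,state\n1,smith,tx\n2,jones,ca\n"

def extract(text):
    """Extract: read raw rows out of the external source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert each raw row to the target types and formats."""
    return [
        (int(r["customer_id"]), r["last_name"].title(), r["state"].upper())
        for r in rows
    ]

def load(conn, rows):
    """Load: write the converted rows into the target table."""
    conn.execute(
        "CREATE TABLE customer (customer_id INTEGER, last_name TEXT, state TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT * FROM customer ORDER BY customer_id").fetchall()
```

Every row passes through all three functions before it can be queried, which is the overhead the definition refers to.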
field Labeled storage location for data values within
a database. A group of different fields that all describe a single entity - such
as a person, company, or transaction - constitutes a data record. See schema.
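GIS
Stands for Geographic Information System; a system for capturing, storing,
analyzing, and displaying geographically referenced data. In this context, the
IT infrastructure that supports seismic exploration applications.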
hypercube functionality
Ability to perform operations
in four dimensions (4D); typically, x, y, z and t (time).
integer Collection
A WhamTech VLDB technology result set consisting of record numbers in the database,
represented as a list of integers, or whole numbers. See
Technologies.
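When result sets are held as sorted lists of record numbers, two of them can be combined with a single merge-style pass. The sketch below shows a generic intersection under that assumption; it does not describe how WhamTech processes its Collections.

```python
def intersect_sorted(a, b):
    """Return record numbers present in both ascending lists."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1   # a[i] cannot appear later in b, so skip it
        else:
            j += 1   # b[j] cannot appear later in a, so skip it
    return result

# Two hypothetical query results.
matches = intersect_sorted([1, 4, 7, 9], [4, 5, 9, 12])
```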
IP
Stands for Internet Protocol. See TCP/IP
and DNS.
join
Operation within a RDBMS by which data from
two tables are combined to form a third, virtual table upon which further
operations can be performed; one category of SQL
commands.
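A join can be tried out with Python's built-in `sqlite3` module. The tables, columns, and rows below are invented for illustration; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, last_name TEXT);
CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
INSERT INTO customer VALUES (1, 'Smith'), (2, 'Jones');
INSERT INTO sale VALUES (10, 1, 'widget'), (11, 1, 'gadget'), (12, 2, 'widget');
""")

# Combine the two tables on their shared customer_id field. The result is a
# third, virtual table that exists only for the duration of the query.
joined = conn.execute("""
SELECT c.last_name, s.item
FROM customer AS c
JOIN sale AS s ON s.customer_id = c.customer_id
ORDER BY s.sale_id
""").fetchall()
```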
load balancing
Can mean two things:
1. Distributing load across multiple servers.
2. Within a DBMS, the process of optimizing the ratio of queries
(user-initiated requests for data) to operations needed to maintain and
update the database. In the case of 2, WhamTech VLDB products require almost zero
load balancing.
node Within a tree data structure, the point at
which a branching occurs to form multiple subordinate nodes, or leaves. The
single node from which the branches of a tree grow is called its root. A node is
also called a parent, and the nodes at the ends of its branches are its
children. See also
balanced binary tree.
ODBC
Stands for Open Database Connectivity, a multi-platform DBMS interface built to
execute SQL.
OLAP Acronym
for Online Analytical Processing, describing data mining within a DBMS.
profiling
Real-time tailoring of displays, particularly Web pages, to an identified set of
customer characteristics, such as probable preferences based on demographics.
RAD
Stands for Rapid Application Development; programming tools, including 5GLs,
that greatly reduce the amount of work effort required to generate new program
code.
RDBMS Relational Database Management System. See also
DBMS.
relational Describing tables or files linked to one another through a
relationship on a common field, which could be one-to-one, one-to-many, many-to-one, or
many-to-many. For example, Customer_ID could be used to identify a customer and link to
a sales item that the same customer bought.
relevancy
In DBMS terminology, the degree to which search results meet the requirements or
expectations implicit in the query.
rid list
Post-search process which rejects certain records (rid
list) and/or includes others (result set) according to specific
criteria.
rotation Database tree index maintenance task, requiring
computational overhead, to keep tree indexes in balance;
rebalancing. See also balanced binary tree.
schema
Within a RDBMS, the structure of tables
and fields.
SDAT
Stands for Seismic Data Access Technology.
spider
Or robot, a program that traverses networks, such as the Web, and triggers
document downloads to parse and index content for searching.
SQL
Structured Query Language is a set of commands used to conduct searches and
perform operations on tables within relational databases. SQL exists in many
implementations for many different DBMS platforms, including WhamTech's
products.
table Data structure composed of columns and rows.
When tables are used to represent data records, typically, the first row of
column headings contains field names and each row holds one data record. See
schema.
TCO
Stands for Total Cost of Ownership. WhamTech products enable the lowest TCO of
available alternatives for high-performance VLDB applications, up to 90% less than other
RDBMSs. See Benefits.
TCP/IP
Stands for Transmission Control Protocol / Internet Protocol; the suite of
communication protocols on which the Internet is built.
URL
Stands for Uniform Resource Locator; a standardized name addressing scheme and
syntax for locating files and pages on the Internet. Example:
http://www.whamtech.com/.
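The parts of a URL can be inspected with `urllib.parse` from the Python standard library; the sketch below splits the example address into its components.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.whamtech.com/")

scheme = parts.scheme   # the protocol used to fetch the resource
host = parts.hostname   # the name that DNS resolves to an IP address
path = parts.path       # the location of the page on that host
```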
VLDB
Stands for Very Large Database, typically referring to databases in the hundreds of
gigabytes to terabyte range and many hundreds of millions to billions of records. See
Technologies.
XML
Stands for eXtensible Markup Language; a text-based markup language that, like
Hypertext Markup Language (HTML), the coding scheme used to create Web pages, is
derived from SGML. Unlike HTML, XML lets authors define their own tags to describe
structured data. It is used for exchanging and
presenting data between systems and across networks, among other uses.
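As a small illustration of exchanging data as XML, the sketch below parses a document into plain records with Python's `xml.etree.ElementTree`. The element and attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML payload as it might arrive from another system.
DOCUMENT = """<customers>
  <customer id="1"><last_name>Smith</last_name><state>TX</state></customer>
  <customer id="2"><last_name>Jones</last_name><state>CA</state></customer>
</customers>"""

root = ET.fromstring(DOCUMENT)

# Turn each <customer> element into a dictionary of field values.
records = [
    {
        "id": customer.get("id"),
        "last_name": customer.findtext("last_name"),
        "state": customer.findtext("state"),
    }
    for customer in root.findall("customer")
]
```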