The QRZ! Ham Radio CDROM 
Callsign Database Technical Specification         Rev D October 2003


The following information is provided for developers who wish to write
their own software to directly access the QRZ callsign database files.
Windows programmers including Visual C++ and Visual Basic users should
refer to the help file called QRZDLL.HLP which documents database
access through our custom dynamic link libraries, QRZDLL.DLL (for Win3)
and QRZ32.DLL (for Win95/98/ME/2000/NT).

Users of the QRZ! Ham Radio CDROM who wish to write their own callsign
database search and retrieval software are encouraged to do so.  We
welcome user contributed shareware programs for future versions of the
QRZ! Ham Radio CDROM.


Overview

There are three versions of the QRZ software for the PC, all of
which share a common architecture.  Separate versions are provided
for DOS, Windows 3.1 and Windows 95 and/or Windows NT.  All of the
programs access the database using the same method.

The QRZ callsign database indexing and retrieval method was designed
and optimized for CDROM use.  The primary goal was to provide fast
searches for the most commonly sought after information.  A key to this
strategy is the caching of index information in memory to minimize
reads and, more importantly, seeks from the CDROM drive.  The method
described below implements one such strategy and has been shown to
require only one CDROM head seek per database lookup.


Database Structure

The QRZ callsign database is composed of four separate copies of the
data, and four indices, each of which is sorted by different criteria.
One copy is sorted by callsign, one by last name, one by city/state and
one by zip code.  Due to differences in the way foreign addresses are
represented, many DX countries are not represented in the City/State
and Zip code databases.  These countries are generally searchable by
callsign and/or name only.


Data file	Index file	Database type
-----------------------------------------------------
callbkc.dat	callbkc.idx	Callsign
callbkn.dat	callbkn.idx	Name
callbks.dat	callbks.idx	State and City
callbkz.dat	callbkz.idx	Zip Code


All of the database files are located in the directory \CALLBK on the
CDROM.  Each of the four datafiles (*.dat) is accompanied by a
corresponding index file (*.idx).  The index files contain selected
keys from their corresponding databases which were selected by sampling
the databases at regular file offset intervals.  The sampling intervals
are chosen to produce indices that are no more than 64 Kbytes in length
so that they can each be contained within 64 Kb memory segments under
DOS and Windows 3.x.  The same indices are used in the Win32 environment
despite the fact that there are no 64 Kbyte segment constraints, to
preserve compatibility with Win3 and DOS programs.

The sampling interval for the index keys is subject to change from
one release of the QRZ CDROM to the next and is therefore recorded
as one of the critical operating parameters in the header of each
index file.  This value, referred to as BytesPerKey, must be treated
by your program as dynamic and must be fetched from each of the
index headers at the start of each session.  A different BytesPerKey
value is used for each database.

The index header occupies the first 48 bytes of the *.idx file and
has the following format:

/*
**     Index Header Block Definition (Version 2)
**     (applies to all QRZ CDROMS from Version 2 onward) 
**
**     This block is located at the start of each index file
*/
typedef struct {
  char  DataName[16];    /* Name of the data file            */
  char  BytesPerKey[8];  /* Data Bytes per Index Item        */
  char  NumKeys[8];      /* Number of items in this index    */
  char  KeyLen[8];       /* Length of each key item in bytes */
  char  Version[8];      /* Database Version ID              */
} index_header;

All values in the index header block are stored in ASCII character 
representation.  These characters must be converted (by your program)
into string or integer values as necessary.  Characters are left
justified within each field and unused field characters (if any)
are zero filled.  Your program should not depend on the presence of
the null characters when reading these fields since some values could
legitimately fill the entire field.
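
As an illustration of the conversion described above, here is a minimal
sketch in C.  The index_header layout is the one given in this
document; the index_info structure and the functions copy_field(),
field_to_long() and decode_index_header() are hypothetical names
introduced only for this example.  Each field is copied into a slightly
larger buffer and terminated by the program itself, so the code never
relies on a null character being present in the file.

```c
#include <stdlib.h>
#include <string.h>

/* Header layout as given in this specification (48 bytes, ASCII). */
typedef struct {
  char  DataName[16];
  char  BytesPerKey[8];
  char  NumKeys[8];
  char  KeyLen[8];
  char  Version[8];
} index_header;

/* Hypothetical decoded form, for illustration only. */
typedef struct {
  char  data_name[17];
  long  bytes_per_key;
  long  num_keys;
  long  key_len;
  char  version[9];
} index_info;

/* Copy a fixed-width field and add the terminator ourselves. */
void copy_field(char *dst, const char *src, size_t width)
{
  memcpy(dst, src, width);
  dst[width] = '\0';
}

/* Convert a numeric header field (at most 8 bytes wide) to a long. */
long field_to_long(const char *src, size_t width)
{
  char tmp[9];

  if (width > sizeof tmp - 1)
    width = sizeof tmp - 1;
  copy_field(tmp, src, width);
  return strtol(tmp, NULL, 10);
}

/* Decode every field of the raw header block. */
void decode_index_header(const index_header *hdr, index_info *info)
{
  copy_field(info->data_name, hdr->DataName, sizeof hdr->DataName);
  info->bytes_per_key = field_to_long(hdr->BytesPerKey,
                                      sizeof hdr->BytesPerKey);
  info->num_keys = field_to_long(hdr->NumKeys, sizeof hdr->NumKeys);
  info->key_len  = field_to_long(hdr->KeyLen, sizeof hdr->KeyLen);
  copy_field(info->version, hdr->Version, sizeof hdr->Version);
}
```

A program would typically fread() the first 48 bytes of the *.idx file
into an index_header and then call decode_index_header() once per index.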


Index Data Formats

Some number of keys (noted by the NumKeys field) immediately follow the
index header block in the index file.  All keys in a given index
file will have a width (in bytes) of 'KeyLen'.

The name index (CALLBKN.IDX) uses uniform keys which are set to a
maximum of 'KeyLen' characters per name.  Longer names are simply
truncated.  Names are stored in last-first format with a space between
the two parts.  The city/state index (CALLBKS.IDX) also uses 'KeyLen'
characters per entry with the two character state code occupying the
first two characters and the city name in the last 10 characters.  For
example, the town of Fremont, CA is represented as CAFREMONT in the
index.  Callsigns (in CALLBKC.IDX) each occupy a different 'KeyLen'
width slot (typically 6 characters wide) and zip code indices
(CALLBKZ.IDX) do the same using a typical KeyLen value of 5.  Your
program must always interpret KeyLen, BytesPerKey, and NumKeys and
never make assumptions regarding their sizes.  These sizes could change
in a future edition of the database and your program must be prepared
to deal with it.
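
The following is a minimal sketch of how a program might build a
city/state search key in the layout described above.  The function name
make_city_state_key() is hypothetical.  The sketch assumes that keys
are upper case (as in the CAFREMONT example), that long city names are
truncated to fit, and that unused positions are zero filled; the
key width is taken from the caller rather than hard coded.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
** Hypothetical helper: build a city/state key of key_len bytes.
** The result is zero filled and is NOT null terminated.
** Returns 0 on success, -1 if the state code is not two
** characters long or key_len leaves no room for a city.
*/
int make_city_state_key(const char *state, const char *city,
                        size_t key_len, char *out)
{
  size_t i;

  if (strlen(state) != 2 || key_len < 3)
    return -1;

  memset(out, 0, key_len);
  out[0] = (char)toupper((unsigned char)state[0]);
  out[1] = (char)toupper((unsigned char)state[1]);

  /* Copy as much of the city name as fits after the state code. */
  for (i = 0; city[i] != '\0' && i + 2 < key_len; i++)
    out[2 + i] = (char)toupper((unsigned char)city[i]);

  return 0;
}
```

The key_len argument should come from the KeyLen field of the index
header, never from a constant in the program.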


Using the Index Header Block

The header block describes the field data which immediately follows
it.  The records are tightly packed on 'KeyLen' boundaries without
separators making them ideal candidates for use as memory arrays.
Although unused key fields will be zero filled to the right, there is
no guarantee that any given field will be null terminated.  Because of
this, the indices must always be searched in a random access, fixed
record length format.

A typical program will first search for the system drive that contains
the \CALLBK base directory.  Next, the program will open and load each
of the four indices into four separate 64 Kb memory buffers.  Searching
the indices is then performed by addressing the buffers as one-dimensional
arrays which contain 'NumKeys' elements that are each 'KeyLen' bytes wide.

A search for a particular item starts with the user inputting a desired
key which is then formatted into an index key value.  The program then
uses this key value to locate the closest match in the index table
which is less than or equal to the user supplied key.  For most
machines a simple linear search of the table will be fast enough;
however, a binary search algorithm can also be employed.
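
The index lookup described above can be sketched as a binary search
over the in-memory table.  The function name find_index_slot() is
hypothetical.  The sketch assumes the caller has already formatted the
search key to exactly key_len bytes in the same way the index stores
its keys, and that a plain byte comparison (memcmp) matches the index
ordering.  That assumption does not hold for the callsign index, which
would need the callsign collating sequence described later in this
document instead of memcmp.

```c
#include <stddef.h>
#include <string.h>

/*
** Hypothetical helper: return the ordinal position of the last
** index key that is less than or equal to 'key', or -1 if every
** key in the table is greater than 'key'.
**
** table    - index keys loaded from the file, header excluded
** num_keys - NumKeys from the index header
** key_len  - KeyLen from the index header
** key      - search key, exactly key_len bytes
*/
long find_index_slot(const char *table, long num_keys,
                     size_t key_len, const char *key)
{
  long lo = 0;
  long hi = num_keys - 1;
  long best = -1;

  while (lo <= hi) {
    long mid = lo + (hi - lo) / 2;
    const char *entry = table + (size_t)mid * key_len;

    if (memcmp(entry, key, key_len) <= 0) {
      best = mid;        /* candidate; look for a later one */
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return best;
}
```

A result of -1 means the key sorts before the first index entry; how a
program should treat that case is not stated in this specification.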

After the relevant table key is chosen, its ordinal position from the
start of the table is saved in a variable called KeyOffset.  Next, the
program must multiply the KeyOffset value by the BytesPerKey value
which yields a DataOffset value.  This DataOffset value is then used as
an index into the actual datafile (*.dat).

Typically, a program will use the DataOffset value as an argument to a
File Seek system call ( fseek() ).  Once the file pointer is positioned
at the DataOffset, the program can then begin a linear search for the
desired record in the database.  Again, a binary search between
[DataOffset] and [(KeyOffset+1) * BytesPerKey] can be used; however,
experience has shown this will provide only a minimal improvement in
performance.

Be aware that the derived DataOffset value will usually land you in the
middle of some record.  This is typical and you will find that the
callsign that was pointed to by the index key will be located at the
beginning of the next text line in the file.  The data file is an
ordinary ASCII text file with a single newline (0x0a) character at
the end of each line.

The search of the data file should terminate at offset
[(KeyOffset+1) * BytesPerKey] if the desired record has not yet been found.
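
The seek-and-scan procedure above can be sketched as follows.  The
function name scan_block() is hypothetical, and the sketch makes a few
assumptions: the data file was opened in binary mode ("rb") so that
byte offsets are exact, the caller's line buffer is large enough for
the longest record, and the search key is given exactly as it is stored
in the first field of the record.  To avoid skipping a record that
happens to start exactly at DataOffset, the sketch seeks to
DataOffset-1 and discards input through the next newline.

```c
#include <stdio.h>
#include <string.h>

/*
** Hypothetical helper: scan one index block of the data file.
** Returns 1 and leaves the matching record in 'line' when a
** record whose first field equals 'key' is found, else 0.
*/
int scan_block(FILE *fp, long key_offset, long bytes_per_key,
               const char *key, char *line, size_t line_size)
{
  long start = key_offset * bytes_per_key;        /* DataOffset   */
  long end   = (key_offset + 1) * bytes_per_key;  /* search limit */
  size_t key_len = strlen(key);
  int c;

  if (start > 0) {
    /* Usually lands mid-record: discard the rest of that line. */
    if (fseek(fp, start - 1, SEEK_SET) != 0)
      return 0;
    while ((c = fgetc(fp)) != EOF && c != '\n')
      ;
  } else {
    if (fseek(fp, 0L, SEEK_SET) != 0)
      return 0;
  }

  /* Examine every record that starts before the search limit. */
  while (ftell(fp) < end && fgets(line, (int)line_size, fp) != NULL) {
    if (strncmp(line, key, key_len) == 0 && line[key_len] == ',')
      return 1;
  }
  return 0;
}
```

The key_offset argument is the KeyOffset value obtained from the index
search, and bytes_per_key is the BytesPerKey value from that index's
header.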


Database Format

The database files all have the same format.  They are ASCII files
which consist of one text line per record.  Each record consists of a
fixed number of comma separated fields with blank fields represented by
consecutive commas.  Each line is terminated with a single ASCII
newline ('\n', 0x0a, or chr$(10)) character.

Every record has the same number of commas in it, except for
cross-reference records, which are discussed below.  If the data itself
is supposed to contain a comma, then it is represented in the database
by a semi-colon ';' which should be replaced by a comma in the program's
text output formatting routine.
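
A minimal sketch of a record splitter follows.  The function name
split_record() is hypothetical.  It splits one line in place at each
comma, strips the trailing newline, and converts each semi-colon back
into a comma as described above.  Blank fields come back as empty
strings.

```c
#include <string.h>

/*
** Hypothetical helper: split 'line' in place into fields.
** fields[] receives pointers into 'line'.  Returns the number of
** fields found (at most max_fields).
*/
int split_record(char *line, char *fields[], int max_fields)
{
  char *p = line;
  int n = 0;

  if (max_fields <= 0)
    return 0;

  line[strcspn(line, "\r\n")] = '\0';   /* drop the line ending */
  fields[n++] = p;

  for (; *p != '\0'; p++) {
    if (*p == ',') {
      *p = '\0';                        /* end the current field */
      if (n == max_fields)
        break;                          /* ignore anything extra */
      fields[n++] = p + 1;
    } else if (*p == ';') {
      *p = ',';                         /* stored ';' means ','  */
    }
  }
  return n;
}
```

Because the line is modified in place, the caller should pass a
writable buffer such as the one filled by fgets().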


Here's an example of one record from the database:

AA7BQ ,LLOYD,,FRED L,,53340,90009,00009,8215 E WOOD DR,SCOTTSDALE,AZ,
85260,E,KJ6RK,A

/*
**    Standard Record Field Offsets
*/
#define Callsign        0       /* AA7BQ                          */
#define LastName        1       /* LLOYD                          */
#define EmailCountry    2       /* Email/Country, e.g. .271  (*)  */
#define FirstName       3       /* FRED L                         */
#define JPG             4       /* e.g. .  (*)                    */
#define DateOfBirth     5       /* 53340 = Dec 6, 1953            */
#define EffectiveDate   6       /* 90009 = Jan 9, 1990            */
#define ExpirationDate  7       /* 00009 = Jan 9, 2000            */
#define MailStreet      8       /* 8215 E WOOD DR                 */
#define MailCity        9       /* SCOTTSDALE                     */
#define MailState       10      /* AZ                             */
#define ZipCode         11      /* 85260                          */
#define LicenseClass    12      /* E  (P = TechPlus)              */
#define PreviousCall    13      /* KJ6RK                          */
#define PreviousClass   14      /* A                              */


* The fields Email/Country and JPG are used to indicate various flags
and modifiers to the record.

Email/Country indicates a) whether an email address exists for this
record, and b) the country number of the specified address.  If an
email address exists for this record (i.e. a corresponding address
exists in the netaddr.qrz database), then the FIRST character of this
field will contain a period (.).  Following the period, or first in
this field if no period is present, is the country number that
corresponds to this record.

Country numbers may be directly cross referenced in the ASCII text file
called COUNTRIES.DAT, located in the database directory.
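
The Email/Country rules above can be sketched in a few lines of C.  The
structure email_country and the function parse_email_country() are
hypothetical names.  The sketch assumes the country number is written
in decimal and reports -1 when the field carries no number; this
specification does not say how a missing country is represented.

```c
#include <stdlib.h>

/* Hypothetical decoded form of the Email/Country field. */
typedef struct {
  int  has_email;   /* 1 if the field starts with a period  */
  long country;     /* country number, or -1 if none given  */
} email_country;

email_country parse_email_country(const char *field)
{
  email_country result = { 0, -1 };
  char *end;
  long value;

  if (*field == '.') {          /* leading period: email exists */
    result.has_email = 1;
    field++;
  }

  value = strtol(field, &end, 10);
  if (end != field)             /* at least one digit was read  */
    result.country = value;

  return result;
}
```

For the value .271 shown in the field table, this yields has_email = 1
and country = 271.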

The JPG field indicates whether this callsign has a JPG image on this
CDROM.  If present, the image will be located in the E:\CALLBK\GIFS
directory (assuming E: is the CDROM drive).

Note: The email database netaddr.qrz is proprietary and is not
documented or accessible to user-developed programs.  Access routines
are provided to C++ and Visual Basic users through the QRZ DLLs.


Callsign Collating Sequence

Callsigns in the QRZ databases are stored in a special columnar format
which aids in performing searches.  With this format, the area digit
part of the callsign is always in the same position.  Callsigns are
considered to have a prefix, an area number and a suffix.  Collating
preference is always given in reverse order, that is, suffix followed
by area number followed by prefix.  When callsigns are compared for
sorting and searching, this collating sequence (called 'defcab') is
applied to the callsign, which results in the following logical behavior:


         abcdef    sort order   reason
        --------   ----------   ------
        "KB3A  "      1st       def
        "KB2AB "      2nd       defc
        "K 5AB "      3rd       def
        "KB1ABC"      4th       defc
        "K 4ABC"      5th       defca
        "WA4ABC"      6th       defcab
        "WB4ABC"      7th


The 'reason' lists why each entry deserves its position in the list
above the one below it.

To compare two callsigns for greater than, less than or equality, the
program must first transpose them into 'defcab' format (using spaces
for unused positions) and then do a left-to-right comparison of the two.
For example, to compare K1ABC against KB8AB, the program would do the
following:
                                  defcab
callsign K1ABC is transposed to: "ABC1K "
callsign KB8AB is transposed to: "AB 8KB"

then, a string compare as in:    strcmp("ABC1K ", "AB 8KB")

will return a "greater than" value meaning that K1ABC comes _after_
KB8AB in the database having been found greater at point 'f' in the
defcab sequence.
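
A minimal sketch of the transposition follows.  The function name
callsign_to_defcab() is hypothetical.  The sketch assumes the area
number is the last digit in the callsign, with a one or two character
prefix before it and a one to three character suffix after it.
Callsigns that do not fit that pattern are rejected rather than
guessed at.

```c
#include <ctype.h>
#include <string.h>

/*
** Hypothetical helper: write the 6 character 'defcab' key for
** 'call' into out[], followed by a terminator.  Returns 0 on
** success, -1 if the callsign does not fit the assumed pattern.
*/
int callsign_to_defcab(const char *call, char out[7])
{
  size_t len = strlen(call);
  size_t digit = len;           /* position of the area number */
  size_t prefix_len, suffix_len, i;

  for (i = 0; i < len; i++)
    if (isdigit((unsigned char)call[i]))
      digit = i;                /* keep the last digit seen */
  if (digit == len)
    return -1;                  /* no area number at all */

  prefix_len = digit;
  suffix_len = len - digit - 1;
  if (prefix_len < 1 || prefix_len > 2 ||
      suffix_len < 1 || suffix_len > 3)
    return -1;

  memset(out, ' ', 6);          /* spaces for unused positions */
  out[6] = '\0';

  for (i = 0; i < suffix_len; i++)                  /* d e f */
    out[i] = (char)toupper((unsigned char)call[digit + 1 + i]);
  out[3] = call[digit];                             /* c     */
  for (i = 0; i < prefix_len; i++)                  /* a b   */
    out[4 + i] = (char)toupper((unsigned char)call[i]);

  return 0;
}
```

Two callsigns are then compared by transposing both and passing the
results to strcmp(), exactly as in the example above.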


Date Formats

All dates are stored in 5 character Julian format (YYDDD), e.g. 93003 equals
January 3, 1993 or the 3rd day of 1993.  Dates before 1900 or after
year 2000 must be determined by the context in which they are used.
In other words, if the resultant age does not make sense, then it is
wrong.  For example, all licenses expire in the future so for license
expiration dates 02 must mean 2002.  Birthdays are more difficult to
judge, but most licensees can reasonably be assumed to be more than 10
years old.  This is not a perfect method, but it does yield
satisfactory overall results.
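
The date handling above can be sketched as follows.  All four function
names are hypothetical.  Choosing the century is left to the caller:
year_at_or_after() suits expiration dates (given the earliest year
that makes sense) and year_at_or_before() suits birth dates (given the
latest year that makes sense).  These are only one way to apply the
"does the result make sense" rule.

```c
#include <ctype.h>

/* Split "YYDDD" into a two digit year and a day of year. */
int parse_julian(const char *s, int *yy, int *doy)
{
  int i;

  for (i = 0; i < 5; i++)
    if (!isdigit((unsigned char)s[i]))
      return -1;
  *yy  = (s[0] - '0') * 10 + (s[1] - '0');
  *doy = (s[2] - '0') * 100 + (s[3] - '0') * 10 + (s[4] - '0');
  return 0;
}

/* Earliest year >= floor_year whose last two digits are yy. */
int year_at_or_after(int yy, int floor_year)
{
  int year = floor_year - floor_year % 100 + yy;
  return year < floor_year ? year + 100 : year;
}

/* Latest year <= ceiling_year whose last two digits are yy. */
int year_at_or_before(int yy, int ceiling_year)
{
  int year = ceiling_year - ceiling_year % 100 + yy;
  return year > ceiling_year ? year - 100 : year;
}

/* Convert a day of year into month (1-12) and day (1-31). */
int doy_to_month_day(int year, int doy, int *month, int *day)
{
  static const int days[12] =
    { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
  int leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
  int m;

  if (doy < 1 || doy > 365 + leap)
    return -1;
  for (m = 0; m < 12; m++) {
    int in_month = days[m] + (m == 1 ? leap : 0);
    if (doy <= in_month)
      break;
    doy -= in_month;
  }
  *month = m + 1;
  *day = doy;
  return 0;
}
```

For example, the birth date 53340 with a ceiling year of 1993 resolves
to 1953, day 340, which is December 6.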


Cross Reference Information

When the FCC supplies a "previous callsign" in their database, it
is used by QRZ to construct a cross reference so that a person can
be found by their old call as well as their new one.

A cross reference record is distinguished from other records as one
which contains only one comma.  A cross-reference record takes the
form of "OldCall,NewCall" with no other information on the line.

When a cross reference record is encountered, your program must fetch
the second field and restart the search from the beginning to return
the primary reference.
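
A minimal sketch of cross reference detection follows.  The function
name cross_reference_target() is hypothetical.  It treats any line
with exactly one comma as a cross reference and copies the second
field into a caller supplied buffer.  The sample line in the usage
note is made up for illustration and is not taken from the database.

```c
#include <string.h>

/*
** Hypothetical helper: if 'line' is a cross reference record
** ("OldCall,NewCall"), copy NewCall into new_call and return 1.
** Returns 0 for ordinary records or if NewCall does not fit.
*/
int cross_reference_target(const char *line, char *new_call,
                           size_t size)
{
  const char *comma = strchr(line, ',');
  size_t len;

  /* A cross reference has exactly one comma. */
  if (comma == NULL || strchr(comma + 1, ',') != NULL)
    return 0;

  len = strcspn(comma + 1, "\r\n");   /* stop at the line ending */
  if (len == 0 || len >= size)
    return 0;

  memcpy(new_call, comma + 1, len);
  new_call[len] = '\0';
  return 1;
}
```

Given the line "KJ6RK,AA7BQ", the function returns 1 with new_call set
to "AA7BQ", and the program would restart its search using that value.
A program may also want to cap the number of restarts so that a
damaged file cannot cause an endless loop.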



-------------------------------------------------------------------

Please address programming questions and/or comments to:

flloyd@qrz.com

-------------------------------------------------------------------

Fred Lloyd, AA7BQ   10/23/01