Database Indexing: Key Principles
Database (aka Open System) Indexing
From the Encyclopaedia Britannica:
Database (computer science): also called electronic database, any collection of data, or information, that is specially organized for rapid search and retrieval by a computer. Databases are structured to facilitate the storage, retrieval, modification, and deletion of data in conjunction with various data-processing operations. A database management system (DBMS) extracts information from the database in response to queries.
Unlike back-of-book indexes, which are generally stand-alone works with static, closed vocabularies reflecting the jargon used in the book, database indexes typically link many different kinds of publications on a variety of topics by different authors, and the number of included resources typically grows over time.
For example, the ProQuest platform provides many databases that together cover just about every academic field. New journal articles, dissertations, theses, and other documents are constantly being added to those databases.
What makes all that content searchable is the indexing.
The Importance of Being Consistent
The most important thing in database indexing is consistency through time and across documents, so that a user searching for a specific concept within a database retrieves all the relevant results.
A good thesaurus guides the indexer in using the most accurate and appropriate term for a concept, consistently over time and across subject fields. This helps users know what to expect when searching and find what they’re looking for more efficiently.
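The role of the thesaurus described above can be illustrated with a minimal Python sketch. It assumes a tiny, made-up set of "use" references (non-preferred entry terms pointing to one preferred term); the terms, the `USE_REFERENCES` table, and the `preferred_term` function are hypothetical and are not drawn from any real thesaurus or indexing tool.

```python
from typing import Optional

# Hypothetical "use" references: each non-preferred entry term points to
# the single preferred term an indexer should assign.
USE_REFERENCES = {
    "heart attack": "myocardial infarction",
    "cardiac infarction": "myocardial infarction",
    "teenagers": "adolescents",
    "teens": "adolescents",
}

# Every target of a "use" reference is itself a valid preferred term.
PREFERRED_TERMS = set(USE_REFERENCES.values())


def preferred_term(candidate: str) -> Optional[str]:
    """Return the preferred term for a candidate, or None if it is not in the thesaurus."""
    # Normalize case and collapse repeated whitespace before the lookup.
    key = " ".join(candidate.lower().split())
    if key in PREFERRED_TERMS:
        return key
    return USE_REFERENCES.get(key)


if __name__ == "__main__":
    for term in ["Heart attack", "teens", "Adolescents", "cardiology"]:
        print(term, "->", preferred_term(term))
```

Because every variant resolves to the same preferred term, documents indexed years apart or by different indexers end up under one heading. A term that returns `None` signals a concept the thesaurus does not yet cover, which in practice is a decision for the thesaurus maintainer rather than the individual indexer.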
Principles & Practices of Database Indexing
ANSI/NISO Z39.4-2021 Criteria for Indexes
“This standard provides guidelines for the content, organization, and presentation of indexes used for the retrieval of documents and parts of documents. It deals with the principles of indexing regardless of the type of material indexed, the indexing method used, the medium of the index, or the method of presentation for searching. It emphasizes three processes essential for all indexes: comprehensive design, vocabulary management, and syntax.”
The Summary of Key Considerations section, on page 11, states, “The key consideration for databases and other continuing indexes is continuity in indexing practices, policies, and terminology.” [Emphasis added]
The Value of Controlled Vocabularies
Controlled vocabularies allow for:
consistency across time;
consistency among document sets that use very different vocabulary or have very different treatment of similar topics.
One example of the need for consistency among document sets with different vocabularies is this database.
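To make the point about differing vocabularies concrete, here is a small Python sketch comparing a plain word search with a lookup on assigned descriptors. The records, their descriptors, and both function names are invented for illustration and do not come from any actual database.

```python
import re
from collections import defaultdict

# Hypothetical records from two document sets that describe the same
# topic in different words; "descriptors" are controlled-vocabulary terms
# assigned by an indexer.
RECORDS = [
    {"id": "news-1",
     "text": "Teens are spending more time online",
     "descriptors": ["adolescents", "internet use"]},
    {"id": "journal-1",
     "text": "Internet use among adolescents: a cohort study",
     "descriptors": ["adolescents", "internet use"]},
    {"id": "journal-2",
     "text": "Sleep patterns in young adults",
     "descriptors": ["young adults", "sleep"]},
]


def free_text_search(records, word):
    """Return ids of records whose text contains the exact word."""
    word = word.lower()
    return {r["id"] for r in records
            if word in re.findall(r"[a-z]+", r["text"].lower())}


def build_descriptor_index(records):
    """Map each descriptor to the set of record ids indexed under it."""
    index = defaultdict(set)
    for record in records:
        for descriptor in record["descriptors"]:
            index[descriptor].add(record["id"])
    return index


if __name__ == "__main__":
    print(sorted(free_text_search(RECORDS, "adolescents")))
    print(sorted(build_descriptor_index(RECORDS)["adolescents"]))
```

In this toy data the word search finds only the journal article, because the news item says "teens", while the descriptor lookup returns both records. That difference is the practical payoff of indexing both document sets with the same controlled term.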
More on Thesauruses & Thesaurus Building
Blog post on thesauruses, how they’re useful to indexers, and a list of some examples.
How Do I Build a Thesaurus? | American Society for Indexing (asindexing.org)
Thesaurus Management Software | American Society for Indexing (asindexing.org)
Online Thesauri and Authority Files | American Society for Indexing (asindexing.org)
More on Database Indexing
Database Indexing | American Society for Indexing (asindexing.org)
Earle, Ralph, Robert Berry, and Michelle Corbin Nichols. “Indexing Online Information.” Technical Communication 43, no. 2 (1996): 146–56. http://www.jstor.org/stable/43088034. [on JSTOR]
Rieder, Bernhard. “From Universal Classification to a Postcoordinated Universe.” In Engines of Order: A Mechanology of Algorithmic Techniques, 145–98. Amsterdam University Press, 2020. https://doi.org/10.2307/j.ctv12sdvf1.8.
This book chapter presents a more abstract and philosophical discussion of the organization of information than the other sources linked here.
Readers may want to start with the section “Coordinate Indexing” (p. 170), followed by “Postcoordination” (p. 178) and “The Relational Database Model” (p. 184).
Another potentially interesting chapter from the same book is “Engines of Order” (pp. 25–50), including the sections “Computerization” (p. 34) and “Information Overload” (p. 37).