Fermat's Library | A new method of recording and searching information annotated/explained version. (2024)

### Intro

In 1953, Hans Peter Luhn, a researcher at IBM, published...

The term "legend" refers to a set of descriptive information or met...

Luhn is proposing a way to visualize the relationship between terms...

### Conventional Indexing Approach

Imagine a comprehensive researc...

The 75 million patterns come from a combinatorial calculation (s al...

Luhn addresses a fundamental challenge in information retrieval: ba...

A specialized dictionary serves as a crucial tool in Luhn's informa...

**Calculating the Probability Factors:**

- For the scenario wher...

A NEW METHOD OF RECORDING AND SEARCHING INFORMATION

H. P. LUHN*

This method applies to the procedures required to record a legend concerning a document and to enable an inquirer to locate this document by means of the legend, if it is related to a specified subject.

The conventional methods of indexing and classifying attempt to evaluate the relative importance of a plurality of aspects contained in a document and make the most important one the key for locating the document within an orderly scale of a certain dimension. Subordinated aspects are covered by way of reference in appropriate other locations of the scale.

One of the disadvantages of the conventional system is that the standard of value on which the indexer bases his decision may change and what suddenly is considered an aspect of major significance may not have been included in the classification or index at the time, even though it was contained in a document.

Another drawback is that it becomes difficult for an inquirer to reverse the process of classification or indexing and pose his query in a form matching to a reasonable degree the values of a potential reference.

The new method uses the principle of characterizing a topic by a set of identifying elements or criteria. These elements may be of any dimension and as many may be recorded as is desirable. Also, they are not weighted and no significance need be implied by the order in which they are given.

One of the main functions of the new method is that of producing a response to an inquiry in all cases, even if the reference appears to be remote, it being the understanding that it is the closest available.

The elements enumerated by recorders to identify a topic will necessarily vary as no two recorders will view a topic in identical fashion. Similarly, no two inquirers, when referring to the same subject, will state their query in identical fashion. It is therefore important that a system recognizes that these variations arise and that they cannot be controlled. It must then become the function of the system to overcome these variations to a reasonable degree.

When identifying a topic by a set of criteria or identifying terms, the more terms are stated the more specifically the topic is delineated. Each term in turn may be a concept which in itself may vary as to specificity. If we consider a concept as being a field in a multi-dimensional array, we may then visualize a topic as being located in that space which is common to all the concept fields stated. It may further be visualized that related topics are located more or less adjacent to each other depending on the degree of similarity and that this is so because they agree in some of the identifying terms and therefore share some of the concept fields. Figure 1 is a diagrammatic illustration.

FIGURE 1. Diagram with the labels "The Topic, Identified" and "Other Topics".

In order to understand the nature of the arrangement, let us assume a vocabulary of 100 concepts and let us identify a topic by five conceptual terms. By using all possible combinations of five terms, a total of 75 million patterns of criteria result, each of these patterns having a fixed location within the system. If then a topic is identified by five terms of the vocabulary, it is thereby assigned to a definite one of these fixed locations.
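The figure of 75 million can be checked directly. The following is a minimal sketch, assuming that "all possible combinations of five terms" means unordered selections of 5 distinct concepts from the 100-concept vocabulary; the variable names are illustrative and do not come from the paper.

```python
from math import comb

VOCABULARY_SIZE = 100  # concepts in the assumed vocabulary
TERMS_PER_TOPIC = 5    # conceptual terms used to identify one topic

# Number of unordered selections of 5 distinct concepts out of 100.
patterns = comb(VOCABULARY_SIZE, TERMS_PER_TOPIC)
print(f"{patterns:,}")  # 75,287,520
```

Under this reading the exact count is 75,287,520, which the text rounds to 75 million.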

While assuming that there is an ideal and true location where a topic belongs, it is un-

*International Business Machines Corporation, Engineering Laboratory, Poughkeepsie, New York.


were to do the same job. There will result a deviation from the true location proportionate to the degree of disagreement of either. For instance one recorder may diverge to the extent of matching only 3 of the 5 criteria while the other matches 4. The resultant displacement is shown in diagram Figure 2.

FIGURE 2. Diagram marking the positions 3/5, 4/5, and 5/5 (Ideal location).

Such disagreements will be the more pronounced the more specific the conceptual terms are and it is a further function of the new method to minimize variations by broadening the concept used in the terms and by using as large a number of broadened criteria as possible even to the extent of redundancy. This approach is based on the realization that an inquirer could not match excessive specificity when stating his query and that his position is similar to that of the recorder.

The process of broadening the concept involves the compilation of a dictionary wherein key terms of desired broadness may be found to replace unduly specific terms, the latter being treated as synonyms of a higher order than ordinarily considered. Translating criteria into these key terms is a process of normalization which will eliminate many disagreements in the choice of specific terms amongst recorders, amongst inquirers, and amongst the two groups, by merging the terms at issue into a single key term. However the dictionary does not classify or index but maintains the idea of terms being fields and applies the identification principle to the terms in the manner it is applied to the topics, even though to a lesser degree. A specific term may appear under the heading of several key terms and if according to its application an overlapping of concepts exists then the term is represented by the several key terms involved, as shown diagrammatically in Figure 3 for ‘b’.
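The normalization step described above can be sketched in a few lines. This is an illustration only: the dictionary entries, the `KEY_TERMS` name, and the choice to pass unknown terms through unchanged are all assumptions made for the example, not details given in the paper.

```python
# Hypothetical dictionary: each specific term lists every key term (field)
# under whose heading it appears. The entries are invented for illustration.
KEY_TERMS = {
    "sedan": {"vehicle"},
    "truck": {"vehicle"},
    "ferry": {"vehicle", "ship"},  # overlapping concepts, like 'b' in Figure 3
    "tanker": {"ship"},
}

def normalize(terms):
    """Replace specific terms by their key terms.

    Terms missing from the dictionary pass through unchanged (an assumption).
    """
    normalized = set()
    for term in terms:
        key = term.lower()
        normalized |= KEY_TERMS.get(key, {key})
    return normalized

recorder = normalize(["sedan", "ferry"])
inquirer = normalize(["truck", "tanker"])
print(sorted(recorder))  # ['ship', 'vehicle']
print(sorted(inquirer))  # ['ship', 'vehicle']
```

The recorder and the inquirer chose entirely different specific terms, yet both statements normalize to the same pair of key terms, which is the kind of disagreement the dictionary is meant to absorb.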

The manner in which an inquirer approaches the process of searching for desired information becomes one similar to that performed by the recorder. He first states his query in as many and as specific terms as he desires. Then with the aid of the special dictionary he normalizes the conceptual terms of identification to arrive at a statement adjusted to the requirements of the system.

The actual process of searching involves the comparing of his statements with all the statements contained in the collection of records prepared by the recorder. This task, being beyond human capability, may be performed automatically by a scanning machine which is capable of not only matching similar portions of information but of doing this in accordance with any conceivable pattern of conditions.

As indicated earlier, the intended purpose of a search is to produce a response to a query. Because it is not usually known how specific a response can be expected, the initial query is stated rather broadly, thereby extending the field to include less related material. The extent of responses obtained on this basis is a valuable indication of the amount of attention devoted to the subject area in the past. The material obtained would then be subjected to increasingly more specific searches in order to get the closest match possible. Also, material uncovered by this approach may lead to the discovery of unsuspected, but pertinent other related information.

FIGURE 3. Diagram with the labels "Key term Field A", "Key term Field B", and "Specific Terms".

In particular, the scheme of broadening the field of response consists of asking that a fixed fraction of the given terms be met by the records. This procedure is quite different from that used when broadening a generic search by dropping subclasses. The effect is illustrated by the following diagrams, Figure 4, showing progressively broader fields formed by 5 terms.
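The fixed-fraction scheme can be sketched as a scan over all records that counts shared terms. This is a minimal illustration, not the scanning machine's actual logic; the function name `search`, the record identifiers, and the single-letter terms are hypothetical.

```python
def search(query_terms, records, min_matches):
    """Return (matches, doc_id) pairs for records sharing at least
    `min_matches` terms with the query, closest matches first."""
    query = set(query_terms)
    hits = []
    for doc_id, legend in records.items():
        matches = len(query & set(legend))
        if matches >= min_matches:
            hits.append((matches, doc_id))
    # Sort by descending match count, then by identifier for stable output.
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))

# Hypothetical collection: each legend is a set of five normalized terms.
records = {
    "doc-1": {"a", "b", "c", "d", "e"},
    "doc-2": {"a", "b", "c", "d", "x"},
    "doc-3": {"a", "b", "y", "z", "w"},
    "doc-4": {"p", "q", "r", "s", "t"},
}
query = ["a", "b", "c", "d", "e"]

print(search(query, records, 5))  # [(5, 'doc-1')]
print(search(query, records, 4))  # [(5, 'doc-1'), (4, 'doc-2')]
print(search(query, records, 2))  # [(5, 'doc-1'), (4, 'doc-2'), (2, 'doc-3')]
```

Lowering the required number of matches from 5 to 2 widens the field step by step without discarding any query term, in contrast to broadening a generic search by dropping subclasses.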

AMERICAN DOCUMENTATION

Using the proportions of the example previously given and assuming an evenly distributed population of topics, the relative probability of response is expressed by the factors listed below each fraction. While applied to an idealized situation, the results are nevertheless indicative of the advantages the method of identification has over other methods of indexing information.

| Fraction of terms met | 5/5 | 4/5 | 3/5 | 2/5 | 1/5 |
|---|---|---|---|---|---|
| Probability factor of response | 1 | 96 | 4,656 | 152,096 | 3,764,376 |

FIGURE 4.
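The factors in Figure 4 can be reproduced numerically. One reading that yields exactly the listed values, offered here as an interpretation rather than as Luhn's stated derivation, is the following: with a 100-concept vocabulary and 5-term patterns, requiring k particular terms to be present leaves 5 − k positions to be filled from the remaining 100 − k concepts, which gives C(100 − k, 5 − k) patterns. The function name `response_factor` is illustrative.

```python
from math import comb

VOCABULARY_SIZE = 100  # concepts in the vocabulary, as in the earlier example
TERMS_PER_TOPIC = 5    # terms identifying one topic

def response_factor(required):
    """Count 5-term patterns that contain `required` fixed terms.

    Assumed interpretation: the remaining positions are filled freely
    from the concepts that are not among the required ones.
    """
    free_positions = TERMS_PER_TOPIC - required
    return comb(VOCABULARY_SIZE - required, free_positions)

for k in range(TERMS_PER_TOPIC, 0, -1):
    print(f"{k}/5: {response_factor(k):,}")
# 5/5: 1
# 4/5: 96
# 3/5: 4,656
# 2/5: 152,096
# 1/5: 3,764,376
```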

THE USE OF THE UNIVAC FAC-TRONIC SYSTEM IN THE LIBRARY REFERENCE FIELD

HERBERT F. MITCHELL, JR.*

The tremendous increase in the volume of technical literature of all kinds and fields is presenting the librarian with an almost impossible reference task. The sheer volume of these documents is creating a filing problem of the first magnitude. When this volume is combined with the fact that many documents cut across classification lines, the problem of providing reference bibliographies is made that much more difficult.

Several persons concerned with the furnishing of reference material have approached those of us engaged in the manufacture and utilization of digital computers to see if these machines might be of assistance to the librarian. Such an occasion arose a little over a year ago when the Centralized Air Document Office in Dayton, Ohio, approached Remington Rand to ascertain the suitability of our equipment for this work.

A study was made to see how the UNIVAC Fac-Tronic System might be applied to the task of obtaining all possible documents from a large file which could answer a specific query submitted to this office. The model studied envisioned a library of 1,000,000 documents. Each document was identified by an eight-digit shelf number. A master reference file was to be compiled, each item of which would consist of the shelf number followed by a series of coded approaches. Each such approach would represent some pertinent feature of the document, such as: author, data, contract number, and descriptors of the subject or subjects treated by the document. It was anticipated that each document would have an average of fifteen approaches with a maximum of thirty.

In order to obtain a list of all documents which might possibly answer a given query, the computer would be supplied with the appropriate coded approaches included in the query. It would then search through the entire master file and select all document items which contain the approaches given in the query.
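The master-file search just described can be sketched as a linear scan. This is an illustration under stated assumptions: the approach codes such as `A17` are invented, the `Item` record layout is hypothetical, and "contain the approaches given in the query" is read as requiring every queried approach to be present.

```python
from typing import NamedTuple

class Item(NamedTuple):
    shelf_number: str      # eight-digit shelf number
    approaches: frozenset  # coded approaches (author, contract number, descriptors, ...)

# Hypothetical master reference file; the codes are made up for the example.
MASTER_FILE = [
    Item("00000001", frozenset({"A17", "D03", "S220"})),
    Item("00000002", frozenset({"A17", "S220", "S415"})),
    Item("00000003", frozenset({"A42", "S415"})),
]

def select(master_file, query_approaches):
    """Scan the whole file and return the shelf numbers of items whose
    approaches include every approach in the query (assumed reading)."""
    wanted = frozenset(query_approaches)
    return [item.shelf_number for item in master_file if wanted <= item.approaches]

print(select(MASTER_FILE, ["A17", "S220"]))  # ['00000001', '00000002']
print(select(MASTER_FILE, ["S415"]))         # ['00000002', '00000003']
```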

For such a system as the above to be work-

*Director, UNIVAC Applications Department, Remington Rand, Inc.
