Information (September 1952)
This article is the last in Scientific American’s series on automatic control. It covers information theory and processing. It has some great tidbits, such as a primitive tagging system for books by Vannevar Bush that used binary-coded descriptors on microfilm. I’d also have to say the author deserves to gloat over this quote: “It is almost certain that “bit” will become common parlance in the field of information, as “horsepower” is in the motor field.”
- Part 1 – Automatic Control
- Part 2 – Feedback
- Part 3 – The Role of the Computer
- Part 4 – Automatic Machine Tools
- Part 5 – Information
The surprising discovery that it is subject to the same statistical treatment as heat facilitates its storage and handling in automatic control systems
by Gilbert W. King
THE “lifeblood” of automatic control is information. To receive and act on information is the essential function of every control system, from the simplest to the most complex. It follows that to understand and apply automatic control successfully we must understand the nature of information itself. This is not as simple as it may seem. Information, and the communication of it, is a rather subtle affair, and we are only beginning to approach an exact understanding of its elusive attributes.
Think of a thermocouple that records the temperature of a furnace. The instrument translates the temperature into a voltage. This information seems straightforward enough. But as soon as we put it into a practical feedback loop to control the furnace temperature, we discover that the voltage signal is not a “pure” translation; it is contaminated by the heat due to random motion of the electrons in the thermocouple. The contamination is known as “noise.” If we want to control the furnace temperature within a very small fraction of a degree, this noise may be sufficient to defeat our aim. In any case, the situation illustrates a fundamental property of information: in any physical system, it is never available without some noise or error.
Information can take a great variety of forms. In a thermocouple the voltage is a continuous signal, varying as the temperature varies. But information may also be conveyed discontinuously, as in the case of a thermostat, which either does or does not make an electrical contact. It gives one of two distinct signals: “on” or “off.” The signals used in control may be numbers. The financial structure of the country is to a large extent controlled automatically (but not, as yet, mechanically) by the messages sent on ticker tape to hundreds of brokers, whose reactions affect the capital structure. Railway traffic is controlled by means of information transmitted on a teletype tape. Automatic control may include the human mind in the feedback loop, and in that case information takes the form of language messages, which may control the actions of people and nations. All literature, scientific or otherwise, represents messages from the past, and a literature search is a form of feedback loop which controls further thought and action.
During the past decade mathematicians have discovered with surprise and pleasure that information can be subjected to scientific treatment. Indeed, it meets one of the strictest requirements: it can be measured precisely. Information has been found to have as definite a meaning as a thermodynamic function, the nonpareil of all scientific quantities. It has properties like entropy. Recently the mathematical physicist L. N. Brillouin has shown that information is, in fact, negative entropy. For the moment, however, we shall merely state that information is something contained in a message which may consist of discrete digits and letters or of a varying but continuous signal. Signals convey information only when they consist of a sequence of symbols or values that change in a way not predictable by the receiver.
HUMAN BEINGS have developed a number of systems, using sets of discrete symbols, for communication. These can be analyzed in quantitative terms. To keep track of a bank account of less than $1,000.00, for example, requires five “places” in a counting machine: units, tens and hundreds for dollars, units and tens for cents. Experience seems to have shown that 60 letters or places (12 words) are sufficient for most telegraph messages. In the decimal system we do all our counting with 10 digits; in the English language we use 26 letters. And there are many other sets of symbols, such as the dots and dashes of the Morse code, and so on. In a control system, however, such sets of symbols may be too complex and cumbersome. The simplest type of components that we can use in a control loop is the kind of device (e.g., a relay or thyratron tube) that can assume only two states: “on” or “off.” This means that it is most convenient to express message symbols on a binary scale, which has only two symbols: 0 and 1. Communication consists essentially in the progressive elimination and narrowing of the totality of all possible messages down to the one message it is desired to convey. If we visualize a recipient looking at a teletype awaiting the next symbol, we appreciate that each symbol reduces the number of possible messages by a factor proportional to the number of different symbols that might be sent. In the binary notation each symbol represents a simple choice between just two possibilities, and this has many advantages for expressing information.
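The narrowing described above can be quantified: a symbol drawn from an alphabet of n equally likely possibilities carries log2(n) units of information. A minimal sketch in modern Python (the function name is ours, not the article’s):

```python
import math

def bits_per_symbol(alphabet_size):
    # A symbol from an alphabet of n equally likely choices narrows
    # the set of possible messages by a factor of n, so it carries
    # log2(n) bits of information.
    return math.log2(alphabet_size)

# A binary digit carries exactly 1 bit; a decimal digit about 3.32;
# a letter of the 26-letter alphabet about 4.7.
```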
In a message consisting of binary digits, each digit conveys a unit of information. From “binary digit” the mathematicians John Tukey and Claude Shannon have coined the portmanteau word “bit” as the name of such a unit of information. It is almost certain that “bit” will become common parlance in the field of information, as “horsepower” is in the motor field.
The number of bits in a message is a measure of the amount of information sent. This tells us exactly how much we are learning, and how much equipment is needed to handle the messages expected. Take as an example the recent suggestion that the contents of books be broadcast by television from a central library, thus doing away with the need for regional libraries. It takes seven bits to identify one letter or other character; on the average there are five letters in a word and 300 words on a page. Thus it would take only about 10,000 bits to transmit each page as a coded message. To televise a page, however, would require a great many more bits than that. In order to make the page legible, the screen would have to carry at least 250,000 black or white spots (corresponding to 500 lines vertically and horizontally). The image would have to be repeated 300 times, to allow the reader 10 seconds to read a page. Hence the required number of bits would be 75 million (300 X 250,000) instead of 10,000. Since an increase in the amount of information sent requires an increase in the bandwidth of the broadcasting channel, it is clear that the televising of books is not an efficient method.
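The book-televising comparison reduces to straightforward arithmetic. Here is the same calculation in Python, using the article’s figures:

```python
# Coded transmission: 7 bits per character, 5 letters per word,
# 300 words per page (the article's figures).
coded_bits = 7 * 5 * 300              # about 10,000 bits per page

# Televised transmission: 500 x 500 black-or-white spots per frame,
# the frame repeated 300 times to give the reader 10 seconds per page.
televised_bits = 250_000 * 300        # 75 million bits per page

ratio = televised_bits / coded_bits   # televising costs thousands of
                                      # times more bits per page
```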
CAN information in the form of a continuously changing voltage be of the same nature and be measured in the same units as numbers from a counting machine or words in a communication network? At first sight this does not seem possible, for it has long been considered axiomatic that a record of a continuous variable contains an infinite amount of significant information. Actually that is not so, for the reason that no physical measurement can resolve all of the information. The resolving power of a microscope, for example, is determined by its aperture, which is finite and therefore sets a limit on the fineness of discrimination. This theorem can be generalized to all instruments. Now Shannon, in his famous theorem on communication theory, has shown that when such a limitation exists, one can collect all the available information in a continuous signal by sampling it at certain finite intervals of time. Conversely, it can be proved that the continuous signal can be exactly reconstructed from the finite points, provided, of course, they are taken at the required frequency, determined by the aperture. A series of numbers, or of amplitudes sampled periodically, will completely specify the signal. Hence a message of this kind can be expressed as a series of binary digits.
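Shannon’s sampling result can be made concrete. The sketch below, in modern Python, reconstructs a band-limited signal from its periodic samples by sinc (Whittaker-Shannon) interpolation; at the sample instants it returns the sampled values exactly.

```python
import math

def reconstruct(samples, dt, t):
    # Whittaker-Shannon interpolation: rebuild a band-limited signal
    # at time t from samples taken every dt seconds, provided the
    # sampling rate met the Nyquist requirement set by the bandwidth.
    total = 0.0
    for n, value in enumerate(samples):
        x = (t - n * dt) / dt
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        total += value * sinc
    return total
```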
So far the most extensive application of the principle has been the recording of infrared spectra in digital form on punched cards. With the information in this form, a spectrum can be read and interpreted by means of automatic computing machinery, much more rapidly than one could read the conventional graphic record by visual inspection. The diagrams on page 134 illustrate a simple case. They show a conventional recording of the infrared spectrum of solid cyclopropane, and the same spectrum as recorded in digital form on punched cards on the basis of 500 sample readings. The recording, however, includes the spectrum of water vapor in the air within the spectroscope, which must be subtracted to get the true spectrum of the compound under study. The subtraction would be very tedious to do by hand on the graphs, but it is easily done by a computer from the punched cards. It reveals that the infrared spectrum of the compound has two distinct peaks. By certain numerical treatments, suggested by communication theory, we can reduce “noise” in the record and smooth the curve to show the peaks more clearly. In short, the computing machine extracts more information from the record than we could otherwise obtain.
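Once the spectrum is reduced to sampled numbers, the background subtraction the author describes is a single pass over the data. A sketch with illustrative values (not the article’s actual readings):

```python
def subtract_background(recorded, background):
    # Point-by-point subtraction of the water-vapour background
    # from the recorded spectrum: tedious on a graph, trivial once
    # both records are sequences of sampled numbers.
    return [r - b for r, b in zip(recorded, background)]
```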
SO FAR we have used “message” and “information” interchangeably, but there is a distinction between them. The information content of the signals is reduced by the noise that comes with the message. The central problem of information theory, now undergoing investigation, is to determine the best methods of extracting the sender’s message from the received signal, which includes noise. A magnetic storm can garble the telegram “I love you” into “I hate you.” In fact, there is absolutely no way of being certain of transmitting a given message. Nothing is certain except chance.
One method of reducing the probability of error is to repeat the message. This does not improve the reliability very much and is expensive in bandwidth. The amount of “snow” in a television picture, for instance, could be halved by repeating each picture four times in the same interval of time, but this would demand four times the channel width. And the band available for television channels is limited and valuable. On the other hand, to get more information through a given bandwidth usually requires more hardware in the transmitter and the receiver, the amount increasing exponentially.
A more economical procedure for reducing the probability of error is to use redundancy. For instance, the message could be sent as “I love you, darling.” This increases the chances of correct reception of the meaning without requiring as much extra time or bandwidth as mere repetition of the message would.
One of the cleanest examples of automatic control is the solution of a mathematical problem by such a procedure on a computing machine. A computing machine is a communications network in which messages (numbers) are sent from one part to another. The reduction of errors, naturally, is most important. However carefully the machine is constructed, errors inevitably creep in. Now usually there is no redundancy: the number 137 means one thing, and 138 distinctly another. But we can add redundancy, say by the method of carrying along the digit left after casting out nines, and can test these extra digits after each arithmetic operation. This requires more equipment (or bandwidth in a general sense), but a 20 per cent increase is sufficient to handle enough redundancy to reduce the possibility of overlooking an error to one part in 100 million.
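The casting-out-nines check rests on the fact that the mod-9 residue of a number is preserved by addition. A minimal sketch:

```python
def verify_sum(a, b, claimed_sum):
    # Casting out nines: addition preserves residues mod 9, so the
    # claimed sum's residue must equal the sum of the operands'
    # residues. (An error that shifts the result by an exact
    # multiple of 9 slips through; in practice the check catches
    # the overwhelming majority of machine faults.)
    return claimed_sum % 9 == (a % 9 + b % 9) % 9
```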
The classic device for reducing extraneous noise in ordinary signals is a filter. For example, by cutting out high frequencies in a radio signal we can eliminate the high-pitched hissing components of noise without loss of message content, for the original message seldom contains such high frequencies. But to reduce the noise within the frequency range of the message itself is more difficult. Noise is universal and insidious, and elaborate devices are needed to overcome it. A wide variety of approaches, collectively called “filter theory,” has been considered. The most fascinating is in the direction of suitable coding of messages with the aid of computers.
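The simplest digital analogue of such a filter is a moving average, which attenuates rapid fluctuations while passing the slowly varying message. A sketch (window size and names are ours):

```python
def low_pass(signal, window=3):
    # Moving-average filter: each output point averages its
    # neighbours, attenuating rapid (high-frequency) fluctuations
    # while passing the slowly varying message.
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed
```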
LET US examine a simplified illustration based on the problem of using radar for “Identification of Friend or Foe.” We can send out a radar pulse of a specific pattern, which a converter in a friendly plane will change to another pattern but which an enemy plane will reflect unchanged. At long range, however, noise may confuse the pattern so that we cannot tell friend from foe. The question is: What is the most suitable pulse shape, and to what shape should it be converted, to give the smallest chance of making a mistake? Let us assume that the pulse wave can be above or below or at the zero level. We can therefore express the information in a ternary (instead of binary) notation of three digits: −1, 0, 1. We shall also assume (as often happens in practice) that a noise pulse of 1, added to a signal pulse of 1, gives 1 in our limited detection equipment. Now it is easy to see that if we merely used a positive pulse for “friend” and a negative pulse for “foe,” one would frequently be converted into the other when the noise-to-signal ratio was high. Let us then introduce some redundancy by using a double pulse.
There are nine possible signals. They can be represented as vectors in two dimensions (see diagram on page 146). Now of the nine vectors we need only two for our message. Which two should be chosen to give the least chance of error due to noise? It turns out that the best choice is the pair of vectors directly opposite to each other, because the noise patterns required to convert one to the other would in these cases be the least frequent.
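The search over the nine double-pulse vectors can be carried out mechanically. The sketch below enumerates them and picks the pair farthest apart, which turns out to be a pair of opposites, just as the article says:

```python
from itertools import product

# The nine possible double pulses, each component -1, 0 or +1.
signals = list(product((-1, 0, 1), repeat=2))

def distance_sq(u, v):
    # Squared Euclidean distance between two signal vectors.
    return sum((a - b) ** 2 for a, b in zip(u, v))

# The best choice for "friend" and "foe" is the pair farthest apart,
# since the noise needed to turn one into the other is least likely.
best_pair = max(
    ((u, v) for u in signals for v in signals if u < v),
    key=lambda pair: distance_sq(*pair),
)
```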
Messages as a rule can be mapped with a great number of dimensions. And in such cases it proves to be feasible to select the vectors entirely at random rather than by definite rules (which are too complicated to work out). Now a random signal, by definition, is noise. In other words, we have the paradox that the best way to encode a message is to send it as a typical noise pattern. The selected noise patterns correspond to an ambassador’s code book. The patterns can be decoded mechanically at high speed in a computing machine. Methods of this type seem to be the ultimate in maximizing the rate of transmittal of information.
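The random-code-book idea can be sketched directly: assign each message a random pattern, then decode by finding the pattern best correlated with what was received. All names and parameters below are ours, for illustration only:

```python
import random

def make_codebook(n_messages, length, seed=1952):
    # Assign each message a random +/-1 pattern -- "typical noise" --
    # playing the role of the ambassador's code book.
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(length)]
            for _ in range(n_messages)]

def decode(received, codebook):
    # Decode to the code word best correlated with the received signal.
    def correlation(word):
        return sum(r * w for r, w in zip(received, word))
    return max(range(len(codebook)), key=lambda i: correlation(codebook[i]))
```

Because the random patterns are almost certainly far apart, a few symbols flipped by noise still leave the correct code word as the best match.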
Every system of communication presupposes, of course, that the sender and the receiver have agreed upon a certain set of possible messages, called “message space.” In the Western Union system this message space consists of all possible strings of English words of reasonable length, but it does not permit foreign words. Wall Street has a more restricted set of messages. A reader of ticker tape could expect the message “Ethyl 24-1/4” but would be taken aback by “Ethel pregnant.”
Messages used in technological applications of automatic control also are restricted to a definite space. For example, a thermocouple used to control a furnace measures temperatures only within certain limits, and if the message came through as “one million degrees,” one could legitimately expect the whole feedback system to throw in its hand. If information is to be used for automatic control, the message space must be defined, and safeguards such as fuses or switches must be provided to eliminate all messages outside the established message space. This prevents the control from going wild. For instance, in a given process certain quantities may be known to be never negative, and the control program must provide that if a test does show a minus sign, the process is stopped or repeated.
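The safeguard the paragraph describes amounts to a range check on every reading before it is allowed to drive the loop. A minimal sketch (limits are illustrative):

```python
def guard_reading(reading, lower, upper):
    # Enforce the agreed message space: a reading outside the
    # physically sensible range must stop the process, not drive it.
    if not (lower <= reading <= upper):
        raise ValueError("reading outside message space")
    return reading
```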
AUTOMATIC control requires the storage of information received from the system’s sensory instruments. For this a digital device, which simply stores numbers, is better than an analogue device. And the binary system is especially convenient. The most efficient known mechanism for the retention of information is the human brain. Recent physiological experiments suggest that the brain operates not with continuous signals but with sampled digital information, probably on a binary system; nerves seem to transmit information by the presence or absence of a pulse. The brain, with its ability to store vast amounts of information in a tiny space and to deliver specified items on demand, is the model which automatic control design strives to imitate.
Among artificial memory devices the most efficient is the photographic emulsion. Not only can it pack a great deal of information into a small area, but each spot is capable of recording about 10 distinguishable levels of intensity. Microfilm in particular is a very effective means of storing printed or pictorial information. Ultimately every man may have on microcards a library as large as he likes.
For the sake of compressing information into as small an area as possible, the ability of emulsions to record degrees of brightness is given up, and all that is asked of a grain is whether it is black or not. In other words, the technology of this medium is tending to a binary system.
The recording of information in the conventional form of printed matter is wasteful of space, even when the print is reduced by photography to microscopic dimensions. The printing of a letter requires a certain area of paper, which we can imagine as a grid with certain squares blackened to form the letter (see diagram below). Some modern high-speed printers actually use this method, pushing forward certain pins from a matrix to mark each letter. In order to print passably legible letters the matrix must have at least 35 pins. In contrast, the binary digit notation needs only five places (instead of 35) to record the 26 letters of the alphabet, and only seven to give all the symbols of printing, including capitals, numerals and punctuation. Louis Braille recognized this when he used a binary system for his method of recording information for the blind.
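The economy of the binary notation is easy to demonstrate: five binary places cover all 26 letters, since 2^5 = 32. A sketch (the mapping from letters to indices is ours):

```python
def letter_to_binary(letter):
    # Five binary places suffice for 26 letters (2**5 = 32 >= 26),
    # against at least 35 pins for a legible printing matrix.
    index = ord(letter.upper()) - ord('A')
    return format(index, '05b')
```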
The most efficient means of recording is by photography and binary digits. The finest commercial emulsion provides 32,400 resolvable dots or blanks per square millimeter. Allowing for the fact that at present the emulsion has to be mounted on a glass plate a millimeter thick, we have a medium which will store 40 million bits per cubic centimeter. If translated to binary code and recorded as black and white spots on this emulsion, all the words in all the books of the Library of Congress could be stored in a cubic yard.
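The storage-density claim follows from simple arithmetic on the article’s figures; the sketch below reproduces it, arriving at roughly 32 million bits per cubic centimeter, which the article rounds to the order of 40 million:

```python
DOTS_PER_MM2 = 32_400        # resolvable spots per square millimetre
AREA_MM2_PER_CM2 = 100       # square millimetres in a square centimetre
PLATES_PER_CM = 10           # 1 mm glass backing per emulsion layer

bits_per_plate_cm2 = DOTS_PER_MM2 * AREA_MM2_PER_CM2  # 3.24 million
bits_per_cm3 = bits_per_plate_cm2 * PLATES_PER_CM     # roughly 32
# million bits per cubic centimetre, the order of the article's 40M
```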
Storage on photographic emulsion is not yet practicable because of the difficulties of retrieval: reading microfilm is not particularly easy or convenient. There are available, however, other means of compact storage: punched cards or tapes, magnetic tape or drums, electronic storage tubes, printed circuits with miniature tubes or transistors.
AUTOMATIC control requires storage of information for various purposes. In many cases the fact that information is stored is not apparent, but analysis shows that there is a delay during which the reported condition of the system is compared with certain standards. Discrepancies are discovered and corrected by feedback to the control organs. This is often done “instantaneously” by voltages stored in condensers, but more sophisticated control demands storage of a considerable history. We have seen that if a serious attempt is made to reduce noise associated with the message, sections of the received signal must be stored for a time to allow “filtering” of the signal by comparison with the code book.
This kind of control is “non-linear,” because the control signals are not simply proportional to the information supplied by the sensing instrume