
The English Language

"Phonemes are the sounds that make the language. For example, in Spanish you have a sound usually written 'j' (as in Guadalajara) that does not exist in English or French, but has a close equivalent in German and Russian. A beginner could be tempted to make it like a French 'r' but it's bad, because there's also an 'r' in Spanish. To a dedicated learner, phonemes are not very difficult to learn and they should come after 10-50 hours of study for a reasonable language. Beware that some languages use very complex phonemic systems, for example tonal languages like Mandarin Chinese, where not only must you learn to distinguish between sounds that seem very close, but also you must tell whether the syllable with that sound is climbing, flat, or descending." --from Phonemes: the sounds that make the language

English is the most common human language in North America as well as in several other world regions. As languages go, it is rich in content, with some 500,000 everyday words and at least as many technical words, dwarfing the German vocabulary of about 185,000 words and the French vocabulary of fewer than 100,000. English is also transmitted to more than 100 million people every day by the five largest broadcasting companies (CBS, NBC, ABC, BBC, and CBC). According to Top 10 countries with the most English language speakers, the current top ten English-speaking countries are as follows.

Country              No. Speakers

 1. USA              237,320,000
 2. UK                58,090,000
 3. Canada            18,218,000
 4. Australia         15,561,000
 5. Ireland            3,720,000
 6. South Africa       3,700,000
 7. New Zealand        3,338,000
 8. Jamaica            2,460,000
 9. Trinidad/Tobago    1,245,000
10. Guyana               764,000



Each human language is characterized by its own alphabet, words, phonemes, and grammar (in the formal sense of an attribute or BNF grammar, a narrower concept than the everyday usage of the word). Other intermediate categories, like nouns, clauses, and phrases, exist in a hierarchy ranging from the sounds of a language to the meaning of sound combinations. The words of the English language are written using a set of 26 letters from a Latin-based alphabet in use in England since the 7th century AD. The sentence "The quick brown fox jumps over the lazy dog" contains all of the letters in the English alphabet. An Old English alphabet of 24 letters, in use since about 1000 AD, gave rise to the present alphabet. Its letters and their relative frequencies in the English language are shown below.
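The pangram claim above is easy to verify mechanically; a sentence contains every letter exactly when the alphabet is a subset of its letters. A minimal check in Python:

```python
import string

def is_pangram(sentence):
    """A pangram uses every letter of the alphabet at least once."""
    return set(string.ascii_lowercase) <= set(sentence.lower())

print(is_pangram("The quick brown fox jumps over the lazy dog"))  # True
print(is_pangram("The quick brown fox"))                          # False
```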

A 8.17%
B 1.49%
C 2.78%
D 4.24%
E 12.70%
F 2.23%
G 2.02%
H 6.09%
I 6.97%
J 0.15%
K 0.77%
L 4.03%
M 2.41%
N 6.75%
O 7.51%
P 1.93%
Q 0.10%
R 5.99%
S 6.33%
T 9.06%
U 2.76%
V 0.98%
W 2.36%
X 0.15%
Y 1.97%
Z 0.07%

This frequency-of-letters table is the basis for many encoding schemes, for both encryption and compression. For example, it is well known that Morse code assigns the shortest symbols to the most frequent letters. A general method for determining an optimal encoding for an arbitrary set of symbols and associated probabilities is called Huffman coding. In the reverse case, if a simple 1-to-1 cipher is used to encrypt letters, the most common ciphertext letter is apt to represent an "E", the second most a "T", and so on. Many algorithms exist to hide these relationships.
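The Huffman construction can be run directly on the frequency table above. The following is a minimal sketch, not any particular production implementation; it repeatedly merges the two least frequent entries into a tree and reads the bit codes off the branches:

```python
import heapq
from itertools import count

# English letter frequencies (percent), from the table above.
FREQ = {
    'A': 8.17, 'B': 1.49, 'C': 2.78, 'D': 4.24, 'E': 12.70, 'F': 2.23,
    'G': 2.02, 'H': 6.09, 'I': 6.97, 'J': 0.15, 'K': 0.77, 'L': 4.03,
    'M': 2.41, 'N': 6.75, 'O': 7.51, 'P': 1.93, 'Q': 0.10, 'R': 5.99,
    'S': 6.33, 'T': 9.06, 'U': 2.76, 'V': 0.98, 'W': 2.36, 'X': 0.15,
    'Y': 1.97, 'Z': 0.07,
}

def huffman_codes(freq):
    """Build a prefix code in which frequent symbols get short bit strings."""
    tick = count()  # tie-breaker so heap tuples never compare the tree nodes
    heap = [(w, next(tick), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two least frequent subtrees...
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tick), (left, right)))  # ...merge

    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):            # leaf: a letter
            codes[node] = prefix or '0'
        else:                                # internal node: (left, right)
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
    walk(heap[0][2], '')
    return codes

codes = huffman_codes(FREQ)
# The most frequent letter gets a shorter code than the least frequent one.
print(len(codes['E']), '<', len(codes['Z']))
```

As with Morse code, "E" ends up with one of the shortest codes and rare letters like "Q" and "Z" with the longest; unlike Morse code, a Huffman code is prefix-free, so no separator between letters is needed.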


The English language has about 40 phonemes, more than most languages, due to the wide range of possible syllable structures in English. The phonemes can be divided into two large groups: those for which the vocal tract is relatively open and those involving a more restricted vocal tract. Sample digitized sentences containing all of the English phonemes, spoken in various geographic locations, are available online at The Audio Archive.

Over 300,000 words occur in the English language, but only some 50,000 can be considered common. The phonemes comprising these words can be further refined on the basis of the succeeding and, less frequently, preceding phonemes, which cause us to pronounce a given phoneme in slightly different ways (e.g., consider the P in PUT and PAT). These phonetic variants are called allophones. Allophones are obviously difficult to analyze because they are very numerous, quite language-dependent, and highly context-sensitive.

English phonemes
Vowels and diphthongs

Speech features are often displayed in a spectrogram, which plots sound frequency against time and depicts amplitude as a third dimension, the darkness of the display. In the case of vowels, dark horizontal bands correspond to dominant frequencies, or formants; the band with the lowest frequency is called the first formant (F1), the next highest the second formant (F2), and so on. Though individual vowels overlap considerably in F1 and F2, one milestone study in speech recognition, by G. E. Peterson and H. L. Barney ("Control methods used in a study of the vowels", Journal of the Acoustical Society of America 24:175-184, 1952), demonstrated that most vowels can be visually separated on a plot of F1 against F2. The table below summarizes the results of that study for adult males.


Vowels are made with the vocal cords vibrating and air exiting the mouth. The vocal cords vibrate at a fundamental frequency and multiples thereof (harmonics), as many as 40 of which may occur. The fundamental frequency is about 120 Hz for the average man, about 225 Hz for the average woman, and 300 Hz for the average child. Diphthongs are vowels that change their sound as they are uttered; the A in MADE is an example and is represented with the code EY. These changes occur because of movements of the lips and tongue.
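The harmonic series described above is simply the set of integer multiples of the fundamental frequency, which can be sketched in a few lines using the averages quoted in the text:

```python
# Average fundamental frequencies (Hz) quoted in the text.
FUNDAMENTALS = {'man': 120, 'woman': 225, 'child': 300}

def harmonics(f0, n=5):
    """The first n harmonics are the integer multiples of the fundamental f0."""
    return [f0 * k for k in range(1, n + 1)]

# An average man's voice: the fundamental plus its first few harmonics.
print(harmonics(FUNDAMENTALS['man']))           # [120, 240, 360, 480, 600]
# As many as 40 harmonics may occur; for the average woman the 40th is:
print(harmonics(FUNDAMENTALS['woman'], 40)[-1])  # 9000
```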


Consonants, unlike vowels, are relatively weak and involve a restriction of the airflow at some point. Phoneticians distinguish many types of consonants. These include glides, wherein the vocal tract is narrowly constricted, and liquids, wherein the tongue blocks the vocal tract and air flows around it; liquids and glides are sometimes called semivowels. Other consonants include nasals, wherein the velum is lowered to open the nasal tract and allow air to flow through it; fricatives, wherein only a small opening occurs, causing turbulent airflow and aperiodic sound; plosives (stops), which involve a sudden release of air at the lips (labial), the alveolar ridge (alveolar), or the soft palate (velar); and affricates, which consist of a stop followed by a fricative.



A language grammar is the set of rules that defines the properly formed words, phrases, and sentences of a language. To accomplish this, the words of a language are generally placed into a small number of classes. Eight major word classes are defined for English: noun, pronoun, verb, adjective, adverb, preposition, conjunction, and determiner. These classes are then grouped into legal phrases and clauses; a noun phrase might consist of a determiner + adjective + noun, as in "the red fox." Finally, sentences are defined in terms of clauses; one sentence type might be defined as a noun phrase + a verb phrase. For example, "The fox was white" is an independent clause that is also a simple sentence. However, this clause can be extended, here with a phrase of reason, to form a longer sentence:

"The fox was white because of its parent"
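The noun-phrase rule above (determiner + adjective + noun) can be sketched as a check over part-of-speech tags. The tiny lexicon below is hypothetical, included only to make the example runnable:

```python
# A toy lexicon mapping words to word classes (hypothetical, for illustration).
LEXICON = {
    'the': 'determiner', 'a': 'determiner',
    'red': 'adjective', 'lazy': 'adjective',
    'fox': 'noun', 'dog': 'noun',
}

def is_noun_phrase(words):
    """Check the single rule: noun phrase := determiner + adjective + noun."""
    tags = [LEXICON.get(w.lower()) for w in words]
    return tags == ['determiner', 'adjective', 'noun']

print(is_noun_phrase("the red fox".split()))   # True under this one rule
print(is_noun_phrase("red the fox".split()))   # False: wrong word order
```

A real parser would, of course, allow many alternative expansions of a noun phrase (the adjective is optional, nouns can stack, and so on), which is exactly what the BNF alternatives described below provide.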

Grammars of earlier languages were also defined by rules. According to History of English Grammars, the first formal definition of English was Pamphlet for Grammar by William Bullokar, published in 1586. This publication was written in part to demonstrate that English was quite as rule-bound as Latin. However, it was not until the 19th century that the grammars of modern languages became formally systematized, and by the early 1900s publications were being written for the teaching and study of English, even as a foreign language, including descriptions of the intonation patterns of English (see Grammar of Spoken English (1924) by H. E. Palmer, in the reference above).

Language grammars processed by computer are usually written in a rule format called Backus-Naur Form (BNF) or its more convenient extension, Extended BNF (EBNF). Both computer languages and human languages can be defined this way, as can almost any other rule-based notion. The basic idea is to define a hierarchical tree of rules with "what's being defined" (WBD) at its root:

WBD := L1 | L2 L3 | L2 L4

If L1 through L4 represent the digits 1 through 4, the single rule above defines WBD as either 1, 23, or 24: the vertical bar separates alternatives, and adjacent symbols are concatenated. Rules can also be written one per line, and every non-terminal symbol (WBD, L2, etc.) should itself be defined:

WBD := '1'
L2 := '2'
WBD := L2 '3'
WBD := L2 '4'

The above grammar uses the same 4 quoted terminal symbols ('1', '2', '3', '4') and defines the same original sentences (1, 23, and 24). This set of sentences is called a language, and the language is context-free because the left side of every rule (also called a production) contains only one symbol. Note that adding the rule L2 := '5' would add two new sentences, 53 and 54, to the language. A recursive production like

L2 := '2' L2

is legal and, in our example, adds strings beginning with not just one but any number of '2's, in effect making the language infinitely large:

1, 23, 24, 223, 224, 2223, 2224, etc.

This set of strings can also be written as (1, 2+3, 2+4), where the "+" means "one-or-more" (an asterisk, *, means "zero-or-more"). Virtually all computer and human languages are infinite in this sense, which is why computer processing needs grammars; otherwise, problems like checking syntax, analyzing meaning, recognizing speech, and translating from one language to another could be accomplished by simple (though possibly very large) table lookup.
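Because the example grammar is so small, the (1, 2+3, 2+4) notation translates directly into a regular expression, and membership in the infinite language can be tested mechanically. A minimal sketch:

```python
import re

# The example language: a lone '1', or one-or-more '2's followed by '3' or '4'.
# The regex "+" is exactly the "one-or-more" of the (1, 2+3, 2+4) notation;
# "*" would be the "zero-or-more" asterisk.
LANGUAGE = re.compile(r'1|2+[34]')

def in_language(sentence):
    """True if the entire string is a sentence of the grammar."""
    return LANGUAGE.fullmatch(sentence) is not None

print([s for s in ['1', '23', '24', '223', '2224', '2', '33'] if in_language(s)])
# -> ['1', '23', '24', '223', '2224']
```

This shortcut works only because the example is a regular language; the general case of a context-free grammar requires a true parser rather than a regular expression.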

The greatest success at computer recognition of English sentences has involved context-free grammars with a small (100-1000 word) vocabulary. Examples include a simple English grammar, a poetry grammar, and a grammar for an airline flight reservation system. The principal bottleneck seems to stem from the fact that English is not a totally context-free language: the way we parse and understand sentences depends on context, either past or future, a fact that can be proven mathematically.

One alternative to a strict BNF representation of English is statistical parsing, which associates each grammar rule with a probability used to choose the proper rule. One such system, OpenNLP, offers several tools, including "a sentence splitter, a tokenizer, a part-of-speech tagger, a chunker (used to "find non-recursive syntactic annotations such as noun phrase chunks"), a parser, and a name finder."

Another type of language representation is Link Grammar, freely available for download or online demonstration through the School of Computer Science at Carnegie Mellon:

"Think of words as blocks with connectors coming out. There are different types of connectors; connectors may also point to the right or to the left. A left-pointing connector connects with a right-pointing connector of the same type on another word. The two connectors together form a "link". Right-pointing connectors are labeled "+", left-pointing connectors are labeled "-". Words have rules about how their connectors can be connected up, that is, rules about what would constitute a valid use of that word. A valid sentence is one in which all the words present are used in a way which is valid according to their rules, and which also satisfies certain global rules."

Once the words and phrases of a speech input stream are put into parsed sentences on a computer, the problem becomes one of Natural Language Processing and is within the sphere of Artificial Intelligence. Syntax analysis using a compiler or parser determines whether the sentence is correct according to the grammar, and computational semantics attempts to discover the meaning of a sentence. Today, English text-to-speech (TTS) is a well-established technology, but to be totally useful for applications like an automated online assistant (below), computers must be capable of bidirectional natural language communication.

The link parser's output for the sentence "The fox was white because of its parent" is shown below.



Online English
English Phonetics
American English Phonemes
Letter Frequency
The BNF Web Club
Computational Linguistics (Wiki)
Number of Words in the English Language

Further Reading