Ancient Greek didn't have lower case or pronunciation marks. Unfortunately, modern typesetting practice for ancient Greek requires lower case and the marks. Upper case, unmarked text is considered ugly.
The marks around Greek letters are called diacriticals. The diacritical pronunciation marks were invented by Aristophanes of Byzantium around 200 BC as a pronunciation aid for non-Greeks.
Native speakers just 'know' how to pronounce words. For example, English speakers have no trouble pronouncing 'bass' (the fish) and 'bass' (the guitar).
Greek typeset with the pronunciation diacriticals is called polytonic. Recently (in 1982) the Greeks switched to a simpler, 'monotonic' system — they don't need a complex system of marks to pronounce their own language. This article is about the polytonic system of typesetting ancient Greek with pronunciation symbols.
The pronunciation symbols are:
How shall ancient Greek be represented on the web? The simplest way is via 'transliteration': using similar letters in our Latin alphabet. For example, ΖΕΥΣ would be written 'Zeus'. The advantage of this system is that all English readers can read it. Project Gutenberg recommends this system for recording short Greek quotes in non-Greek books. Unfortunately, when I encoded the Historia Numorum this way, I was accused of 'dumbing down' the text. Readers who know Greek are irritated by transliteration.
Another popular way is to use the Symbol font. Most computers have one. Unfortunately, the Symbol font lacks lower-case and diacriticals. Also, it is considered a 'non-standard encoding', and some browsers and search engines can't, or won't, handle it.
There is an ISO encoding for Greek, but it can't be used easily for books in European languages that quote Greek. Unicode Basic Greek fixed this problem, but standardized on recent modern Greek, that is, the monotonic system. Ancient Greek can't be accurately represented this way.
To get full polytonic Greek, one must use a combination of Unicode Basic Greek and Unicode Greek Extended. Unfortunately, there are two ways to combine Extended Greek and Basic Greek: using precomposed characters and using combining diacriticals. Either way can produce perfectly rendered polytonic Greek, but not all computers can handle both methods. Some browsers render one system or the other very badly.
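The two encodings can be compared with Python's standard `unicodedata` module, taking capital Alpha with smooth breathing as the test character. This is only a sketch: the helper names `to_precomposed` and `to_combining` are made up for illustration, and it assumes that Unicode normalization (NFC and NFD) is an acceptable way to convert between the two forms.

```python
import unicodedata

precomposed = "\u1f08"      # GREEK CAPITAL LETTER ALPHA WITH PSILI
combining = "\u0391\u0313"  # capital Alpha followed by COMBINING COMMA ABOVE

def to_precomposed(text):
    """Fold base letters and combining marks into single characters (NFC)."""
    return unicodedata.normalize("NFC", text)

def to_combining(text):
    """Split precomposed characters into base letter plus marks (NFD)."""
    return unicodedata.normalize("NFD", text)

# The two strings look alike on screen but are different code point sequences.
print(precomposed == combining)                   # False
print(to_precomposed(combining) == precomposed)   # True
print([f"U+{ord(c):04X}" for c in to_combining(precomposed)])
# ['U+0391', 'U+0313']
```

Normalizing everything to one form before storing or comparing text avoids mismatches between the two encodings.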
Using precomposed characters, capital Alpha with a smooth breathing is Ἀ. Using combining characters, the same character is ᾿Α. Which encoding is superior may be debated, but a more important question is how the letters are entered into the computer in the first place! Keyboards, after all, lack a 'capital Alpha with smooth breathing' key. No one can remember the complex Extended Greek codes. (Basic Greek can be done using HTML mnemonic entities such as &Alpha;. Unfortunately, no accents can be entered in this system, not even modern Greek ones. The HTML entities are more for mathematical symbols than Greek prose.)
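As a small illustration of the entity approach, Python's standard `html` module can decode the mnemonic entities, and a numeric character reference can stand in for any character that has no mnemonic. The function names `decode_entities` and `to_numeric_references` are hypothetical helpers written for this sketch.

```python
import html

def decode_entities(markup):
    """Turn HTML entities such as &Alpha; into the characters they name."""
    return html.unescape(markup)

def to_numeric_references(text):
    """Replace every non-ASCII character with a hexadecimal reference."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in text)

# Unaccented Basic Greek letters have mnemonic entities.
print(decode_entities("&Zeta;&epsilon;&upsilon;&sigmaf;"))  # Ζευς

# Capital Alpha with smooth breathing has no mnemonic, only a number.
print(to_numeric_references("\u1f08"))                      # &#x1F08;
```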
There are four common techniques for getting Greek into the computer: 'dead-keys', pop-up keyboards, OCR, and translation from 'Beta code'.
Dead-keys are how keyboards work in Greek. In the dead-key system, certain keys sometimes behave like shift keys and sometimes like regular keys. For example, the [ key is a dead-key for the circumflex/perispomeni accent. On a US-configured keyboard the '[' produces a '['. On a Greek dead-key keyboard the '[' produces nothing at first. If the next key pressed is one that can accept a circumflex accent it appears with one. If the next key pressed can't be accented that way, a [ followed by the second character appears. Skilled typists can produce polytonic text very quickly with this system, but the unresponsive (dead) nature of certain keys can confuse beginners.
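The dead-key behaviour described above can be sketched as a small state machine. This is a simplified model, not how any real keyboard driver is written: it assumes a single dead key, '[', it assumes the other keys already deliver Greek letters, and the names `DEAD_KEYS` and `type_keys` are invented for the example.

```python
import unicodedata

# Hypothetical dead-key table: key -> combining mark it applies.
DEAD_KEYS = {"[": "\u0342"}  # COMBINING GREEK PERISPOMENI (circumflex)

def type_keys(keys):
    """Simulate typing `keys` on a keyboard with dead keys."""
    out = []
    pending = None  # a dead key that is waiting for its letter
    for key in keys:
        if pending is None:
            if key in DEAD_KEYS:
                pending = key       # show nothing yet
            else:
                out.append(key)     # ordinary key
            continue
        # Try to merge the letter and the pending accent into one character.
        composed = unicodedata.normalize("NFC", key + DEAD_KEYS[pending])
        if len(composed) == 1:
            out.append(composed)        # the letter accepts the accent
        else:
            out.append(pending + key)   # it does not: emit both keys as typed
        pending = None
    if pending is not None:
        out.append(pending)             # dead key was the last key pressed
    return "".join(out)

print(type_keys("[α"))  # ᾶ
print(type_keys("[β"))  # [β
```

The `pending` variable is the source of the 'dead' feeling: nothing appears until the second key arrives.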
Pop-up keyboards are programs or web pages that have all the characters as buttons. Every Windows system includes 'Character Map', a program that can copy any character (including polytonic Greek) to the clipboard. Unfortunately, 'Character Map' is irritating to use for Greek because it doesn't group the polytonic symbols near the non-polytonic ones. It is also very slow.
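A pop-up keyboard that groups the polytonic forms next to their plain letter could get its button list from the Unicode character database. The sketch below assumes that the Greek Extended block (U+1F00 to U+1FFF) is the only range of interest; `variants_of` is a made-up helper name.

```python
import unicodedata

def variants_of(base):
    """List Greek Extended characters whose decomposition starts with `base`."""
    found = []
    for code in range(0x1F00, 0x2000):
        ch = chr(code)
        if not unicodedata.name(ch, ""):
            continue  # unassigned code point
        if ch != base and unicodedata.normalize("NFD", ch)[0] == base:
            found.append(ch)
    return found

# One row of buttons for lower-case alpha.
for ch in variants_of("α"):
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")
```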
Optical Character Recognition (OCR) is scanning existing text and having the computer recognize it. Unfortunately, many OCR programs can't handle Greek. The ones that do handle Greek mostly handle recent, monotonic Greek. I'll discuss OCR in more detail later.
Beta Code is a system that uses Latin letters, decorated with modern punctuation, to ape polytonic symbols. For example, capital Alpha with smooth breathing is represented as '*)A'. A nice property of Beta Code is that there are (free) web pages that will translate from Beta Code to Unicode. This means you can teach yourself Beta Code and get all the benefits of hand-crafted polytonic input. You can also easily translate Beta Code to transliterated English with a few simple substitutions, like changing Beta Code's 'Q' to 'th' for theta.
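To give a feel for what such a translator does, here is a minimal sketch of a Beta Code to Unicode converter. It covers only the 24 basic letters, the capital marker and the common breathing and accent marks, and it guesses final sigma from position. The names `LETTERS`, `MARKS`, `TOKEN` and `beta_to_unicode` are invented for this example; a real converter handles many more codes.

```python
import re
import unicodedata

# Beta Code letter -> lower-case Greek letter.
LETTERS = dict(zip("ABGDEZHQIKLMNCOPRSTUFXYW", "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code mark -> Unicode combining character.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}

# Capitals put the marks between '*' and the letter; lower case puts them after.
TOKEN = re.compile(r"\*([()/\\=+|]*)([A-IK-UW-Z])|([A-IK-UW-Z])([()/\\=+|]*)")

def beta_to_unicode(beta):
    def convert(match):
        if match.group(2):
            marks, letter = match.group(1), LETTERS[match.group(2)].upper()
        else:
            letter, marks = LETTERS[match.group(3)], match.group(4)
        return letter + "".join(MARKS[m] for m in marks)

    text = TOKEN.sub(convert, beta)
    # A sigma not followed by another Greek letter takes its final form.
    text = re.sub("σ(?![α-ωΑ-Ω])", "ς", text)
    # Fold letters and marks into precomposed characters.
    return unicodedata.normalize("NFC", text)

print(beta_to_unicode("*)A"))     # Ἀ
print(beta_to_unicode("*ZEU/S"))  # Ζεύς
```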
OCR of mixed Greek/English text is much, much harder than regular OCR. It is often very difficult to teach OCR software the difference between 0 (zero), O (oh), Ο (omicron), and Θ (theta). So many Greek letters are similar to Latin letters that the source material quality and the OCR software make a big difference.
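One cheap way to catch these confusions after OCR is to flag any word that mixes scripts, or that has a digit in the middle of letters. The check below is a rough heuristic of my own choosing, not a feature of any OCR package, and the names `script_of` and `suspicious_words` are hypothetical.

```python
import unicodedata

def script_of(ch):
    """Classify a character as Greek or Latin from its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("GREEK"):
        return "Greek"
    if name.startswith("LATIN"):
        return "Latin"
    return None  # digits, punctuation, combining marks

def suspicious_words(text):
    """Return words that mix scripts or put digits among letters."""
    flagged = []
    for word in text.split():
        scripts = {script_of(c) for c in word} - {None}
        mixes_scripts = len(scripts) > 1
        digit_among_letters = bool(scripts) and any(c.isdigit() for c in word)
        if mixes_scripts or digit_among_letters:
            flagged.append(word)
    return flagged

# The second word ends in a Latin S; the third has a zero for omicron.
sample = "\u0396\u0395\u03a5\u03a3 \u0396\u0395\u03a5S \u0398\u03950\u03a3 1982"
print(suspicious_words(sample))
```

Flagged words still need a human eye, since the check says only that something is odd, not which letter was intended.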
Although ABBYY can't OCR polytonic Greek, I discovered that any OCR package can be taught polytonic Greek! The trick involves ligatures.
Ligatures are two letters that appear as one character. Æ is a ligature from Latin. In typesetting, the sequences ff, fl, and ffl often get a specially carved piece of type, which flows more smoothly on the page than separate 'f' and 'f' blocks set side by side. (This is especially true for italic fonts.)
Most OCR programs can be taught new ligatures. I taught ABBYY FineReader to recognize Greek letters not as the actual Unicode characters, but as if the Greek letters were ligatures of the Beta Code characters! This required telling ABBYY to forget everything it knew about character shapes (uncheck 'use built-in patterns'), and hand-holding it through the first few pages.
Beta code can encode multi-lingual text. The $ switches into Greek mode, and the & switches into Latin. There are also switches for Hebrew and Arabic. I found it difficult to write macros and scripts to deal with beta-style mode switching, so I invented my own beta-like code for typesetting the Digital Historia Numorum and numismatic books.
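The mode switching can be sketched as follows. The code handles only the two switches named above, ignores the Hebrew and Arabic switches, and assumes the text starts in Latin mode unless told otherwise. `SWITCHES` and `split_modes` are names made up for the example.

```python
# Switch character -> the mode it turns on.
SWITCHES = {"$": "greek", "&": "latin"}

def split_modes(text, default="latin"):
    """Split Beta Code text into (mode, run) pairs at each switch character."""
    runs = []
    mode = default
    buffer = []
    for ch in text:
        if ch in SWITCHES:
            if buffer:
                runs.append((mode, "".join(buffer)))
                buffer = []
            mode = SWITCHES[ch]
        else:
            buffer.append(ch)
    if buffer:
        runs.append((mode, "".join(buffer)))
    return runs

print(split_modes("&Zeus is $*ZEU/S &in Greek"))
# [('latin', 'Zeus is '), ('greek', '*ZEU/S '), ('latin', 'in Greek')]
```

The awkward part is that the meaning of every character depends on a switch that may be far back in the text, so a script can't convert a fragment without knowing the mode it sits in.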
My system is to use the ` in front of a transliterated character (e.g. a, b, th, ...) to indicate Greek is desired. A parenthesis before the character indicates a breathing, and a /, \, or ~ indicates an accent. Unlike Beta Code, my system lets me write in the same case as the intended output. (Beta Code uses the asterisk, *, in front of capital letters.) I use `sf for final sigma. I write Zeus as "`Z`e`u`sf".
My system has the benefit of not needing modes: every snible-code sequence becomes the same Unicode sequence, with no Greek/Latin/Hebrew mode differences, and conversion is merely string replacement. My system also looks almost exactly like transliterated text; beginners can read it merely by ignoring the symbols around the letters.
I feel kind of embarrassed to call this 'snible-code', as my system does less than Beta Code.
Because my system is just string substitution for each character, it doesn't take programming skill to write converters. Because it is so easy to write converters, it is easy to experiment with different ones, such as one that strips off accents and breathings and converts everything to upper case (for the reader who doesn't yet know the lower-case alphabet). It is also easy to write a converter that strips the backticks, producing transliterated text for readers who don't know Greek at all.
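A converter of this kind might look like the sketch below. The table is a hypothetical fragment with just enough entries for a couple of words, not the full table used for the Digital Historia Numorum, and it leaves out the breathing and accent codes. The names `TO_GREEK`, `convert`, `to_transliteration` and `to_plain_capitals` are invented for the example.

```python
import unicodedata

# Hypothetical partial table: backtick plus transliteration -> Greek letter.
TO_GREEK = {
    "`Z": "Ζ", "`e": "ε", "`u": "υ", "`s": "σ", "`sf": "ς",
    "`th": "θ", "`t": "τ", "`o": "ο",
}

def convert(text, table):
    """Apply every substitution, longest code first so `sf wins over `s."""
    for code in sorted(table, key=len, reverse=True):
        text = text.replace(code, table[code])
    return text

def to_transliteration(text):
    """Drop the markers so that only the Latin letters remain."""
    return text.replace("`sf", "s").replace("`", "")

def to_plain_capitals(greek):
    """Strip accents and breathings, then upper-case the result."""
    decomposed = unicodedata.normalize("NFD", greek)
    bare = "".join(c for c in decomposed if not unicodedata.combining(c))
    return bare.upper()

print(convert("`Z`e`u`sf", TO_GREEK))                     # Ζευς
print(to_transliteration("`Z`e`u`sf"))                    # Zeus
print(to_plain_capitals(convert("`Z`e`u`sf", TO_GREEK)))  # ΖΕΥΣ
```

Sorting the codes by length is the only subtlety: without it, "`s" would be replaced before "`sf" and leave a stray 'f' behind.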
All of these tricks are possible with beta-code, of course, but with a bit more programming.