While the majority of typographic control for web pages – font selection, leading, kerning, hyphenation and more – is the realm of CSS, some basic character-level concerns are the responsibility of HTML. The foundation for these is character encoding, which for modern pages means UTF-8.

An HTML document declares that it uses UTF-8 in more than one place:

  • In the appropriate meta tag.
  • At the file level, when the page is saved.
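
As a rough illustration of the first point, a minimal page that declares its encoding might look like the sketch below. The title and paragraph text are placeholders, not taken from any real page. The declaration is only half of the picture: the file itself still has to be saved as UTF-8 in your editor for the two to agree.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Declare the encoding early in the head, before any text such as the title -->
  <meta charset="utf-8">
  <title>Example page (placeholder title)</title>
</head>
<body>
  <!-- These characters display correctly only if the file is also saved as UTF-8 -->
  <p>“Curly quotes” and an en dash (–) typed directly into the file.</p>
</body>
</html>
```

Placing the meta tag first in the head is a common convention, since the browser needs to know the encoding before it reads any text that depends on it.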

UTF-8 is a character encoding scheme: a format that determines how characters are stored as bytes in a document. (Note that this is different from, but intertwined with, font selection, i.e., the choice of typeface used.) Put in very simple terms, UTF-8 allows us to use virtually any character from any written language in a document, along with a wide variety of glyphs (symbols) and punctuation, including many that are not on your keyboard.

Some of these typographic symbols are subtle: a true opening quotation mark (“) is a different character from the straight quote generated by your keyboard ("), just as an en dash (–) differs from a hyphen (-), and an em dash (U+2014) differs from both. They make an important difference in your document, contributing to a well-presented page.
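
To make the distinction concrete, here is a small sketch of how these characters can be written in markup. It assumes a page saved and declared as UTF-8, and the sample sentences are invented for illustration. Typing the character directly and using a character reference should produce the same result in the browser.

```html
<!-- Typed directly: relies on the file being saved and declared as UTF-8 -->
<p>“Typography matters,” she said.</p>

<!-- The same quotation marks written as named character references -->
<p>&ldquo;Typography matters,&rdquo; she said.</p>

<!-- Dashes: en dash (U+2013) in a range, em dash (U+2014) for a break in thought -->
<p>Pages 10&ndash;12</p>
<p>One thing&mdash;and only one&mdash;matters.</p>

<!-- Numeric references work too, in decimal or hexadecimal -->
<p>&#8220;decimal&#8221; and &#x201C;hexadecimal&#x201D;</p>
```

Character references are handy when a symbol is hard to type or hard to tell apart in an editor. With UTF-8 in place, entering the character itself is equally valid.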
