
How to develop a Cyrillic HTML page

Paul Gorodyansky 'Cyrillic (Russian): instructions for Windows and Internet'



This page explains how a developer can create an .html file with Cyrillic text inside it.

Note. The English alphabet is a special case (different from, say, German).
A Cyrillic character set, like any other character set in the world (Japanese, Chinese, Central European, etc.), contains, in addition to the national symbols, a set of symbols called ASCII. In each legacy encoding the ASCII symbols occupy the first 128 positions of the encoding table, while the national letters occupy the second half of the table.

The ASCII symbols (punctuation marks, digits, etc.) also include the English alphabet.
That is, English letters are part of the Cyrillic character set!

So a Web page with, say, Russian and English letters is not a multilingual page. It uses one Cyrillic encoding, and that encoding contains the English letters (more precisely, the ASCII symbols).
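The point above can be checked with a short Python sketch. It is only an illustration: the codec names `cp1251`, `koi8_r` and `iso8859_5` are Python's names for the three Cyrillic encodings, and the sample strings are my own assumptions.

```python
# Compare how three legacy Cyrillic encodings store ASCII and Russian text.
# Python codec names for Windows-1251, KOI8-R and ISO-8859-5.
CYRILLIC_CODECS = ["cp1251", "koi8_r", "iso8859_5"]


def encode_everywhere(text):
    """Return {codec name: encoded bytes} for each Cyrillic encoding."""
    return {codec: text.encode(codec) for codec in CYRILLIC_CODECS}


english = encode_everywhere("<TITLE>Hello</TITLE>")  # tags and English text
russian = encode_everywhere("Привет")                # Russian letters only

# ASCII text gives identical bytes, all below 128, in every encoding.
assert len(set(english.values())) == 1
assert all(byte < 128 for data in english.values() for byte in data)

# Russian letters sit in the upper half (128..255), and each encoding
# places them at different positions.
assert all(byte >= 128 for data in russian.values() for byte in data)
assert len(set(russian.values())) == 3

for codec, data in russian.items():
    print(codec, data.hex(" "))  # show the differing byte values
```

English letters and HTML tags are byte-for-byte the same in all three encodings, so only the Russian letters depend on which Cyrillic encoding the page uses.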

A different case is a truly multilingual page, where, say, Russian letters have to be combined with German, Polish, or Japanese letters.
That case is covered on another page of my site - "How to develop multilingual HTML page"



That is, this article is about creating a Cyrillic (for example, Russian) Web page, i.e. a Web page that announces itself as a Cyrillic one (a Cyrillic encoding is specified).

A very different scenario is when you want to create a non-Cyrillic Web page (for example, a page in a Western European encoding) and just place a couple of Russian words there. That scenario is NOT covered here; it is covered in the article mentioned above, "How to develop multilingual HTML page".



A font is made for a specific encoding, and because every encoding contains ASCII, every font also contains ASCII. So any Cyrillic font contains English letters.


To create a Cyrillic (or Cyrillic+English) HTML file, that is, a text in a single character set, a developer just writes some Cyrillic (+English) text using a Cyrillic font and the corresponding keyboard mode.

Most Russian-language Web pages (certainly more than 90%) are now made in the Windows-1251 encoding, a.k.a. "Cyrillic(Windows)", simply because the majority of authors work under MS Windows and 1251 is what Microsoft uses for Cyrillic: the built-in Windows Cyrillic fonts and keyboard tools are for the Windows-1251 encoding.
Therefore it is much easier to type "Cyrillic(Windows-1251)" text in a Windows plain text editor than to type "Cyrillic(KOI8-R)" text.
It is practically impossible to type "Cyrillic, ISO-8859-5" text under MS Windows.

But it really does not matter which encoding the author used: major browsers work fine with all Cyrillic encodings, and as long as a Web page is made correctly (see below), it will be shown to the end user correctly, too. The last part of this page discusses the creation of a KOI8-R page, just in case.
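The following Python sketch illustrates why the choice of encoding does not matter while the correctness of the page does. A browser's job is modelled here (as a simplifying assumption) by a plain `decode` call; the helper name `view_as` is hypothetical.

```python
def view_as(page_bytes, assumed_codec):
    """Decode page bytes the way a viewer would if it assumed this encoding."""
    # errors="replace" avoids an exception on byte values that the
    # assumed encoding leaves undefined.
    return page_bytes.decode(assumed_codec, errors="replace")


original = "Привет"

# The author may save the page in either encoding...
saved_as_1251 = original.encode("cp1251")
saved_as_koi8 = original.encode("koi8_r")

# ...and the text is shown correctly whenever the viewer uses the same one.
assert view_as(saved_as_1251, "cp1251") == original
assert view_as(saved_as_koi8, "koi8_r") == original

# A mismatch between the saved and the assumed encoding gives unreadable text.
garbled = view_as(saved_as_koi8, "cp1251")
assert garbled != original
print(garbled)
```

This is why the encoding announced by the page has to be the encoding the file was actually saved in.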



How to write in Russian using fonts and keyboard tools - with "RU" as an indicator in the taskbar - is explained in the "Introduction. Cyrillic in Windows" section of my site "Cyrillic (Russian): instructions for Windows and Internet".



If a Cyrillic page has been authored correctly, an end user will be able to read it even when the page does not specify its encoding, for example by switching the browser to Cyrillic (View/Encoding/Cyrillic(Windows) or View/Encoding/Cyrillic(KOI8-R) in Internet Explorer).



Note. Cyrillic in the page's TITLE
If you or your future readers work under a non-Russian Windows, it is not a good idea to use Cyrillic letters in the title of your page (the text between the HTML tags <TITLE> and </TITLE>).

For example, MS Internet Explorer ver. 5 and higher (as well as Netscape ver. 7.1 and higher and Mozilla ver. 1.4 and higher) can show such a title only under Windows 2000/XP and cannot under Windows 95/98/ME/NT, while Netscape 4.x - 7.0x cannot show it at all.

Here is my test page that explains this (it was actually written for the Bookmarks issue in Netscape, since it is the title text that goes into Bookmarks):
"Title with the text different from Windows System Code Page"


Now, let's look at some methods of creating HTML text with Russian in it.

1. Plain Text editors - developer codes HTML manually

In such a case, all the developer needs to do is select a Cyrillic font as the working font in the plain text editor s/he uses, then switch the keyboard to "RU" mode and start typing.
That's it. Knowing how to use fonts and the keyboard to write in Russian, the developer just types in the content of the HTML file - text and tags.

I personally use a very good shareware plain text editor UltraEdit that is very suitable for HTML.
It uses color for HTML tags and also lets me create my own macros. For example, I press Ctrl/L and immediately have the following construction in my text:

 <UL>
      <LI>
      <LI>
      <LI>
 </UL>     

All I need to do there to start writing Cyrillic HTML is to choose a Cyrillic font, for example:
  View/Set Font - "Courier New", Script - "Cyrillic"

Now, by switching between "EN" and "RU" I can write HTML tags and some English-Russian content.
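For readers who want to see what such a hand-written file looks like on disk, here is a minimal Python sketch. It is an illustration, not the editor workflow described above. The page content, the function name `save_cyrillic_page` and the file name `test1251.html` are all assumptions.

```python
import os
import tempfile

# A tiny hand-written page: English tags and title, Russian body text,
# and a META line that announces the Windows-1251 encoding.
PAGE = """<HTML>
<HEAD>
<META http-equiv="content-type" content="text/html; charset=windows-1251">
<TITLE>Test page</TITLE>
</HEAD>
<BODY>
<P>Привет! Hello!</P>
</BODY>
</HTML>
"""


def save_cyrillic_page(path, html=PAGE, codec="cp1251"):
    """Write the HTML text to disk as Windows-1251 bytes."""
    # newline="" keeps the line endings exactly as they are in the string.
    with open(path, "w", encoding=codec, newline="") as handle:
        handle.write(html)


path = os.path.join(tempfile.mkdtemp(), "test1251.html")  # hypothetical name
save_cyrillic_page(path)

with open(path, "rb") as handle:
    raw = handle.read()

# The file holds real Windows-1251 letters, not entities or numeric codes.
assert "Привет".encode("cp1251") in raw
assert b"charset=windows-1251" in raw
```

The title is deliberately left in English, in line with the note about Cyrillic in the TITLE above.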



2. WYSIWYG HTML editor - creates HTML text for you

If you work with a WYSIWYG HTML editor (one that silently writes the HTML code/tags of the future Web page for you, 'behind the scenes'), then you must learn how to make it produce Cyrillic (+English) HTML files.
A common problem is that the author did not tune up the editor for Cyrillic before starting development, and thus the HTML file is created as a "Western" file
  (charset=windows-1252 or charset=iso-8859-1 or charset=us-ascii)
and not as a "Cyrillic" page (e.g. charset=windows-1251).

Usually in such a case there are no Cyrillic letters in the HTML file - just SGML entities such as &aacute; or numeric codes like &#1076; instead of Cyrillic alphabet letters.
When you do View/Source in your browser for such a page, there is no readable Russian text there - a strong sign that this Cyrillic page was authored incorrectly.
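The same check can be automated roughly as follows. This Python sketch is a heuristic of my own, not a standard tool: it assumes the source has already been decoded to text, and it simply compares the number of entities and numeric codes with the number of real Cyrillic letters. The function name `looks_entity_encoded` is hypothetical.

```python
import re

# Real Cyrillic letters (the Unicode Cyrillic block).
CYRILLIC_LETTER = re.compile(r"[\u0400-\u04FF]")

# Named entities (&aacute;) and numeric codes (&#1076; or &#x434;).
CHAR_REFERENCE = re.compile(r"&(?:#\d+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def looks_entity_encoded(source):
    """Return True if entities or numeric codes outnumber Cyrillic letters."""
    letters = len(CYRILLIC_LETTER.findall(source))
    references = len(CHAR_REFERENCE.findall(source))
    return references > letters


bad_source = "<P>&#1055;&#1088;&#1080;&#1074;&#1077;&#1090;</P>"
good_source = "<P>Привет,&nbsp;мир</P>"

print(looks_entity_encoded(bad_source))
print(looks_entity_encoded(good_source))
```

A correctly authored page may still contain a few entities such as &nbsp;, which is why the sketch compares counts instead of rejecting every entity.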

Also, at the top of such an incorrectly developed 'Cyrillic' page one can see that it is marked as "Western", because it has the line
  <META http-equiv="content-type" content="text/html; charset=windows-1252">
(or sometimes "iso-8859-1" or "us-ascii"),
which means a "Western European" encoding.

Such a page will not be readable for most users. A good, readable-by-all Cyrillic HTML file must comply with the following (as an example, I use windows-1251 below as the Cyrillic encoding of a page):

Correct tune-up of your WYSIWYG HTML editor would prevent the problems listed above.
The tune-up for several editors is given below.


How to tune up a WYSIWYG HTML Editor to create correct Cyrillic HTML

Each WYSIWYG HTML editor requires its own tune-up for Cyrillic, and a developer must find out what it is before starting to write code. Some editors may not be able to work with Cyrillic at all...

Below are the tune-up instructions for some WYSIWYG HTML editors.

Important. After you read the tune-up instructions for the editor of your choice, do not forget to read the generic (applicable to any editor) "Final notes regarding correct Cyrillic HTML" part of this page, which lists some common mistakes people make that cause the page to be unreadable for some readers.



I personally tried the Cyrillic tune-up steps only for the following WYSIWYG HTML editors:

There are a couple more editors that I did not try myself but for which I found tune-up steps on the Web:


Here are the tune-up instructions (using Cyrillic(Windows-1251) encoding as an example):


Final notes regarding correct Cyrillic HTML

After you've developed a Cyrillic HTML page either 'by hand' (using a plain text editor and typing the HTML code/tags yourself) or by letting a WYSIWYG HTML editor write the HTML code/tags for you, you need to check that this Cyrillic Web page will be readable for any end user.
Here are some common mistakes that developers make, causing the page to be unreadable for some users (depending on their browser and/or computer type).

The first two have already been mentioned above, but it is worth listing all the items here, in one place.

You need to check the Source HTML code that a WYSIWYG HTML editor made for you to make sure you did not make the common mistakes listed below.
You can check the Source HTML text via the View/Source option of your browser or your HTML editor, or by opening the .html file in a plain text editor that lets you look at plain-text Cyrillic - HTML text is plain text, the same as in a .TXT file.

Mistake 1. Cyrillic HTML text does not contain normal Cyrillic alphabet letters.
Usually it happens when an author uses a WYSIWYG HTML editor that was not tuned up for the creation of Cyrillic HTML text.
As a result, View/Source would show the following inside the page instead of Cyrillic alphabet letters:

Mistake 2. The page announces itself as "Western European" and not as "Cyrillic".
That is, charset (encoding) value for this page is not a Cyrillic one (such as windows-1251 for example), but "Western" - iso-8859-1 or windows-1252 or us-ascii.

The charset (encoding) value can be set either in the HTTP header sent by the Web server to the browser along with the page itself, or inside the HTML text of the page, in its HEAD section, for example
  <META http-equiv="content-type" content="text/html; charset=windows-1251">
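A rough way to check a saved file for Mistake 2 is sketched below in Python. It only looks at the META line inside the file (it cannot see the HTTP header sent by a Web server), and the regular expression is a simplification of my own, not a full HTML parser. The function names and the two charset lists are assumptions taken from the encodings named on this page.

```python
import re

# Finds charset=... inside a META tag; works on raw bytes because the
# tag itself is pure ASCII in every Cyrillic encoding.
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)

CYRILLIC_CHARSETS = {"windows-1251", "koi8-r", "iso-8859-5"}
WESTERN_CHARSETS = {"windows-1252", "iso-8859-1", "us-ascii"}


def declared_charset(raw_html):
    """Return the charset named in the META tag (lower case), or None."""
    match = META_CHARSET.search(raw_html)
    return match.group(1).decode("ascii").lower() if match else None


def classify_page(raw_html):
    """Classify the page as 'cyrillic', 'western', 'other' or 'none'."""
    charset = declared_charset(raw_html)
    if charset is None:
        return "none"
    if charset in CYRILLIC_CHARSETS:
        return "cyrillic"
    if charset in WESTERN_CHARSETS:
        return "western"
    return "other"


good = b'<META http-equiv="content-type" content="text/html; charset=windows-1251">'
bad = b'<META http-equiv="content-type" content="text/html; charset=windows-1252">'

print(classify_page(good), classify_page(bad))
```

A result of "western" for a page that is supposed to be Russian is the situation described as Mistake 2.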

Mistake 3. HTML tags <FONT FACE=...> are used for Cyrillic strings.

A good, readable-for-all Cyrillic Web page should not contain HTML tags <FONT FACE=...>.
An author should not make assumptions about which specific fonts on an end user's computer contain Cyrillic - how could s/he know?
It is quite possible that on the author's computer, with Office 2000 installed, "Verdana" contains Cyrillic, while an end user on Windows 98 may have a Western-only "Verdana" font and thus will not see any readable Cyrillic if the author surrounds Cyrillic text with <FONT FACE=Verdana>!

It's true not just for Cyrillic but for any non-Western-European script.
You may want to read my separate page regarding the tags <FONT FACE=...> and <FONT SIZE=...> (that also may make the text unreadable):
"Incorrectly designed, unreadable Russian pages".

If your WYSIWYG editor has surrounded your Cyrillic strings with such tags, you may need to open your HTML file in a plain text editor (or use Source Edit, if your WYSIWYG editor has such an option) and manually remove them. Remove only the tags that surround Cyrillic text; Western European strings can keep them.
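The manual clean-up can also be approximated by a small script. The Python sketch below is an assumption-laden illustration. It uses a regular expression in place of a real HTML parser, does not handle nested FONT tags, and removes the whole FONT tag (including any SIZE attribute) when the enclosed text contains Cyrillic letters. The function name `unwrap_font_face` is hypothetical.

```python
import re

# A FONT element that has a FACE attribute, with its content captured.
FONT_FACE_ELEMENT = re.compile(
    r"<font\b[^>]*\bface\s*=[^>]*>(.*?)</font\s*>",
    re.IGNORECASE | re.DOTALL,
)
CYRILLIC_LETTER = re.compile(r"[\u0400-\u04FF]")


def unwrap_font_face(source):
    """Drop <FONT FACE=...> wrappers around Cyrillic text only."""

    def replace(match):
        inner = match.group(1)
        # Keep the tags when the enclosed text has no Cyrillic letters.
        return inner if CYRILLIC_LETTER.search(inner) else match.group(0)

    return FONT_FACE_ELEMENT.sub(replace, source)


before = '<FONT FACE=Verdana>Привет</FONT> <FONT FACE="Arial">Hello</FONT>'
after = unwrap_font_face(before)
print(after)
```

It is safer to run such a script on a copy of the file and compare the result with the original in a plain text editor.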


Note. Creation of a KOI8-R page.

Even though most Russian-language Web pages nowadays are in the Cyrillic(Windows-1251) encoding, one can still develop a Russian page in the Cyrillic(KOI8-R) encoding.

As explained in the "Cyrillic Fonts and Encodings" section of my site "Cyrillic (Russian): instructions for Windows and Internet", modern applications such as Netscape 4+/Mozilla, Internet Explorer, Front Page 2000, etc. allow a user to work with the fonts and keyboard tools native to MS Windows, i.e. those of the "Cyrillic(Windows-1251)" encoding, and process KOI8-R automatically, without KOI8-R fonts and keyboard tools.
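To give an idea of what such automatic processing amounts to, here is a minimal Python sketch that turns a Windows-1251 page into a KOI8-R page. It is an illustration under my own assumptions, not a description of how any of the applications named above work internally. The function name `convert_1251_to_koi8r` is hypothetical.

```python
import re

# The charset value inside a META tag, with the text before it captured
# so that it can be put back unchanged.
CHARSET_IN_META = re.compile(
    r"(<meta[^>]+charset\s*=\s*[\"']?)windows-1251", re.IGNORECASE
)


def convert_1251_to_koi8r(raw_1251):
    """Re-encode Windows-1251 page bytes as KOI8-R and fix the META line."""
    text = raw_1251.decode("cp1251")
    text = CHARSET_IN_META.sub(r"\g<1>koi8-r", text)
    # Windows-1251 symbols that KOI8-R lacks (for example the "No." sign)
    # raise UnicodeEncodeError here; the sketch lets that error surface.
    return text.encode("koi8_r")


source_page = (
    '<META http-equiv="content-type" content="text/html; charset=windows-1251">'
    "<P>Привет</P>"
).encode("cp1251")

koi8_page = convert_1251_to_koi8r(source_page)
print(koi8_page.decode("koi8_r"))
```

The important detail is that the bytes and the announced charset are changed together; changing only one of them produces an unreadable page.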

It means the following for a Cyrillic HTML page developer:



If you develop a KOI8-R page without a WYSIWYG HTML editor, 'by hand', using a plain text editor, then:

