This page is Chapter 2 of the "Unicode and Cyrillic: problems & solutions" section of my site.


Unicode and Cyrillic: Copy/Paste and other problems



Problem: unreadable Cyrillic or just question marks (????) appear while working with MS Word 97 and newer or other Unicode-based programs
(Internet Explorer, Outlook Express, MS Outlook, Netscape 7/Mozilla, etc.).



Terminology Note.
Most of these problems do not exist in the 'Russian Windows' environment.
When I write "Russian version of Windows" below, I do not mean only the special, localized version where the word "Start" is in Russian.
What I mean is any Windows installation (even with an English interface) where the system code page (System Default locale) is Russian ("Cyrillic" code page 1251).
(The system code page issue is described in detail in the "Full Russification" section of my site.)

So, to avoid repeating this long description - "...when system code page..." - I call such a Windows installation a Russian Windows.



These new problems appear in modern software (such as Word 97/2000, Internet Explorer, etc.) because modern applications use a new type of data encoding - Unicode.

Many Windows applications are still non-Unicode programs and use legacy encodings such as "Western European, Code Page 1252" or "Cyrillic, Code Page 1251".

Examples of non-Unicode programs where you can type Cyrillic text are Netscape 4.79, UltraEdit, and Dreamweaver.



When you want to move Cyrillic text (perform Copy and Paste) between Word 97 (or another Unicode-based program) and a non-Unicode program, or want to work with a plain text file (.TXT) in Word 97/2000, you may face a problem:

working simultaneously with both types of Cyrillic text - text that uses Unicode and text that does not - often leads (under a non-Russian Windows) to unreadable, gibberish text or just a set of question marks instead of Cyrillic letters.


Below you will find the solutions for these problems.

Note.
I assume that you already know how to enable Cyrillic fonts and Cyrillic keyboard tools in your Windows.
If it's not the case, then do it before reading any further here.
To enable Cyrillic fonts and keyboard, read the "Cyrillic in Windows" section of my site.
           


  

Copy/Paste:
Unicode program ---> non-Unicode program

You try to copy some Cyrillic text from a Unicode program (e.g. Word 97 and newer, Internet Explorer, Outlook Express, MS Outlook 2000, Netscape 7/Mozilla) to a non-Unicode one (e.g. Netscape 4.79, UltraEdit, Dreamweaver)
and see just question marks (???) instead of Cyrillic as a result.

This usually happens under a non-Russian version of Windows (that is, where the System Code Page is not "Cyrillic, CP-1251").
Conversion from Unicode text to non-Unicode text is usually based on the System Code Page. Under a "Western" Windows installation (where the system code page is "Western European, CP-1252"), that code page contains no Cyrillic letters, so each Cyrillic letter is replaced with a question mark during the conversion.
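
The effect is easy to reproduce outside of Windows. The following is a minimal Python sketch, not what Windows or any of these programs literally runs: it assumes that the conversion behaves like Python's `encode` with the `replace` error handler, and the sample word is just an illustration.

```python
# Sketch: simulate a Unicode -> non-Unicode conversion where the target
# code page is picked by the system and not by the language of the text.
text = "Привет"  # Cyrillic text held as Unicode (sample word, an assumption)

# "Western" Windows: CP-1252 has no Cyrillic letters, so every letter
# is replaced with a question mark.
western = text.encode("cp1252", errors="replace")

# "Russian" Windows: CP-1251 has a byte for every Russian letter,
# so nothing is lost.
cyrillic = text.encode("cp1251")

print(western)                    # b'??????'
print(cyrillic.decode("cp1251"))  # Привет
```

Once the letters have become question marks, the original text cannot be recovered from the result, which is why the fix has to be applied before or during the copy.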

Solution: use an intermediate window - a program that understands Unicode and also lets you specify that you are dealing with "Cyrillic" and not "Western" encoding.

I suggest using one of the following programs of this type (the instructions for each are given below):


Netscape 4.7x as intermediate window for Copy/Paste

Netscape 4 can help to solve the problem during Copy/Paste from a Unicode program (e.g. Internet Explorer or Word 97/2000) to a non-Unicode program (e.g. a plain text editor or Dreamweaver):

Netscape Communicator 4 has a built-in HTML editor, Composer, that is well suited for this - it understands Unicode and also lets us specify that we are dealing with Cyrillic text and not "Western":

Now you can use this window as an intermediate one:



Unicode text editor UniPad as intermediate window for Copy/Paste

UniPad (freeware for personal use) can help to solve the problem during Copy/Paste from a Unicode program (e.g. Internet Explorer or MS Word 97/2000) to a non-Unicode program (e.g. a plain text editor or Dreamweaver):

Now you can use this window as an intermediate one:



   

Copy/Paste:
non-Unicode program ---> Unicode program

You try to copy some Cyrillic text from a non-Unicode program (e.g. Netscape 4.79, UltraEdit, or Macromedia Dreamweaver) to a Unicode program (e.g. Word 97 and newer, Internet Explorer, Outlook Express, MS Outlook 2000, Netscape 7/Mozilla) and see just unreadable (gibberish) text instead of Cyrillic as a result.

This usually happens under a non-Russian version of Windows (that is, where System Code Page is not "Cyrillic, CP-1251").
The Unicode program does not know that the incoming text is a Cyrillic one and is using system code page as a default during the conversion from non-Unicode text to Unicode text.
For example, under "Western" installation of Windows it looks at the incoming bytes as a sequence of "Western" encoding bytes and performs the conversion
    "Western European, CP-1252" ---> Unicode

For example:
The Cyrillic small letter 'd' contained in the original non-Unicode text has a byte value of 228 in the "Cyrillic, CP-1251" code page. But the Unicode program assumes that the incoming data belongs to the "Western" encoding! In the "Western, CP-1252" code page the value 228 is the German a-umlaut, so the following conversion takes place:
    non-Unicode German a-umlaut ---> Unicode German a-umlaut
and you will see a German a-umlaut in that Unicode program instead of the Russian 'd' after you paste the text there.
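
The byte 228 example can be checked with a few lines of Python. This is only a sketch of the two interpretations of the same byte; the variable names are mine, and Python's `cp1251` and `cp1252` codecs stand in for the Windows code pages.

```python
raw = bytes([228])  # the single byte stored in the non-Unicode text

as_cyrillic = raw.decode("cp1251")  # what the author of the text typed
as_western = raw.decode("cp1252")   # what a "Western" system assumes

print(as_cyrillic, hex(ord(as_cyrillic)))  # д 0x434
print(as_western, hex(ord(as_western)))    # ä 0xe4
```

The byte itself never changes; only the code page used to interpret it decides which Unicode character appears after the paste.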

There are 2 possible solutions to this situation. Some non-Unicode programs let you use very simple Solution 1, so just try it first, but if it does not work, then use Solution 2.

Note. Word 2000/XP has its own solution for text copied to a Word window - see the "MS macro Eefonts for Word 2000/XP" section below.


Solution 1
Use the following approach while copying the text from a non-Unicode program (e.g. Netscape 4.79, UltraEdit, Dreamweaver, etc.) to the Windows Clipboard: switch the keyboard to Cyrillic mode before copying.

This Solution 1 may not work for every non-Unicode program.
In that case, use Solution 2 below or, if the target program is Word 2000/XP, the Eefonts macro described in the next section.


MS macro Eefonts for Word 2000/XP

Microsoft offers a free macro that solves the problem of unreadable text copied from a non-Unicode program to a Word 2000/XP document.
The same macro helps to make readable an old Cyrillic .doc created in the past with, for example, non-Unicode Word 6.

Go to the Microsoft page (Knowledge base article Q260162)
"Incorrect Characters Appear When You Open Document in Earlier Eastern European Version of Word".

Find there a link to download Eefonts.exe.

Download and install it. Now in your Word 2000/XP you will have a new option under the Tools menu:
  Tools / Fix Broken Text

When you copy some Cyrillic from a non-Unicode program to Word 2000/XP, you will first see some gibberish text (as explained above).
You need to select that text and apply the new Tools / Fix Broken Text option to it.

Now you will have readable Cyrillic!
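
The idea behind such a repair can be sketched in Python. I do not know how the Eefonts macro is implemented, so this is only an illustration of the principle under one assumption: the gibberish was produced by reading CP-1251 bytes as CP-1252. The function name `fix_broken_text` is hypothetical.

```python
def fix_broken_text(garbled: str) -> str:
    """Undo a CP-1252 misreading of CP-1251 bytes (hypothetical helper)."""
    # Step back to the original bytes, then decode them as Cyrillic.
    return garbled.encode("cp1252").decode("cp1251")


original = "Привет"
# Simulate the faulty paste: Cyrillic bytes interpreted as "Western".
garbled = original.encode("cp1251").decode("cp1252")

print(garbled)                   # Ïðèâåò
print(fix_broken_text(garbled))  # Привет
```

The repair works here because gibberish, unlike question marks, still carries the original byte values. A few byte values are undefined in CP-1252, so text that used the matching CP-1251 characters may not survive this round trip.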


Solution 2
for non-Unicode --> Unicode copying case

The universal solution for the successful copy of Cyrillic text from a non-Unicode program (Netscape 4.79, Dreamweaver, plain text editor, etc.) to a Unicode one (Internet Explorer, Outlook, etc.) is the following:

Use an intermediate window such as a program that understands Unicode and also lets you specify that you are dealing with "Cyrillic" and not "Western" encoding.

I suggest using the editor UniPad (freeware for personal use) as such an intermediate program:

Now you can use this UniPad window while copying Cyrillic from a non-Unicode program (e.g. Netscape 4.79, Dreamweaver, or UltraEdit) to a Unicode program (MS Word 97/2000, Internet Explorer, Outlook Express, etc.):



 

Cyrillic in MS Word 97 and newer:
working with .TXT files

Problems with Cyrillic plain text (.TXT) files in Word happen under a non-Russian Windows, i.e. when the system code page is not "Cyrillic, CP-1251".

Plain text files (.TXT) contain non-Unicode text, so when Unicode-based Word 97/2000 deals with such files, it performs a conversion between Unicode text and non-Unicode text.
By default, this conversion uses the system code page, and therefore we see gibberish text or question marks if the system code page is, say, "Western, CP-1252" and not "Cyrillic".

The solution is to specify that the content of the plain text (.TXT) file belongs to the "Cyrillic" encoding and not to the system code page.

MS Word 2000 and newer has its own way to specify that, while Word 97 requires an intermediate program.
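
The same principle, naming the encoding explicitly instead of relying on the system default, can be sketched in Python. This is only an illustration and not how Word does it; the file name `sample.txt` and its content are hypothetical, and the file is created in a temporary folder so that the sketch is self-contained.

```python
import os
import tempfile

# Create a sample CP-1251 file (a stand-in for an existing Cyrillic .TXT).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write("Привет, мир".encode("cp1251"))

# Wrong: decode with a "Western" code page, as a Western system does by default.
with open(path, encoding="cp1252") as f:
    wrong = f.read()

# Right: state explicitly that the file is Cyrillic (Windows-1251).
with open(path, encoding="cp1251") as f:
    right = f.read()

print(wrong)  # gibberish made of accented Western letters
print(right)  # Привет, мир
```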


   
Here are the solutions for the two cases where plain text (.TXT) Cyrillic files are involved:


 
Opening a Cyrillic plain text (.TXT) file in MS Word 97 and newer

Let's assume that you have a Russian plain text (.TXT) file that contains text in the "Cyrillic CP-1251" encoding (a.k.a. "Cyrillic(Windows)" or "Windows-1251").

Word 2000 (and newer versions) allows you to specify that this file is really a Cyrillic one, while Word 97 requires a more complex approach.

MS Word 2000 and newer

MS Word 97

There are several possible solutions for loading a Cyrillic .TXT file into Word 97. Let's look at two of them:


 
Netscape-based method for loading Cyrillic .TXT into Word 97

In Netscape, do File/Open and choose "Text (.TXT)" as the "Files of Type".
Your Cyrillic .txt opens in Netscape. Change encoding to Cyrillic(Windows-1251):

Now you should see normal Cyrillic text and can safely copy it to Word 97.


 
Plain text non-Unicode editor-based method for loading a Cyrillic .TXT file into Word 97

Instead of opening your Cyrillic plain text (.TXT) file directly in Word 97, open it in any non-Unicode plain text editor and then use the Copy/Paste methods of this page to place the text into Word 97.
I use the shareware plain text editor UltraEdit, so you can download it too, or use your favorite plain text editor that works with Cyrillic.

Let's use UltraEdit as an example:



 

Saving Word 97+ Cyrillic text as a plain text (.TXT) file

Let's assume that you want to save a document opened in MS Word as a Russian plain text (.TXT) file.

Word 2000 (and newer versions) allows you to specify that the file should be saved as a Cyrillic one, while Word 97 requires a more complex approach.

MS Word 2000 and newer

That newly created plain text file contains normal Windows-1251 Cyrillic text and not question marks :)
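
The difference between the two outcomes can be seen in a short Python sketch. It is not Word's own procedure, just an illustration of saving the same text with and without the right encoding; the file names are hypothetical and the `replace` error handler is my assumption about how unsupported letters turn into question marks.

```python
import os
import tempfile

text = "Привет"
folder = tempfile.mkdtemp()
good_path = os.path.join(folder, "cyrillic.txt")  # hypothetical file names
bad_path = os.path.join(folder, "western.txt")

# Saved with the Cyrillic encoding named explicitly.
with open(good_path, "w", encoding="cp1251") as f:
    f.write(text)

# Saved through the "Western" code page: Cyrillic letters cannot be
# represented, so each one is written as a question mark.
with open(bad_path, "w", encoding="cp1252", errors="replace") as f:
    f.write(text)

with open(good_path, "rb") as f:
    good_bytes = f.read()
with open(bad_path, "rb") as f:
    bad_bytes = f.read()

print(good_bytes)  # b'\xcf\xf0\xe8\xe2\xe5\xf2'
print(bad_bytes)   # b'??????'
```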

MS Word 97

So you have some Cyrillic text in your open MS Word 97 window and want to save it as a plain text file.
Instead of creating this Cyrillic plain text (.TXT) file with Word 97, copy the text to any non-Unicode plain text editor and then do Save As there.
I use the shareware plain text editor UltraEdit, so you can download it too, or use your favorite plain text editor that works with Cyrillic.

Let's use UltraEdit as an example:



Paul Gorodyansky. 'Cyrillic (Russian): instructions for Windows and Internet'