How to use Unicode (UTF8) in Pajek

Unicode is supported in Pajek 2.00 (October 1, 2010) and later.

Preparing Unicode files for Pajek

To prepare Unicode (UTF-8) NET files, you must use an editor that supports Unicode.
Unicode UTF-8 files should be saved with a BOM (Byte Order Mark). In some editors the BOM is also called a signature.
BabelPad can be downloaded for free. When saving files in BabelPad, you must check the option Byte Order: Byte Order Mark.
Several other editors that support Unicode are available, such as EmEditor, Notepad++, SuperEdi...
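A NET file with a BOM can also be produced by a script instead of an editor. The following Python sketch builds a minimal NET file (a *Vertices section with quoted labels and an *Arcs section) and encodes it with Python's `utf-8-sig` codec, which prepends the UTF-8 BOM (bytes EF BB BF). The function names and the example labels are hypothetical; this is an illustration, not a Pajek tool.

```python
import codecs


def build_net(labels, arcs):
    """Return the text of a small Pajek NET file.

    labels: vertex labels; vertex i (1-based) gets labels[i - 1].
    arcs:   (source, target) pairs using 1-based vertex numbers.
    """
    lines = [f"*Vertices {len(labels)}"]
    for i, label in enumerate(labels, start=1):
        lines.append(f'{i} "{label}"')
    lines.append("*Arcs")
    for source, target in arcs:
        lines.append(f"{source} {target}")
    return "\n".join(lines) + "\n"


def to_utf8_with_bom(text):
    # The "utf-8-sig" codec writes the BOM (codecs.BOM_UTF8) before the text.
    return text.encode("utf-8-sig")


def save_net(path, labels, arcs):
    # Write raw bytes so the BOM and UTF-8 encoding are exactly as built above.
    with open(path, "wb") as f:
        f.write(to_utf8_with_bom(build_net(labels, arcs)))
```

For example, `save_net("example.net", ["Ljubljana", "Kraków", "Zürich"], [(1, 2), (2, 3)])` writes a file that starts with the three BOM bytes followed by UTF-8 text.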

Reading Unicode files

  • Pajek recognizes and reads Unicode UTF-8 files if they start with a proper BOM (try unicode1.net).
  • Pajek also reads ordinary (non-Unicode) files in which non-ASCII characters are written as &#dddd; (decimal character references). While such files are read, these references are converted to Unicode characters (try unicode2.net).
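The two reading rules above can be sketched in Python: if the raw bytes start with the UTF-8 BOM, decode them as UTF-8; otherwise treat the file as ordinary text and replace each &#dddd; with the character it denotes. This is not Pajek's code. The function names are hypothetical, and decoding non-BOM files as Latin-1 is an assumption made only so that stray 8-bit bytes do not raise an error.

```python
import codecs
import re

# Decimal character references such as &#269; (č).
_NCR = re.compile(r"&#(\d+);")


def decode_ncr(text):
    """Replace every &#dddd; reference with the Unicode character it denotes."""
    return _NCR.sub(lambda m: chr(int(m.group(1))), text)


def read_net_text(data):
    """Decode the raw bytes of a NET file following the two rules above.

    Files starting with the UTF-8 BOM are decoded as UTF-8 (BOM dropped);
    other files are read as ordinary text whose &#dddd; references are
    converted to Unicode characters.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # Assumption: Latin-1 so any 8-bit byte decodes; Pajek may differ here.
    return decode_ncr(data.decode("latin-1"))
```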

Reading networks without labels and adding labels

Since Unicode labels can be very long and therefore use a lot of memory, for larger networks it is better not to load labels into Pajek until they are really needed (e.g. once a small network that can be visualized has been extracted). To add labels stored in a Unicode (UTF-8) file you must:
  • Read the file without labels (before reading the network, uncheck Options/Read-Write/Read-Save vertices labels), then select Network/Create New Network/Transform/Add/Vertex Labels/from File(s); or, if the network is already loaded in Pajek:
  • Use Network/Create New Network/Transform/Add/Vertex Labels/Default, then select Network/Create New Network/Transform/Add/Vertex Labels/from File(s) and choose the Unicode NET file.
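The memory-saving idea behind loading labels late can be illustrated outside Pajek with a Python sketch that scans a NET file's *Vertices section and keeps only the labels of the vertices you actually need (for example, those of an extracted subnetwork). This is not how Pajek implements Vertex Labels/from File(s). The function name, matching vertices by number, and the assumption that every label is double-quoted are all illustrative.

```python
import re

# Assumed vertex line format:  12 "Kraków" 0.1 0.2 0.5  (label in double quotes).
_VERTEX = re.compile(r'^\s*(\d+)\s+"([^"]*)"')


def labels_for(net_text, wanted):
    """Return {vertex number: label} for the vertices in `wanted` only.

    Only lines inside the *Vertices section are examined, and labels of
    vertices not in `wanted` are never stored, which keeps memory low.
    """
    result = {}
    in_vertices = False
    for line in net_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("*"):
            # A new section begins; only *Vertices holds labels.
            in_vertices = stripped.lower().startswith("*vertices")
            continue
        if not in_vertices:
            continue
        m = _VERTEX.match(line)
        if m and int(m.group(1)) in wanted:
            result[int(m.group(1))] = m.group(2)
    return result
```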

Saving as Unicode files

By default, networks are saved as ASCII files in which non-ASCII characters are always written as &#dddd;. To save files as native Unicode UTF-8, you must check Options/Read-Write/Save Files as Unicode UTF8 with BOM. This option stays checked the next time you run Pajek, unless you uncheck it.
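The two save modes described above can be mimicked in Python as a sketch: the default mode writes plain ASCII with each non-ASCII character replaced by its decimal reference &#dddd;, and the Unicode mode encodes the text as UTF-8 preceded by a BOM. The function names are hypothetical; Python's built-in `xmlcharrefreplace` error handler produces the same decimal references and is used in the test as a cross-check.

```python
def encode_ncr(text):
    """Write every non-ASCII character as a decimal reference &#dddd;."""
    # Same result as text.encode("ascii", "xmlcharrefreplace").decode("ascii").
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def net_bytes(text, save_as_unicode):
    """Encode NET text in one of the two save modes described above.

    save_as_unicode=False: plain ASCII, non-ASCII written as &#dddd; (default).
    save_as_unicode=True:  UTF-8 preceded by a BOM; "utf-8-sig" adds the BOM.
    """
    if save_as_unicode:
        return text.encode("utf-8-sig")
    return encode_ncr(text).encode("ascii")
```

For example, `net_bytes('1 "Kraków"', False)` gives `b'1 "Krak&#243;w"'`, since ó is code point 243.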

Selecting fonts

When working with Unicode in Pajek, you can select both a proportional and a monospaced font (Options/Font). At the time of writing, Arial Unicode MS and Lucida Sans Unicode seem to be the proportional fonts that cover the most characters.
For the monospaced font we recommend GNU Unifont. Another monospaced font that covers quite a lot of characters is Courier New.
Pajek uses the monospaced font in the Report window and in other windows where reports are column-aligned (e.g. Network/Info/General), and the proportional font everywhere else, including the Draw window. When you change the font and click on the Report window, its contents are refreshed using the newly selected font.

Also

Exports to SVG and X3D support Unicode.
From version 2.00 on, Pajek also reads Unix files (files with LF line endings).
From version 3.11 on, tabs (in addition to spaces) are accepted as delimiters in NET and other input files. This is useful, for example, when copy-pasting data from Excel: tabs no longer need to be replaced by spaces.
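The line-ending and delimiter rules above can be illustrated with a small Python tokenizer sketch. It is an assumption-based illustration, not Pajek's parser: it treats any run of spaces and/or tabs as one delimiter, keeps double-quoted labels whole, and accepts both Unix (LF) and Windows (CR LF) line endings. The names `tokens` and `rows` are hypothetical.

```python
import re

# A token is either a double-quoted label or a run of non-whitespace characters.
# \s matches both spaces and tabs, so either one (or a mix) acts as a delimiter.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def tokens(line):
    """Split one input line on spaces and/or tabs, keeping quoted labels whole."""
    out = []
    for m in _TOKEN.finditer(line):
        # group(1) is the quoted label (quotes removed); group(2) a bare token.
        out.append(m.group(1) if m.group(1) is not None else m.group(2))
    return out


def rows(text):
    """Tokenize every non-blank line; splitlines() accepts LF and CR LF endings."""
    return [tokens(line) for line in text.splitlines() if line.strip()]
```

A tab-separated row pasted from a spreadsheet, such as `1\t"Kraków"\t0.1`, and the space-separated `1 "Kraków" 0.1` yield the same tokens.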