Right to Left Text support in Ibex
Ibex will render Arabic text. Support for right to left (rtl) text includes:
- mirroring - where a character such as '(' is reversed so that it is still correct when the text is read right to left;
- shaping of Arabic text - where a character changes shape depending on its surrounding characters;
- support for the Unicode Bidirectional Algorithm, including the explicit embedding characters LRO, RLO, LRE, RLE and PDF.
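As an illustration of the explicit embedding characters, the following fragment is a minimal sketch and is not taken from the Ibex manual. It wraps two Arabic letters in LRO (U+202D) and PDF (U+202C), entered as character references. Under the Unicode Bidirectional Algorithm an LRO override asks for the enclosed characters to be laid out left to right. The font, the font size and the namespace declaration on the block are assumptions, made so that the fragment is well-formed on its own.

```xml
<!-- Sketch only: place inside an fo:flow. The xmlns:fo declaration is
     repeated here so that the fragment parses on its own. -->
<!-- &#x202D; = LRO (left-to-right override)
     &#x202C; = PDF (pop directional formatting) -->
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          font-family="arial" font-size="30pt">&#x202D;&#x645; &#x643;&#x202C;</fo:block>
```

With the override in place, U+0645 would be expected to appear to the left of U+0643, the opposite of the default right to left ordering. This has not been checked against actual Ibex output.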
Text is read from the XSL-FO file in its natural order, by which we mean the order in which the characters would be written. This means that for a line of Arabic text (i.e. right to left) the first character on the line (which will be displayed at the right hand end) is the first character in the XML.
Ibex can determine the direction of text from the letters which make up the text. It is not necessary to use direction="rtl" to specify text direction.
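A minimal sketch of what this means in practice, assuming the same font as the example further down this page: the block below sets neither direction nor writing-mode, and mixes Latin text with two Arabic letters entered as character references. The namespace declaration is repeated only so that the fragment is well-formed on its own.

```xml
<!-- Sketch only: no direction or writing-mode property is set here -->
<!-- Content is Latin text, then U+0645 U+0643, then more Latin text -->
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          font-family="arial" font-size="30pt">PDF &#x645;&#x643; FO</fo:block>
```

Because direction is taken from the letters themselves, the two Latin words should keep their left to right order while the Arabic pair between them is laid out right to left. The exact rendering has not been verified against Ibex output.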
Specifying writing-mode="rl-tb" tells Ibex which side of an element is the start edge. This affects (a) the order in which fo:table-cell elements are positioned across the fo:table-row, and (b) the effect of properties such as border-start-width. When writing-mode="lr-tb" the start edge is the left hand edge; when writing-mode="rl-tb" the start edge is the right hand edge.
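The two effects of writing-mode can be sketched with a small table. This fragment is an illustration only: the column widths, border values and cell text are assumptions, and the writing-mode is set on an enclosing fo:block-container in the same way as the example further down this page.

```xml
<!-- Sketch only: place inside an fo:flow. The xmlns:fo declaration is
     repeated here so that the fragment parses on its own. -->
<fo:block-container xmlns:fo="http://www.w3.org/1999/XSL/Format"
                    writing-mode="rl-tb">
  <fo:table>
    <fo:table-column column-width="8cm"/>
    <fo:table-column column-width="8cm"/>
    <fo:table-body>
      <fo:table-row>
        <!-- First cell in the XML: positioned at the right hand side
             when the writing-mode is rl-tb -->
        <fo:table-cell border-start-style="solid" border-start-width="2pt">
          <fo:block>first</fo:block>
        </fo:table-cell>
        <!-- Second cell in the XML: positioned to the left of the first -->
        <fo:table-cell>
          <fo:block>second</fo:block>
        </fo:table-cell>
      </fo:table-row>
    </fo:table-body>
  </fo:table>
</fo:block-container>
```

With writing-mode="rl-tb" the border set through border-start-width should be drawn on the right hand edge of the first cell. Changing the value to "lr-tb" should move the first cell, and its start border, to the left.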
For clarity the Arabic text in the example which follows is entered as Unicode values such as &#x645;. This prevents your browser from applying any formatting and makes the example easier to follow. In normal usage Arabic text would be entered as characters, not Unicode values.
Example
The following code displays two Arabic characters separated by a space:
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
        page-height="29.7cm"
        page-width="21cm" margin="2cm">
      <fo:region-body margin-top="3cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block-container writing-mode="rl-tb">
        <fo:block font-family="arial" font-size="30pt">
          &#x645; &#x643;
        </fo:block>
      </fo:block-container>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
The two characters used are:
Unicode Value | Name | Appearance | Unicode chart |
---|---|---|---|
U+0645 | ARABIC LETTER MEEM | م | http://www.unicode.org/charts/PDF/U0600.pdf |
U+0643 | ARABIC LETTER KAF | ك | http://www.unicode.org/charts/PDF/U0600.pdf |
Ordering
The two characters appear in the above FO in the order U+0645 U+0643. When Ibex renders the PDF it recognises that the text is Arabic and reverses the order of the characters in the text, to produce this (note the space between the characters):
Shaping
The above image has a space between the two characters. If this space is removed, script shaping takes place and the glyphs change to reflect the position of each character in the word. Each character has four possible forms:
- initial - when the character is the first character in the word
- medial - when the character is in the middle of the word
- final - when the character is the last character in the word
- isolated - when the character is by itself
When the space between the two letters is removed Ibex applies script shaping and produces the following text:
Script shaping has changed each of the characters as shown in this table:
Original Unicode Value | Old Name | Old Appearance | New Unicode Value | New Name | New Appearance |
---|---|---|---|---|---|
U+0643 | ARABIC LETTER KAF | ك | U+FEDA | ARABIC LETTER KAF FINAL FORM | ﻚ |
U+0645 | ARABIC LETTER MEEM | م | U+FEE3 | ARABIC LETTER MEEM INITIAL FORM | ﻣ |
Which letter is converted to the initial or final form is calculated reading right to left, so the rightmost character in a word is the initial character and the leftmost character is the final one.
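The input that triggers the shaping described above can be sketched as the block from the earlier example with the space removed. The two letters are written as character references (U+0645 then U+0643), and the namespace declaration is repeated only so that the fragment is well-formed on its own.

```xml
<!-- Sketch only: the same two letters as the example above, with no
     space between them, so they form a single two-letter word -->
<fo:block-container xmlns:fo="http://www.w3.org/1999/XSL/Format"
                    writing-mode="rl-tb">
  <!-- U+0645 comes first in the XML, so it is the initial letter;
       U+0643 comes last, so it is the final letter -->
  <fo:block font-family="arial" font-size="30pt">&#x645;&#x643;</fo:block>
</fo:block-container>
```

The shaped forms U+FEE3 and U+FEDA never appear in the XSL-FO. The source keeps the base letters, and the contextual form is chosen when the text is rendered.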
Old .Net implementation details
This section applies only to versions of Ibex before version 6.0.
The .Net version of Ibex ships with an assembly called something like ibexshaping20.dll. The exact name will depend on the .Net framework you are using and whether your code is compiled for 32 or 64 bits (see the Ibex manual for details).
This assembly contains a C++ wrapper for the Windows Uniscribe API which is used to do the shaping. This assembly is loaded using reflection, so if no right to left text will be processed by your application you do not need to deploy this assembly.
Known issues
In some cases shaping Arabic text with different applications will not produce identical results. Specifically we have seen instances where Internet Explorer and Mozilla will display the same Arabic text differently.