Add support for Arabic script and RTL languages #57

Open
Heshmatkhah opened this issue Apr 14, 2025 · 1 comment

Heshmatkhah commented Apr 14, 2025

Hi,
Thanks for your work on replacing wkhtmltox.
As in odoo/odoo#205137, you have started migrating Odoo to paper-muncher, and that's great.
But as far as I have tested, it only supports Latin script.

Here is a screenshot from Chrome of some text I copied from the Vazirmatn home page. I expect the output to look similar to this:

Expected output screenshot

Here are some key points:

  • The RTL: Some languages, such as Persian, Arabic, Hebrew, and Urdu, are written right to left (RTL).

  • The Font: Default OS fonts are often poor at displaying Persian/Arabic/Hebrew characters; most of the time they end up printing the replacement character (U+FFFD) or an empty box (U+25AF). We almost always need custom fonts for these languages.

  • The Cursive: As described on the Wikipedia page, some letters in Arabic script change shape according to their surrounding characters. Here is an example:
    سلام = س + ل + ا + م (in this example م didn't change, but the other 3 did)

    • Correct: (image: Arabic script cursive example)

    • Wrong cursive + RTL: (image)

    • Wrong cursive + RTL + font: (image)
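
To make the RTL point a bit more concrete: before laying out a paragraph, a renderer has to decide its base direction. Below is a minimal sketch of the "first strong character" heuristic (similar in spirit to what HTML `dir="auto"` does), using only Python's standard `unicodedata` module. The function name `base_direction` is made up for illustration and is not part of paper-muncher; a full implementation would follow the Unicode Bidirectional Algorithm, which also handles embeddings, isolates, and mixed runs.

```python
import unicodedata

def base_direction(text, default="ltr"):
    """Guess paragraph direction from the first strongly directional character.

    Simplified sketch: ignores explicit embedding/isolate controls.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-like and Arabic-like letters
            return "rtl"
        if bidi_class == "L":          # Latin and other left-to-right letters
            return "ltr"
    # Only digits, spaces, or punctuation were found: no strong character.
    return default

# "فونت Roboto" starts with an Arabic letter, "Roboto فونت" with a Latin one.
print(base_direction("\u0641\u0648\u0646\u062a Roboto"))
print(base_direction("Roboto \u0641\u0648\u0646\u062a"))
```

In the sample page below, the direction is given explicitly with `dir="rtl"`, so a heuristic like this would only matter for text without an explicit direction.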

You can take a look at Arabic Reshaper and how it preprocesses text before passing it to Pillow.
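
As a rough illustration of that kind of preprocessing, here is a minimal sketch that picks the contextual form of each letter and maps it to the matching Unicode presentation form, using only the standard library. This is not Arabic Reshaper's actual code: the `RIGHT_JOINING` table and the function names are assumptions made for this example, the table is incomplete, and the sketch ignores diacritics, ZWNJ, the non-joining hamza, and ligatures such as لا. A real text stack would normally do this with OpenType shaping instead.

```python
import unicodedata

# Letters that join only to the preceding letter (assumed, incomplete table).
RIGHT_JOINING = {
    "\u0622", "\u0623", "\u0624", "\u0625", "\u0627",  # alef variants, waw with hamza
    "\u0629", "\u062f", "\u0630", "\u0631", "\u0632",  # teh marbuta, dal, thal, reh, zain
    "\u0648", "\u0698",                                # waw, jeh
}

def is_arabic_letter(ch):
    return unicodedata.name(ch, "").startswith("ARABIC LETTER")

def joining_forms(word):
    """Label each character as isolated, initial, medial, or final."""
    forms = []
    for i, ch in enumerate(word):
        if not is_arabic_letter(ch):
            forms.append("none")
            continue
        prev = word[i - 1] if i > 0 else None
        nxt = word[i + 1] if i + 1 < len(word) else None
        # A letter connects backwards only if the previous letter can join forward.
        joins_prev = prev is not None and is_arabic_letter(prev) and prev not in RIGHT_JOINING
        # A letter connects forwards only if it is not right-joining itself.
        joins_next = nxt is not None and is_arabic_letter(nxt) and ch not in RIGHT_JOINING
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms

def to_presentation_forms(word):
    """Replace each letter with its presentation-form code point when one exists."""
    shaped = []
    for ch, form in zip(word, joining_forms(word)):
        try:
            shaped.append(unicodedata.lookup(f"{unicodedata.name(ch)} {form.upper()} FORM"))
        except (KeyError, ValueError):
            shaped.append(ch)  # no presentation form known: keep the original
    return "".join(shaped)

word = "\u0633\u0644\u0627\u0645"  # سلام
print(joining_forms(word))
print([unicodedata.name(c) for c in to_presentation_forms(word)])
```

For سلام this labels م as isolated, which matches the observation above that م keeps its shape while the other three letters change.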

HTML Code

Here is an example page I tried to print/render using paper-muncher:
<!DOCTYPE html>
<html lang="fa" dir="rtl">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>فونت وزیرمتن</title>
    <!--
        paper-muncher doesn't support loading external stylesheets!
        <link href="https://fonts.googleapis.com/css2?family=Vazirmatn:wght@400;700&display=swap" rel="stylesheet">
    -->

    <style>
        /* Copy of google font stylesheet content: */
        /* arabic */
        @font-face {
            font-family: 'Vazirmatn';
            font-style: normal;
            font-weight: 400;
            font-display: swap;
            src: url(https://fonts.gstatic.com/s/vazirmatn/v13/Dxxo8j6PP2D_kU2muijlGMWWIGroe7ll.woff2) format('woff2');
            unicode-range: U+0600-06FF, U+0750-077F, U+0870-088E, U+0890-0891, U+0897-08E1, U+08E3-08FF, U+200C-200E, U+2010-2011, U+204F, U+2E41, U+FB50-FDFF, U+FE70-FE74, U+FE76-FEFC, U+102E0-102FB, U+10E60-10E7E, U+10EC2-10EC4, U+10EFC-10EFF, U+1EE00-1EE03, U+1EE05-1EE1F, U+1EE21-1EE22, U+1EE24, U+1EE27, U+1EE29-1EE32, U+1EE34-1EE37, U+1EE39, U+1EE3B, U+1EE42, U+1EE47, U+1EE49, U+1EE4B, U+1EE4D-1EE4F, U+1EE51-1EE52, U+1EE54, U+1EE57, U+1EE59, U+1EE5B, U+1EE5D, U+1EE5F, U+1EE61-1EE62, U+1EE64, U+1EE67-1EE6A, U+1EE6C-1EE72, U+1EE74-1EE77, U+1EE79-1EE7C, U+1EE7E, U+1EE80-1EE89, U+1EE8B-1EE9B, U+1EEA1-1EEA3, U+1EEA5-1EEA9, U+1EEAB-1EEBB, U+1EEF0-1EEF1;
        }

        /* latin-ext */
        @font-face {
            font-family: 'Vazirmatn';
            font-style: normal;
            font-weight: 400;
            font-display: swap;
            src: url(https://fonts.gstatic.com/s/vazirmatn/v13/Dxxo8j6PP2D_kU2muijlE8WWIGroe7ll.woff2) format('woff2');
            unicode-range: U+0100-02BA, U+02BD-02C5, U+02C7-02CC, U+02CE-02D7, U+02DD-02FF, U+0304, U+0308, U+0329, U+1D00-1DBF, U+1E00-1E9F, U+1EF2-1EFF, U+2020, U+20A0-20AB, U+20AD-20C0, U+2113, U+2C60-2C7F, U+A720-A7FF;
        }

        /* latin */
        @font-face {
            font-family: 'Vazirmatn';
            font-style: normal;
            font-weight: 400;
            font-display: swap;
            src: url(https://fonts.gstatic.com/s/vazirmatn/v13/Dxxo8j6PP2D_kU2muijlHcWWIGroew.woff2) format('woff2');
            unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6, U+02DA, U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+20AC, U+2122, U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
        }

        /* arabic */
        @font-face {
            font-family: 'Vazirmatn';
            font-style: normal;
            font-weight: 700;
            font-display: swap;
            src: url(https://fonts.gstatic.com/s/vazirmatn/v13/Dxxo8j6PP2D_kU2muijlGMWWIGroe7ll.woff2) format('woff2');
            unicode-range: U+0600-06FF, U+0750-077F, U+0870-088E, U+0890-0891, U+0897-08E1, U+08E3-08FF, U+200C-200E, U+2010-2011, U+204F, U+2E41, U+FB50-FDFF, U+FE70-FE74, U+FE76-FEFC, U+102E0-102FB, U+10E60-10E7E, U+10EC2-10EC4, U+10EFC-10EFF, U+1EE00-1EE03, U+1EE05-1EE1F, U+1EE21-1EE22, U+1EE24, U+1EE27, U+1EE29-1EE32, U+1EE34-1EE37, U+1EE39, U+1EE3B, U+1EE42, U+1EE47, U+1EE49, U+1EE4B, U+1EE4D-1EE4F, U+1EE51-1EE52, U+1EE54, U+1EE57, U+1EE59, U+1EE5B, U+1EE5D, U+1EE5F, U+1EE61-1EE62, U+1EE64, U+1EE67-1EE6A, U+1EE6C-1EE72, U+1EE74-1EE77, U+1EE79-1EE7C, U+1EE7E, U+1EE80-1EE89, U+1EE8B-1EE9B, U+1EEA1-1EEA3, U+1EEA5-1EEA9, U+1EEAB-1EEBB, U+1EEF0-1EEF1;
        }

        /* latin-ext */
        @font-face {
            font-family: 'Vazirmatn';
            font-style: normal;
            font-weight: 700;
            font-display: swap;
            src: url(https://fonts.gstatic.com/s/vazirmatn/v13/Dxxo8j6PP2D_kU2muijlE8WWIGroe7ll.woff2) format('woff2');
            unicode-range: U+0100-02BA, U+02BD-02C5, U+02C7-02CC, U+02CE-02D7, U+02DD-02FF, U+0304, U+0308, U+0329, U+1D00-1DBF, U+1E00-1E9F, U+1EF2-1EFF, U+2020, U+20A0-20AB, U+20AD-20C0, U+2113, U+2C60-2C7F, U+A720-A7FF;
        }

        /* latin */
        @font-face {
            font-family: 'Vazirmatn';
            font-style: normal;
            font-weight: 700;
            font-display: swap;
            src: url(https://fonts.gstatic.com/s/vazirmatn/v13/Dxxo8j6PP2D_kU2muijlHcWWIGroew.woff2) format('woff2');
            unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6, U+02DA, U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+20AC, U+2122, U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
        }

        body {
            font-family: 'Vazirmatn', sans-serif;
            background-color: #f9f9f9;
            color: #333;
            max-width: 800px;
            margin: 50px auto;
            padding: 20px;
            line-height: 2;
            font-size: 1.2em;
        }
    </style>
</head>
<body>
<h1>فونت وزیرمتن</h1>
<p style="background-color: lightgray; border-radius: 1rem; padding: 1rem;">
    پروژه وزیرمتن یک خانواده تایپ‌فیس فارسی-عربی با ۹ وزن است که در سال ۱۳۹۴ با نام «وزیر» آغاز شد و در طول این سال‌ها طراحی و توسعه آن ادامه یافت. فونت وزیرمتن شکلی ساده و روان دارد و می‌توان از آن در اغلب زمینه‌ها استفاده کرد. برای حروف لاتین از فونت Roboto استفاده شده است. این یک نرم افزار آزاد و متن‌باز است.
</p>
</body>
</html>
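
The `unicode-range` descriptors in the page above decide which `@font-face` serves which character, so a renderer has to map every character to a face before it can draw it. Here is a minimal sketch of that lookup in Python. The helper names `parse_unicode_range` and `covers` are hypothetical, and this is not how paper-muncher works internally; it only shows the matching rule.

```python
def parse_unicode_range(descriptor):
    """Parse a CSS unicode-range value into a list of (start, end) code points."""
    ranges = []
    for part in descriptor.split(","):
        part = part.strip()
        if not part:
            continue
        if not part.upper().startswith("U+"):
            raise ValueError(f"not a unicode-range token: {part!r}")
        body = part[2:]
        if "?" in body:
            # Wildcard form such as U+4?? covers U+400 through U+4FF.
            start, end = body.replace("?", "0"), body.replace("?", "F")
        elif "-" in body:
            start, end = body.split("-", 1)
        else:
            start = end = body
        ranges.append((int(start, 16), int(end, 16)))
    return ranges

def covers(ranges, ch):
    """Return True if the character's code point falls inside any range."""
    cp = ord(ch)
    return any(start <= cp <= end for start, end in ranges)

# Shortened versions of the "arabic" and "latin" ranges from the page above.
arabic = parse_unicode_range("U+0600-06FF, U+200C-200E, U+204F")
latin = parse_unicode_range("U+0000-00FF, U+0131")

for ch in ("\u0641", "R", "\u200c"):  # ف, Latin R, zero-width non-joiner
    face = "arabic" if covers(arabic, ch) else "latin" if covers(latin, ch) else "fallback"
    print(f"U+{ord(ch):04X} -> {face}")
```

The zero-width non-joiner (U+200C) is included because the sample paragraph uses it, for example in سال‌ها.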

And here is the output of paper-muncher r persian-font-rtl-style-test.html -o persian-font-rtl-style-test.png:

Image showing rendering error

You can find the output of paper-muncher p persian-font-rtl-style-test.html -o persian-font-rtl-style-test.pdf here.

Louciole (Member) commented Apr 14, 2025

Hello!
Thanks for the great issue!
We're already aware of it.
For now, we are improving compliance for the most common use cases.
You can find the list of wanted properties here.
Language compliance will come afterward, and we will get back to you then!
I'll leave this issue open until it works.
