Description
I am trying to parse Arabic texts using the pretrained model (PADT), but some multi-sentence passages are segmented as a single sentence.
For example, the following Arabic passage is returned as one sentence:
ﻮﺒﺳﺮﻋﺓ ﺖﺒﻌﺘﻫ ﺄﻠﻴﺳ ﻮﺴﻘﻄﺗ ﻑﻯ ﻦﻔﻗ ﻁﻮﻴﻟ ﺎﻨﺘﻫﻯ ﺐﻫﺍ ﺈﻟﻯ ﺏﻻﺩ ﺎﻠﻌﺟﺎﺌﺑ, ﻭﺈﻟﻯ ﻉﺎﻠﻣ ﻢﺜﻳﺭ ﻢﻧ ﺎﻠﻤﻏﺎﻣﺭﺎﺗ. . . ﻒﻬﻳﺍ ﻦﻠﺤﻗ ﺐﻫﺍ ﻞﻨﺧﻮﺿ ﻢﻌﻫﺍ ﺖﻠﻛ ﺎﻟﺮﺤﻟﺓ ﺎﻠﻣﺪﻬﺷﺓ . ﺎﻠﻔﺼﻟ ﺍﻷﻮﻟ ﺎﻠﺴﻗﻮﻃ ﻑﻯ ﺞﺣﺭ ﺍﻷﺮﻨﺑ ﺏﺩﺃ ﺎﻠﻤﻠﻟ ﻲﺴﻴﻃﺭ ﻊﻟﻯ ﺄﻠﻴﺳ ﻮﻬﻳ ﺖﺠﻠﺳ ﺏﺎﻠﻗﺮﺑ ﻢﻧ ﺄﺨﺘﻫﺍ ﻊﻟﻯ ﺾﻓﺓ ﺎﻠﻨﻫﺭ، ﻻ ﺖﻔﻌﻟ ﺶﻴﺋًﺍ ﺱﻭﻯ ﺈﻠﻗﺍﺀ ﻦﻇﺭﺓ ﺥﺎﻄﻓﺓ ﺐﻴﻧ ﺎﻠﺤﻴﻧ ﻭﺍﻶﺧﺭ ﻊﻟﻯ ﺎﻠﻜﺗﺎﺑ ﺎﻟﺫﻯ ﺖﻃﺎﻠﻌﻫ ﺄﺨﺘﻫﺍ، ﻞﻜﻨﻫ ﻙﺎﻧ ﻚﺗﺎﺑﺍ ﺏﻻ ﺹﻭﺭ ﻮﻳﻻ ﺡﻭﺍﺭ؛ ﻒﺣﺪﺜﺗ ﻦﻔﺴﻫﺍ ﻕﺎﺌﻟﺓٌ ﻮﻣﺍ ﻑﺎﺋﺩﺓ ﻚﺗﺎﺑ ﺥﺎﻟ ﻢﻧ ﺎﻠﺻﻭﺭ ﻮﻤﻧ ﺎﻠﺣﻭﺍﺭ؟ ﻭﺄﺧﺬﺗ ﺖﻔﻛﺭ (ﻕﺩﺭ ﻡﺍ ﺎﺴﺘﻃﺎﻌﺗ؛ ﻒﺷﺩﺓ ﺎﻠﺣﺭﺍﺭﺓ ﺞﻌﻠﺘﻫﺍ ﺖﺸﻋﺭ ﺐﻨﻋﺎﺳ ﺵﺪﻳﺩ ﻮﺘﺒﻟﺩ)... ﻪﻟ ﺺﻨﻋ ﻊﻗﺩ ﻢﻧ ﺰﻫﺭﺓ ﺎﻟﺮﺒﻴﻋ ﻲﺴﺘﺤﻗ ﺎﻠﻨﻫﻮﺿ ﻮﻘﻄﻓ ﺍﻷﺰﻫﺍﺭ؟ ﻮﻔﺟﺃﺓ! ﻞﻤﺤﺗ ﺃﺮﻨﺑًﺍ ﺄﺒﻴﺿ ﻞﻫ ﻊﻴﻧﺎﻧ ﻭﺭﺪﻴﺗﺎﻧ ﻲﻣﺭ ﺏﺎﻠﻗﺮﺑ ﻢﻨﻫﺍ ﻞﻣ ﺖﺴﺘﻏﺮﺑ ﺄﻠﻴﺳ ﻝﺬﻠﻛ ﻭﻻ ﻞﺴﻣﺎﻋ ﺍﻷﺮﻨﺑ ﻮﻫﻯ ﻲﺣﺪﺛ ﻦﻔﺴﻫ ﻕﺎﺋﻻ ﻱﺍ ﺈﻠﻫﻯ! ﻱﺍ ﺈﻠﻫﻯ! ﺱﻮﻓ ﺄﺗﺄﺧﺭ (ﻮﺤﻴﻧ ﻒﻛﺮﺗ ﻑﻯ ﺬﻠﻛ ﻒﻴﻣﺍ ﺐﻋﺩ ﺦﻃﺭ ﻞﻫﺍ ﺄﻨﻫ ﻙﺎﻧ ﻊﻠﻴﻫﺍ ﺄﻧ ﺖﺴﺘﻏﺮﺑ ﺍﻸﻣﺭ، ﻞﻜﻧ ﻚﻟ ﺬﻠﻛ ﺏﺩﺍ ﻂﺒﻴﻌﻳﺍ ﺝﺩﺍ ﺂﻧﺫﺎﻛ) ﻮﻠﻜﻧ ﻊﻧﺪﻣﺍ ﺄﺧﺮﺟ ﺍﻷﺮﻨﺑ ﺱﺎﻋﺓ ﻢﻧ ﺞﻴﺑ ﺹﺩﺍﺮﻫ ﻮﻨﻇﺭ ﻒﻴﻫﺍ ﺚﻣ ﻢﺿﻯ ﻢﺳﺮﻋﺍ ﻮﻘﻔﺗ ﺄﻠﻴﺳ ﻑﻯ ﺎﻧﺪﻫﺎﺷ؛ ﺇﺫ ﺦﻃﺭ ﻞﻫﺍ ﺄﻨﻫﺍ ﻞﻣ ﺖﺷﺎﻫﺩ ﻖﻃ ﺃﺮﻨﺑﺍ ﻝﺪﻴﻫ ﺞﻴﺑ ﺹﺩﺍﺭ ﻭﻻ ﺱﺎﻋﺓ ﻲﺧﺮﺠﻫﺍ ﻢﻧ ﺬﻠﻛ ﺎﻠﺠﻴﺑ ﻮﻤﻧ ﺵﺩﺓ ﻒﺿﻮﻠﻫﺍ ﺝﺮﺗ ﻊﺑﺭ ﺎﻠﺤﻘﻟ ﻢﺘﺘﺒﻋﺓ ﺍﻷﺮﻨﺑ ﻮﻠﺤﺴﻧ ﺢﻈﻫﺍ ﻞﺤﻘﺗ ﺐﻫ ﻮﻫﻭ ﻲﺨﺘﻓﻯ ﺐﺳﺮﻋﺓ ﻑﻯ ﺞﺣﺭ ﻚﺒﻳﺭ ﺖﺤﺗ ﺎﻠﺳﻭﺭ. ﺎﻧﺰﻠﻘﺗ ﺄﻠﻴﺳ ﻭﺭﺍﺀﻩ ﺩﻮﻧ ﺄﻧ ﺖﺗﻮﻘﻓ ﻞﺤﻇﺓ ﻞﺘﻔﻛﺭ ﻚﻴﻓ ﺲﺘﺘﻤﻜﻧ ﻢﻧ ﺎﻠﺧﺭﻮﺟ ﺐﻋﺩ ﺬﻠﻛ. ﺎﻤﺗﺩ ﺞﺣﺭ ﺍﻷﺮﻨﺑ ﻢﺜﻟ ﺎﻠﻨﻔﻗ ﻞﻤﺳﺎﻓﺓ ﻖﺼﻳﺭﺓ ﺚﻣ ﺎﻨﺣﺩﺭ ﻒﺟﺃﺓ, ﻮﻠﻣ ﻲﻜﻧ ﻝﺩﻯ ﺄﻠﻴﺳ ﺄﻳﺓ ﻑﺮﺻﺓ ﻞﺘﻤﻨﻋ ﻦﻔﺴﻫﺍ ﻢﻧ ﺎﻠﺴﻗﻮﻃ ﻑﻯ ﺐﺋﺭ ﻊﻤﻴﻗﺓ ﺝﺩﺍ. ﻭﺎﻠﺒﺋﺭ ﻙﺎﻨﺗ ﺈﻣﺍ ﻊﻤﻴﻗﺓ ﺝﺩﺍ، ﺃﻯ ﺄﻧ ﺄﻠﻴﺳ ﺲﻘﻄﺗ ﺐﺒﻃﺀ ﺵﺪﻳﺩ، ﻒﻗﺩ ﻙﺎﻧ ﻝﺪﻴﻫﺍ ﻢﺘﺴﻋ ﻢﻧ ﺎﻟﻮﻘﺗ ﻞﺘﻨﻇﺭ ﻢﻧ ﺡﻮﻠﻫﺍ ﻮﻫﻯ ﺖﺴﻘﻃ، ﻮﻠﺘﺘﺳﺍﺀﻝ ﻊﻣﺍ ﺲﻴﺣﺪﺛ ﻒﻴﻣﺍ ﺐﻋﺩ. ﻑﻯ ﺎﻠﺑﺩﺎﻳﺓ ﺡﺍﻮﻠﺗ ﺄﻧ ﺖﻨﻇﺭ ﺈﻟﻯ ﺍﻸﺴﻔﻟ ﻞﺘﺘﺒﻴﻧ ﻡﺍ ﻲﻨﺘﻇﺮﻫﺍ، ﻮﻠﻜﻧ ﺎﻠﻇﻼﻣ ﻙﺎﻧ ﺡﺎﻠﻛﺍ ﻮﻠﻣ ﺖﺴﺘﻄﻋ ﺄﻧ ﺕﺭﻯ ﺶﻴﺋﺍ، ﺚﻣ ﻦﻇﺮﺗ ﺈﻟﻯ ﺝﻭﺎﻨﺑ ﺎﻠﺒﺋﺭ، ﻭﻼﺤﻈﺗ ﺄﻨﻫﺍ ﺕﺯﺪﺤﻣ ﺏﺎﻟﺩﻭﺎﻠﻴﺑ ﻭﺮﻓﻮﻓ ﺎﻠﻜﺘﺑ ﻒﺷﺎﻫﺪﺗ ﺥﺭﺎﺌﻃ ﻮﺻﻭﺭ ﻢﻌﻠﻗﺓ ﺐﻣﻼﻘﻃ ﻎﺴﻴﻟ ﻪﻧﺍ ﻮﻬﻧﺎﻛ. ﺝﺬﺒﺗ ﺄﻠﻴﺳ ﺏﺮﻄﻣﺎﻧًﺍ ﻢﻧ ﺄﺣﺩ ﺎﻟﺮﻓﻮﻓ ﻮﻫﻯ ﺖﻣﺭ ﺐﻫﺍ ﻮﻗﺩ ﺄُﻠﺼﻘﺗ ﻊﻠﻴﻫ ﺐﻃﺎﻗﺓ ﻚُﺘﺑ ﻊﻠﻴﻫﺍ ﻡﺮﺑﻯ ﺎﻠﺑﺮﺘﻗﺎﻟ ﻞﻜﻨﻫ ﻞﺳﻭﺀ.
I am not familiar with Arabic script (we are investigating the issue with a native speaker), so something in the text may be triggering the error without my noticing it. It is strange, though: parsing the same passage with another parser (UDPipe 2) and the same model yields 16 sentences.
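One thing that may be worth ruling out (an unconfirmed guess, not a diagnosis): the passage above appears to use code points from the Arabic Presentation Forms blocks rather than the standard Arabic block, which can happen when text is extracted from a PDF. The sketch below uses only Python's standard `unicodedata` module to count such characters and to map them back to base letters with NFKC normalization. The helper names `count_presentation_forms` and `normalize_presentation_forms` are hypothetical, and whether this affects sentence segmentation in either parser is an assumption that would need testing.

```python
import unicodedata

# Code point ranges of the Arabic Presentation Forms-A and Forms-B blocks.
PRESENTATION_RANGES = ((0xFB50, 0xFDFF), (0xFE70, 0xFEFC))


def count_presentation_forms(text: str) -> int:
    """Count characters that fall inside the presentation-form ranges."""
    return sum(
        1
        for ch in text
        if any(lo <= ord(ch) <= hi for lo, hi in PRESENTATION_RANGES)
    )


def normalize_presentation_forms(text: str) -> str:
    """Apply NFKC, which maps presentation forms to base Arabic letters."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # First word of the passage, written with escapes to avoid copy/paste issues.
    sample = "\ufeee\ufe92\ufeb3\ufeae\ufecb\ufe93"
    print("before:", count_presentation_forms(sample))
    cleaned = normalize_presentation_forms(sample)
    print("after: ", count_presentation_forms(cleaned))
```

Note that NFKC also rewrites other compatibility characters (for example, the single-character ellipsis becomes three periods), so the normalized text should be compared against the original before drawing conclusions from any change in the sentence count.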
Many thanks!