0

I have a .dic file (for Italian, used on a Kobo) which I'm trying to modify to prevent some hyphenation errors in my file. At the top of the file there is a list of punctuation marks, each preceded and followed by a 1. What does that do? How does it work? Then, in the file there's this rule: l'2, which should prevent words from being hyphenated after the apostrophe in constructs with l'. Nonetheless, I found that a horrible hyphenation like this one (l'-hai) was not prevented. Is this fixable?

9
  • 1
    Welcome to TeX.SE! Where does this .dic file come from? Please explain better! Commented Nov 15, 2024 at 17:33
  • No standard files in the TeX distributions use a .dic extension, so it is hard to know what you are asking. You may be describing the format of \patterns, which contains lists of numbers and letters. You cannot use \patterns in a normal LaTeX run, though, so it is normally simpler to correct any issues using \hyphenation, where you simply list whole words and their hyphenation points. Commented Nov 15, 2024 at 17:43
  • Thanks! It's a hyphenation file from LibreOffice. It is commonly added to ereaders to have a better hyphenation. I added the first sections from an equivalent file for english, it's a list of "1!1" "1,1" etc, separated with a "NEXT LEVEL" line. I wanted to understand what that first section does with the punctuation marks and why removing the first one solved some horrible hyphenation like Isabe-l". (name followed in the text by inverted commas and dot). Also, the l'2 rule in the file should prevent hyphenation after the apostrophe but didn't work in an instance (l'-hai instead of l'hai) Commented Nov 15, 2024 at 17:46
  • Also, I was told that the .dic hyphenation file actually uses TeX patterns. I never used them so I wanted to know how they work in this simple instance Commented Nov 15, 2024 at 17:50
  • 1
    TeX pattern files have no ! or NEXT LEVEL lines, so any questions about that would need to be asked on a LibreOffice forum. Commented Nov 15, 2024 at 18:34

2 Answers

2

You show no example, but the LibreOffice .dic file is based on a TeX \patterns file, e.g.

https://extensions.libreoffice.org/en/extensions/show/swedish-hyphenation?Tags%5B%5D=undefined

includes a .dic file for Swedish that starts

    ISO8859-1
    LEFTHYPHENMIN 1
    RIGHTHYPHENMIN 2
    .a2bak
    .2ab2a
    .a1b
    .a2bal
    .a2ban
    .a2bas
    .a2be
    .a2bi5e
    .a2bi1li
    .a4b4is
    .a2b3it
    .a2bl2
    .a2bo
    .ab1ol
    .a2b5r

The first three lines would not be valid in a TeX file but have a clear enough meaning. The first line specifies the file encoding (Latin-1); the next two lines are equivalent to the TeX settings

    \lefthyphenmin=1
    \righthyphenmin=2

which would allow hyphenation after the first letter of the word (which would look very odd in English, but perhaps it's OK in Swedish, I do not know) and would allow hyphenation with just two letters (but not one letter) being carried over to the next line.

The rest of the file consists of patterns: the runs of letters are matched against a word, odd numbers encourage hyphenation, even numbers discourage it, and . marks a word boundary.

So .a2b5r matches abr at the start of a word and says that a-br would be a mildly bad hyphenation (2) and ab-r would be a more strongly good hyphenation (5). Of all patterns that match a given word, the highest number at each inter-letter position wins and determines whether that position is a hyphenation point (odd) or not (even).
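
To make the matching rule concrete, here is a minimal Python sketch of this kind of pattern lookup. It is not code from TeX, LibreOffice, or any e-reader; the function names `parse_pattern`, `hyphenation_values` and `hyphenate` are made up for this illustration, and the defaults simply mirror the LEFTHYPHENMIN 1 / RIGHTHYPHENMIN 2 header shown earlier.

```python
def parse_pattern(pattern):
    """Split a pattern such as '.a2b5r' into its letters ('.abr') and the
    digit in front of each letter plus one after the last ([0, 0, 2, 5, 0])."""
    letters = []
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)      # digit sits in the current gap
        else:
            letters.append(ch)
            values.append(0)          # open a new gap after this letter
    return "".join(letters), values


def hyphenation_values(word, patterns):
    """Return the highest pattern digit found at every gap of '.word.'."""
    text = "." + word.lower() + "."   # '.' marks the word boundaries
    best = [0] * (len(text) + 1)
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        start = text.find(letters)
        while start != -1:            # a pattern may match more than once
            for offset, value in enumerate(values):
                best[start + offset] = max(best[start + offset], value)
            start = text.find(letters, start + 1)
    return best


def hyphenate(word, patterns, left_min=1, right_min=2):
    """Insert '-' wherever the winning digit is odd and the minimum
    number of characters on each side is respected."""
    best = hyphenation_values(word, patterns)
    pieces, last = [], 0
    for k in range(1, len(word)):     # k = characters before the break
        allowed = k >= left_min and len(word) - k >= right_min
        if allowed and best[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("abrupt", [".a2b5r"]))   # ab-rupt
```

With only the single pattern .a2b5r loaded, the 2 between a and b blocks that position and the 5 between b and r permits it, which is why the sketch breaks the English word abrupt as ab-rupt. A real pattern file has thousands of patterns competing at each position.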

So to prevent the l'-hai hyphenation you could increase LEFTHYPHENMIN to 3, so that at least three letters are needed before the hyphenation point, or you could add a pattern l'8h, which would strongly discourage hyphenation between l' and h.
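
Both fixes can be tried out in a small Python sketch. Everything here is an assumption for illustration: `parse` and `break_points` are made-up helper names, and the competing pattern 3ha is hypothetical, invented only to show how a higher odd number elsewhere could override l'2. The pattern that actually causes the bad break in the Italian file is not known from the question.

```python
def parse(pattern):
    """'l'8h' -> ("l'h", [0, 0, 8, 0]): letters plus the digit at each gap."""
    letters = "".join(c for c in pattern if not c.isdigit())
    values = [0]
    for c in pattern:
        if c.isdigit():
            values[-1] = int(c)
        else:
            values.append(0)
    return letters, values


def break_points(word, patterns, left_min=2, right_min=2):
    """Return the allowed break positions, counted in characters
    from the start of the word."""
    text = "." + word + "."
    best = [0] * (len(text) + 1)
    for p in patterns:
        letters, values = parse(p)
        i = text.find(letters)
        while i != -1:
            for j, v in enumerate(values):
                best[i + j] = max(best[i + j], v)   # highest digit wins
            i = text.find(letters, i + 1)
    return [k for k in range(left_min, len(word) - right_min + 1)
            if best[k + 1] % 2 == 1]                # odd = break allowed


# Hypothetical situation: "3ha" (invented) outranks the existing "l'2".
patterns = ["l'2", "3ha"]
print(break_points("l'hai", patterns))              # [2], i.e. l'-hai

# Fix 1: add a stronger even value at the same position.
print(break_points("l'hai", patterns + ["l'8h"]))   # []

# Fix 2: require three characters before any break.
print(break_points("l'hai", patterns, left_min=3))  # []
```

The sketch counts the apostrophe as an ordinary character, so l' is two characters and a minimum of 3 rules out the break. Whether a given e-reader counts it the same way is something to check on the device.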

I should say I have no knowledge of LibreOffice; this answers as if the file were a TeX patterns file apart from the first lines, as stated.

1

Required readings are Claudio Beccari: How to make a foreign language pattern file: Romanian. TUGboat 16:1 (March 1995), and Computer aided hyphenation for Italian and modern Latin. TUGboat 13:1 (1992), 23-33. The former paper might serve as an introduction to designing patterns for the Liang-Knuth hyphenation algorithm, which seems to be at play in these "dic" files, while the latter is a description of hyphenation patterns for Italian/neo-Latin.
