fix(imap): handle korean Windows encoding ks_c_5601-1987/ks_c_5601-1989 by ChristophWurst · Pull Request #12654 · nextcloud/mail

ChristophWurst · 2026-03-25T07:57:59Z

Trick borrowed from https://github.com/bytestream/Util/blob/45a9c3c14e70eec9df5d117a2d4941b097872672/lib/Horde/String.php#L847-L867.

Tested with the email from the ticket - threw ValueException before, is converted now ✔️

AI-assisted: OpenCode (Claude Haiku 4.5)

AI-assisted: OpenCode (Claude Haiku 4.5) Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

ChristophWurst · 2026-03-25T07:58:07Z

/backport to stable5.7

Copilot

Pull request overview

This PR addresses Nextcloud Mail issue #11416 by improving IMAP charset handling for Outlook messages that label Korean content using non-mbstring charset names (ks_c_5601-1987 / ks_c_5601-1989), preventing conversion failures during message processing.

Changes:

Add a charset normalization step mapping ks_c_5601-1987 / ks_c_5601-1989 to UHC (CP949/Windows-949).
Apply the normalized charset during mb_convert_encoding() / iconv() conversion in the IMAP charset converter.
Add unit test coverage for both Outlook Korean charset labels.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`lib/IMAP/Charset/Converter.php`	Adds charset normalization and uses it during conversion to avoid failures on Outlook Korean charset aliases.
`tests/Unit/IMAP/Charset/ConverterTest.php`	Adds unit tests validating conversion of `ks_c_5601-1987` and `ks_c_5601-1989` content to UTF-8.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

lib/IMAP/Charset/Converter.php

kesselb · 2026-03-25T14:57:15Z

Thanks for taking over; looks good!

When I looked at it a while back, I'd bumped into some blockers and didn't find the time to continue.

mb_list_encodings is "incomplete". To get a full list, one needs to also run mb_encoding_aliases for each encoding from the list.

The list also contains "BASE64", "UUENCODE", "HTML-ENTITIES", and "Quoted-Printable". Request the aliases for those; does trigger a deprecation warning 🙈

What I had in mind was similar to what also Josh started with #12221.

https://github.com/zbateson/mb-wrapper/blob/a282176d795b81aac38cadb31c9b29315a94011b/src/MbWrapper.php#L37

…5601-1989 Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

lib/IMAP/Charset/Converter.php

tests/Unit/IMAP/Charset/ConverterTest.php

…5601-1989 Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-25T17:58:49Z

lib/IMAP/Charset/Converter.php

+if (in_array($normalizedCharset, mb_list_encodings(), true)) {
+$converted = mb_convert_encoding($data, 'UTF-8', $normalizedCharset);
 } else {
-$converted = iconv($charset, 'UTF-8', $data);
+$converted = iconv($normalizedCharset, 'UTF-8', $data);
 }


iconv() can throw a ValueError for unknown/unsupported encodings, and this call isn’t guarded. A bad charset token can still make mailbox loading fail hard. Consider wrapping the conversion attempt in try/catch (\ValueError) and falling back to the auto-detection path (or throwing ServiceException) instead of letting the error bubble up.

Copilot · 2026-03-25T17:58:49Z

lib/IMAP/Charset/Converter.php


-$converted = @mb_convert_encoding($data, 'UTF-8', $charset);
+$normalizedCharset = $charset !== null ? $this->normalizeCharset($charset) : null;
+$converted = @mb_convert_encoding($data, 'UTF-8', $normalizedCharset);


mb_convert_encoding() may throw ValueError (not return false) when the provided charset name is invalid. Since the code only checks for false, this path can still fatally error and bypass the intended iconv/ServiceException fallback. Please catch \ValueError around this call and handle it like a failed conversion.

Suggested change

$converted = @mb_convert_encoding($data, 'UTF-8', $normalizedCharset);

try {

$converted = @mb_convert_encoding($data, 'UTF-8', $normalizedCharset);

} catch (\ValueError $e) {

$converted = false;

}

Copilot · 2026-03-25T17:58:50Z

tests/Unit/IMAP/Charset/ConverterTest.php

+$koreanKsc56011987MimePart->setCharset('ks_c_5601-1987');
+$koreanText = '안녕하세요';
+$koreanKsc56011987MimePart->setContents(mb_convert_encoding($koreanText, 'UHC', 'UTF-8'));
+// Korean (Outlook) - ks_c_5601-1989 is also mapped to UHC


The Korean test fixture is generated with mb_convert_encoding(..., 'UHC', ...). If the mbstring build in CI/runtime doesn’t include the UHC encoding, this will fail before Converter::convert() is exercised (even though the converter could still work via iconv). Consider generating the bytes with iconv instead, or conditionally skipping when the encoding isn’t supported, to keep the unit test environment-independent.

…5601-1989 Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T07:24:36Z

lib/IMAP/Charset/Converter.php

+}
+} catch (ValueError) {
+// Invalid charset name, fall through to auto-detection
+$converted = null;
 }


After catching ValueError for an invalid charset name, the code comment says it should “fall through to auto-detection”, but later $normalizedCharset is recomputed from the original (still-invalid) $charset and passed to mb_convert_encoding()/iconv() again. This prevents the intended autodetect fallback and guarantees a ServiceException for invalid charset tokens even when mbstring could detect/convert the bytes. Consider treating an invalid/empty charset as null (or setting a flag) so the later conversion uses autodetection instead of retrying the invalid name.

…5601-1989 Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T07:35:27Z

tests/Unit/IMAP/Charset/ConverterTest.php

 $iso2022jpMimePart_noCharset = new Horde_Mime_Part();
 $iso2022jpMimePart_noCharset->setContents('外せ園査リツハワ題');


$iso2022jpMimePart_noCharset is created but never used in the returned data provider cases, which makes the fixture setup misleading and harder to maintain. Either add an assertion case for it or remove the unused mime part setup.

Suggested change

$iso2022jpMimePart_noCharset = new Horde_Mime_Part();

$iso2022jpMimePart_noCharset->setContents('外せ園査リツハワ題');

Copilot · 2026-03-26T07:35:27Z

tests/Unit/IMAP/Charset/ConverterTest.php

+public function testConvertWithInvalidCharsetThrowsServiceException(): void {
+$mimePart = $this->createMock(Horde_Mime_Part::class);
+// Use content that is NOT valid UTF-8 (raw ISO-8859-1 bytes)
+$mimePart->method('getContents')
+->willReturn(mb_convert_encoding('Tëst with spëcial chärs', 'ISO-8859-1', 'UTF-8'));
+$mimePart->method('getCharset')
+->willReturn('INVALID-CHARSET-NAME-12345');
+
+$this->expectException(\OCA\Mail\Exception\ServiceException::class);
+
+$this->converter->convert($mimePart);


testConvertWithInvalidCharsetThrowsServiceException assumes conversion must fail after an invalid charset name, but Converter::convert() deliberately falls back to mbstring auto-detection when it catches the ValueError. Depending on mbstring.detect_order, the ISO-8859-1 payload here may be successfully auto-detected/converted, making this test environment-dependent/flaky. Consider pinning mb_detect_order() within the test (and restoring it after), or assert a behavior that is deterministic (e.g., that no ValueError bubbles up), rather than requiring a ServiceException.

…5601-1989 Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T08:06:05Z

lib/IMAP/Charset/Converter.php

 }

-$converted = @mb_convert_encoding($data, 'UTF-8', $charset);
+$normalizedCharset = $charset !== null ? $this->normalizeCharset($charset) : null;


$detectedCharset is computed above, but the conversion path below ignores it. When $charset is null/invalid, $normalizedCharset becomes null and mb_convert_encoding() is called without a concrete source encoding, so the earlier detection has no effect beyond the UTF-8 fast-path. Consider using $detectedCharset (when it’s not false) as the from-encoding for mb_convert_encoding/iconv, and only falling back/throwing when detection fails.

Suggested change

$normalizedCharset = $charset !== null ? $this->normalizeCharset($charset) : null;

// Prefer the explicit charset, but fall back to the detected charset if available

if ($charset !== null) {

$normalizedCharset = $this->normalizeCharset($charset);

} elseif ($detectedCharset !== false) {

$normalizedCharset = $this->normalizeCharset($detectedCharset);

} else {

$normalizedCharset = null;

}

Copilot · 2026-03-26T08:06:06Z

tests/Unit/IMAP/Charset/ConverterTest.php

+ * This tests the code path where $charset is null and we reach the
+ * fallback mb_convert_encoding() call. With null, mbstring should
+ * auto-detect the encoding rather than receiving an empty string
+ * (which would cause a ValueError).


This docblock claims that passing null as the source encoding triggers mbstring auto-detection, but Converter currently doesn’t use $detectedCharset for conversion (it only uses it to short-circuit UTF-8). Either adjust the implementation to actually convert from the detected charset, or update the comment to match the real behavior so the test doesn’t document an incorrect contract.

Copilot · 2026-03-26T08:06:06Z

tests/Unit/IMAP/Charset/ConverterTest.php

+// Should complete without ValueError and return valid UTF-8
+// (auto-detection may not produce perfect results, but should not throw)
+$result = $this->converter->convert($mimePart);
+
+$this->assertTrue(mb_check_encoding($result, 'UTF-8'));
+$this->assertNotEmpty($result);


testConvertWithNullCharsetFallback() only asserts “valid UTF-8” and “not empty”, which can pass even if the text is garbled (e.g., replacement characters). To ensure the fallback path actually converts correctly, assert the expected decoded string (like 'Tëst') or compare against a known-good conversion based on the detected encoding.

Suggested change

// Should complete without ValueError and return valid UTF-8

// (auto-detection may not produce perfect results, but should not throw)

$result = $this->converter->convert($mimePart);

$this->assertTrue(mb_check_encoding($result, 'UTF-8'));

$this->assertNotEmpty($result);

// Should complete without ValueError and return correctly decoded UTF-8

$result = $this->converter->convert($mimePart);

// Ensure that the fallback auto-detection actually yields the expected text

$this->assertSame('Tëst', $result);

$this->assertTrue(mb_check_encoding($result, 'UTF-8'));

…5601-1989 Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

fix(imap): handle korean Windows encoding ks_c_5601-1987/ks_c_5601-1989

1b7648d

AI-assisted: OpenCode (Claude Haiku 4.5) Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

ChristophWurst self-assigned this Mar 25, 2026

ChristophWurst added the bug label Mar 25, 2026

ChristophWurst requested review from GretaD and kesselb as code owners March 25, 2026 07:57

ChristophWurst added the 3. to review label Mar 25, 2026

backportbot bot added the backport-request label Mar 25, 2026

This was referenced Mar 25, 2026

Mail with an outlook "bookwithme" link and long x-microsoft-antispam-message-info makes inbox inaccessible #11416

Open

ValueError in Converter.php: mb_convert_encoding() contains invalid encoding "ks_c_5601-1987" preventing inbox from loading #12639

Closed

kesselb requested a review from Copilot March 25, 2026 14:42

Copilot started reviewing on behalf of kesselb March 25, 2026 14:43 View session

Copilot AI reviewed Mar 25, 2026

View reviewed changes

lib/IMAP/Charset/Converter.php Outdated Show resolved Hide resolved

lib/IMAP/Charset/Converter.php Outdated Show resolved Hide resolved

kesselb approved these changes Mar 25, 2026

View reviewed changes

ChristophWurst added 2 commits March 25, 2026 17:47

fixup! fix(imap): handle korean Windows encoding ks_c_5601-1987/ks_c_…

f203cdf

…5601-1989 Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

fixup! fix(imap): handle korean Windows encoding ks_c_5601-1987/ks_c_…

03d0f31

…5601-1989 Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

ChristophWurst requested a review from Copilot March 25, 2026 17:43

Copilot started reviewing on behalf of ChristophWurst March 25, 2026 17:44 View session

Copilot AI reviewed Mar 25, 2026

View reviewed changes

lib/IMAP/Charset/Converter.php Outdated Show resolved Hide resolved

lib/IMAP/Charset/Converter.php Outdated Show resolved Hide resolved

tests/Unit/IMAP/Charset/ConverterTest.php Outdated Show resolved Hide resolved

fixup! fix(imap): handle korean Windows encoding ks_c_5601-1987/ks_c_…

e9c14b8

…5601-1989 Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

ChristophWurst requested a review from Copilot March 25, 2026 17:54

Copilot started reviewing on behalf of ChristophWurst March 25, 2026 17:54 View session

Copilot AI reviewed Mar 25, 2026

View reviewed changes

fixup! fix(imap): handle korean Windows encoding ks_c_5601-1987/ks_c_…

432207f

…5601-1989 Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

ChristophWurst requested a review from Copilot March 26, 2026 07:19

Copilot started reviewing on behalf of ChristophWurst March 26, 2026 07:20 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

fixup! fix(imap): handle korean Windows encoding ks_c_5601-1987/ks_c_…

d6b9aa9

…5601-1989 Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

ChristophWurst requested a review from Copilot March 26, 2026 07:31

Copilot started reviewing on behalf of ChristophWurst March 26, 2026 07:31 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

fixup! fix(imap): handle korean Windows encoding ks_c_5601-1987/ks_c_…

a4d64f9

…5601-1989 Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

ChristophWurst requested a review from Copilot March 26, 2026 08:01

Copilot started reviewing on behalf of ChristophWurst March 26, 2026 08:02 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

fixup! fix(imap): handle korean Windows encoding ks_c_5601-1987/ks_c_…

b758b49

…5601-1989 Signed-off-by: Christoph Wurst <1374172+ChristophWurst@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(imap): handle korean Windows encoding ks_c_5601-1987/ks_c_5601-1989#12654

fix(imap): handle korean Windows encoding ks_c_5601-1987/ks_c_5601-1989#12654
ChristophWurst wants to merge 8 commits intomainfrom
fix/imap/korean-windows-encoding

ChristophWurst commented Mar 25, 2026

ChristophWurst commented Mar 25, 2026

Copilot AI left a comment

Uh oh!

Uh oh!

kesselb commented Mar 25, 2026

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Copilot AI Mar 25, 2026

Copilot AI Mar 25, 2026

Copilot AI Mar 25, 2026

Copilot AI left a comment

Copilot AI Mar 26, 2026

Copilot AI left a comment

Copilot AI Mar 26, 2026

Copilot AI Mar 26, 2026

Copilot AI left a comment

Copilot AI Mar 26, 2026

Copilot AI Mar 26, 2026

Copilot AI Mar 26, 2026

Labels

3 participants

		$iso2022jpMimePart_noCharset = new Horde_Mime_Part();
		$iso2022jpMimePart_noCharset->setContents('外せ園査リツハワ題');

-$normalizedCharset = $charset !== null ? $this->normalizeCharset($charset) : null;
+// Prefer the explicit charset, but fall back to the detected charset if available
+if ($charset !== null) {
+$normalizedCharset = $this->normalizeCharset($charset);
+} elseif ($detectedCharset !== false) {
+$normalizedCharset = $this->normalizeCharset($detectedCharset);
+} else {
+$normalizedCharset = null;
+}

Conversation

ChristophWurst commented Mar 25, 2026

ChristophWurst commented Mar 25, 2026

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

kesselb commented Mar 25, 2026

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Labels

3 participants