Revisions to Is UTF-8 the final character encoding for all future time?

added 92 characters in body

edited Jun 15, 2020 at 0:24

Bernhard Barker

UTF-8 might not last forever, but you probably don't have to worry too much.

Two universal truths:

We can't predict the future.
Nothing lasts forever, especially in software.

But that doesn't mean the benefit of (trying to) future-proof your code always outweighs the cost.

Is UTF-8 likely to become obsolete any time soon?

I would say no. UTF-8 is quite common, which makes it harder to replace. Unicode also still has quite a bit of empty space, meaning there isn't likely to be a pressing need to replace it soon. Between 2010 and 2020, fewer than 40k characters were added. At that rate, it would take about 240 years to use up the remaining ~1 million unallocated code points. That is a lot faster than I imagined, but it is still quite a while away, and assuming the rate stays constant is quite an assumption.
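
As a rough check of that estimate, here is a minimal Python sketch. The function name years_until_full and the rounded inputs (about 40k characters added per decade, about 1 million unallocated code points) are assumptions for illustration, not exact counts from the Unicode Standard.

```python
def years_until_full(unallocated: int, added: int, over_years: int) -> float:
    """Years until `unallocated` code points run out at a constant allocation rate."""
    rate_per_year = added / over_years
    return unallocated / rate_per_year


# Assumed, rounded inputs: ~40k characters added over 10 years, ~1 million left.
estimate = years_until_full(unallocated=1_000_000, added=40_000, over_years=10)
print(f"{estimate:.0f} years")
```

With these rounded inputs the result is 250 years, the same ballpark as the ~240 years mentioned above. The exact figure depends on which counts you plug in, and the constant-rate assumption is doing most of the work.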

It also doesn't seem like there'd be a need to replace it due to a fundamental flaw in the encoding. With other types of standards or technologies there may be some security issue that could be exploited, but this doesn't seem likely with a character encoding, which only tells you how characters are stored.

I speculate that if a need to replace it arises, it would be due to inefficiencies or constraints in new technology. Someone could develop some new piece of technology that rethinks how data is stored or loaded, which might make UTF-8 less than ideal or unusable. But there would still be plenty of systems without that technology for quite a few years.

Note that I didn't ask "are we likely to see a new character encoding any time soon". Anyone can create a new standard, but that doesn't mean it will be widely adopted or replace other standards.

How bad would it be for you if there's a new standard?

Probably not that bad.

Even if there is a new standard that's widely adopted, your system will likely keep functioning for the foreseeable future with little to no changes. There are a lot of legacy systems out there.

If your system doesn't support the new encoding, you may have some issues with the user or other systems trying to send you data you don't support. But your system could still use UTF-8 internally, even if this means you don't support some characters (which might not be good, but it won't necessarily break your system).
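
A minimal sketch of what "keep using UTF-8 internally" could look like at a system boundary, assuming Python. The helper name decode_incoming is hypothetical, and Latin-1 bytes stand in for data in some encoding the system does not support. Anything that is not valid UTF-8 is replaced with U+FFFD instead of raising an error, so the data is degraded but the system keeps running.

```python
def decode_incoming(data: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for anything that is not valid UTF-8."""
    return data.decode("utf-8", errors="replace")


valid = "h\u00e9llo".encode("utf-8")
# Stand-in for data sent in an encoding this system does not support.
foreign = "caf\u00e9".encode("latin-1")

print(ascii(decode_incoming(valid)))    # valid UTF-8 decodes unchanged
print(ascii(decode_incoming(foreign)))  # the unsupported byte becomes U+FFFD
```

Whether silently replacing characters is acceptable depends on the application; rejecting the input outright (the default errors="strict") is the other obvious choice.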

Also, if it were to be replaced for a reason other than running out of space (which, as noted above, doesn't seem likely any time soon), UTF-8 could likely be extended to include any characters in the new encoding. That means you could just convert from one encoding to the other where required, and UTF-8 would still be usable.

Unicode versus Unicode?

The difference between UTF-8, UTF-16 and UTF-32 seems minor when compared to other (non-Unicode) encodings. They all support the same characters, so it shouldn't be a huge issue if one replaces the other.

If another one of those were to become the widely adopted one, it would probably be trivial to convert between them where required and continue to use UTF-8 everywhere else.
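
To illustrate how small that conversion step is, here is a minimal Python sketch. The helper name transcode is hypothetical; it only uses the standard bytes.decode and str.encode methods with Python's built-in codec names.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode `data` from encoding `src` to encoding `dst` via Unicode code points."""
    return data.decode(src).encode(dst)


# Sample text including U+1D11E, a character outside the Basic Multilingual Plane.
text = "caf\u00e9 \U0001D11E"
utf8 = text.encode("utf-8")

utf16 = transcode(utf8, "utf-8", "utf-16-le")
utf32 = transcode(utf8, "utf-8", "utf-32-le")

# Converting back gives the original bytes: all three forms cover the same code points.
assert transcode(utf16, "utf-16-le", "utf-8") == utf8
assert transcode(utf32, "utf-32-le", "utf-8") == utf8

print(len(utf8), len(utf16), len(utf32))  # byte lengths differ, content does not
```

The round trip is lossless for any valid text, which is why a system could keep UTF-8 internally and convert only at the edges.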

added 26 characters in body

edited Jun 14, 2020 at 14:17

Bernhard Barker

UTF-8 might not last forever, but you probably don't have to worry too much.

Two universal truths:

We can't predict the future.
Nothing lasts forever, especially in software.

But that doesn't mean the benefit of (trying to) future-proof your code always outweighs the cost.

Is UTF-8 likely to become obsolete any time soon?

I would say no. UTF-8 is quite common, which makes it harder to replace. It also still has quite a bit of empty space, meaning there isn't likely to be a pressing need to replace it. Although experts have notoriously been wrong about similar things, like, if I recall correctly, how we'd never need more than a few kilo- or megabytes for computers (but that isn't quite the same thing).

It also doesn't seem like there'd be a need to replace it due to a fundamental flaw in the encoding. With other types of standards or technologies there may be some security issue that could be exploited, but this doesn't seem likely with a character encoding, which only tells you how characters are stored.

I speculate that if a need to replace it arises, it would be due to inefficiencies or constraints in new technology. Someone could develop some new piece of technology that rethinks how data is stored or loaded, which might make UTF-8 less than ideal or unusable. But there would still be plenty of systems without that technology for quite a few years.

Note that I didn't ask "are we likely to see a new character encoding any time soon". Anyone can create a new standard, but that doesn't mean it will be widely adopted or replace other standards.

How bad would it be for you if there's a new standard?

Probably not that bad.

Even if there is a new standard that's widely adopted, your system will likely keep functioning for the foreseeable future with little to no changes. There are a lot of legacy systems out there.

If your system doesn't support the new encoding, you may have some issues with the user or other systems trying to send you data you don't support. But your system could still use UTF-8 internally, even if this means you don't support some characters (which might not be good, but it won't necessarily break your system).

Also, if it were to be replaced for a reason other than running out of space (which, as noted above, doesn't seem likely any time soon), UTF-8 could likely be extended to include any characters in the new encoding. That means you could just convert from one encoding to the other where required, and UTF-8 would still be usable.

Unicode versus Unicode?

The difference between UTF-8, UTF-16 and UTF-32 seems minor when compared to other (non-Unicode) encodings. They all support the same characters, so it shouldn't be a huge issue if one replaces the other.

If another one of those were to become the widely adopted one, it would probably be trivial to convert between them where required and continue to use UTF-8 everywhere else.

answered Jun 14, 2020 at 14:10

Bernhard Barker

UTF-8 might not last forever, but you probably don't have to worry too much.

Two universal truths:

We can't predict the future.
Nothing lasts forever, especially in software.

But that doesn't mean the benefit of (trying to) future-proof your code always outweighs the cost.

Is UTF-8 likely to become obsolete any time soon?

I would say no. UTF-8 is quite common, which makes it harder to replace. It also still has quite a bit of empty space, meaning there isn't likely to be a pressing need to replace it. Although experts have notoriously been wrong about similar things, like, if I recall correctly, how we'd never need more than a few kilo- or megabytes for computers (but that isn't quite the same thing).

It also doesn't seem like there'd be a need to replace it due to a fundamental flaw in the encoding. With other types of standards or technologies there may be some security issue that could be exploited, but this doesn't seem likely with a character encoding, which only tells you how characters are stored.

I speculate that if a need to replace it arises, it would be due to inefficiencies or constraints in new technology. Someone could develop some new piece of technology that rethinks how data is stored or loaded, which might make UTF-8 less than ideal or unusable. But there would still be plenty of systems without that technology for quite a few years.

Note that I didn't ask "are we likely to see a new character encoding any time soon". Anyone can create a new standard, but that doesn't mean it will be widely adopted or replace other standards.

How bad would it be for you if there's a new standard?

Probably not that bad.

Even if there is a new standard that's widely adopted, your system will likely keep functioning for the foreseeable future. There are a lot of legacy systems out there.

If your system doesn't support the new encoding, you may have some issues with the user or other systems trying to send you data you don't support. But your system could still use UTF-8 internally, even if this means you don't support some characters (which might not be good, but it won't necessarily break your system).

Also, if it were to be replaced for a reason other than running out of space (which, as noted above, doesn't seem likely any time soon), UTF-8 could likely be extended to include any characters in the new encoding. That means you could just convert from one encoding to the other where required, and UTF-8 would still be usable.

Unicode versus Unicode?

The difference between UTF-8, UTF-16 and UTF-32 seems minor when compared to other (non-Unicode) encodings. They all support the same characters, so it shouldn't be a huge issue if one replaces the other.

If another one of those were to become the widely adopted one, it would probably be trivial to convert between them where required and continue to use UTF-8 everywhere else.
