Production's localization cache uses CDB database files in the k8s deployment images. These are very efficient for string lookups, but very slow to copy whenever anything in the database changes.
This can delay deployments by about half an hour due to the time spent copying these large database files around. It _should_ be possible to arrange the data so that it updates much more efficiently while remaining fast enough for string lookups in MediaWiki.
Notes (to be added to further):
- container image diffs are _file-based_, not _line_- or _block_-based: any change inside a file means re-syncing the whole file
- therefore it's worth investigating either many small files, fewer of which change at once, or a file per version that holds only the differential from the previous version
- which of these will perform better for generation? for update syncs? for string reads?
- do we continue to build on CDB? use JSON or PHP but in different arrangements? something else?
- check prior notes on T99740: Use static php array files for l10n cache at WMF (instead of CDB)
- there is a lot of string duplication, especially with the keys - consider some indirection mapping to reduce space
- consider compression of message payloads to further reduce space
- performance impact on decode?
- zlib? zlib with dictionary?
- brotli? (comes with a standard dictionary with common words in many languages)
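As a sketch of the "many small files" idea above: hash each message key to one of N shard files, so a changed message only dirties its own shard and a file-based image diff re-syncs just that shard. Everything here (shard count, file names, JSON as the container format) is an assumption for illustration; a production version would live in MediaWiki's PHP l10n layer and would likely cache loaded shards in-process.

```python
import hashlib
import json
from pathlib import Path

SHARD_COUNT = 256  # illustrative; tune against real message churn


def shard_for(key: str) -> int:
    """Map a message key to a stable shard number."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return digest[0] % SHARD_COUNT  # digest[0] is already 0..255


def write_shards(messages: dict, out_dir: Path) -> None:
    """Split a flat key->text mapping into per-shard JSON files.

    Only shards whose contents changed will differ byte-for-byte
    between builds, so a file-based diff copies just those shards.
    """
    shards = {}
    for key, text in messages.items():
        shards.setdefault(shard_for(key), {})[key] = text
    out_dir.mkdir(parents=True, exist_ok=True)
    for n, contents in shards.items():
        # sort_keys keeps the output byte-stable across builds
        (out_dir / f"shard-{n:03d}.json").write_text(
            json.dumps(contents, sort_keys=True, ensure_ascii=False)
        )


def lookup(key: str, out_dir: Path):
    """Read a single message by loading only its shard."""
    path = out_dir / f"shard-{shard_for(key):03d}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text()).get(key)
```

The shard count is the main knob: more shards means finer-grained syncs but more small files to open; this is exactly the generation-vs-sync-vs-read trade-off the questions above ask about.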
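On the zlib-with-dictionary bullet: Python's stdlib `zlib` exposes preset dictionaries via the `zdict` argument to `compressobj`/`decompressobj`, which makes it cheap to measure the win on short message payloads. The dictionary contents below are invented placeholders; a real one would be mined from the actual message corpus:

```python
import zlib

# Hypothetical shared dictionary of common substrings across messages.
ZDICT = b"Special:Wikipedia the page user talk edit history"


def compress(payload: bytes, zdict: bytes = None) -> bytes:
    c = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return c.compress(payload) + c.flush()


def decompress(blob: bytes, zdict: bytes = None) -> bytes:
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob)


msg = b"Edit the talk page history for this user"
plain = compress(msg)
with_dict = compress(msg, ZDICT)
# On short payloads, deflate alone often barely breaks even; the preset
# dictionary lets it back-reference common substrings from the start.
print(len(msg), len(plain), len(with_dict))
```

Brotli's built-in static dictionary gives the same effect without shipping our own dictionary, but it needs a third-party binding, so the decode-performance question would have to be measured separately.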
Prep before session:
- do some exploratory hacking: test a few alternate layouts to see how they perform for lookups, and how well they isolate changes into per-version synced files
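For the lookup side of that exploratory hacking, a throwaway harness along these lines (toy corpus, invented names) gives comparable per-layout numbers:

```python
import json
import tempfile
import timeit
from pathlib import Path


def bench(label, fn, number=200):
    """Time `fn` and report microseconds per call."""
    secs = timeit.timeit(fn, number=number)
    print(f"{label}: {secs / number * 1e6:.1f} us/lookup")


# Toy corpus standing in for one language's messages.
messages = {f"msg-{i}": f"text {i}" for i in range(10_000)}
path = Path(tempfile.mkdtemp()) / "en.json"
path.write_text(json.dumps(messages))

# Layout A: parse the whole file on every lookup (worst case).
bench("cold json", lambda: json.loads(path.read_text())["msg-123"], number=50)

# Layout B: parse once, plain dict lookups afterwards (in-process cache).
cached = json.loads(path.read_text())
bench("warm dict", lambda: cached["msg-123"])
```

The same two probes (cold single lookup, warm repeated lookup) can be pointed at each candidate layout, since MediaWiki's access pattern mixes both.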
At the session:
- hash out prior notes and any new surprises anyone comes up with
- pick one of the experimental layouts and figure out what a production version would look like on the MediaWiki end and for cache generation
- try implementing!
- ???
- non-profit!




