63

I know there are a few questions about this on StackOverflow, but I couldn't find what I was looking for.

I'm using pyyaml to read (.load()) a .yml file, modify or add a key, and then write it (.dump()) again. The problem is that I want to keep the file format post-dump, but it changes.

For example, I edit the key en.test.index.few to say "Bye" instead of "Hello".

Python:

    with open(path, "r", encoding="utf-8") as yaml_file:
        self.dict = yaml.load(yaml_file)

Then, after changing the key:

    with open(path, "w", encoding="utf-8") as yaml_file:
        dump = yaml.dump(self.dict, default_flow_style=False, allow_unicode=True, encoding=None)
        yaml_file.write(dump)

Yaml:

Before:

    en:
      test:
        new: "Bye"
        index:
          few: "Hello"
      anothertest: "Something"

After:

    en:
      anothertest: Something
      test:
        index:
          few: Hello
        new: Bye

Is there a way to keep the same format, for example the quotes and the key order? Am I using the wrong tool for this?

I know the original file may not be entirely correct, but I have no control over it (it's a Ruby on Rails i18n file).

Thank you very much.

6
  • yaml.dump has a default_style argument. Using default_style='"' will keep your string values in double quotes, but your keys and any other value types will also be wrapped in double quotes. Commented Dec 27, 2013 at 20:13
  • Thanks!, I'll keep it in mind, it would have been really useful if it wasn't for the keys :( Commented Dec 27, 2013 at 23:15
  • You'll probably have a hard time ordering the keys, too. yaml.load gives you a dict; its keys are unordered. yaml.dump probably outputs in whatever order the iteration goes. Commented Dec 28, 2013 at 1:57
  • The new file represents exactly the same information (in YAML) as the origin file; there is no reason to keep the same format. Commented Feb 19, 2014 at 12:37
  • @Evert that's true, but I wanted to keep the format because it's useful given the context of the sublime package I have created github.com/NicoSantangelo/sublime-text-i18n-rails Commented Feb 20, 2014 at 18:57

3 Answers

116

Below, ruamel.yaml is used instead.

ruamel.yaml is actively maintained. Unlike PyYAML, ruamel.yaml supports:

  • YAML <= 1.2. PyYAML only supports YAML <= 1.1. This is vital, as YAML 1.2 intentionally breaks backward compatibility with YAML 1.1 in several edge cases. This would usually be a bad thing. In this case, this renders YAML 1.2 a strict superset of JSON. Since YAML 1.1 is not a strict superset of JSON, this is a good thing.
  • Roundtrip preservation. When calling yaml.dump() to dump a dictionary loaded by a prior call to yaml.load():
    • PyYAML naively ignores all input formatting – including comments, ordering, quoting, and whitespace. Discarded like so much digital refuse into the nearest available bit bucket.
    • ruamel.yaml cleverly respects all input formatting. Everything. The whole stylistic enchilada. The entire literary shebang. All.

Library Migration

Switching from PyYAML to ruamel.yaml in existing applications is typically as simple as changing the library import to:

from ruamel import yaml 

This works because ruamel.yaml is a PyYAML fork that conforms to the PyYAML API.

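No other changes should be needed on ruamel.yaml releases that still provide the PyYAML-style module-level functions; there, yaml.load() and yaml.dump() should continue to behave as expected. Newer ruamel.yaml releases deprecate those functions in favour of the YAML() class.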

Roundtrip Preservation and What It Can Do for You

For backward compatibility with PyYAML, the yaml.load() and yaml.dump() functions do not perform roundtrip preservation by default. To do so, explicitly pass:

  • The optional Loader=ruamel.yaml.RoundTripLoader keyword parameter to yaml.load().
  • The optional Dumper=ruamel.yaml.RoundTripDumper keyword parameter to yaml.dump().

An example kindly "borrowed" from ruamel.yaml documentation:

    import ruamel.yaml

    inp = """\
    # example
    name:
      # Yet another Great Duke of Hell. He's not so bad, really.
      family: TheMighty
      given: Ashtaroth
    """

    code = ruamel.yaml.load(inp, Loader=ruamel.yaml.RoundTripLoader)
    code['name']['given'] = 'Astarte'  # Oh no you didn't.

    print(ruamel.yaml.dump(code, Dumper=ruamel.yaml.RoundTripDumper), end='')

It is done. Comments, ordering, quoting, and whitespace will now be preserved intact.

20 Comments

I must say this is a wonderful answer. I'm currently not developing the project that used PyYAML but I'll definitely give ruamel.yaml a try when I have some spare time and accept the answer if it works. Thanks!
@sjmh Starting with ruamel.yaml 0.11.12 you can specify preserve_quotes=True during loading, which will wrap the strings loaded with information needed for dumping. Also see this answer
PyYAML has new maintainers now, and had a v4.1 release recently. The answer is outdated and silly content such as "PyYAML is a fetid corpse rotting.." should probably be edited out.
3.x to 4.x is a major version number bump, so backwards incompat changes should be expected. I'm not denying PyYAML maintainership has problems and politics, but the language used in this answer is a bit excessive. It reads like an advertisement for ruamel.yaml or propaganda.
This answer should be corrected. It immediately starts with false statements. Maybe they were true at the time it was written, but that is no longer the case. PyYAML is not dead and the web site is up. At the moment of writing this comment, PyYAML appears quite alive and kicking. Look at the latest releases: - 2019-07-30: PyYAML 5.1.2 is released. - 2018-06-06: PyYAML 5.1.1 is released. - 2019-03-13: LibYAML 0.2.2 and PyYAML 5.1 are released. - 2018-07-05: PyYAML 3.13 is released. - 2018-06-24: LibYAML 0.2.1 is released. This answer is misleading.
9

In my case, I want quotes around a value only if it contains a { or a }, and no quotes otherwise. For example:

    en:
      key1: value is 1
      key2: 'value is {1}'

To do that, copy the function represent_str() from the file representer.py in PyYAML and pick another style if the string contains a { or a }:

    def represent_str(self, data):
        tag = None
        style = None
        # Add these two lines:
        if '{' in data or '}' in data:
            style = '"'
        try:
            data = unicode(data, 'ascii')
            tag = u'tag:yaml.org,2002:str'
        except UnicodeDecodeError:
            try:
                data = unicode(data, 'utf-8')
                tag = u'tag:yaml.org,2002:str'
            except UnicodeDecodeError:
                data = data.encode('base64')
                tag = u'tag:yaml.org,2002:binary'
                style = '|'
        return self.represent_scalar(tag, data, style=style)

To use it in your code:

    import yaml

    def represent_str(self, data):
        ...

    yaml.add_representer(str, represent_str)

In this case there is no difference between keys and values, and that's enough for me. If you want a different style for keys and values, do the same thing with the function represent_mapping.
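The function copied above relies on unicode(), so it only runs on Python 2. A minimal Python 3 sketch of the same idea could look like the following; it assumes PyYAML on Python 3, registers the representer on a Dumper subclass rather than globally, and the names BraceQuotingDumper and represent_str_with_braces are made up for this example.

```python
import yaml

STR_TAG = "tag:yaml.org,2002:str"


class BraceQuotingDumper(yaml.SafeDumper):
    """Hypothetical dumper that double-quotes strings containing braces."""


def represent_str_with_braces(dumper, data):
    # Force double quotes only when the string contains { or };
    # style=None lets PyYAML choose the style for everything else.
    style = '"' if ("{" in data or "}" in data) else None
    return dumper.represent_scalar(STR_TAG, data, style=style)


# Registering on the subclass keeps the default dumpers untouched.
BraceQuotingDumper.add_representer(str, represent_str_with_braces)

doc = {"en": {"key1": "value is 1", "key2": "value is {1}"}}
text = yaml.dump(doc, Dumper=BraceQuotingDumper, default_flow_style=False)
print(text, end="")
```

As in the original, the rule applies to keys as well as values, because both go through the same string representer.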

2

First

PyYAML uses the following code to represent dictionary data:

    mapping = list(mapping.items())
    try:
        mapping = sorted(mapping)
    except TypeError:
        pass

That is why the ordering changes.
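As a rough illustration of that sorting step, the sketch below dumps the question's data twice. It assumes PyYAML 5.1 or later, where dump functions accept a sort_keys argument, and Python 3.7 or later, where dicts keep insertion order. SOURCE is just the "Before" file from the question.

```python
import yaml

# Illustrative input: the "Before" file from the question.
SOURCE = """\
en:
  test:
    new: "Bye"
    index:
      few: "Hello"
  anothertest: "Something"
"""

data = yaml.safe_load(SOURCE)

# Default behaviour: keys go through sorted(), so the order changes.
sorted_text = yaml.safe_dump(data, default_flow_style=False)

# Assumes PyYAML >= 5.1: skip the sort and keep the dict's own order.
ordered_text = yaml.safe_dump(data, default_flow_style=False, sort_keys=False)

print(sorted_text)
print(ordered_text)
```

This only addresses ordering; the quotes are still dropped in both outputs.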

Second

Information about how a scalar was written (with double quotes or not) is lost when reading; this is a fundamental design choice of the library.

Summary

You can create your own class based on 'Dumper' and override its 'represent_mapping' method to change how dictionaries are presented.

To keep the information about double quotes for scalars you would also have to create your own class based on 'Loader', but I am afraid that this would affect other classes as well and would be difficult to do.
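One possible shape for such a Dumper subclass is sketched below. It does not recover the original quoting (that information is gone after loading); instead it assumes you want every string value double-quoted and every key left plain, which happens to match the Rails file in the question. The class name RailsLocaleDumper is hypothetical, and the sketch assumes PyYAML on Python 3.7 or later.

```python
import yaml

STR_TAG = "tag:yaml.org,2002:str"

# Illustrative input: the "Before" file from the question.
SOURCE = """\
en:
  test:
    new: "Bye"
    index:
      few: "Hello"
  anothertest: "Something"
"""


class RailsLocaleDumper(yaml.SafeDumper):
    """Hypothetical dumper: keeps key order, double-quotes string values."""

    def represent_mapping(self, tag, mapping, flow_style=None):
        # The base method only sorts objects that have .items(), so a
        # plain list of pairs is emitted in the order given.
        pairs = list(mapping.items()) if hasattr(mapping, "items") else mapping
        node = super().represent_mapping(tag, pairs, flow_style)
        # Ask for double quotes on string values; key nodes stay plain.
        for _key_node, value_node in node.value:
            if isinstance(value_node, yaml.ScalarNode) and value_node.tag == STR_TAG:
                value_node.style = '"'
        return node


data = yaml.safe_load(SOURCE)
text = yaml.dump(data, Dumper=RailsLocaleDumper,
                 default_flow_style=False, allow_unicode=True)
print(text, end="")
```

Comments and blank lines in the original file are still lost with this approach, since PyYAML never loads them.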
