
I have a Python project where I'd like to use YAML (PyYAML 3.11), particularly because it is "pretty" and easy for users to edit in a text editor if and when necessary. My problem, though, is that if I load the YAML into a Python application (as I will need to) and edit the contents (as I will need to), the document I write back out is typically not as pretty as what I started with.

The PyYAML documentation is pretty poor; it does not even document the parameters of the dump function. I found http://dpinte.wordpress.com/2008/10/31/pyaml-dump-option/. However, I'm still missing the information I need. (I started to look at the source, but it doesn't seem the most inviting. If I don't get the solution here, then that's my only recourse.)

I start with a document that looks like this:

    - color green :
         inputs :
            - port thing :
                widget-hint : filename
                widget-help : Select a filename
            - port target_path :
                widget-hint : path
                value : 'thing'
         outputs:
            - port value:
                 widget-hint : string
         text : |
                I'm lost and I'm found
                and I'm hungry like the wolf.

After loading it into Python (yaml.safe_load( s )), I try a couple of ways of dumping it out:

    >>> print yaml.dump( d3, default_flow_style=False, default_style='' )
    - color green:
        inputs:
        - port thing:
            widget-help: Select a filename
            widget-hint: filename
        - port target_path:
            value: thing
            widget-hint: path
        outputs:
        - port value:
            widget-hint: string
        text: 'I''m lost and I''m found

          and I''m hungry like the wolf.

          '

    >>> print yaml.dump( d3, default_flow_style=False, default_style='|' )
    - "color green":
        "inputs":
        - "port thing":
            "widget-help": |-
              Select a filename
            "widget-hint": |-
              filename
        - "port target_path":
            "value": |-
              thing
            "widget-hint": |-
              path
        "outputs":
        - "port value":
            "widget-hint": |-
              string
        "text": |
          I'm lost and I'm found
          and I'm hungry like the wolf.

Ideally, I would like "short strings" to not use quotes, as in the first result, but multi-line strings to be written as blocks, as in the second result. Fundamentally, I'm trying to avoid an explosion of unnecessary quotes in the file, which I expect would make it much more annoying to edit in a text editor.

Does anyone have any experience with this?

2 Answers

15

If you can use ruamel.yaml (disclaimer: I am the author of this enhanced version of PyYAML) you can round-trip the original format (YAML document stored in a file org.yaml):

    import sys
    import ruamel.yaml
    from pathlib import Path

    file_org = Path('org.yaml')

    yaml = ruamel.yaml.YAML()
    yaml.preserve_quotes = True
    data = yaml.load(file_org)
    yaml.dump(data, sys.stdout)

which gives:

    - color green:
        inputs:
        - port thing:
            widget-hint: filename
            widget-help: Select a filename
        - port target_path:
            widget-hint: path
            value: 'thing'
        outputs:
        - port value:
            widget-hint: string
        text: |
          I'm lost and I'm found
          and I'm hungry like the wolf.

Your input is inconsistently indented/formatted, and although there is far more control over the output in ruamel.yaml than in PyYAML, you cannot get your exact original back:

  • you sometimes (color green :) have a space before the value indicator (:) and sometimes you don't (outputs:). Apart from special control over root level keys, ruamel.yaml always puts the value indicator directly adjacent to the key.
  • your root level sequence is indented two columns, with an offset of zero for the block sequence indicator (-); this is the default ruamel.yaml uses. The other sequences are indented five columns with an offset of three. ruamel.yaml cannot format sequences individually/inconsistently; I recommend using the default, since your root collection is a sequence.
  • your mappings are sometimes indented three columns (value for key color green) and sometimes two (e.g. value for key port target_path). Again, ruamel.yaml cannot format these individually/inconsistently.
  • your block style literal scalar is indented more than the two spaces that are standard when no block indentation indicator is appended to the | indicator (e.g. |4), so this extra indentation will be lost.

As you can see, setting yaml.preserve_quotes keeps the superfluous quotes around 'thing'. As that is not what you want, it is not set in the rest of these examples.

The following "normalises" all three examples:

    import sys
    import ruamel.yaml
    from pathlib import Path

    LT = ruamel.yaml.scalarstring.LiteralScalarString

    file_org = Path('org.yaml')
    file_plain = Path('plain.yaml')
    file_block = Path('block.yaml')

    def normalise(d):
        if isinstance(d, dict):
            for k, v in d.items():
                d[k] = normalise(v)
            return d
        if isinstance(d, list):
            for idx, elem in enumerate(d):
                d[idx] = normalise(elem)
            return d
        if not isinstance(d, str):
            return d
        if '\n' in d:
            if isinstance(d, LT):
                return d  # already a block style literal scalar
            return LT(d)
        return str(d)

    yaml = ruamel.yaml.YAML()
    for fn in [file_org, file_plain, file_block]:
        data = normalise(yaml.load(file_org))
        yaml.dump(data, fn)

    assert file_org.read_bytes() == file_plain.read_bytes()
    assert file_org.read_bytes() == file_block.read_bytes()
    print(file_block.read_text())

which gives:

    - color green:
        inputs:
        - port thing:
            widget-hint: filename
            widget-help: Select a filename
        - port target_path:
            widget-hint: path
            value: thing
        outputs:
        - port value:
            widget-hint: string
        text: |
          I'm lost and I'm found
          and I'm hungry like the wolf.

So, as you indicated, you get a block style literal scalar if a scalar contains a newline, and no block style and no quotes if it doesn't.


4 Comments

Is it easy to specify in ruamel.yaml that multi-line strings should be written as blocks (with |) and short string should not receive quotes, without already having a yaml file to invoke a round-trip on?
@oulenz That depends on your definition of easy. You of course need some rules (e.g. any string containing a newline should be a literal block scalar; any without spaces should be unquoted) if you don't want to mark the strings by hand. Why don't you ask a question here on SO about that, if you are interested in getting that done?
I'm now using stackoverflow.com/a/33300001 . I had hoped you might have included an option for this since it seems to be what many people want. I'm insufficiently familiar with the various edge cases to appreciate that there is no clear-cut implementation of this.
@oulenz My answer there could use some updating.
12

Try the pyaml pretty printer. It gets closer, though it does put quotes around short strings with spaces in them:

    >>> print pyaml.dump(d3)
    - 'color green':
        inputs:
          - 'port thing':
              widget-help: 'Select a filename'
              widget-hint: filename
          - 'port target_path':
              value: thing
              widget-hint: path
        outputs:
          - 'port value':
              widget-hint: string
        text: |
          I'm lost and I'm found
          and I'm hungry like the wolf.
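For comparison, a similar effect can be approximated with plain PyYAML by registering a custom string representer on a dumper subclass. This is a sketch, not something the pyaml package does for you: `BlockStyleDumper`, `represent_str` and the sample `doc` are names and data made up here, and it assumes Python 3, where all text is `str`.

```python
import yaml

class BlockStyleDumper(yaml.SafeDumper):
    """Hypothetical subclass, so the stock SafeDumper keeps its default behaviour."""

def represent_str(dumper, data):
    # Ask for literal block style only when the string spans several lines;
    # style=None lets PyYAML pick plain or quoted style as usual.
    style = '|' if '\n' in data else None
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style=style)

BlockStyleDumper.add_representer(str, represent_str)

# Sample data made up for this sketch.
doc = [{'color green': {
    'value': 'thing',
    'text': "I'm lost and I'm found\nand I'm hungry like the wolf.\n",
}}]

out = yaml.dump(doc, Dumper=BlockStyleDumper, default_flow_style=False)
print(out)
```

The requested style is a hint: PyYAML may still fall back to a quoted style for strings it considers unsuitable for a block scalar, for instance lines with trailing whitespace.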

1 Comment

In case anyone else reads this comment and futilely tries to find the pretty_print option for yaml.dump()... this comment refers to the less-standard pyaml package (imported with import pyaml), as opposed to the more standard PyYAML (imported with import yaml).
