
Monday, April 6, 2026

A workflow for Agentic Engineering

"- Claude, build the entire Platform from scratch. Make no mistakes."

In the agentic engineering community, some think that 100% test coverage will guarantee the quality of generated code. I think that relying too much on automated tests is problematic, because testing is difficult. I'd rather review the actual source code and, more importantly, take an active part in its development by steering the agent(s) in real time.

I usually develop a feature step by step, rather than focusing too much on upfront planning. I did this before agents joined the development process, and I have found that it works well today too. When beginning a new task, I often have some clear ideas and some vague ideas about how to solve the problem. By starting with the parts that are clear, I can postpone thinking about the implementation details of the vaguer parts of the overall feature. I can worry about those later, when I have learned more. During development, the how and the what will likely change. This is natural, as you learn more along the way and understand more about the actual problem to solve. I guess this is the basic idea behind Embrace Change (from Extreme Programming and the Agile movement).

I haven't yet felt any need to set up larger Agent Orchestrations for the kind of problems that I solve. It mostly seems like overkill, doesn't it? I don't know about you, but we are not building or reinventing an entire E-Commerce or Social Media platform every day. The things we do are on a much smaller and more human-friendly scale. I think the idea of 100% test coverage might be about scaling up fast, and the assumption that agents will produce so much code that it becomes impossible for humans to grasp.

An Example

I recently developed a new feature that involved several repos, a combination of Python backends and Next.js apps. I decided to do this in smaller steps and began with the most straightforward one, which in this case was one of the backend services that I already know well. I had a pretty clear idea of what needed to be changed in there and added a simple Solution Design to the ticket.

For context: I used the same ticket for all the development of this particular feature. Each plan an agent produced was submitted as a comment to the ticket. I have automated this with an MCP server connected to the issue system that we currently use at work. It seems like having the relevant data collected in the ticket made the upcoming tasks clearer for the agent(s). Each task began with a new context, but I instructed the agent to read the ticket (containing the problem to solve, the solution design, previous plans and linked pull requests) before planning how to implement things.

It felt like things went pretty smoothly. However, the very first task needed several iterations. I realize now that even if the basic setup of that particular repo was straightforward, the service has evolved over time and its structure has diverged. I think this is quite common in long-lived repos. This confused the agent, so I practiced the stop-the-line principle: rejecting generated code, correcting and steering the agent in real time whenever I noticed the implementation (the code) going in a direction I didn't like. My agentic tool of choice, Eca, recently added support for a new steer command that is very useful for this kind of workflow. You can change the direction, or steer, while the agents are working. It is not always necessary to halt all ongoing work; sometimes a friendly nudge in the right direction is enough.

Sometimes, unexpected things happen: the other day, the main model I use was unreachable for about half the day (again). Oh no. But I just switched to a different provider and continued the work! This is another area where an Open Source and provider-agnostic tool like Eca shines.

Another thing I noticed was that the short summary the agent produced for each Pull Request covered the whole feature and the particular details pretty well. The data in the ticket was probably helpful for this kind of task too.

I've been advocating Test Driven Development (TDD) and its sibling REPL Driven Development for many years. The real-time steering and the stop-the-line principle, with origins in The Toyota Way, could be an Agentic Engineering version of the Test Driven approach. It's about fast feedback loops. What do you think?

Top Photo: that's me pretending to do something important on the cell phone, taken at Åreskutan, Jämtland, Sweden.

Sunday, March 22, 2026

The tools of an Agentic Engineer

A lot of great things have their origins in the 1970s: Hip Hop redefining music and street culture, Bruce Lee taking Martial Arts to the next level, and the initial development of something called editor macros (also known as Emacs). I was born in that decade, but that's purely a coincidence.

For a couple of years now, my primary development tool has been that editor from the seventies. It is my tool of choice for Python, JavaScript, TypeScript and Lisp dialects such as Clojure and Elisp. And today, as an agentic engineer, it has turned out to be a great choice for this kind of software development too. With the rise of various CLI, TUI & Desktop based tools for AI development, it would be reasonable to think that this ancient code editor would become obsolete, right?

Not if you know the innovative Emacs community. It is driven by passion, support from the community itself, and Open Source. These components are usually more resilient and reliable long term than the VC-driven startup culture. Emacs is part of the greater Lisp community, where a lot of innovation takes place. The Clojure community is cutting edge in many aspects of software development, including AI.

More Agents

One thing I have noticed lately is that the more I get into Agentic Engineering, the more I use Emacs. As the focus has shifted from typing code to instructing and reviewing, I have found uses for Emacs powers I haven't really needed until now: tools like Magit (for git), and I'm also learning more about the powerful Org Mode. I didn't care that much about Markdown before, but now it is an important part of the development itself. So I configured my Emacs for a nice-looking, simple and readable Markdown experience.

"More Agentic Engineering, More Emacs"

With Emacs, I use a great AI tool called Eca, and with it I am not limited to any specific vendor for agentic development. Vendor lock-in is something I really want to avoid. The combination of Eca and the power tools mentioned before makes a very nice Agentic Engineering toolset. Eca is actively developed and has a lot of useful features and a very nice developer experience. It supports standards like AGENTS.md, as well as commands, skills, hooks and sub-agents, and it uses a client-server setup similar to the Language Server Protocol. It is Open Source and not only for Emacs; have a look at the website for support for your favorite editor or IDE. By the way, Eca is developed in a Lisp (Clojure).

I have shared my Eca setup on GitHub, and have also made some contributions to the Eca plugins repository.

Human Driven Development

With this setup, the human review happens in real time and doesn't have to wait until the end, when the amount of code is too often quite overwhelming. The human developer (that's me) can act quickly when noticing that things are taking a different route than expected, in a similar way to the stop-the-line principle from The Toyota Way. This is a lean way to reach the end goal quickly: deploying code that is good enough for production and adds value.

I have found that many Agile practices, in combination with developer-friendly tools, fit well with the ideas of Agentic Engineering, even though I've seen worrying signs of a return of the Waterfall movement.

To summarize: the result of my new Agentic Engineering development style is that I haven't put my IDE to the side. It's at the very center of the agentic workflow.



Top Photo by me, taken at Åreskutan, Jämtland, Sweden.

Saturday, October 25, 2025

Please don't break things

"Does this need to be a breaking change?"

Over my years as a Software Developer, I have been part of many teams making task-force style efforts to upgrade third-party dependencies or tools. Far too often it is work that adds zero value for us. It adds significant cost, though. As a user of third-party tools, you don't have much choice. Even if you might feel productive as a developer when implementing these forced changes, think of all the other things you could do instead to improve your product or platform.

The great tools from the community, most of them Open Source, are something we should be thankful for, and we should appreciate the efforts made by the people out there. This is a plea to pay extra attention when making changes that will affect your users. I am also an Open Source maintainer, and I try hard to avoid changes that don't add value for the users.

An example from Python: uv

warning: The `tool.uv.dev-dependencies` field (used in `pyproject.toml`) is deprecated and will be removed in a future release; use `dependency-groups.dev` instead

The change itself makes a lot of sense. There's a new standard, PEP 735, for how to declare the dependencies only needed for development. Most package & dependency management tools out there already had their own implementation of this feature, and it is probably a good thing to use the standard. From a user perspective, though, this only means that we need to make changes in all our Python projects. My suggestion is: why not support both options?

Maintainability vs Value

From a tooling developer perspective, it is understandable that you don't want to maintain several ways of solving a problem. But what about the Developer Experience of all the users of the tool out there? Imagine the teams maintaining projects with 10, 40 or even 100 different Microservices and libraries, each one in its own git repo.

Another Python example: Pydantic

The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0.

I like Pydantic; it is a very useful tool with great features. The 2.0 release also came with significant performance improvements. But this change doesn't make sense to me, even though I understand that the new name probably fits better within the domain of Pydantic itself.

Does this have to be a breaking change? I would suggest keeping both alternatives. Yes, it might mean a little more maintenance for you as a library developer. More importantly: your users can focus on adding value to their products instead of this mostly zero-value work.
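The maintenance cost of keeping both names around can be tiny. Here's a minimal sketch in plain Python (deliberately not using Pydantic itself) of the old name simply delegating to the new one:

```python
# A sketch of keeping an old method name alive as a thin alias, so
# existing user code keeps working while new code moves to the new
# name at its own pace. The Model class is a made-up example.
class Model:
    def __init__(self, **fields):
        self._fields = fields

    def model_dump(self) -> dict:
        # The "new" canonical API
        return dict(self._fields)

    def dict(self) -> dict:
        # The "old" name delegates; cheap to maintain, nothing breaks
        return self.model_dump()

m = Model(name="Ada", age=36)
```

One line of delegation versus a breaking change across every project that uses the library.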

Example three: SQLAlchemy

MovedIn20Warning: The ``declarative_base()`` function is now available as sqlalchemy.orm.declarative_base(). (deprecated since: 2.0)

This is probably also correct from an internal SQLAlchemy domain perspective. From a user perspective, it only means that we need to change a lot of existing code. The cost of having the code available in two namespaces is probably low.

I am also a maintainer of tools, and I've made mistakes too, or design choices that turned out to be not that great later on. But I have also actively decided not to force users to make the kind of changes described in this post. I don't want to break things for users of a tool because of design choices made in the past.

Should the users change their workflows?

The main thing I work on today is Python tooling for Polylith, which is a Monorepo architecture. The changes introduced by uv, Pydantic and SQLAlchemy actually aren't that much work for the developer teams using Polylith today: there is only one place in the source code where these changes are needed. This setup is robust, and ready for unexpected breaking changes in the tools that are used. Sounds nice, doesn't it?



Top Photo generated by DALL-E, prompted and modified afterwards by me.

Sunday, April 20, 2025

Feedback loops in Python

How fast can we get useful feedback on the Python code we write?

This is a walk-through of some of the feedback loops we can get from our ways of working and our developer tools. The goal of this post is to find a fast and developer-friendly way to learn whether the Python code we write actually does what we want it to.

What's a Feedback Loop?

From my point of view, feedback loops for software are about running code with some optional input and verifying the output. Basically: run a Python function and inspect the result of that function.

Feedback Loop 1: Ship it to Production

Write some code & deploy it. When the code is up & running, you (or your customers) can verify if it works as expected or not. This is usually a very slow feedback loop cycle.

You might have some Continuous Integration (CI) already set up, with rules that should pass before the deployment. If your code doesn't pass the rules, the CI tool will let you know. As a feedback loop, it's slow. By slow, I mean that it takes a long time before you will know the result. Especially when there are setup & teardown steps happening in the CI process. As a guard, just before releasing code, CI with deployment rules is valuable and sometimes a life saver.

Commit, Push & Merge

Pull Requests: just before hitting the merge button, you will get a chance to review the code changes. This type of visual review is a manual feedback loop. It's good, because you often take a step back and reflect on the written code. Will the code do the thing right? Does the code do the right thing? One drawback is that you review all changes. For large Pull Requests, it can be overwhelming. From a feedback loop perspective, it's not that fast.

Testing and debugging

Obviously, this is a very common way to get feedback on software, either manually or automated. Manual testing is usually slower than an automated test at finding out whether the code does what's expected. There are the integration-style automated tests, and the unit tests targeting the different parts. Integration-style tests often require mocking and more setup than unit tests. Both run fast, but unit tests are likely to be faster. You can set up your development environment to automatically run the tests when something changes. Now we're getting close; this workflow can be fast.

I usually avoid the integration-type of tests, and rather write unit tests. I try to write small, focused and simple unit tests. The tests help me write small, focused and simple code too.
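The kind of small, focused unit test I mean could look something like this (the function under test is a made-up example): no setup, no mocking, one behavior per test.

```python
# A sketch of a small, focused unit test: one function, one behavior
# per test case, nothing to mock and nothing to tear down.
def parse_price(raw: str) -> float:
    # Accept both comma and dot as decimal separator
    return float(raw.replace(",", "."))

def test_parse_price_with_comma():
    assert parse_price("9,95") == 9.95

def test_parse_price_with_dot():
    assert parse_price("9.95") == 9.95
```

Tests like these run in milliseconds, which is what makes the automatic run-on-change workflow feel instant.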

Test Driven Development

An even faster way to get feedback about the code is to write software in a test driven way (TDD): write a test that initially fails, write some code to make the test pass, refactor the test and refactor the code. For me, this workflow usually means jumping back-and-forth between the test and the code. Like a Ping Pong game.

TDD Deluxe

I'm not that strict about the TDD workflow. I don't always type the first lines of code in a test, or sometimes the test is halfway done when I begin to implement some of the code that should make the test pass. That's not pure TDD, I am aware. A few years ago, I found a new workflow that fits my sloppy approach very well. It's a thing called RDD (REPL Driven Development).

With RDD, you write code interactively. What does that even mean? For me, it's about writing small portions of code and evaluating them (i.e. running them) in the code editor. This gives me almost instant feedback on the code I just wrote. It's like the Ping Pong game of TDD, but even faster. Often, I also write inline code that later on evolves into a unit test: adding some test data, evaluating a function with that test data, grabbing the response and asserting on it. The line between the code and the test is initially blurry, becoming clearer along the way. Should I keep the scratch-like code I wrote to evaluate a function? If yes, I have a unit test already. If not, I delete the code.
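The scratch-to-test flow can be sketched like this (the function is a made-up example): the inline lines are first evaluated in the editor, and once the output looks right, the very same lines become a unit test.

```python
# A sketch of scratch code evolving into a test. First, the function:
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

# Scratch evaluation, run inline in the editor to see the output...
result = slugify("Feedback Loops in Python")

# ...and kept as an assertion once the result looks right.
# At this point, these lines are already a unit test.
assert result == "feedback-loops-in-python"
```

If the scratch lines aren't worth keeping, they are simply deleted; if they are, moving them into a test file is all that's left to do.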

Interactive Python for fast Feedback Loops

I have written about the basic flows of REPL Driven Development before:

REPL - the Read Eval Print Feedback Loop

When starting a REPL session from within a virtual environment, you will have access to all the app-specific code. You can incrementally add code to the REPL session by importing modules, adding variables and functions. You can also redefine variables and functions within the session.

With REPL Driven Development, you have a running shell within your code editor. You mostly use the REPL shell to evaluate the code, not for writing code. You write the code as usual in your code editor, with the REPL/shell running there in the background. IPython is an essential tool for RDD in Python. It's configurable to auto-reload changed submodules, so you don't have to restart your REPL. Otherwise, it would have been very annoying.
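The auto-reload behavior can be switched on in the IPython configuration. A minimal sketch, assuming the default profile location (adjust the path to your setup):

```python
# In ~/.ipython/profile_default/ipython_config.py
# (get_config is provided by IPython when it loads this file)
c = get_config()  # noqa

# Load the autoreload extension and reload changed modules
# automatically before executing code (mode 2 reloads all modules).
c.InteractiveShellApp.extensions = ["autoreload"]
c.InteractiveShellApp.exec_lines = ["%autoreload 2"]
```

With this in place, editing a module on disk is picked up by the running REPL session without a restart.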

Even more Interactive Python feedback loops

We can take this setup even further: modifying and evaluating an externally running Python program from your code editor. You can change the behavior of the program, without any app restarts, and check the current state of the app from within your IDE. The Are we there yet? post describes the main idea with this kind of setup and how I’ve configured my favorite code editor for it.

Jupyter, the Kernel and IPython

You might have heard of, or already use, Jupyter notebooks. To simplify, there are two parts involved: a kernel and a client. The kernel is the Python environment. The client is the actual notebook. This type of setup can be used with REPL Driven Development too: the code editor acts as the client, feeding the kernel or inspecting its current state by evaluating code. For this, we need a kernel specification, a running kernel, and a connection to the running kernel from the IDE.

Creating a kernel specification

You can do this in several ways, but I find it most straightforward to add ipykernel as a dev dependency to the project.

# Add the dependency (example using Poetry)
poetry add ipykernel --group dev

# Generate the kernel specification
python -m ipykernel install --user --name=the-python-project-name

The above commands generate a kernel specification and only need to be run once. Now you have a ready-to-go kernel spec.

Start the Kernel
jupyter kernel --kernel=the-python-project-name

The above command will start a kernel, using the specification we have generated. Please note the output from the command, with instructions on how to connect to it. Use the kernel path from the output to connect your client.

The tooling support I have added is, as of this writing, for Emacs. Have a look at this recording for a 13-minute demo of how to use this setup for a fast and developer-friendly Python feedback loop.



Top Photo by Timothy Dykes on Unsplash

Sunday, March 23, 2025

Are we there yet?

Continuing with the work on tooling support for interactive & fun development with Python.

A while ago, I wrote an article about my current attempts to make development in Python more interactive, more "test" driven and more fun.

My North Star is the developer experience in Clojure, where you have everything at your fingertips. Evaluating expressions (i.e. code) is a common thing in Lisp languages in general. I've found that it is sometimes difficult to explain this to fellow developers with no experience of Clojure development. In most languages, the REPL is an external thing you use in a terminal window, detached from the work you usually do in your IDE. But that's not the case with REPL Driven Development (RDD).

Along the way, I have learned how to write and evaluate Python code within the code editor using already existing tools, and how to configure them. Here's my first post and second post (with setup guides) about it. You'll find information about the basic idea of RDD and guides on how to set up your IDE or code editor for Python development.

Can it be improved?

This has been the way I have done Python development most of the time, as described in the posts above. But I have always wanted to find ways to improve this workflow, such as actually seeing the evaluated result in an overlay right next to the code. Not too long ago, I developed support for it in Emacs and it has worked really well! You can read about it and see examples of it here.

What about AI?

While writing Python code in the RDD workflow, there's often a manual step of adding some test or example data to variables and function input parameters. Recently, I got the idea to automate it using an LLM. I wrote a simple code editor command that prompts an AI to generate random (but relevant) values and then populates the variables with those. Nowadays, when I want to test Python code, I can prepare it with example data in a simple way using a key combination, and then evaluate the code as before.

Are we there yet?

One important thing with RDD, that I haven't been able to figure out until now, is how to modify and evaluate code in an actual running program. This is how things are done in Clojure: the running app, service or software is constantly being changed while you develop it. Without any restarts. It is modified while it is running. Tools like nREPL do this in the background, with a client-server kind of architecture. I haven't dug that deep into how nREPL works, but I believe it is similar to LSP (the Language Server Protocol).

The workflow of changing a running program is really cool and something I've only seen before as a Clojure developer. So far I have used IPython as the underlying tool for REPL Driven Development (as described in the posts above).

A solution: the Kernel

In Python, we have something similar to nREPL: Jupyter. Many developers use Notebooks for interactive programming, and that is closely related to what I am trying to achieve. With a standard REPL session, you can add, remove and modify the Python code that lives in the session. That's great. But a session is not the same as an actual running program.

A cool thing with Jupyter is that you can start a Python Kernel that clients can connect to. A client can be a shell, a Notebook - or a Code Editor.

jupyter kernel

By running the jupyter kernel command, you have a running Python program and can interactively add things to it, such as initiating and starting up a Python backend REST service. Being able to connect to the Jupyter kernel is very useful. While connected to the kernel, you can add, remove and modify the Python code, and the kernel will keep on running. This means that you can modify and test your REST service without any restarts. With this, we are doing truly interactive Python programming. You get instant feedback on the code you write, by evaluating it in your editor or by testing the endpoint from a browser or shell.
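The no-restart principle can be demonstrated in plain Python, without Jupyter at all. In this sketch, a tiny HTTP server keeps running while we redefine the function it calls, and the next request picks up the new behavior (a Jupyter kernel does the same thing at a larger scale):

```python
# A self-contained sketch: redefine a function while a server that
# calls it keeps running - no restart needed.
import http.server
import threading
import urllib.request

def greet() -> str:
    return "hello"

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = greet().encode()  # looked up at call time, not at startup
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

first = urllib.request.urlopen(url).read().decode()

def greet() -> str:  # redefined while the server is running
    return "hello, again"

second = urllib.request.urlopen(url).read().decode()
server.shutdown()
```

The server thread never restarts, yet the second response reflects the new definition. That is the core of the workflow described above.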

"Dude, ever heard about debugging?"

Yes, of course. Debugging is closely related to this workflow, but it is not the same thing. Debugging is usually a one-way flow, a timeline: you run code, pause the execution with breakpoints where you can inspect things, and then continue until the request is finalized. The RDD workflow, modifying and evaluating a running program, doesn't have a one-way timeline. It's timeless. And you don't add breakpoints.

REPL Driven tooling support

I have developed tooling support for this new thing, connecting to a Jupyter kernel, and so far it works well! Python is different from languages like Clojure: namespaces are relative to where the actual module is placed in the folder structure of a repo. This means that a client connected to a kernel needs the full namespace (i.e. the full Python path) of the items to inspect. This is what I have added in the tooling, so I can keep the RDD flow with the nice overlays and all.

I am an Emacs user, and the tooling I've written is for Emacs. But it shouldn't be too difficult to add it to your favorite code editor. Have a look at the code. You might even learn some Lisp along the way.


UPDATE: I have recorded a very much improvised video (with sound) explaining what is happening and how you start things up.

Resources

Top Photo by David Vujic

Friday, February 7, 2025

FOSDEM 25

FOSDEM, a conference different from any other I've been to before. I'm very happy for the opportunity to talk, share knowledge and chat with fellow devs there. I also met old and new friends in Brussels, and learned a little bit more about the nice Belgian beer culture. 😀

The FOSDEM event is free and you don't even register to attend (only speakers do that in a call for speakers process). Just come by the University Campus, located not far from the center of the Inner City.

I travelled to Belgium on the night train from Stockholm, Sweden, and arrived the next morning in Hamburg, Germany. From there, I took the DB ICE train to Köln/Cologne. At some point, the top speed was about 250 km/h. That's fast! From there, another fast train to Brussels.

The variety of topics and the number of tracks at FOSDEM were mind-blowing, with thousands of participants. The overall vibe was very friendly & laid back. I joined Rust-focused talks, JavaScript and UX/Design talks. And, of course, the Python room talks.

My talk was in the Python room and was about Python Monorepos and the Polylith Developer Experience. Everything Python related was on FOSDEM Day 2 and I think my presentation went really well! There were a lot of questions afterwards and I got great feedback from the people attending.

The next day, I went back the same route as before. I got a couple of extra hours to spend in Hamburg before boarding the night train back home to Stockholm. A great weekend trip!

Here's the recording of my talk from FOSDEM 25:

The video was downloaded from fosdem.org.
Licensed under the Creative Commons Attribution 2.0 Belgium Licence.


Resources

Friday, January 3, 2025

Better Python Developer Productivity with RDD

"REPL Driven Development is an interactive development experience with fast feedback loops”

I have written about REPL Driven Development (RDD) before, and I use it in my daily workflow. You will find setups for Python development in Emacs, VS Code and PyCharm in this post from 2022. That's the setup I have used since then.

But there's one particular thing that I have missed ever since I began with RDD in Python. I learned this interactive, REPL kind of writing code when developing services and apps with Clojure. When you evaluate code in Clojure, such as function calls or vars, the result of the evaluation nicely pops up as an overlay in your code editor, right next to the actual code. This is great, because that's also where the eyes are and what the brain is currently focused on.

The setup from my previous post outputs the evaluated result in a separate window (or buffer, as it is called in Emacs). Still within the code editor, but in a separate view. That works well and has improved my Python developer productivity a lot, but I don't like the context switching. Can this Python workflow be improved?

Can I improve this somehow?

I've had that question in the back of my mind for quite some time. My Elisp skills are unfortunately not that great, and I think that has been a blocker for me. I've managed to write my own Emacs config, but that's about it. Until now. During the Christmas holidays, I decided to try learning some Emacs Lisp, to be able to develop something reminiscent of the great experience from Clojure development.

I used an LLM to find out how to do this, and along the way learned more about what's in the language. I'm not a heavy LLM/GPT/AI user at all. In fact, I rarely use them. But here it made sense to me, and I have used it to learn how to write and understand code in this particular language and environment.

I have done rewrites and a lot of refactoring of the result on my own (my future self will thank me). The code is refactored from a few quite large and nested Lisp functions into several smaller and logically separated ones. Using an LLM to just get stuff done and move on without learning would be depressing. The same goes for copy-pasting code from StackOverflow without reflecting on what the code actually does. Don't do that.

Ok, back to the RDD improvements. With the new feature added to my code editor, I can now do this:

Selecting a variable, and inspecting what it contains by evaluating and displaying the result in an overlay.
Selecting a function, execute it and evaluate the result.

The overlay content is syntax highlighted as Python, and will be rendered into several rows when it's a lot of data.

The actual code evaluation is still performed in the in-editor IPython shell, but the result is extracted from the shell output, formatted and rendered as an overlay. I've chosen to truncate the result if it's too large; the full result is still printed in the Python shell anyway.
The Emacs Lisp code does this in steps:

  1. Add a hook for a specific command (the elpy-shell-send-buffer-or-region command, bound to the Emacs shortcut C-c C-c).
  2. Capture the contents of the Python shell.
  3. Create an overlay with the evaluated result, based on the current cursor position.
  4. Remove the overlay when the cursor moves.

This is very much adapted to my current configuration, and I guess the real world testing of this new addition will be done after the holidays, starting next week. So far, so good!

Future improvements?

I'm currently looking into the possibilities of starting an external IPython or Jupyter/kernel session, and how to connect to it from within the code editor. I think that could enable even more REPL Driven Development productivity improvements.

You'll find the Emacs Lisp code at GitHub, in this repo, where I store my current Emacs setup.

Top Photo by Nicolai Berntsen on Unsplash

Sunday, December 22, 2024

Introducing python-hiccup

"All you need is list, set and dict"

Write HTML with Python

Python Hiccup is a library for representing HTML using plain Python data structures. It's a Python implementation of the Hiccup syntax.

You create HTML with Python, using a list or tuple to represent HTML elements, and a dict to represent the element attributes. The work on this library started out as a fun coding challenge, and it is now evolving into something useful for Python dev teams.

Basic syntax

The first item in the list is the element. The rest is attributes, inner text or children. You can define nested structures or siblings by adding lists (or tuples if you prefer).

["div", "Hello world!"]
Using the html.render function of the library, the output will be HTML as a string: <div>Hello world!</div>

Adding id and classes: ["div#foo.bar", "Hello world!"]
The HTML equivalent is: <div id="foo" class="bar">Hello world!</div>

If you prefer, you can define the attributes using a Python dict as an alternative to the compact syntax above: {"id": "foo", "class": "bar"}

Writing a nested HTML structure, using Python Hiccup:
["div", ["span", ["strong", "Hello world!"]]]
The HTML equivalent is:
<div><span><strong>Hello world!</strong></span></div>
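The core idea behind a hiccup-style renderer is a small recursive function. As a minimal sketch (this is not the actual python-hiccup implementation, and it skips the compact `div#foo.bar` syntax): the first item is the tag, an optional dict holds attributes, and the rest are children.

```python
# A minimal sketch of hiccup-style rendering: first item is the tag,
# a dict right after it is attributes, everything else is children.
def render(node) -> str:
    if not isinstance(node, (list, tuple)):
        return str(node)  # plain text node
    tag, *rest = node
    attrs = ""
    if rest and isinstance(rest[0], dict):
        attrs = "".join(f' {k}="{v}"' for k, v in rest[0].items())
        rest = rest[1:]
    children = "".join(render(child) for child in rest)
    return f"<{tag}{attrs}>{children}</{tag}>"
```

A few lines of recursion are enough to turn nested Python data into nested HTML, which is what makes the syntax so compact.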

Example usage

Server side rendering with FastAPI

Using the example from the FastAPI docs, but without the inline HTML. Instead, using the more compact and programmatic approach with the Python-friendly hiccup syntax.

from python_hiccup.html import render
from fastapi import FastAPI
from fastapi.responses import HTMLResponse

app = FastAPI()

def generate_html_response():
    data = ["html",
            ["head", ["title", "Some HTML in here"]],
            ["body", ["h1", "Look ma! HTML!"]]]
    return HTMLResponse(content=render(data), status_code=200)

@app.get("/items/", response_class=HTMLResponse)
async def read_items():
    return generate_html_response()

PyScript

Add python-hiccup as any other third-party dependency in the package.toml file: packages = ["python-hiccup"]

Write the HTML rendering in your PyScript files:

from pyweb import pydom
from python_hiccup.html import render

pydom["div#main"].html = render(["h1", "Look ma! HTML!"])

That's it!

python-hiccup aims to make HTML rendering programmatic, simple and readable. I hope you will find it useful. The HTML in this blog post was written using python-hiccup.

Resources



Top photo by Susan Holt Simpson on Unsplash

Tuesday, August 27, 2024

Simple Kubernetes in Python Monorepos

"Kubernetes, also known as K8s, is an open source system for automating deployment, scaling, and management of containerized applications."
(from kubernetes.io)

Setting up Kubernetes for a set of Microservices can be overwhelming at first sight.

I'm currently learning about K8s and the ecosystem of tooling around it. One thing that I've found difficult is the actual K8s configuration and how the different parts relate to each other. The YAML syntax is readable, but I find it hard to understand how to structure it - impossible to edit without having the documentation close at hand. When I first got the opportunity to work with Python code running in Kubernetes, I realized that I had to put extra effort into understanding what's going on in there.

I was a bit overwhelmed by what looked like a lot of repetitive and duplicated configuration. I don't like that. But there's a tool called Kustomize that can solve this, by incrementally constructing configuration objects with snippets of YAML.

Kustomize is about managing Kubernetes objects in a declarative way, by transforming a basic setup with environment-specific transformations. You can replace, merge or add parts of the configuration that is specific for the current environment. It reminds me of how we write reusable Python code in general. The latest version of the Kubernetes CLI - Kubectl - already includes Kustomize. It used to be a separate install.

Microservices

From my experience, the most common way of developing Microservices is to isolate the source code of each service in a separate git repository. Sometimes, shared code is extracted into libraries and put in separate repos. This way of working comes with tradeoffs. With the source code spread out in several repositories, there's a risk of having duplicated source code. Did I mention I don't like duplicated code?

Over time, it is likely that the services will run different versions of tools and dependencies, potentially also different Python versions. From a maintainability and code quality perspective, this can be a challenge.

YAML Duplication

In addition to having the Python code spread out in many repos, a common setup for Kubernetes is to do the same thing: having the service-specific configuration in the same repo as the service source code. I think it makes a lot of sense to have the K8s configuration close to the source code. But with the K8s configuration in separate repos, the tradeoffs are very much the same as for the Python source code.

For the YAML specifically, it is even likely that the configuration will be duplicated many times across the repos - a lot of boilerplate configuration. This can lead to unnecessary extra work when something that affects many Microservices needs updating.

One solution to the tradeoffs with the source code and the Kubernetes configuration is: Monorepos.

K8s configuration in a Monorepo

A Monorepo is a repository containing source code and multiple deployable artifacts (or projects), i.e. a place where you have all your Python code and from where you build & package several Microservices. The purpose of a Monorepo is to simplify code reuse, and to use the same developer tooling setup for all code.

The Polylith Architecture is designed for this kind of workflow (I am the maintainer of the Python tools for the Polylith Architecture).

While learning, struggling and trying out K8s, I wanted to find ways to improve the Configuration Experience by applying the good ideas from the Developer Experience of Polylith. The goal is to make K8s configuration simple and joyful!

Local Development

You can try things out locally with developer tools like Minikube. With Minikube, you will have a local Kubernetes to experiment with, test configurations and to run your containerized microservices. It is possible to dry-run the commands or apply the setup into a local cluster, by using the K8s CLI with the Kustomize configs.

I have added examples of a reusable K8s configuration in the The Python Polylith Example repo. This Monorepo contains different types of projects, such as APIs and event handlers.

The K8s configuration is structured in three sections:

  • A basic setup for all types of deployments, i.e. the config that is common for all services.
  • Service-type specific setup (API and event handler specific)
  • Project-specific setup

All of these sections have overlays for different environments (such as development, staging and production).

As an alternative, the project-specific configuration could also be placed in the top /kubernetes folder.

I can run kubectl apply -k to deploy a project into the Minikube cluster, using the Kustomize configuration. Each section adds things to the configuration that is specific for the actual environment, the service type and the project.
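As an illustration of what an overlay can look like - a hedged sketch with hypothetical paths and names, not the exact files from the example repo - a development overlay typically references the base and patches only what differs for that environment:

```yaml
# overlays/development/kustomization.yaml (hypothetical paths)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: deployment-patch.yaml
```

Running kubectl apply -k against the overlay folder merges the base with the environment-specific patches.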

The base, overlays and services are the parts that aren't aware of the project. Those Project-specific things are defined in the project section.

Using a structure like this will make the Kubernetes configuration reusable and with almost no duplications. Changing any common configuration only needs to be done in one place, just as with the Python code - the bricks - in a Polylith Monorepo.

That’s all. I hope you will find the ideas and examples in this post useful.


Resources


Top photo by Justus Menke on Unsplash

Sunday, May 12, 2024

Pants & Polylith

But who is Luke and who is R2?
"Pants is a fast, scalable, user-friendly build system for codebases of all sizes"
"Polylith helps us build simple, maintainable, testable, and scalable backend systems"

Can we use both? I have tried that out, and here are my notes.

Why?

Because The Developer Experience.

Developer Experience is important, but what does that mean? For me, it is about keeping things simple. The ability to write, try out and reuse code without any context switching. By using one single setup for the REPL and the IDE, you will have everything at your fingertips.

The Polylith Architecture solves this by organizing code into smaller building blocks, or bricks, and separating code from the project-specific configurations. You have all the bricks and configs available in a Monorepo. For Python development, you create one single virtual environment for all your code and dependencies.

There is also tooling support for Polylith that is useful for visualizing the contents of the Monorepo, and for validating the setup. If you already are into Pantsbuild, the Polylith Architecture might be the missing Lego bricks you want to add for a great Developer Experience.

Powerful builds with Pants

Pantsbuild is a powerful build system. The pants tool resolves all the dependencies (by inspecting the source code itself), runs the tests and creates distributions in isolation. The tool also supports the common Python tasks such as linting, type checking and formatting. It also has support for creating virtual environments.

Dude, where's my virtual environment?

In the Python Community, there is a convention to name the virtual environment in a certain way, usually .venv, and to create it at the project root (this will also likely work well with the defaults of your IDE).

The virtual environment created by Pants is placed in a dists folder, and further down in a Pants-specific folder structure. I found that the created virtual environment doesn't seem to include custom source paths (I guess that would be what Pants calls roots).

Custom source paths are important for an IDE to locate the Python source code. Maybe there are built-in ways in Pantsbuild to solve that already? Package management tools like Poetry, Hatch and PDM have support for configuring custom source paths in the pyproject.toml and also for creating virtual environments according to the Python Community conventions.

Note: If you are a PyCharm user, you can mark a folder as a source root manually and it will keep that information in a cache (probably a .pth file).

Example code and custom scripts

I have created an example repository, a monorepo using Pantsbuild and Polylith. You will find Python code and configurations according to the Polylith Architecture and the Pantsbuild configurations making it possible to use both tools. In the example repo I have added a script that adds source paths, based on the output from the pants roots command, to the virtual environment created by Pantsbuild. This is accomplished by adding a .pth file to the site_packages folder. For convenience, the script will also create a symlink to a .venv folder at the root of the repo.
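The core of such a script is small. Here is a sketch of the idea - file and function names are my own, not the exact script from the repo - writing the source roots into a .pth file, which Python's site machinery reads at interpreter startup:

```python
from pathlib import Path


def write_pth_file(site_packages: Path, repo_root: Path, roots: list[str]) -> Path:
    """Write a .pth file that adds extra source roots (e.g. the output
    of the `pants roots` command) to a virtual environment.

    This is a hypothetical sketch, not the actual script from the repo.
    """
    pth_file = site_packages / "polylith_source_roots.pth"  # file name is made up
    lines = [str(repo_root / root) for root in roots]
    pth_file.write_text("\n".join(lines) + "\n")
    return pth_file
```

Every line in a .pth file found in site-packages is appended to sys.path, which is what makes the REPL (and an IDE pointed at the environment) find the bricks.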

With the virtual environment properly set up, you can use the REPL (my favorite is IPython) with full access to the entire code base:

source .venv/bin/activate
ipython

With an activated virtual environment, you can also use all of the Polylith commands:

poly create
poly info
poly libs
poly deps
poly diff
poly check
poly sync

Pants & Polylith

Pantsbuild has a different take on building and packaging artifacts compared to other tools I've used. It has support for several languages and setups. Some features overlap with what's available in the well-known tooling in the Python community, such as Poetry. Some parts diverge from the common conventions.

Polylith has a different take on sharing code, and also has some overlapping features. Polylith is a Monorepo Architecture, with tooling support for visualizing the Monorepo. From what I've learned so far, the checks and code inspection features are the things you will find in both Pants and Polylith.

Pants operates on the file level; Polylith on the brick level.

My gut feeling after learning about it and by experimenting, is that Pantsbuild and Polylith share the same basic vision of software development in general, and I have found them working really well together. There are some things I would like to be a better fit, such as the overlap between the contents of the Pants-specific BUILD files and the project-specific pyproject.toml files.

Maybe I should develop a Pants Polylith plugin to fix that part. 🤔
How does that sound to you?


Resources


Top Photo by Studbee on Unsplash

Saturday, April 13, 2024

Write Less Code, You Must

An aspect of Python Software Development that is often overlooked is Architecture (or Design) at the namespace, module & function level. My view on Software Development in general is that it is important to try hard to write code that is Simple, and Easy to move from one place to another.

With code written like this, it becomes less of a problem if a feature was added in Service X when, from a high-level architectural perspective, Service Y would be a better fit. All you need to do is move the code to the proper place, and you're all good. However, this requires that the code actually is movable: i.e. having the features logically separated into functions, modules and namespace packages.

Less Problems

There are a lot of different opinions about this, naturally. I've seen it in several public Python forums, and been surprised by the reactions to Python code with (too) few lines in it. How is it even possible to have too little code?

My take on this in general is Less code is Less Problems.

An example

def my_something_function():
    # Validation
    # if valid ... else ...
    ... # python code here

    # Checking
    # if this ... elif that ... elif not this or not that ... else do_something
    ... # python code here

    # Data transformation
    # for each thing in the things:
    #     do a network call and append to a list
    ... # python code here

    # Yay, done
    return the_result

This type of function - where all of those things are processed within the function body - is not very testable. A unit test would likely need a bunch of mocking, patching and additional boilerplate test data code, especially when there are network calls involved.

My approach to refactoring the code above would be to first identify the different tasks within this controller type of function, and then extract each task into a separate function. Ideally these would be pure functions, accepting input and returning output.

At first, I would put the functions within the same module, close to at hand. Quite quickly, the original function has become a whole lot more testable, because the extracted functions can now easily be patched (my preference is using pytest monkeypatch). This approach would be my interpretation of developing software towards a clean code ideal. There is no need for a Dependency Injection framework or any unnecessary complex OOP-style hierarchy to accomplish it.

In addition to testability, the Python code becomes runnable and REPL-friendly. You can now refactor, develop and test-run the individual functions in the REPL. This is a very fast workflow for a developer. Read more about REPL Driven Development in Python here.

With the features living in separate isolated functions, you will likely begin to identify patterns:

"- Hey, this part does this specific thing & could be put in that namespace"

When moving code into a namespace package, the functions become reusable. Other parts of the application - or other services, if you have a Monorepo - can now use one and the same source code. The same lines of code, located in a single place of the repo. You will likely structure the repo with many namespace packages, each one containing one or a couple of modules with functions that ideally do one thing. It kind of sounds like the Unix philosophy, doesn't it?

This is how I try to write code on a daily basis, at work and when developing Open Source things. I use tools like SonarCloud and CodeScene to help me keep going in this direction. I've written about that before. The Open Source code that I focus on these days (Polylith) has 0% Code Duplication, 0% Code Smells and a long-term Code Quality score of about 9.96. The remaining 0.04 is a deliberate decision by me, because of endpoints having 5+ input arguments. It makes sense for me to keep it like that there, but not in functions within the app itself, where an options object is a better choice.

This aspect of Software Development is, from my point of view, very important. Even more important than the common Microservices/Events/REST/CQRS debates when Architecture is the topic of discussion. This was my Saturday afternoon reflections, and I thank you for reading this post. ☀️

Top Photo by Remy Gieling on Unsplash

Sunday, February 18, 2024

Python Monorepo Visualization

What's in a code repository? Usually you'll find the source code, some configuration and the deployment infrastructure - basically the things needed to develop & deploy something. It might be a service, an app or a library. A Monorepo contains the same things, but for more than one artifact. In there, you will find the code, configurations and infrastructure for several services, apps or libraries.

The main use case for a Monorepo is to share code and configuration between the artifacts (let's call them projects).

These things have to be simple

Sharing code can be difficult. Repos can be out of date. A Monorepo can be overwhelming. With or without a Monorepo, the most common way of sharing code is to package it as libraries that the projects can add as external dependencies. But managing different versions and keeping the projects up-to-date can lead to unexpected and unwanted extra work. Some Monorepos solve this by using symlinks to share code, or custom scripts that copy things into the individual projects during deployment.

Doing that can be messy, I've seen it myself. I was once part of a team that migrated away from a horrible Monorepo, into several smaller single-repo microservices. The tradeoffs: source code spread out in repos with an almost identical structure. Almost is the key here. Also, code and config duplications.

These tradeoffs have a negative impact on the Developer Experience.

The Polylith Architecture has a different take on organizing and sharing code, with a nice developer experience. These things have to be simple. Writing code should be fun. Polylith is Open Source, by the way.

The most basic type of visualization

In a Polylith workspace, the source code lives in two folders named bases and components. The entry points are put in the bases folder, all other code in the components folder. At first look, this might seem very different from a mainstream Python single-project structure. But it isn't really that different. Many Python projects are using a src layout, or have a root folder with the same name as the app itself. At the top, there's probably an entry point named something like app.py or maybe main.py? In Polylith, that one would be put in the bases folder. The rest of the code would be placed in the components folder.

components/
  .../
    auth
    db
    kafka
    logging
    reporting
    ...

You are encouraged to keep the components folder simple, and rather put logically grouped modules (i.e. namespace packages) in separate components than in nested structures. This will make code sharing more straightforward than having a folder structure with packages and sub-packages. There is also less risk of code duplication with this kind of structure, because the code isn't hidden in a complex folder hierarchy. As a side effect, you get a nice overview of the available features: the names of the folders tell what they do and what's available for reuse. A folder view like this is surprisingly useful.

Visualize with the poly tool

Yes, looking at a folder structure is useful, but you would need to navigate the actual source code to figure out where it is used and which dependencies are used in there. Along with the Polylith Architecture there is tooling support. For Python, you can use the tooling together with Poetry, Hatch, PDM or Rye.

The poly info command, an overview of code and projects in the Monorepo.

Besides providing commands for creating bases, components and projects there are useful visualization features. The most straightforward visualization is probably poly info. Here, you will get an overview of all the bricks (the logically grouped Python modules, living in the bases and components folders), the different projects in the Workspace and also in which projects the bricks are added.

Third-party libraries & usages

There's a command called poly libs that will display the third-party dependencies that are used in the Workspace (yes, that's what the contents of the Monorepo is called in Polylith). It will display libraries and the usages on a brick-level. In Polylith, a brick is the thing that you share across projects. Bricks are the building blocks of this architecture.

The poly libs command, displaying the third-party dependencies and where they are used.

The building blocks and how they depend on each other

A new thing in the Python tooling is the command called poly deps. It displays the bricks and how they depend on each other. You can choose to display an overview of the entire Workspace, or for an individual project. This kind of view can be helpful when reasoning about code and how to combine bricks into features. Or inspire a team to simplify things and refactor: should we extract code from this brick into a new one here maybe?

A closer look at the bricks used in a project with poly deps.

You can inspect a single brick to visualize the dependencies: where it is used, and what other bricks it uses.

A zoomed-in view, to inspect the usages of a specific brick.

Export the visualizations

The output from these commands is very easy to copy-and-paste into Documentation, a Pull Request or even Slack messages.

poly deps | pbcopy

📚 Docs, examples and videos

Have a look at the Polylith documentation for more information about getting started. You will also find examples, articles and videos there for a quick start.



Top image made with AI (DALL-E) and manual additions by a Human (me)

Thursday, January 25, 2024

Simple & Developer-friendly Python Monorepos

🎉 Announcing new features 🎉

Polylith is about keeping Monorepos Simple & Developer-friendly. Today, the Python tools for the Polylith Architecture have support for Poetry, Hatch and PDM - three popular Packaging & Dependency Management tools in the Python community.

In addition to the already existing Poetry plugin that adds tooling support for Polylith, there is also a brand new command line tool available. The CLI has support for both Hatch and PDM. You can also use it for Poetry setups (but the recommended way is to use the dedicated Poetry plugin as before).

To summarize: it is now possible to use the Simple & Developer-friendly Monorepo Architecture of Polylith with many different kinds of Python projects out there.

"Hatch is a modern, extensible Python project manager."
From the Hatch website

🐍 Hatch

To make the tooling fully support Hatch, there is a Hatch build hook plugin to use - hatch-polylith-bricks - that will make Hatch aware of a Polylith Workspace. Hatch has a nice and well thought-through Plugin system. Just add the hook to the build configuration. Nice and simple! The Polylith tool will add the config for you when creating new projects:

[build-system]
requires = ["hatchling", "hatch-polylith-bricks"]
build-backend = "hatchling.build"
"PDM, as described, is a modern Python package and dependency manager supporting the latest PEP standards. But it is more than a package manager. It boosts your development workflow in various aspects."
From the PDM website

🐍 PDM

Just as with Hatch, there are build hooks available to make PDM aware of the Polylith Workspace. Writing hooks for PDM was really simple, and I like the way it is developed. Great job, PDM developers! There is a workspace build hook - pdm-polylith-workspace - and a projects build hook - pdm-polylith-bricks - to make PDM and the Polylith tooling work well together.

This is added to the build-system section of the workspace pyproject.toml:

[build-system]
requires = ["pdm-backend", "pdm-polylith-workspace"]
build-backend = "pdm.backend"

And the plugin for projects - this one will be added for you by the poly create project command:

[build-system]
requires = ["pdm-backend", "pdm-polylith-bricks"]
build-backend = "pdm.backend"
"Python packaging and dependency management made easy."
From the Poetry website

🐍 Poetry

For Poetry, just as before, add or update these two plugins and you're ready to go!

poetry self add poetry-multiproject-plugin
poetry self add poetry-polylith-plugin

📚 Docs, examples and videos

Have a look at the Polylith documentation for more information about getting started. You will also find examples, articles and videos there for a quick start. I'm really excited about the new capabilities of the tooling and hope it will be useful for Teams in the Python Community!



Top photo made with AI (DALL-E) and manual additions by a Human (me)

Thursday, August 3, 2023

Kafka messaging with Python & Polylith

This article isn't about Franz Kafka or his great novella The Metamorphosis, where the main character one day realizes that he has transformed into a Human-size Bug.

It is about a different kind of Kafka: Apache Kafka, with examples on how to get started producing & consuming messages with Python. All this in a Polylith Monorepo (hopefully without any of the bugs from that Franz Kafka novella).

This article can be seen as Part III of a series of posts about Python & Polylith. Previous ones are:

  1. GCP Cloud Functions with Python and Polylith
  2. Python FastAPI Microservices with Polylith

If you haven't heard about Polylith before, it's about Developer Experience, sharing code and keeping things simple. You will have all your Python code in a Monorepo, and develop things without the common Microservice tradeoffs. Have a look at the docs: Python tools for the Polylith Architecture

Edit: don't know what Kafka is? Have a look at the official Apache Kafka quickstart.

I will use the confluent-kafka library and have read up on the Confluent Getting Started Guide about writing message Producers & Consumers.

The Polylith Architecture encourages you to build features step-by-step, and you can choose from where to begin. I have an idea about producing events with Kafka when items have been stored or updated in a database, but how to actually solve it is a bit vague at the moment. What I do know is that I need a function that will produce a message based on input. So I'll begin there.

All code in a Polylith Workspace is referred to as bricks (just like when building with LEGO). I'll go ahead and create a Kafka brick. I am going to use the Python tooling for Polylith to create the brick.

Note: I already have a Workspace prepared, have a look at the docs for how to set up a Polylith Workspace. Full example at: python-polylith-example

The poly tool has created a kafka Python package, and placed it in the components folder. It lives in a top namespace that is used for all the bricks in this Polylith Workspace. I have set it to example here, but you would probably want an organizational name or similar as your top namespace.

bases/
components/
  example/
    kafka/
      __init__.py
      core.py

There are two types of bricks in Polylith: components and bases. A component is where you write the actual implementation of something. A base is the entry point of an app or service, such as the entry point(s) of a FastAPI microservice or the main function of a CLI. In short: a base is a thin layer between the outside world and the components (containing the features). I will develop the base for my new Kafka feature in a moment.

A Producer and a Consumer

For this example kafka component, I will use code from the Confluent Python guide (with a little bit of refactoring).

def produce(topic: str, key: str, value: str):
    producer = get_producer()
    producer.produce(topic, value, key, callback=_acked)
    producer.poll(10000)
    producer.flush()

Full example at: python-polylith-example

I'll go ahead and write a message consumer while I'm at it, and decide to also put the Consumer within the kafka component.

def consume(topic: str, callback: Callable):
    consumer = get_consumer()
    consumer.subscribe([topic])
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                continue
            if msg.error():
                logger.error(msg.error())
            else:
                topic, key, value = parse(msg)
                callback(topic, key, value)
    except KeyboardInterrupt:
        pass
    finally:
        consumer.close()

The kafka component now looks like this after some additional coding & refactoring:

kafka/
  __init__.py
  consumer.py
  core.py
  parser.py
  producer.py

Running a Kafka server locally

I continue following along with the Confluent guide to run Kafka locally, and have added a Docker Compose file. I am storing that one in the development folder of the Polylith workspace.

development/
  kafka/
    docker-compose.yml

I can now try out the Producer and Consumer in a REPL, making sure messages are correctly sent & received without any Kafkaesque situations (👴 🥁).

Producing a message

I already have a messaging API in my python-polylith-example repo, with endpoints for creating, reading, updating and deleting data. This is the actual service that I want to extend with Kafka messaging abilities. The service is built with FastAPI and the endpoints are found in a base.

bases/
  example/
    message_api/
      __init__.py
      core.py
@app.post("/message/", response_model=int)
def create(content: str = Body()):
    return message.create(content)

I'll continue the development by adding the newly created kafka component. While developing, I realize that I need to transform the data into simple data structures - and remember that I already have a component that can be used here. This is where Polylith really shines: developing small bricks like these makes it easy to re-use them in other places - just by importing them.

Consuming messages

I have the kafka component with a consumer in it, and now is the time when I create a new base: the entry point for my kafka consumer.

This is the code I add to the base. Note the re-use of another already existing Polylith component (the log).

from example import kafka, log

logger = log.get_logger("Consumer-app-logger")


def parse_message(topic: str, key: str, value: str):
    logger.info(f"Consumed message with topic={topic}, key={key}, value={value}")


def main():
    topic = "message"
    kafka.consumer.consume(topic, parse_message)

Adding a project

I now have all the code needed for this feature. What is left is to add the infrastructure for it, the actual deployable artifact. The poly create project command will create a new entry in the projects folder.

projects/
  consumer_project/
    pyproject.toml

I'm adding the dependencies and needed packages to the project-specific pyproject.toml file. But I am lazy, and will only add the base in the packages section - and then run poetry poly sync. It will add all needed bricks for this project. The poly tool has some magic in it, yes.
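For reference, the project-specific pyproject.toml could start out as something like the sketch below - the names, paths and versions are hypothetical, not copied from the example repo - before poetry poly sync fills in the remaining bricks:

```toml
[tool.poetry]
name = "consumer-project"
version = "0.1.0"
description = "The Kafka consumer service"
authors = ["Your Name <you@example.com>"]

packages = [
    {include = "example/consumer", from = "../../bases"},
    # poetry poly sync adds the components the base depends on here
]

[tool.poetry.dependencies]
python = "^3.10"
confluent-kafka = "^2.0"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
```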

When deploying, just use the build-project command to package it properly without any relative paths, and use the built wheel to deploy it where you want it running. That's all!

In this article, I have written about using Polylith when developing features and services, and Kafka messaging in particular. Adding new features & re-using existing code is a simple thing when working in a Polylith workspace. Don't hesitate to reach out if you have feedback or questions.

Additional info

Full example at: python-polylith-example
Docs about Polylith: Python tools for the Polylith Architecture

Top photo by Thomas Peham on Unsplash