In a recent post I opined on a past gitops failure and contemplated how showing diffs of the changes you're about to make to the running repo is one check that can help mitigate certain types of mistakes. Because a gitops repo is already driven by automation, it is a natural place to add automated fault and regression detection.

So I spent a little time improving on my old workflow, which was just running kustomize build apps/myapp/overlay/primary | less in a separate terminal, reading through the output, fixing any issues, and iterating.

Sometimes the rendered resources for an app or service run to more lines than you can reasonably read by hand (especially if there are CRDs involved), and even a small group of related resources or a single one, like a complex Pod configuration, can be difficult to check for errors by eye.

One interesting tool I ran into during my research was dyff. It is a yaml diff tool that gives succinct, human-readable, field- and list-specific difference reports. But a special feature (enabled by default) is its kubernetes detection: if it detects that the documents it's diffing are k8s resources, it will key its diffing engine on each document's GVK (group, version, kind), namespace and name.

Similarly to argo's own reconciler, this gives dyff the ability to distinguish between the addition, removal, and change of specific kubernetes resource instances. With two sets of plain yaml documents to diff (i.e. no k8s identity), it has no way to tell whether a document from the before set is the same as a document from the after set.
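To see what document identity buys you, here is a small experiment you could run yourself. It is only a sketch: the identity_demo and write_doc functions, the ConfigMap names and the demo namespace are all made up for illustration, and it assumes dyff is on your PATH. It writes the same two ConfigMaps to two files in opposite order, changes one value, and hands both files to dyff between. With kubernetes detection on, dyff should pair the documents by identity rather than by position; I've left out expected output on purpose, so check what your version prints.

```shell
#!/bin/sh
# Hypothetical demo: the same two resources, in opposite document order.
# Assumes dyff is on PATH; all names and paths here are illustrative only.

write_doc() {
  # $1 = ConfigMap name, $2 = value for data.key
  printf 'apiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: %s\n  namespace: demo\ndata:\n  key: %s\n' "$1" "$2"
}

identity_demo() {
  dir=$(mktemp -d) || return 1
  # before.yaml: alpha first, then beta
  { write_doc alpha one; printf '%s\n' '---'; write_doc beta two; } > "$dir/before.yaml"
  # after.yaml: beta first, then alpha with a single changed value
  { write_doc beta two; printf '%s\n' '---'; write_doc alpha changed; } > "$dir/after.yaml"
  dyff between --omit-header "$dir/before.yaml" "$dir/after.yaml"
  status=$?
  rm -rf "$dir"
  return "$status"
}
```

Running identity_demo once with kubernetes detection on and once with it switched off is a quick way to compare the two matching strategies on identical input.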

Diffing with Document Identity

Let's look at how diffing with knowledge of document identity can be helpful: this is a relatively simple-looking change to my local working copy of the gitops repo:

diff --git a/apps/argocd/overlays/primary/kustomization.yaml b/apps/argocd/overlays/primary/kustomization.yaml
index 934bf5c..32dc17f 100644
--- a/apps/argocd/overlays/primary/kustomization.yaml
+++ b/apps/argocd/overlays/primary/kustomization.yaml
@@ -3,6 +3,9 @@ resources:
 - ../../base
 - repo.yaml
 
+components:
+- ../../components/argonix
+
 patches:
 - path: argocd-cm.patch.yaml
 - path: custom-tools.patch.yaml

The thing that's special about this change is that components are a primary composition tool in kustomize. A component is one of only a couple of types of multi-resource generators in kustomize that also have the power to mutate other documents in the pipeline; to use a programming term, they're kustomize mixins.

If we run kustomize build apps/argocd/overlays/primary this will generate the final resources that would be injected into the cluster during reconciliation if I were to commit the change. If I were able to also run this build command against the master branch, I would then be able to dyff the two outputs and see a compact view of what's going to change from what's live.

Handwaving the details for the moment, this is what the dyff output looks like for that change:

(file level)
    apiVersion: v1
    data:
      plugin.yaml: |
        kind: ConfigManagementPlugin
        metadata:
          name: argonix-jobs
        spec:
          version: v1.0
          # Make sure the reconciler is up-to-date
          init:
            command: [bash, -c]
            args:
            - >-
                nix --extra-experimental-features 'nix-command flakes' build
                --out-link /opt/reconciler github:drzzlio/argonix?dir=cmp#reconcile;
          # Run the argonix job reconciler
          # Must always and only return k8s resources to stdout
          generate:
            command: [/opt/reconciler/bin/reconcile]
          # Run against repos with a flake in the root
          discover:
            fileName: "./flake.nix"
            preserveFileMode: false
    kind: ConfigMap
    metadata:
      name: argonix-cmp-plugin-54m5h9fgkt
      namespace: argocd

spec.template.spec.containers  (Deployment/argocd/argocd-server)
  + one list entry added:
    - name: argonix-cmp-plugin
      image: nixos/nix
      command:
      - /var/run/argocd/argocd-cmp-server
      volumeMounts:
      - name: var-files
        mountPath: /var/run/argocd
      - name: plugins
        mountPath: /home/argocd/cmp-server/plugins
      - name: argonix-cmp-plugin
        mountPath: /home/argocd/cmp-server/config/plugin.yaml
        subPath: plugin.yaml
      - name: cmp-tmp
        mountPath: /tmp

spec.template.spec.volumes  (Deployment/argocd/argocd-server)
  + two list entries added:
    - name: argonix-cmp-plugin
      configMap:
        name: argonix-cmp-plugin-54m5h9fgkt
    - name: argonix-cmp-tmp
      emptyDir: {}

We can see pretty simply from this dyff, without wading through all of the argocd install resources, that this component adds a new configmap and modifies the argocd-server pod to add a container and two volumes it references. Dyff is quite smart about how it compactly presents the changes in your yaml.

Stop, Impl Time

So how do we do this? Well, for my repo setup it's fairly simple. The main factor in how simple this will be is how homogeneous your yaml build pipelines are. For this repo, I only expose kustomizations. Kustomize can generate yaml from helm charts directly, so it also covers the upstream cases that need helm; I'd go as far as saying kustomize is better at composing charts than helm itself.

With that said, I'll share my code, but keep in mind that it may be more complex in your stack (i.e. if you have to detect which generation pipeline to run for an app). The cluster-level dyff function in particular will be difficult to implement if you don't have a way to easily discover all the roots that need to be generated for a cluster.

This is the implementation in my gitops project's flake.nix. As this is the scripts value for my devenv output, each of the ''-enclosed multiline strings is bash. The ${...} entities are nix expressions that get interpolated into the bash script strings (for pkgs.* references, nix also makes sure that tool is built or fetched into the nix store, so the script calls it by its store path).

          scripts = let
            # Ensures /tmp/gitopskdiffmaster is a worktree at origin/master.
            newtree = ''
              set -e
              if git worktree list | grep gitopskdiffmaster &> /dev/null; then
                cd /tmp/gitopskdiffmaster
                git fetch
                git checkout origin/master &> /dev/null
                cd - > /dev/null
              else
                git fetch
                git worktree add /tmp/gitopskdiffmaster origin/master &> /dev/null
              fi
            '';
          in {
            # Useful for diffing an application's generated resources after
            # local changes.
            # Takes a relative path, checks out master in a temp folder, then
            # does `kustomize build` against the same relative path from master
            # and the current directory before dyffing the output.
            kdiff.exec = ''
              ${newtree}
              echo "diffing `pwd`/$1 with master/$1"
              ${pkgs.dyff}/bin/dyff between --ignore-order-changes --truecolor on --omit-header \
                <(kustomize build --enable-helm "/tmp/gitopskdiffmaster/$1") \
                <(kustomize build --enable-helm "`pwd`/$1")
            '';
            # Automated kdiff on file changes.
            # Takes two directories, watches the first for any yaml file
            # changes, calls `kdiff` on the second any time a change
            # is detected.
            kdiffwatch.exec = ''
              ${pkgs.watchexec}/bin/watchexec -e yaml -w "$1" kdiff "$2"
            '';
            # Similar to kdiff, but for all the resources in cluster
            # Takes the name of a cluster, checks out master to a temp folder,
            # then generates and dyffs the resources for the cluster at master
            # and the cluster in the local directory.
            cdiff.exec = ''
              ${newtree}
              echo "diffing `pwd`/clusters/$1 with master/clusters/$1"
              ${pkgs.dyff}/bin/dyff between --ignore-order-changes --truecolor on --omit-header \
                <(kustomize build "/tmp/gitopskdiffmaster/clusters/$1" | yq '.spec.source.path' -r | tr '\n' '\0' | xargs -0 -I{} bash -c 'kustomize build --enable-helm /tmp/gitopskdiffmaster/{} 2>&1; echo "---"') \
                <(kustomize build "clusters/$1" | yq '.spec.source.path' -r | tr '\n' '\0' | xargs -0 -I{} bash -c 'kustomize build --enable-helm {} 2>&1; echo "---"')
            '';
          };

At a high level, we're using git worktree to check out the origin/master commit in a temp directory. We can then run the build against both the local repo files and also against what's in master before running dyff against the two outputs.
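Outside of nix, the same flow can be sketched as a pair of plain shell functions. This is an illustration rather than my scripts verbatim: the function names and the KDIFF_TREE variable are invented for the example, it assumes git, kustomize and dyff are on your PATH and that you compare against origin/master, and it writes the two renders to temp files instead of using process substitution so it isn't tied to bash.

```shell
#!/bin/sh
# Sketch only: assumes git, kustomize and dyff are on PATH and that the
# branch to compare against is origin/master.
KDIFF_TREE="${KDIFF_TREE:-/tmp/gitopskdiffmaster}"  # hypothetical knob

# Create the worktree on first use, otherwise move it to the latest master.
ensure_master_tree() {
  git fetch --quiet || return 1
  if git worktree list | grep -qF "$KDIFF_TREE"; then
    git -C "$KDIFF_TREE" checkout --quiet --detach origin/master
  else
    git worktree add --detach "$KDIFF_TREE" origin/master >/dev/null 2>&1
  fi
}

# kdiff_sketch <relative-path>: render the path from master and from the
# working copy, then diff the two renders.
kdiff_sketch() {
  rel=$1
  ensure_master_tree || return 1
  tmp=$(mktemp -d) || return 1
  kustomize build --enable-helm "$KDIFF_TREE/$rel" > "$tmp/master.yaml" &&
    kustomize build --enable-helm "./$rel" > "$tmp/local.yaml" &&
    dyff between --ignore-order-changes --omit-header \
      "$tmp/master.yaml" "$tmp/local.yaml"
  status=$?
  rm -rf "$tmp"
  return "$status"
}
```

Called from the repo root as kdiff_sketch apps/argocd/overlays/primary, this would render that overlay twice and print the difference. A worktree is used rather than a second clone because it shares the object store with the repo you're already working in, so refreshing it only costs a fetch.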

The kdiff script is meant to render and diff just a specific kustomization (usually an overlay), which is useful to run when iterating on a particular app's configuration.

The kdiffwatch script adds watchexec to the party to automatically run a kdiff any time some yaml files get changed.

The cdiff script is a bit different. As my project is using argo to render app kustomizations in-cluster (using the app-of-apps pattern), if I want to diff everything in the cluster I need to extract every app path that argo is rendering from.

This is pretty simple since this repo already has a kustomization at clusters/<clustername> which renders all of the Application resources that argo will target for rendering into the cluster (this is actually the "app" in app-of-apps).

So the cdiff script first renders this kustomization to extract the spec.source.path property for each app using yq, then runs kustomize build on each of the paths; this is repeated for the master branch copy, before finally generating a dyff of the outputs.
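The path-extraction step can also be pulled out into a small helper, sketched below. It is a simplified illustration and not the script itself: render_cluster_sketch and its arguments are hypothetical names, it assumes a yq that accepts the same '.spec.source.path' -r invocation, and it skips null and separator lines in case a rendered document has no source path.

```shell
#!/bin/sh
# render_cluster_sketch <repo-root> <cluster-name>
# Prints every app's rendered resources as one multi-document yaml stream.
# Assumes kustomize and yq are on PATH; names here are illustrative only.
render_cluster_sketch() {
  root=$1
  cluster=$2
  kustomize build "$root/clusters/$cluster" |
    yq '.spec.source.path' -r |
    while IFS= read -r app_path; do
      # Skip documents without a source path and any '---' separators.
      case "$app_path" in
        ''|null|---) continue ;;
      esac
      # stdin is redirected so the build cannot consume the list of paths.
      kustomize build --enable-helm "$root/$app_path" </dev/null 2>&1
      printf '%s\n' '---'
    done
}

# Possible usage (mycluster is a placeholder cluster name):
#   render_cluster_sketch /tmp/gitopskdiffmaster mycluster > /tmp/master.yaml
#   render_cluster_sketch . mycluster > /tmp/local.yaml
#   dyff between --ignore-order-changes --omit-header /tmp/master.yaml /tmp/local.yaml
```

As in the original script, build errors are merged into the output stream, so a kustomization that fails to render shows up as a difference instead of silently vanishing.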

If you're not already using some diffing tool to see what's changing in the output of your gitops repo, you should set something up; it's simple and it will save you not just time, but pain.