Migrating HashiCorp Vault from File to Raft Storage in Kubernetes
Daniel Herrmann
Overview
HashiCorp Vault is a popular tool for managing secrets and protecting sensitive data. It supports multiple storage backends, among them Consul, integrated storage (raft) and the filesystem. The file backend is the default when installing Vault via Helm, but it comes with a few limitations. Most notably, the documented backup procedures require raft storage and its snapshot capabilities. With file storage it is theoretically possible to back up the data by copying the files, but there is no guarantee that the data is consistent unless the Vault server is stopped.
We had Vault deployed in our Kubernetes cluster using the file backend, with a decent number of secrets engines (mainly KV and PKI). We therefore decided to migrate from file to raft, mainly to make use of the snapshot capabilities and to include the snapshots in our K8up backups.
The general process will be as follows:
- Take a backup of the existing Vault data
- Use the vault operator migrate command to migrate the data from file to raft storage
- Modify the Helm deployment to use raft storage and redeploy the Vault server
- (Optional) Adjust the ArgoCD application settings to keep ArgoCD from reverting the Pod labels that Vault manages
Back Up the Existing Vault Data
First of all, we need to take a consistent backup of the data. As mentioned above, we can only guarantee consistency if the Vault server is stopped, which, however, also means that the pod is not available to run any commands in. The steps are:
- (Optional) If using ArgoCD or any other GitOps tool, disable auto-sync
- Scale down the StatefulSet to 0 replicas and wait for the pod to be terminated (see the command sketch after this list)
- Deploy a temporary pod mounting the existing PVC to take a backup:
---
apiVersion: v1
kind: Pod
metadata:
  name: migration-backup
  namespace: core-vault
spec:
  containers:
    - name: migration-backup
      image: busybox
      args:
        - sleep
        - "1000000"
      volumeMounts:
        - name: source
          mountPath: /data-source
  volumes:
    - name: source
      persistentVolumeClaim:
        claimName: data-vault-0
        readOnly: true
- Launch the pod (kubectl apply -f migration-backup.yaml) and get a shell (kubectl exec -ti migration-backup -- /bin/sh)
- Use tar to create a backup of the data (tar czf /tmp/backup.tar.gz /data-source/)
- Copy the backup to a safe location (kubectl cp migration-backup:/tmp/backup.tar.gz ./backup.tar.gz)
- Delete the temporary pod (kubectl delete -f migration-backup.yaml)
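For reference, the scale-down from the second step above could look like the following. This is only a sketch assuming the chart's default StatefulSet name vault, the namespace core-vault and an ArgoCD application that is also called vault:

# pause ArgoCD auto-sync so the StatefulSet is not immediately scaled back up (application name is an assumption)
argocd app set vault --sync-policy none

# scale Vault down and wait for the pod to disappear
kubectl -n core-vault scale statefulset vault --replicas=0
kubectl -n core-vault wait --for=delete pod/vault-0 --timeout=120s

# after the kubectl cp above, check that the local archive is readable
tar tzf backup.tar.gz | head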
Next, restart the Vault server by scaling the StatefulSet back up to 1 replica and unseal Vault if no auto-unseal is configured.
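Again only as a sketch with the same assumptions, scaling back up and unsealing manually could look like this:

kubectl -n core-vault scale statefulset vault --replicas=1

# the pod only reports Ready once unsealed; exec works regardless and prompts for an unseal key
kubectl -n core-vault exec -ti vault-0 -- vault operator unseal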
Data Format Migration
The next step is to actually migrate the data from the file to the raft storage format. This is relatively simple, as the vault operator migrate command does all the heavy lifting.
- Launch a shell in the Vault pod (kubectl exec -ti vault-0 -- /bin/sh)
- Create the migration configuration file (/home/vault/migrate.hcl):
storage_source "file" {
  path = "/vault/data/"
}

storage_destination "raft" {
  path = "/vault/data/"
}

cluster_addr = "https://vault-0.vault-internal:8201"
- Run the migration command (vault operator migrate -config=/home/vault/migrate.hcl)
Depending on the amount of data, this can take a while. The command will output progress information.
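If you want to double-check the result before touching the Helm deployment, you can look at the data directory from the Vault pod's shell. With integrated storage, the migration should have created a BoltDB file and a raft subdirectory under the configured path:

ls -la /vault/data/
# expect vault.db and a raft/ directory next to the old file-backend data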
Redeploy Vault Resources
The last step is to modify the Helm deployment to use raft storage and redeploy the Vault server. Essentially, what we need to do is:
- Enable HA using the server.ha.enabled and server.ha.replicas values. You can set the number of replicas to 1 for now.
- Enable raft storage using server.ha.raft.enabled
- Move the configuration (if you have changed it in the first place, otherwise you can skip this step) from server.standalone.config to server.ha.raft.config and adjust a couple of values (see the diff and the values sketch below)
ui = true
listener "tcp" {
  // ...
}
- storage "file" {
+ storage "raft" {
    path = "/vault/data"
  }
+ service_registration "kubernetes" {}
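Put together, the relevant part of the Helm values could look roughly like this. This is a sketch based on the value names of the official vault Helm chart; the listener block is abbreviated and should keep whatever settings you already had:

server:
  ha:
    enabled: true
    replicas: 1
    raft:
      enabled: true
      config: |
        ui = true

        listener "tcp" {
          // ...same listener settings as before...
        }

        storage "raft" {
          path = "/vault/data"
        }

        service_registration "kubernetes" {}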
You then need to reinstall the Helm chart with the new values. In our case we're using ArgoCD, so the steps are to delete the application with all its content, modify the values and then have ArgoCD re-sync the application. This should bring up the new Vault server using raft storage. If no auto-unseal is configured, you will need to unseal Vault again. You should now be left with a healthy Vault status indicating HA and raft storage, similar to this:
$ vault status
Key                      Value
---                      -----
Seal Type                azurekeyvault
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Storage Type             raft
Cluster Name             vault-cluster-xxx
Cluster ID               xxxx-f912-46cb-a010-xxxx
HA Enabled               true
HA Cluster               https://vault-0.vault-internal:8201
HA Mode                  active
Raft Committed Index     1185
Raft Applied Index       1185
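With raft in place, a consistent backup is now a single command against the active node, which is what makes including Vault in our K8up backups straightforward; the target path here is just an example:

vault operator raft snapshot save /tmp/vault-backup.snap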
ArgoCD Pitfalls
Vault is a bit special in that it modifies the pod's labels depending on the state of the pod. These labels are then used as selectors, for example for the services. In a default configuration this will not work: ArgoCD will auto-sync and purge the labels from the Pod again, and you end up with a service without endpoints. For example, the vault-active service:
$ kubectl get svc vault-active -o yaml
apiVersion: v1
kind: Service
metadata:
  name: vault-active
spec:
  selector:
    app.kubernetes.io/instance: vault
    app.kubernetes.io/name: vault
    component: server
    vault-active: "true"
Note the vault-active label and compare it against the labels of a healthy pod:
apiVersion: v1
kind: Pod
metadata:
  generateName: vault-
  labels:
    app.kubernetes.io/instance: vault
    app.kubernetes.io/name: vault
    apps.kubernetes.io/pod-index: "0"
    component: server
    helm.sh/chart: vault-0.28.1
    statefulset.kubernetes.io/pod-name: vault-0
    vault-active: "true"
    vault-initialized: "true"
    vault-perf-standby: "false"
    vault-sealed: "false"
    vault-version: 1.17.2
  name: vault-0
In order for ArgoCD to not purge these labels, we need to make use of the ignoreDifferences diff customization in the application manifest:
spec:
  ignoreDifferences:
    - group: admissionregistration.k8s.io
      kind: MutatingWebhookConfiguration
      jqPathExpressions:
        - .webhooks[]?.clientConfig.caBundle
    - kind: Pod
      name: vault-0
      jsonPointers:
        - /metadata/labels/vault-active
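Once ArgoCD no longer reverts the label, a quick way to confirm the fix is to check that the vault-active service has endpoints again:

kubectl -n core-vault get endpoints vault-active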