Overview
This page contains the list of deprecations and important or breaking changes for Vault 1.13.x compared to 1.12. Please read it carefully.
Changes
Consul dataplane compatibility
If you are using Consul on Kubernetes, please be aware that upgrading to Consul 1.14.0 will impact Consul secrets, storage, and service registration. As of Consul 1.14.0, Consul on Kubernetes uses Consul Dataplane by default instead of client agents. Vault does not currently support Consul Dataplane. Please follow the Consul 1.14.0 upgrade guide to ensure that your Consul on Kubernetes deployment continues to use client agents.
User lockout
As of version 1.13, Vault will stop trying to validate user credentials if the user submits multiple invalid credentials in quick succession. During lockout, Vault ignores requests from the barred user rather than responding with a permission denied error.
User lockout is enabled by default with a lockout threshold of 5 attempts, a lockout duration of 15 minutes, and a counter reset window of 15 minutes.
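Lockout behaviour can be tuned, or disabled, per auth method in the Vault server configuration. A minimal sketch with illustrative values for the userpass method:

```hcl
# Illustrative values; lockout_threshold, lockout_duration,
# lockout_counter_reset, and disable_lockout may each be set per
# supported auth method (or for "all").
user_lockout "userpass" {
  lockout_threshold     = "10"
  lockout_duration      = "30m"
  lockout_counter_reset = "30m"
}
```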
For more information, refer to the User lockout overview.
Active directory secrets engine deprecation
The Active Directory (AD) secrets engine has been deprecated as of the Vault 1.13 release. We will continue to support the AD secrets engine in maintenance mode for six major Vault releases. Maintenance mode means that we will fix bugs and security issues but will not add new features. For additional information, see the deprecation table and migration guide.
AliCloud auth role parameter
The AliCloud auth plugin will now require the role parameter on login. This has always been documented as a required field, but the requirement will now be enforced.
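For illustration, a login request with the now-enforced parameter might look like the following sketch; the role name is a placeholder, and the identity request values are base64-encoded pieces of a signed Alibaba Cloud STS GetCallerIdentity request:

```shell
# role is now required on login; the other values are placeholders
# produced by signing an Alibaba Cloud STS GetCallerIdentity request.
vault write auth/alicloud/login \
    role="dev-role" \
    identity_request_url="$IDENTITY_REQUEST_URL_B64" \
    identity_request_headers="$IDENTITY_REQUEST_HEADERS_B64"
```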
Mounts associated with removed builtin plugins will result in core shutdown on upgrade
As of 1.13.0, Standalone (logical) DB Engines and the AppId Auth Method have been marked with the Removed status. Any attempt to unseal Vault with mounts backed by one of these builtin plugins will result in an immediate shutdown of the Vault core.
NOTE: If an external plugin with the same name and type as a deprecated builtin is deregistered, any subsequent unseal will still succeed, but with an unusable auth backend and a corresponding ERROR log.
The remediation for affected mounts is to downgrade to the previously-used version of Vault and replace any Removed feature with the preferred alternative feature.
For more information on the phases of deprecation, see the Deprecation Notices FAQ.
Impacted versions
Affects upgrading from any version of Vault to 1.13.x. All other upgrade paths are unaffected.
Known issues
Rotation configuration persistence issue could lose transform tokenization key versions
A rotation performed manually, or via automatic time-based rotation after a restart or leadership change of Vault, can result in the loss of intermediate key versions if the rotation configuration was changed after the tokenization transform was initially configured. Tokenized values from those key versions would not be decodeable. We recommend that customers who have enabled automatic rotation disable it, and that other customers avoid key rotation, until the upcoming fix is released.
Affected versions
This issue affects Vault Enterprise with ADP versions 1.10.x and higher. A fix will be released in Vault 1.11.9, 1.12.5, and 1.13.1.
PKI OCSP GET requests can return HTTP redirect responses
If a base64-encoded OCSP request contains consecutive '/' characters, the GET request will return a 301 permanent redirect response. If the redirect is followed, the request will fail to decode, since the redirected path is no longer a properly base64-encoded request.
As a workaround, use OCSP POST requests, which are unaffected.
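For example, a POST-based query can be issued with curl; the hostname and pki mount path below are illustrative:

```shell
# Assumes a DER-encoded OCSP request was written to ocsp-req.der, e.g.:
#   openssl ocsp -issuer ca.pem -cert cert.pem -reqout ocsp-req.der
curl --silent \
    --header "Content-Type: application/ocsp-request" \
    --data-binary @ocsp-req.der \
    https://vault.example.com/v1/pki/ocsp > ocsp-resp.der
```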
Impacted versions
Affects all current versions of 1.12.x and 1.13.x.
PKI revocation request forwarding
If a revocation request for a certificate that is present locally arrives at a standby or performance secondary node, the request will not be correctly forwarded to the active node of the cluster.
As a workaround, submit revocation requests to the active node only.
STS credentials do not return a lease_duration
Vault 1.13.0 introduced a change to the AWS secrets engine such that it no longer creates leases for STS credentials, because they cannot be revoked or renewed. As part of this change, a bug was introduced which causes lease_duration to always return zero. This prevents the Vault Agent from refreshing STS credentials and may introduce undesired behaviour for anything which relies on a non-zero lease_duration.
For applications that can control what value to look for, the ttl value in the response can be used to know when to request STS credentials next.
An additional workaround for users rendering STS credentials via the Vault Agent is to set static-secret-render-interval for a template using the credentials. Setting this configuration to 15 minutes accommodates the default minimum duration of an STS token and overrides the default render interval of 5 minutes.
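A minimal sketch of the Vault Agent configuration for this workaround; the secret path and destination file are illustrative:

```hcl
# Re-render static (non-leased) secrets every 15 minutes instead of
# the default 5 minutes.
template_config {
  static_secret_render_interval = "15m"
}

template {
  contents    = "{{ with secret \"aws/sts/my-role\" }}{{ .Data.access_key }}{{ end }}"
  destination = "/etc/secrets/aws-access-key"
}
```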
Impacted versions
Affects Vault 1.13.0 only.
LDAP pagination issue
There was a regression introduced in 1.13.2 relating to LDAP maximum page sizes, resulting in the error no LDAP groups found in groupDN [...] only policies from locally-defined groups available. The issue occurs when upgrading Vault on an instance that has an existing LDAP auth configuration.
As a workaround, disable paged searching using the following:
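```shell
# Assumes the LDAP auth method is mounted at the default path; a
# max_page_size of -1 disables paged searching.
vault write auth/ldap/config max_page_size=-1
```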
Impacted versions
Affects Vault 1.13.2.
PKI Cross-Cluster revocation requests and unified CRL/OCSP
When revoking certificates on a cluster that doesn't own the certificate, writing the revocation request will fail with a message like error persisting cross-cluster revocation request. Similar errors will appear in the log for failures to write unified CRL and unified delta CRL WAL entries.
As a workaround, submit revocation requests to the cluster which issued the certificate, or use BYOC revocation. Use cluster-local OCSP and CRLs until this is resolved.
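For reference, a BYOC (bring-your-own-cert) revocation is a write of the certificate itself to the revoke endpoint; the mount path and file name here are illustrative:

```shell
# Revoke by providing the PEM-encoded certificate rather than a serial
# number, submitted to the cluster that issued the certificate.
vault write pki/revoke certificate=@leaf-cert.pem
```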
Impacted versions
Affects Vault 1.13.0 to 1.13.2. Fixed in 1.13.3.
On upgrade, all local revocations will be synchronized between clusters; however, revocation requests are not persisted when the cross-cluster write fails, so affected requests will not be synchronized.
Slow startup time when storing PKI certificates
There was a regression introduced in 1.13.0 where Vault is slow to start because the PKI secrets engine performs a list operation on the stored certificates. If a large number of certificates are stored, this can cause long start times on active and standby nodes.
There is currently no workaround for this other than limiting the number of certificates stored in Vault, either via PKI tidy operations or by using the no_store flag for PKI roles.
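As a sketch, with an illustrative mount path and role name:

```shell
# Remove expired certificates from storage (safety_buffer guards
# against tidying certificates that expired only recently).
vault write pki/tidy tidy_cert_store=true safety_buffer=72h

# Stop storing new leaf certificates issued under an existing role.
vault patch pki/roles/short-lived-role no_store=true
```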
Impacted versions
Affects Vault 1.13.0+.
Token creation with a new entity alias could silently fail
A regression caused token creation requests under specific circumstances to be forwarded from perf standbys (Enterprise only) to the active node incorrectly. They would appear to succeed; however, no lease was created. The token would then be revoked on first use, causing a 403 error.
This only happened when all of the following conditions were met:
- the token is being created against a role
- the request specifies an entity alias which has never been used before with the same role (for example for a brand new role or a unique alias)
- the request happens to be made to a perf standby rather than the active node
Retrying token creation after the affected token is rejected would work since the entity alias has already been created.
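For reference, the affected request shape looks like the following sketch; the role and alias names are placeholders:

```shell
# First use of this alias with this role on a perf standby could yield
# a token whose lease was never created; retrying the same request
# succeeds because the entity alias then already exists.
vault token create -role="my-role" -entity-alias="app-42"
```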
Affected versions
Affects Vault 1.13.0 to 1.13.3. Fixed in 1.13.4.
update-primary can lead to data loss
It's possible to lose data from a Vault cluster given a particular configuration and sequence of steps. This page describes two paths to data loss, both associated with the use of update-primary.
Normally update-primary does not need to be used. However, there are a few cases where it's needed, e.g. when the known primary cluster addresses of a secondary don't contain any of the correct addresses. But update-primary does more than you might think: it does almost everything that enabling a secondary does, except that it doesn't wipe storage. One of the steps that it takes is to temporarily remove most of the mount table records: it removes all mount entries except for those that are managed automatically by vault, e.g. identity mounts.
This update-primary behaviour is unintended, and we'll be reworking it in an upcoming release. Once the fix lands, the changelog entry will be "Fix a race condition with update-primary that could result in data loss after a DR failover."
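For reference, an update-primary call on a performance secondary looks like the following sketch; the address is illustrative, and the token is a secondary activation token generated on the primary:

```shell
vault write sys/replication/performance/secondary/update-primary \
    token="<activation-token>" \
    primary_api_addr="https://primary.example.com:8200"
```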
update-primary with local data in shared mounts
If update-primary is done on a PR secondary with shared mounts containing local data (e.g. pki certs, approle secretids), the merkle tree on the PR secondary may get corrupted due to a timing race.
When this happens, the PR secondary still contains all the stored data, e.g. listing local certs from PKI mounts will return the correct results. However, because the merkle tree has been corrupted, a downstream DR secondary will not receive the local data, and will delete it if it already had it. If the PR secondary's DR secondary is promoted before the PR secondary is repaired, the newly promoted PR secondary will not contain the local data it ought to. If the former PR secondary is lost or destroyed, the missing data will not be recoverable other than via a snapshot restore.
Detection and remediation
If the TRACE-level log line "cleaning key in merkle tree" appears immediately after an update-primary on a PR secondary, that's an indicator that the timing race was lost and that the merkle tree may be corrupt.
Repairing the corrupt merkle tree is done by issuing a replication reindex request to the PR secondary.
If logs are no longer present (the update-primary was done some time in the past), it's probably best to reindex the PR secondary pre-emptively as a precaution.
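A reindex can be triggered with a single write (Vault Enterprise); it rebuilds the merkle tree from storage and may take some time on large datasets:

```shell
vault write -f sys/replication/reindex
```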
update-primary with "Allow" path filters
There is a further path to data loss associated with update-primary. This issue requires that the PR secondary receiving an update-primary request has an associated Allow path filter defined for it. Like the first issue, this one too has a timing aspect: the problem may or may not manifest, depending on how quickly the mount tables truncated by update-primary get repaired by replication.
At startup/unseal (and after an update-primary), Vault runs a background job that looks at the mount data it has stored and tries to delete any that doesn't belong there, based on path filters. This behaviour was introduced in 1.0.3.1 to recover from a regression that allowed for inappropriate filtering of data: we needed to ensure that any previously unfiltered data got cleaned up on secondaries that ought not have it.
If a performance secondary has an associated Allow path filter, this cleanup code can misfire during the interval between when the truncated mount tables are written by update-primary and the time when they get rewritten by replication. The cleanup code will delete the data associated with the missing mount entries. The cleanup code doesn't modify the merkle tree, and as a result this deleted data won't be discovered as missing and repaired by replication.
Detection and remediation
When the cleanup code fires, it logs the INFO-level message "deleted mistakenly stored mount entry from backend". This is a reliable indicator that the bug was hit.
If logs aren't available, the other indicator that this problem has manifested is to query the shared mount in question. The secondary won't have any of the data that the primary does, e.g. roles and configuration will be absent.
Reindexing the performance secondary will update the merkle tree to reflect the missing storage entries and allow missing shared data to be replaced by replication. However, any local data on shared mounts (such as PKI certs) will not be recoverable.
Impacted versions
Affects all current versions of Vault.
PKI storage migration revives deleted issuers
Vault 1.11 introduced Storage v1, a new storage layout that supported multiple issuers within a single mount. Bug fixes in Vault 1.11.6, 1.12.2, and 1.13.0 corrected a write-ordering issue that led to invalid CA chains. Specifically, incorrectly ordered writes could fail under load, resulting in the mount being re-migrated the next time it was loaded or in silently truncated CA chains. This collection of bug fixes introduced Storage v2.
Affected versions
Vault may incorrectly re-migrate legacy issuers created before Vault 1.11 that were migrated to Storage v1 and deleted before upgrading to a Vault version with Storage v2.
The migration fails when Vault finds managed keys associated with the legacy issuers that were removed from the managed key repository prior to the upgrade.
The migration error appears in Vault logs as:
Error during migration of PKI mount: failed to lookup public key from managed key: no managed key found with uuid
Note
Issuers created in Vault 1.11+ and direct upgrades to a Storage v2 layout are not affected. The Storage v1 upgrade bug was fixed in Vault 1.14.1, 1.13.5, and 1.12.9.
Using 'update_primary_addrs' on a demoted cluster causes Vault to panic
Affected versions
- 1.13.3, 1.13.4 & 1.14.0
Issue
If the update_primary_addrs parameter is used on a recently demoted cluster, Vault will panic because it no longer has information about the primary cluster.
Workaround
Instead of using update_primary_addrs on the recently demoted cluster, provide an activation token.
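A sketch of that flow, with an illustrative secondary id; the token is generated on the current primary and then supplied to update-primary on the demoted cluster:

```shell
# On the current primary: generate a secondary activation token.
vault write -f sys/replication/performance/primary/secondary-token id="demoted-cluster"

# On the recently demoted cluster: supply the token instead of
# update_primary_addrs.
vault write sys/replication/performance/secondary/update-primary \
    token="<wrapped-activation-token>"
```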