Cloud · DevOps · AWS · Data Automation
April 5, 2026

PostgreSQL Performance and Linux Kernel Upgrades: What AWS Engineers Found

A Linux 7.0 kernel upgrade halved PostgreSQL throughput on AWS. Here's what happened, why it matters, and how to protect your database layer during infrastructure upgrades.

Dmytro Serebrych · SEO & Lead of Production · 5 min read

When an AWS engineer reported that upgrading to Linux 7.0 cut PostgreSQL query throughput by roughly 50%, the database and cloud infrastructure communities paid attention. The finding is a reminder that production database performance is not just about your queries or indexes — it is deeply coupled to the kernel, scheduler, and memory subsystem beneath them. If your team runs PostgreSQL on AWS — or any managed cloud platform — this regression has direct implications for how you approach infrastructure upgrades.

What Happened

The reported regression emerged after AWS engineers tested PostgreSQL workloads on Linux 7.0 against the previous kernel series. Under read-heavy and mixed OLTP workloads, throughput dropped by approximately half — a significant regression for any production database, but especially consequential for cloud-hosted PostgreSQL instances where customers expect infrastructure upgrades to be transparent.

The root cause appears to involve changes to the memory folio and scheduler subsystems in the newer kernel: changes designed to improve general-purpose performance, but with unintended adverse effects on the specific I/O and memory access patterns PostgreSQL relies on. There is no straightforward fix. The kernel changes cannot simply be reverted, so resolution requires either further kernel-side work or workload-level mitigations.

As of early April 2026, the issue is unresolved upstream and there is no simple patch. Distributions shipping Linux 7.0 — or cloud providers who upgrade their underlying kernel versions — will carry this regression until a fix lands.

A 50% throughput regression on PostgreSQL does not announce itself loudly. It shows up as slightly elevated p99 latency — until traffic spikes and the system collapses.

Why This Matters for Production Systems

The 50% throughput figure is striking, but the broader lesson is more important: infrastructure upgrades that appear routine can have non-obvious, severe effects on database performance. Most engineering teams do not benchmark their database workloads before and after kernel upgrades. They upgrade, observe that "everything is still running," and move on, without measuring whether performance has stayed within acceptable bounds.

For high-traffic applications, a 50% database throughput regression translates directly into:

  • Doubled query latency at the same concurrency level
  • Higher CPU utilization per request
  • Reduced headroom before autoscaling triggers
  • Cascade failures under load spikes in systems without adequate connection pooling

A system that handled 10,000 requests per minute before the upgrade might start degrading at 5,000 after it. If traffic is currently at 4,000, you will not notice until the next peak.
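The headroom arithmetic above is worth making explicit. A small sketch (the capacity and traffic numbers are the hypothetical ones from the example):

```python
# Illustrative arithmetic only: how a 50% throughput regression shrinks headroom.
# Capacity and traffic figures are the hypothetical numbers from the text.

def headroom(capacity_rpm: float, current_rpm: float) -> float:
    """Fraction of capacity still unused at the current traffic level."""
    return 1.0 - current_rpm / capacity_rpm

pre_upgrade_capacity = 10_000                        # requests/min before the upgrade
post_upgrade_capacity = pre_upgrade_capacity * 0.5   # after a 50% regression
current_traffic = 4_000

print(headroom(pre_upgrade_capacity, current_traffic))   # ~0.6: comfortable margin
print(headroom(post_upgrade_capacity, current_traffic))  # ~0.2: one traffic spike from saturation
```

The system looks healthy at steady state in both cases; only the margin has changed, which is exactly why the regression goes unnoticed until peak traffic.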

Cloud providers typically abstract kernel versions from customers — you do not choose which kernel runs under your RDS instance or managed PostgreSQL cluster. This means the regression can arrive as part of a platform maintenance window you did not initiate, with no notice beyond a change in query latency metrics.

How to Detect and Mitigate

There are several concrete steps engineering teams can take to protect themselves against this class of regression:

Baseline your PostgreSQL performance metrics continuously. Transactions per second, average query latency by query type, and cache hit rates should be tracked over time with enough historical data to identify post-upgrade regressions. Tools like pg_stat_statements, Prometheus with postgres_exporter, and Grafana dashboards make this straightforward to implement. If you do not have this in place, you cannot distinguish a kernel regression from a query regression from an application regression. Our article on PostgreSQL query optimization covers some of these monitoring fundamentals in detail.
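A baseline is only useful if something compares against it automatically. A minimal sketch of such a check, assuming you already export transactions-per-second samples (for example via pg_stat_statements or postgres_exporter) into a pre-upgrade baseline window and a recent window; the 20% threshold and the sample values are illustrative:

```python
# Minimal regression check over two windows of TPS samples.
# The threshold and sample data below are hypothetical.
from statistics import median

def regressed(baseline: list[float], recent: list[float],
              max_drop: float = 0.20) -> bool:
    """Flag a regression when the recent median TPS falls more than
    `max_drop` (20% by default) below the baseline median."""
    return median(recent) < median(baseline) * (1.0 - max_drop)

baseline_tps = [980, 1010, 995, 1002, 990]          # pre-upgrade samples
recent_tps = [510, 495, 530, 505, 488]              # ~50% drop, like the reported regression

print(regressed(baseline_tps, recent_tps))  # True: fire an alert
```

Using medians rather than means keeps a single slow sample from masking or faking a regression; in practice you would feed this from your metrics store and page on the result.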

Pin kernel versions on self-managed instances where performance is critical. If you manage your own PostgreSQL on EC2 or bare metal, you have the option to control kernel upgrades — test on a staging instance before rolling to production. For managed services like RDS, monitor AWS maintenance announcements and test workloads in staging environments that mirror your production kernel version.
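On Debian or Ubuntu self-managed instances, pinning can be done with an apt preferences file. A sketch, assuming apt-based hosts; the path, package glob, and pinned version are placeholders you would adjust to your distribution and current kernel:

```
# /etc/apt/preferences.d/pin-kernel  (Debian/Ubuntu example; package names
# and version numbers are placeholders -- adjust for your distribution)
Package: linux-image-*
Pin: version 6.*
Pin-Priority: 1001
```

A priority above 1000 prevents the package from being upgraded even by a dist-upgrade, so the kernel only moves when you deliberately change the pin after testing in staging.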

Implement connection pooling (PgBouncer or similar). Connection pooling does not fix a kernel regression, but it provides a buffer: if throughput drops, the pool absorbs the increased queuing before connection exhaustion becomes a problem. Teams without connection pooling will hit PostgreSQL's connection limit much faster when throughput decreases.
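A minimal PgBouncer configuration sketch, assuming transaction-level pooling in front of a single application database; hostnames, ports, pool sizes, and file paths are placeholders to tune for your workload:

```ini
; Sketch of a pgbouncer.ini -- all values here are illustrative placeholders.
[databases]
appdb = host=10.0.0.12 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20
max_client_conn = 1000
```

The key ratio is `max_client_conn` to `default_pool_size`: the pool lets a thousand application connections queue against twenty server connections, which is the buffer that absorbs a throughput drop instead of exhausting PostgreSQL's connection limit.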

Use read replicas and caching layers strategically. Read replicas are not just for scaling read traffic — they are also isolation points. If a kernel upgrade hits your primary, read traffic absorbed by replicas is protected. Similarly, caching frequently-accessed query results at the application layer reduces direct load on the database and provides headroom when the underlying throughput decreases.
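The application-layer caching mentioned above can be as simple as a TTL cache in front of hot read-only queries. A toy sketch, assuming a `fetch` callable that actually runs the query against PostgreSQL; the class and names are illustrative, not a specific library's API:

```python
# Toy TTL cache for query results; `fetch` stands in for a real database call.
import time

class QueryCache:
    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str, fetch):
        """Return the cached value for `key`, calling `fetch()` only on
        a miss or after the TTL has expired."""
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = fetch()                # the only call that reaches the database
        self._store[key] = (now, value)
        return value

# Usage: wrap a hot read-only query behind the cache.
cache = QueryCache(ttl_seconds=60)
calls = 0

def fake_query():
    global calls
    calls += 1
    return [("widget", 42)]

cache.get("top_products", fake_query)
cache.get("top_products", fake_query)
print(calls)  # 1: the second call is served from cache
```

Every cache hit is a query the database never sees, which is direct headroom when the underlying throughput drops.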

Watch the upstream kernel mailing lists. The Linux kernel community and PostgreSQL developers communicate openly about performance regressions. Subscribing to relevant mailing lists or following aggregators like phoronix.com gives early visibility into emerging issues before they reach your production kernel version.

The Broader Pattern: Invisible Infrastructure Dependencies

This regression illustrates a pattern that experienced infrastructure engineers recognize but that is underappreciated in product-focused engineering teams: your application's performance is a function of the entire stack, not just the code you wrote. The kernel, the hypervisor, the storage driver, the network stack — all of these influence database performance, and all of them change over time.

Cloud providers have strong incentives to upgrade their underlying infrastructure and generally do so carefully, but no amount of internal testing covers every customer workload profile. PostgreSQL under a specific combination of PgBouncer settings, storage driver version, and workload mix may behave differently than the workloads AWS engineers test internally. The responsibility for detecting and responding to regressions ultimately belongs to the team running the database.

This is precisely why infrastructure observability is not optional — it is the only way to distinguish a kernel regression from a query regression from a traffic spike. Teams that treat monitoring as a nice-to-have discover this the hard way.

How UData Helps

Managing production database infrastructure — monitoring, kernel dependency tracking, performance baselining, and regression response — requires engineering bandwidth that product teams rarely have available. When your developers are focused on features, database operations become a background responsibility that gets attention only when something breaks.

UData's outstaffed engineers handle database infrastructure as a primary focus rather than a side task. We set up continuous performance monitoring, maintain staging environments that mirror production, and respond to infrastructure regressions before they affect your users. For teams running PostgreSQL on AWS or other cloud platforms, we can review your current observability setup and identify gaps that would leave you exposed to regressions like this one. You can review examples of infrastructure work we've delivered in our project portfolio.

If your team is stretched thin on operations and you need experienced engineers who can own database reliability, get in touch — we can scope an engagement around your specific infrastructure and risk profile.

Conclusion

The Linux 7.0 PostgreSQL regression is not an isolated incident; it is a concrete example of how invisible infrastructure changes can have major application-level consequences. The teams that will handle this well are the ones with continuous performance monitoring, staging environments that match production, and the engineering discipline to review infrastructure changes before they reach production traffic. The teams that will struggle are the ones operating on the assumption that managed infrastructure is always transparent. It is not, and the cost of learning that the hard way is measured in degraded user experience and unplanned emergency engineering time.
