The OWASP Top 10, the reference standard for the most critical web application security risks, has just dropped a new 2025 version - the first major update since 2021. It feels like the perfect moment to ask: how does the rise of vibe coding intersect with these risks? So I built a simple web app with AI and audited it against the new Top 10. The results explain why "ship fast, fix later" could be a dangerous game.

The OWASP Top 10 sums up the most common web app security problems. It is built from a large dataset of real application security findings, grouped by underlying weakness (CWE). Categories are ranked mainly by how often they appear across applications, not by raw counts. Most entries come directly from data, while a couple are added through community surveys.
Let’s start with an important point: the OWASP Top 10 is primarily an awareness document, not a coding or testing standard. It documents AppSec risks that are not necessarily easily testable - the bare minimum and just a starting point in the AppSec world. If you are looking for an OWASP standard that helps you adopt application security, use the ASVS - Application Security Verification Standard. I have recently written blog posts about implementing OWASP ASVS and what's new in the ASVS 5.0 2025 version. I encourage you to check them out.
Currently, we have access to the Top 10 2025 release candidate. The data collection and analysis are finalized; what remains is formatting and editing, so we can treat it as ready in terms of the important knowledge. Here is a brief summary of what has changed in comparison to the 2021 version. You can read the full document, which includes a comprehensive explanation of each point, at the dedicated site: https://owasp.org/Top10/2025.

The idea is simple: we’ll vibe code a web app and check where it ends up vulnerable to the OWASP Top 10 security issues. The aim is to observe how often, especially when we're not paying close attention, we unknowingly introduce vulnerabilities.
It's essential to note that the goal isn't to prove that AI is unsafe or that vibe coding is inherently bad. Instead, it's to look at what happens in practice: how shifts in discipline and workflow affect the incidence of common web security weaknesses.
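To make this concrete before we start, here is the kind of weakness that slips in unnoticed during a fast-moving session: a hypothetical Java sketch (not taken from the app built below) of the same query written vulnerably and safely.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class LinkRepository {
    private final Connection conn;

    LinkRepository(Connection conn) { this.conn = conn; }

    // Vulnerable (OWASP: Injection): user input is concatenated into the SQL,
    // so owner = "x' OR '1'='1" would return every user's links.
    List<String> linksOfUnsafe(String owner) throws SQLException {
        List<String> urls = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT url FROM links WHERE owner = '" + owner + "'")) {
            while (rs.next()) urls.add(rs.getString("url"));
        }
        return urls;
    }

    // Safe: a parameterized query keeps data separate from the SQL.
    List<String> linksOf(String owner) throws SQLException {
        List<String> urls = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT url FROM links WHERE owner = ?")) {
            ps.setString(1, owner);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) urls.add(rs.getString("url"));
            }
        }
        return urls;
    }
}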
We're about to vibe code a personal gallery app for saving your favorite websites. At first glance, it's straightforward: a minimal UI with login, a display of saved links, and extra features like an option to export the gallery. What looks like a harmless side project is actually a perfect playground for security pitfalls, each mapping onto the OWASP Top 10. Potential vulnerabilities are:
In this episode, we explain the essential challenge faced by every modern software organization: technical debt. We analyze how to define it, distinguish between conscious strategic debt and unintentional "mess", and, crucially, how to manage and eliminate it to maintain development speed and team morale.
Our host, Paweł Dolega, discusses this topic with Sebastian Titze, CTO at Corify, an experienced technology leader. Sebastian, drawing on his career experience from working in the financial sector to consulting and management, shares his strategies.
He explains how to measure technical debt using metrics like cognitive complexity and change frequency, emphasizes that fixing tech debt must be framed as a business decision (ROI), and stresses the importance of organizational culture and team motivation in addressing it.
Beyond the Commit, Episode 3: Sebastian Titze
Want to listen to the podcast on Spotify, Apple, or Amazon Podcasts? Check out the Beyond the Commit website for details.
Here you'll find the transcript of the conversation (PDF file).
Beyond the Commit is brought to you by VirtusLab & SoftwareMill.
Our podcast spotlights CTOs and senior engineers, sharing candid stories that resonate with technology and business leaders alike. Expanding on our popular technology blog (45,000 monthly readers), the series adopts a more personal, conversation-driven format.
Didn't get a chance to listen to previous episodes?
Don't worry, you can do it now!
Beyond the Commit, Episode 1: Michal Janoušek
Beyond the Commit, Episode 2: Gilberto Taccari
Capture checking is an upcoming Scala feature that allows you to track which designated values (capabilities) are captured (i.e., stored as references) by arbitrary other values. This tracking happens at compile time and is currently an opt-in mechanism that can be enabled via an import.
There are two good resources to get started with capture checking: the official Scala docs and Nicolas Rinaudo's article. Still, it took me some time to understand how and why capture checking works. Thanks to the help from Martin Odersky, and the OOPSLA talk by Yichen Xu, I think I now have a pretty good grasp of the mechanism. Hence, I hope the following will be a valuable companion to these two excellent sources.
Capture checking has been significantly improved in the upcoming Scala 3.8. We'll be using the 3.8.0-RC3 version in the examples below.
A value is captured by another if it's somehow retained in its object tree. For example, given:
case class JsonParser(inputStream: InputStream, config: JsonConfig)
an instance parser: JsonParser captures the input stream & the config with which it was created. Now, certain captures are worthwhile tracking, while others are not. Here, we might want to track where the inputStream is used & retained, so that when we close it, we are sure that nobody is using it anymore. On the other hand, tracking config is probably useless.
Let's take a look at a couple of examples of capture checking in action. First, the use case we've talked about above—tracking the lifetime of an input stream.
We might define a utility function, which opens an input stream for a given file name, applies the provided closure, and ensures that the stream is closed (try running the examples with scala-cli!):
//> using scala 3.8.0-RC3
import language.experimental.captureChecking
import java.io.{FileInputStream, InputStream}
def withFile[T](name: String)(op: InputStream^ => T): T =
  val f = new FileInputStream(name)
  try op(f)
  finally f.close()
Such a function might be used legally, e.g., to read the first byte of the file:
@main def main =
  withFile("data.txt"): in =>
    in.read()
However, when we try to "leak" the input stream (note that after withFile completes, the stream is no longer valid—it's closed), we get a compiler error (which is … quite complex and opaque, but remember it's still an experimental, under-development feature):
withFile("data.txt")(identity)
[error] ./main.scala:17:24
[error] Found:    (x: java.io.InputStream^'s1) ->'s2 java.io.InputStream^'s3
[error] Required: java.io.InputStream^ => java.io.InputStream^'s4
[error]
[error] Note that capability cap cannot be included in outer capture set 's4.
[error]
[error] where: =>  refers to a fresh root capability created in method main3
[error]            when checking argument to parameter op of method withFile
[error]        ^   refers to the universal root capability
[error]        cap is a root capability associated with the result
[error]            type of (x: java.io.InputStream^): java.io.InputStream^'s4
[error] withFile("data.txt")(identity)
[error]                      ^^^^^^^^
Why is that? The input stream is marked as a capability: its type is InputStream^. The ^ is crucial: it designates the value as tracked. Any type followed by a ^ is a tracked capability....
In my latest blog post, I mentioned that the size of the method compiled by the JIT compiler depends on the amount of profiling data included. But what does that actually mean? In this article, I will answer that question and also explain how the profiling mechanism works in general.
I was conducting some research using a VM on GCP, specifically running Debian GNU/Linux 12 (bookworm) on an amd64 architecture. The Java version installed was Temurin-25+36 (build 25+36-LTS). You can find the complete source code on GitHub. To gain insight into the state of JVM internal objects residing in Metaspace, I utilized the jhsdb utility.
The program I used for the experiment is basically a simple loop, as shown below:
static void main(String[] args) throws InterruptedException {
    var nrOfIterations = Integer.parseInt(args[0]);
    var sleepInMillis = Integer.parseInt(args[1]);
    var argumentHolder = new ArgumentsHolder(nrOfIterations, sleepInMillis);
    var sum = 0;
    for (int i = 1; i <= argumentHolder.getNrOfIterations(); i++) {
        IO.println("This is: " + i + " iteration");
        Thread.sleep(resolveSleepInMillis(i, argumentHolder.getSleepInMillis()));
        sum += resolveNumber(i);
    }
    IO.println("My fantastic sum is: " + sum);
}
The Thread.sleep() is included to allow me to stop the JVM at specific points.
I ran the program with the following command:
java -XX:+PrintCompilation -jar target/app.jar 100000 100
I included the PrintCompilation flag in the run command because it allows me to see when specific methods are compiled by the JIT compiler or decompiled back to interpreter mode.
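For reference, PrintCompilation lines look roughly like this (illustrative output, not copied from this exact run): a timestamp in milliseconds since VM start, a compilation ID, the tier level, the method, and its bytecode size; deoptimized methods are additionally marked as made not entrant:

    152   23       3       org.zygiert.Main::resolveNumber (13 bytes)
    168   31       4       org.zygiert.Main::resolveNumber (13 bytes)
    169   23       3       org.zygiert.Main::resolveNumber (13 bytes)   made not entrant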
To attach a command-line debugger to a running JVM, I first needed to know the process ID. Once I had the pid, I could run:
jhsdb clhsdb --pid <pid>
After the debugger was attached, I could see the method details, but I needed to perform some initial steps first. First, I located the class:

class org.zygiert.Main

This returns the address of the InstanceKlass object in Metaspace. Running inspect <memory_address> on that address prints, among other fields, a pointer to the methods array:

Array<Method*>* InstanceKlass::_methods: Array<Method*> @ 0x00007f307c400400

Dumping that array with the mem command shows its length (4) followed by the addresses of the individual Method* entries:

mem 0x00007f307c400400/5
0x00007f307c400400: 0x0000000000000004
0x00007f307c400408: 0x00007f307c400478
0x00007f307c400410: 0x00007f307c4005c0
0x00007f307c400418: 0x00007f307c400768
0x00007f307c400420: 0x00007f307c400698
With this information, I gained interesting insights into how the method metadata changed as the application ran, using the inspect command mentioned previously. Similar steps can be repeated to examine each class in the codebase. In my case, I was particularly interested in one more class.
The first stop of the program was at the 235th iteration. Knowing the memory address of each method, I was able to access the method metadata. For the main method, it looked as follows:
Type is Method (size of 88)
ConstMethod*...
In the second episode, we turn our attention to the evolving role of the Chief Technology Officer, the challenges of tech recruitment, and the dynamic reality of the FinTech industry. Together, these themes highlight how technology leadership influences both organizational growth and the broader financial ecosystem.
Our host, Paweł Dolega, talks with Gilberto Taccari, CTO at Tot. With a PhD in Computer Engineering and over ten years of experience, Gilberto combines technical depth with business strategy.
His career spans consultancy for major Italian banks, leadership at FinTech startups, and building high‑performing teams in competitive markets.
Guided by a "build it right" philosophy, Gilberto shares how sustainable growth, smart hiring, and sound technology decisions can influence the future of finance.
Want to listen to the podcast on Spotify, Apple, or Amazon Podcasts? Check out the Beyond the Commit website for complete details.
Here you'll find a complete transcript of the conversation (PDF file).
Beyond the Commit is brought to you by VirtusLab & SoftwareMill.

Our podcast spotlights CTOs and senior engineers, sharing candid stories that resonate with technology and business leaders alike. Expanding on our popular technology blog (45,000 monthly readers), the series adopts a more personal, conversation-driven format.
Didn't get a chance to listen to the premiere episode?
Don't worry, you can do it now!
Testing concurrent code is hard. If you run a multi-threaded test and it passes, you never know if the code is correct or if it's just luck. The only way to be certain is through formal verification; however, despite significant progress in this area, writing and maintaining formally verified code remains very expensive and time-consuming. Hence, it's only applicable to a narrow spectrum of use cases.
That's where deterministic concurrent testing comes in. It doesn't give you a guarantee that your multi-threaded code is correct, but it provides much more than unit or stress tests (that is, running your code in a loop, hoping that enough thread interleavings will occur to expose any problems).
Fray is one such library: it enables writing concurrent tests for the JVM, deterministically simulating various thread interleavings, and, if needed, replaying failed runs using a standard Java debugger.
Let's take a closer look at how Fray can be used and how it works.
Before we begin, to reiterate: when writing a concurrent test using Fray, you won't get a guarantee that every concurrency bug will be found. Usually, the search space (the number of possible thread interleavings) is too large, as it grows exponentially with each synchronization point (such as acquiring a lock, invoking blocking I/O, or reading an atomic integer).
However, Fray explores the search space in a "probabilistically intelligent" way, implementing the latest research on the subject, and thus maximizing the probability that if a bug exists, it will be found.
On a more technical level, the test code is first instrumented at bytecode load-time. Then, Fray runs it in multiple iterations. It integrates well with build systems such as Maven or Gradle, testing frameworks such as JUnit, and any JVM-based languages. A concurrent test fails if there's a deadlock, or if any thread started during the test ends with an exception.
Here's a simple Fray-enabled test case, using JUnit:
@ExtendWith(FrayTestExtension.class)
public class FrayDemoTest {
    // Similar examples might include: rate limiters, CPU credits, ...
    class Bank {
        volatile int balance = 100;

        void withdraw(int amount) {
            if (balance >= amount) {
                balance = balance - amount;
            }
            assert (balance >= 0);
        }
    }

    @ConcurrencyTest
    public void bankTest() throws InterruptedException {
        var bank = new Bank();
        Thread t1 = new Thread(() -> {
            bank.withdraw(75);
        });
        Thread t2 = new Thread(() -> {
            bank.withdraw(75);
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
Despite using volatile to protect memory from concurrent access, this code is not correct, as there exists a thread interleaving where the assertion fails, and we end up with a negative account balance:
| time | t1 | t2 | |
|---|---|---|---|
| 1 | if (balance >= amount) | | true |
| 2 | | if (balance >= amount) | still true! |
| 3 | balance = balance - amount | | balance is 25 |
| 4 | | balance = balance - amount | balance is -50! |
The problem is that if followed by an assignment is not an atomic operation. In fact, there are three separate operations (read in if, read old value, write...
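The analysis above boils down to a read-modify-write race. For completeness, here is one conventional fix, a sketch of mine rather than something from the Fray docs: make the check and the update atomic, for example with synchronized:

class Bank {
    private int balance = 100; // volatile no longer needed; synchronized provides visibility

    // The check and the write now happen atomically, so two withdrawals
    // can no longer both pass the balance check before either one subtracts.
    synchronized void withdraw(int amount) {
        if (balance >= amount) {
            balance = balance - amount;
        }
        assert (balance >= 0);
    }
}

Re-running the Fray test against this version should no longer produce a failing interleaving.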
Have you ever been in a situation where you had to validate data that was beyond classic email or phone validation?
Validating form data often leads to unreadable, complex regular expressions that no one understands, and everyone hopes there are unit tests for them, so you can at least grasp what the validation is trying to achieve. On top of that, when you need to cross-validate multiple fields or detect sarcasm entered by the user, you are on your own when it comes to, e.g., Spring validation (or any popular framework I am aware of).
What if your framework validation could understand language and intent, not just syntax? Meet the Semantic AI Validator from SoftwareMill: a lightweight, annotation-based (JSR-380 compliant), async library for the Spring framework that solves some of the problems of building web solutions where forms are an essential part of the business.
Semantic AI Validator is a Kotlin library that enables intelligent, context-aware validation of form fields using Large Language Models. Instead of writing complex regex patterns or business logic for custom validators, describe what you want to validate in plain English (or any other language if needed) with an LLM prompt.
Features:
- Add @AIVerify to any field with a validation prompt

Whenever you need semantic or subjective checks (for bios, reviews) or cross-field consistency on your web forms, you can utilise an LLM and execute your validations across multiple providers with a simple @Valid annotation and its attributes.
Let’s take a simple email validation with regexp in Spring:
@field:NotBlank(message = "Contact email cannot be blank")
@field:Pattern(
    regexp = "^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$",
    message = "Invalid email format. Expected format: user@example.com"
)
val contactEmail: String
In this particular case, it's actually not so bad. With some basic knowledge of regular expressions, you can easily decipher what we expect in this field. But regex validation can become complex. If you've ever wanted to validate something other than email, e.g., a complex URL with path and query parameters, you know the pain.
Now compare that to the power of the Semantic AI Validator:
@AIVerify(
prompt = """Verify that this project description contains all of the following elements:
1. Project name
2. Main goal or objective
3. Target audience
4. Timeline or schedule
If any of these elements are missing, respond with: "INVALID: Missing [list what's missing]"
If all elements are present, respond with: "VALID"
Be strict in your validation. Vague or implied information doesn't count."""
)...
If you have ever been responsible for managing a Kafka cluster running in production (assuming it's not trivial and handles thousands or maybe millions of messages daily), you have probably had many sleepless nights wondering how big a mess you would need to untangle the next day at work.
Not to mention the situations where you were simply on call and, thanks to a proper monitoring setup, had to solve the problems straight away that very night...
At SoftwareMill, we manage multiple Kafka clusters for our clients, often with on-call situations, and, needless to say, it’s often not a pleasurable experience. Kafka is powerful but also really complex.
Simple configuration mistakes can have serious consequences. In situations where we inherit existing setups, and especially when we try to fix basics that were done wrong from the beginning, there is a period when we need to give 110% of our experience and Developer/DevOps brains to make things work as they should.
Kafka has been around for over 10 years and offers a plethora of tools to make life easier for developers and others setting up clusters. However, even with good monitoring, you still only see symptoms, not the root causes.
Being firm believers in the motto "work smarter, not harder", we try to leverage automation to reduce manual effort. For this very purpose, KCPilot was created. KCPilot is a tool that, although still in its early development stage, promises to make checking existing Kafka cluster setups faster by identifying common problems that require your attention.
In short, KCPilot is a set of tasks that analyze, with the help of an LLM, the data associated with a running cluster to find common mistakes people make when setting it up. The tool first scans your cluster configuration, logs, and metrics (some of them also with the help of an LLM) and executes predefined tasks to search for potential issues.
Retrieving the data to be analyzed is a complex task on its own, mainly because Kafka deployments can differ significantly in their architecture, the way brokers are started, and how logs and metrics are handled. There is no single standard or unified approach, and every production setup tends to do things slightly differently.
On top of that, Kafka itself doesn’t provide any simple commands or binary responses that would directly tell you whether the cluster is healthy. A good example of this complexity is logs, which require a multi-step process to locate and collect from the right places (more on that later).

Besides logs, we also gather system info, environment variables, and other details from each broker, and store that data for the analysis step. This is the same information anyone analysing an existing Kafka cluster would normally collect by hand so that it can be evaluated and corrections (if needed) can be made. The tool significantly...
I used to hear that the JIT (Just In Time) compiler can compile bytecode on multiple levels, but I had never thought about what that means in practice. In the middle of October, I attended a conference where I had the pleasure of listening to a talk about ‘lock-free programming'. During this talk, the speakers briefly mentioned JVM internals, which inspired me to dig deeper into the topic. And since one of the best ways to check whether you understand a topic is to try to explain it to others, I decided to write this article.
One of the main differences between Java and languages like C++ is how they are compiled. A C++ compiler translates code directly into machine code that the CPU can execute. In contrast, Java compiles its code into an intermediate form known as bytecode. This bytecode cannot be executed directly by the CPU. It requires additional processing. The JVM serves as an application that translates this bytecode into machine code, enabling it to be executed by the CPU.
It's important to note that the code we write in any JVM-compatible language serves as a recipe for how we want our program to execute.
The JVM can compile our bytecode at five different levels. I will illustrate this with an image:
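For reference, the five standard HotSpot tiers are:
Level 0 - the interpreter, which also collects profiling data
Level 1 - C1, fully optimized, with no profiling
Level 2 - C1 with light profiling (invocation and back-edge counters)
Level 3 - C1 with full profiling
Level 4 - C2, the heavily optimizing compiler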

I will take a very simple method as an example to illustrate the difference between each level.
This is the definition of a method in Java:
private static int resolveNumber(int i) {
    if (i < 20000) return i;
    else return i * 2;
}
After compilation, we can examine the generated bytecode using the following command:
javap -c -p target/classes/org/zygiert/Main.class
The produced bytecode is a set of operation codes for the JVM that specify the operations to be executed. Below is the bytecode representation of the method:
0: iload_0
1: sipush 20000
4: if_icmpge 9
7: iload_0
8: ireturn
9: iload_0
10: iconst_2
11: imul
12: ireturn
Here's what actually happens:
iload_0 loads an integer from a local variable onto the stack.
sipush pushes the short integer 20000 onto the stack.
if_icmpge compares the two integer values. If the first integer is greater than or equal to 20000, it jumps to the instruction at byte offset 9. If not, it continues to the next instruction.
If the integer is less than 20000:
iload_0 loads the parameter again onto the stack.
ireturn returns the integer from the stack.
If the integer is greater than or equal to 20000:
iload_0 loads the parameter again onto the stack.
iconst_2 pushes the constant integer 2 onto the stack.
imul multiplies the two topmost integers on the stack, leaving the result on the stack.
ireturn returns the resulting integer from the stack....

In this premiere episode of Beyond the Commit podcast, we explore the software engineering career journey, from individual contributor to team lead and ultimately to engineering manager. Along the way, we unpack the essential skills, mindsets, and expectations that define each stage of professional growth.
Our host, Paweł Dolega, CTO at VirtusLab, dives into these topics in a conversation with Michal Janoušek, a seasoned technology consultant and founder of PureBrew Tech.
With nearly 20 years of experience in the tech industry, Michal has progressed from individual contributor roles to consulting and leadership positions across various levels. His perspective on career development, team building, and technical leadership brings valuable, real-world insight to the discussion.
Want to listen to the podcast on Spotify, Apple, or Amazon Podcasts? Visit the Beyond the Commit website for more information.
Here you'll find the transcript of the conversation: Beyond the Commit Episode 1 Full Transcript (PDF file).

Beyond the Commit is brought to you by VirtusLab & SoftwareMill.
Our podcast spotlights CTOs and senior engineers, sharing candid stories that resonate with technology and business leaders alike. Building on our popular technology blog (45,000 monthly readers), the series adopts a more personal, conversation-driven format.
When we published "IT Trends to Watch in 2021" back in December 2020, the world was in the middle of massive uncertainty. COVID-19 had forced everyone into remote mode. Digital transformation became suddenly urgent, and the IT landscape felt like it was reshaping in real-time.
Five years later, in 2025, it’s time to look back at what we predicted, and to see where we were spot on, where we were overly optimistic, and where reality took a slightly different path.
Take a look at the newest IT Trends to Watch in 2026.
We believed AI and machine learning would move out of research labs and into real-world products, even reaching the edge, mobile devices, IoT, and embedded systems.
That was very much spot on.
From image recognition running directly on smartphones to on-device language models, smart cameras, and AI-powered apps, edge AI is no longer futuristic. Companies are increasingly blending cloud-based models with lightweight edge inference. We may have slightly underestimated how fast AI would grow, and just how much generative AI would achieve, but the underlying direction was spot on.
Back in 2020, we wrote about the shift from batch processing to streaming data pipelines. At the time, Kafka and Flink were already gaining traction.
Fast-forward to today: real-time analytics, event-driven architectures, and streaming platforms are standard in fintech, e-commerce, logistics, and IoT. The tooling matured dramatically, and "real-time" became a default expectation, not a luxury.
That one aged very well.
We predicted that cybersecurity would become a top priority, especially in software supply-chain security and DevSecOps practices. Sadly, the industry validated that prediction faster than we’d have liked.
The SolarWinds attack, Log4Shell, and numerous open-source dependency breaches made it clear: security isn’t a checklist; it’s part of everyday engineering. Today, with security shifting left, few serious teams ship code without integrating security into CI/CD and dependency scanning.
This one wasn’t just correct, it became essential.
We claimed that remote work wasn’t a temporary pandemic fix, but a long-term transformation.
Nowadays, hybrid and remote work are the norm, not the exception. The tooling has improved, a distributed culture has matured, and "digital-first" is now a given across industries. The only caveat is that the world is shifting more towards a hybrid model than we initially expected, with major companies publicly announcing their return-to-office policies.
Still, the overall prediction holds: remote work is here to stay, at least as part of the week.
In 2020, we were optimistic that blockchain would mature, moving past crypto hype toward enterprise adoption and decentralized identity.
Five years on, we’re still waiting for that mainstream enterprise moment.
Yes, there are interesting use cases in supply chain tracking, finance, and digital identity, but blockchain didn’t become the backbone of everyday enterprise systems. Instead, it stayed niche, useful in specific contexts but...
Get ready for our sixth annual deep dive into the future of tech.
2026 won’t be about "if" AI touches your stack, but about how smartly you put it to work. AI is everywhere, and it’s not hitting the brakes. The winners won’t just adopt tools, they’ll orchestrate them, amplifying reliability and velocity, while designing systems that scale with latitude and lock down security by default.
We also discuss some security, cloud-native, and software engineering patterns that we believe will be especially relevant in 2026. Enjoy the read.
And because we keep score, you can also see how our 2021, 2022, 2023, 2024, and 2025 predictions aged.
AI adoption is unlikely to ease off next year. With the evolution of ChatGPT, GitHub Copilot, and other LLM assistants, AI has become a necessity, available both as an added feature in various products (e.g., Google Workspace) and as tools leveraged not only by software engineers, but also by UX Designers, Project Owners, and managers.
The 2025 Stack Overflow Developer Survey shows that 84% of respondents are using AI; 66% say AI solutions are "almost" right, and 69% report increased productivity. At the same time, positive sentiment has decreased compared with the previous year.
AI is used for architecture, planning, coding, analysis, testing, and documentation generation. New models are released regularly, so people constantly test them, compare different variants, and assess which are best for their use cases.
The general way of working has undergone a significant change. For developers, AI can handle the tedious, repetitive tasks they never liked.
Everybody's wondering how best to use AI. Where to set the boundaries so that AI increases efficiency without introducing tech debt and eroding the maintainability of the systems created? How can we train people to use AI while also ensuring their self-development, so that we don't end up with a skills gap in a couple of years?
Developers are also learning how to approach tasks more efficiently, getting the best results without exhausting credits and with minimal manual correction. AI still hallucinates, and productivity gains are sometimes questioned because outputs must be corrected, reviewed, validated, and tested.
In our view, the coming year will bring stabilization rather than major breakthroughs, with a focus on settling AI-based workflow standards and polishing productivity. We have been through a period when every day brought a new change in the AI world; now the pace has slowed. Among the most often used solutions are Claude Code and Cursor.
Moreover, with the Model Context Protocol, we can transition from simple "copilot" usage to working with an orchestrated group of agents that interact with various tools and systems. Definitely, we will see more agentic workflows allowing us to interact with different applications, including web browsers.
We’ll also see small models working locally on devices and tools, allowing us to integrate various AI-related technologies with other systems, such as n8n, which has recently emerged as one...
In multi-datacenter Kafka deployments, ensuring data durability and availability under failure conditions requires a careful balance between consistency and performance.
The Confluent Stretched Cluster 2.5 architecture introduces mechanisms such as observers and automatic observer promotion (AOP) to provide cross-datacenter resilience while minimizing operational complexity.
This post explains the role of replication.factor (RF) and min.insync.replicas (min.ISR) on the Confluent Platform Stretched Cluster, describing how they affect availability and how Confluent’s implementation differs from open-source Apache Kafka.
A stretched cluster 2.5 is a Kafka deployment where brokers are installed in two fully operational data centers (DCs), while a third, lightweight site hosts only metadata services – historically ZooKeeper or, in modern Confluent Platform versions, a KRaft controller quorum.
A broker hosts partition replicas; each replica assumes one of three roles: leader (serves producers and consumers), follower (replicates the leader and can join the ISR), or observer (replicates asynchronously and, by default, stays out of the ISR).
The ISR (In-Sync Replicas) list includes brokers fully caught up with the leader.
The configuration min.insync.replicas defines how many ISR members must acknowledge a write (with acks=all) for it to be considered committed.
To use observers and stretched clusters, you must define the number of replicas when creating a topic and then specify confluent.placement.constraints, which determines where each replica is placed and what role it will assume.
Let’s assume a scenario in which we want to have 4 replicas of every partition (RF=4), two per DC. With min.isr=2, a write is considered successful once two ISR members acknowledge it, even if both acknowledgments come from brokers in the same DC.
To ensure data is acknowledged across both DCs, the topic can be configured as follows:
The replica placement policy looks as follows:
{
  "version": 2,
  "replicas": [
    {
      "count": 1,
      "constraints": {
        "rack": "dc1"
      }
    },
    {
      "count": 1,
      "constraints": {
        "rack": "dc2"
      }
    }
  ],
  "observers": [
    {
      "count": 1,
      "constraints": {
        "rack": "dc1"
      }
    },
    {
      "count": 1,
      "constraints": {
        "rack": "dc2"
      }
    }
  ],
  "observerPromotionPolicy": "under-min-isr"
}
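Assuming the policy above is saved to a file, e.g. placement.json (the name is illustrative), the topic can then be created with Confluent's tooling roughly like this:

kafka-topics --create --topic payments \
  --partitions 6 \
  --replica-placement /path/to/placement.json \
  --bootstrap-server broker-dc1-1:9092

Note that the replication factor isn't passed explicitly: it follows from the counts in the placement file (2 replicas + 2 observers = RF 4).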
If the DC hosting the leader fails, Automatic Observer Promotion (AOP) can promote a caught-up observer to the ISR and elect it as a new leader. Promotion timing depends on replication lag and network health; it’s not bound to a fixed duration. During this transition, producers with acks=all cannot send new messages.
This configuration is also sensitive to overlapping failures (for example, one DC outage combined with a broker restart in the surviving DC).
This scenario assumes the same replication factor as Scenario 1, but min.isr=3. With this config, all replicas are synchronous – no observers are used.
The replica placement policy looks as follows:
{
  "version": 2,
  "replicas": [
    {
      "count": 2,
      "constraints":...
If you have ever gone beyond running Kafka on your local machine and moved your ideas to real-world production projects built on Kafka, you probably know the drill: Kafka clusters are complex, and the problems you can encounter within your Kafka infrastructure have endless variations.
There are multiple aspects to this, starting from your setup (which is complex by itself; you need to understand many things about Kafka to make it right and secure) and ending with day-to-day operations, where a new set of problems arises and you need PhD-level expertise to understand what is going on and how it can affect your business.
When investigating your Kafka cluster, you often need to gather data from multiple sources, check out the configuration, look into the logs, and check the JMX or Prometheus metrics to get started.
Many of the problems you find on existing Kafka clusters are repeated, common errors made by DevOps or other people on the development team who set the cluster up at the beginning of its life. It’s been running fine for ages, until the day you need to investigate because something is off with your data.
At SoftwareMill, we are trying to gather that tribal knowledge into a standard set of problems that can be automatically discovered on any Kafka cluster. For this exact purpose, we have created a new open-source project called KCPilot.
KCPilot is a CLI tool that helps you gather all the data and analyze it to identify common problems that can occur in your Kafka cluster. It utilizes an LLM to execute each analysis task and is easily extensible, allowing us to add new tasks and solve more problems in the future.
KCPilot is part of SoftwareMill’s Innovation Hub and is currently in the MVP stage. While functional, it may contain bugs and has significant room for improvement. We welcome feedback and contributions. Treat this project as a base, invent your own tasks, and share them with the world by opening a pull request.
The tool is written in Rust and is a simple CLI tool that you can run locally or on a bastion server. All you need to provide (at least for now) is SSH access to your workers (or also to your bastion server if you run KCPilot locally and want to inspect the production cluster).
KCPilot scans your workers for configuration files, logs, and metrics, and stores them for analysis. Once the data is gathered, you can analyze it and produce a report showing all the good and bad aspects of your current cluster setup. Currently, the tool has 17 built-in checks, but it can be easily extended with new checks (tasks), which are created in YAML and translated into LLM prompts with context from the data gathered earlier.
This article is for Rust developers interested in building infrastructure tools (especially CLI-based) as well as for Kafka administrators and all...
With the just-released Java 25, we have the opportunity to test the fifth preview of the structured concurrency APIs. Turns out, creating such an API is not that easy!
The API allows you to create forks (virtual threads) within scopes; the forks run your tasks. Certain constraints are imposed, aiming to provide a safe and comprehensible way to work with concurrent, I/O-bound tasks in Java applications.
The right set of constraints is often the root of the most powerful features. However, from my attempts at porting Jox, a virtual-thread-native reactive-streaming-like library, from Java 21 to Java 25, I feel that the constraints currently in place are not always the best ones. That's why I'd like to offer my critique of the current design.
Structured concurrency is an approach where the syntactical structure of the code determines the lifetime of the threads created within that code.
Structured concurrency is well-introduced in the JEP itself, as well as in articles by Martin Sustrik, who coined the term, and Nathaniel J. Smith, who popularized it in Python. Structured concurrency is also widely used in Kotlin. Hopefully, these materials are sufficient to explain the concept; hence, I won't provide a detailed introduction here.
Let me provide a simple example. Say we want to run two methods concurrently: findUser() and fetchOrder(), combining their results when successful. We also want to short-circuit on failure: interrupt the other task if one fails. Finally, we want a guarantee that the scope within which we create the tasks will only finish once all threads have terminated (successfully, with an exception, or due to interruption).
These are precisely the guarantees and features that StructuredTaskScope from the JEP provides:
Response handle() throws InterruptedException {
    try (var scope = StructuredTaskScope.open()) {
        Subtask<String> user = scope.fork(() -> findUser());
        Subtask<Integer> order = scope.fork(() -> fetchOrder());
        scope.join();
        return new Response(user.get(), order.get());
    }
}
Before we dive into the weaker points of the specification, I'd like to be clear that I don't think that the current JEP is all bad; quite the opposite.
First and foremost, it properly implements the main idea of structured concurrency: using StructuredTaskScope, you have the guarantee that the scope won't complete until all forks (virtual threads) created within have completed. No thread leaks, and no cancellation delays.
Moreover, the current design is consistent with other Java features, as it uses the try-with-resources pattern. We have a clearly delineated region of code, where given resources are being used (virtual threads created to run forks), with a guarantee that the scope is always closed (properly cleaning up if needed).
Finally, using concurrency scopes has minimal overhead (or none at all), compared to unstructured variants of the code (e.g., relying on ExecutorServices and Futures).
But of course, it's always more interesting to discuss the other side: what might be the problems, then?
JEP 505 works great when the scope's main body doesn't contain much logic and...
You're on your way to a family vacation in Italy, driving from Poland, and it's a long drive of over 1500 km, so you decide to make a stopover in Austria. After an entire day behind the wheel, you arrive at the hotel, eager to stretch your legs, give the kids an occasion to run around, and explore what the Austrian city has to offer.

You enter the hotel, happy that the driving for today is over, and walk over to the front desk. But here the problems start! The Internet connection is down, and they can't check you in. "But," you say, "I've got everything in here, the booking, the confirmation, can't you just give me a key to my room and complete the check-in later?". "No," they reply, "checking in through the System is the only way to get a key". The System is, of course, an ordinary in-browser web app.
They ask you to wait. You do so; 10 minutes pass, 20 minutes—the kids are starting to ruin the lobby—finally, 30 minutes later, the connection is back online, and you check in successfully. On one hand, you might say, it's only 30 minutes. On the other hand, you've started programming at around age 12, and you can't help but wonder—are we doomed to live with such technology? Or can we do better?
Of course we can! It's just poor (but probably cheap) system design that provides "sub-optimal"—to put it mildly—user experience in case of network problems.
I've been a fan of the local-first approach for a long time. It's full of good ideas which feel "right": you own your data; a central server is a convenience, not a necessity; the application is resilient to disruptions. Hence, I thought—maybe that's the solution to the hotel check-in problem?
Driving on the next day, with plenty of time to think, I concluded that a full-blown local-first app for this type of problem would probably be too much. The tools and techniques are available, such as CRDTs and automerge, but they excel in areas like collaborative editing and mobile apps with intermittent connectivity.
But we can take some of the ideas and apply them to create a "local-second" solution. By default, the system would operate in a traditional, centralized client-server fashion. That is, a completely standard webapp+HTTP API.
However, when there are network problems, we would degrade into an offline mode, which offers limited functionality. The crucial services are still available—and data is being buffered until the connection is restored.
In our case, such crucial functionality is checking guests in, so that they can rest after a long trip; other operations might be unavailable.
Armed with AI coding assistants (Claude Code, if you're curious), I've created a PoC application, which implements a skeleton of such a local-second webapp and backend. But before diving into the code, there are a couple of choices to be made regarding the technologies used and the architecture itself.

Unlike in a traditional web app,...
Each half-year, we get a new, fresh, and yummy version of Java. Now we get Java 25, and this article will discuss JEP-507, which introduces primitive types in patterns, instanceof, and switch. Remember, this is a preview feature.
The title of this JEP seems to explain everything, but it’s worth looking into this deeper.
Note: The original version of this article was published in September 2024. I updated it in September 2025 to coincide with the release of Java 25, reflecting the latest changes and improvements introduced in the new version.
Let’s look at the world before this JEP. Whenever we wanted to work with switch or instanceof, we were quite limited. For switch, we couldn't use all of the boxed types.
Whenever you wanted to write code like this:
Double d = 20.0d;
switch (d) {
    case 20.0d -> log("double is 20");
    default -> log("wrong number" + d);
}
You would receive a compilation error:
java: constant label of type double is not compatible with switch selector type java.lang.Double
That’s because there were limitations for some types. In JLS, you can read that:
“The Expression is called the selector expression. The type of the selector expression must be char, byte, short, int, or a reference type, or a compile-time error occurs.”
From this quote, we can understand that the int type is OK to use in switch. That’s correct; you can create code like this:
int v = 20;
switch (v) {
    case 20 -> log("int is 20");
    default -> log("wrong number" + v);
}
And everything will work correctly. As you may know, or have read on our blog, switch got a lot of upgrades in Java 21; you can do something like:
Integer v = 20;
switch (v) {
    case Integer i when i < 20 -> log("This value is too low: " + i);
    case Integer i -> log("This value is perfect: " + i);
}
Looks great, right? We could try something like that for a primitive int.
int v = 20;
switch (v) {
    case int i when i < 20 -> log("This value is too low: " + i);
    case int i -> log("This value is perfect: " + i);
}
We run this code, and get a compilation error:
java: unexpected type
required: class or array
found: int
Why is that? Before JEP-507, Java had limited support for switch patterns: patterns like that were possible only inside record patterns.
record Value(int i) {}

void test() {
    Value v = new Value(20);
    switch (v) {
        case Value(int i) when i < 20 -> log("This value is too low: " + i);
        case Value(int i) -> log("This value is perfect: " + i);
    }
}
This works correctly. instanceof, however, did not work with any of the primitive types.
With JEP-507, Long, Float, Double, and Boolean are now allowed in a switch, so our previous example with Double works.
instanceof has also changed its behavior. It’s possible to check...
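To sketch what this means (my example, in the spirit of JEP-507): a primitive type pattern in instanceof matches only when the value converts exactly, without losing information:

double d = 20.0d;
if (d instanceof int i) {
    // Matches: 20.0 converts to the int 20 exactly.
    log("exact int: " + i);
}

double e = 20.5d;
if (!(e instanceof int)) {
    // No match: 20.5 would lose its fractional part.
    log("20.5 is not representable as an int");
}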
The work on JEP-505 resulted in the introduction of structured concurrency in Java 25. In this article, updated in September 2025, we will delve into the concept of structured concurrency, compare it with the current API, and explore the problems it solves.
Join us on a journey through decades of Java evolution
Structured concurrency means that all subtasks are bound to the scope of their parent task and cannot outlive it, just like a method call cannot last longer than the method that invoked it.
It makes it much easier to reason about concurrent code: no stray background threads continue running after the parent has finished, error handling is more predictable since failures can be managed in one place, and the overall system is simpler to observe and debug.
In practice, it prevents the typical pitfalls of uncontrolled concurrency, like thread leaks or tasks lingering indefinitely, often leading to fragile and hard-to-maintain systems.
In Java, structured concurrency can be used with virtual threads from Project Loom. Because virtual threads are extremely lightweight, creating new tasks is almost free, making this approach both practical and efficient even in highly concurrent applications.
To illustrate the issues with the current API, let’s consider a simple example: generating an invoice. We must fetch the issuer, the customer, and the list of items. Doing this sequentially on a single thread works, but it’s slow. A better approach would be to fetch the data in parallel.
Let’s first see how this looks with ExecutorService.
Invoice generateInvoice() throws ExecutionException, InterruptedException {
    try (final var executor = Executors.newFixedThreadPool(3)) {
        Future<Issuer> issuer = executor.submit(this::findIssuer);
        Future<Customer> customer = executor.submit(this::findCustomer);
        Future<List<Item>> items = executor.submit(this::findItems);
        return new Invoice(issuer.get(), customer.get(), items.get());
    }
}
This approach brings with it a couple of problems:
If findCustomer takes longer to execute than findItems, and findItems throws an exception, we still block on customer.get() until the thread executing findCustomer ends. As a result, we will waste resources waiting for the subtask to finish.
If the thread running generateInvoice() is interrupted, the subtasks in the executor are not interrupted and get leaked.

Now let’s see the same example rewritten with the structured concurrency API:

Invoice generateInvoice() throws InterruptedException {
    try (var scope = StructuredTaskScope.open(
            StructuredTaskScope.Joiner.allSuccessfulOrThrow() // (1)
    )) {
        Subtask<Issuer> issuer = scope.fork(this::findIssuer);
        Subtask<Customer> customer = scope.fork(this::findCustomer);
        Subtask<List<Item>> items = scope.fork(this::findItems);
        scope.join(); // (2)
        return new Invoice(issuer.get(), customer.get(), items.get());
    }
}
(1) We create a StructuredTaskScope using the factory method open() with the policy allSuccessfulOrThrow().
(2) We wait for all subtasks with scope.join(); if an exception is thrown in findIssuer, findCustomer, or findItems, the other subtasks are canceled and the scope fails.

The code above is free of the problems mentioned earlier:
If findCustomer takes longer to execute than findItems, but findItems throws...

In this article, we will look at JEP 502 - Stable Values. It’s a new feature that will appear with Java 25, our new LTS, planned to be released in September. It is going to be introduced as a first preview feature, which means that everything can still change.
A StableValue<T> is a container that holds a single value of type T. Once assigned, that value becomes immutable. You can think of it as an eventually final value.
It’s vital to note that only the reference to the object is immutable; the object beneath can still be changed.
Before Java 25, to achieve immutable data, we had to use the final keyword.
class Controller {
    private final EmailSender sender = new EmailSender();
}
There were some problems with this approach.
Whenever our code has a final field, it has to be set eagerly, either initialized by a constructor or by having a static field. Because of that, the application's startup may suffer. Not all fields are immediately needed, right?
So maybe we could remove the final keyword and do something like that:
class PetClinicController {
    private EmailSender sender = null;

    EmailSender getSender() {
        if (sender == null) {
            sender = new EmailSender();
        }
        return sender;
    }

    void adoptPet(User user, Pet pet) {
        // some logic here
        getSender().sendEmailTo(user, "You are great person!");
    }
}
It’s going to work, right? The startup will be faster. And yes, that's one way to solve this issue before Java 25. Unfortunately, it's not free: this lazy initialization is not thread-safe, so two threads calling getSender() at the same time can both see sender == null and create two EmailSender instances.
Using our example from before, let's migrate to the new Java.
class PetClinicController {
    private final StableValue<EmailSender> sender = StableValue.of();

    EmailSender getSender() {
        return sender.orElseSet(() -> new EmailSender());
    }

    void adoptPet(User user, Pet pet) {
        // some logic here
        getSender().sendEmailTo(user, "You are great person!");
    }
}
The code is very similar, but StableValue handles the nullability of EmailSender. Thanks to that, it’s impossible to use the EmailSender without first going through the method that retrieves the value.
The value inside a StableValue is guaranteed to be set in a thread-safe way.
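A minimal sketch of what that guarantee buys us, reusing the classes above (my example, assuming the java.util.concurrent imports shown): even when getSender() is called from many threads at once, the supplier passed to orElseSet is evaluated at most once, and every caller sees the same instance.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;

void demo() {
    var controller = new PetClinicController();
    var seen = ConcurrentHashMap.<EmailSender>newKeySet();
    try (var executor = Executors.newFixedThreadPool(8)) {
        for (int i = 0; i < 8; i++) {
            executor.submit(() -> seen.add(controller.getSender()));
        }
    } // close() waits for all submitted tasks to finish
    assert seen.size() == 1; // all threads got the very same EmailSender
}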
Stable values provide the foundation for higher-level functional abstractions. Right now, we have three types of stable functions. The first is the stable supplier: a function that is computed only once, with the result stored and returned on subsequent calls. For example, it can be used in our controller:
class PetClinicController {
    private Supplier<EmailSender> sender = StableValue.supplier(() -> new EmailSender());

    void adoptPet(User user, Pet pet)...
We replaced interactions with two separate Google services with a single, small LLM-powered microservice using Gemini AI (also from Google). The result? Costs were cut by over 90%, and flexibility improved, with just a couple of days' work.
During a routine GCP billing account check for our client, I paid special attention to some out-of-the-box APIs provided by Google, which we use for language translations (Cloud Translation API) and sentiment analysis (Cloud Natural Language API). Because of our latest changes to some of the business logic, as well as upcoming features planned for the summer, the cost of those services was expected to increase heavily, and I needed to estimate by how much to better prepare for product discussions with the so-called ‘business’.
We originally used two Google APIs in a simple workflow: translate user text to English and then run sentiment checks. The translation step existed only because the sentiment API didn’t support the text’s original language. If the text looked suspicious (offensive, profane, or unnaturally polished), we flagged it for a human review team.
Looking ahead, we planned to use the Translation API in many more ways, and the number of interactions with those services would grow rapidly. We wanted to enhance the UX with translations into 12 languages and keep sentiment analysis. That meant calling the API thousands of times a day just to get the translations (we cache results and only refresh when the source text changes).
While analysing current and future call volumes, we realized a single LLM-based service could replace both APIs. I needed to do a quick Proof-of-Concept (PoC) and check how much it would cost to get the same functionality from simple LLM calls through the API.
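To give a flavor of the approach, here is a hypothetical Java sketch (names like LlmClient and completeAsJson are illustrative placeholders, not the actual client we used): a single prompt returns the translation and the sentiment verdict in one structured response.

record CheckResult(String translatedText, String sentiment, boolean suspicious) {}

class TextCheckService {
    private final LlmClient llm; // hypothetical thin wrapper around the Gemini API

    TextCheckService(LlmClient llm) { this.llm = llm; }

    CheckResult check(String text) {
        // One call replaces a Translation API request plus two sentiment requests.
        String prompt = """
                Translate the following text to English, then classify its sentiment
                (positive/neutral/negative) and flag it as suspicious if it is
                offensive, profane, or unnaturally polished. Respond as JSON:
                {"translatedText": ..., "sentiment": ..., "suspicious": ...}

                Text: %s""".formatted(text);
        return llm.completeAsJson(prompt, CheckResult.class); // hypothetical helper
    }
}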
Business requirements for our new PoC microservice were simple:
The architecture before migration was trivial. In one of our microservices, we received a message containing the text to be checked, executed the call to the Translation Google API to get the text in English, and then called the Cloud Natural Language API twice to get different results for different sentiment analyses we were interested in.

With the new features, we needed to translate text into 11 languages, detect the source language when unknown, and continue running sentiment analysis.
That meant multiplying the number of requests to the APIs we already made, several times over, but that was not all. The new features we were introducing over the summer caused many more (hundreds of times more) pieces of text to be pre-translated. Overall, API usage was set to grow hundreds-fold in the near future.
Before estimating how much we would pay for the services with the current setup, or how much it would cost if we used an LLM, we could do a simple test:...
Project Leyden is an ongoing initiative aimed at reducing the startup and warmup time of the Java Virtual Machine. The first completed JEP, identified as JEP 483, was introduced in JDK 24. The upcoming release of JDK 25 will include two additional JEPs:
In this blog post, I would like to describe both of these enhancements.
At the beginning, a few words of explanation for those who have already read my previous blog post: How to Improve JVM-Based Application Startup Time. I wrote there that to use Project Leyden, all you need to do is just include this single option in your application run command:
-XX:CacheDataStore=myapp.aot
However, this was true only for the early access build of JDK 24 (build 24-leydenpremain+2-8). Starting with the official release of JDK 24, this has changed.
In JDK 24, there is no such flag as CacheDataStore. Instead, we have a few others: AOTMode, AOTConfiguration, and AOTCache, used in the commands below.
Usage of Project Leyden was split into three parts:
1. Record the AOT configuration during a training run:
java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -jar my-app.jar
2. Create the AOT cache from that configuration:
java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -jar my-app.jar
3. Run the application with the cache:
java -XX:AOTCache=app.aot -jar my-app.jar
JEP 514 has simplified the process. Instead of executing three separate commands, we now only need to run two. The first command handles both recording the AOT configuration and creating the cache. The second command remains unchanged and is used to run the application.
This is possible because a new command-line option appeared in JDK 25: AOTCacheOutput. When used alone, it effectively splits a single Java launcher invocation into two sub-invocations. The command is structured as follows:
java -XX:AOTCacheOutput=app.aot -jar my-app.jar
There are two separate commands processed under the hood. The first command uses AOTMode=record, which creates the AOT configuration. The second command uses AOTMode=create, which utilizes the configuration generated by the first command to create an AOT cache in a file specified by the AOTCacheOutput option. This process is detailed in the issue related to this JEP:
"It is much easier to explain one command splitting into two sub-commands running under two pre-existing modes, than to explain the interactions of a new combined mode with all other VM features. That is why we choose not to invent a new mode such as record+create."
One of the benefits of this approach is that the AOT configuration file is created automatically and deleted after the cache is generated. In the previous approach, this had to be done manually. The name of the AOT configuration file is created by adding the ‘.config’ suffix to the file name specified by the AOTCacheOutput option. For example, if the cache file is named app.aot, the configuration file will be named app.aot.config.
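Putting the two approaches side by side (a sketch based on the commands above; the intermediate app.aot.config file is created and removed automatically in the JDK 25 flow):

java -XX:AOTCacheOutput=app.aot -jar my-app.jar

is roughly equivalent to the JDK 24 sequence:

java -XX:AOTMode=record -XX:AOTConfiguration=app.aot.config -jar my-app.jar
java -XX:AOTMode=create -XX:AOTConfiguration=app.aot.config -XX:AOTCache=app.aot -jar my-app.jar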
If someone wants...
In Part 1 of this series, we looked at why accessibility matters and how the European Accessibility Act is reshaping digital standards across the EU. Now it's time to get practical.
In this second part, we dive into hands-on accessibility best practices for front-end developers. This guide is packed with actionable tips that will help you build interfaces that are inclusive, compliant, and simply better for everyone.
If you haven’t read Part 1 yet, start here: Why It Matters, Who It Helps, and What Changes Under the European Accessibility Act.
Adapting a website or web application to EAA is not just about audits and documentation – it is primarily about the daily decisions made by the people who write the code. Below are practical tips that every frontend developer should know and apply when working on accessibility.
The foundation of an accessible front-end is the use of appropriate HTML tags. Elements such as <header>, <nav>, <main>, <section>, <aside>, <button>, or <label> not only organize the code but also enable assistive technologies (e.g., screen readers) to correctly interpret the structure and function of the page.
<!-- ❌ Incorrect -->
<div onclick="submitForm()">Save</div>
<!-- ✅ Correct -->
<button type="submit">Save</button>
Another example would be a website navigation:
<!-- ❌ Difficult to scan an HTML document, poor accessibility -->
<div class="nav">
<div class="nav-links">
<p><a href="#">Home Page</a></p>
<p><a href="#">Articles</a></p>
<p><a href="#">About</a></p>
</div>
</div>
<!-- ✅ With semantic HTML elements -->
<nav>
<ul>
<li><a href="#">Home Page</a></li>
<li><a href="#">Articles</a></li>
<li><a href="#">About</a></li>
</ul>
</nav>
Headings (<h1> to <h6>) play a key role in the structure of an HTML document, not only from an SEO perspective, but above all in terms of accessibility. Headings enable people using screen readers to quickly orient themselves in the page layout and navigate its content. Incorrect use of headings disrupts the document's semantics and makes navigation difficult.
A page without a clear heading structure is like a book without a table of contents – difficult to search, chaotic, and unreadable. Assistive technologies use headings to outline the page structure, allowing users to quickly “scan” the content.
Key principles: use exactly one <h1> per page, keep heading levels sequential without skipping (an <h3> should follow an <h2>, not an <h1>), and choose heading levels based on the content hierarchy, not the font size. Compare the two structures below:
<!-- ❌ Incorrect structure – hierarchy omitted -->
<h1>Company blog</h1>
<h3>Product news</h3>
<h4>Version 2.0 – what's new?</h4>
<h2>Guides</h2>
<!-- ✅ Correct structure -->
<h1>Company blog</h1>
<h2>Product news</h2>
<h3>Version 2.0 – what's new?</h3>
<h2>Guides</h2>
<h3>How to implement accessibility step by step?</h3>
The correct...
Digital accessibility is no longer just a "nice to have" — it's becoming a legal requirement across the European Union. In this first piece of our two-part series, we explore what digital accessibility means, why it matters (not just ethically but also legally and strategically), and what changes will affect businesses and institutions after the European Accessibility Act (EAA) comes into force.
You’ll learn who benefits from accessible design, what obligations are involved, and how investing in accessibility leads to better digital products for everyone.
If you're a developer looking for practical techniques and implementation tips, head over to Part 2 of this series: Practical Essentials for Front-End Developers.
On June 28, 2025, the requirements of the European Accessibility Act (EAA) became applicable across the EU. This legislation is expected to have a significant impact on the way digital products are designed and created in the European Union. Companies and public institutions are now required to adapt their services, including digital ones, to the new accessibility standards.
For businesses operating in the EU, this means ensuring that their physical and digital products meet the EAA requirements. Even non-EU companies selling their services or products within the EU must comply with these regulations if they wish to continue operating in this market. Only microenterprises (companies with fewer than 10 employees and an annual turnover below €2 million) are exempt from this obligation.
It is worth remembering two critical deadlines that determine the pace of implementation of the regulations: June 28, 2025, when the requirements began to apply to new products and services, and June 28, 2030, when the transition period ends for services relying on products that were already lawfully in use before that date.
Failure to comply with the requirements of the European Accessibility Act can lead to serious consequences: financial penalties ranging from several thousand to several hundred thousand euros (depending on the Member State), bans on selling the product within the EU, and the loss of reputation and of customers to more accessible competitors.
For front-end developers, UX/UI designers, business owners, and app developers, this is a clear signal that accessibility must be taken seriously not just as best practice but also as a legal requirement.
This is a good time to examine web accessibility: why it is so important, and what can be done to make websites and web applications accessible to the widest possible range of users, including people with disabilities, seniors, and users with limited access to technology.
The European Accessibility Act covers many products and services that impact users' daily lives. These regulations include:
The world of embedded systems has traditionally been seen as distinct from mainstream software development, often characterized by manual processes, intricate hardware dependencies, and lengthy release cycles. However, as embedded systems become increasingly complex and connected, the principles of DevOps, which emphasize automation, collaboration, and continuous delivery, are proving to be not just beneficial but essential.
This article expands upon the concepts introduced in my online presentation for the Embedded Israel meetup; a recording of the talk is also available.
The core promise of DevOps - faster feedback loops, automated pipelines, and increased reliability - is even more critical in embedded development. Unlike web or mobile applications, embedded systems directly interact with physical hardware, introducing unique complexities. Manual firmware flashing, plugging and unplugging cables, and delayed testing on physical boards can significantly slow down development.
By embracing DevOps, we can automate many of these historically manual steps - from building and testing all the way to flashing firmware onto the device, as the sketch below illustrates.
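As a purely illustrative sketch (the tool, port path, and flash offset are assumptions, not part of any specific workflow described here), flashing an ESP32 image with Espressif's esptool turns a manual step into a single repeatable command:
esptool.py --port /dev/ttyUSB0 write_flash 0x10000 firmware.bin
Any vendor's flashing tool can be wrapped the same way and called from a CI pipeline instead of being run by hand.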
While the benefits are clear, embedded DevOps comes with its own set of challenges to address.
One of the first critical decisions in an embedded workflow is how to manage the build environment.

Native builds are straightforward for small projects, but can lead to "it works on my machine" problems due to environment drift. Virtualized builds, especially with Docker, provide a consistent and reproducible environment by locking in compiler versions, dependencies, and build scripts. This stability is crucial for CI/CD and collaborative development.
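For example - the container image and tag here are an assumption for illustration, not a recommendation - an ESP-IDF project can be built inside Espressif's published Docker image, so every developer and CI agent uses exactly the same toolchain:
docker run --rm -v "$PWD":/project -w /project espressif/idf:v5.1 idf.py build
The host machine only needs Docker; the compiler, SDK, and build scripts are all locked inside the image.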
Since embedded devices often lack the resources for on-device compilation, cross-compilation is fundamental. We compile code on a more powerful host machine for the target embedded architecture. Many vendors provide official toolchains, such as Espressif's ESP-IDF for ESP32 or ARM compilers for STM32 via Buildroot or Yocto, making this process more streamlined.
Static linking is another common practice, where all necessary libraries are linked directly into the final binary. It simplifies deployment by eliminating missing library issues, but can result in larger binaries.
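As a minimal sketch (the specific cross-compiler is an assumption - on Debian-based systems it ships in the gcc-arm-linux-gnueabihf package), cross-compiling a C program on an x86 host for an ARM Linux target and statically linking it comes down to one command:
arm-linux-gnueabihf-gcc -static -o app main.c
The resulting app binary can be copied to the target and run without any shared libraries installed there.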
QEMU acts as an emulator, allowing developers to run embedded systems (like a Raspberry Pi or STM32) entirely in software...
More and more companies are beginning to consider machine learning – whether to increase process automation, improve user experience, or keep up with the competition. An idea appears – sometimes inspired by a conference, a conversation with the tech team, or a signal from the market that "others already have something with AI, and we don’t yet."
At that point, there’s often a temptation to jump straight into execution: hire a team, pick a tool, start training something – just to "see how it goes." From my perspective, as someone who has been working on machine learning projects for years, that’s a straight path to frustration and a burned budget.
Machine learning is not a plug-and-play technology. Its effectiveness doesn’t come down to the model alone but to the entire environment: the data, processes, integrations, and, above all, the correct problem definition. That’s why a machine learning workshop is not a formality but a critical project phase. As proof, take a look at our success story from one of our projects, which started as an ML workshop.
The workshop is the moment when expectations meet reality.
This blog post explains how we ran a tailored machine learning workshop for a client building a data extraction platform for accounting professionals.
In the first part, you'll learn about the client's business context: who the client is, what problem they wanted to solve, and why they reached out to us. Next, I describe how we structured and prepared the workshop: from designing the template to drafting a realistic project timeline.
We then show how the workshop unfolded in practice: what data we reviewed, which technical and business constraints surfaced, and what insights we gathered about the users and platform. Finally, we explain how this input shaped the final proposal, including cost estimates, success metrics, and a deployment-ready roadmap.
If you're wondering what kind of value a well-run ML workshop can bring, and how it directly translates into smarter investment decisions, this article gives you a complete, real-world example.
It is a long read, so grab a coffee and set aside about 15 minutes. It’s a small investment but definitely worth your time.
We were approached by a company developing a platform for accounting professionals. Its core value lies in automating the process of extracting data from invoices and feeding it into ERP systems or finance management platforms.
The initial contact came through a team...