Have you ever wondered how experts find secret “trapdoors” in the apps you use every day? It is not just about typing fast like in the movies. Today, we are going to look at how modern tools, especially Artificial Intelligence, help security researchers find dangerous flaws in famous software like Google’s Gemini and VirtualBox.
Today’s technology allows us to analyze millions of lines of code in seconds, which was impossible just a decade ago. Imagine having a digital detective that never sleeps, pointing out exactly where a programmer made a mistake. In this tutorial, we will explore two major security flaws discovered by Vladimir Tokarev and learn how AI played a starring role in the process.
Let us begin with the first discovery: a command injection vulnerability in the Gemini Command Line Interface (CLI). To understand this, you must first understand what a command line is. It is a way to talk to your computer by typing text instead of clicking icons. In the Gemini CLI, there is a feature called ide install, which allows users to add extra plugins packaged as .vsix files.
The problem occurs because of how the software processes the name of these files. When the researcher ran the tool in “IDE mode,” he found that it takes the filename and joins it directly onto a system command, a practice called “string concatenation.” For example, if the software wants to run install [filename], an attacker could name their file ; open -a Calculator. Because the programmer did not “sanitize” the input (that is, check it for dangerous characters), the shell treats the semicolon as a command separator and starts a brand-new command. Suddenly, instead of just installing a plugin, the computer opens the calculator! While opening a calculator is harmless, a real attacker could use the same trick to delete files or steal your private data.
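Here is a tiny Python sketch of the unsafe pattern and its fix. This is not the actual Gemini CLI code (which is written in TypeScript); the function names and the install command are made up for illustration. The key idea is real, though: building a shell string by concatenation lets a filename smuggle in a second command, while passing an argument list does not.

```python
def build_install_command_unsafe(filename: str) -> str:
    # DANGEROUS: string concatenation. If the filename contains a
    # semicolon, a shell parsing this string will treat everything
    # after it as a second, attacker-chosen command.
    return "install " + filename

def build_install_command_safe(filename: str) -> list[str]:
    # SAFE: an argument vector handed straight to the OS (e.g. via
    # subprocess.run(argv)). The filename stays a single argument;
    # no shell ever interprets the semicolon.
    return ["install", filename]

malicious_name = "plugin.vsix; open -a Calculator"

print(build_install_command_unsafe(malicious_name))
# -> install plugin.vsix; open -a Calculator   (a shell sees TWO commands)

print(build_install_command_safe(malicious_name))
# -> ['install', 'plugin.vsix; open -a Calculator']   (one harmless argument)
```

The fix is the same in every language: never hand user-controlled text to a shell as part of a command string; pass it as a separate argument instead.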
Moving on to a more complex topic, let us discuss the VirtualBox integer overflow. VirtualBox is a “Virtual Machine” (VM), which is essentially a computer running inside another computer. The guest (the inside computer) should never be able to interfere with the host (your actual computer). However, Vladimir found a way to “escape” this digital cage.
The flaw is located in a specific part of the graphics system: the VMSVGA device, VirtualBox’s emulation of a VMware SVGA graphics adapter. Inside the code, there is a function named vmsvga3dRectCopy. Its job is to move rectangles of pixels around the screen, similar to when you drag a folder across your desktop. To do this, the computer must calculate where the pixels are in the memory buffer.
Here is where the math gets tricky for the computer. The software uses a “32-bit unsigned integer” to check whether the copy stays within the allowed memory area, but the actual address where the data is stored uses a “64-bit” calculation. Think of it like this: your math teacher gives you a small ruler (32-bit) to measure a very long hallway (64-bit). If the measurement is too big for the small ruler, the number “overflows” and wraps around to a small value. Because the safety check sees that small value, it thinks everything is fine, even though the data is actually being written outside the safe boundaries of the buffer. This “Out-of-Bounds” error allows an attacker to read and write directly into the memory of the host machine, potentially taking total control.
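The ruler analogy can be made concrete with a toy Python model. This is not the real VirtualBox C code, and the buffer size and offsets below are invented for illustration; Python integers never overflow, so we simulate the 32-bit wraparound explicitly with a bitmask.

```python
U32_MAX = 0xFFFFFFFF  # mask that simulates 32-bit unsigned arithmetic

def bounds_check_32bit(offset: int, length: int, buffer_size: int) -> bool:
    # The flawed pattern: the end position is computed with the
    # "small ruler". A huge offset + length wraps around to a tiny
    # number, so the check wrongly passes.
    end = (offset + length) & U32_MAX
    return end <= buffer_size

def bounds_check_64bit(offset: int, length: int, buffer_size: int) -> bool:
    # The correct pattern: do the math in a wider type (Python's
    # unbounded ints stand in for a 64-bit computation here).
    return offset + length <= buffer_size

buffer_size = 4096
offset = 0xFFFFF000   # far outside the 4 KiB buffer
length = 0x2000       # 0xFFFFF000 + 0x2000 = 0x100001000, wraps to 0x1000

print(bounds_check_32bit(offset, length, buffer_size))  # True  (the bug!)
print(bounds_check_64bit(offset, length, buffer_size))  # False (correct)
```

The actual write then happens at the full 64-bit address, far past the end of the buffer, which is exactly the out-of-bounds condition described above.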
Now, how did AI help find these? Usually, researchers use a tool called Semgrep to scan code. When Vladimir scanned the Gemini CLI, Semgrep gave him 6,000 warnings. No human has the time to check 6,000 things manually! This is where AI models like Claude Opus or GPT-4 come in. Vladimir used the AI to act as a filter. He instructed the AI to look at the 6,000 notifications and remove the ones that were clearly not real problems. The AI narrowed it down to just 12 interesting leads. Out of those 12, the researcher found the real vulnerabilities we just discussed.
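The triage loop described above can be sketched in a few lines of Python. Everything here is a stand-in: the findings are invented, and ask_model is a placeholder stub where the real workflow would call an LLM API with a prompt like “Is this Semgrep finding a genuine vulnerability?” The point is the shape of the pipeline, not the details.

```python
# Invented sample findings, shaped loosely like scanner output.
findings = [
    {"id": 1, "rule": "cmd-injection", "snippet": 'run("install " + name)'},
    {"id": 2, "rule": "cmd-injection", "snippet": 'run(["install", name])'},
]

def ask_model(finding: dict) -> bool:
    # PLACEHOLDER for an LLM call. This stub just flags string
    # concatenation into a command, so the example runs on its own.
    return '" + ' in finding["snippet"]

# The filter step: thousands of warnings in, a handful of leads out.
leads = [f for f in findings if ask_model(f)]
print([f["id"] for f in leads])  # -> [1]
```

A human researcher then manually verifies each surviving lead, which is how the two real vulnerabilities above were confirmed.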
However, you must remember that AI is not a magic button. It still makes mistakes, which we call “hallucinations.” Sometimes, the AI will swear a piece of code is broken when it is actually perfectly fine. A researcher must still use their intuition and deep reasoning to verify the findings. AI is excellent at “shallow tasks”—like checking a single function for a basic error—but it struggles with “deep tasks,” where you have to understand how ten different parts of a huge program work together over many steps.
In conclusion, the combination of traditional security tools and modern AI is making software safer than ever before. While attackers can use AI to build exploits faster, defenders use it to find and patch holes before they can be used for harm. If you are interested in becoming a security researcher, now is the best time to start. You have access to the same powerful AI assistants that the pros use. I recommend that you start by reading “write-ups” or blog posts about “CVEs” (Common Vulnerabilities and Exposures) that have already been fixed. This will help you understand the patterns of mistakes that programmers make. Keep practicing your coding skills, stay curious, and always remember to use your powers for good to help make the digital world a safer place for everyone!
