Find Vulnerabilities in Your Code Quickly & Cheaply

Ricardo Castillo
Mar 1, 2022

In a recent survey by Forrester Research, 42% of organizations that had experienced a cybersecurity breach blamed the incident on a software security bug. Similarly, when software analysis firm CAST reviewed 278 million lines of code from 1,380 software developers, they discovered over 1.3 million vulnerabilities caused by errors and sloppy code. The problem is real.

There are an estimated 27 million software developers around the world. How many do you think have been trained on writing secure code? Yep, you guessed it, very few. In a world tethered to software and the internet, where war = cyberwar, the security of the code we create is vastly more significant than many of us have imagined.

This is why the technique known as Static Analysis is very important for all software companies, and why the tool, Semgrep, can be a key component of your security toolbelt. Let's explore both quickly.

What is Static Analysis?

Static code analysis is an automated method of scrutinizing the code behind a software application before that application is deployed. It involves comparing the structure of software code against known vulnerable patterns to detect flaws.
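
To make the idea concrete, here is a minimal sketch in Python of structure-based matching, using the standard library's ast module. It is not how Semgrep or any other product is implemented. The function name find_eval_calls and the sample snippet are hypothetical, and flagging eval() is just one illustrative "known vulnerable pattern".

```python
import ast

def find_eval_calls(source: str) -> list[int]:
    """Return the line numbers of every direct call to eval() in `source`."""
    tree = ast.parse(source)  # parse only; the code under review is never executed
    hits = []
    for node in ast.walk(tree):
        # Match a call whose callee is the bare name `eval`
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            hits.append(node.lineno)
    return sorted(hits)

# Hypothetical code under review
SAMPLE = '''
user_input = input()
result = eval(user_input)
note = "eval(this is only a string)"
'''

print(find_eval_calls(SAMPLE))
```

Because the check looks at the parsed structure, the string on the last line of the sample, which merely contains the text eval(, is not reported. A plain text search would have flagged it.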

Static analysis is not the holy grail, but it is an excellent complement to the skills of any software developer. It's an integral part of the Secure Software Development Lifecycle that Resilient deploys with our startup packages and established software business packages.

What is Semgrep?

Semgrep is a lightweight, fast, free, and open-source static analysis tool for finding software bugs and enforcing coding standards. It provides value at three levels:

Editor: That is, as your developers write code on their local machine.
Commit: At the moment your developers attempt to commit their local code to the central repository.
Continuous Integration/Deployment: This refers to your ability to enable various periodic or situational scans of all or parts of your codebase using Semgrep.

... OK! Time to Nerd-Out! 🤓 😊

How Does Semgrep Work?

Semgrep can analyze many programming languages because of how it works. It does not treat your source code as plain text. Instead, it parses the code into a syntax tree for the relevant language and then matches that tree against the patterns in a ruleset. This lets it scan a codebase that mixes several languages and still produce results that your developers can easily digest.

Another important feature that differentiates Semgrep from other static code analysis tools is its flexibility. Because you can create your own rulesets, you can tailor the scan to how your application works and how your developers write code.

This helps surface true positives and reduces the number of false positives that usually come from generic patterns and generic rulesets. With your own rulesets, the tool produces more comprehensible and realistic results that are tailored to how your application works.
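
As a rough illustration of what a custom rule looks like, the sketch below uses Python to write a small Semgrep rule file. The rule ID, message, and file name are made up for this example, and the pattern (flagging System.out.println calls in Java) is my own guess at a useful check, not a rule taken from the Semgrep Registry.

```python
from pathlib import Path

# Hypothetical rule: the id, message, and pattern are illustrative only.
RULE_YAML = """\
rules:
  - id: no-system-out-println
    languages: [java]
    severity: WARNING
    message: "Avoid System.out.println; use the project's logger instead."
    pattern: System.out.println(...)
"""

def write_rule(path: str = "custom-rules.yaml") -> Path:
    """Write the example rule to `path` and return the path."""
    rule_file = Path(path)
    rule_file.write_text(RULE_YAML, encoding="utf-8")
    return rule_file

if __name__ == "__main__":
    print(f"Wrote {write_rule()}")
```

A file like this can be passed to semgrep through its --config option in place of a registry ruleset. The ... inside the pattern is Semgrep's ellipsis operator, which here stands for any arguments.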

How to Use Semgrep

Before installing Semgrep, we need to prepare the operating system. For this example, I am using a Linux distro called Debian, but Semgrep can be installed on any Linux distro. You just need Python 3 and pip installed on your machine.

The next step is to install semgrep by running this command in your terminal:

$ python3 -m pip install semgrep

Make sure your $PATH is set properly so that the semgrep command is available in your shell.

You can check if you successfully installed semgrep by using this command:

$ semgrep --help

It should output the CLI reference and manual for semgrep.
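
If the command is not found, the $PATH is the usual suspect. As a small sketch, the same check can be scripted in Python with the standard library; the helper name find_on_path is hypothetical.

```python
import shutil

def find_on_path(command: str):
    """Return the full path of `command` if it is on $PATH, else None."""
    return shutil.which(command)

location = find_on_path("semgrep")
if location is None:
    print("semgrep not found; check that pip's script directory is on your $PATH")
else:
    print(f"semgrep found at {location}")
```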

After successfully installing semgrep, you can now proceed to run semgrep on your target source code. The command will go like this:

$ semgrep --config=<ruleset> <target directory or target source file>

This command runs the ruleset specified by the --config option against the target source code.

At the end of the scan, it outputs the results. You can add options such as -v for verbose output, so you can follow the scan as it runs and see any errors it encounters. You can also use -o <results.txt> to write the results to a text file.
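
If you would rather drive the scan from a script, the options above can be assembled programmatically. This is a minimal sketch: the function names and the placeholder ruleset and paths are my own, and run_scan assumes semgrep is already installed and on your $PATH.

```python
import subprocess

def build_semgrep_command(ruleset, target, output_file=None, verbose=False):
    """Assemble the argument list for a semgrep scan."""
    cmd = ["semgrep", f"--config={ruleset}"]
    if verbose:
        cmd.append("-v")
    if output_file:
        cmd += ["-o", output_file]
    cmd.append(target)
    return cmd

def run_scan(ruleset, target, output_file=None, verbose=False):
    """Run the scan and return semgrep's exit code (requires semgrep installed)."""
    cmd = build_semgrep_command(ruleset, target, output_file, verbose)
    return subprocess.run(cmd, check=False).returncode

# Placeholder values for illustration only
print(" ".join(build_semgrep_command("my-rules.yaml", "./src", "results.txt", verbose=True)))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a path contains spaces.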

Vulnerability Hunting with Semgrep

I used OWASP WebGoat as a test target. WebGoat is a deliberately insecure web application maintained by OWASP which is designed to teach web application security lessons. This program is a demonstration of common server-side application flaws. The exercises are intended to be used by people to learn about application security and penetration testing techniques.

I ran Semgrep from my CLI with the ruleset from the Semgrep Registry, a total of 1,848 rules, against our target source (/home/ric/WebGoat-develop). The command also included the output option, which tells semgrep to save the results to a results.txt file.

It will start the scan, and you should see a progress bar while it runs.

After the scan is done, Semgrep displays any errors it encountered (for example, rules that were not applicable to specific paths or directories). In this case, it was a timeout error, which can be fixed by raising the timeout threshold, at the cost of a longer scan.

Semgrep also shows a summary of your scan. In this example, it ran 1,848 rules on 1,006 files in OWASP WebGoat and reported 3,753 potential vulnerabilities.
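
With thousands of findings, a quick tally per rule helps decide where to look first. The sketch below assumes the scan was run with semgrep's --json option and that the report has a top-level "results" list whose entries carry a "check_id" field. The sample report is made up to show the shape and is not real scan output.

```python
import json
from collections import Counter

def summarize(report_json: str) -> Counter:
    """Count findings per rule ID in a Semgrep JSON report."""
    report = json.loads(report_json)
    return Counter(finding["check_id"] for finding in report.get("results", []))

# Made-up report in the assumed shape; not real scan output
SAMPLE_REPORT = json.dumps({
    "results": [
        {"check_id": "example.sqli", "path": "A.java", "start": {"line": 28}},
        {"check_id": "example.sqli", "path": "B.java", "start": {"line": 40}},
        {"check_id": "example.ssrf", "path": "C.java", "start": {"line": 97}},
    ],
    "errors": [],
})

# Print the rules with the most findings first
for rule_id, count in summarize(SAMPLE_REPORT).most_common():
    print(f"{count:4d}  {rule_id}")
```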

Some of the notable results are listed below:

A parameter is passed directly into a network request, which can lead to server-side request forgery (SSRF). This can be mitigated by validating the input before it is used.

     97┆     private static void downloadFileFromURL(String urlString, File destination) throws Exception {  
     98┆         if (System.getenv("MVNW_USERNAME") != null && System.getenv("MVNW_PASSWORD") != null) {  
     99┆             String username = System.getenv("MVNW_USERNAME");  
    100┆             char[] password = System.getenv("MVNW_PASSWORD").toCharArray();  
    101┆             Authenticator.setDefault(new Authenticator() {  
    102┆                 @Override  
    103┆                 protected PasswordAuthentication getPasswordAuthentication() {  
    104┆                     return new PasswordAuthentication(username, password);  
    105┆                 }  
    106┆             });
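
One common form of input validation for this kind of finding is an allow-list of destinations. The sketch below shows the idea in Python rather than in the Java of the flagged code. The allowed host and the function name is_allowed_url are assumptions for illustration, not part of WebGoat or Semgrep.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; replace with the hosts your application really needs
ALLOWED_HOSTS = {"repo.maven.apache.org"}

def is_allowed_url(url: str) -> bool:
    """Accept only https URLs whose host is on the allow-list."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS

for candidate in (
    "https://repo.maven.apache.org/maven2/example.jar",
    "http://169.254.169.254/latest/meta-data/",
    "https://repo.maven.apache.org@evil.example/",
):
    print(candidate, "->", is_allowed_url(candidate))
```

Comparing the parsed hostname, rather than checking whether the URL string starts with a trusted prefix, is what rejects the third candidate, where the trusted name appears only in the user-info part of the URL.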

The app logs sensitive information. Make sure that sensitive information is never logged.

     51┆         System.out.println("- Using base directory: " + baseDirectory.getAbsolutePath());  
      ⋮┆----------------------------------------
     76┆         System.out.println("- Downloading from: " + url);  
      ⋮┆----------------------------------------
     81┆                 System.out.println(  
     82┆                         "- ERROR creating output directory '" + outputFile.getParentFile().getAbsolutePath() + "'");
      ⋮┆----------------------------------------
     85┆         System.out.println("- Downloading to: " + outputFile.getAbsolutePath());

Hardcoded secrets are found in the source code. This is a bad coding practice. Through GET requests, or simply by inspecting elements of the web app, hardcoded values can leak critical data that a malicious actor may use. The example below shows a hardcoded username and a hardcoded key.

     23┆         var username = "unknown";

     52┆             this.feedbackResourceBundleKey = "lesson.completed";

SQL injection. A common coding mistake is passing unsanitized user input into database statements, which leads to SQL injection.

     28┆              targetConnection.createStatement().execute("SET SCHEMA \"" + user.getUsername() + "\"");
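
The usual remedy for values is a parameterized query, sketched here in Python with the standard sqlite3 module instead of the Java of the finding. The table, data, and function name are invented for the example. Note that identifiers such as a schema name generally cannot be bound as parameters, so a statement like the one flagged above would more likely need the name checked against an allow-list.

```python
import sqlite3

def find_user(conn: sqlite3.Connection, username: str):
    """Look up a user with a parameterized query instead of string concatenation."""
    # The ? placeholder makes the driver treat `username` strictly as data
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()

# In-memory database with invented sample rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))
print(find_user(conn, "alice' OR '1'='1"))
```

The second call carries a classic injection payload, but because it is bound as a value it is compared literally against the name column and matches no rows.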

These are just a few of the notable results found in OWASP WebGoat. Many other issues were reported, such as missing headers, missing CSRF token protection, and vulnerable functions that can be exploited by user input through a bracket placeholder. Not all findings are true positives; a finding does not always mean that the specific line of code is vulnerable.

In some cases, patterns in Semgrep's default ruleset may trigger false positives. I think this is a good reason to build your own customized ruleset that narrows in on the specific, common mistakes your development team makes. This will reduce false positives in scans and make secure coding practices more efficient and easier to enforce.

Other Resources

In this first blog post, we didn't get to dive into the CI/CD functionality of Semgrep or the centralized dashboard for managing bugs and enforcing standards. But you can check these Semgrep resources: