Clam AntiVirus ClamAV, a GPL anti-virus toolkit for UNIX 2014-03-28T20:57:00Z WordPress

<![CDATA[ClamAV 0.95 Engine End of Life Announcement]]> 2014-03-28T20:57:00Z 2014-03-28T20:57:00Z ClamAV Community,

This notice is to inform you that, effective immediately, ClamAV 0.95 (and all of its minor versions) is no longer supported, in accordance with ClamAV’s EOL policy, which can be found here:

While the CVDs currently being distributed will still work on ClamAV 0.95, and we are not enabling the functionality that would actually prevent those versions from updating, this does serve as notice that we will no longer test against that version in our regression tests.

We will also be EOL’ing 0.96 in the coming months, so if either of those versions is currently in use, we strongly recommend that you upgrade to the most current version.

Thank you for using ClamAV!

<![CDATA[Open Source Community Webinar]]> 2014-03-09T04:20:02Z 2014-03-09T04:20:02Z ClamAV community,

First off, we’d like to thank everyone for their continued use of our projects and products here at Sourcefire, now a part of Cisco. We love making great software, and we love for you to use it and contribute back. The transition into the Cisco community has been a great one so far. We recently held an Open Source Community Meeting at RSA, and we’d like to make that content available to our Open Source user base as well.

The best way for us to do this is through a Webinar where we can present the current state of our projects, their future, and how they are continuing to move forward inside of Cisco, and, of course, make ourselves available for questions and answers.
We are planning to hold the Webinar on Thursday, March 13, 2014, at 12:00 PM EST.

Register Now for the webinar. We look forward to seeing you and hearing from you then!

<![CDATA[Programmatic Boolean Simplification and ClamAV Signatures]]> 2014-03-07T04:20:03Z 2014-03-07T04:20:03Z 0

<![CDATA[Introducing OpenSSL as a dependency to ClamAV]]> 2014-02-23T04:20:02Z 2014-02-23T04:20:02Z In an upcoming release, we are planning to introduce OpenSSL as a dependency of ClamAV. We wanted to share this with the community so that everyone understands why we are doing it and can provide feedback. First, I’ll cover a few reasons we are planning to introduce it, then outline some pros and cons:

  1. Performance. OpenSSL has code optimized for many platforms. In several tests that we’ve performed, we’ve averaged a 70% increase in performance.
  2. OpenSSL’s code has had a lot of eyes on it. Cryptography is hard to get right.
  3. Planned future work depends on it.

Pros for OpenSSL:

  1. Industry-standard cryptography code
  2. Many, many eyes have looked over OpenSSL’s code.
  3. It’s used pretty much everywhere.
  4. We will be able to provide a better freshclam experience in a future release.
  5. Portability. OpenSSL works pretty much everywhere.
  6. Maintainability. With OpenSSL backing major infrastructure, operating systems provide quick patches/updates to OpenSSL.

Cons for OpenSSL:

  1. Possibly bigger memory footprint
  2. First required dependency for ClamAV’s engine

As always, we are receptive to feedback from the community, and it is welcome on the ClamAV-Users list:

<![CDATA[Open Source Community Meeting next week at RSA!]]> 2014-02-21T04:20:03Z 2014-02-21T04:20:03Z After a lot of hard work by our teams, and with RSA just a few days away, we are proud to announce that, in addition to Cisco and Sourcefire’s corporate teams being present at RSA, we will for the first time also be holding an Open Source Community Meeting!

Matt Watchinski (Director of the Vulnerability Research Team) and I, Joel Esler (Open Source Manager), will be presenting on the state of our Open Source projects at Sourcefire, the state of Open Source now that we are Cisco, some future developments, and, of course, open Q&A!

Here are the attendance details:

Open Source Community Meeting
Executive Conference Center
55 4th Street—Level 2
San Francisco, CA 94103

Wednesday, February 26th, 2014
12:00pm – 2:00pm

Lunch will be provided on site.

We also have some exclusive swag giveaways that aren’t available anywhere else! They are available to the first 40 people who come through the door (if we have your size).

We’ll have room for about 50 people on site, first come, first served. Let’s make this a repeating event!

We look forward to seeing you there!

<![CDATA[Introducing ClamAV community signatures]]> 2014-02-19T04:20:03Z 2014-02-19T04:20:03Z
I am pleased to announce the creation of a new ClamAV signatures contribution program. My name is Alain Zidouemba and I will be managing this program.
If you would like to submit a ClamAV signature, you may do so by emailing community-sigs [at] lists [dot] clamav [dot] net. We will require that each signature:
- not be a hash-based signature
- be accompanied by an MD5/SHA1/SHA256 hash of a sample the signature is meant to detect
- come with a brief description of the threat the signature is trying to detect and what the signature is looking for
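
For submitters who need to produce the sample hashes requested above, a minimal Python sketch follows. The function name `sample_digests` and the chunk size are illustrative choices, not part of the program's requirements; only the standard `hashlib` module is used.

```python
import hashlib


def sample_digests(path, chunk_size=65536):
    """Return the MD5, SHA1, and SHA256 hex digests of the file at `path`.

    The file is read in chunks so large samples are not loaded into memory.
    """
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Any one of the three digests identifies the sample; computing all of them in a single pass simply avoids reading the file more than once.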
Please DO NOT attach malware to your email. Instead, submit your sample here
Signatures submitted will be tweaked if necessary in order to conform to our standards. After the signature passes quality assurance testing, it will be released with proper attribution unless you prefer to remain anonymous.
You can subscribe to the mailing list here. More information about this program will be added in the FAQ in a few days.
We look forward to a fruitful collaboration on community-sigs [at] lists [dot] clamav [dot] net.

]]> 0

<![CDATA[Follow Up: Generating ClamAV Signatures with IDAPython and MySQL]]> 2014-02-18T18:37:00Z 2014-02-18T18:37:00Z 0

<![CDATA[Careto: Covering unavailable samples]]> 2014-02-13T04:20:02Z 2014-02-13T04:20:02Z 0

<![CDATA[Generating ClamAV Signatures with IDAPython and MySQL]]> 2014-02-12T19:11:00Z 2014-02-12T19:11:00Z Covering malware is a constant fight, and the more automation you can integrate, the easier life becomes. This post will go over a relatively easy setup for generating ClamAV signatures based on a set of samples.

I chose to work with OS X malware, specifically targeting Mach-O files. This gave me a relatively small sample set to work with. I downloaded the files from VirusTotal using the search type:macho positives:5+. At the time of download, this yielded 239 samples.

The first problem was grouping samples. Grouping the samples would allow me to generate a single signature for multiple samples. One signature for each sample is costly and leads to a bloated signature set. For this, I set up three MySQL tables.

    binaries – stores information about each sample seen
    | Field | Type        | Null | Key | Default | Extra          |
    | id    | int(11)     | NO   | PRI | NULL    | auto_increment |
    | md5   | varchar(32) | NO   |     | NULL    |                |
    | size  | int(11)     | NO   |     | NULL    |                |

    functions – stores information about each function seen
    | Field | Type        | Null | Key | Default | Extra          |
    | id    | int(11)     | NO   | PRI | NULL    | auto_increment |
    | md5   | varchar(32) | NO   |     | NULL    |                |
    | size  | int(11)     | NO   |     | NULL    |                |

    link_table – associates each binary with a set of functions
    | Field   | Type    | Null | Key | Default | Extra |
    | prog_id | int(11) | NO   | PRI | NULL    |       |
    | fn_id   | int(11) | NO   | PRI | NULL    |       |

The table binaries stores a hash of each program, a unique id, and the program’s size. The table functions stores the md5sum of the bytes comprising the function, a unique id, and the size of the function. The table link_table links each binary to the functions it contains. The grouping is done based on common functions between binaries.
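
The three tables above can be sketched as follows. This is only an approximation: Python's built-in `sqlite3` is swapped in for MySQL so the example runs without a server, the column types are adapted accordingly, and the helper `shared_function_count` is a hypothetical addition that shows how link_table supports grouping by common functions.

```python
import sqlite3

# Approximation of the MySQL tables described above, in SQLite syntax.
SCHEMA = """
CREATE TABLE binaries (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    md5  VARCHAR(32) NOT NULL,
    size INTEGER NOT NULL
);
CREATE TABLE functions (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    md5  VARCHAR(32) NOT NULL,
    size INTEGER NOT NULL
);
CREATE TABLE link_table (
    prog_id INTEGER NOT NULL,
    fn_id   INTEGER NOT NULL,
    PRIMARY KEY (prog_id, fn_id)
);
"""


def create_schema(conn):
    """Create the three tables on an open sqlite3 connection."""
    conn.executescript(SCHEMA)


def shared_function_count(conn, prog_a, prog_b):
    """Count the functions that two binaries have in common (hypothetical helper)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM link_table a "
        "JOIN link_table b ON a.fn_id = b.fn_id "
        "WHERE a.prog_id = ? AND b.prog_id = ?",
        (prog_a, prog_b),
    ).fetchone()
    return row[0]
```

The composite primary key on link_table means a given binary/function pair can be stored only once, which is why re-running the population step on the same binary does not create duplicate links.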

In order to populate these tables, I wrote an IDAPython script. It iterates through the functions of the program, calculates the md5sum of each, and inserts that information into the functions table if the function is longer than 19 bytes. The value 19 was selected after some light analysis in order to filter out functions that consist of only a few instructions. Here is the snippet that populates functions and link_table.

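    # Assumed imports; the original snippet does not show them. cursor, cnx,
    # prog_id, and get_fn_id are defined elsewhere in the script.
    from hashlib import md5
    import idaapi
    from idautils import Functions
    from idc import GetManyBytes

    # for all function offsets
    for fn_ea in Functions():
        # skip invalid offsets (the body of this check is missing from the
        # original snippet; continue is assumed)
        if fn_ea is None:
            continue

        # get function from offset
        f = idaapi.get_func(fn_ea)

        # get function bytes
        start = f.startEA
        size = f.endEA - start
        bytes = GetManyBytes(start, size)

        # if the function is sufficiently long
        if bytes is not None and len(bytes) > 19:
            fn_hash = md5(bytes).hexdigest().upper()
            fn_size = str(len(bytes))
            fn_data = (fn_hash, fn_size)

            # get function id, or insert and get function id
            fn_id = get_fn_id(cursor, cnx, fn_data)

            # link binary to function
            link_query = 'REPLACE INTO link_table (prog_id, fn_id) VALUES (%s, %s)'
            link_data = (prog_id, fn_id)
            cursor.execute(link_query, link_data)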
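
The helper get_fn_id is called in the snippet but not shown in the post. One way it might look is sketched below, keeping the same argument order. This is an assumption, not the author's code: `sqlite3` stands in for MySQL (so the placeholders are `?` rather than `%s`), and the lookup key of md5 plus size is a guess based on the columns of the functions table.

```python
import sqlite3


def get_fn_id(cursor, cnx, fn_data):
    """Return the id of the functions row matching (md5, size).

    If no such row exists, insert one and return the new id.
    """
    fn_hash, fn_size = fn_data
    params = (fn_hash, int(fn_size))  # the snippet passes the size as a string
    cursor.execute("SELECT id FROM functions WHERE md5 = ? AND size = ?", params)
    row = cursor.fetchone()
    if row is not None:
        return row[0]
    cursor.execute("INSERT INTO functions (md5, size) VALUES (?, ?)", params)
    cnx.commit()
    return cursor.lastrowid


if __name__ == "__main__":
    cnx = sqlite3.connect(":memory:")
    cnx.execute(
        "CREATE TABLE functions (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "md5 VARCHAR(32) NOT NULL, size INTEGER NOT NULL)"
    )
    cur = cnx.cursor()
    print(get_fn_id(cur, cnx, ("0" * 32, "24")))
```

Looking up before inserting is what lets identical functions from different binaries map to the same id, which the later grouping step depends on.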

IDA and this script are called by a batch script for every target binary. Once these tables are populated, another script is run. This script uses the MySQL function group_concat to group binaries, based on their common functions, into a list. The problem with this approach is that if binaries A, B, and C share functions x, y, and z, and binaries A and C share functions w, x, y, and z, then we will have duplicates in the list returned. To remedy this problem, the script simply loops through the rows returned and, if any list of binaries is completely contained in another list, removes it. Any binary not in these groupings is marked to get its own signature.
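
The grouping and de-duplication step can be sketched in plain Python. This is an illustration under assumptions: the post does the initial grouping in MySQL with group_concat, whereas here the (prog_id, fn_id) pairs from link_table are grouped in memory, and the name `group_binaries` is hypothetical.

```python
from collections import defaultdict


def group_binaries(links):
    """Group binaries by shared functions.

    `links` is an iterable of (prog_id, fn_id) pairs, as stored in link_table.
    Returns (groups, singles): groups is a sorted list of sorted prog_id lists,
    with any group fully contained in another removed; singles lists binaries
    that ended up in no group and so need their own signature.
    """
    by_function = defaultdict(set)
    all_binaries = set()
    for prog_id, fn_id in links:
        by_function[fn_id].add(prog_id)
        all_binaries.add(prog_id)

    # Candidate groups: distinct sets of two or more binaries sharing a function.
    candidates = {frozenset(b) for b in by_function.values() if len(b) > 1}

    # Drop every candidate that is strictly contained in another candidate.
    groups = [g for g in candidates if not any(g < other for other in candidates)]

    grouped = set().union(*groups)
    singles = all_binaries - grouped
    return sorted(sorted(g) for g in groups), sorted(singles)
```

With the example from the text, the list [A, C] produced by function w is a strict subset of [A, B, C], so only [A, B, C] survives.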

Next, the md5sums of the functions common to each group are added to the table communicate. This was the best way for me to pass this information between scripts. Once this table is populated, another IDAPython script is called on the first binary in a group. This script iterates through the functions in the binary and if the function’s md5sum matches one in the list of shared functions, its basic blocks are loaded into a table basic_blocks. This table stores the parent function’s md5sum, the bytes that comprise the basic block, the basic block’s md5sum, the size of the basic block, and its entropy. The byte_ prefix is used to differentiate between attributes of the raw data and the hex encoded version used in the ClamAV signatures.

    communicate – used to pass the md5s of common functions
    | Field  | Type        | Null | Key | Default | Extra |
    | fn_md5 | varchar(32) | NO   | PRI | NULL    |       |

    basic_blocks – stores basic block information from functions

    | Field        | Type        | Null | Key | Default | Extra |
    | fn_md5       | varchar(32) | NO   | PRI | NULL    |       |
    | hex_bytes    | mediumtext  | NO   |     | NULL    |       |
    | bb_md5       | varchar(32) | NO   | PRI | NULL    |       |
    | byte_size    | int(11)     | NO   |     | NULL    |       |
    | byte_entropy | double      | NO   |     | NULL    |       |
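
A row for basic_blocks could be computed as sketched below. The post does not say how entropy is calculated, so Shannon entropy over byte values (in bits per byte) is assumed here; the function names are hypothetical, and the uppercase md5 mirrors the function-hashing snippet earlier in the post.

```python
import hashlib
import math
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def basic_block_row(fn_md5, block):
    """Build (fn_md5, hex_bytes, bb_md5, byte_size, byte_entropy) for one block."""
    return (
        fn_md5,
        block.hex(),                              # hex-encoded form used in signatures
        hashlib.md5(block).hexdigest().upper(),   # md5 of the raw bytes
        len(block),                               # size of the raw bytes
        byte_entropy(block),                      # entropy of the raw bytes
    )
```

Multiplying entropy by size, as the selection metric does, favors blocks that are both long and varied, which tends to avoid padding and repeated boilerplate instructions.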

Once the basic blocks are stored, the IDAPython script completes and returns to the signature generation script. The basic blocks are queried and sorted by their parent function and by the metric entropy * size. The script then iterates through the functions and selects the best basic block based on that metric. It continues to do this until it has a sufficient number of bytes. It then constructs an LDB signature.
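
One reading of the selection loop is sketched below. Several details are assumptions rather than the author's method: the round-robin order over functions, the `min_bytes` threshold of 64, the signature name, and joining every chosen block with a logical AND. The LDB layout used is ClamAV's logical signature format (name, target description block, logical expression, then subsignatures, separated by semicolons), with Target:9 for Mach-O files.

```python
from collections import defaultdict


def select_blocks(rows, min_bytes=64):
    """Pick basic blocks by descending entropy * size until min_bytes is reached.

    `rows` is an iterable of (fn_md5, hex_bytes, byte_size, byte_entropy).
    Functions are visited in turn, each giving up its best remaining block.
    """
    per_fn = defaultdict(list)
    for fn_md5, hex_bytes, size, entropy in rows:
        per_fn[fn_md5].append((entropy * size, size, hex_bytes))
    for blocks in per_fn.values():
        blocks.sort(reverse=True)  # best score first

    chosen, total = [], 0
    while total < min_bytes and any(per_fn.values()):
        for fn_md5 in sorted(per_fn):
            if per_fn[fn_md5] and total < min_bytes:
                _, size, hex_bytes = per_fn[fn_md5].pop(0)
                chosen.append(hex_bytes)
                total += size
    return chosen


def build_ldb(name, subsigs, target=9):
    """Assemble an LDB line that requires every subsignature to match."""
    expression = "&".join(str(i) for i in range(len(subsigs)))
    return ";".join([name, "Target:%d" % target, expression] + list(subsigs))
```

For example, two chosen blocks `aabbccdd` and `00112233445566` under the hypothetical name `Osx.Test.Example` yield `Osx.Test.Example;Target:9;0&1;aabbccdd;00112233445566`.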

With my newly created signatures, I ran a test on all the samples I had downloaded.

----------- SCAN SUMMARY -----------
Known viruses: 107
Engine version: 0.98.1
Scanned directories: 1
Scanned files: 239
Infected files: 190

Data scanned: 78.35 MB
Data read: 81.36 MB (ratio 0.96:1)
Time: 2.332 sec (0 m 2 s)

The lines of interest are Scanned files and Infected files. Since this script should give near-total coverage, a detection rate of 190/239, while impressive, did not meet my expectations. Something was amiss. My colleague Shaun Hurley noticed that 64-bit Mach-O files were being neglected. Thinking about it, this made sense: IDA has different versions for 32-bit and 64-bit files. I modified the scripts to use idaw64.exe and reran them on the 64-bit binaries. The combined signature set was more impressive.

----------- SCAN SUMMARY -----------
Known viruses: 155
Engine version: 0.98.1
Scanned directories: 1
Scanned files: 239
Infected files: 232

Data scanned: 78.82 MB
Data read: 81.36 MB (ratio 0.97:1)
Time: 2.535 sec (0 m 2 s)

Great success!
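
The two summaries work out to detection rates of roughly 79.5% (190/239) and 97.1% (232/239). A small sketch for pulling those numbers out of a saved scan summary is shown below; the function name `detection_rate` is hypothetical, and it relies only on the "Scanned files" and "Infected files" lines shown in the output above.

```python
import re


def detection_rate(summary):
    """Return (infected, scanned, ratio) parsed from scan summary text."""
    scanned = int(re.search(r"^Scanned files:\s*(\d+)", summary, re.M).group(1))
    infected = int(re.search(r"^Infected files:\s*(\d+)", summary, re.M).group(1))
    return infected, scanned, infected / scanned
```

Tracking this ratio per run makes regressions visible when the grouping or block-selection logic changes.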

This method does have some drawbacks. Since I was running it in a VM, concerns about hard disk space influenced the choice to group based on functions rather than on basic blocks. This will be fixed by offloading MySQL to a more dedicated machine whose hard drive I can fill up. In addition, only functions common to the binaries are considered when selecting basic blocks. This was an oversight on my part, since other functions may not be exact matches but could share a lot of common code. With the extra database space, I do not think grouping based on basic blocks is an unreasonable task for these relatively small sets of samples. Building in automatic identification of 32-bit and 64-bit files would remove some manual effort from the process.
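
The automatic 32-bit/64-bit identification mentioned above could be done by checking the Mach-O magic number in the first four bytes of each file. The sketch below is an assumption about how that might be wired in, not something the post implements. The magic values are the standard Mach-O ones; the mapping to IDA executables is hypothetical beyond idaw64.exe, which the post names (idaw.exe is assumed for the 32-bit case).

```python
# Mach-O magic numbers as they appear on disk, in either byte order.
MACHO_32 = {b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe"}
MACHO_64 = {b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe"}
# Fat (universal) header; Java class files share this magic.
MACHO_FAT = {b"\xca\xfe\xba\xbe"}

# Assumed executable names; only idaw64.exe is mentioned in the post.
IDA_FOR_KIND = {"32": "idaw.exe", "64": "idaw64.exe"}


def macho_kind(path):
    """Return '32', '64', 'fat', or None based on the file's first four bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(4)
    if magic in MACHO_32:
        return "32"
    if magic in MACHO_64:
        return "64"
    if magic in MACHO_FAT:
        return "fat"
    return None
```

Fat binaries contain several architectures and would still need a decision about which slice to analyze, so they are reported separately here rather than mapped to an executable.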

A good example of a signature generated for multiple samples is this one for Flashback:


While that signature is just extracted x86, it alerts on the following 15 samples:


Overall, I’m very happy with these results. Since IDA Pro is used to extract everything, this work will translate well to the other binary types that IDA is capable of parsing – most importantly, portable executables.

<![CDATA[ClamAV Mailing List Maintenance, Monday, February 10th, 2014]]> 2014-02-06T21:59:00Z 2014-02-06T21:59:00Z This notice is for the members of the ClamAV mailing lists found here:

On Monday, February 10th, 2014, starting at 10am EST, the ClamAV mailing lists will be moving to new server hardware. We anticipate that this outage will last approximately four (4) hours. We will notify everyone when the new server is up and operational.

Thank you for your patience.

Joel Esler
Threat Intelligence Team Lead
Open Source Manager
Vulnerability Research Team