You can view the documentation below, or browse our GitHub Repository, where you can contribute to the user manual and FAQ.
For ClamAV library & application projects, submit pull-requests to: https://github.com/Cisco-Talos/clamav-devel
For ClamAV documentation projects, submit pull-requests to: https://github.com/Cisco-Talos/clamav-faq/pulls
Tip: If you find that any of the bugs or projects have already been completed, you can help out simply by updating the list in a pull-request to update this document.
There’s only so much our core dev team can schedule into each release. Many bugs probably won’t be fixed without your help! Feel free to browse our open Bugzilla tickets if you’re looking for project ideas!
The following is a list of project ideas for someone looking to work on a larger project. Any projects labeled “Risky” or “Exploratory” are thought to be more likely to fail, or to have significant drawbacks that will result in the new feature being ultimately rejected.
Please don’t take it personally if the ClamAV team decide not to merge your implementation due to perceived complexity, stability, or other such concerns.
Contributors are expected to implement ample documentation for any new code or feature. Directions on how to test the contribution as well as unit and/or system tests will significantly help with PR review and will improve the likelihood that your contribution will be accepted.
Unstable or incomplete work is not likely to be accepted. The core development team has a long backlog of tasks and a curated roadmap for the next 6-12 months and will not have time to complete an unfinished project for you.
Contributors submitting a sizeable new feature will be asked to sign a Contributor License Agreement (CLA) before the contribution can be accepted.
-DMAINTAINER_MODE=ON
The purpose of “maintainer” build-mode is to update source generated by tools like Flex, Bison, and GPerf which are not readily accessible on every platform.
In this case, the project is to add GNU `gperf` support to our CMake build system’s Maintainer-Mode (`-DMAINTAINER_MODE=ON`). To complete this task, you’ll need to detect GPerf when using Maintainer-Mode, and it should be required. When the build runs, it should regenerate and overwrite the `libclamav/jsparse/generated` files in the source directory using `gperf` with `jsparse-keywords.gperf`.
The contributor should add the new option to `CMakeOptions.cmake` and document the feature in `INSTALL.cmake.md` as well as in the `clamav-faq` repo’s `development.md` developer documentation, after the feature has merged.
Category: Low-hanging fruit, Development
What you will learn from this project:
Required skills:
Project Size: Small
-DCODE_COVERAGE=ON
Add a `-DCODE_COVERAGE=ON` option to the CMake build system which will build ClamAV with code coverage features enabled.
An ideal solution would support code coverage when using GCC, Clang, and MSVC.
See `development.md` in the `clamav-faq` repo for additional insight on how `gcov`, `lcov`, and `genhtml` can be used today with the Autotools build system.
The contributor should add the new option to `CMakeOptions.cmake` and document the feature in `INSTALL.cmake.md` as well as in the `clamav-faq` repo’s `development.md` developer documentation, after the feature has merged.
Category: Low-hanging fruit, Development
What you will learn from this project:
Required skills:
Project Size: Small
ClamAV parses the PE/ELF/MachO headers on executables that it scans, but doesn’t make all of the data that it extracts available for use by NDB/LDB signatures. Some features that would be great to have include:
`.crb` rules
As PE, ELF, and MachO parsing features already exist in C, C is the most likely language of choice. However, any major new self-contained code would ideally be written in Rust.
Category: Core Development
What you will learn from this project:
Required skills:
Project Size: Large
Today, ClamAV works by scanning files on disk for malware. It’d be great if ClamAV could also be used to scan process memory on the system it’s running on in order to detect malware that isn’t present on disk.
The ClamAV team is already looking into integrating such a feature from clamav-win32, a project by Gianluigi Tiesi who has graciously agreed to allow us to include this memory scanning feature and others in the upstream clamav project.
This project would be to develop a similar capability for use on Linux and/or macOS and/or BSD Unix scanning clients.
As this is a relatively large new feature, an ideal solution would be written in Rust.
Category: Fun/Peripheral
What you will learn from this project:
Required skills:
Project Size: Large
Background: ClamAV has for a long time had runtime support for running portable plugins we call “bytecode signatures”. ClamAV has a custom bytecode compiler to compile these plugins from a C-like language and uses LLVM or a homegrown “bytecode interpreter” to run the plugins. This solution is strikingly similar to a newer portable plugin technology: WebAssembly!
The goal of this project would be to create a proof-of-concept WebAssembly (wasm) runtime in ClamAV so that “wasm signatures” could be written in Rust and executed in a wasm sandbox. As with our current bytecode signature technology, the wasm signatures would run at specific hooks in the ClamAV scanning process. They would need access to the file map (buffer) being scanned, and would be given a limited API to call into ClamAV functions.
For a proof-of-concept, executing a local wasm plugin that has access to the file being scanned (without copying the data) would be fine. A production solution would need to convert the wasm plugin to an ASCII text encoding so it can be distributed much the same way the current bytecode signature `.cbc` plugins are distributed. As with the bytecode signatures, `clamscan` and `clamd` must not load the plugins unless they’ve been digitally signed or the `--bytecode-unsigned` / `BytecodeUnsigned` options are set, which would disable this safety precaution.
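The loading rule described above can be sketched in a few lines. This is only an illustration: base64 stands in for whatever ASCII encoding a real solution would choose (it is not the `.cbc` encoding), the `is_signed` flag stands in for real digital signature verification, and the function names are hypothetical.

```python
import base64

# The first four bytes of every WebAssembly binary module.
WASM_MAGIC = b"\x00asm"


def encode_plugin(wasm_bytes):
    """Encode a wasm module as ASCII text (base64 is an assumption here)."""
    return base64.b64encode(wasm_bytes).decode("ascii")


def load_plugin(encoded, is_signed, allow_unsigned=False):
    """Decode a plugin, refusing unsigned ones unless explicitly allowed.

    `is_signed` is a placeholder for real signature verification, and
    `allow_unsigned` mirrors the role of the BytecodeUnsigned option.
    """
    if not is_signed and not allow_unsigned:
        raise PermissionError("unsigned plugin refused")
    wasm = base64.b64decode(encoded, validate=True)
    if not wasm.startswith(WASM_MAGIC):
        raise ValueError("not a WebAssembly module")
    return wasm


# A minimal module header: magic bytes followed by version 1.
module = b"\x00asm\x01\x00\x00\x00"
text = encode_plugin(module)
loaded = load_plugin(text, is_signed=True)
```

Keeping the signature check ahead of the decode step means an unsigned plugin is rejected before any of its content is parsed.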
Important Notes: The ClamAV bytecode compiler project is currently undergoing a major re-write. Once complete, the new bytecode compiler will effectively be a Python script that invokes `clang` with a collection of custom compiler passes that effectively compile C code into ClamAV-bytecode plugins. This project would have you extend that project to instead use `rustc` to compile Rust ClamAV-WASM plugins.
Category: Core Development, Fun
What you will learn from this project:
Required skills:
Project Size: Large
ClamAV includes support for unpacking executables generated by several software packers so that malware can’t use them to easily evade detection. The list of packers currently supported can be found in the Introduction of the ClamAV Manual. There are many packers out there, though, so there is always a need to write unpacking code for ones that are frequently used by malware authors. Some that are currently needed include:
Improvements to existing executable (PE/ELF/MachO) parsing code would likely be in C, but any new standalone modules would ideally be written in Rust.
Category: Fun/Peripheral
What you will learn from this project:
Required skills:
Project Size: Large
YARA extracts certain properties of .NET executables and makes them available for signatures to use for detection: https://yara.readthedocs.io/en/v3.6.0/modules/dotnet.html
Can ClamAV do something similar? For instance, extract the GUIDs and allow matching on those the way we do entries in the PE VersionInfo section?
Tip: An ideal solution for this and any new file parsing feature should be written in Rust and called by our existing C code.
Category: Fun/Peripheral
What you will learn from this project:
Required skills:
Project Size: Large
ClamAV and Sigtool currently support parsing OLE Office files to decompress and extract macros for scanning. The newer OOXML Office files do not have this support, so macros in these documents cannot currently be detected. The ability to both extract and scan these macros would enable better coverage. This might mean creating a new target type so that analysts don’t have to write two signatures, one for OLE macros and another for OOXML macros.
Tip: An ideal solution for this and any new file parsing feature should be written in Rust and called by our existing C code.
Category:
What you will learn from this project:
Required skills:
Project Size: Medium
Known file types are currently baked into each ClamAV version along with file type magic signatures. See `filetypes_int.h`, `filetypes.h`, and `filetypes.c`. The hardcoded signature definitions for these hardcoded types are generally overridden by `daily.ftm`, a component of `daily.cvd` used to tweak file type identification definitions after release.
This project would be to re-architect how file types are stored in libclamav so new file types can be dynamically added when `daily.ftm` (or some other `.ftm` file) is loaded. Supplemental `.ftm` files should supplement the existing file type definitions, allowing an `extra.ftm` file to be tested alongside `daily.cvd`.
This new capability, when combined with the ability to register bytecode signatures as new file type scanners, will dramatically increase the ability to extend ClamAV functionality between major version updates. When combined with logical signatures that target specific file types (using the proposed new `Type:` keyword instead of `Target:`; see the project idea below), it will also allow creative analysts to write more compact and efficient logical signatures.
Category: Fun, Core Development
What you will learn from this project:
Required skills:
Project Size: Medium
Bytecode signatures are the portable executable plugin format for ClamAV. If ClamAV file types each had one or more* linked lists of file type handlers (“scanners”), then a bytecode API could be added to register a bytecode signature as a new scanner for a file type.
This project should be completed after the project to dynamically add new file types with new file type magic signatures (above). This new scanning architecture would be a really powerful way to add features to the product without requiring a major version update. When combined with the project to run WebAssembly signatures written in Rust (project idea above), this plugin-based scanner feature would have the potential to become the fastest and safest way to add new capabilities to ClamAV.
Example use case:
One example use case of this feature would be to alert on the malicious use of crypto miner wallet IDs.
Cryptomining malware has become increasingly prevalent with the rise in cryptocurrency prices, and we have thousands of wallet identifiers known to be associated with malicious cryptomining campaigns. We don’t have a robust way of using these IDs for detection, though, because we only want to raise an alert if the ID appears to be used in a malicious way (Ex: hardcoded into a mining application or as part of a coin miner configuration file) and not in legitimate ways (Ex: blog posts about campaigns or wallet blacklists used by the mining pools).
The two use-cases that we want to alert on are miner config files and executables with the embedded wallet identifier. We could have two `.ftm` rules (one for each case) that indicate a `CL_TYPE_MINER` or something like that, and then scanning execution for `CL_TYPE_MINER` can go to the bytecode sig to perform any other checks that may be necessary.
* Additional Considerations: ClamAV has several locations in the scanning process for invoking file type scanners:
1. After initial file type identification, and before the “raw scan”. In `cli_magic_scan()`.
2. Once for each embedded file type found when using `scanraw()` to also match on embedded type recognition signatures*. In `scanraw()`.
   - * Embedded type recognition signature matching is a feature used to identify self-extracting archives and some harder to identify file formats, like XML-based office document formats, DMG files, master boot records (MBR), etc. It isn’t used for some archive and disk image formats that we’ll unpack later anyways because they cause excessive type false positives and duplicate file scanning. A common example without this safety measure was duplicate file extraction and scanning of zip file entries found in a tarball.
3. After scanning all of the found embedded types (above). At the end of `scanraw()`. These could probably be moved to (4) if it is deemed safe to remove the 1st “safety measure” call to `scanraw()` in `cli_magic_scan()` (i.e. we’d only call `scanraw()` once, ever).
4. Again, after the call to `scanraw()` at the bottom of `cli_magic_scan()`, for types that have bytecode hooks that won’t execute unless a logical signature matches, requiring `scanraw()` to perform matching first.
Considering that there are 3 or 4 placement options for scanners, it may be required to have 3 (or 4) different lists to add to when registering a new scanner to indicate when to run the scanner in the scanning process. An enum argument for the function would indicate which list to add it to. If inserting the new scanner for a given type from the front of the list, and only invoking the next scanner if the first one returns `CL_EPARSE` or `CL_EFORMAT`, then a scanner registration could be used to override an existing/built-in one or supplement it, whichever is desired.
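The registration scheme described above might look roughly like the following Python sketch. Everything here is hypothetical: the `Stage` names, the `ScannerRegistry` class, and the use of strings as stand-ins for ClamAV's numeric return codes. It only illustrates the front-insertion and fall-through behavior.

```python
from enum import Enum, auto

# String stand-ins for ClamAV status codes (the real ones are integers).
CL_CLEAN, CL_VIRUS = "CL_CLEAN", "CL_VIRUS"
CL_EPARSE, CL_EFORMAT = "CL_EPARSE", "CL_EFORMAT"


class Stage(Enum):
    """Hypothetical names for the four places a scanner could run."""
    BEFORE_RAW_SCAN = auto()
    EMBEDDED_TYPE = auto()
    END_OF_RAW_SCAN = auto()
    AFTER_RAW_SCAN = auto()


class ScannerRegistry:
    def __init__(self):
        self._chains = {}  # (stage, file type) -> ordered list of scanners

    def register(self, stage, file_type, scanner):
        # Newest registration goes first, so it can override older ones.
        self._chains.setdefault((stage, file_type), []).insert(0, scanner)

    def scan(self, stage, file_type, data):
        result = CL_CLEAN
        for scanner in self._chains.get((stage, file_type), []):
            result = scanner(data)
            # Only fall through to the next scanner on a parse/format error.
            if result not in (CL_EPARSE, CL_EFORMAT):
                return result
        return result


def builtin_scanner(data):
    return CL_CLEAN


def plugin_scanner(data):
    # Declines anything it does not recognize so the next scanner can try.
    if not data.startswith(b"MINER"):
        return CL_EFORMAT
    return CL_VIRUS


registry = ScannerRegistry()
registry.register(Stage.BEFORE_RAW_SCAN, "CL_TYPE_MINER", builtin_scanner)
registry.register(Stage.BEFORE_RAW_SCAN, "CL_TYPE_MINER", plugin_scanner)

hit = registry.scan(Stage.BEFORE_RAW_SCAN, "CL_TYPE_MINER", b"MINER wallet")
fallback = registry.scan(Stage.BEFORE_RAW_SCAN, "CL_TYPE_MINER", b"other")
```

A plugin that returns a definitive result overrides the built-in scanner, while one that returns a format error merely supplements it.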
This project would require coming up with a common file-type-scanner API for all scanners (including bytecode scanners), and would enable moving all file-type-scanners out of `scanners.c` and into a new file for each in a `scanners` subdirectory. A separate `parsers` subdirectory should be added at this time and each file type parser would be moved there. The distinction between a “scanner” and a “parser” is this: a scanner uses a parser to extract bits to be scanned. A parser may simply be something like an archive extraction library. In some cases, particularly in internally developed code, the distinction may be less clear and so the entire thing may be better placed under the `scanners` directory as the entry-point will doubtless need to use the common file-type-scanner API.
This project will also require creating lots of regression tests for file type identification to ensure that the new architecture doesn’t accidentally misclassify or fail to scan certain files.
The majority of the work won’t actually change ClamAV’s behavior, which may seem frustrating, but the end goal is super cool. Code cleanup and organization along the way will also make a meaningful difference. This project could be split into pieces:
Category: Very Fun, Core Development
What you will learn from this project:
Required skills:
Project Size: Very Large
ClamAV signatures have a “Target Type”, an integer that can be used in signatures to limit signature matches to specific file types. ClamAV also categorizes signature patterns into two different Aho-Corasick pattern-matching tries by Target Type. Target Type `1` (Windows executables: EXE/DLL/SYS/etc.) goes in one trie, and everything else goes in the other trie. Unfortunately, not every file type has an associated target type. In addition, while it’s conceivable to be able to add new text-based file types dynamically (see the above project idea about file type magic signatures), it is less feasible to dynamically add new numerical target types.
For some advanced reading, see:
- https://www.clamav.net/documents/clamav-file-types
- https://www.clamav.net/documents/logical-signatures
This project is to add a new “`Type:`” keyword to the `TargetDescriptionBlock` for Logical Signatures (`.ldb`) to limit logical signature alerts to specific file types, much like you currently can do with Target Types (“`Target:`”), Container File Types (“`Container:`”), and Container Intermediate Types (“`Intermediates:`”). While this isn’t expected to improve scan times, it should reduce overall signature size as analysts will no longer need to duplicate the file-type-magic signature in order to limit alerting on a signature match by file type.
To illustrate, this is the file type magic signature for a Microsoft Shortcut File, aka `CL_TYPE_LNK`:
0:0:4C0000000114020000000000C000000000000046:Microsoft Windows Shortcut File:CL_TYPE_ANY:CL_TYPE_LNK:100
Though we can classify a file as `CL_TYPE_LNK` and even unpack the file with a custom scanner using that type, there is presently no way to write a signature for `CL_TYPE_LNK` files without duplicating the `0:4C0000000114020000000000C000000000000046` bit.
At present a signature to alert on a “malicious” shortcut containing `0xdeadbeef` might look like this:
SignatureName;Target:0;(0&1);0:4C0000000114020000000000C000000000000046;deadbeef
After this change, the signature could instead read:
SignatureName;Target:0,Type:CL_TYPE_LNK;(0);deadbeef
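As a rough illustration of what honoring the proposed keyword could involve, the Python sketch below splits a TargetDescriptionBlock into its attributes and checks the `Type` attribute against the type a file was classified as. The function names are hypothetical and this is not libclamav's parser; it assumes the block is a comma-separated list of `Key:Value` pairs, as in the examples above.

```python
def parse_tdb(tdb):
    """Split a block such as 'Target:0,Type:CL_TYPE_LNK' into a dict."""
    attrs = {}
    for item in tdb.split(","):
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError("malformed attribute: " + repr(item))
        attrs[key.strip()] = value.strip()
    return attrs


def type_allows(tdb, detected_type):
    """Return True if the signature may alert on a file of this type.

    A block without a Type attribute places no restriction on file type.
    """
    wanted = parse_tdb(tdb).get("Type")
    return wanted is None or wanted == detected_type


attrs = parse_tdb("Target:0,Type:CL_TYPE_LNK")
lnk_ok = type_allows("Target:0,Type:CL_TYPE_LNK", "CL_TYPE_LNK")
pdf_ok = type_allows("Target:0,Type:CL_TYPE_LNK", "CL_TYPE_PDF")
```

Because the check happens on the already-detected type, the signature body no longer needs to repeat the magic bytes.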
Category: Low-hanging Fruit, Core Development
What you will learn from this project:
Required skills:
Project Size: Small
Add a callback function to give libclamav file parsers the ability to request additional file data from the scanning application, i.e. `clamscan` and `clamd` (and by extension `clamdscan` & `clamonacc`).
This feature would enable support for split-archive scans, if all components of the split archive are present and available to the scanning application. To make this work for `clamdscan` + `clamd`, or `clamonacc` + `clamd`, the request would also have to be relayed by `clamd` over the socket API to the scanning client, and the client would have to respond with additional data, a filepath, or a file descriptor for `clamd` to provide via the callback to the file parser.
Disclaimer: It’s entirely likely that this idea is bogus and wouldn’t work over the `clamd` + `clamdscan` socket API. This task would require a fair amount of exploratory coding.
When a file is scanned, the scanner (e.g. `cli_scanrar`) may call a callback function provided by `clamscan` or `clamd` to request scan access to other files by name, with the expectation that it would receive an `fmap` in response. Specifically, when the first file in a split archive is scanned, the parser could request `fmap`s for subsequent files to provide to the archive extraction library. Direct scans of files other than the first file in a split archive would be skipped, because those files are continuation parts rather than the start of the archive.
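The callback flow can be sketched as follows. This is a toy model, not a proposed API: plain `bytes` objects stand in for `fmap`s, the file names are invented, and the function names are hypothetical. The "scanning application" side supplies a callback, and the "parser" side uses it to collect the remaining parts.

```python
from typing import Callable, Dict, List, Optional

# The callback type: given a file name, return its data or None.
RequestFile = Callable[[str], Optional[bytes]]


def make_request_callback(available: Dict[str, bytes]) -> RequestFile:
    """Application side: answer requests from the files it can access."""
    def request_file(name: str) -> Optional[bytes]:
        return available.get(name)
    return request_file


def gather_split_archive(
    first_part: bytes,
    other_names: List[str],
    request_file: RequestFile,
) -> Optional[List[bytes]]:
    """Parser side: collect every part, or give up if one is missing."""
    parts = [first_part]
    for name in other_names:
        data = request_file(name)
        if data is None:
            return None  # an incomplete archive cannot be extracted
        parts.append(data)
    return parts


files = {"archive.part2.rar": b"BBBB", "archive.part3.rar": b"CCCC"}
callback = make_request_callback(files)
complete = gather_split_archive(
    b"AAAA", ["archive.part2.rar", "archive.part3.rar"], callback
)
incomplete = gather_split_archive(b"AAAA", ["archive.part4.rar"], callback)
```

In the `clamd` case the callback body would be the hard part, since it would have to relay the request over the socket to the client instead of reading from a local dictionary.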
Category: Risky/Exploratory, Core Development
What you will learn from this project:
Required skills:
Project Size: Large