
CANINE Download Web Page
CANINE attempts to solve two problems that current NetFlow tools often struggle with: (1) NetFlows come in many different, incompatible formats, and (2) the sensitivity of NetFlow logs can hinder sharing them, which makes it difficult for developers to obtain real data.
CANINE's capabilities: As a converter, CANINE augments existing flow tools by enabling tools that work exclusively with one type of NetFlow to operate on data from NetFlows in other formats. This is valuable because different types of NetFlows can come from complementary sources, since the format is often tied to the routing hardware or computer collecting the data. Currently, CANINE provides support for the following NetFlow formats:
As an anonymizer, CANINE addresses problems with sharing sensitive logs. People often have concerns about information disclosure when publishing results or performing demonstrations that utilize sensitive NetFlow logs. CANINE provides multiple methods of anonymizing the following fields:
Figure 1: Overview of CANINE system architecture
The general system architecture of CANINE is shown in Figure 1. CANINE consists of two main modules: (1) the CANINE GUI and (2) the conversion/anonymization engines. The CANINE GUI accepts user input for NetFlow conversion and anonymization options, sends the request to the processing engine, and summarizes the results of the performed actions in a pop-up window. First, the conversion engine reads each NetFlow data record from the source file and parses it into its component fields. Next, it sends the unanonymized data to the anonymization engine. The anonymization engine houses a collection of anonymization algorithms; it anonymizes the data according to the user's chosen options and sends the data back to the conversion engine. The conversion engine reassembles the anonymized data according to the conversion options and writes the records to the destination file. Statistics are collected and sent back to the GUI, which displays them in a new window.
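The record flow described above can be summarized in a short Java sketch. It is an illustration under stated assumptions, not CANINE's source: the names `ConversionPipeline`, `Stats`, and `run` are hypothetical, records are modeled as text lines, and the parsing, anonymization, and formatting steps are passed in as functions standing in for the conversion and anonymization engines.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the read -> parse -> anonymize -> reassemble -> write loop.
final class ConversionPipeline {
    /** Statistics reported back to the GUI (hypothetical fields). */
    static final class Stats {
        long recordsProcessed;
        long elapsedNanos;
    }

    static Stats run(List<String> source,
                     Function<String, Map<String, String>> parse,    // conversion engine: record -> fields
                     UnaryOperator<Map<String, String>> anonymize,   // anonymization engine
                     Function<Map<String, String>, String> format,   // conversion engine: fields -> record
                     Consumer<String> destination) {                 // stands in for the destination file
        Stats stats = new Stats();
        long start = System.nanoTime();
        for (String record : source) {
            Map<String, String> fields = parse.apply(record);
            Map<String, String> anonymized = anonymize.apply(fields);
            destination.accept(format.apply(anonymized));
            stats.recordsProcessed++;
        }
        stats.elapsedNanos = System.nanoTime() - start;
        return stats;
    }
}
```

Keeping parsing and formatting separate from anonymization mirrors the division described above: any source format can be paired with any destination format, and the anonymization step only ever sees component fields.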
You may want to consider FLAIM (Framework for Log Anonymization and Information Management) for your anonymization needs. FLAIM is our next-generation anonymization tool and replaces CANINE, which is no longer under active development. FLAIM (1) is much faster, being a multi-threaded C++ application, (2) is completely scriptable as a command-line tool, and (3) supports many types of logs. However, if you want a GUI or need to convert NetFlows to other formats, CANINE is still the right tool for you.
Executables for the following platforms are available, as well as a jar package that can be executed by any JRE.
Proceed to the "Download Form" to choose your platform.
Please also check our
change log.
While CANINE is a Java program, we provide compiled executables
for Windows, Linux, Mac OS and Solaris. However, little testing has been
performed on the Mac OS and Solaris platforms. We also provide a JAR file that
can be used on any platform on which the Java Runtime Environment (JRE) is
installed; the JAR version therefore depends on the Sun Java JRE. Information
on installing the JRE can be found here. Installing the entire Java
Development Kit (JDK) is not necessary, but it is sufficient.
We support a fixed-length ASCII format derived from raw
Argus binary logs. Users who want to work with raw Argus logs need a
shell script we wrote that calls the Argus utility ra to convert the raw
Argus log to text. This script depends on the Argus utility ra running
on a UNIX-like OS. The script and its use are described in more detail
in a following section.
For a detailed description of usage, refer to the Manual
and Examples section.
Since different versions of Cisco NetFlow Export datagrams are generated by the diverse routing equipment at the NCSA, and because Cisco datagrams are of variable length, we have created an NCSA internal format with uniform record length. This not only makes access control and data manipulation easier; fixed-length records are also necessary for visualization tools that depend on random access to NetFlow records. Furthermore, the format serves as an internal format into which multiple versions of NetFlows can be transformed. Each record (44 bytes) contains the principal information about a network flow, including IP addresses, ports, protocol used, bytes transferred, etc. Since it is a unified representation of Cisco NetFlows of multiple versions, we also call them CiscoUnified or NCSA Unified NetFlows. For the detailed specification of the format, please refer to our related publications.
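Because every record has the same length, record n (counting from 0) starts at byte n * 44, which is what makes random access possible. The following Java sketch shows that arithmetic with `java.io.RandomAccessFile`. The class and method names are hypothetical, and it deliberately treats each record as opaque bytes because the field layout is given in the publications rather than here.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper for random access to fixed-length NCSA Unified records.
final class FixedLengthRecords {
    static final int RECORD_LENGTH = 44; // bytes per record, as stated in the text

    /** Byte offset of record n (0-based). */
    static long offsetOf(long n) {
        return n * RECORD_LENGTH;
    }

    /** Reads record n directly, without scanning the records before it. */
    static byte[] readRecord(RandomAccessFile file, long n) throws IOException {
        byte[] record = new byte[RECORD_LENGTH];
        file.seek(offsetOf(n));
        file.readFully(record);
        return record;
    }

    /** Number of complete records in the file. */
    static long recordCount(RandomAccessFile file) throws IOException {
        return file.length() / RECORD_LENGTH;
    }
}
```

With variable-length Cisco datagrams, finding record n would require reading every record before it; here it is one `seek`.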
Argus is a fixed-model Real Time Flow Monitor designed to track and report on the status and performance of all network transactions seen in a data network traffic stream. Argus provides a common data format for reporting flow metrics such as connectivity, capacity, demand, loss, delay, and jitter on a per transaction basis. The record format that Argus uses is flexible and extensible, supporting generic flow identifiers and metrics, as well as application/protocol specific information.
The raw Argus format is undocumented, and the ra utility (bundled with Argus) must be used to extract records from Argus flow files. We wrote a script (download here) that uses the ra utility to create the Argus ASCII flows usable by CANINE and our NetFlow visualization tools. This script must be run on a *NIX platform (e.g., Solaris, Linux, BSD, or Mac OS). The script also depends on ra, which can be found as part of the Argus 2.0.5 toolkit. Argus version 2.0.6 will not work, because both the ra interface and its command-line options changed completely.
3. Main GUI

Figure 2: Main GUI of CANINE
The root window of CANINE is shown in Figure 2. In the source [destination] information fields, the user designates the source [destination] NetFlow format and file. Below these fields, the user can choose the fields to anonymize and the specific anonymization algorithms to use; many fields have multiple anonymization options. The bottom area is used to start [stop] processing and display the current progress.
4. Anonymization
The anonymization engine of CANINE supports the anonymization of several fields, often in multiple ways. Below we describe the different anonymization algorithms supported by CANINE.

Figure 3: IP address anonymization options
4.1.1 Truncation
Truncation is the most basic type of IP address anonymization. Here the user chooses the number of least significant bits to truncate from an IP address. For example, truncating 8 bits would simply replace an IP address with the corresponding class C network address. Truncating all 32 bits would replace every IP with the constant address of 0.0.0.0. Truncation is probably the most common type of log anonymization currently employed.
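A minimal Java sketch of truncation, assuming an IPv4 address is held in an `int` (the helper names `truncate`, `parse`, and `format` are hypothetical). Zeroing the low `bits` bits is a single mask; the only subtlety is that Java masks shift distances to 5 bits, so truncating all 32 bits needs a special case.

```java
// Hypothetical IPv4 truncation helpers; addresses are stored in an int.
final class IpTruncation {
    /** Zeroes the {@code bits} least significant bits of an IPv4 address. */
    static int truncate(int ip, int bits) {
        if (bits < 0 || bits > 32) throw new IllegalArgumentException("bits must be in [0, 32]");
        if (bits == 32) return 0;        // -1 << 32 would equal -1 in Java, so handle it explicitly
        return ip & (-1 << bits);        // keep the high (32 - bits) bits
    }

    /** Parses dotted-quad notation, e.g. "192.168.37.201". */
    static int parse(String dotted) {
        String[] p = dotted.split("\\.");
        return (Integer.parseInt(p[0]) << 24) | (Integer.parseInt(p[1]) << 16)
             | (Integer.parseInt(p[2]) << 8) | Integer.parseInt(p[3]);
    }

    /** Formats an int address back to dotted-quad notation. */
    static String format(int ip) {
        return ((ip >>> 24) & 0xFF) + "." + ((ip >>> 16) & 0xFF) + "."
             + ((ip >>> 8) & 0xFF) + "." + (ip & 0xFF);
    }
}
```

For example, truncating 8 bits of 192.168.37.201 yields 192.168.37.0, and truncating 32 bits yields 0.0.0.0, matching the cases in the text.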
4.1.2 Random permutation
With this method, a random permutation on the set of possible IP addresses is applied to translate each IP address. We generate the random permutation using two random hash tables.
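The text does not say how the two hash tables are used. One plausible reading, sketched below in Java as an assumption, is a forward table (original to anonymized) and a reverse table (anonymized to original). Each new address receives a random 32-bit value that the reverse table shows is still unused, so the mapping built so far is always injective, that is, a fragment of a random permutation of the address space. `RandomIpPermutation` and its methods are hypothetical names.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Hypothetical lazily built random permutation of IPv4 addresses (stored as ints).
final class RandomIpPermutation {
    private final Map<Integer, Integer> forward = new HashMap<>(); // original -> anonymized
    private final Map<Integer, Integer> reverse = new HashMap<>(); // anonymized -> original
    private final SecureRandom random = new SecureRandom();

    int anonymize(int ip) {
        Integer mapped = forward.get(ip);
        if (mapped != null) return mapped;          // consistent mapping for repeated addresses
        int candidate;
        do {
            candidate = random.nextInt();           // uniform over all 2^32 int values
        } while (reverse.containsKey(candidate));   // never reuse an output: keeps the map injective
        forward.put(ip, candidate);
        reverse.put(candidate, ip);
        return candidate;
    }

    /** Returns the original address, or null if {@code anon} was never produced. */
    Integer deanonymize(int anon) {
        return reverse.get(anon);
    }
}
```

The tables are the shared state that makes this method hard to parallelize: two anonymizers would have to consult the same tables to produce the same mapping.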
4.1.3 Prefix-preserving pseudonymization
Prefix-preserving pseudonymization is a special class of permutations that have a unique structure-preserving property: two anonymized IP addresses match on a prefix of n bits if and only if the unanonymized addresses match on a prefix of n bits. We implement this algorithm so that a user-supplied passphrase generates an AES key that in turn determines the permutation. This allows anonymization to be done in parallel with a consistent mapping across anonymizers, which is difficult to do when shared tables are used as in the previous method.
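The following Java sketch shows one way to derive a prefix-preserving mapping from a passphrase, in the spirit of the Crypto-PAn construction. It is not necessarily CANINE's exact algorithm: the key derivation (first 16 bytes of SHA-256 of the passphrase), the block encoding, and the class name are all assumptions. Each output bit is the input bit XORed with a pseudorandom bit computed from the preceding input bits only, so addresses that share an n-bit prefix receive the same n flips and keep a common n-bit prefix, while the first differing bit still differs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical prefix-preserving IPv4 anonymizer keyed by a passphrase.
final class PrefixPreservingAnonymizer {
    private final Cipher aes;

    PrefixPreservingAnonymizer(String passphrase) {
        this.aes = initCipher(passphrase);
    }

    private static Cipher initCipher(String passphrase) {
        try {
            // Assumption: AES-128 key = first 16 bytes of SHA-256(passphrase).
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(passphrase.getBytes(StandardCharsets.UTF_8));
            Cipher c = Cipher.getInstance("AES/ECB/NoPadding"); // used as a one-block pseudorandom function
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(digest, 0, 16, "AES"));
            return c;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    int anonymize(int ip) {
        int result = 0;
        for (int i = 0; i < 32; i++) {
            // The i most significant bits of the original address (no bits when i == 0).
            int prefix = (i == 0) ? 0 : ip & (-1 << (32 - i));
            // Encode the prefix and its length so prefixes of different lengths never collide.
            byte[] block = ByteBuffer.allocate(16).putInt(prefix).put((byte) i).array();
            int flip = (encrypt(block)[0] >> 7) & 1;   // pseudorandom bit depending only on the prefix
            int bit = (ip >>> (31 - i)) & 1;
            result |= (bit ^ flip) << (31 - i);
        }
        return result;
    }

    private byte[] encrypt(byte[] block) {
        try {
            return aes.doFinal(block);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the mapping depends only on the passphrase, two anonymizers started with the same passphrase produce identical output without sharing any table, which is what makes parallel anonymization straightforward.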

Figure 4: Timestamp anonymization options
4.2.1 Time unit annihilation
Timestamps can be broken down into the units of Year, Month, Day, Hour, Minute and Second. We support the annihilation of any subset of those units. If one wishes to remove the hour, minute and second information, they can do so. Likewise, if someone wishes to obfuscate the date, they can remove the year, month and day information. If they want to completely eliminate time information, i.e. perform black marker anonymization of the entire field, they can select all of the time units for annihilation. Starting times are adjusted so that the duration of the flow is kept the same.
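A Java sketch of time unit annihilation using `java.time`. Two choices here are assumptions rather than documented CANINE behavior: annihilated units are replaced with a constant (the smallest valid value, plus a hypothetical placeholder year of 1970), and, following the text's note that starting times are adjusted, the ending time is annihilated and the start is recomputed from the original duration. All names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.Set;

// Hypothetical time unit annihilation that preserves flow duration.
final class TimeUnitAnnihilation {
    enum Unit { YEAR, MONTH, DAY, HOUR, MINUTE, SECOND }

    // Hypothetical constant; CANINE's actual replacement value is not documented here.
    static final int PLACEHOLDER_YEAR = 1970;

    /** Replaces each selected unit with a constant. */
    static LocalDateTime annihilate(LocalDateTime t, Set<Unit> units) {
        LocalDateTime r = t;
        if (units.contains(Unit.YEAR))   r = r.withYear(PLACEHOLDER_YEAR);
        if (units.contains(Unit.MONTH))  r = r.withMonth(1);
        if (units.contains(Unit.DAY))    r = r.withDayOfMonth(1);
        if (units.contains(Unit.HOUR))   r = r.withHour(0);
        if (units.contains(Unit.MINUTE)) r = r.withMinute(0);
        if (units.contains(Unit.SECOND)) r = r.withSecond(0).withNano(0);
        return r;
    }

    /** Returns {newStart, newEnd}: the end is annihilated, the start keeps the original duration. */
    static LocalDateTime[] anonymizeFlow(LocalDateTime start, LocalDateTime end, Set<Unit> units) {
        Duration duration = Duration.between(start, end);
        LocalDateTime newEnd = annihilate(end, units);
        return new LocalDateTime[] { newEnd.minus(duration), newEnd };
    }
}
```

Selecting all six units turns every ending time into the same constant, which is the black marker case described above; durations still survive.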
4.2.2 Random time shifts
In some cases it may be important to know how far apart two events are in time without knowing exactly when they happened. For this reason, a log or set of logs can be anonymized at once such that all timestamps are shifted by the same random amount. If this is done to two different sets of logs at different times, the random amount will differ between the data sets, so data mining cannot be done by indexing the time field across them. Allowing such data mining would require letting users choose the shift amount. However, it seems cumbersome and impractical for data owners to record and keep track of the shift amounts used on different logs. That is why CANINE does not support choosing the shift; instead, we warn users of this method about the difficulty of data mining (by indexing the timestamp) across sets anonymized at different times.
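A minimal Java sketch of a random time shift, assuming timestamps in milliseconds and a hypothetical upper bound `maxShiftMillis` on the shift. One offset is drawn per `RandomTimeShift` instance (a hypothetical name) and applied to every timestamp, so all intervals survive; consistent with the text, the offset is neither chosen by the user nor exposed.

```java
import java.security.SecureRandom;

// Hypothetical constant-offset time shifter for one log or set of logs.
final class RandomTimeShift {
    private final long offsetMillis;

    /** Draws one random offset in [0, maxShiftMillis) for the whole anonymization run. */
    RandomTimeShift(long maxShiftMillis) {
        if (maxShiftMillis <= 0) throw new IllegalArgumentException("maxShiftMillis must be positive");
        this.offsetMillis = Math.floorMod(new SecureRandom().nextLong(), maxShiftMillis);
    }

    /** Shifts every timestamp by the same offset, so intervals between events are unchanged. */
    long[] shiftAll(long[] timestampsMillis) {
        long[] out = new long[timestampsMillis.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = timestampsMillis[i] + offsetMillis;
        }
        return out;
    }
}
```

Running a second `RandomTimeShift` on another data set draws a different offset, which is exactly why indexing timestamps across separately anonymized sets fails.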
4.2.3 Enumeration
One could remove all time information except the order in which the events occurred. In this last method, the algorithm chooses a random ending time for the earliest record. All other ending times are equidistant from each other and in chronological order. Corresponding starting times are calculated from the original flow duration. Implementing this method can be troublesome, especially when dealing with streamed data. The problem arises because entries in the logs are not presorted by starting or ending time. They are close to being in order by ending time, but not perfectly so. Sorting cannot work perfectly on streamed data, and it would be extremely slow on large log files. Our solution is to buffer events and sort them locally. Since events are never badly out of order, this sorts them with high accuracy. If data comes from multiple routers, there will likely be small errors anyway, due to time skew between routers.
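The buffered local sort and re-timing can be sketched in Java with a bounded `PriorityQueue` ordered by ending time. Assumptions: times are longs in milliseconds, the spacing `stepMillis` and the buffer size are parameters not documented here, the caller supplies the randomly chosen first ending time, and `TimeEnumeration` is a hypothetical name. A record is emitted only when the buffer overflows, so records that arrive only slightly out of order (within roughly the buffer size) come out in the right order; larger disorder produces the small errors the text mentions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical enumeration of flow times using a bounded local sort.
final class TimeEnumeration {
    static final class Flow {
        final long start, end;  // milliseconds
        Flow(long start, long end) { this.start = start; this.end = end; }
    }

    private final PriorityQueue<Flow> buffer =
            new PriorityQueue<>(Comparator.comparingLong((Flow f) -> f.end));
    private final List<Flow> output = new ArrayList<>();
    private final int bufferSize;
    private final long stepMillis;
    private long nextEnd;

    /** firstEnd is the random ending time chosen for the earliest record. */
    TimeEnumeration(int bufferSize, long stepMillis, long firstEnd) {
        this.bufferSize = bufferSize;
        this.stepMillis = stepMillis;
        this.nextEnd = firstEnd;
    }

    /** Buffers a flow; once the buffer is full, emits the flow with the earliest ending time. */
    void accept(Flow f) {
        buffer.add(f);
        if (buffer.size() > bufferSize) emit(buffer.poll());
    }

    /** Flushes the remaining buffered flows at the end of the stream. */
    void finish() {
        while (!buffer.isEmpty()) emit(buffer.poll());
    }

    private void emit(Flow f) {
        long duration = f.end - f.start;                    // keep the original duration
        output.add(new Flow(nextEnd - duration, nextEnd));  // equidistant ending times
        nextEnd += stepMillis;
    }

    List<Flow> output() { return output; }
}
```

In use, the caller would draw `firstEnd` from a random source, feed every record to `accept`, and call `finish` once the stream ends.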

Figure 5: Port number anonymization options
4.3.1 Bilateral classification
Usually the port number is useless unless one knows exactly what it is. However, one important piece of information does not require knowing the actual port number: whether or not the port is ephemeral. We therefore classify ports as being below 1024 or above 1023. To keep the output in the same format as the input, a representative of each set replaces the actual port: port 0 replaces all non-ephemeral ports, and 65535 replaces all ephemeral ports.
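Bilateral classification reduces to one comparison. A Java sketch (the class and constant names are hypothetical) that also rejects values outside the 16-bit port range:

```java
// Hypothetical bilateral port classifier.
final class PortClassification {
    static final int NON_EPHEMERAL_REPRESENTATIVE = 0;
    static final int EPHEMERAL_REPRESENTATIVE = 65535;

    /** Keeps only whether a port is below 1024 or above 1023. */
    static int bilateral(int port) {
        if (port < 0 || port > 65535) throw new IllegalArgumentException("not a 16-bit port: " + port);
        return port < 1024 ? NON_EPHEMERAL_REPRESENTATIVE : EPHEMERAL_REPRESENTATIVE;
    }
}
```

Both representatives are valid 16-bit port values, so the anonymized output can be written in the same format as the input.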
4.3.2 Black marker anonymization
From an information-theoretic view, this is the same as printing the logs and blacking out all port information. In digital form, we simply replace all ports with a constant; we chose port 0 as the constant. We represent this 0 in 16 bits so that programs that process unanonymized logs can still process anonymized logs. We have been careful to ensure that anonymized logs do not break current tools by changing the format.
Although we can conceive of no reason to anonymize this field,
protocol information can simply be removed. We do this by replacing the protocol
number with 255, a number that is unused but reserved by IANA. This is the
maximum value for that 8-bit field.
It is conceivable that one may wish, for privacy reasons, to anonymize byte counts: users may not want others to know whether or not they are using a lot of bandwidth. Thus we support black marker anonymization of this field, where all byte counts are replaced with the constant 0, a count that is impossible in practice because headers account for some of those bytes.
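Port, protocol, and byte-count black markers share one design rule stated above: each constant is written at the field's original width so the record format, and therefore existing tools, are unaffected. The Java sketch below applies that rule to a 44-byte record with `java.nio.ByteBuffer`. The field offsets, the 32-bit byte-count width, and the class name `BlackMarker` are hypothetical placeholders, not the documented NCSA Unified layout.

```java
import java.nio.ByteBuffer;

// Hypothetical width-preserving black marker for a fixed-length record.
final class BlackMarker {
    // Hypothetical offsets; the real layout is defined in the CANINE publications.
    static final int SRC_PORT_OFFSET = 8;    // 16-bit field
    static final int DST_PORT_OFFSET = 10;   // 16-bit field
    static final int PROTOCOL_OFFSET = 12;   // 8-bit field
    static final int BYTES_OFFSET = 16;      // assumed 32-bit field

    /** Overwrites selected fields in place with same-width constants; the record length never changes. */
    static void blackMark(byte[] record, boolean ports, boolean protocol, boolean byteCount) {
        ByteBuffer buf = ByteBuffer.wrap(record);            // big-endian (network order) by default
        if (ports) {
            buf.putShort(SRC_PORT_OFFSET, (short) 0);        // port 0 as a 16-bit value
            buf.putShort(DST_PORT_OFFSET, (short) 0);
        }
        if (protocol) buf.put(PROTOCOL_OFFSET, (byte) 255);  // IANA-reserved protocol number
        if (byteCount) buf.putInt(BYTES_OFFSET, 0);          // impossible byte count
    }
}
```

Only the bytes of the selected fields change; every other byte, and the record length, stay exactly as they were.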
5. Information panel

Figure 6: Information panel
After a CANINE task finishes, a brief summary is shown to the user in a pop-up window (Figure 6). The task summary includes the following information: source and destination formats/filenames, date of processing, anonymization methods used, number of records processed and the total processing time. The user can save and print the task summary for future reference.
For your convenience, we provide sample data sets in the Argus
NetFlow format and the NCSA Unified format. The sample data files can be found
here.
Visit the archives of the CANINE mailing list to search or browse other posted questions and answers.