CANINE Download Web Page


What is CANINE?

"CANINE: Converter and ANonymizer for Investigating Netflow Events"

CANINE attempts to solve two problems that the current NetFlow tools often struggle with: (1) NetFlows come in many different, incompatible formats, and (2) the sensitivity of NetFlow logs can hinder the sharing of these logs and thus make it difficult for developers to get real data to use.

CANINE's capabilities:



Papers Describing CANINE

For a more detailed description of CANINE's motivation, goals, and usage, see our CANINE publications.

System Architecture

Figure 1: Overview of CANINE system architecture

The general system architecture of CANINE is shown in Figure 1. CANINE consists of two main modules: (1) the CANINE GUI and (2) the conversion/anonymization engines. The CANINE GUI accepts user input for NetFlow conversion and anonymization options, sends the request to the processing engine, and summarizes the results of the performed actions in a pop-up window. First, the conversion engine reads a NetFlow data record from the source file and parses it into its component fields. Next, it sends the unanonymized data to the anonymization engine. The anonymization engine houses a collection of anonymization algorithms; it anonymizes the data according to the user's chosen options and sends the data back to the conversion engine. The conversion engine then reassembles the anonymized data according to the conversion options and writes the records to the destination file. Statistics are collected and sent back to the GUI, which displays them in a new window.



Getting CANINE

You may want to consider FLAIM (Framework for Log Anonymization and Information Management) for your anonymization needs. FLAIM is our next-generation anonymization tool and replaces CANINE, which is no longer under active development. FLAIM (1) is much faster, being a multi-threaded C++ application, (2) is completely scriptable as a command-line tool, and (3) supports many types of logs. However, if you want a GUI or need to convert NetFlows to other formats, CANINE is still the right tool for you.

Executables for the following platforms are available, as well as a jar package that can be executed by any JRE.

Proceed to the "Download Form" to choose your platform.

Please also check our change log.


Installation Instructions

While CANINE is a Java program, we provide compiled executables for Windows, Linux, Mac OS and Solaris. However, little testing has been performed on the Mac OS and Solaris platforms. We also provide a JAR file that can be used on any platform as long as the Java Runtime Environment (JRE) is installed. Thus the JAR version is dependent upon the Sun Java JRE. Information on installing the JRE can be found here. Installation of the entire Java Developer's Kit (JDK) is not necessary, but it is certainly sufficient.

We support a fixed-length ASCII format derived from raw Argus binary logs. Users who want to work with raw Argus logs need to run a shell script we wrote, which uses the Argus utility ra to convert the raw Argus log to a text format. This script depends on the Argus utility ra running on a UNIX-like OS. The script and its use are described in more detail in a following section.

For a detailed description of usage, refer to the Manual and Examples sections.


Manual

1. CiscoNCSA format

Because the diverse routing equipment at the NCSA generates different versions of Cisco NetFlow Export datagrams, and because Cisco datagrams are of variable length, we have created an NCSA internal format with a uniform record length. This not only enables easier access control and data manipulation; fixed-length records are also necessary for visualization tools that depend upon random access to NetFlow records. Furthermore, the format serves as an internal format into which multiple versions of NetFlows can be transformed. Each record (44 bytes) contains the principal information about a network flow, including IP addresses, ports, protocol used, bytes transferred, etc. Since it is a unified representation of Cisco NetFlows of multiple versions, we also call these records CiscoUnified or NCSA Unified NetFlows. For the detailed specification of the format, please refer to our related publications.

2. Argus format

Argus is a fixed-model Real Time Flow Monitor designed to track and report on the status and performance of all network transactions seen in a data network traffic stream. Argus provides a common data format for reporting flow metrics such as connectivity, capacity, demand, loss, delay, and jitter on a per transaction basis. The record format that Argus uses is flexible and extensible, supporting generic flow identifiers and metrics, as well as application/protocol specific information.

The raw Argus format is undocumented, and the ra utility (bundled with Argus) must be used to extract records from Argus flow files. We wrote a script (download here) that uses the ra utility to create the Argus ASCII flows usable by CANINE and our NetFlow visualization tools. This script must be run on a *NIX platform (e.g., Solaris, Linux, BSD, or Mac OS). The script also depends on ra, which can be found as part of the Argus 2.0.5 toolkit. Argus version 2.0.6 will not work, because the interface to ra and its command-line options have completely changed.

3. Main GUI

Figure 2: Main GUI of CANINE

The root window of CANINE is shown in Figure 2. In the source [destination] information fields, the user designates the source [destination] NetFlow format and file. Below these fields, the user can choose the fields to anonymize and the specific anonymization algorithms to use; many fields have multiple anonymization options. The bottom area is used to start [stop] processing and to display the current progress.

4. Anonymization

The anonymization engine of CANINE supports the anonymization of several fields, often in multiple ways. Below we describe the different anonymization algorithms supported by CANINE.

4.1 IP address

Figure 3: IP address anonymization options

4.1.1 Truncation

Truncation is the most basic type of IP address anonymization. Here the user chooses the number of least significant bits to truncate from an IP address. For example, truncating 8 bits would simply replace an IP address with the corresponding class C network address. Truncating all 32 bits would replace every IP with the constant address of 0.0.0.0. Truncation is probably the most common type of log anonymization currently employed.
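Truncation amounts to masking off the low-order bits of the address. The following is a minimal Python sketch of that idea, not CANINE's own (Java) code; the function name truncate_ip and the example addresses are ours.

```python
import ipaddress


def truncate_ip(address: str, bits: int) -> str:
    """Zero the `bits` least significant bits of an IPv4 address."""
    if not 0 <= bits <= 32:
        raise ValueError("bits must be between 0 and 32")
    value = int(ipaddress.IPv4Address(address))
    # Keep the high-order (32 - bits) bits and clear the rest.
    mask = (0xFFFFFFFF << bits) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(value & mask))


print(truncate_ip("192.0.2.37", 8))   # 192.0.2.0
print(truncate_ip("192.0.2.37", 32))  # 0.0.0.0
```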

4.1.2 Random permutation

With this method, a random permutation on the set of possible IP addresses is applied to translate each IP address. We generate a random permutation through use of two random hash tables.
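How CANINE organizes its two hash tables is not spelled out here. One plausible reading, sketched below in Python as an assumption, is a forward table (original to pseudonym) paired with a reverse table that prevents two originals from receiving the same pseudonym. The permutation is sampled lazily, only for addresses that actually occur. The class name RandomIpPermutation is hypothetical.

```python
import ipaddress
import secrets


class RandomIpPermutation:
    """Lazily sampled random one-to-one mapping on IPv4 addresses."""

    def __init__(self):
        self._forward = {}  # original address -> pseudonym
        self._reverse = {}  # pseudonym -> original address (keeps the mapping one-to-one)

    def anonymize(self, address: str) -> str:
        original = int(ipaddress.IPv4Address(address))
        if original not in self._forward:
            # Draw random 32-bit values until one is found that is not yet in use.
            candidate = secrets.randbits(32)
            while candidate in self._reverse:
                candidate = secrets.randbits(32)
            self._forward[original] = candidate
            self._reverse[candidate] = original
        return str(ipaddress.IPv4Address(self._forward[original]))
```

Because the tables are the only record of the mapping, two anonymizers produce consistent output only if they share those tables.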

4.1.3 Prefix-preserving pseudonymization

Prefix-preserving pseudonymization uses a special class of permutations that have a unique structure-preserving property: two anonymized IP addresses match on a prefix of n bits if and only if the unanonymized addresses match on a prefix of n bits. We implement this algorithm in such a way that a user-supplied passphrase generates an AES key that in turn determines the permutation. This allows anonymization to be done in parallel, with a consistent mapping between anonymizers. That is difficult to do when shared tables are used, as in the previous method.
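The prefix-preserving property can be illustrated with a short Python sketch. It is not CANINE's implementation: the Python standard library has no AES, so HMAC-SHA256 stands in as the keyed pseudorandom function, and deriving the key by hashing the passphrase with SHA-256 is our assumption. Each output bit is the input bit XORed with a keyed function of the bits that precede it, so two addresses diverge in the output at exactly the bit where they diverge in the input.

```python
import hashlib
import hmac
import ipaddress


def derive_key(passphrase: str) -> bytes:
    """Turn a passphrase into a fixed-length key (assumed scheme: SHA-256)."""
    return hashlib.sha256(passphrase.encode("utf-8")).digest()


def prefix_preserving(address: str, key: bytes) -> str:
    original = int(ipaddress.IPv4Address(address))
    result = 0
    for i in range(32):
        # The i most significant bits of the original address (0 when i == 0).
        prefix = original >> (32 - i)
        # Include the prefix length so prefixes of different lengths never collide.
        message = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, message, hashlib.sha256).digest()[0] & 1
        bit = (original >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def common_prefix_length(a: str, b: str) -> int:
    """Number of leading bits on which two IPv4 addresses agree."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

Any two anonymizers that are given the same passphrase compute the same mapping without exchanging tables.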

4.2 Timestamp

Figure 4: Timestamp anonymization options

4.2.1 Time unit annihilation

Timestamps can be broken down into the units of Year, Month, Day, Hour, Minute and Second. We support the annihilation of any subset of those units. If one wishes to remove the hour, minute and second information, they can do so. Likewise, if someone wishes to obfuscate the date, they can remove the year, month and day information. If they want to completely eliminate time information, i.e. perform black marker anonymization of the entire field, they can select all of the time units for annihilation. Starting times are adjusted so that the duration of the flow is kept the same.

4.2.2 Random time shifts

In some cases it may be important to know how far apart two events are temporally without knowing exactly when they happened. For this reason, a log or set of logs can be anonymized at once such that all timestamps are shifted by the same random number. If one does this to two different sets of logs at different times, the random number will differ between the data sets, so data-mining cannot be done by indexing the time field across them. Solving this would require the ability to choose the number by which to shift. However, it seems cumbersome and impractical for data owners to save and keep track of the shifting amounts used on different logs. For that reason CANINE does not support that ability; instead, we warn users that data sets anonymized at different times with this method cannot be correlated by timestamp.
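As a rough illustration in Python (the function name and the one-year bound on the shift are hypothetical choices, not CANINE parameters), a single random offset is drawn per run, applied to every timestamp, and then discarded:

```python
import secrets
from datetime import datetime, timedelta


def shift_timestamps(timestamps, max_shift_seconds=365 * 24 * 3600):
    """Shift every timestamp by one shared random offset."""
    # One offset per call, drawn uniformly from [-max_shift, +max_shift] seconds.
    seconds = secrets.randbelow(2 * max_shift_seconds + 1) - max_shift_seconds
    offset = timedelta(seconds=seconds)
    return [t + offset for t in timestamps]
```

Intervals between events within one call are preserved exactly. Two separate calls draw different offsets, which is the cross-data-set limitation described above.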

4.2.3 Enumeration

One could remove all time information except the order in which the events occurred. In this last method, the algorithm chooses a random ending time for the earliest record. All other ending times are equidistant from each other and in chronological order. Corresponding starting times are calculated from the original flow duration. Implementation of this method can be troublesome, especially when dealing with streamed data. The problem arises because entries in the logs are not presorted by starting or ending time. They are close to being in order by ending time, but they are not in perfect order. Sorting cannot work perfectly on streamed data, and it would be extremely slow on large log files. Our solution is to buffer events to sort locally. Since events are never terribly disordered, this can sort things with great accuracy. If data is from multiple routers, there will likely be small errors in this regard anyway, due to time skews between routers.
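The buffered local sort can be sketched in Python with a bounded min-heap. This is an illustration under stated assumptions, not CANINE's code: flows are (start, end) pairs in epoch seconds, and buffer_size, step and base are hypothetical parameters. Records are emitted in (locally) sorted order by ending time, and the result is exact whenever no record is out of place by more than buffer_size positions.

```python
import heapq
import secrets
from itertools import count


def enumerate_flows(flows, buffer_size=1024, step=1, base=None):
    """Replace ending times with equidistant values in chronological order.

    Yields (new_start, new_end) pairs; each flow keeps its original duration.
    """
    if base is None:
        base = secrets.randbelow(2**31)  # random ending time for the earliest record
    heap = []
    position = count()

    def emit(entry):
        end, start = entry
        new_end = base + next(position) * step
        return new_end - (end - start), new_end

    for start, end in flows:
        heapq.heappush(heap, (end, start))
        # Once the buffer is full, the smallest ending time is safe to release.
        if len(heap) > buffer_size:
            yield emit(heapq.heappop(heap))
    while heap:
        yield emit(heapq.heappop(heap))


flows = [(0, 10), (5, 12), (3, 11), (20, 25)]  # slightly out of order by end time
print(list(enumerate_flows(flows, buffer_size=2, base=1000)))
# [(990, 1000), (993, 1001), (995, 1002), (998, 1003)]
```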

4.3 Port number

Figure 5: Port number anonymization options

4.3.1 Bilateral classification

Usually the port number is useless unless one knows exactly what it is. However, there is one important piece of information that does not require one to know the actual port number: whether or not the port is ephemeral. In this way we can classify ports as being below 1024 or above 1023. To make the output look the same as the input, a representative of the set (port 0) replaces all non-ephemeral ports, and 65535 replaces all ephemeral port numbers.
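In Python, the classification reduces to a single comparison (the function name classify_port is ours):

```python
def classify_port(port: int) -> int:
    """Map a port to its class representative: 0 (below 1024) or 65535 (1024 and above)."""
    if not 0 <= port <= 65535:
        raise ValueError("port must fit in 16 bits")
    return 0 if port < 1024 else 65535


print(classify_port(80))     # 0
print(classify_port(51234))  # 65535
```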

4.3.2 Black marker anonymization

From an information-theoretic view, this is the same as printing the logs and blacking out all port information. In digital form, we simply replace all ports with a constant. We chose port 0 as the constant. We use a 16-bit representation of 0 so that programs that process unanonymized logs can still process anonymized logs. We have been careful to ensure that anonymization does not change the format and thereby break current tools.

4.4 Protocol

While we can conceive of no reason to anonymize this field, protocol information can simply be removed. We do this by replacing the protocol number with 255, a value that is unused but reserved by IANA. This is the maximal number for that 8-bit field.
 

4.5 Byte count

It is conceivable that one may wish, for privacy reasons, to anonymize byte counts. Users may not want others to know whether or not they are using a lot of bandwidth. Thus we support black marker anonymization of this field where all byte counts are replaced with the constant of 0, an impossible byte count in reality because headers do account for some of those bytes.
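The black marker replacements described for port numbers, protocol and byte count all follow the same pattern: overwrite the field with a fixed constant of the same width. A small Python sketch, in which the record layout and the field names src_port, dst_port, protocol and bytes are hypothetical:

```python
# Constant written into each field; the values are the ones named in the text.
MARKERS = {
    "src_port": 0,    # 16-bit field
    "dst_port": 0,    # 16-bit field
    "protocol": 255,  # 8-bit field, IANA-reserved value
    "bytes": 0,       # impossible for a real flow, since headers count as bytes
}


def black_marker(record: dict, fields: set) -> dict:
    """Return a copy of the record with the chosen fields blacked out."""
    unknown = set(fields) - set(MARKERS)
    if unknown:
        raise ValueError(f"no black marker defined for: {sorted(unknown)}")
    cleaned = dict(record)
    for field in fields:
        cleaned[field] = MARKERS[field]
    return cleaned
```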

5. Information panel

Figure 6: Information panel

After a CANINE task finishes, a brief summary is shown to the user in a pop-up window (Figure 6). The task summary includes the following information: source and destination formats/filenames, date of processing, anonymization methods used, number of records processed and the total processing time. The user can save and print the task summary for future reference.



Sample NetFlows Input Data Files

For your convenience, we have included sample data sets of Argus NetFlow format and NCSA Unified format below. The sample data files can be found here.


Mailing List

To subscribe to the CANINE mailing list, send mail to ncsalist@ncsa.uiuc.edu or majordomo@ncsa.uiuc.edu with the body of the message containing the single line:
subscribe CANINE

Visit the archives of the CANINE mailing list to search or browse previously posted questions and answers.

All rights reserved. ©2005 Board of Trustees of the University of Illinois