Mithril: Applying Adaptability for Survivability
Collaborative scientific computing sites, such as the NRL Center for
Computational Science, NSF computing sites (NCSA, SDSC, PSC, NCAR) and
similar labs in DOE (e.g., NERSC, LBNL), have large distributed user
communities, spread both geographically (over the globe) and
administratively. A constant threat to these computing sites is the
compromise of their users' end systems. When such a compromise
occurs, a typical repercussion is that user credentials (e.g., SSH keys
or passwords) stored or used on that system are captured by the
attacker and used to gain illicit access to the computing site. Under
normal day-to-day operation, production security teams at the
computing sites handle a small but steady stream of account
compromises that stem from these compromised user systems. They do so
manually: detecting the compromises (via monitoring of audit logs),
revoking compromised credentials, and working with the end users and
their administrators to restore integrity to the compromised
systems. However, some incidents overwhelm this day-to-day process. One
example occurred in the summer of 2004 and is referred to as Incident
216 (the name comes from an internal FBI designation of the case). In
Incident 216, the attackers compromised such a large number of user end
systems that site security personnel could not keep up with detecting
the compromises and arranging the restoration of integrity. In the
face of this incident, many sites were forced to take their own systems
or even their entire site off the net because they could not
maintain integrity.
Incident 216 illustrates the situation faced by
collaborative computing sites: their security measures and mechanisms
are sufficient to maintain an acceptable operational state, but
extreme situations overwhelm these measures and mechanisms, leaving
the sites unable to maintain their integrity. A natural reaction to
this situation is to raise security at sites to a level sufficient to
protect against Incident 216-like attackers. This is akin to
establishing a security perimeter around a hazardous area and allowing
only limited, authorized personnel to enter the area to respond to the
hazard and to enable continuity of essential services. However, this
approach brings significant costs, both in purchasing and supporting
new technologies and in decreased usability for users.
The Mithril project focuses on applying survivability research to
standard open source software so that such sites can continue to
operate and serve customers in the face of an extraordinary attack by
temporarily and gracefully reducing their level of service while
raising their level of security. We will develop a set of integrated
security enhancements that not only increase day-to-day security, but
also allow dynamic, temporary adaptations in security in response to a
heightened level of threat. These enhancements will allow a site to
maintain a high level of openness and usability during normal periods
of operation, yet respond quickly to increased threat levels with
increased security while continuing to serve key customers.
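The idea of trading openness for security as the threat level rises can be illustrated with a small state machine. This is a hypothetical sketch, not Mithril's design: the three threat levels, the credential types accepted at each level (one-time passwords are an assumption; only SSH keys and passwords are named above), and the "key users" list are all invented for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class ThreatLevel(IntEnum):
    # Hypothetical threat levels; the project does not define these names.
    NORMAL = 0
    ELEVATED = 1
    SEVERE = 2


@dataclass(frozen=True)
class Policy:
    allowed_methods: frozenset  # credential types accepted at this level
    key_users_only: bool        # if True, only designated key users may log in


# Hypothetical mapping: each step up accepts fewer (harder to steal)
# credential types, and the highest level serves only key customers.
POLICIES = {
    ThreatLevel.NORMAL: Policy(frozenset({"password", "ssh_key", "otp"}), False),
    ThreatLevel.ELEVATED: Policy(frozenset({"ssh_key", "otp"}), False),
    ThreatLevel.SEVERE: Policy(frozenset({"otp"}), True),
}


class AdaptiveSite:
    """Tracks the current threat level and answers login-policy questions."""

    def __init__(self, key_users):
        self.level = ThreatLevel.NORMAL
        self.key_users = set(key_users)

    def raise_level(self):
        # Temporarily tighten security; stays at SEVERE once reached.
        if self.level < ThreatLevel.SEVERE:
            self.level = ThreatLevel(self.level + 1)

    def lower_level(self):
        # Relax back toward normal, open operation when the threat subsides.
        if self.level > ThreatLevel.NORMAL:
            self.level = ThreatLevel(self.level - 1)

    def may_log_in(self, user, method):
        policy = POLICIES[self.level]
        if policy.key_users_only and user not in self.key_users:
            return False
        return method in policy.allowed_methods
```

For example, with `site = AdaptiveSite(["alice"])`, any user may log in with a password at `NORMAL`; after two calls to `site.raise_level()`, only `alice` may log in, and only with `"otp"`. Keeping the policy table separate from the state machine means a site could tune what each level permits without changing how levels are raised or lowered.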
Here is a paper which summarizes our accomplishments and insights.
Mithril: An Experiment in Adaptive Security
Mithril is a collaboration between NCSA, PNNL and the NRL
Center for Computational Science (CCS). NCSA and PNNL will lead the
research and development efforts, with NRL CCS providing requirements
and evaluations to ensure applicability of our work to NRL. NCSA will
provide overall management for the project.
Project Staff:
This project is funded by the Office of Naval Research (ONR) grant N0001404-1-0562 through the
National Center for Advanced Secure Systems Research.
Any opinions, findings, and conclusions or recommendations expressed in this publication are
those of the authors and do not necessarily reflect the view of ONR.