Near and Medium Term e-Science Security Roadmap
1. Introduction
This document should be read in conjunction with the e-Science Security Research Agenda (http://www.nesc.ac.uk/teams/stf/documents/), which was developed to support the EPSRC call for security research and sets out an agenda for security research in the medium to long term. Our goal here is to identify the issues that need to be considered in the near term, which perhaps fall more into education, support, development and applied research than into pure research. Together, we hope, the two documents provide a view of the whole development and research agenda at this time. Note that some of the items listed in the Research Agenda are close to applied research and, if not funded through the EPSRC call, should be considered for other funding opportunities.
Input for this document has been taken from previous roadmap activities of the STF; workshops and discussions with the e-Science pilot projects; the Core Programme Town Meeting on Security held on 11 April 2005 (http://www.nesc.ac.uk/events/town meeting0405/); and the follow-on meeting on 12 April 2005 of a small international group considering the near-term issues. Input was also drawn from the study by Angela Sasse and Brock Craft of the perceived effectiveness and usefulness of security solutions, and of their usability, within e-Science projects.
2. Classification
We classify the timescale in which results are required using three categories:
Short-term needs: New security technology or understanding is needed for projects to complete their work. Lack of adequate security will impede progress in the next year or so, and if not addressed may lead some projects to fail.
Medium-term needs: New security technology or understanding is needed for projects to deploy their results in an operational setting. Lack of adequate security will not impede the projects themselves, but will lead to failure of the programme as a whole, because few successful projects will go on to sustained exploitation.
Medium to Long-term needs: New security technology or understanding will be needed to make project results widely available. Existing technology is sufficient to develop the project results and to deploy them for ongoing operational use, but only within a restricted community. Lack of security may adversely impact future investment in e-Science capabilities. [This category is largely covered in the Research Agenda and not in this document.]
With this classification, short-term needs must be addressed in the next year or so. Medium-term needs must be met within the next three years or so, so that projects can give rise to operational systems. Long-term needs must be addressed in anticipation of follow-up action to the current programme. These timescales do not indicate start times: needs identified as long term are challenging, and their selection as priority items indicates the need for an immediate start.
3. Priority Technical Needs
This section outlines the priority needs for each security concern in turn. The degree to which each category concerns a particular project depends on the nature of the work, and is best established by a risk management approach. However, most projects will have concerns in each of the categories and will need mechanisms to support them all to some degree. For the simplest projects these mechanisms may be hidden from the application by the design of its infrastructure; in more complex situations it will be necessary to involve security in the design of the application itself.
The priorities include education and support activities alongside evaluation, development and deployment.
Authentication
Authentication is the establishment and propagation of a user's identity in the system.
Technologies are available, and state-of-the-art commercial packages offer many of the required capabilities, although interoperability between authorities and user mobility are still problems. Ongoing JISC projects[5] are investigating large-scale deployment in the UK academic community. Immediate concerns include the usability of PKI credentials by 'average users', but the problem with PKI is actually quite complex. A recent discussion in the HCIsec community produced the following list of what users have to know:
How to import a trust anchor.
How to import a certificate.
How to protect your privates (private keys, that is).
How to apply for a certificate in your environment.
Why you shouldn't ignore PKI warnings.
How to interpret PKI error messages.
How to turn on digital signing.
How to install someone's public key in your address book.
How to get someone's public key.
How to export a certificate.
Risks of changing encryption keys.
How to interpret security icons in sundry browsers.
How to turn on encryption.
The difference between digital signatures and .signature files.
What happens if a key is revoked.
What does the little padlock *really* mean?
What does it mean to check the three boxes in Netscape/Mozilla?
What does "untrusted CA" mean in Netscape/Mozilla?
How to move and install certificates and private keys.
It is possible to build PKI applications that require less knowledge from end-users if there are clear and universal policies (dos and don'ts) on how the various parties in an e-Science environment interact with each other. However:
a) Developers in general do not know how to create usable interfaces and do not normally consider designing policies; it will take substantial applied research to develop usable PKI-based applications and to provide example solutions that show how to do it.
b) Users' experience of PKI-based applications has often been very difficult, so the perception that PKI is inherently "unusable" will be hard to overcome; at this time many projects will try to avoid it at all costs.
c) Even if usable solutions can be developed, they will require a lot of administrative effort and hence may be expensive to run; again, this could be overcome, but it requires work along the lines of a).
Priority needs:
1.1 Grid and other community interoperability (short term)
At present there is a lot of work on implementing Shibboleth across the academic community. It is important that Grid authentication systems are interoperable with Shibboleth and with the authentication systems of other institutions. The infrastructure must accommodate interworking between different user certification authorities, and may need to accommodate different methods of authentication, such as Kerberos. There is also a need to ensure that scientists are sufficiently informed about Shibboleth usage.
1.2 Support for User Credential Management (short term)
Easy-to-use credential management is required to make it practical for users to transport their credentials between applications and end-systems. The solution may be a mechanism for transporting or managing a user's public key certificates and associated private keys, or a system service that links user-friendly credentials, such as pass phrases, to the authentication infrastructure.
1.3 Support for Secure Roaming (medium to long term)
Secure roaming will build on the above services to allow users greater freedom of location, and to simplify providing facilities for 'visiting' users.
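The credential management and roaming needs above can be illustrated with a minimal sketch of a service that links a pass phrase to the authentication infrastructure. It assumes a hypothetical online repository (in the spirit of, but not reproducing, existing grid credential repositories such as MyProxy) that stores only a salted PBKDF2 verifier of each user's pass phrase and, when the correct pass phrase is presented from any machine, releases a short-lived credential instead of the long-term private key. The class, parameters and the dictionary standing in for a "credential" are all illustrative; a real service would issue a signed proxy certificate.

```python
from __future__ import annotations

import hashlib
import hmac
import os
import time
from dataclasses import dataclass

# Hypothetical parameters; a real service would set these by local policy.
PBKDF2_ITERATIONS = 100_000
MAX_LIFETIME_S = 12 * 3600  # upper bound on the lifetime of any issued credential


@dataclass
class StoredEntry:
    salt: bytes
    verifier: bytes  # PBKDF2 output; the pass phrase itself is never stored
    subject_dn: str  # grid identity the stored long-term credential belongs to


class CredentialRepository:
    """Toy online repository: a pass phrase unlocks a short-lived credential."""

    def __init__(self) -> None:
        self._entries: dict[str, StoredEntry] = {}

    @staticmethod
    def _derive(pass_phrase: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", pass_phrase.encode(), salt, PBKDF2_ITERATIONS)

    def register(self, username: str, pass_phrase: str, subject_dn: str) -> None:
        salt = os.urandom(16)
        self._entries[username] = StoredEntry(salt, self._derive(pass_phrase, salt), subject_dn)

    def retrieve(self, username: str, pass_phrase: str, lifetime_s: int,
                 now: float | None = None) -> dict:
        entry = self._entries.get(username)
        if entry is None:
            raise PermissionError("unknown user")
        # Constant-time comparison of the derived value against the stored verifier.
        if not hmac.compare_digest(self._derive(pass_phrase, entry.salt), entry.verifier):
            raise PermissionError("bad pass phrase")
        issued = time.time() if now is None else now
        # Stand-in for a real short-lived proxy credential: just identity and expiry.
        return {"subject": entry.subject_dn,
                "not_after": issued + min(lifetime_s, MAX_LIFETIME_S)}


repo = CredentialRepository()
repo.register("alice", "correct horse battery", "/C=UK/O=eScience/CN=Alice Example")
cred = repo.retrieve("alice", "correct horse battery", lifetime_s=86400, now=0.0)
print(cred["not_after"])  # 43200.0: the request for 24 h is capped at 12 h
```

Because the user only ever types a pass phrase and receives a credential with a capped lifetime, the long-term private key never has to be copied between machines, which is the step users find hardest.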
Authorisation and Delegation
Authorisation is concerned with controlling access to services based on policy. A complete authorisation system includes tools to specify and manage policy, mechanisms to distribute or obtain policies, tools to create and manage authorisation tokens, services that use policies to make access decisions, and mechanisms that request and enforce those decisions.
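The components listed above can be sketched as a policy decision point (PDP), which evaluates a request against policy, and a policy enforcement point (PEP), which sits in front of a service and acts on the decision; this terminology is used by XACML[12]. The rule format, role names and resource names below are hypothetical, and real policy languages are far richer; the sketch only shows how the pieces separate, with default-deny when no rule matches.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PERMIT = "Permit"
    DENY = "Deny"


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: holders of `role` may perform `action` on `resource`."""
    role: str
    resource: str
    action: str


class PolicyDecisionPoint:
    """Evaluates requests against the policy; denies unless some rule matches."""

    def __init__(self, rules: list[Rule]) -> None:
        self._rules = set(rules)

    def decide(self, roles: set[str], resource: str, action: str) -> Decision:
        if any(Rule(role, resource, action) in self._rules for role in roles):
            return Decision.PERMIT
        return Decision.DENY


class PolicyEnforcementPoint:
    """Guards a service: asks the PDP and only runs the operation on PERMIT."""

    def __init__(self, pdp: PolicyDecisionPoint) -> None:
        self._pdp = pdp

    def invoke(self, roles: set[str], resource: str, action: str,
               operation: Callable[[], object]) -> object:
        if self._pdp.decide(roles, resource, action) is not Decision.PERMIT:
            raise PermissionError(f"{action} on {resource} denied")
        return operation()


# Roles would normally come from authorisation tokens presented with the request.
pdp = PolicyDecisionPoint([Rule("analyst", "survey-db", "read")])
pep = PolicyEnforcementPoint(pdp)
print(pep.invoke({"analyst"}, "survey-db", "read", lambda: "rows"))  # rows
```

Keeping the decision logic in one place means policy can be changed or distributed without touching each service, which is what the policy management and distribution requirements depend on.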
Basic technologies are available, but we have insufficient practical experience to know whether they will scale well with the numbers of users and sites. A full set of user-friendly management tools is not yet available, and alternative (still to be identified) solutions may provide better scalability. The management and distribution of policies, particularly the flexibility to set up dynamic or short-term groups of users, is a major weakness. The inability to support flexible dynamic delegation policies is a serious impediment to some applications, while other applications have strict privacy concerns about the contents of their authorisation tokens. The only approach to delegation implemented today is the user proxy employed by Globus, which provides time-limited delegation of the user's full rights; the security and functional weaknesses of this approach make it unsuitable for many applications, and some projects have therefore developed bespoke solutions.
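To contrast with delegation of a user's full rights, the following sketch issues a delegation token that carries only a named subset of the issuer's rights and an expiry time, and checks both before honouring a request. It is a hypothetical illustration only: it uses a shared-key HMAC for brevity, whereas Grid proxy credentials are based on public-key certificates, and right names such as read:data are invented.

```python
from __future__ import annotations

import hashlib
import hmac
import json

# Hypothetical key shared by the issuing domain and the verifying service.
# Real Grid delegation uses public-key proxy certificates, not a shared secret.
SIGNING_KEY = b"demo-key-not-for-production"


def _mac(body: dict) -> str:
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def delegate(issuer: str, delegate_to: str, issuer_rights: set[str],
             requested: set[str], lifetime_s: float, now: float) -> dict:
    """Issue a token carrying only a subset of the issuer's rights, with an expiry."""
    if not requested <= issuer_rights:
        raise ValueError("cannot delegate rights the issuer does not hold")
    body = {"issuer": issuer, "subject": delegate_to,
            "rights": sorted(requested), "not_after": now + lifetime_s}
    return {"body": body, "mac": _mac(body)}


def check(token: dict, right: str, now: float) -> bool:
    """Honour a request only if the token is untampered, unexpired and names the right."""
    if not hmac.compare_digest(_mac(token["body"]), token["mac"]):
        return False
    if now > token["body"]["not_after"]:
        return False
    return right in token["body"]["rights"]


tok = delegate("alice", "job-42", {"read:data", "write:data", "submit:job"},
               {"read:data"}, lifetime_s=3600, now=0.0)
print(check(tok, "read:data", now=10.0))    # True
print(check(tok, "write:data", now=10.0))   # False: right was not delegated
print(check(tok, "read:data", now=4000.0))  # False: token has expired
```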
Priority needs:
2.1 Evaluation and development of the integration and interoperability of systems such as Shibboleth and Grid security systems (short term)
2.2 Tools to assist the integration of policy management/authorisation systems with the backend systems in use
2.3 The selection and support of at least one (and preferably several) policy/authorisation infrastructures (short term)
A number of systems are available (CAS[6], Akenti[7], PERMIS[8], VOMS[9], VOM[10], Cardea) that address parts of the authorisation, policy management and authentication requirements; some of these are already being evaluated in JISC projects[5]. At least one of these, and preferably several, should be supported centrally to assist new and existing projects. In addition to functionality and scalability, selection criteria should include manageability, usability, performance, the extent to which the product can migrate to common interface standards (such as SAML[11]) and its ability to integrate with web services.
2.4 A policy reference model and supporting protocols (medium term)
The authorisation and policy management system is complex enough to need a reference model that identifies the main components and the protocols by which they communicate. Delivery of such a model would facilitate the development of interoperable components, as opposed to the present situation where each project tends to adopt its own policy language. Emerging standards (SAML[11], XACML[12]) provide a starting point, but more development and consolidation is needed to create a comprehensive model. The optimum way to progress this activity would be via standards bodies such as the GGF.
2.5 Policy creation and management tools (medium term)
The manageability of authorisation policies for both users and systems must be addressed if more complex grids are to be established. As well as function and scale, such tools also need to be able to easily deal with the creation and destruction of short-lived virtual organisations (e.g. projects or user groups).
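The creation and destruction of short-lived virtual organisations can be sketched as a registry in which each VO has a fixed lifetime, and membership-derived roles simply stop being reported once the VO has expired. The names (VORegistry, roles_for) and the 'vo:role' naming scheme are hypothetical; the point is that expiry is part of the policy data, so nobody has to remember to revoke access by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VirtualOrganisation:
    name: str
    expires_at: float  # hypothetical: VO lifetime fixed at creation
    members: dict[str, set[str]] = field(default_factory=dict)  # user DN -> roles


class VORegistry:
    def __init__(self) -> None:
        self._vos: dict[str, VirtualOrganisation] = {}

    def create(self, name: str, expires_at: float) -> VirtualOrganisation:
        if name in self._vos:
            raise ValueError(f"VO {name!r} already exists")
        vo = VirtualOrganisation(name, expires_at)
        self._vos[name] = vo
        return vo

    def add_member(self, vo_name: str, user_dn: str, *roles: str) -> None:
        self._vos[vo_name].members.setdefault(user_dn, set()).update(roles)

    def expire(self, now: float) -> list[str]:
        """Destroy VOs whose lifetime has passed; return their names."""
        gone = [n for n, vo in self._vos.items() if vo.expires_at <= now]
        for n in gone:
            del self._vos[n]
        return gone

    def roles_for(self, user_dn: str, now: float) -> set[str]:
        """Qualified roles ('vo:role') the user holds in VOs that are still live."""
        return {
            f"{vo.name}:{role}"
            for vo in self._vos.values()
            if vo.expires_at > now
            for role in vo.members.get(user_dn, ())
        }


reg = VORegistry()
reg.create("flood-study", expires_at=100.0)
reg.add_member("flood-study", "/CN=Alice Example", "analyst")
print(reg.roles_for("/CN=Alice Example", now=50.0))   # {'flood-study:analyst'}
print(reg.roles_for("/CN=Alice Example", now=150.0))  # set()
print(reg.expire(now=150.0))                          # ['flood-study']
```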
Auditing
Auditing is the analysis of records of account (e.g. security event logs) to investigate security events, procedures or the records themselves; the roadmap also considers the generation of such records under this heading. At present most auditing simply collects large amounts of data, in the vein of "you never know when it might come in handy". There is a need to understand better, and to determine, which auditing activities would be effective and efficient in an e-Science context. This could be done by experts working collaboratively with projects that have valuable assets and have experienced security problems. Other related functions include performance measurement and accounting or charging. Although these are not security related and are therefore not considered further here, there may well be synergies between the technologies and methods used for grid accounting and those required for secure auditing, and full advantage should be taken of these where they exist.
There are also emerging grid standards for usage records. A short-term need is to provide implementations of these standards, together with tools for collecting, manipulating and using the records; this would help to drive their take-up and so aid interoperability.
Logging, intrusion detection and auditing of security in managed computer facilities is well established in theory and practice, although there may still be issues regarding “missing” audit records. Grid computing adds the complication that some of the information required by a local audit system may be distributed elsewhere, or may be obscured by layers of indirection.
Priority needs:
3.1 Implementations of emerging standards for audit (short term)
3.2 Understanding of audit data curation (short/medium term)
3.3 Tools to support the generation of complete diagnostic trails (medium term)
Diagnostic chains depend on the forms of authentication, authorisation, and delegation in use, and there are likely to be several. To avoid the need for each authority to understand the whole infrastructure in the distributed system, it is necessary to create tools that allow some types of record (e.g. user accountability information) to be obtained securely from other parts of the system and interpreted in a common framework.
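One way to read "a common framework" is a shared record type into which each site's native log entries are translated, plus a correlation identifier that travels with a request across sites. The sketch below assumes two invented site log formats and a correlation ID field; it does not address how the records are obtained securely, which a real tool would also need.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    """Common form into which site-specific records are translated (hypothetical)."""
    timestamp: float
    site: str
    grid_identity: str   # global identity, e.g. a certificate DN
    local_account: str   # differs from site to site
    event: str
    correlation_id: str  # assumed to be carried with the request across sites


def from_site_a(line: str) -> AuditRecord:
    # Hypothetical site A format: "ts|dn|account|event|corr"
    ts, dn, account, event, corr = line.split("|")
    return AuditRecord(float(ts), "site-a", dn, account, event, corr)


def from_site_b(entry: dict) -> AuditRecord:
    # Hypothetical site B format: key/value entries with different field names
    return AuditRecord(entry["time"], "site-b", entry["subject"], entry["uid"],
                       entry["action"], entry["request"])


def diagnostic_trail(records: list[AuditRecord], correlation_id: str) -> list[AuditRecord]:
    """All records for one request, across sites, in time order."""
    return sorted((r for r in records if r.correlation_id == correlation_id),
                  key=lambda r: r.timestamp)


records = [
    from_site_a("10.0|/CN=Alice Example|gridusr1|job-submit|req-7"),
    from_site_b({"time": 12.5, "subject": "/CN=Alice Example", "uid": "u1042",
                 "action": "file-read", "request": "req-7"}),
    from_site_a("11.0|/CN=Bob Example|gridusr2|job-submit|req-8"),
]
for r in diagnostic_trail(records, "req-7"):
    print(r.timestamp, r.site, r.local_account, r.event)
# 10.0 site-a gridusr1 job-submit
# 12.5 site-b u1042 file-read
```

The local accounts differ at each site, so only the shared grid identity and correlation ID allow the two records to be read as one trail.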
Privacy
Privacy requirements relate to the use of data, in the context of consent established by the data owner, or subject. The primacy of the data subject poses special problems when data are amalgamated or copied to third parties. Privacy is therefore distinct from confidentiality, although it may be supported by confidentiality mechanisms including authorisation. Some users will have privacy concerns about their personal data and their authorisation credentials (attributes/roles) as well as their experimental data.
Privacy is particularly significant for projects that process personal information or are subject to ethical restrictions; projects using health data are particular examples. There is little prior art in privacy for grid science, although there is useful UK background in privacy, including hospital systems[13]. Web-based standards such as P3P[14] may provide only a small fraction of the necessary security mechanisms.
Privacy protection requires an understanding of what should be protected and to what degree. There cannot be generic privacy policies and practices; they need to be developed by the projects themselves, in close collaboration with the people whose data are being used. Awareness and training are needed here. There is also a need to overcome the perception that privacy is absolute: it is one value that has to be balanced against others (e.g. the benefits gained by scientific enquiry), and, as noted earlier, this is a risk management exercise.
Priority needs:
4.1 Education and awareness of privacy, including case studies that illustrate different balance choices.
4.2 The generation and promulgation of examples of good practice in both policy and implementation for health systems (medium term)
The exploitation of grid technology in health systems needs a transferable understanding of suitable privacy policies, how they can be applied, and what mechanisms can be used to implement them.
Confidentiality
Confidentiality is concerned with ensuring that information is not made available to unauthorised individuals, services or processes. It is usually supported by access control within systems, and encryption between and within systems.
Confidentiality is generally well understood, but the grid introduces the new problem of transferring or signalling the intended protection policy when data are staged between systems. This is required in support of privacy, and also more generally for sensitive data. Some applications already have the requirement to store encrypted information in their databases, and this brings with it the associated problems of key management and the encryption of query messages.
Priority needs:
5.1 Support for encrypted fields in databases and query messages (medium term)
Some projects are already experimenting with and developing tools for encrypted fields in databases and query messages. Their results should be promulgated to a wider audience as they become available.
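One technique sometimes used for querying protected fields is a "blind index": a keyed hash (HMAC) of the sensitive value is stored and queried instead of the value itself, so equality searches work without the identifier appearing in the table or in the query message. The sketch below uses Python's standard sqlite3, hmac and hashlib modules; the key, table and identifiers are hypothetical. It is not encryption: the original value cannot be recovered from the tag, equal values produce equal tags, and a system that must read values back would also store ciphertext produced with a proper cipher from a vetted library. It also illustrates the key management problem, since anyone holding INDEX_KEY can test guesses.

```python
from __future__ import annotations

import hashlib
import hmac
import sqlite3

# Hypothetical key; in practice it would be held by a key-management service.
INDEX_KEY = b"demo-index-key"


def blind_index(value: str) -> str:
    """Keyed hash: equal inputs give equal tags; tags reveal nothing without the key."""
    normalised = value.strip().upper()
    return hmac.new(INDEX_KEY, normalised.encode(), hashlib.sha256).hexdigest()


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Only the tag is stored; the identifier itself never reaches the database.
    conn.execute("CREATE TABLE results (patient_tag TEXT, measurement REAL)")
    return conn


def insert(conn: sqlite3.Connection, patient_id: str, measurement: float) -> None:
    conn.execute("INSERT INTO results VALUES (?, ?)", (blind_index(patient_id), measurement))


def lookup(conn: sqlite3.Connection, patient_id: str) -> list[float]:
    # The query message carries the tag, not the identifier.
    rows = conn.execute(
        "SELECT measurement FROM results WHERE patient_tag = ? ORDER BY measurement",
        (blind_index(patient_id),),
    )
    return [m for (m,) in rows]


conn = setup()
insert(conn, "p-001", 4.2)
insert(conn, " P-001 ", 5.1)
insert(conn, "p-002", 3.3)
print(lookup(conn, "P-001"))  # [4.2, 5.1]
```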
Integrity
Technical solutions exist to maintain the integrity of data in transit and in storage. The more general question of provenance (maintaining the integrity of chains or groups of related data) is a requirement that is being researched by a number of grid projects, and elsewhere.
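A minimal sketch of provenance integrity, assuming a hypothetical record schema: each derived dataset gets a record holding SHA-256 digests of its inputs, the process name and its output, plus a digest of the record itself, so that later records can cite earlier ones by record_id to form a chain. This gives tamper evidence only; an attacker who can rewrite both data and record can recompute the digests, so a real system would also sign the records.

```python
from __future__ import annotations

import hashlib
import json


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def provenance_record(output: bytes, inputs: dict[str, bytes], process: str) -> dict:
    """Bind an output to the exact inputs and process that produced it (hypothetical schema)."""
    record = {
        "process": process,
        "inputs": {name: digest(data) for name, data in sorted(inputs.items())},
        "output": digest(output),
    }
    # Digest of the record itself, so later records can refer to it.
    record["record_id"] = digest(json.dumps(record, sort_keys=True).encode())
    return record


def verify(record: dict, output: bytes, inputs: dict[str, bytes]) -> bool:
    """Check that the data on hand matches what the record says was used and produced."""
    body = {k: v for k, v in record.items() if k != "record_id"}
    if digest(json.dumps(body, sort_keys=True).encode()) != record["record_id"]:
        return False  # the record itself has been altered
    if record["output"] != digest(output):
        return False
    return record["inputs"] == {name: digest(d) for name, d in inputs.items()}


raw, calibration = b"raw sensor readings", b"calibration table v3"
output = b"calibrated readings"
rec = provenance_record(output, {"raw": raw, "calibration": calibration}, "calibrate-v1")
print(verify(rec, output, {"raw": raw, "calibration": calibration}))                # True
print(verify(rec, output, {"raw": b"edited readings", "calibration": calibration}))  # False
```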
Priority needs:
6.1 Collection and promulgation of successful approaches to provenance management (medium term)
Fabric
The Fabric consists of the distributed computing and network resources and connectivity that support grid applications.
The fabric affects grid security in two ways: an insecure fabric may undermine the security of the grid, and fabric security measures may impede grid operations (e.g. firewalls may be configured to block essential grid traffic). The interaction between grids and fabric security is therefore an important area: grid infrastructure can be made more compatible with conventional fabric operation, and more advanced fabric devices can also be developed.
Fabric security is also related to confidentiality and provenance. To develop and maintain confidence in any distributed grid fabric, especially where delegation is performed, it should be possible to satisfy requests for confidentiality assurances or provenance information easily, including on remote machines and fabric.
Priority needs:
7.1 The interaction of grid software with existing university firewalls (short term)
There have been several instances of university firewalls shutting out grid traffic as a result of tighter overall campus security following either successful hacking or virulent viruses (neither of which was grid related). Current best practice for grid applications and university firewall policies needs to be promulgated.
7.2 The development of advanced firewalls (medium term)
The web-service approach tunnels much of the application- and protocol-relevant information as uniformly wrapped XML text. Firewalls need to be developed that are able to parse this content and link their operation to network policies, without unduly affecting the performance of grid communications.
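The following sketch shows the kind of content inspection described: it parses a SOAP 1.1 envelope with Python's xml.etree.ElementTree, extracts the namespace and name of the operation in the Body, and checks it against an allow-list standing in for network policy. The operation names and the urn:example:jobs namespace are hypothetical. A production filter would need a parser hardened against malicious XML (the Python documentation points to defusedxml for untrusted input), would have to cope with WS-Security headers and encrypted bodies, and would have to do all this at line rate, which is the performance concern raised above.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SOAP11_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# Hypothetical network policy: which (namespace, operation) pairs may cross the boundary.
ALLOWED_OPERATIONS = {
    ("urn:example:jobs", "SubmitJob"),
    ("urn:example:jobs", "GetStatus"),
}


def operation_of(message: bytes) -> tuple[str, str]:
    """Return (namespace, local name) of the first element inside the SOAP Body."""
    root = ET.fromstring(message)
    body = root.find(f"{{{SOAP11_ENV}}}Body")
    if root.tag != f"{{{SOAP11_ENV}}}Envelope" or body is None or len(body) == 0:
        raise ValueError("not a SOAP 1.1 envelope with a non-empty Body")
    tag = body[0].tag  # ElementTree form: "{namespace}local"
    namespace, _, local = tag[1:].partition("}") if tag.startswith("{") else ("", "", tag)
    return namespace, local


def permit(message: bytes) -> bool:
    try:
        return operation_of(message) in ALLOWED_OPERATIONS
    except (ET.ParseError, ValueError):
        return False  # fail closed on anything the filter cannot interpret


envelope = (b'<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
            b'<soap:Body><j:SubmitJob xmlns:j="urn:example:jobs"/></soap:Body>'
            b'</soap:Envelope>')
print(permit(envelope))                                  # True
print(permit(envelope.replace(b"SubmitJob", b"Purge")))  # False: not on the allow-list
print(permit(b"<not-xml"))                               # False: fails closed
```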
Trust
Trust is that characteristic allowing one entity to assume that a second entity will behave exactly as the first entity expects [15]. Trusted entities are those for which this expectation is assumed, with the consequence that data they originate are assumed to be correct, and obligations that they promise to undertake will be fulfilled. Contractual or other agreements about what entities are to be trusted to do, and to what extent, are therefore of fundamental importance to virtual organisations.
There is an important distinction between 'trust management' systems in the sense of RFC 2704, which implement authorisation, and the wider requirements of trust management. For example, both industrial and health applications require agreement between users and resource providers on restrictions that cannot be implemented by access control (e.g. restrictions on the export of software, or a guarantee that personal data are deleted after use). There is therefore a need to understand and represent policy and contractual agreements between groups of users and resource providers; such agreements may exist inside or outside the system, and are typically not supported by technical mechanisms today.
Priority needs:
8.1 A policy framework to allow the establishment of 'virtual grids' (medium term)
The framework should consider how virtual grids are agreed, implemented and managed. It needs to specify the types of policy that can be supported and the extent to which these policies can be supported by technical security measures. It must also address the question of scalability to ensure that administration of complex grids is feasible.
4. Operational Characteristics
Operational characteristics (usability, performance and scalability, manageability, inter-operability, assurance) are a significant consideration in the adequacy of existing technical solutions, and are therefore important in the implementation of new technical capabilities. These concerns cut across the functionality requirements, so each concern is cross-referenced to the most relevant functional recommendations.
Usability
Usability is concerned with the ease and accuracy with which a system can be used, particularly by end-users who do not have detailed security skills or knowledge. In the context of security, ease of use implies that users are able to focus on their main goals rather than on supporting security functions; accuracy implies that users are able to use security functionality correctly, and do not inadvertently introduce security vulnerabilities through ignorance or avoidance of security-related actions. Simplicity of use is a critical success factor.
The critical usability problem in today's infrastructure is the management of private keys by users. Users are expected to carry out a difficult technical process to move private keys between machines; as a result, many users will fail to restrict access to or protect their own private keys, or will avoid the process and be limited to a single terminal (1.2).
Other recommendations with a particular usability element include roaming (1.3), and health and privacy policies (4.1, 4.2). In the latter, the UK data protection framework [16] requires user interaction and understanding, so the design and implementation of security features in these systems must allow users to understand and configure security policy as far as necessary to meet these obligations.
Usability also extends to the project world, in the sense that security methods and practices must be usable by project practitioners: developers, administrators, project managers and service providers, indeed all stakeholders.
Performance and Scalability
Performance and scalability are concerned with the extent that security services and technology will support large numbers (usually of users and services), or sizes of data, without significantly impacting the performance of the application.
Most grid projects are still in development, so scalability and performance issues have yet to be met in practice. The number of users will rise significantly when current systems move into their exploitation phase. Two areas give immediate cause for concern: the authentication infrastructure, including the present e-Science CA, and the mechanisms for mapping grid users' identities (distinguished names) to end-system usernames (1.1, 2.1).
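The identity-mapping mechanism mentioned above is, in Globus deployments, commonly a plain-text grid-mapfile in which each line pairs a quoted distinguished name with one or more comma-separated local account names. The parser below assumes just that simple form (real files may allow more syntax) and uses invented DNs. It makes the scalability concern concrete: every site holds its own copy, and every new user means an edit at every site they need to use.

```python
from __future__ import annotations

import shlex


def parse_gridmap(text: str) -> dict[str, list[str]]:
    """Map each quoted distinguished name to its local account(s)."""
    mapping: dict[str, list[str]] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # honours the quotes around DNs containing spaces
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected a quoted DN followed by account[,account...]")
        dn, accounts = parts
        mapping.setdefault(dn, []).extend(a for a in accounts.split(",") if a)
    return mapping


def local_account(mapping: dict[str, list[str]], dn: str) -> str:
    """First listed account is the default; unmapped users are refused."""
    try:
        return mapping[dn][0]
    except (KeyError, IndexError):
        raise PermissionError(f"no local account for {dn}") from None


GRIDMAP = '''# hypothetical site grid-mapfile
"/C=UK/O=eScience/OU=Example/CN=Alice Example" alice
"/C=UK/O=eScience/OU=Example/CN=Bob Example" bob,shared01
'''
mapping = parse_gridmap(GRIDMAP)
print(local_account(mapping, "/C=UK/O=eScience/OU=Example/CN=Bob Example"))  # bob
```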
Interoperability
Interoperability is concerned with the ability to traverse the boundary between two grid environments, in such a way that agreements between the parties, or constraints expressed in the associated protocol, are upheld. Interoperability is dependent on both external policy agreements and technical standards. A different, but practical and desirable, feature of interoperability is to facilitate an open framework for security components and services.
The most pressing interoperability problem is the number of potential Certification Authorities; certificate management does not constrain research today, but is likely to become a manageability problem as the number of users increases (1.1). The selection of one or more authorisation infrastructures (2.3) has been recommended, but it must be noted that most of the available options use proprietary expressions of policy, and this must be resolved (2.4) to facilitate future interoperability.
Priority needs:
9.1 The interoperability of Grid services (short term)
There are a number of leading service infrastructures for Grid development. Evaluation and implementation of these systems should be seen as an ongoing activity so that their interoperability can be assessed; this is likely to inform future development of these infrastructures.
Assurance
Assurance is concerned with the ability to quantify the reliability with which a particular service upholds a given security policy or function. It contributes to the degree of trust that a user might place in a service, or that might persuade two parties to agree on a particular policy framework. Although it is just one component of trust, it is included here because there are established international frameworks[17] for setting and measuring assurance.
There are few assurance issues in the current grid, although the mechanism that maps global to local identities has been subject to an assurance evaluation. In general, any of the recommendations that result in security mechanisms should be careful to use good design practice (such as least privilege, and minimising security critical software) even if a formal assurance evaluation is not required.
Security Awareness, Education and Training
Awareness, education and training have come up in several parts of this document, and it is clear that a continuing activity is required as technology develops. As technical solutions to the various components of the security puzzle become available, it is essential that there is a programme of activity to educate developers and users about these technologies. This needs to be an ongoing effort, but there are some clear priorities at this time, including the WS-* security specifications and PKI security.
There is also a need to ensure appropriate training in the use of grid software that covers security requirements and usage, from the low-level systems software to the application services.
A key point not yet touched on in this document is security policies. Security policies are inherently specific and have to be written for the requirements of particular sites and projects. Unfortunately, it would seem that those who should write the security policies do not know that they should, and do not know how to do it. In e-Science, these are site and project managers and those running VOs. There is a clear requirement for:
a) a programme to increase awareness among these stakeholders,
b) basic education on security requirements and responsibilities, and
c) training on how to write policies; a tool could also be developed to make writing policies less cumbersome and error-prone.
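As a sketch of the kind of policy-writing tool suggested in c), the function below checks a policy written in a hypothetical one-rule-per-line format (permit <role> <action> <resource>) for common mistakes: malformed lines, undefined roles, unknown actions, duplicate rules, and destructive rights granted on every resource, which conflicts with least privilege. The format, action vocabulary and messages are illustrative only.

```python
from __future__ import annotations

# Hypothetical policy format, one rule per line: "permit <role> <action> <resource>"
KNOWN_ACTIONS = {"read", "write", "submit", "delete"}


def lint_policy(text: str, defined_roles: set[str]) -> list[str]:
    """Return human-readable problems found in a policy; an empty list means none found."""
    problems: list[str] = []
    seen: set[tuple[str, str, str]] = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4 or fields[0] != "permit":
            problems.append(f"line {lineno}: expected 'permit <role> <action> <resource>'")
            continue
        _, role, action, resource = fields
        if role not in defined_roles:
            problems.append(f"line {lineno}: role '{role}' is not defined")
        if action not in KNOWN_ACTIONS:
            problems.append(f"line {lineno}: unknown action '{action}'")
        if resource == "*" and action in {"write", "delete"}:
            problems.append(f"line {lineno}: '{action}' on every resource breaks least privilege")
        if (role, action, resource) in seen:
            problems.append(f"line {lineno}: duplicate rule")
        seen.add((role, action, resource))
    return problems


POLICY = """permit analyst read survey-db
permit analst read survey-db
permit admin delete *
permit analyst read survey-db  # repeated by mistake
"""
for problem in lint_policy(POLICY, {"analyst", "admin"}):
    print(problem)
```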
Best Practice and Guidance
The solutions required for individual projects vary according to the needs of the project. As experience of appropriate solutions builds up, it is important to promulgate best practice through the development of guides, and through tools that assist in planning appropriate security solutions.
A clear finding from the security survey is that developers need a one-stop shop offering up-to-date, complete documentation for the software they build on (Globus and Web services), together with sample designs and implementations (patterns) that provide good security and that they can re-use and adapt. They also need accessible documentation and tutorials on security issues in these implementations, and someone they can ask when they get stuck.
5. Summary
There are a number of actions that need to be taken in the near and medium term to provide the e-Science community with the know-how and tools to deliver secure environments. These include activities in education, documentation and development as well as applied research. It is clear from the workshops, from engagement with users and from the work done by Sasse and Craft that a number of these issues are critical both in time and in take-up. We recommend that the e-Science Core Programme ensure appropriate funding to enable projects in these areas.
References
1. The Globus Project.
STF Near/Medium Roadmap draft 1.0
STF Page 1 14/07/2005