Final Report of the Core User Directory Pilot Project
Introduction
The Core User Directory (CUD) Pilot Project was set up in October 2008 under the auspices of the Office of the Director of IT to produce and evaluate a pilot implementation of a Core User Directory for the collegiate University.
This final report of the Pilot Project begins with a brief recapitulation of the motivation for establishing the project and of its overall goals. This is followed by an executive summary of the main outcomes at the end of the project. A project summary describes how the project was conducted in more detail, and the report concludes with a proposed set of recommendations for future action.
Why a Core User Directory?
The need for a CUD arises from the highly distributed nature of the University's processes, in particular the fact that data concerning persons associated with the collegiate University is needed, stored, and managed by many different agencies. Something like a CUD is necessary in any large and complex organisation, but the need is particularly acute within the collegiate University. At Oxford, people typically have multiple affiliations, both collegiate and university, with the consequence that personal data is stored in many different places. Some personal data is duplicated across different stores, not always consistently or correctly. There is no simple way of reliably collating data about the same person held by different stores, because there is no single data attribute identifying that person. Some key business processes, for example the annual returns to HEFCE stating the number of staff and students attending the University, are needlessly complex or even impossible to complete, because data from different sources must be reconciled or cross-checked to avoid double-counting staff members who are also students. Some business needs (identification of potential donors, for example) are critically dependent on the availability of reliable historical data, which may be dauntingly expensive or complex to produce.
The significance of this problem was recognised in the ICT Strategic Plan, endorsed by Council GPC and PRAC in 2007. That plan identified the establishment of a University-wide Identity Management (IM) system as a key strategic priority. By providing a unique and reliable identifier for all individuals associated with the University, an IM system makes feasible the reliable collation of personal data maintained in different stores. It facilitates the combination of different data attributes relating to the same individual, even where these are managed in different places.
The CUD Pilot Project addressed a central requirement for Identity and Access Management: the creation of a single central directory in which all persons associated with the University are given a single unique identifier. It also provided a minimal set of associated data attributes, such as names and date of birth, likely to be of use to most applications. Databases across the University are thus able to use the CUD as a single point of reference. The role of a CUD is however neither to replace existing databases nor to become a centralized data warehouse. Its object is to facilitate the sharing of personal data, to enable more effective management and sharing of data, to eliminate errors caused by the retention of obsolete or inconsistent data, to facilitate maintenance of an accurate historical record, and to reduce time wasted on manual data entry processes.
Summary of Outcomes
The pilot implementation delivers three services:
- a data reconciliation service, which can be used to link data attributes concerning the same person but held in different data stores;
- a data provision service, which provides a useful subset of all the information known about a person directly from the CUD;
- a foreign key service, allowing data providers to store pointers to their own data within the CUD for use by others.
The CUD pilot implementation consolidates and reconciles personal data dynamically from five major central data sources: the University Card database, the Oracle Student System, the OUCS Registration database, Staff Records, and University Telecomms. As proof of concept, the CUD also includes data feeds provided by two University departments (DPAG and Earth Sciences), and by a college (New). All the data is refreshed daily.
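The reconciliation idea can be illustrated with a minimal Python sketch. It is not the pilot's actual matching logic, which this report does not describe. It assumes that records from different stores are linked by an exact match on normalised surname, forename, and date of birth; the `SourceRecord` type, the `reconcile` function, and the `CUD000001` identifier format are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count


@dataclass(frozen=True)
class SourceRecord:
    source: str          # name of the originating data store
    local_key: str       # that store's own key for the person
    surname: str
    forename: str
    date_of_birth: date


def match_key(rec):
    """Normalise the attributes used for matching (an assumed rule)."""
    return (rec.surname.strip().lower(),
            rec.forename.strip().lower(),
            rec.date_of_birth)


def reconcile(records):
    """Map each (source, local_key) pair to one hypothetical CUD identifier."""
    ids_by_person = {}
    next_number = count(1)
    links = {}
    for rec in records:
        key = match_key(rec)
        if key not in ids_by_person:
            # First time this person is seen: issue a new identifier.
            ids_by_person[key] = f"CUD{next(next_number):06d}"
        links[(rec.source, rec.local_key)] = ids_by_person[key]
    return links
```

A real service would need more tolerant matching (name variants, missing dates of birth) and a way to review doubtful matches by hand; exact matching is used here only to keep the sketch short.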
The CUD pilot implementation exposes a range of interfaces, from a simple web form providing lookup facilities to a full Application Programming Interface (API) which can be embedded within (for example) a Java applet. The system uses reliable open source components, relying on proven technologies which are compatible with or identical to those used elsewhere in the University's IT infrastructure. Maintenance and future development of the system thus fit in well with the skill set available to the University.
A number of policy issues nevertheless remain to be addressed:
- which data attributes should be included in the CUD and where is their authoritative source?
- on what criteria should a proposal to add or remove a data attribute to the CUD be assessed and by whom?
- which security and access policies should be applied to the CUD?
- should all central data providers be required to make their data available via the CUD rather than independently of it and if so, how?
- who should develop and maintain the CUD service?
These issues aside, the services currently provided on a pilot basis are sufficiently robust and of sufficient importance that they could be migrated to a full production service with minimal further development effort.
A set of specific recommendations for future action needed to accomplish this is given at the conclusion of the present document.
The CUD Pilot Project
Governance of the project
The CUD Pilot Project was managed by a project Working Party (WP) chaired by Mike Fraser (OUCS), with extensive representation from across the collegiate University. The Working Party formally reports to the PICT subcommittee of PRAC; its membership and terms of reference are available online.
Key staff from UAS, OUCS, OULS, and from individual colleges and departments contributed significant amounts of time and enthusiasm to the process of managing and running the Pilot Project. Additional effort during the consultation phase was contributed by Jonathan Ward (BSP). Implementation work was carried out by an external consultancy, Rob Hebron Consultancy, and project management was provided by OUCS staff Tony Brett (2008) and Lou Burnard (2009).
A detailed project profile document, drawn up by Mike Fraser, Paul Jeffreys, and Tom Payne in February 2008, sets out the intended scope and deliverables of the project.
History of the Project
The Working Party met thirteen times during the project, and minutes of each meeting are available online. It held two workshops, presentations from which are also available from the project website.
Requirements analysis
Following specification of the project in early 2008 under the sponsorship of Professor Jeffreys, the project began with a wide-ranging consultation of key stakeholders across the collegiate University. This took the form of a number of in-depth interviews, aiming to identify what data stores already existed and to characterize the processes used to update and maintain them. Potential use cases for a CUD were also identified. Results from these interviews were consolidated and formed the basis of a facilitated one-day workshop held in May 2008.
Several common themes emerged during the interviews. Almost unanimously, the data providers interviewed professed themselves keen to co-operate, recognizing the limitations and drawbacks inherent in the current regime. Several respondents remarked that data attributes they used were obtained at second or third hand from other providers, or were of dubious pedigree. An unexpectedly large number were relying on the OUCS Registration Database or the University Card database (which are provided for quite other purposes) as authoritative sources of personal information about University members. There was a general perception that initial loading of a CUD would be challenging, and that the problems posed by multiple affiliations (and multiple statuses) should not be underestimated.
At the same time, a portfolio of "use cases" was assembled, comprising more than a dozen specific instances of tasks or facilities which could not currently be supported, but which the availability of something like the proposed CUD would facilitate. This process proved to be very useful in clarifying for the project team and others the range of opinions about how the CUD might be used in practice; it also provided the project with a set of identifiable goals which were subsequently used to drive the evaluation process.
The May 2008 Workshop also started the process of determining priorities for the pilot implementation, recognising that there were two quite distinct activities involved: a process of data cleaning and reconciliation as a means of constructing the CUD itself from existing data sources; and the provision of a range of interfaces to access the resulting database. The Project also considered a range of different implementation strategies: for example, extending existing but more narrowly focussed systems such as the Card or the OUCS Registration Databases; or seeking a commercially produced package solution. Such systems as were known to exist would, however, require a high degree of customization; it therefore seemed better to take a tool-kit approach, using freely available and well understood open source components. This was also the recommendation of the ICT Support Team, following their own detailed evaluation of available identity management solutions. The Working Party agreed at a meeting in May 2008 that this approach represented the most practical way forward, combining reliance on open protocols with the ability to leverage existing technical expertise.
A presentation about the project at the annual ICT Forum conference in July 2008 indicated that there was a wide degree of interest and enthusiasm amongst ICT staff in the University, but also some misconceptions about its scope (for example, that the CUD should support password-sharing or act as a long-term data warehouse).
Following a report submitted in June 2008, the PRAC Budget Subcommittee agreed that funding for an implementation phase should be provided, and routed to the Project through BSP Corporate Fund. It was also noted that care should be taken to maintain coherence with any other identity management activities being undertaken within the collegiate University.
Implementation
Although not formally defined as work packages, it was noted at this stage that the pilot would also need to deliver an effective communication strategy, an appropriate evaluation procedure, and (if successful) an appropriate development plan for any successor project, addressing project-to-service issues.
The following criteria for including data in the CUD were identified:
- the data contains information about a person who is associated with the University and is necessary for the University's business purposes;
- the data should be required by two or more data consumers, or should be required by a central data service, or should be provided by two or more data providers;
- the data must come from a source which uses a persistent person-based key or which provides sufficient other attributes to identify each person concerned.
The WP noted however that these were necessary but not (always) sufficient criteria. It also noted that policies in this area would be an important part of a more wide ranging University Information Policy, to be developed under the aegis of the Registrar.
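The three criteria above can be expressed as a simple check, sketched here in Python. The `AttributeProposal` type and its field names are hypothetical, introduced only for illustration. Since the WP regarded the criteria as necessary but not always sufficient, a `True` result means only that a proposal is eligible for further consideration, not that it should be accepted.

```python
from dataclasses import dataclass


@dataclass
class AttributeProposal:
    concerns_associated_person: bool      # criterion 1: person is associated with the University
    needed_for_business: bool             # criterion 1: necessary for business purposes
    consumers: int                        # criterion 2: number of data consumers requiring it
    providers: int                        # criterion 2: number of data providers offering it
    required_by_central_service: bool     # criterion 2: needed by a central data service
    source_has_persistent_key: bool       # criterion 3: persistent person-based key
    source_has_identifying_attributes: bool  # criterion 3: enough other attributes to identify


def meets_necessary_criteria(p):
    """Return True if all three necessary criteria hold."""
    relevant = p.concerns_associated_person and p.needed_for_business
    shared = (p.consumers >= 2
              or p.required_by_central_service
              or p.providers >= 2)
    identifiable = (p.source_has_persistent_key
                    or p.source_has_identifying_attributes)
    return relevant and shared and identifiable
```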
During February and March, work on the pilot continued rapidly. As more data sources were integrated (notably from University Telecomms), the success rate in mapping data about the same individuals from different sources rose to around 90%.
It was agreed that inclusion of new data sources would be greatly facilitated by the existence of reliable documentation about them, and the project put some effort into defining a simple XML format for such metadata. This format, documented in a technical paper by Lou Burnard, might be used to define any data source in a standardized way, such that inclusion of selected attributes from it into the CUD could be automated.
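To show how such metadata could drive automated inclusion, the following Python sketch reads a description of a data source and lists the attributes selected for import. The XML shown is not the format defined in the technical paper: the element and attribute names (`dataSource`, `key`, `attribute`, `include`) are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical description of one data source (invented element names).
SOURCE_DESCRIPTION = """
<dataSource name="ExampleDepartment">
  <key attribute="staff_no" persistent="true"/>
  <attribute name="surname" include="true"/>
  <attribute name="date_of_birth" include="true"/>
  <attribute name="office_notes" include="false"/>
</dataSource>
"""


def attributes_to_import(xml_text):
    """Return the names of attributes marked for inclusion in the CUD."""
    root = ET.fromstring(xml_text.strip())
    return [attr.get("name")
            for attr in root.findall("attribute")
            if attr.get("include") == "true"]
```

With a description of this kind for each source, adding a new feed becomes largely a matter of writing its metadata rather than writing new import code.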
A simple XML format for export of data from the CUD was also designed: this was necessary to enable queries of any kind to return arbitrary subsets of data. In this context, it is noteworthy that consumers of CUD data are equally likely to be providers of CUD data.
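The requirement that a query return an arbitrary subset of data can be sketched as follows. The export format actually designed for the pilot is not reproduced in this report, so the `person` and `attribute` element names and the `export_person` function are hypothetical.

```python
import xml.etree.ElementTree as ET


def export_person(cud_id, attributes, wanted):
    """Serialise only the requested attributes of one person as XML.

    attributes: dict of attribute name -> string value held for the person
    wanted:     iterable of attribute names requested by the consumer
    """
    person = ET.Element("person", {"id": cud_id})
    for name in wanted:
        if name in attributes:  # silently skip attributes that are not held
            child = ET.SubElement(person, "attribute", {"name": name})
            child.text = attributes[name]
    return ET.tostring(person, encoding="unicode")
```

Because the consumer names the attributes it wants, the same function serves every kind of query, and an attribute release policy could later be applied by filtering `wanted` before export.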
The implementation phase of the project concluded with a demonstration of the fully operational database at the end of March, which was favourably received by the Working Party.
Evaluation
At the end of April, the WP reviewed a discussion document on the evaluation of the CUD Pilot, which summarizes the state of the pilot implementation at that time. The WP formally agreed that the high level goals and technical objectives of the Pilot project had been satisfactorily achieved, and that it was therefore appropriate to move forward by assessing the extent to which the pilot system supported the use cases which had been identified at the start of the project.
It was agreed that, with a small number of exceptions, most of the original use cases were still within scope, and contact persons were selected for each from the WP membership. During May and early June 2009, about a dozen different use cases were tested against the Pilot implementation. Preliminary feedback, presented at the WP meeting at the end of June, identified a number of interesting problems, mostly data-related (for example, reflecting the quality of data available, or local privacy concerns), but overall the volunteers who had experimented with the system gave very positive reports of its usefulness.
Two example applications demonstrate the usefulness of the system, even in its pilot state. The first, a consequence of the inclusion of telecomms data, was the simple ability to combine personal data such as an email address with a current telephone number; this has not been possible up to now because the telephone data is indexed by physical location. The second enabled OUCS to change its internal processes so as to deliver single sign-on credentials for students under offer for Michaelmas 2009 well in advance of the availability of their Card data.
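The first application amounts to a join on the shared identifier. A minimal Python sketch, assuming that each source has already been mapped to hypothetical CUD identifiers (the function name and the dictionary layout are illustrative only):

```python
def combine_contact_details(emails_by_cud_id, phones_by_cud_id):
    """Pair each person's email address with their telephone number.

    Both arguments are dicts keyed by the shared CUD identifier.
    People with no known telephone number get None.
    """
    combined = {}
    for cud_id, email in emails_by_cud_id.items():
        combined[cud_id] = {
            "email": email,
            "telephone": phones_by_cud_id.get(cud_id),
        }
    return combined
```

Without a common identifier, the telephone data (indexed by physical location) offers no key on which such a join could be made.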
Rob Hebron and Lou Burnard gave a well-received workshop describing the project at the ICT Forum conference in early July, and slides from this were subsequently revised for discussion at the final workshop of the Pilot Project, held on 16 July.
Recommendations
- A project should be instigated as soon as possible, with the goal of migrating the current pilot project to a fully supported and maintained service. This service should:
  - deliver the three aspects of the CUD (data provision, data reconciliation, and foreign key storage) on a reliable basis, via a fully-supported service hosted at OUCS;
  - develop a full communications plan to demonstrate the usefulness and usability of the service by appropriate means such as case studies, workshops, technical briefing papers, demonstration services etc.;
  - adopt a phased approach, aiming to develop the CUD user-base incrementally by a gradual rollout of targeted services, developed centrally or by partners within the collegiate University.
- A CUD Management Board, responsible for further development and maintenance of the CUD, should be set up with senior representation from across the University. Its remit should explicitly address major infrastructural and strategic issues such as:
  - formulation of university-wide attribute release policies;
  - formulation of university-wide security and privacy policies;
  - establishing the feasibility of setting up and maintaining a university-wide data inventory;
  - ensuring good high level communication between the CUD service and major sources and consumers of personal data across the collegiate university, notably the central university services such as HR, OSS, OUCS etc.
- The CUD Service will need to be suitably resourced. A realistic and viable service will require two senior staff in full-time roles, and possibly additional junior support effort. A total running cost of £150k p.a. is realistic. This would cover one full-time database administrator post, and one 0.5 or 0.75 FTE project manager, to be replaced in due course by a 0.5 FTE service manager position.
- The next step will therefore be to prepare a project business case for consideration by the relevant Committees.

