Disclosure: This post was jointly written by Todd Carpenter, Heather Flanagan and Chris Shillum, three members of the Resource Access for the 21st Century (RA21) leadership team.
Recent posts on Scholarly Kitchen from Lisa Janicke Hinchliffe and Roger Schonfeld have expressed concerns about the direction of, and motivation behind, the RA21 project. Both posts risk promulgating misunderstandings about the technologies involved, which are admittedly complex. We wonder whether they are also projecting broader fears and concerns about changes in technology, and their implications for scholarly communication, onto this initiative.
Anyone who has used the internet knows that access control and identity management are fraught with problems, not only for users of online services, but for providers as well. How many passwords do you have for how many different sites? How often do you have to reset passwords you have forgotten, or credentials that have expired? System developers have to tread a fine line between security, usability, privacy, control and access, and often have to trade security against ease of use. These challenges have been discussed, debated, and argued over for a very long time in the online community, and huge amounts of effort have been expended to develop technologies to solve some of the problems.
Access control for scholarly information resources sidestepped these issues for years. After initially trying to use usernames and passwords for access to online systems, and realizing they were unwieldy for everyone involved to administer and use, resource providers adopted IP addresses as stand-in credentials for access to networked resources. Essentially, it was presumed that if you were on a campus or business network, you should be authorized to access the resources to which the campus subscribed. This made a certain amount of sense when users had to plug wires into the wall to get on the Internet and did most of their work on campus. However, the ways people access the Internet and digital resources have evolved. With the growth in mobile devices, remote working, and the expectation that information resources can be accessed from anywhere, at any time, from any device, these assumptions have become more and more problematic.
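To make the IP-based model concrete, here is a minimal Python sketch of the kind of check a resource provider performs. The institution name and the address ranges are hypothetical (they are RFC 5737 documentation blocks, not real campus networks), and real platforms maintain their subscriber registries in their own ways.

```python
import ipaddress

# Hypothetical registry of subscriber IP ranges held by a resource provider.
# The ranges are RFC 5737 documentation blocks, not real campus networks.
SUBSCRIBER_RANGES = {
    "Example University": [
        ipaddress.ip_network("192.0.2.0/24"),
        ipaddress.ip_network("198.51.100.0/25"),
    ],
}


def institution_for_ip(client_ip: str):
    """Return the subscribing institution whose ranges contain client_ip, or None.

    Access is granted to whoever is using an address inside the range;
    where the user actually is, or who they are, plays no part.
    """
    addr = ipaddress.ip_address(client_ip)
    for institution, networks in SUBSCRIBER_RANGES.items():
        if any(addr in network for network in networks):
            return institution
    return None


print(institution_for_ip("192.0.2.17"))   # Example University (on campus)
print(institution_for_ip("203.0.113.5"))  # None (same person, home broadband)
```

The same researcher is recognized on the campus network but not from a home connection, which is exactly the assumption that breaks down once work moves off campus.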
RA21 aims to solve these problems once and for all by promoting a modern, standards-based access management system that will meet the needs and expectations of users familiar with the seamless interactions of the consumer web, while preserving the privacy and user control that are rightly expected in a scholarly setting. It is important to dispel some myths so that we can move on from the outdated world of IP authentication.
MYTH 1: IP AUTHENTICATION IS INHERENTLY PRIVACY PRESERVING WHILE FEDERATED AUTHENTICATION TECHNOLOGIES ARE NOT – BUSTED
RA21 is proposing the adoption of a federated authentication system based on a technology called SAML (Security Assertion Markup Language) to authorize users’ access to institutionally provided resources. We are building on this technology specifically because it has inherent mechanisms, both technical and legal, to protect privacy and put the user and their institution in control of what personal information, if any, is released to the service provider.
On the technical front, SAML can provide exactly the same degree of anonymity as IP-based authentication. Most service providers, for example platforms such as ScienceDirect, Wiley Online Library and ACS Publications, have supported anonymous authentication via Shibboleth, a widely used SAML implementation, for years. In this model, the SAML protocol allows the user’s institution to make a secure, trusted assertion that the user is a member of its authorized user community without disclosing any specific information about that individual. In fact, this mechanism provides more privacy protection than IP access control, as in some circumstances IP addresses can be traced back to individuals, as evidenced by a ruling from the Court of Justice of the European Union.
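The sketch below illustrates how a service provider can make an access decision from a SAML assertion that carries no personal information. It assumes the assertion has already been validated and parsed into a dictionary by a SAML library; the function, the subscriber set, the entity IDs and the "member grants access" policy are hypothetical, while the attribute names eduPersonEntitlement and eduPersonScopedAffiliation and the common-lib-terms entitlement value come from the eduPerson conventions widely used in research and education federations.

```python
# Entitlement value commonly used to signal that a user may access licensed
# library resources under the institution's agreements.
COMMON_LIB_TERMS = "urn:mace:dir:entitlement:common-lib-terms"

# Hypothetical set of identity provider entity IDs whose institutions subscribe.
SUBSCRIBING_IDPS = {"https://idp.example.edu/idp/shibboleth"}


def is_authorized(issuer: str, released: dict[str, list[str]]) -> bool:
    """Decide access from the asserting IdP and the attributes it released.

    No name, email address or persistent identifier is consulted: the decision
    rests only on which institution vouched for the user and in what role.
    Assumes `released` comes from an already-validated assertion.
    """
    if issuer not in SUBSCRIBING_IDPS:
        return False
    if COMMON_LIB_TERMS in released.get("eduPersonEntitlement", []):
        return True
    # Hypothetical policy: a "member" affiliation also grants access.
    # Scoped affiliation values look like "member@example.edu".
    return any(
        value.split("@", 1)[0] == "member"
        for value in released.get("eduPersonScopedAffiliation", [])
    )


# An assertion carrying only an affiliation: the SP learns "a member of
# example.edu", nothing about which member.
anonymous_assertion = {"eduPersonScopedAffiliation": ["member@example.edu"]}
print(is_authorized("https://idp.example.edu/idp/shibboleth", anonymous_assertion))  # True
```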
There is a critical point to be understood here: while the user may have signed in individually to their campus or corporate ID management system, knowledge of that user’s individual identity can remain within the institution and doesn’t have to be shared with the service provider. This is exactly analogous to what happens when a user signs in as an individual to a proxy server.
Hinchliffe is right to say that the institution can also choose to pass attributes to the service provider that are specific to the individual. These can then be used to provide additional convenience to users by linking personalized features on service provider sites to a single institutional login. Conceptually, this is exactly the same as a service provider allowing the user to create a local account on its system, as most have done for years, but without the need for separate usernames and passwords. RA21 believes that this should always be done in an open and transparent manner, with the full consent of the user, via a registration process. We agree that work needs to be done to set norms and establish best practices in this area, building on efforts under way at Internet2, Duke University, and elsewhere on the Scalable Consent framework. We hope to advance these best practices as RA21 moves forward towards implementation.
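A minimal sketch of consent-gated attribute release at the institution's identity provider, assuming a simple hypothetical policy: attributes needed for authorization are always released, while personal attributes such as mail or displayName go to a service provider only if the user has agreed to share them with that provider. This illustrates the principle only; it is not the Scalable Consent framework's actual design, and all function and variable names are hypothetical.

```python
# Hypothetical policy: attributes sufficient for the authorization decision
# are released without a consent prompt.
AUTHORIZATION_ATTRIBUTES = {"eduPersonScopedAffiliation", "eduPersonEntitlement"}


def attributes_to_release(
    user_attrs: dict[str, list[str]],
    requested: list[str],
    consented: set[str],
) -> dict[str, list[str]]:
    """Return the attributes to send to one service provider.

    user_attrs: everything the institution holds about the user.
    requested:  attribute names the service provider asks for.
    consented:  personal attributes the user agreed to share with this provider.
    """
    release = {}
    for name in requested:
        if name not in user_attrs:
            continue  # nothing to send for this attribute
        if name in AUTHORIZATION_ATTRIBUTES or name in consented:
            release[name] = user_attrs[name]
    return release


user = {
    "eduPersonScopedAffiliation": ["member@example.edu"],
    "mail": ["a.researcher@example.edu"],
    "displayName": ["A. Researcher"],
}
wanted = ["eduPersonScopedAffiliation", "mail", "displayName"]

print(attributes_to_release(user, wanted, consented=set()))
# {'eduPersonScopedAffiliation': ['member@example.edu']}
print(attributes_to_release(user, wanted, consented={"mail"}))
# {'eduPersonScopedAffiliation': ['member@example.edu'], 'mail': ['a.researcher@example.edu']}
```

Without consent the provider still receives enough to authorize access; personalization data flows only after the user opts in.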
As a community, we need to raise awareness of misconfigurations of the type Hinchliffe observed, in which campus ID systems incorrectly state that service providers require specific personal information to provide access when in fact they do not, so that these can be resolved and eliminated. We anticipate that a future phase of RA21, supporting the rollout of the new standard, will focus on educating institutions about best practices and on breaking down the often rigid silos between libraries and campus IT departments that lead to these misunderstandings.
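One way to spot this kind of misconfiguration is to compare what a campus system treats as mandatory with what the service provider actually declares. In SAML metadata, a service provider lists the attributes it wants in RequestedAttribute elements, each with an optional isRequired flag. The sketch below assumes a hypothetical service provider's metadata and a hypothetical set of attributes the campus system has been configured to demand; the function names are illustrative only.

```python
import xml.etree.ElementTree as ET

NS = {"md": "urn:oasis:names:tc:SAML:2.0:metadata"}

# Hypothetical service provider metadata: it requires only an affiliation
# and marks email as optional.
SP_METADATA = """<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    entityID="https://sp.example.com/shibboleth">
  <md:SPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:AttributeConsumingService index="0">
      <md:ServiceName xml:lang="en">Example Journals Platform</md:ServiceName>
      <md:RequestedAttribute FriendlyName="eduPersonScopedAffiliation"
          Name="urn:oid:1.3.6.1.4.1.5923.1.1.1.9" isRequired="true"/>
      <md:RequestedAttribute FriendlyName="mail"
          Name="urn:oid:0.9.2342.19200300.100.1.3" isRequired="false"/>
    </md:AttributeConsumingService>
  </md:SPSSODescriptor>
</md:EntityDescriptor>"""


def required_attributes(sp_metadata_xml: str) -> set[str]:
    """Names of attributes the SP marks isRequired="true" (or "1")."""
    root = ET.fromstring(sp_metadata_xml)
    required = set()
    for attr in root.findall(".//md:RequestedAttribute", NS):
        if attr.get("isRequired", "false").strip().lower() in ("true", "1"):
            # Prefer the human-readable FriendlyName, fall back to the URI Name.
            required.add(attr.get("FriendlyName") or attr.get("Name"))
    return required


def over_required(campus_marks_required: set[str], sp_metadata_xml: str) -> set[str]:
    """Attributes a campus system demands although the SP does not require them."""
    return set(campus_marks_required) - required_attributes(sp_metadata_xml)


# Suppose the campus system has been configured to insist on email for this SP.
print(over_required({"eduPersonScopedAffiliation", "mail"}, SP_METADATA))  # {'mail'}
```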
The concept of federated identity management was invented in the research and academic community close to 20 years ago. Alongside the technology, a model for building a fabric of trust has been established, based on the idea of identity management federations. Federations are typically organized geographically, and to join one, identity providers and service providers must agree to a set of practices and policies, such as those embodied in the US-based InCommon federation’s Participant Operating Practices. These commitments are backed by legal agreements that the participants must sign with the federation operator.
The combination of the technical and legal protections already in place in the research and academic identity management community means that the starting point for RA21 is dramatically different from that of the consumer web, where, with services such as Google and Facebook, the user is generally the product and all information is considered ‘fair game’. When the purpose is purely to support authorization to a service, privacy has a far better chance of being preserved.
When it comes to individual identity, issues around consent are actually very clear. The user should be able to consent to any sharing of personal information such as their name or email address, the items that are useful for personalization but not fundamentally necessary to the authorization transaction. Not only is informed consent required by the General Data Protection Regulation (GDPR), the forthcoming EU legislation, it is also the right thing to do, and RA21 is committed to it. Considerable work has already been done to analyze the impact of the GDPR on current practices in academic identity federations around the world.
MYTH 2: PROXY SERVERS WORK JUST FINE AS A SOLUTION FOR OFF-CAMPUS ACCESS – BUSTED
Many libraries have turned to proxy services, such as EZproxy, to solve the problem of off-campus access. These services have a huge installed base and in particular have been very good at integrating with a wide variety of campus ID systems and patron authentication services to ensure that the correct set of users can gain access to external resources. However, they also present significant problems in configuration and management, and fail to address changing patterns of resource access effectively.
One of the major difficulties faced by users when navigating the world of scholarly information resources is the need for authorization at the point-of-access. Users typically reach content provider sites from points such as Google, PubMed, references in other articles, and links sent by colleagues. From these starting points, users move from system to system among an array of resource providers and research workflow tools. They are essentially starting from anywhere and going to anywhere in their journey to access the most relevant and useful information.
Proxy servers just don’t work in this scenario; the fundamental assumption behind URL-rewriting proxy servers such as EZproxy is that the user starts their research journey on the institutional portal, and can therefore follow a “proxied link” to the relevant information resources. If the user arrives at a content provider without starting in the right place, the content provider has no way to know where the user is from and therefore whether they should be granted access. Federated authentication solves this problem by allowing the user to tell the service provider where they are from, so that the service provider can point the user back to their institution to sign in. This is only possible because of the centrally managed metadata distribution services that identity federations provide. However, the “Where are you from” user experience today is inconsistent and difficult to use. This is the core problem that RA21 is trying to solve.
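Federated discovery depends on that shared metadata. The following sketch, assuming a tiny hypothetical excerpt of federation metadata (real aggregates list thousands of entities and are signed, which this example ignores), shows how a "Where are you from" service could match what a user types against identity provider display names and find the single sign-on endpoint to send the user to. The element names come from the SAML 2.0 metadata and MDUI schemas; the entities, URLs and the find_idps function are illustrative only.

```python
import xml.etree.ElementTree as ET

NS = {
    "md": "urn:oasis:names:tc:SAML:2.0:metadata",
    "mdui": "urn:oasis:names:tc:SAML:metadata:ui",
}
HTTP_REDIRECT = "urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"

# Hypothetical, unsigned excerpt of federation metadata listing two IdPs.
METADATA = """<md:EntitiesDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    xmlns:mdui="urn:oasis:names:tc:SAML:metadata:ui">
  <md:EntityDescriptor entityID="https://idp.example.edu/idp/shibboleth">
    <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
      <md:Extensions><mdui:UIInfo>
        <mdui:DisplayName xml:lang="en">Example University</mdui:DisplayName>
      </mdui:UIInfo></md:Extensions>
      <md:SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"
          Location="https://idp.example.edu/idp/profile/SAML2/Redirect/SSO"/>
    </md:IDPSSODescriptor>
  </md:EntityDescriptor>
  <md:EntityDescriptor entityID="https://login.example.org/idp">
    <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
      <md:Extensions><mdui:UIInfo>
        <mdui:DisplayName xml:lang="en">Example Research Institute</mdui:DisplayName>
      </mdui:UIInfo></md:Extensions>
      <md:SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"
          Location="https://login.example.org/idp/sso"/>
    </md:IDPSSODescriptor>
  </md:EntityDescriptor>
</md:EntitiesDescriptor>"""


def find_idps(metadata_xml: str, query: str) -> list[tuple[str, str, str]]:
    """Return (display name, entityID, SSO URL) for IdPs whose name matches query."""
    root = ET.fromstring(metadata_xml)
    matches = []
    for entity in root.findall("md:EntityDescriptor", NS):
        idp = entity.find("md:IDPSSODescriptor", NS)
        if idp is None:
            continue  # skip service providers and other entity types
        name_el = idp.find("md:Extensions/mdui:UIInfo/mdui:DisplayName", NS)
        if name_el is not None and name_el.text:
            name = name_el.text.strip()
        else:
            name = entity.get("entityID")  # fall back to the raw identifier
        if query.lower() not in name.lower():
            continue
        for sso in idp.findall("md:SingleSignOnService", NS):
            if sso.get("Binding") == HTTP_REDIRECT:
                matches.append((name, entity.get("entityID"), sso.get("Location")))
    return matches


# The user types "university"; the SP can now redirect them to their own IdP.
print(find_idps(METADATA, "university"))
```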
Proxy servers are also increasingly problematic given the drive for all websites to move to HTTPS in order to protect user privacy from snooping by governments, ISPs, and malicious actors. To work in an HTTPS environment, a proxy server has to decrypt the stream of information from a resource provider’s site, modify the contents to add proxied links, and then re-encrypt the information using its own SSL certificate before sending it back to the user. The same process is applied in reverse to requests sent from the user’s browser to the resource provider’s site, which may contain the user’s personal data such as email addresses and passwords. Not only does this create a point of vulnerability at the proxy, where the user’s personal information is present in clear text, it also accustoms users to the very pattern a hacker would use to stage a man-in-the-middle attack, and it creates complex configuration challenges for those managing and supporting proxy servers.
We are encouraged by work that has been done to allow EZproxy to act as a gateway between campus authentication systems and service providers using CAS (Central Authentication Service) or SAML, and see this as a promising path to support an incremental transition to federated authentication.
MYTH 3: RA21 JUST WANTS TO ENABLE PUBLISHERS TO TRACK USERS ACROSS EACH OTHER’S PLATFORMS – BUSTED
In her article, Hinchliffe states that with federated identity:
…you could leave a data trail of both who you are and what resources (content and tools) you are using. Yes, that means your data could be potentially aggregated across platforms and combined with other datasets to create a more complete profile of you as a user. It is likely that you are already leaving trails of use data connected to the IP addresses of the devices that you use. With federated identity, the trail is connected to you and to the devices
This is wide of the mark on several fronts. First, federated authentication is not necessary to set up this kind of cross-site tracking, as anyone knows who has experienced those annoying ads that follow us across the web once we have expressed an interest in buying a particular kind of product from a particular site. Had they wished to, scholarly resource providers could have set up exactly the same kind of tracking mechanism as the giant internet advertising networks use. The fact that they have chosen not to do so, in the nearly two decades since DoubleClick popularized this technology, demonstrates that there is limited, if any, commercial motivation for them to share this information, while the impact on user privacy would likely be unacceptable to users and to the institutions that buy these resources.
Second, the SAML technology proposed for RA21 is different from the technology used by the major social network providers, and it includes specific technical mechanisms to protect users from cross-site correlation of their data. As outlined earlier, federated authentication supports anonymous access should the identity provider and user so choose. And even when personalized access is desired, SAML provides a mechanism whereby the same user is assigned a different opaque pseudonym for each service provider, specifically preventing service providers from using a shared identifier to cross-correlate their data about that user.
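A minimal sketch of how an identity provider might derive such per-service-provider pseudonyms, assuming a keyed hash (HMAC-SHA256 here) over the service provider's entity ID and the institution's internal user ID. The secret, user ID, entity IDs and the pairwise_id function are hypothetical, and real identity provider products use their own derivation schemes; the point is that the value is stable for one user at one service provider, yet two providers comparing notes see unrelated strings.

```python
import hashlib
import hmac

# Hypothetical secret known only to the institution's identity provider.
# A real deployment would use a long random value kept out of source code.
IDP_SECRET = b"replace-with-a-long-random-secret"


def pairwise_id(user_id: str, sp_entity_id: str, secret: bytes = IDP_SECRET) -> str:
    """Opaque identifier for one user at one service provider.

    Stable across visits to the same SP, but without the secret there is no
    practical way to tell that values issued to two SPs belong to one person.
    """
    message = f"{sp_entity_id}!{user_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


a = pairwise_id("jsmith", "https://sp-a.example.com/shibboleth")
b = pairwise_id("jsmith", "https://sp-b.example.org/shibboleth")
print(a == pairwise_id("jsmith", "https://sp-a.example.com/shibboleth"))  # True
print(a == b)  # False
```

Using a keyed hash rather than a plain hash means a service provider that guesses a user ID still cannot recompute the pseudonym.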
MYTH 4: RA21 CREATES YET ANOTHER USERNAME AND PASSWORD – BUSTED
Through the SAML protocol as described earlier, RA21 leverages a user’s existing institutional credentials and does not require the creation of publisher-specific usernames and passwords. The vast majority of users accessing scholarly resources from a campus or corporate network have very likely already signed into those networks using their institutionally provided credentials. RA21 seeks to enable a seamless and convenient experience where users who are already signed into their home institution are not prompted to re-enter their usernames and passwords.
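The single sign-on behaviour described here rests on the identity provider remembering an authenticated session. This toy Python model, with hypothetical class and method names and no real SAML messages, shows a user typing credentials once and then reaching two different service providers without being prompted again.

```python
import secrets
from typing import Optional


class IdentityProvider:
    """Toy identity provider that remembers authenticated browser sessions."""

    def __init__(self) -> None:
        self.sessions: dict[str, str] = {}  # session cookie -> institutional user ID
        self.prompts = 0  # how many times the user has had to type a password

    def sign_in(self, user_id: str) -> str:
        """Simulate the user entering institutional credentials; return a session cookie."""
        self.prompts += 1
        cookie = secrets.token_urlsafe(16)
        self.sessions[cookie] = user_id
        return cookie

    def handle_authn_request(
        self, sp_entity_id: str, cookie: Optional[str], user_id_if_prompted: str
    ) -> str:
        """Answer a service provider's request, prompting only without a live session."""
        if cookie is None or cookie not in self.sessions:
            cookie = self.sign_in(user_id_if_prompted)  # login page would be shown here
        # A real IdP would now build and sign a SAML assertion for sp_entity_id,
        # releasing only the attributes its policy allows.
        return cookie


idp = IdentityProvider()
cookie = idp.sign_in("jsmith")  # morning sign-in to the campus system
for sp in ("https://sp-a.example.com/shibboleth", "https://sp-b.example.org/shibboleth"):
    cookie = idp.handle_authn_request(sp, cookie, "jsmith")
print(idp.prompts)  # 1
```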