What's New - Issue 47, July 2012
In this issue:
- What's On - Forthcoming events from July 2012 onwards
- What's New - New reports and initiatives since the last issue
- What's What - Indiana Jones and the Raiders of the Lost Research Data, William Kilbride, DPC
- Who's Who - Sixty second interview with Patrick McCann, Digital Curation Centre
- One World - nestor - The German Network of Expertise for Digital Preservation, Sabine Schrimpf, German National Library
- Your View? - Comments and views from readers
What's New is a joint publication of the DPC and DCC
The DCC have a number of events coming up that may be of interest to you. For further details on any of these, please see our DCC events listings at http://www.dcc.ac.uk/events/. You can also browse through our DCC events calendar to see a more extensive list of both DCC and external events.
JISC-British Library DataCite Workshop: Describe, disseminate, discover: metadata for effective data citation
6 July 2012
Good quality metadata is essential for effective data discovery and citation. This workshop will take a look at the challenges that metadata management presents for data repositories, as well as the tools and services that good metadata can support. An introduction to the development of the DataCite schema will be presented, with an opportunity to discuss how it can be incorporated into day-to-day data management practices. The emphasis of the workshop will be on practical approaches to metadata capture and usage, with case studies providing an insight into how the issues are being addressed at the repository level.
The 7th International Conference on Open Repositories (OR2012)
9 - 13 July 2012
EDINA, the University of Edinburgh's Information Services and the Digital Curation Centre are delighted to announce that the University of Edinburgh will be hosting the Seventh International Conference on Open Repositories (OR2012) from 9-13 July, 2012. The theme and title of the 2012 conference at Edinburgh – Open Services for Open Content: Local In for Global Out – reflects the current move towards open content, ‘augmented content’, distributed systems and data delivery infrastructures.
Open Repositories Developer Challenge 2012
9-12 July 2012
The DevCSI project (funded by JISC and based at the Innovation Support Centre, UKOLN, University of Bath) is proud to announce that it is once again organising the Open Repositories Developer Challenge 2012 at the Seventh International Conference on Open Repositories (OR2012) in Edinburgh, Scotland. We are working closely with the Repositories Fringe and the challenge is kindly sponsored by Microsoft Research. The challenge is: 'Show us something new and cool in the world of Open Repositories'.
ISKO UK and BCS IRSG seminar and workshop: I think, therefore I classify
16 July 2012
Classification is central to organizing information for our users, and we have secured an array of first-class speakers. The aim of this workshop is to help us individually and collectively review our approaches to the subject. In the breaks there will be demonstrations from vendors of software designed to handle classification automatically.
Links That Last: Linked Data, Persistent Identifiers and Digital Preservation
19 July 2012
This briefing day will introduce the topics of persistent identifiers and linked data, discussing the practical implications of both approaches to digital preservation. It will consider the viability of services that offer persistent identifiers and what these offer in the context of preservation; it will review recent developments in linked data, considering how such data sets might be preserved; and by introducing these two parallel topics it will go on to consider whether both approaches can feasibly be linked to create a new class of robust linked data. Based on commentary and case studies from leaders in the field, participants will be encouraged to consider practical implications for their own work and new directions for research and development in the field.
Advanced Techniques on Data Analytics and Data Visualization
22-24 August 2012
Advanced tools such as data mining, data modelling, data visualization and information analysis have been extensively researched in recent years. The objective of this workshop is to provide an interactive platform for academic researchers and industrial practitioners to exchange ideas and disseminate the latest developments in data analytics and data visualization. Discussion of conceptual models, tools and current issues is encouraged.
2nd SPRUCE 'Mashup', London 18-20 September
18-20 September 2012
The JISC-funded SPRUCE project cordially invites you to the second SPRUCE Digital Preservation Mashup. SPRUCE is organising a series of free events around the UK that will provide support and technical expertise to address the real digital preservation challenges that institutions face. The best work from event attendees will be awarded funding to develop the activity and embed it within business-as-usual processes. £60k is available for these awards.
Research Networks: Underpinning Discovery, Supporting Knowledge Transfer
27 September 2012
The UK is ranked as having the second strongest research base in the world behind only the US. The UK's universities and research centres have an exceptional international reputation, but there is room for improvement. Given the spiralling costs associated with research and development, the process needs to become more efficient and deliver better value for money. That may mean working more closely with other institutions, with the private sector, across disciplines and across international borders. If research is better coordinated and if resources are pooled more effectively, then it is more likely that outcomes will achieve excellence and commercial success. The conference aims to explore how to foster pioneering research and innovation. The programme will showcase best practice of knowledge transfer, collaboration and excellence, highlighting the network infrastructures that can help to deliver results.
iPRES 2012: 9th International Conference on Preservation of Digital Objects
1-5 October 2012
Registration has opened for the ninth annual conference on digital preservation, which will be held at the University of Toronto. The finalised programme will be published in due course.
23rd International CODATA Conference: “Open Data and Information for a Changing Planet”
28-31 October 2012
The theme “Open Data and Information for a Changing Planet” encompasses some relevant issues in data-intensive scientific fields. Nurturing an open environment for data and information is crucial for disseminating research results to a wide audience and allowing thorough, collaborative analysis. The theme also distinguishes between data and information, and by so doing highlights the role data-intensive science plays in transforming raw observations into applicable, intelligible results and discoveries. CODATA 23 will bring together stakeholders from industry, research and academia who will highlight, debate and address these issues over a three-day period. It will provide an international forum where these stakeholders, in collaboration with the ICSU and CODATA international networks and other networks, can create a dialogue on legal, economic and technological challenges; evaluate societal impacts; and put forward possible solutions that can in turn benefit the planet. Nurturing an open environment for data and information will be the underpinning message of the conference.
For more information on any of the items below, please visit the DCC website at http://www.dcc.ac.uk.
New ICE International (Digital) Curation education list:
As a follow-up to the JISC ICE (International Curation Education) Forum, JISC has set up a new mailing list to facilitate and promote discussion on issues to do with education and professional development in the fields of digital curation, preservation, archiving and records management. It is a moderated list and is owned by JISC and the Digital Curation Centre. Everyone with an interest in this topic is welcome to subscribe.
Support your Digital Preservation community!
A new proposal for a Digital Preservation question and answer site needs your support. This community has a wealth of resources on digital preservation and curation but they're spread across all sorts of different web sites. Finding exactly the right information, particularly for someone new to Digital Preservation, is not easy. Repetition or re-invention of existing work, due to lack of awareness of solutions already available, is another common problem. A single point of contact for advice, that is driven by the expertise of the whole community, would therefore be really useful and that's exactly what Stack Exchange is for.
Stack Exchange began as Stack Overflow, the go-to website for assistance with programming problems. The functionality and socially driven moderation that evolved there resulted in such an effective question and answer web site that it was opened up to all sorts of different topics (http://stackexchange.com/sites). A dedicated Stack Exchange site for Digital Preservation has now been proposed, but needs support from the community before it can go live.
Please demonstrate your support by committing to use the site (http://area51.stackexchange.com/proposals/39787/digital-preservation). To get the site live, however, we also need commitment from existing Stack Exchange users: that means having a reputation score of 200 or more on another Stack Exchange site. It's easy to earn 200 reputation by asking and answering questions, and the new Libraries and Information Science Stack Exchange (http://libraries.stackexchange.com/) may be an ideal place to do so.
Comments invited on Community Capability Model Framework
The Community Capability Model Framework is a tool developed by UKOLN, University of Bath, and Microsoft Research to assist institutions, research funders and researchers in growing the capability of their communities to perform data-intensive research by:
- profiling the current readiness or capability of the community,
- indicating priority areas for change and investment, and
- developing roadmaps for achieving a target state of readiness.
Following a community consultation process undertaken to develop the framework, including workshops and group consultations, the resulting framework for describing community capability has now been captured in a draft white paper.
COAR Open Access Agreements and Licenses Task Force
This is a multi-stakeholder Task Force initiated and supported by COAR (Confederation of Open Access Repositories), with members representing a number of different types of organizations (libraries, licensing agencies, library associations, and open access groups) with a common interest in promoting sustainable and effective practices for open access. The Task Force aims to review and assess the growing number of open access agreements being implemented between publishers and research institutions.
First Open Access Publisher Visible on OpenAIRE
“We are delighted to announce the integration of the first open access publisher into OpenAIRE. Renowned open access publishers such as Copernicus Publications offer a great service to authors and significantly contribute to the uptake of the European Commission’s Open Access Pilot,” says Norbert Lossau, Scientific Coordinator of OpenAIRE, an initiative co-funded by the European Commission (EC). Copernicus and OpenAIRE have worked together to identify publications resulting from EC-funded projects. As a result, well over 400 publications have now been imported to the journals’ and OpenAIRE’s databases and will be regularly updated. Moreover, on submitting articles, authors can easily acknowledge EC funding and will be alerted to the opportunity to use project funds for article processing charges. OpenAIRE is building a pan-European publication infrastructure, bringing together 33 European countries to provide open access to European research results. It regularly harvests information from an increasing number of open access repositories and journals, and in the near future, from data archives. Further services deployed by OpenAIRE will support statistics and the creation of complex publications linking from articles to research data.
Joint statement on data citation from STM publishers and DataCite
On 14th June 2012, DataCite and the STM-Association signed a joint statement to encourage publishers and data centres to link articles and underlying data:
- To improve the availability and findability of research data, DataCite and STM encourage authors of research papers to deposit researcher validated data in trustworthy and reliable Data Archives.
- DataCite and STM encourage Data Archives to enable bi-directional linking between datasets and publications by using established and community endorsed unique persistent identifiers such as database accession codes and DOI names.
- DataCite and STM encourage publishers to make visible or increase visibility of these links from publications to datasets.
- DataCite and STM encourage Data Archives to make visible or increase visibility of these links from datasets to publications.
- DataCite and STM support the principle of data re-use and for this purpose actively participate in initiatives for best practice recommendations for the citation of datasets.
- DataCite and STM invite other organizations involved in research data management to join and support this statement.
What's What - Editorial - Indiana Jones and the Raiders of the Lost Research Data
William Kilbride, Executive Director, DPC
I'll cheerfully admit that my first meaningful exposure to academic research was Indiana Jones and the Raiders of the Lost Ark.
So the reality of modern archaeological research is a little disappointing. For the record I've not yet had to climb a crocodile-infested canyon, fight Nazis, free child slaves, jump from a plane in a liferaft, or dodge poison darts in some centuries-old death trap (except in the metaphorical sense). Fieldwork in Scotland is just not like that.
But the series has a few well-observed subtleties which you could be forgiven for missing. There's the vast, orderly and anonymous warehouse where the Ark of the Covenant ends up: I'm pretty sure that you could recreate a scene like that in the Glasgow Museum Resource Centre. There's the hugely learned and charming Marcus Brody figure - a hopeless fish-out-of-water when it comes to fieldwork. There's the local fixer who really runs the dig, and there's the absolute dependence on poorly provisioned shovel-bums who are quickly shooed away in the unlikely event that something interesting turns up. When Sallah says, 'They've shanghaied every digger in Cairo', you should perhaps understand 'they've made three months unpaid fieldwork compulsory for progression into fourth year'.
It's Indiana Jones's research data management problems that resonate most with me these days. Indeed a good proportion of 'The Last Crusade' is concerned with the misadventures of Henry Jones's research notebook - 'I wrote it down so I wouldn't need to remember'. You can't really blame the Joneses: the CIA didn't seem to provide much of a research data infrastructure, nor did they mandate a data management plan. Frankly the whole planning and risk assessment process seems a little sloppy. So what the Joneses really suffer from is a policy and governance vacuum. My own experience of research has been of increasingly detailed requirements and increasingly sophisticated infrastructure enabling ever-greater expectations about access and interoperability. Jones's research notebook would surely now have morphed into an online virtual research environment integrating structured and unstructured data and allowing 24/7 access from anywhere in the world to anyone who wanted it.
Before you accuse managerialism of taking the romance out of research, consider the length of time that some of those things were in the ground before the hapless archaeologists got involved. The codes of practice around archaeological research don't just protect research data. In some senses they protect archaeology from archaeologists. It's long been understood, if imperfectly implemented, that 'A discovery dates only from the time of the record of it, and not from the time of its being found in the soil'. And this is not just about archaeology: similar themes about competence, trust and responsibility are found in other areas of research where the object of study is of greater value than any given set of results, or where the results are so valuable that they need to be shared with others very speedily. A little attention to good practice is a small price to pay.
But archaeology is helpful because it always raises serious questions about the long term. How will we find and how will we understand data in the future? There are at least two issues to consider.
On one hand we have the long-term issue of persistent identifiers. A good example is the DataCite service, which starts with the simple proposition that it should be easier to find and cite research data online. This creates the conditions for increased acceptance of research data as a legitimate output from research in its own right. It enables the re-use of data by others, and in so doing it means that the long-term value of data can more easily be realised. DataCite is a member of the International DOI Foundation, so they are thinking not just about access now but about the long term too. DOIs are designed to provide persistence so that, even if a data set moves, the identifier can still be resolved.
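The mechanics of that persistence are simple to sketch: a DOI is a name, not a location, and it becomes actionable by prefixing the public resolver, which redirects to wherever the data set currently lives. A minimal illustration in Python, assuming a hypothetical DOI name (the resolver prefix is the real public `doi.org` service, known in 2012 as `dx.doi.org`):

```python
# Sketch of DOI resolution: the identifier stays fixed; only the
# redirect target maintained by the registrant changes when data moves.

DOI_RESOLVER = "https://doi.org/"  # the central public resolver

def doi_to_url(doi: str) -> str:
    """Build the resolvable URL for a DOI name (e.g. '10.1234/example')."""
    return DOI_RESOLVER + doi.strip()

# Hypothetical DOI name, for illustration only:
print(doi_to_url("10.1234/example.dataset"))
# -> https://doi.org/10.1234/example.dataset
```

Because citations carry the DOI rather than a direct URL, a repository migration only requires updating the redirect record, not every published reference.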
On the other hand we have the issue of understanding the meaning of data to maximise its usefulness. This is where the 'linked data' approach shows most promise. Linked data is conceptually simple in so far as the meaning of any object can be expressed as a subject/predicate/object triple. These triples might be expressed in standard English as 'X has value Y'. The depth of the network means that any statement can be constructed as a set of links. So instead of actually expressing 'X' or 'Y' or even 'has value', we can express the same by providing three links in the right order. This matters because it begins to provide the sort of structured meaning which a computer could understand, and it's close to the sorts of requirements that digital preservation creates for representation information.
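The 'three links in the right order' idea can be sketched in a few lines of Python. The example URIs below are illustrative placeholders (only the two Dublin Core predicate URIs are real vocabulary terms); the point is that subject, predicate and object are each identified by a link, and a machine can aggregate everything asserted about a subject without understanding English at all:

```python
# Each statement 'X has value Y' as a (subject, predicate, object)
# triple, with every slot a URI. The example.org URIs are hypothetical.
triples = [
    ("http://example.org/dataset/42",           # subject: the data set
     "http://purl.org/dc/terms/title",          # predicate: 'has title'
     "http://example.org/literal/excavation"),  # object: the value
    ("http://example.org/dataset/42",
     "http://purl.org/dc/terms/creator",        # predicate: 'has creator'
     "http://example.org/person/jones"),
]

def describe(subject, triples):
    """Collect every (predicate, object) pair asserted about a subject."""
    return {p: o for s, p, o in triples if s == subject}

print(describe("http://example.org/dataset/42", triples))
```

Because the predicates are shared, dereferenceable terms rather than free text, two independent archives using the same vocabulary produce descriptions a machine can merge - which is exactly the kind of structured representation information preservation needs.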
These two questions - how links can be provided that last and express useful information - are relevant to each other. Some sort of persistence is required if linked data is to be meaningful in anything other than a short time horizon, while informed and machine-readable description will make archives more navigable and manageable as they grow. But there are challenges too. Long chains of interdependent data sets are likely to be fragile, encountering many of the problems which the digital preservation community has been trying to address for two decades now. Arguably there has been too little dialogue between the advocates of 'linked data' and the providers of 'persistent identifiers'. It's certainly true that the digital preservation community needs to understand the implications of what's on offer. Is there a category of 'robust linked data' that marries the best of both approaches? (Is it there already?) What are the technical or organisational obstacles which inhibit this?
The next DPC briefing day on 'Links that Last' in Cambridge on the 19th July won't be seeking the Holy Grail or the Ark of the Covenant, but it will be exploring this question of 'robust linked data'. It will be interesting to see how our intrepid explorers extricate themselves from this particular labyrinth. In the Last Crusade, Indiana's well-intentioned but flawed attempts to follow his father's research mean that Henry Jones's notebook turns up at the worst possible moment in the worst possible place. I wonder if our well-intentioned but flawed attempts to enable access could backfire too.
Is there a linked data equivalent of Jones Sr's exasperated outburst: 'I should have mailed it to the Marx brothers'?
Who's Who: Sixty Second Interview with Patrick McCann, Institutional Support Officer, Digital Curation Centre
Where do you work and what's your job title?
I’m an Institutional Support Officer for the Digital Curation Centre and I’m based at the Humanities Advanced Technology and Information Institute at the University of Glasgow.
Tell us a bit about your organisation
The DCC is a world-leading centre of expertise in digital information curation with a focus on building capacity, capability and skills for research data management across the UK's higher education research community. We provide expert advice and practical help to anyone in UK higher education and research wanting to store, manage, protect and share digital research data.
What projects are you working on at the moment?
I’m currently working on improving the CARDIO tool (http://cardio.dcc.ac.uk) and I am involved with the DCC’s programme of engagements with higher education institutions. These allow us to provide intensive, tailored support to increase data management capability.
I’ll shortly be undertaking a follow-up to the work I did on MediaWiki and Facebook at the SPRUCE Mashup in Glasgow.
How did you end up in digital preservation?
When I joined HATII as a developer it was a partner in a number of digital curation and preservation projects, including the DCC. I found the subject fascinating – it’s extremely important, but most people unfamiliar with it either haven’t considered it at all or don’t appreciate the complexity and subtlety of the problems. Working on the CASPAR project really brought home the mix of technical and organisational issues that need to be tackled.
What are the challenges of digital preservation for an organisation such as yours?
While working with institutions we often encounter individuals who are aware of research data management issues but who may be struggling to make the case to those around them, and in particular to those in a position to implement changes. Making the case for research data management as something which can provide real benefits to an institution beyond fulfilling the requirements of funders is a key challenge.
What sort of partnerships would you like to develop?
The DCC has a number of partnerships with other organisations, but the key people we want to work with are researchers and others within higher education seeking to curate research data. We’re here to help them to do that.
If we could invent one tool or service that would help you, what would it be?
I know from previous work on the LIFE project that estimating the costs of digital preservation is very difficult. A tool which could produce reliable estimates of the costs and benefits of implementing digital preservation measures would be immensely useful.
And if you could give people one piece of advice about digital preservation ....?
I’ve come across a number of people struggling with preservation issues on their own. There are other people out there dealing with this stuff too, so ask for help! The DCC website’s a good place to start. In particular, look out for DCC events near you!
If you could save for perpetuity just one digital file, what would it be?
Can I zip up all of the photos and videos of my daughter?
Finally, where can we contact you or find out about your work?
One World: nestor - the German network of expertise for digital preservation
Sabine Schrimpf, German National Library
Three years after its transformation from project phase to a sustainable partner consortium, nestor has delivered the third of its annual nestor Practitioners’ Days in June 2012. The Practitioners’ Day exemplifies the mission of nestor: to bring together experts from different communities to discuss and find out about new developments and practical approaches to digital preservation. The event provides a forum for exchange of experience and is intended for all who deal with the practical and concrete issues of digital long-term preservation. Following two more general and comprehensive events in 2010 and 2011, the nestor Practitioners’ Day 2012 set two specific focal points, one on cost and business models and another on the specific issues of preserving audiovisual media.
Altogether, however, the activities of nestor have broadened and diversified during the last couple of years. The nestor partners now host nine working groups, which are open to non-nestor members as well:
- “Networking and Cooperation” provides a forum for identifying and addressing collective problems in digital preservation.
- “Media Preservation” gathers expertise and best practices from the area of AV and multimedia preservation.
- In the WG “Rights/Legal Issues”, legal experts discuss passages of the copyright legislation that hinder digital preservation and suggest possible solutions.
- The WG “Digital Preservation” has compiled and published guidelines on how the concept of significant properties can be used pragmatically in digital preservation processes.
- The WG “Emulation” was established as a focal point for exchanging experiences on emulation projects.
- Participants interested in formulating a digital preservation policy for their institution have joined forces in the WG “Policies”.
- The WG “Cost” is developing a process model for establishing an institution-specific cost model.
- The WG “Certification” has developed and tested a procedure for self-evaluation against DIN 31644, “Criteria for trustworthy digital archives”, in accordance with the European Framework for Audit and Certification. In 2012 these procedures will be finalised, and tools to support them are under development. It is envisaged that a “nestor seal” will be awarded to repositories that have successfully undergone extended self-evaluation.
- An ad-hoc WG, founded just for this particular task, has recently published a German translation of the OAIS Reference Model.
Since March 2011, nestor has hosted three workshops on web archiving. While the first two events were by invitation only, the third web archiving workshop was offered as a public event to the interested community. With almost 150 participants, the demand for a forum on web archiving issues was impressively confirmed.
Another topic that has received a lot of attention in 2011 and 2012 is the long-term availability of research data. A joint workshop of nestor and the German Data Forum explored the issues of preserving social science and economic data at the end of 2011. In cooperation with the D-Grid GmbH, a baseline study on the preservation of research data was conducted and has recently been published.
The group of higher education institutions that collaborate in the framework of a Memorandum of Understanding with nestor has grown to twelve institutions, which work collaboratively towards a digital preservation curriculum. The higher education partners also co-organise the annual nestor school; the next one will take place in autumn 2012.
Recognizing that much work remains to be done in digital preservation, nestor is ready to offer forums to address newly arising topics and remains open to national and international cooperation.