{"id":45,"date":"2015-11-19T15:35:40","date_gmt":"2015-11-19T15:35:40","guid":{"rendered":"https:\/\/candicemorey.org\/?p=45"},"modified":"2016-09-15T16:46:35","modified_gmt":"2016-09-15T16:46:35","slug":"a-visit-from-the-ghost-of-research-past","status":"publish","type":"post","link":"https:\/\/www.candicemorey.org\/?p=45","title":{"rendered":"A visit from the Ghost of Research Past"},"content":{"rendered":"<p class=\"p1\"><em><span class=\"s1\">A request for an old data set recently afforded me the opportunity, much like Ebenezer Scrooge, of revisiting my Past-Self when I was a brand-new post-graduate student, and allowing Past-Self and Future-Self to help me critique how my lab curates our data and materials in the present day. Both Past-Self and Future-Self are compelling agitators for a proactive approach to opening data, especially implementing a <a href=\"http:\/\/bayesfactor.blogspot.com\/2015\/11\/habits-and-open-data-helping-students.html\">Data Partner<\/a> scheme.\u00a0<\/span><\/em><\/p>\n<p class=\"p1\"><span class=\"s1\">Openness about our work is consistent with believing that the work is important and excellent. Being asked for access to your work is an acknowledgement that it is valuable, and sharing it is an expression of your confidence in its value. I\u2019ve found openness to be rewarding, leading to additional citations, gracious acknowledgments, and sometimes new collaboration opportunities.\u00a0<\/span><\/p>\n<p class=\"p1\"><span class=\"s1\">However, requests for data or materials fluster us, arriving out-of-the-blue. It always seems necessary to perform fresh checks: Is the code understandable and functional? Data may need to be explained and possibly tidied: what do the column headings mean again? Could there be identifying details in any of the responses? 
I might spend hours performing these checks before complying with a request.\u00a0<\/span><\/p>\n<p class=\"p1\"><span class=\"s2\">Waiting until the request arrives to open up data and materials can be seen as a tacit judgment on the expected impact of the data. Why, if I believe\u00a0the work I do is worthwhile, am I not preparing it for public consumption <em>before<\/em> I publish it? When did I start imagining that no one was likely to be interested in re-analyzing my data or using my experimental code? \u00a0<\/span><\/p>\n<p class=\"p1\"><span class=\"s1\">Recently I was asked for data from the first paper I ever published, part of my master\u2019s research project, which were collected in autumn 2002 and published in 2004. Possibly, sharing data that has been untouched for more than 10 years is asking too much. It wouldn\u2019t have been strange if I had lost it in institutional moves and computer crashes, or if it proved impossible to adequately document. But if found, going through these data would give me an opportunity to pay a visit to my Past-Self, recall what it was like to begin a research project for the first time, and maybe learn something from her.<\/span><\/p>\n<p class=\"p1\"><span class=\"s2\">One thing that struck me as I examined Past-Self&#8217;s data is that Past-Self organized it expecting that other people would be looking at it. Past-Self inserted comments explaining what numeric codes meant. Past-Self wrote summaries of the purpose of experiments, and Past-Self organized files into hierarchical directories with sub-folders for data files, analyses, and experimental stimuli. I think it would have surprised Past-Self that no one would ask to look at this information until 2015. Past-Self thought this work was important and documented it accordingly.<\/span><\/p>\n<p class=\"p1\"><span class=\"s2\">Though Past-Self began as a data-sharing idealist, she had minimal skills for curating data and materials. 
Some elements of her organization improved drastically in the later experiments in her project. Past-Self learned it is better to make category codes self-explanatory (e.g., why assign \u201cmale\u201d or \u201cfemale\u201d to arbitrary numeric codes instead of just entering the words?). Past-Self developed sensible conventions for naming files. Past-Self reduced redundancies in data recording.\u00a0<\/span><\/p>\n<p class=\"p1\"><span class=\"s2\">But though some practices improved, it also became clear that Past-Self abandoned the expectation that anyone apart from her and her supervisor would ever see these raw data and materials. As the project wore on, the helpful comments disappeared, and the summaries for subsequent experiments were unchanged from the earliest ones. The whole directory was organized around an 8-experiment master\u2019s project, which eventually resulted in the publication of three experiments in two separate papers. Past-Self never re-organized these materials so that it would be immediately obvious how to locate the materials pertaining to each paper specifically.<\/span><\/p>\n<p class=\"p1\"><span class=\"s2\">Altogether I interrogated Past-Self for\u00a0about 5 hours: we\u00a0located the data sets requested, established through re-analysis that they did in fact include the same data that were published, saved them in an accessible non-proprietary format, documented what the data sets contained and how the variables were coded, and published the data and guidance on the Open Science Framework. On the one hand, that isn\u2019t terrible. My Future-Self, who checked in throughout this process, insists that 5 hours of work accomplished\u00a0now is a sound investment. It enables a colleague on the other side of the planet to do a meaningful new analysis, from which we might all learn something novel. Furthermore, those data are now available to anyone else who might have other ideas for how our\u00a0data can be useful. 
Future-Self insists that this will lead to glory. On the other hand, these 5 hours of work entirely replicated work that Past-Self did more than 10 years ago in her haphazard manner. If Past-Self had carried on carefully documenting her data, if she had considered that materials should be available in commonly accessible formats, and if she had updated her personal repository to reflect the published record, then these materials would have been ready for sharing upon request in minutes, not hours. Future-Self is anxious to know how I am going to prevent this waste of time. Past-Self wonders whether I can do more to help my trainees learn good habits.<\/span><\/p>\n<p class=\"p1\"><span class=\"s2\">What, if any, are the constraints on\u00a0proactively\u00a0curating lab work? Proactive curation is obviously\u00a0desirable for Future-Self: it\u00a0saves her time and effort and it increases the impact and utility of the work. It is arguably good for trainees and PIs alike. Because I work with many short-term trainees, I have handled most data curation myself, but this is a\u00a0valuable\u00a0skill that Past-Self needed to learn better, and that Future-Self wants delegated. The <a href=\"http:\/\/bayesfactor.blogspot.com\/2015\/11\/habits-and-open-data-helping-students.html\">Data Partner scheme<\/a> is ideal for this: my trainees can be paired with trainees from a colleague\u2019s lab, and these two students will help each other curate data by seeing whether their partner\u2019s work is clear, self-explanatory, and reproducible. They do this independently of me. When the data are shown to me, they have already been vetted by one other person, providing an additional chance to catch mistakes. My trainees get the practice that Past-Self lacked, and Future-Self will never wonder whether data and materials are ready to be shared.<\/span><\/p>\n<p class=\"p1\"><span class=\"s2\">Are you at Psychonomics 2015? 
Come to our talk,\u00a0<\/span>Open Science: Practical Guidance for Psychological\u00a0Scientists, Friday at 10:40 am, in the Statistics and Methodology II session.<\/p>\n<p class=\"p1\">Update: Check out <a href=\"http:\/\/www.lornecampbell.org\/?p=116\">Lorne Campbell&#8217;s<\/a> thoughts on this too.<\/p>\n<p class=\"p1\"><span class=\"s2\">\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"A request for an old data set recently afforded me the opportunity, much like Ebenezer Scrooge, of revisiting my Past-Self when I was a brand-new post-graduate student, and allowing Past-Self and Future-Self to help me critique how my lab curates our data and materials in the present day. Both Past-Self&hellip;\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,3],"tags":[11,13,12],"class_list":["post-45","post","type-post","status-publish","format-standard","hentry","category-measurement","category-research","tag-open-science","tag-reproducibility","tag-research","odd"],"_links":{"self":[{"href":"https:\/\/www.candicemorey.org\/index.php?rest_route=\/wp\/v2\/posts\/45","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.candicemorey.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.candicemorey.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.candicemorey.org\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.candicemorey.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=45"}],"version-history":[{"count":5,"href":"https:\/\/www.candicemorey.org\/index.php?rest_route=\/wp\/v2\/posts\/45\/revisions"}],"predecessor-version":[{"id":47,"href":"https:\/\/www.candicemorey.org\/index.php?rest_route=\/wp\/v2\/posts\/45\/revisions\/47"}],"wp:attachment":[{"href":"https:\/\/www.candicemorey.org\
/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=45"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.candicemorey.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=45"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.candicemorey.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=45"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}