This documentation file was generated on 2022-07-30 by Diana Castillo and Clara Llebot

-------------------
# GENERAL INFORMATION
-------------------

1. Title of Dataset - Content analysis of institutional research data policies in the US.

2. Creator Information - REQUIRED

Name: Clara Llebot Lorente
Institution: Oregon State University
College, School or Department: Oregon State University Libraries
Address: 121 The Valley Library Corvallis OR 97331–4501 USA
Email: clara.llebot@oregonstate.edu
ORCID: https://orcid.org/0000-0003-3211-7396
Role: conceptualization, data curation, formal analysis, investigation, validation, writing - original draft, writing - review & editing

Name: Diana J. Castillo
Institution: Oregon State University
College, School or Department: Oregon State University Libraries
Address: 121 The Valley Library Corvallis OR 97331–4501 USA
Email: diana.castillo@oregonstate.edu
ORCID: https://orcid.org/0000-0001-9582-2071
Role: data curation, formal analysis, investigation, validation, writing - original draft, writing - review & editing

3. Contact Information - REQUIRED

Name: Clara Llebot Lorente
Institution: Oregon State University
College, School or Department: Oregon State University Libraries
Address: 121 The Valley Library Corvallis OR 97331–4501 USA
Email: clara.llebot@oregonstate.edu
ORCID: https://orcid.org/0000-0003-3211-7396
Role: conceptualization, data curation, formal analysis, investigation, validation, writing - original draft, writing - review & editing

Name: Diana J. Castillo
Institution: Oregon State University
College, School or Department: Oregon State University Libraries
Address: 121 The Valley Library Corvallis OR 97331–4501 USA
Email: diana.castillo@oregonstate.edu
ORCID: https://orcid.org/0000-0001-9582-2071
Role: data curation, formal analysis, investigation, validation, writing - original draft, writing - review & editing

5. Publisher

Oregon State University

-------------------
CONTEXTUAL INFORMATION
-------------------

1. Abstract for the dataset - REQUIRED

This dataset contains the data generated for a study evaluating whether institutional research data policies in the US support or encourage the FAIR principles (Wilkinson et al 2016). The data include a list of criteria developed for the analysis, the reasoning behind the inclusion or exclusion of criteria from other sources, a comparison of the results of this study with the results of previous work by Briney et al 2015, and the resulting dataset with the values for each of the criteria for the 40 institutional research data policies that we identified.

2. Context of the research project that this dataset was collected for.

This dataset contains the data generated for a study evaluating whether institutional research data policies in the US support or encourage the FAIR principles (Wilkinson et al 2016). This dataset was also created to aid in the process of writing an institutional research data policy. The dataset captures examples of the language used in 40 policies.

3. Date of data collection:

Identification of policies was conducted in 2020, and finalized on 2020-09-01. Coding of the policies according to the criteria was done in 2021 and 2022.

4. Geographic location of data collection:

This analysis only includes policies from higher education institutions with very high research activity (R1 institutions) in the United States.

5. Funding sources that supported the collection of the data:

Two interns, paid by Oregon State University Libraries, helped with the collection of data.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data - REQUIRED

This work is licensed under a Creative Commons No Rights Reserved (CC0) license.

2. Licenses/restrictions placed on the code

None.

3. Links to publications related to the dataset:

The dataset that explains how the policies were identified is:

Llebot, C. & Castillo, D. J. (2021) Research Data Policies at Doctoral Universities with very high research activity (R1 institutions) [Data set]. Oregon State University. https://doi.org/10.7267/4j03d6263

An article interpreting the data in this dataset has been submitted for review:

Llebot, C. & Castillo, D. J. Are Institutional Research Data Policies in the US Supporting the FAIR Principles? A Content Analysis.

4. Recommended citation for the data:

Llebot, C. & Castillo, D. J. (2022) Content analysis of institutional research data policies in the US. [Data set]. Oregon State University. https://doi.org/10.7267/2n49t893t

6. Dataset Digital Object Identifier (DOI)

https://doi.org/10.7267/2n49t893t

7. Limitations to reuse

None.

--------------------------
VERSIONING AND PROVENANCE
--------------------------

1. Last modification date

2022-07-29

2. Links/relationships to other versions of this dataset:

None.

3. Was data derived from another source?

The dataset that explains how the policies were identified is:

Llebot, C. & Castillo, D. J. (2021) Research Data Policies at Doctoral Universities with very high research activity (R1 institutions) [Data set]. Oregon State University. https://doi.org/10.7267/4j03d6263

4. Additional related data collected that was not included in the current data package:

None.

--------------------------
METHODOLOGICAL INFORMATION - REQUIRED
--------------------------

A general description of the methods used to generate this dataset can be found in the following article, pending publication:

Llebot, C. & Castillo, D. J. Are Institutional Research Data Policies in the US Supporting the FAIR Principles? A Content Analysis.

1. Description of methods used for collection/generation of data:

A. Selection of policies:

For a complete description of how university policies were identified, and of the requirements for inclusion in this study, see:

Llebot, C. & Castillo, D. J. (2021) Research Data Policies at Doctoral Universities with very high research activity (R1 institutions) [Data set]. Oregon State University. https://doi.org/10.7267/4j03d6263

Policies included in this study must fulfill two requirements:

    1. The policy is a standalone policy that refers exclusively to research data. This excludes policies that mention research data but do not focus on it, such as intellectual property (IP) policies.
    2. The policy is a university-wide official policy, created by university administration. This excludes faculty senate policies, faculty workbooks, etc., and policies from departments and colleges.

B. Development of criteria:

The criteria for this work were obtained from three sources:

    1. The FAIR Policy Landscape Analysis. Davidson, Joy, Claudia Engelhardt, Vanessa Proudman, Lennart Stoy, and Angus Whyte. 2019. “D3.1 FAIR Policy Landscape Analysis,” December. https://doi.org/10.5281/ZENODO.3558172.
    2. The Briney, Goben, and Zilinski article. Briney, Kristin, Abigail Goben, and Lisa Zilinski. 2015. “Do You Have an Institutional Data Policy? A Review of the Current Landscape of Library Data Services and Institutional Data Policies.” Journal of Librarianship and Scholarly Communication 3 (2): 1232. https://doi.org/10.7710/2162-3309.1232.
    3. The authors of this study created some new criteria.

The file decisionmaking_documentation_for_criteria.csv in this dataset gives the reasons why each of the criteria was included, and why some of the criteria created in the FAIR Policy Landscape Analysis were not included.

The final list of criteria is included in the file final_criteria_with_descriptions.csv

C. Content analysis

Each of the authors of this dataset coded the 40 identified policies for each of the criteria developed. We describe how we interpreted each of the criteria in final_criteria_with_descriptions.csv

The authors recorded comments about their decisions, and copied the parts of the policy that supported their choices. These comments can be found in content_analysis_with_comments.csv

The content analysis method allows us to count how frequently each of the criteria occurs in policies. For more details about the content analysis method see Byrne (2016).

Byrne, David. 2016. “Data Analysis and Interpretation.” In Research Project Planner, by David Byrne. 1 Oliver’s Yard, 55 City Road, London EC1Y 1SP United Kingdom: SAGE Publications, Inc. https://doi.org/10.4135/9781526408570.

2. Describe any quality-assurance procedures performed on the data:

We used a similar dataset compiled by other authors in 2014 (Briney et al 2015) to ensure that we had not missed any policies that had been identified previously. We also used that study to estimate how many new policies have been published, and how many policies have been updated, since 2014.

Briney, Kristin; Goben, Abigail; Zilinski, Lisa, 2015, “Data from: Do You Have an Institutional Data Policy? A Review of the Current Landscape of Library Data Services and Institutional Data Policies”, http://dx.doi.org/10.7910/DVN/GAZPAJ, Harvard Dataverse

---------------------
DATA & FILE OVERVIEW - REQUIRED
---------------------

1. File List

A. Filename: Compare_Briney2015_ThisStudy.csv
   Short description: Comparison between the data published by Briney et al 2015 and the results of this study.

B. Filename: content_analysis.csv
   Short description: Results of the content analysis, where we note how we coded each of the policies for each of the criteria.

C. Filename: content_analysis_with_comments.csv
   Short description: Same as content_analysis.csv: the results of the coding of each policy for each of the criteria. This version also includes the comments that each of the authors made when coding. These comments sometimes explain the rationale for choosing one value over another. Often the comments include text copied from the policy that justifies the value assigned. The comments have not been revised for accuracy, so it is possible that in some cases a discussion between the authors prompted a change in a value, and that the discussion is not reflected in the comments.

D. Filename: final_criteria_with_descriptions.csv
   Short description: List of criteria used in this study, with explanations of how we interpreted them and the possible values for each of the criteria.

E. Filename: decisionmaking_documentation_for_criteria.csv
   Short description: List of all the criteria from our sources, and descriptions of why each of them was included in or excluded from this study.

2. Formats

All files are csv tabular files.

-----------------------------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: Compare_Briney2015_ThisStudy.csv
-----------------------------------------

1. Number of variables: 17

2. Number of cases/rows: 135

3. Missing data codes:
    blank    Not Applicable
    NA       As defined in the Values below.

4. Variable List

Institution
    Description: Name of the institution
    Values: text.

R12014
    Description: Whether the institution was an R1 institution in 2014 according to the data in Briney et al 2015.
    Values: yes = yes, it was an R1 institution in 2014.
            no = no, it was not an R1 institution in 2014.

DataPolicyBriney
    Description: Briney et al concluded that this university had a data policy in 2014.
    Values: yes = yes, the institution had a data policy according to Briney 2015.
            no = no, the institution did not have a data policy according to Briney 2015.
            NA = The institution was not an R1 or R2 institution in 2014 but was an R1 institution in 2020.

DataPolicyBrineyRevised
    Description: Data from Briney et al (column DataPolicyBriney), modified. We have turned "yes" into "no" when the policy from 2014 does not fit the criteria we are using in this study.
    Values: yes = yes, the institution had a data policy.
            no = no, the institution did not have a data policy.
            NA = The institution was not an R1 or R2 institution in 2014 but was an R1 institution in 2020.

CommentRevision
    Description: If there is a change between DataPolicyBriney and DataPolicyBrineyRevised, this comment explains why.
    Values: definition_does_not_apply = what Briney et al identified as a policy in their study does not qualify as a policy according to our definition.
            other = a reason different than definition_does_not_apply.

CommentDefinitionDoesNotApply
    Description: If CommentRevision is definition_does_not_apply, explains why the definition does not apply.
    Values: text

R12020
    Description: Whether the institution was an R1 institution in 2020 according to the Carnegie classification in January 2020.
    Values: yes = yes, it was an R1 institution in 2020.
            no = no, it was not an R1 institution in 2020.

DataPolicyLlebot
    Description: Llebot and Castillo concluded that this university had a data policy in 2020.
    Values: yes = yes, the institution had a data policy according to Llebot and Castillo.
            no = no, the institution did not have a data policy according to Llebot and Castillo.

SameR1Status
    Description: The university had R1 status in 2014 (according to the Briney study) and in 2020.
    Values: yes = the university had R1 status in 2014 and 2020.
            no = the university did not have R1 status in 2014, in 2020, or in both years.

DataPolicy14and20
    Description: Indicates when there is a policy both in 2014 and in 2020. Uses formula "=if(AND(D106="yes",H106="yes"), "yes","no")"
    Values: yes = DataPolicyLlebot is "yes" AND DataPolicyBrineyRevised is "yes".
            no = either DataPolicyLlebot is not "yes" or DataPolicyBrineyRevised is not "yes"

SamePolicy
    Description: If both DataPolicyBrineyRevised and DataPolicyLlebot have a "yes" (that is, if DataPolicy14and20 is "yes"), we checked whether it was the same document. Briney et al graciously gave us the documents that they had used in their study to complete this check.
    Values: yes = the policies are the same.
            no = there are differences in the policy.
            NA = DataPolicyBriney and DataPolicyLlebot are not both "yes"

NoSamePolicyComment
    Description: If SamePolicy is "no", explanation of what changed.
    Values: text

Inconsistency
    Description: Quality control. Checks if there are inconsistencies in the data. An inconsistency is defined as a university having a "yes" in DataPolicyBrineyRevised and a "no" in DataPolicyLlebot. Uses formula "=if(AND(D2="yes",H2="no"), "yes","no")"
    Values: yes = inconsistency, DataPolicyBrineyRevised is "yes", but DataPolicyLlebot is "no".
            no = no inconsistency.

NewPolicy
    Description: Identifies all cases that did not have a policy in the earlier study, but have a policy in 2020. DataPolicyBrineyRevised = "no" and DataPolicyLlebot = "yes". Uses formula "=if(AND(D2="no",H2="yes"), "yes","no")"
    Values: yes = new policy.
            no = no new policy.

NewPolicyComment
    Description: Comment about the new policy when it seems appropriate.
    Values: text

NewPolicyNA
    Description: Identifies cases where we do not know whether there was a policy in the earlier study, because the institution was not R1 or R2 in 2014, but where there is a policy now. DataPolicyBrineyRevised = "NA" and DataPolicyLlebot = "yes". Uses formula "=if(AND(D2="NA",H2="yes"), "yes","no")"
    Values: yes = policy in 2020, no data in 2014.
            no = all cases different than yes.

NoPolicy
    Description: Identifies cases that have never had a policy. DataPolicyBrineyRevised = "no" and DataPolicyLlebot = "no". Uses formula "=if(AND(D2="no",H2="no"), "yes","no")"
    Values: yes = university that did not have a policy in 2014 (DataPolicyBrineyRevised = "no") and did not have a policy in 2020 (DataPolicyLlebot = "no").
            no = universities that had a policy, or no data, in either 2014 or 2020.

-----------------------------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: content_analysis.csv
-----------------------------------------

1. Number of variables: 37

2. Number of cases/rows: 43

3. Missing data codes:
    There should not be any blanks. Many of the possible coding values are NA, meaning Not Mentioned, as explained in final_criteria_with_descriptions.csv

4. Variable List

PolicyNumber
    Description: Number assigned to each policy
    Value: text following the pattern "PolicyX" where X is a number between 1 and 40.
    Note: Harvard University had three different policies covering content that other universities usually have in one policy. For this reason there are four lines for Harvard University: one for each of the three policies, and one with a summary of the information in all of them. In this case we assigned the same number to all of the Harvard policies and added a letter to differentiate between them.

CriteriaX
    Description: Criteria number, where X is a number. X corresponds to the numbers of the criteria in final_criteria_with_descriptions.csv. The second row of this file has the text of each of the criteria. Data start at row 3.
    Value: see the possible values for each of the criteria in file final_criteria_with_descriptions.csv

-----------------------------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: content_analysis_with_comments.csv
-----------------------------------------

1. Number of variables: 37

2. Number of cases/rows: 129

3. Missing data codes:
    blank    Authors did not include any comment.
    Many of the possible coding values are NA, meaning Not Mentioned, as explained in final_criteria_with_descriptions.csv

4. Variable List

PolicyNumber
    Description: Identifies the policy and the type of content in the row: either the coding for the policy, or the comments about the coding by one of the authors.
    Value: "PolicyX", where X is a number between 1 and 40, indicates that the row holds the coding results for that policy.
           "CommentsPolicyX_Clara" indicates that the row holds the comments by author Clara Llebot about the coding of PolicyX.
           "CommentsPolicyX_Diana" indicates that the row holds the comments by author Diana J Castillo about the coding of PolicyX.
    Note: Harvard University had three different policies covering content that other universities usually have in one policy. For this reason there are four lines for Harvard University: one for each of the three policies, and one with a summary of the information in all of them. In this case we assigned the same number to all of the Harvard policies and added a letter to differentiate between them. Each of the policies has rows for the comments of each of the authors.

CriteriaX
    Description: Criteria number, where X is a number. X corresponds to the numbers of the criteria in final_criteria_with_descriptions.csv. The second row of this file has the text of each of the criteria. Data start at row 3.
    Value: For "PolicyX" rows, see the possible values for each of the criteria in file final_criteria_with_descriptions.csv. For "CommentsPolicyX" rows, the value is text. The comments can be either extracts copied from the policy that justify the coding decision, or explanations by the authors of why they coded with a particular value.
    Note: The comments have not been revised for accuracy, so it is possible that in some cases a discussion between the authors prompted a change in a value, and that the discussion is not reflected in the comments.

-----------------------------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: final_criteria_with_descriptions.csv
-----------------------------------------

1. Number of variables: 4

2. Number of cases/rows: 36

3. Missing data codes:
    NA    NA means Not mentioned in the policy.

4. Variable List

CriteriaNumber
    Description: Number of the criteria. Criteria that depend on other criteria have been numbered as a sub-record of the parent criteria.
    Values: number, as in X, or X.Y in the case of dependent criteria.

Criteria
    Description: Criteria used for the analysis
    Values: text

Instructions
    Description: Notes that explain how we have interpreted each of the criteria, with instructions to code with precision
    Values: text

Values
    Description: Possible values for each of the criteria, separated by semicolons
    Values: text

-----------------------------------------
TABULAR DATA-SPECIFIC INFORMATION FOR: decisionmaking_documentation_for_criteria.csv
-----------------------------------------

1. Number of variables: 7

2. Number of cases/rows: 54

3. Missing data codes:
    Blank    Not assigned final criteria number

4. Variable List

Final Criteria Number
    Description: The number assigned to a criteria if it was included in our final list of criteria
    Values: integral

Criteria
    Description: Criteria used for the analysis
    Values: text

Include/Exclude
    Description: Whether or not a criteria was included in the final criteria list
    Values: text

Reasons for not including/excluding
    Description: Reason why a criteria was included in or excluded from the final criteria list
    Values: text

Origin: FAIRsFAIR
    Description: If the criteria came from the FAIRsFAIR report
    Values: text

Origin: Briney
    Description: If the criteria came from the Briney et al. study
    Values: text

Origin: this study
    Description: If the criteria originated with this study
    Values: text

-----------------------------------------
HUMAN SUBJECT RESEARCH STUDY INFORMATION:
-----------------------------------------

1. Was the study evaluated by an Institutional Review Board (IRB) or any other Ethics Committee?

No.
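
-----------------------------------------
EXAMPLE: RECOMPUTING THE DERIVED COLUMNS OF Compare_Briney2015_ThisStudy.csv
-----------------------------------------

The columns DataPolicy14and20, Inconsistency, NewPolicy, NewPolicyNA and NoPolicy are described above as spreadsheet formulas over DataPolicyBrineyRevised and DataPolicyLlebot. The following is a minimal Python sketch of how a reuser might recompute those columns and compare them with the stored values. It is not part of the original workflow. It assumes that the first row of the file holds the column names listed in the Variable List and that the values are stored exactly as "yes", "no" and "NA". The function names and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io

DERIVED_COLUMNS = (
    "DataPolicy14and20", "Inconsistency", "NewPolicy", "NewPolicyNA", "NoPolicy",
)


def derive_flags(briney_revised, llebot):
    """Recompute the derived yes/no columns from the two source columns.

    Mirrors the AND(...) formulas quoted in the Variable List, using exact
    string comparison (an assumption about how the values are stored).
    """
    def yes_no(condition):
        return "yes" if condition else "no"

    return {
        "DataPolicy14and20": yes_no(briney_revised == "yes" and llebot == "yes"),
        "Inconsistency": yes_no(briney_revised == "yes" and llebot == "no"),
        "NewPolicy": yes_no(briney_revised == "no" and llebot == "yes"),
        "NewPolicyNA": yes_no(briney_revised == "NA" and llebot == "yes"),
        "NoPolicy": yes_no(briney_revised == "no" and llebot == "no"),
    }


def find_mismatches(rows):
    """Return (institution, column, stored, expected) for each differing value."""
    mismatches = []
    for row in rows:
        expected = derive_flags(row["DataPolicyBrineyRevised"], row["DataPolicyLlebot"])
        for column in DERIVED_COLUMNS:
            if row[column] != expected[column]:
                mismatches.append(
                    (row["Institution"], column, row[column], expected[column])
                )
    return mismatches


# Hypothetical sample rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """Institution,DataPolicyBrineyRevised,DataPolicyLlebot,DataPolicy14and20,Inconsistency,NewPolicy,NewPolicyNA,NoPolicy
University A,yes,yes,yes,no,no,no,no
University B,NA,yes,no,no,no,yes,no
University C,no,no,no,no,no,no,yes
"""

# csv.DictReader keeps every cell as a string, so "NA" stays a literal value.
sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(find_mismatches(sample_rows))  # prints [] because the sample is consistent
```

To run the check against the real file, the io.StringIO(...) object can be replaced with open("Compare_Briney2015_ThisStudy.csv", newline=""), adjusting the encoding if needed. Note that "NA" is a meaningful code in this dataset. Tools that treat "NA" as a missing value by default (for example pandas.read_csv, unless keep_default_na=False is passed) would otherwise hide the distinction between NA and blank cells.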