Research Data Management - Guidance for CEU Students

Blinken OSA professionals can help you develop your research data strategy when designing a project, planning your research, or writing your thesis. This short guide summarizes the basics you should know about Research Data Management. For a personal consultation, please email Ms Katalin Dobó at dobok@ceu.edu.

Introduction and basic concepts

Research data: "recorded factual material" (Butterfly collection at the Nature-Historical Museum, Admont)

What are Research Data?

All data generated by you in the course of your work, or collected from other sources and reused for your research.

Examples of research data include primary data, publications, physical collections, samples, software and models. In the field of humanities and social sciences researchers may produce and use data recorded in the form of images, videos, survey responses, interview transcripts, statistics, demographics, or opinion polling.

Research data can be stored on paper, audiovisual media, and in various analog and digital formats.

What is Research Data Management?

Handling your data during the project. RDM includes planning, documenting data, formatting data, storing data, anonymizing data, and controlling access to your data. RDM also offers practices that support long-term preservation, access, and use of your data after the project has been completed.

A good research data strategy supports the reproducibility and transparency of your research. It also makes it possible for other scientists to reuse your primary data, and significantly enhances the citation rate of your work.

We suggest that you integrate data management into your research rather than waiting until the end of the project, when you are unlikely to want to spend time documenting data you are no longer actively using.

What is a Data Management Plan and why do you need it?

A DMP is a formal document that describes the data produced during a research project and outlines data management strategies that will be implemented during and after the active phase of the research project.

The practical aim of having a proper Data Management Plan is to meet the expectations and requirements of your university, research funder, publisher, or of the relevant legislation.

If your research is funded by the National Science Foundation, the National Endowment for the Humanities, or Horizon 2020 -- to name only a few of the biggest funding bodies -- you have to:

  • include a Data Management Plan in your grant application, and keep it up to date;
  • share all of the data generated in your project: that is, deposit your data in a research data repository where third parties can access, reproduce, and disseminate them, and provide the tools needed to use the raw data to validate your research results.

In most cases you are expected to preserve your research data for at least 10 years.

This is a useful checklist to consider when writing a DMP. It covers all the data management-related activities:

  • What data will you collect, generate or create, and how?
  • What documentation and metadata standards will be used to describe your data?
  • How will you manage ethical, copyright and intellectual property rights issues?
  • How will the data be stored and backed up during the active phase of your research? What is your plan for long-term preservation?
  • How will you set access controls and security measures?
  • Which data should be retained, shared, and archived?
  • How will you share your datasets? Will you impose restrictions on data sharing?
  • What data management costs do you anticipate?

There are two excellent web-based, open source tools that provide guidance and templates to write your Data Management Plan:

Practical solutions to data-related problems

Organizing your data

Some tips about organizing your data:

  • When choosing file formats, prefer non-proprietary, open standards and use unencrypted, uncompressed formats;
  • Research data files and folders need to be properly labeled and organized. Plan folder hierarchy in advance;
  • Name your files according to agreed conventions. Develop a scheme that includes all important metadata;
  • Consistently identify and distinguish versions of research data files;
  • Provide a method for easy adoption;
  • Be consistent throughout the process.
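A naming convention is easier to follow when it is generated rather than typed by hand. The Python sketch below is only an illustration: the chosen pattern (project, description, ISO date, two-digit version) and the function name `build_filename` are assumptions, not a CEU or Blinken OSA standard, and should be adapted to whatever convention your project agrees on.

```python
import re
from datetime import date


def build_filename(project: str, description: str, created: date,
                   version: int, extension: str) -> str:
    """Build a file name of the form project_description_YYYY-MM-DD_vNN.ext."""
    def slug(text: str) -> str:
        # Lowercase the text and replace runs of other characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lstrip(".").lower()
    return (f"{slug(project)}_{slug(description)}_"
            f"{created.isoformat()}_v{version:02d}.{ext}")


# Hypothetical project and file, used only to show the pattern.
name = build_filename("OralHist", "Interview 03", date(2024, 5, 17), 2, ".TXT")
print(name)  # oralhist_interview-03_2024-05-17_v02.txt
```

Putting the date in ISO order and zero-padding the version number means that an alphabetical sort of the folder is also a chronological and version-ordered one.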

Describing your data

Two metadata standards are commonly used to describe research data:

  • Dublin Core http://dublincore.org is a general purpose metadata standard, a vocabulary of fifteen properties for use in resource description;
  • DDI (Data Documentation Initiative) http://www.ddialliance.org describes data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences. It helps you to generate codebooks, design and edit questionnaires, manage datasets, etc.
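To make the idea of a fifteen-property vocabulary concrete, here is a minimal Python sketch that writes a simple Dublin Core description as XML using only the standard library. The record contents, the wrapping `metadata` element, and the function name `dublin_core_record` are illustrative assumptions; a repository will normally prescribe its own wrapper and required fields.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields: dict) -> str:
    """Serialize a mapping of Dublin Core element names to values as XML."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset description, for illustration only.
record = dublin_core_record({
    "title": "Interview transcripts on housing policy",
    "creator": "Example, Student",
    "date": "2024-05-17",
    "type": "Dataset",
    "format": "text/plain",
    "language": "en",
    "rights": "CC BY 4.0",
})
print(record)
```

Rejecting unknown element names early is a small design choice that catches typos such as "author" instead of "creator" before the record reaches a repository.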

Each discipline may have specific requirements regarding required metadata elements. There are different standards for specific domains or for particular types of data. We can provide assistance and help you select the most appropriate metadata standard to use.

Digital data should also have a unique and persistent identifier.

Backing up and storing your data

You may need to clean, manipulate, or process your data. At this stage it is important to create a “master” version to be analyzed and eventually archived, and to document all the changes you made to the raw data. You should also document the hardware and software specifications, and the model and code you used to run the analysis.

When backing up your data, follow a simple rule called the 3-2-1 rule: keep three regularly maintained copies of your data, two locally on two different devices and one off site (the off-site copy could be stored in the cloud). Data storage during the active phase of your research may differ from what you need later: for long-term preservation of your data you should consider different services (see below).
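A backup is only useful if the copy is identical to the original. The Python sketch below, offered as one possible approach rather than a recommended tool, copies a master file to several folders and confirms each copy by comparing SHA-256 checksums. The function names and the folder names (`second_device`, `off_site`) are hypothetical; in practice they would point to a second drive and to a synchronized off-site location.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list) -> dict:
    """Copy `source` into each destination folder and verify every copy."""
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        # True only if the copy has the same checksum as the master file.
        results[str(copy)] = sha256_of(copy) == expected
    return results


# Demonstration in a temporary directory; the folders stand in for real devices.
with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "master" / "survey.csv"
    master.parent.mkdir()
    master.write_text("id,answer\n1,agree\n", encoding="utf-8")
    results = back_up(master, [Path(tmp) / "second_device", Path(tmp) / "off_site"])

print(results)
```

Together with the master file, the two verified copies make up the three copies the rule asks for.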

Your data -- whether you collect or produce them during your research -- are expensive and often irreplaceable. Always use up-to-date anti-virus software and strong passwords.

If you work with personal or sensitive data, you are expected not to use personal computers or commercial cloud services for data storage, since to do so would represent a clear security risk. Highly sensitive electronic data should be encrypted.

OSA Archivum can assist you by providing a safe place and by storing your paper or audiovisual records with sensitive data during your studies at CEU.  

Preserving and sharing your data

Long-term digital preservation has three primary goals: to safeguard the integrity of your content, to prove its authenticity, and to ensure its accessibility. Integrity means that your data have not been corrupted over time or in transit. Authenticity means that your data are genuine and free from tampering with respect to their physical characteristics, structure, content, and context. Accessibility means that you will be able to retrieve and read the content even decades later.
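Integrity in this sense can be checked mechanically. One common approach, sketched here in Python under simplifying assumptions, is to record a checksum for every file when the data are finalized and to recompute the checksums later; any mismatch signals corruption or alteration. The manifest format (a JSON file) and the function names are illustrative choices, not a repository requirement.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def checksum(path: Path) -> str:
    # Reads the whole file into memory, which is acceptable for a small sketch.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(folder: Path, manifest: Path) -> None:
    """Record a checksum for every file under `folder`."""
    entries = {str(p.relative_to(folder)): checksum(p)
               for p in sorted(folder.rglob("*")) if p.is_file()}
    manifest.write_text(json.dumps(entries, indent=2), encoding="utf-8")


def changed_files(folder: Path, manifest: Path) -> list:
    """Return files that are missing or whose content no longer matches."""
    entries = json.loads(manifest.read_text(encoding="utf-8"))
    return [name for name, digest in entries.items()
            if not (folder / name).is_file()
            or checksum(folder / name) != digest]


# Demonstration with a hypothetical file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    data.mkdir()
    (data / "transcript.txt").write_text("original text", encoding="utf-8")
    manifest = Path(tmp) / "manifest.json"
    write_manifest(data, manifest)
    before = changed_files(data, manifest)
    (data / "transcript.txt").write_text("altered text", encoding="utf-8")
    after = changed_files(data, manifest)

print(before, after)  # [] ['transcript.txt']
```

A checksum manifest detects change but does not by itself prove authenticity or guarantee accessibility; those depend on documentation, provenance records, and durable file formats.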

There are multiple risks that can lead to the loss of digital data. Data are subject to degradation over time; software and hardware become obsolete. Without appropriate documentation (codebooks, descriptive metadata, etc.) your data may be impossible to interpret in the future.

A trustworthy repository can assist you in combating these risks. Digital preservation experts use a set of criteria for assessing the capability of repositories to maintain digital content over time. We can help you identify an appropriate repository to archive your data, whether a data-type-specific, general, or institutional repository.

You can browse lists of data repositories from different academic disciplines in global repository registries such as:

When you deposit your research data in an open repository, you are required to share your content under a standard Creative Commons license. The most commonly applied license is CC BY 4.0, which allows others to freely share and adapt your uploaded data. The same terms apply when you use other people's data under this license: you must give appropriate credit and indicate if changes to the original data were made.

Legal issues

When collecting sensitive data from individuals you should protect the confidentiality of the information and the subject's privacy through de-identification or anonymization. You should carefully consider how your study participants could be identified, both directly and indirectly, as well as how anonymization may affect potential reuse of your data.
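As a small illustration of de-identification, the Python sketch below replaces direct identifiers with stable codes derived from a keyed hash (HMAC-SHA256). This is pseudonymization rather than full anonymization: indirect identifiers such as age or location are left untouched and may still allow re-identification, and anyone holding the key can regenerate the codes. The field names, sample rows, and key are hypothetical.

```python
import hashlib
import hmac


def pseudonym(identifier: str, secret_key: bytes, length: int = 12) -> str:
    """Map a direct identifier to a stable code using HMAC-SHA256."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()[:length]


def de_identify(rows: list, direct_identifiers: list, secret_key: bytes) -> list:
    """Return copies of the rows with direct identifiers replaced by codes."""
    cleaned = []
    for row in rows:
        copy = dict(row)  # leave the original row unchanged
        for field in direct_identifiers:
            if field in copy:
                copy[field] = pseudonym(str(copy[field]), secret_key)
        cleaned.append(copy)
    return cleaned


# Hypothetical key: generate a random one and store it separately from the data.
key = b"replace-with-a-randomly-generated-secret"
rows = [
    {"name": "Anna Example", "age": 34, "answer": "agree"},
    {"name": "anna example ", "age": 34, "answer": "disagree"},
]
cleaned = de_identify(rows, ["name"], key)
print(cleaned[0]["name"] == cleaned[1]["name"])  # True: same person, same code
```

Using a keyed hash instead of a plain hash means the codes cannot be reproduced by someone who merely guesses names, as long as the key is kept secret.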

Intellectual property rights should be clarified while you are collecting or producing data during your research. You must obtain all necessary permissions from data owners before sharing these data.

Consult us to determine what regulations, policies, and restrictions may apply to your research data.