Research Services Librarians

Open data

Open data refers to research outputs (such as datasets and software, but excluding personal data and some government data) that are "available under an open (data) license that permits anyone freely to access, reuse and redistribute." See the Open data handbook.

The NZ government has an open data declaration that promotes transparency in government data to support better-informed decision making. Overseas, more than 160 research-intensive universities, represented by eight university networks, signed the Sorbonne Declaration on Research Data Rights in January 2020. The declaration sets out the need for, and benefits of, making research data open by default.

Best practices in open data include:

depositing data in data repositories (see re3data.org, a searchable registry of research data repositories, and DataCite)
citing data (see DataCite)
following the Open data handbook, which provides advice on how to open up and promote data
following the Peer Reviewers' Openness Initiative, whose reviewers and editors have set open data standards for manuscript review

Who supports open data?

Increasingly, publishers encourage or require researchers to make accessible the data underlying their published work, including datasets they create or use. You can find some examples of journals and publishers with these data policies here:

Sharing your data

To make the most of your dataset, consider making it discoverable and usable by other researchers. Dataset citations are becoming another way to demonstrate your academic impact.
Several repositories may be suitable for storing your datasets, depending on their size, confidentiality, and field of research. When choosing where to store your data, consider:

The size of your dataset
Where the storage servers are based
Whether the repository ensures persistent access
Whether a DOI or URI is supplied so that others can properly cite your data
The audience or users of the repository
What type of access you will allow others who wish to use your data
The formats in which you can store your data

Open data repositories

Some examples of repositories for open data:
Repository name	Fees/cost	Size limits
Figshare	Free for up to 20 GB of private space and unlimited public space; additional fees apply for larger datasets.	Files larger than 5 GB require contacting support to upload (up to 250 GB)
Zenodo	Free, but uploading and storing large files will prompt a discussion about donations toward sustainability.	50 GB per dataset; larger sizes can be discussed
Mendeley Data	Free	10 GB per dataset
Dryad	US$150 for the first 50 GB, plus US$50 for each additional 10 GB (fee waivers may be available)	Files larger than 300 GB require contacting support to upload
GitHub (code storage)	Free for public and open source projects	Recommended upload size under 5 GB; individual files larger than 100 MB are not allowed

For a list of subject-specific data repositories, see Data repositories.

Making your open data impactful

It is important to describe your data properly when you deposit it in a repository. The more information you give about the dataset and its potential uses, the more discoverable it will be to other researchers. This descriptive information is called metadata, and it is best practice to follow an established metadata standard to make your work discoverable. The Digital Curation Centre (DCC) provides lists of general and discipline-specific research data metadata standards that you can review.

The metadata you provide should give context, enable proper reuse, communicate any restrictions or limitations, support replication and validation of research, link to related research publications, and make the data findable by disciplinary content management systems, aggregators, publishers, and search engines (e.g. Google). Here are some things to consider when entering your dataset into a repository:

Use a descriptive title for the dataset
Avoid using too many acronyms when describing the data
Use at least three keywords to describe the dataset
Explain how the dataset has been used, and link to the publications associated with it
Give clear instructions if any software is needed to use or interpret the data
State the language of the dataset
Indicate the version of the dataset, if applicable
Inform users of their reuse rights
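Several items in the checklist above can be turned into a quick self-check before you deposit. The Python sketch below is hypothetical: `DatasetRecord`, `checklist_gaps`, the field names, and the three-word title threshold are illustrative assumptions, not the metadata schema of any particular repository or standard. It only covers the checklist items that can be checked mechanically.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str
    keywords: list[str] = field(default_factory=list)
    related_publications: list[str] = field(default_factory=list)  # DOIs or URLs
    language: str = ""
    version: str = ""  # leave empty if versioning does not apply
    license: str = ""  # states the reuse rights


def checklist_gaps(record: DatasetRecord) -> list[str]:
    """Return the checklist items this record does not yet meet."""
    gaps = []
    # Assumed heuristic: a title of fewer than three words is unlikely to be descriptive.
    if len(record.title.split()) < 3:
        gaps.append("use a descriptive title")
    # The guide recommends at least three keywords.
    if len(record.keywords) < 3:
        gaps.append("use at least three keywords")
    if not record.related_publications:
        gaps.append("link to associated publications")
    if not record.language:
        gaps.append("state the language of the dataset")
    if not record.license:
        gaps.append("inform users of reuse rights")
    return gaps


record = DatasetRecord(
    title="Soil moisture readings",
    keywords=["soil", "moisture"],
    language="English",
)
print(checklist_gaps(record))
```

For this example record, the check reports three gaps: too few keywords, no linked publications, and no reuse rights. Judgement-based items, such as avoiding excessive acronyms or giving clear software instructions, are deliberately left to a human reviewer.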