This information is factual only and not to be considered legal advice. Contact the university legal advisors at the Vice-Chancellor's office if legal advice is needed.
How to organise research data during an ongoing project - on a project level and on a data level.
How a research project’s data is organised depends mainly on the researcher or researchers in the project. It is important that the method of organising data is well established among the people handling the data. Above all, it is important to settle on a structure for describing and storing data as and when it is produced in the project.
The structure needs to be logical, predictable and as intuitive as possible, to lower any thresholds that might keep it from being used in everyday work. For a research group, the structure needs to be described and well understood so that everyone uses it.
If you are the only researcher, it is sufficient if the structure is understandable to yourself during the ongoing project, but once the project is complete and the research data needs to be stored, the structure also needs to be logical and understandable to others. This is why it can be good either to work according to a structure that others understand, or to schedule time to decipher data and reorganise the structure at the end of the project.
There are guidelines and recommendations that can be useful when planning what information needs to be tied to data and how it is to be structured. Information concerning the project as a whole can be structured on an overall level. This could cover descriptions of:
On a data level, each file or unit of data needs to be organised so that its content and what has happened to it remain understandable over time.
Raw data is data that has not been coded, grouped, refined or modified in any way. Raw data has more potential uses than modified data and increases the possibilities of reusing the data. Hence, sets of raw data should be kept untouched if possible, and any processing should be done on copies of the raw data. Depending on the research field, examples of raw data can be:
Be aware that raw data can contain sensitive data, for instance personal data, that needs to be omitted in later revisions.
Modified data should be organised in a predictable structure, marked with information on content and revision.
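As an illustration of keeping raw data untouched and working on marked copies, the following is a minimal sketch in Python. The function names, the folder layout and the `_revNN` naming convention are assumptions made for the example, not a prescribed method.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(raw_file: Path, work_dir: Path, revision: int) -> Path:
    """Copy a raw file into work_dir and mark the original read-only."""
    work_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming convention: <name>_rev<NN><extension>
    target = work_dir / f"{raw_file.stem}_rev{revision:02d}{raw_file.suffix}"
    # copyfile copies the content only, so the copy stays writable
    shutil.copyfile(raw_file, target)
    # Remove write permission from the raw file to guard against accidental edits
    raw_file.chmod(stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)
    return target
```

Comparing `sha256_of(raw_file)` before and after a processing step is one way to confirm that the raw file has not changed.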
Organising quantitative data is closely linked to the tools and formats used in data processing. Hence, you need to start out from the structure that the data is to be sorted into and the values to be entered. This goes for both databases and spreadsheets.
A database is a system of information in which it is easy to find, organise and reorganise or edit information in various ways. In order to accommodate this, it is important that the contents of the database are organised in a logical and consistent way. The content of the database must also be described.
Suitable software for databases can, for instance, be Microsoft Access, MySQL, Microsoft SQL Server, Oracle and PostgreSQL. How complex a database is varies, and there are often good opportunities for users to control the database functions themselves.
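One way to describe the content of a database is to store the description inside the database itself. The sketch below uses SQLite through Python's built-in `sqlite3` module, which is an assumption chosen for convenience and is not one of the products listed above. The table names, the column names and the `data_dictionary` table are hypothetical.

```python
import sqlite3


def create_described_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create an example data table plus a table that describes its columns."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE measurement (
            measurement_id INTEGER PRIMARY KEY,
            sample_code    TEXT NOT NULL,
            temperature_c  REAL,
            measured_on    TEXT
        );
        CREATE TABLE data_dictionary (
            table_name  TEXT NOT NULL,
            column_name TEXT NOT NULL,
            description TEXT NOT NULL,
            unit        TEXT,
            PRIMARY KEY (table_name, column_name)
        );
        """
    )
    conn.executemany(
        "INSERT INTO data_dictionary VALUES (?, ?, ?, ?)",
        [
            ("measurement", "measurement_id", "Running number of the measurement", None),
            ("measurement", "sample_code", "Code identifying the sample", None),
            ("measurement", "temperature_c", "Measured temperature", "degrees Celsius"),
            ("measurement", "measured_on", "Date of measurement (YYYY-MM-DD)", None),
        ],
    )
    conn.commit()
    return conn


def undocumented_columns(conn: sqlite3.Connection, table: str) -> list:
    """Return the columns of `table` that have no data_dictionary entry."""
    # The table name is interpolated directly, so pass trusted names only.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    described = {
        row[0]
        for row in conn.execute(
            "SELECT column_name FROM data_dictionary WHERE table_name = ?",
            (table,),
        )
    }
    return [name for name in columns if name not in described]
```

Running `undocumented_columns(conn, "measurement")` from time to time shows whether columns have been added without a description.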
Spreadsheets are a simple form of database in which data is organised in tabulated sheets. Besides data variables, the spreadsheet can contain generated summaries in the form of, for example, charts, graphs or tables. Pictures can also be included, and formulas can be used to generate values or functions. How the spreadsheet may be used can be controlled down to the individual cell.
Spreadsheet software is available, for instance, in the Microsoft Office, Open Office and WordPerfect Office suites. All of these software packages also support XML-based file formats.
There are a few things worth considering and adhering to in order to make quantitative data as consistently and easily organised as possible, also in the long run:
For more information about the processing of quantitative data and preparing quantitative data for long-term storage and publication, we recommend the guides to best practices provided by the Swedish National Data Service. In their Guides to Best Practice on Databases and Spreadsheets (in Swedish), you can find more information about the processing of specific formats such as XML and SPSS. A link is found under the headline “Further information” below.
Qualitative data in text format can be described in the introduction of the document. Relevant information can be:
If the data has been modified, information about the modification must be included. Formats other than text files may need to be accompanied by a document containing the information described above. Pieces of information that are important for understanding each other should preferably be stored together.
It is advantageous if pseudonyms or other means of anonymising informants are used consistently, so that each informant is always referred to by the same pseudonym. For instance, I1, I2 and I3 for informants 1, 2 and 3, even if the informants are handled in separate documents.
Information about what characterises each informant can be summarised in a separate document, including information about what files they are mentioned in, to simplify searches in the material.
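Consistent pseudonyms and a summary of which files mention each informant could be kept with a small helper such as the one sketched below. The class name `InformantRegistry` and its methods are hypothetical, and the example names are invented.

```python
class InformantRegistry:
    """Assign each informant one stable code (I1, I2, ...) across all files."""

    def __init__(self):
        # The name-to-code key identifies individuals, so it must be stored
        # separately from the anonymised material.
        self._codes = {}
        self._files = {}

    def code_for(self, name: str, file_name: str) -> str:
        """Return the informant's code and note that file_name mentions them."""
        if name not in self._codes:
            self._codes[name] = f"I{len(self._codes) + 1}"
        code = self._codes[name]
        self._files.setdefault(code, set()).add(file_name)
        return code

    def overview(self) -> dict:
        """Return a mapping from each code to the sorted files that mention it."""
        return {code: sorted(files) for code, files in self._files.items()}
```

Calling `code_for` with the same name always returns the same code, regardless of which document the informant appears in, and `overview()` gives the per-informant file list that simplifies searches.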
File names can be used to describe the content in a structured way. How to do this can vary greatly, from using a number coding system combined with a key to the code, to describing in text in the file name what the file concerns. To keep track of various versions of the same material, the file name of each version can contain details of date and time.
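A descriptive file name with date and time can be generated rather than typed by hand, which helps keep the pattern consistent. The sketch below assumes a hypothetical pattern of project code, short content description and timestamp; any other agreed pattern would work equally well.

```python
import re
from datetime import datetime


def versioned_name(project: str, content: str, when: datetime, extension: str) -> str:
    """Build a file name of the form <project>_<content>_<YYYYMMDDTHHMM>.<ext>."""
    # Reduce the description to lowercase letters and digits joined by hyphens
    slug = re.sub(r"[^a-z0-9]+", "-", content.lower()).strip("-")
    # Year-first timestamps make alphabetical order match chronological order
    stamp = when.strftime("%Y%m%dT%H%M")
    return f"{project}_{slug}_{stamp}.{extension}"


example = versioned_name(
    "proj01", "Interview transcripts", datetime(2024, 3, 5, 14, 30), "txt"
)
# example == "proj01_interview-transcripts_20240305T1430.txt"
```

Because the timestamp starts with the year, sorting the file names alphabetically lists the versions of one file in the order they were created.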
Prior to final storage
Prior to final storage, an overview list of the data material will be needed in which the content of each file/part is described in a way that makes it easy for others to find the part they are interested in.
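Such an overview list can be drafted automatically and then completed by hand. The following sketch writes a CSV file with one row per data file. The function name `write_overview`, the column headings and the `descriptions` mapping are assumptions made for the example.

```python
import csv
from pathlib import Path


def write_overview(data_dir: Path, descriptions: dict, out_file: Path) -> list:
    """Write a CSV overview of the files under data_dir.

    Returns the relative paths of files that still lack a description.
    Place out_file outside data_dir so the overview does not list itself.
    """
    missing = []
    with out_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "size_bytes", "description"])
        for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
            relative = path.relative_to(data_dir).as_posix()
            description = descriptions.get(relative, "")
            if not description:
                missing.append(relative)
            writer.writerow([relative, path.stat().st_size, description])
    return missing
```

The returned list shows which files still need a description before the material is handed over for storage.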
For further information about describing research data during ongoing projects, we recommend the Best Practice guides from the Swedish National Data Service. These are only available in Swedish so far. There, you can find more information and recommendations on file formats, what type of material should preferably be stored together, and more information about what is important to consider for future long-term storage.
The UK Data Service pages on documenting research data provide concise information on data documentation on various levels.
Part 5 in the course BAS online from the Swedish National Data Service concerns “Documentation during the research process and principles for assessing if metadata is sufficient for secondary use”.