Organise data during ongoing research

How to organise research data during an ongoing project - on a project level and on a data level.

Keeping data in good order

How a research project’s data is organised depends mainly on the researcher or researchers in the project. It is important that the people handling the data are thoroughly familiar with the method of organising it. Above all, settle on a structure for describing and storing data as it is produced in the project.

The structure needs to be logical, predictable and as intuitive as possible, to lower any thresholds that might keep people from using it in everyday work. For a research group, the structure needs to be described and well understood so that everyone uses it.

If you are the only researcher, it is sufficient if the structure is understandable to yourself during the ongoing project, but once the project is complete and the research data needs to be stored, the structure also needs to be logical and understandable to others. This is why it can be a good idea either to work according to a structure that others understand, or to schedule time at the end of the project to decipher the data and reorganise the structure.

Overall level

There are guidelines and recommendations that can help you plan what information needs to be tied to the data and how it is to be structured. Information concerning the project as a whole can be structured on an overall level. This could cover descriptions of:

Research design
Method materials
Structure for data files
What data format is used and what type of resources are available in each format
Use of other data resources
Validation of data
Version control
Changes in working and investigatory material for studies in which data collection takes place over time
Secrecy and issues of access and use of various parts of the study’s data
Information on research output, for instance publications

Data level

At the data level, each file or unit of data needs to be organised so that its content, and what has happened to it, remains understandable over time.

Raw data

Raw data is data that has not been coded, grouped, refined or modified in any way. Raw data has more potential uses than modified data, which increases the possibilities of reusing it. Hence, copies or sets of raw data should be kept untouched if possible, and any processing should be done on copies of the raw data. Depending on the research field, examples of raw data can be:

measurement results
unprocessed statistics
sensory data
tests and test results
source texts
interviews (including notes and audio recordings)
unprocessed transcripts

Be aware that raw data can contain sensitive data, for instance personal data, that needs to be omitted in later revisions.

Modified data should be organised in a predictable structure, marked with information on content and revision.
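
One way to keep raw data untouched is to store it in its own folder and do all processing on copies. The following is a minimal sketch in Python, not a prescribed method. The folder names (raw and working) and the helper names sha256_of and make_working_copy are assumptions made for this illustration. The checksum is recorded so that you can verify later that the raw file has not changed.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file as a hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(raw_file: Path, working_dir: Path) -> tuple[Path, str]:
    """Copy a raw file into working_dir and mark the original read-only.

    Hypothetical helper: the folder layout is an assumption, so adapt it
    to the structure agreed on in your project.
    Returns the path of the copy and the checksum of the raw file.
    """
    checksum = sha256_of(raw_file)
    working_dir.mkdir(parents=True, exist_ok=True)
    copy_path = working_dir / raw_file.name
    # copyfile copies the content only, so the copy stays writable.
    shutil.copyfile(raw_file, copy_path)
    # Remove write permission from the raw file to guard against accidental edits.
    raw_file.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return copy_path, checksum
```

Storing the returned checksum together with the project documentation makes it possible to rerun sha256_of on the raw file at any time and compare the two values.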

Quantitative data

Organising quantitative data is closely linked to the tools and formats used in data processing. Hence, you need to start out from the structure that the data is to be sorted into and the values to be entered. This goes for both databases and spreadsheets.

Databases

A database is a system of information in which it is easy to find, organise and reorganise or edit information in various ways. In order to accommodate this, it is important that the content of the database is organised in a logical and consistent way. The content of the database must also be described.

Suitable software for databases can, for instance, be Microsoft Access, MySQL, Microsoft SQL Server, Oracle and PostgreSQL. How complex a database is varies, and there are often good opportunities for users to control the database functions themselves.
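
As a minimal sketch of what describing the content can look like, the example below stores a description of every column in the same database as the data. It uses SQLite through Python's built-in sqlite3 module, chosen here only because it needs no server; it is not one of the products named above. All table and column names (measurement, data_dictionary and so on) and the helper undocumented_columns are hypothetical.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; use a file path in practice.
conn = sqlite3.connect(":memory:")

conn.executescript(
    """
    CREATE TABLE measurement (
        sample_id   TEXT NOT NULL,
        measured_on TEXT NOT NULL,
        temp_c      REAL
    );
    CREATE TABLE data_dictionary (
        table_name  TEXT NOT NULL,
        column_name TEXT NOT NULL,
        description TEXT NOT NULL,
        PRIMARY KEY (table_name, column_name)
    );
    """
)

# Describe the columns. temp_c is left out on purpose to show the check below.
conn.executemany(
    "INSERT INTO data_dictionary VALUES (?, ?, ?)",
    [
        ("measurement", "sample_id", "Identifier of the sample"),
        ("measurement", "measured_on", "Date of measurement, YYYY-MM-DD"),
    ],
)


def undocumented_columns(connection: sqlite3.Connection, table: str) -> list[str]:
    """Return the columns of a table that have no data_dictionary entry."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # The table name is inserted directly, so only pass trusted names.
    columns = [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]
    documented = {
        row[0]
        for row in connection.execute(
            "SELECT column_name FROM data_dictionary WHERE table_name = ?",
            (table,),
        )
    }
    return [name for name in columns if name not in documented]


print(undocumented_columns(conn, "measurement"))
```

A check like this can be run before the database is handed over for storage, so that no column is left without a description.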

Spreadsheets

Spreadsheets are a simple form of database in which data is organised in tables. Besides data variables, a spreadsheet can contain generated summaries in, for example, charts, graphs or tables. Pictures can also be included, and formulas can be used to generate values or functions. How the spreadsheet may be used can be controlled down to the individual cell.

Spreadsheet software is available, for instance, in the Microsoft Office, OpenOffice and WordPerfect Office suites. All of these programs also support XML-based file formats.

General guidelines

There are a few things worth considering and adhering to in order to keep quantitative data as consistently and easily organised as possible, also in the long run:

Use a controlled vocabulary with established keywords when data is entered.
Be consistent when naming charts, spreadsheets, columns and rows so that different parts of your material can be combined. Take into account the limitations of different software if you are planning on using several programs.
Avoid unnecessary formatting and layout. Check that necessary formatting and layout is retained when data is migrated or opened in different software.
Avoid pasting material such as images, tables or charts into a file, or at least make sure that it is not available only in pasted form. Always store and archive such material as separate files that can be linked to the related charts or spreadsheets, preferably in the same folder.
Document how fields and data are coded, and store the documentation together with the database or spreadsheet.
Check that the data is consistently entered, for instance that capital letters are used when they should be.
Enter dates in a consistent format.
Boolean data is binary: something either is or is not the case. This type of value is most easily exported if it is expressed as simply as possible, for instance as “1 or 0” or “Y or N”.
Check that all relevant decimals are always shown, even when data is exported. Make adjustments if necessary.
Avoid using currency symbols as these can be automatically modified depending on where the file is opened. Instead, type currencies using letter codes, for instance SEK or USD. These are typed before the number with a space between. Document any abbreviations used.
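
Some of the guidelines above (consistent dates, simple Boolean values, visible decimals and currency letter codes) can be checked or applied automatically when data is entered or exported. The following Python sketch is one possible way to do it, not an established tool. The function names, the list of accepted input formats and the choice of YYYY-MM-DD as the target date format are all assumptions made for this illustration.

```python
from datetime import datetime
from decimal import Decimal

# Input formats assumed for illustration; list the ones that occur in your data.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")

# Spellings accepted as true or false; also an assumption.
TRUE_VALUES = {"1", "y", "yes", "true"}
FALSE_VALUES = {"0", "n", "no", "false"}


def normalise_date(text: str) -> str:
    """Return the date as YYYY-MM-DD, whichever known format it was typed in."""
    cleaned = text.strip()
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {text!r}")


def normalise_bool(text: str) -> str:
    """Return '1' or '0' for any of the accepted spellings."""
    value = text.strip().lower()
    if value in TRUE_VALUES:
        return "1"
    if value in FALSE_VALUES:
        return "0"
    raise ValueError(f"Unrecognised Boolean value: {text!r}")


def format_amount(amount: str, currency_code: str, decimals: int = 2) -> str:
    """Return the letter code, a space, then the number with fixed decimals."""
    return f"{currency_code.upper()} {Decimal(amount):.{decimals}f}"
```

For example, normalise_date("15/04/2020") returns "2020-04-15", normalise_bool("Y") returns "1", and format_amount("1250.5", "sek") returns "SEK 1250.50". Values that match none of the accepted forms raise an error instead of being guessed at, which makes inconsistencies visible early.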

For more information about the processing of quantitative data and preparing quantitative data for long-term storage and publication, we recommend the guides to best practices provided by the Swedish National Data Service. In their Guides to Best Practice on Databases and Spreadsheets (in Swedish), you can find more information about the processing of specific formats such as XML and SPSS. A link is found under the headline “Further information” below.

Qualitative data

Qualitative data in text format can be described in the introduction of the document itself. Relevant information can be:

background and context (for instance about the place where an interview took place and how this may affect the results)
information about the participants
explanation of abbreviations and symbols used in transcriptions
keys to codes used
observations or events that may affect the results
reflections

If the data has been modified, information about the modification must be included. Formats other than text files may need to be accompanied by a document containing the information listed above. Pieces of information that are relevant to each other should preferably be stored together.

Informants

It is advantageous if pseudonyms or other means of anonymising informants are used consistently, so that each informant is always referred to by the same label, for instance I1, I2 and I3 for informants 1, 2 and 3, even if the informants are handled in separate documents.

Information about what characterises each informant can be summarised in a separate document, including information about what files they are mentioned in, to simplify searches in the material.
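
Consistent labels and a list of the files each informant is mentioned in can be kept with a small lookup structure. The sketch below is a minimal illustration in Python; the class name PseudonymRegistry, its methods and the sample names are hypothetical. Note that the lookup itself links real names to labels, so it identifies people and would need to be stored securely and apart from the anonymised material.

```python
class PseudonymRegistry:
    """Assigns I1, I2, ... so that each informant always gets the same label."""

    def __init__(self, prefix: str = "I") -> None:
        self._prefix = prefix
        self._labels: dict[str, str] = {}
        self._mentions: dict[str, set[str]] = {}

    def label_for(self, real_name: str) -> str:
        """Return the existing label for a name, or assign the next free one."""
        key = real_name.strip().lower()  # ignore case and surrounding spaces
        if key not in self._labels:
            self._labels[key] = f"{self._prefix}{len(self._labels) + 1}"
        return self._labels[key]

    def record_mention(self, real_name: str, file_name: str) -> str:
        """Note that an informant is mentioned in a file; return the label."""
        label = self.label_for(real_name)
        self._mentions.setdefault(label, set()).add(file_name)
        return label

    def files_for(self, label: str) -> list[str]:
        """Return the files in which a label is mentioned, sorted by name."""
        return sorted(self._mentions.get(label, set()))


registry = PseudonymRegistry()
registry.record_mention("Anna Berg", "interview_01.txt")
registry.record_mention("Carl Dahl", "interview_02.txt")
registry.record_mention("anna berg", "field_notes.txt")
print(registry.files_for("I1"))
```

Because labels are assigned in the order informants are first registered, the same person receives the same label in every document processed with the same registry.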

File names

File names can be used to describe the content in a structured way. How to do this can vary greatly, from using a number coding system combined with a key to the code, to describing in text in the file name what the file concerns. To keep track of various versions of the same material, the file name of each version can contain details of date and time.

Example 1: 3 4 20200415 1547 (With a corresponding list describing the system where the content type “interview” is numbered 3, followed by version, date and time stamp)
Example 2: Interview version 4 20200415 1547
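
File names of this kind can be generated rather than typed by hand, which helps keep them consistent. The following Python sketch reproduces the two examples above. The function names and the CONTENT_CODES table are hypothetical; only the code 3 for “interview” comes from Example 1.

```python
from datetime import datetime

# Key to the number coding system. Only "interview" = 3 is taken from
# Example 1; any further entries would be defined in your own key.
CONTENT_CODES = {"interview": 3}


def coded_name(content_type: str, version: int, when: datetime) -> str:
    """Build a name like Example 1: content code, version, date, time."""
    return f"{CONTENT_CODES[content_type]} {version} {when:%Y%m%d %H%M}"


def descriptive_name(content_type: str, version: int, when: datetime) -> str:
    """Build a name like Example 2: content in words, version, date, time."""
    return f"{content_type.capitalize()} version {version} {when:%Y%m%d %H%M}"


stamp = datetime(2020, 4, 15, 15, 47)
print(coded_name("interview", 4, stamp))        # 3 4 20200415 1547
print(descriptive_name("interview", 4, stamp))  # Interview version 4 20200415 1547
```

Writing the date as year, month, day and the time in 24-hour format means that versions sort chronologically when the files are listed by name.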

Prior to final storage

Prior to final storage, an overview list of the data material will be needed in which the content of each file/part is described in a way that makes it easy for others to find the part they are interested in.
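
A first draft of such an overview list can be generated from the folder structure and then completed by hand. The sketch below is one possible approach in Python, assuming that the descriptions are kept in a dictionary keyed by relative file path; the function name write_overview and the column names are hypothetical.

```python
import csv
from pathlib import Path


def write_overview(data_dir: Path, descriptions: dict[str, str], out_file: Path) -> list[str]:
    """Write a CSV overview of all files under data_dir.

    Returns the relative paths of files that still lack a description,
    so that they can be filled in before final storage.
    """
    missing = []
    with out_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "size_bytes", "description"])
        for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
            if path.resolve() == out_file.resolve():
                continue  # do not list the overview file itself
            relative = path.relative_to(data_dir).as_posix()
            text = descriptions.get(relative, "")
            if not text:
                missing.append(relative)
            writer.writerow([relative, path.stat().st_size, text])
    return missing
```

The returned list shows at a glance which files still need a description, and the CSV file can be opened in any spreadsheet program.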

Further information

For further information about describing research data during ongoing projects, we recommend the Best Practice guides from the Swedish National Data Service. There, you can find more information and recommendations on file formats, what type of material should preferably be stored together, and more information about what is important to consider for future long-term storage.

Guides to Best Practice (SND)

The UK Data Service pages on documenting research data provide concise information on data documentation on various levels.

Document your data (UK Data Service)

Part 5 in the course BAS online from the Swedish National Data Service concerns “Documentation during the research process and principles for assessing if metadata is sufficient for secondary use”.

BAS Online Session 5: Principles for documentation (in Swedish only)

Latest update: 2024-05-03