Using nettskjemar

Nettskjemar connects to version 3 of the nettskjema api, and the main functionality here is to download data from a form into R. Once you have created a nettskjema client, and set up your Renvironment locally, you can start accessing your forms.

General recommendations

While functions to download data also have the option to turn off the codebook, i.e. return the data with the original questions as column names, this is not recommended. Working with data in R in this format is very unpredictable, and we cannot guarantee that the functions in this package will act as expected.

Therefore, you are highly advised if you are using this package, to turn on the codebook in the Nettskjema portal for your form, and setting up a codebook for the entire form.

You can toggle the codebook for a form by going to the Nettskjema portal and entering your form. Then proceed to “Settings” and then “General settings”, and make sure “Codebook activated” is set to “Yes”. Once this is toggled, you will need to setup the codebook, either manually (advised) or by using the pre-filling functionality in Nettskjema.

You can read more about the details around the codebook on the UiO webpages (only available in Norwegian).

Tidyverse compatible

The data returns in this package are developed to be tidyverse-compatible. This means that those who are familiar with tidyverse, should find working with the data as retrieved from this package fairly easy. If you want to learn about the tidyverse and how to use it, there are excellent resources for that on the Tidyverse webpage.

Download submissions

Perhaps at the core of this package is the ability to download submission answers to a form into a tibble (variation of a data.frame).

library(nettskjemar)
formid <- 123823

data <- ns_get_data(formid)
data
#>   formid $submission_id                  $created
#> 1 123823       27685292 2023-06-01T20:57:15+02:00
#>    freetext radio checkbox.questionnaires
#> 1 some text     1                       1
#>   checkbox.events checkbox.logs dropdown
#> 1               1             0        4
#>   radio_matrix.grants radio_matrix.lecture
#> 1                   1                    2
#>   radio_matrix.email checkbox_matrix.1.IT
#> 1                  2                    1
#>   checkbox_matrix.1.colleague
#> 1                           1
#>   checkbox_matrix.1.admin checkbox_matrix.1.union
#> 1                       0                       0
#>   checkbox_matrix.1.internet checkbox_matrix.2.IT
#> 1                          0                    0
#>   checkbox_matrix.2.colleague
#> 1                           0
#>   checkbox_matrix.2.admin checkbox_matrix.2.union
#> 1                       1                       0
#>   checkbox_matrix.2.internet       date  time
#> 1                          0 2023-06-01 12:00
#>           datetime number_decimal number_integer
#> 1 2023-06-12T13:33            4.5             77
#>   slider attachment_1 attachment_2 $answer_time_ms
#> 1      3    sølvi.png                        74630
#>  [ reached 'max' / getOption("max.print") -- omitted 2 rows ]

You will notice three columns that are prefixed with $. These are columns automatically added by the Nettskjema backend to your data, and the prefix is used to denote exactly this.

If you’d like to have labelled data, similar to that of SPSS and Stata, have a look at the vignette on labelled data for more information about that.

Fetching Raw Data

Raw answers come in a very different format than the data as seen in the codebook and preview in Nettskjema portal. The raw data have a timestamped format, per question for each submission. This means the data comes in a tall format (many rows per submission) rather than a wide format (one row per submission). The raw data may provide those who have keen interest in the time a users spent between each question. The data.frame this output will have one column that indicates which question the row is for, and another which is the response to that question.

# Fetch raw data
ns_get_data(formid, type = "long")
#>   formid formId submissionId  answerId elementId
#> 1 123823 123823     27685292 158263133   1641697
#> 2 123823 123823     27685292 158263124   1641698
#>   externalElementId textAnswer answerOptionIds
#> 1          freetext  some text                
#> 2             radio       <NA>         3879435
#>   externalAnswerOptionIds elementType
#> 1                            QUESTION
#> 2                       1       RADIO
#>           createdDate        modifiedDate
#> 1 2023-06-01T20:57:15 2023-06-01T20:57:15
#> 2 2023-06-01T20:57:15 2023-06-01T20:57:15
#>   subElementId answerAttachmentId filename mediaType
#> 1           NA                 NA     <NA>      <NA>
#> 2           NA                 NA     <NA>      <NA>
#>   size attachment.answerAttachmentId
#> 1   NA                            NA
#> 2   NA                            NA
#>   attachment.filename attachment.mediaType
#> 1                <NA>                 <NA>
#> 2                <NA>                 <NA>
#>   attachment.size
#> 1              NA
#> 2              NA
#>  [ reached 'max' / getOption("max.print") -- omitted 44 rows ]

Raw data cannot be labelled, and no other alterations have been made to the data. They come exactly as the API returns them, so you may do what you need with them.

Controlling checkbox output

The Nettskjema survey tool includes the possibility to create a matrix of checkboxes, i.e. giving the respondents the ability to select several options within a question. Checkboxes are returned as binary columns, one columns per checkbox where 1 indicates that the checkbox was ticked.

We provide some extra functionality to help work with these data. Currently we only support working with the checkbox matrix questions, but are working on a solution for all checkboxes.

ns_get_data(formid) |>
  ns_alter_checkbox(
    to = "list"
  )
#>   $submission_id formid                  $created
#> 1       27685292 123823 2023-06-01T20:57:15+02:00
#> 2       27685302 123823 2023-06-01T20:58:33+02:00
#>         freetext radio checkbox.questionnaires
#> 1      some text     1                       1
#> 2 another answer    -1                       0
#>   checkbox.events checkbox.logs dropdown
#> 1               1             0        4
#> 2               0             1        9
#>   radio_matrix.grants radio_matrix.lecture
#> 1                   1                    2
#> 2                   3                    3
#>   radio_matrix.email       date  time
#> 1                  2 2023-06-01 12:00
#> 2                  1 2023-02-07 14:45
#>           datetime number_decimal number_integer
#> 1 2023-06-12T13:33            4.5             77
#> 2 2024-02-15T08:55            2.2             45
#>   slider attachment_1 attachment_2 $answer_time_ms
#> 1      3    sølvi.png                        74630
#> 2      1               marius.jpeg           71313
#>   checkbox_matrix.1 checkbox_matrix.2
#> 1     colleague, IT             admin
#> 2          internet      admin, union
#>  [ reached 'max' / getOption("max.print") -- omitted 1 rows ]

These can be separated into rows if wanted, using tidyverse syntax.

library(tidyr)
library(dplyr)

# As list column
ns_get_data(formid) |>
  ns_alter_checkbox(
    to = "list"
  ) |>
  relocate(checkbox_matrix.1, .after = 2)
#>   $submission_id formid checkbox_matrix.1
#> 1       27685292 123823     colleague, IT
#> 2       27685302 123823          internet
#>                    $created       freetext radio
#> 1 2023-06-01T20:57:15+02:00      some text     1
#> 2 2023-06-01T20:58:33+02:00 another answer    -1
#>   checkbox.questionnaires checkbox.events
#> 1                       1               1
#> 2                       0               0
#>   checkbox.logs dropdown radio_matrix.grants
#> 1             0        4                   1
#> 2             1        9                   3
#>   radio_matrix.lecture radio_matrix.email       date
#> 1                    2                  2 2023-06-01
#> 2                    3                  1 2023-02-07
#>    time         datetime number_decimal
#> 1 12:00 2023-06-12T13:33            4.5
#> 2 14:45 2024-02-15T08:55            2.2
#>   number_integer slider attachment_1 attachment_2
#> 1             77      3    sølvi.png             
#> 2             45      1               marius.jpeg
#>   $answer_time_ms checkbox_matrix.2
#> 1           74630             admin
#> 2           71313      admin, union
#>  [ reached 'max' / getOption("max.print") -- omitted 1 rows ]


# Turns list column, into rows 
ns_get_data(formid) |>
  ns_alter_checkbox(
    to = "list"
  ) |>
  unnest(checkbox_matrix.1) |>
  relocate(checkbox_matrix.1, .after = 2)
#> # A tibble: 5 × 23
#>   `$submission_id` formid checkbox_matrix.1
#>              <int>  <dbl> <chr>            
#> 1         27685292 123823 colleague        
#> 2         27685292 123823 IT               
#> 3         27685302 123823 internet         
#> 4         27685319 123823 colleague        
#> 5         27685319 123823 internet         
#> # ℹ 20 more variables: `$created` <chr>,
#> #   freetext <chr>, radio <int>,
#> #   checkbox.questionnaires <int>,
#> #   checkbox.events <int>, checkbox.logs <int>,
#> #   dropdown <int>, radio_matrix.grants <int>,
#> #   radio_matrix.lecture <int>,
#> #   radio_matrix.email <int>, date <chr>, …

If you are not used to working with list columns, another option is to work with delimited strings in columns. In this procedure, you can turn the checkboxes into a character column, where each option in separated by a character of your choosing.

# As delimited string column
ns_get_data(formid) |>
  ns_alter_checkbox(
    to = "character",
    sep = ";"
  ) |>
  relocate(checkbox_matrix.1, .after = 2)
#>   $submission_id formid checkbox_matrix.1
#> 1       27685292 123823      colleague;IT
#> 2       27685302 123823          internet
#>                    $created       freetext radio
#> 1 2023-06-01T20:57:15+02:00      some text     1
#> 2 2023-06-01T20:58:33+02:00 another answer    -1
#>   checkbox.questionnaires checkbox.events
#> 1                       1               1
#> 2                       0               0
#>   checkbox.logs dropdown radio_matrix.grants
#> 1             0        4                   1
#> 2             1        9                   3
#>   radio_matrix.lecture radio_matrix.email       date
#> 1                    2                  2 2023-06-01
#> 2                    3                  1 2023-02-07
#>    time         datetime number_decimal
#> 1 12:00 2023-06-12T13:33            4.5
#> 2 14:45 2024-02-15T08:55            2.2
#>   number_integer slider attachment_1 attachment_2
#> 1             77      3    sølvi.png             
#> 2             45      1               marius.jpeg
#>   $answer_time_ms checkbox_matrix.2
#> 1           74630             admin
#> 2           71313       admin;union
#>  [ reached 'max' / getOption("max.print") -- omitted 1 rows ]
  
# Turns string column, into rows 
ns_get_data(formid) |>
  ns_alter_checkbox(
    to = "character",
    sep = ";"
  ) |>
  relocate(checkbox_matrix.1, .after = 2) |>
  separate_rows(checkbox_matrix.1)
#> # A tibble: 5 × 23
#>   `$submission_id` formid checkbox_matrix.1
#>              <int>  <dbl> <chr>            
#> 1         27685292 123823 colleague        
#> 2         27685292 123823 IT               
#> 3         27685302 123823 internet         
#> 4         27685319 123823 colleague        
#> 5         27685319 123823 internet         
#> # ℹ 20 more variables: `$created` <chr>,
#> #   freetext <chr>, radio <int>,
#> #   checkbox.questionnaires <int>,
#> #   checkbox.events <int>, checkbox.logs <int>,
#> #   dropdown <int>, radio_matrix.grants <int>,
#> #   radio_matrix.lecture <int>,
#> #   radio_matrix.email <int>, date <chr>, …