Skip to content

Data Export Columns

Geoff Goehle edited this page Aug 20, 2014 · 8 revisions

These are potential columns in the big data table we are hoping to build. I only included things which we currently have access to on the theory that we want to use existing data, not make upgrades to our current tables and then allow those upgrades time to propagate.

There are some questions which need to be addressed or at least dismissed.

  • Table design: Do we have one giant table for portability and ease of use, or do we have smaller tables for performance and assume people know how to do joins?
  • Data purity: A lot of data which people care about can be included but will not be very "pure". For example fields like seed, due date, answer date can be included from the appropriate tables, but they may not actually be the seed, due date, answer date that the student had when the answer was recorded. Most of the time they will be the same but there will certainly be times when they are not.
  • Computed columns: Do we want to have computed columns that do not reflect any data we actually store and may involve some educated guessing. For example, we have the final number of incorrect attempts and correct attempts and the final score, but we dont have these values at the time of each attempt. Do we want to try and compute these values for each answer row? What happens when we fail (e.g. the question has auxiliary fields which were not recorded) or are just wrong (e.g. the seed changed)?

Table Columns

  • Answer ID: Salted hash or Unique int
  • Course ID: Salted hash
  • Student ID: Salted hash
  • Set ID: Salted hash
  • Problem ID: Salted hash
  • Answer Timestamp: Unix time
  • Answer String
  • Answer Correct String: String of 1's, 0's corresponding to correctness of answers
  • Problem Path: Also serves as unique identifier of problem
  • Final Problem Status
  • Total Incorrect Attempts
  • Total Correct Attempts
  • Problem Value: Possibly impure
  • Problem Max Attempts: Possibly impure
  • Seed: Possibly impure
  • Open Date: Unix time, Possibly impure
  • Due Date: Unix time, Possibly impure
  • Answer Date: Unix time, Possibly impure
  • Set Type
  • Library Subject: Possibly Missing
  • Library Chapter: Possibly Missing
  • Library Section: Possibly Missing
  • Library Keywords: Possibly Missing
  • Raw Status of Attempt: Post-Computed (ratio of 1's to number of answers).
  • Attempt Number: Post-Computed
  • Status of Attempt: Post-Computed, Possibly Missing, Possibly Impure
  • Final Set Grade: Post-Computed, Possibly Missing, Possibly Impure
  • Number Incorrect Previous Attempts: Post-Computed, Possibly Missing, Possibly Impure
  • Number Correct Previous Attempts: Post-Computed, Possibly Missing, Possibly Impure

First Pass

  • ID Info
  • 0 - Answer ID hash
  • 1 - Course ID hash
  • 2 - Student ID hash
  • 3 - Set ID hash
  • 4 - Problem ID hash
  • User Info
  • 5 - Permission Level
  • 6 - Final Status
  • Set Info
  • 7 - Set type
  • 8 - Open Date (unix time)
  • 9 - Due Date (unix time)
  • 10 - Answer Date (unix time)
  • 11 - Final Set Grade (percentage)
  • Problem Info
  • 12 - Problem Path
  • 13 - Problem Value
  • 14 - Problem Max Attempts
  • 15 - Problem Seed
  • 16 - Attempted
  • 17 - Final Incorrect Attempts
  • 18 - Final Correct Attempts
  • 19 - Final Status
  • OPL Info
  • 20 - Subject
  • 21 - Chapter
  • 22 - Section
  • 23 - Keywords
  • Answer Info
  • 24 - Answer timestamp (unix time)
  • 25 - Attempt Number
  • 26 - Raw status of attempt (percentage of correct blanks)
  • 27 - Status of attempt (post computed may be blank)
  • 28 - Number of Answer Blanks
  • 29/30 etc... - The following columns will come in pairs. The first will be
    the text of the answer contained in the answer blank
    and the second will be the binary 0/1 status of the answer
    blank. There will be as many pairs as answer blanks.