Data Export Columns

These are potential columns in the big data table we are hoping to build. I only included things which we currently have access to on the theory that we want to use existing data, not make upgrades to our current tables and then allow those upgrades time to propagate.

There are some questions which need to be addressed or at least dismissed.

Table design: Do we have one giant table for portability and ease of use, or do we have smaller tables for performance and assume people know how to do joins?
Data purity: A lot of data which people care about can be included but will not be very "pure". For example fields like seed, due date, answer date can be included from the appropriate tables, but they may not actually be the seed, due date, answer date that the student had when the answer was recorded. Most of the time they will be the same but there will certainly be times when they are not.
Computed columns: Do we want to have computed columns that do not reflect any data we actually store and may involve some educated guessing. For example, we have the final number of incorrect attempts and correct attempts and the final score, but we dont have these values at the time of each attempt. Do we want to try and compute these values for each answer row? What happens when we fail (e.g. the question has auxiliary fields which were not recorded) or are just wrong (e.g. the seed changed)?

Table Columns

Answer ID: Salted hash or Unique int
Course ID: Salted hash
Student ID: Salted hash
Set ID: Salted hash
Problem ID: Salted hash
Answer Timestamp: Unix time
Answer String
Answer Correct String: String of 1's, 0's corresponding to correctness of answers
Problem Path: Also serves as unique identifier of problem
Final Problem Status
Total Incorrect Attempts
Total Correct Attempts
Problem Value: Possibly impure
Problem Max Attempts: Possibly impure
Seed: Possibly impure
Open Date: Unix time, Possibly impure
Due Date: Unix time, Possibly impure
Answer Date: Unix time, Possibly impure
Set Type
Library Subject: Possibly Missing
Library Chapter: Possibly Missing
Library Section: Possibly Missing
Library Keywords: Possibly Missing
Raw Status of Attempt: Post-Computed (ratio of 1's to number of answers).
Attempt Number: Post-Computed
Status of Attempt: Post-Computed, Possibly Missing, Possibly Impure
Final Set Grade: Post-Computed, Possibly Missing, Possibly Impure
Number Incorrect Previous Attempts: Post-Computed, Possibly Missing, Possibly Impure
Number Correct Previous Attempts: Post-Computed, Possibly Missing, Possibly Impure

First Pass

ID Info
0 - Answer ID hash
1 - Course ID hash
2 - Student ID hash
3 - Set ID hash
4 - Problem ID hash
User Info
5 - Permission Level
6 - Final Status
Set Info
7 - Set type
8 - Open Date (unix time)
9 - Due Date (unix time)
10 - Answer Date (unix time)
11 - Final Set Grade (percentage)
Problem Info
12 - Problem Path
13 - Problem Value
14 - Problem Max Attempts
15 - Problem Seed
16 - Attempted
17 - Final Incorrect Attempts
18 - Final Correct Attempts
19 - Final Status
OPL Info
20 - Subject
21 - Chapter
22 - Section
23 - Keywords
Answer Info
24 - Answer timestamp (unix time)
25 - Attempt Number
26 - Raw status of attempt (percentage of correct blanks)
27 - Status of attempt (post computed may be blank)
28 - Number of Answer Blanks
29/30 etc... - The following columns will come in pairs. The first will be
the text of the answer contained in the answer blank
and the second will be the binary 0/1 status of the answer
blank. There will be as many pairs as answer blanks.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data Export Columns

Table Columns

First Pass

Clone this wiki locally