Dataset Description #76
Comments
There are field descriptions for the raw data in the schema here: https://github.com/mozilla/overscripted/blob/master/data_prep/raw_data_schema.template. There are also additional fields that are not in that schema. These could be documented in a README in the data_prep folder if you want to.
An example to help understand this more:
Now if we go to location, we can actually see the call being made when we look at the HTML. Finally, the value being passed is a browser identification of some sort. You can see here that the value field is actually a common user agent. Hope this helps :)
Note that you cannot see the "call" being made when you look at the HTML. You can only see the request to load the script. In the context of this dataset, "call" means an individual call to an individual JavaScript API made by the script, in this case googletagmanager.js.
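To make the distinction concrete, here is a rough sketch of what a single "call" row might look like. Every value below is made up for illustration (nothing is copied from the dataset), and the per-field comments are my reading of this thread, not official descriptions. `describe_call` is a hypothetical helper, not part of the repo.

```python
# Hypothetical row: ONE JavaScript API call made by a script during a crawl.
# All values are invented for illustration; field meanings are assumptions.
example_call = {
    "location": "https://example.com/",  # assumed: page where the call happened
    "script_url": "https://example.org/googletagmanager.js",  # assumed: script making the call
    "symbol": "window.navigator.userAgent",  # assumed: JavaScript API that was touched
    "operation": "get",  # assumed: kind of access
    "value_1000": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
}


def describe_call(row):
    """Summarise one row as 'script -> API on page' (hypothetical helper)."""
    return "{script} did '{op}' on {symbol} while loaded on {page}".format(
        script=row["script_url"],
        op=row["operation"],
        symbol=row["symbol"],
        page=row["location"],
    )


print(describe_call(example_call))
```

The page HTML would only show the `<script>` tag that loads the script; the row above is the kind of thing that gets recorded each time that script touches an API.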
It is unfortunate that the Medium blog post confuses this issue. Here's a piece of the discussion between @asquare14 and me about this on the Gitter chat on Mar 11.
I initially suggested to @asquare14 to post a comment on Medium, but perhaps some clarification in the main README is in order.
@mlopatka - should we edit the Medium blog post?
@birdsarah I've updated the blog post to align our phrasing. Good catch.
We need a section describing each column of the dataset.
Even a single line description for each field would be very helpful for somebody who is starting to work with the dataset.
Most multivariate datasets have descriptions of each field.
e.g.: https://archive.ics.uci.edu/ml/datasets/cardiotocography#
Here, the section "Attribute Information" describes each attribute/column.
Overscripted dataset Attributes:
['argument_0', 'argument_1', 'argument_2', 'argument_3', 'argument_4',
'argument_5', 'argument_6', 'argument_7', 'argument_8', 'arguments',
'arguments_n_keys', 'call_id', 'call_stack', 'file_name', 'func_name',
'in_crawl_list', 'in_iframe', 'in_stripped_crawl_list', 'location',
'locations_len', 'operation', 'script_url', 'symbol', 'time_stamp',
'value_1000', 'value_len']
Descriptions for the above attributes of the dataset need to be added.
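As a possible starting point, here is a minimal sketch that turns the column list above into a markdown table skeleton for a README. The descriptions are deliberately left as `TODO` since I don't want to guess at them; `description_table` is a hypothetical helper name, not something that exists in the repo.

```python
# Column names copied from the attribute list in this issue.
COLUMNS = [
    "argument_0", "argument_1", "argument_2", "argument_3", "argument_4",
    "argument_5", "argument_6", "argument_7", "argument_8", "arguments",
    "arguments_n_keys", "call_id", "call_stack", "file_name", "func_name",
    "in_crawl_list", "in_iframe", "in_stripped_crawl_list", "location",
    "locations_len", "operation", "script_url", "symbol", "time_stamp",
    "value_1000", "value_len",
]


def description_table(columns, descriptions=None):
    """Build a markdown table with one row per column.

    Columns without an entry in `descriptions` get a TODO placeholder.
    """
    descriptions = descriptions or {}
    lines = ["| Field | Description |", "| --- | --- |"]
    for name in columns:
        lines.append("| `{}` | {} |".format(name, descriptions.get(name, "TODO")))
    return "\n".join(lines)


print(description_table(COLUMNS))
```

Anyone who knows a field can pass a partial dict, e.g. `description_table(COLUMNS, {"call_id": "..."})`, and the rest stay marked as TODO.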