Skip to content

Latest commit

 

History

History
94 lines (63 loc) · 2.55 KB

gitcoin data feature.md

File metadata and controls

94 lines (63 loc) · 2.55 KB

Feature hierarchy:

[feature_source].[feature_set].[feature_name]

examples:

"gitcoin.co"."user_sybil"."sybil_likelihood"

"api.github.com"."reputation"."followers"

"manual_googlesheets"."joe"."favouriteicons_count"

Selection criteria:

  1. Coverage: data fields covering a significant amount of users are prefered

sparse features require more preparation effort to be useful, it is possible to include, maybe sometimes consider to derive/pca/merge sparse ones

  1. Battle-tested: Used in previous GR rounds

  2. Subject Matter Experts

  3. Industry anti-spam resources (curated list, blacklist, IP addresses)

  4. based on data source list (asked for it from dao members)

V0.1.1 feature name and datatype

[api.github.com]

  • Number of repos, int

  • Number of repos where fork=false, int

  • Followers, int

  • Followed, int

  • watchers, int

    #derived or row level

  • total issues (summary from all repos), int

  • highest open_issues_count of rep, int

  • total forks (summary from all repos), int

  • highest forks of rep, int

  • total stars (summary from all repos), int

  • highest stars of rep, int

  • language, size, created_at, pushed_at date of each repo

  • Is the user followed by an authenticated github account - and history

  • Most number of folders created across all repos, int

  • Most number of files created across all reposts, int

[onchain][]

[gitcoin.co][behaviour]

  • Registration, Date/Time
  • Last Activity, Date/Time
  • kudos collected, int
  • quest started, int
  • Quest finished, int
  • donation row level data (Address, grant, quantity, token, time)
  • hackathon started count, int
  • bounty started count, int
  • bounty finished count,int
  • IP Address row level/summary data (IP, date or first/last date range)
  • browsing row level data (type of url, date)
  • changed_default_preference, boolean
  • count of activities in each GR round data source TBD

[gitcoin.co][attributes]

  • Email domain (Inherit from Github), string
  • Feedback row level/summary
  • Built avatar, boolean
  • Job status, string
  • Location, string

Other potential:

[gov.gitcoin.co]

/#Some data rules already on https://www.daostewards.xyz/) [Discord activity raw data]

[user][self_defined_rules]

Ref

[GR10 antifraud report blockscience] (https://medium.com/block-science/evaluating-the-anti-fraud-results-for-gitcoin-round-10-cec9277ce5b2) combined dialog with Omnianalytics, Disruption