feat(usage): at-least-once and dynamic sampling #26

Merged on Aug 19, 2024 (71 commits)
1953cbc
sampler draft
Aug 6, 2024
b98cb20
fix namespace
Aug 6, 2024
dd4ae9e
update unit tests, implementation with instance variables
Aug 7, 2024
2a36549
handle unexpected client input
Aug 7, 2024
c1f8731
allow client to pass a operation key generator
Aug 7, 2024
b88dced
operation key generator unit tests
Aug 7, 2024
ebcb30e
extract sampling context from operation
Aug 7, 2024
6fe09ee
use mock class object
Aug 7, 2024
e9274fb
fix document parsing
Aug 7, 2024
f05ebf3
readability clarity
Aug 7, 2024
a3bdca4
formatting edits
Aug 7, 2024
f209d9a
safe navigation operations
Aug 7, 2024
34eb95c
example implementation in usage_reporter
Aug 7, 2024
9df92e7
rough implementation with usage_reporter
Aug 8, 2024
8859786
fix typo
Aug 8, 2024
bb94aab
readme consistency
Aug 8, 2024
41fb25f
use context blocks instead of describe
Aug 8, 2024
a29ace7
refactor usage reporter unit tests
Aug 8, 2024
ff7972a
clean mock definition
Aug 8, 2024
c70303f
clarity of what's being tested
Aug 8, 2024
393dd3c
clarity of what's being tested
Aug 8, 2024
d6b6680
unit test for usage_reporter using sampler
Aug 8, 2024
1323f49
improve quality of should_include test
Aug 8, 2024
bc8611c
unit test granularity
Aug 8, 2024
cda4c55
fix debug statements
Aug 8, 2024
7fcfd83
allow thread to process
Aug 8, 2024
1403ce6
DRY test case
Aug 8, 2024
8457ddd
separate expectations on real obj vs spies
Aug 8, 2024
10c79e5
typo
Aug 8, 2024
8d712fd
localize sampling logic to one place
Aug 8, 2024
ff7ecca
fix query parsing
Aug 8, 2024
7923621
default op key based on op name
Aug 8, 2024
b595b54
remove test logs
Aug 8, 2024
e56b4a5
update method name, clean handling nil params
Aug 8, 2024
17196fc
split helper methods, safely call client procs
Aug 8, 2024
4c99b04
use client to test sampler
Aug 8, 2024
ad59de4
unique operation key
Aug 8, 2024
edb6ff3
split fixed vs dynamic sampling into two classes
Aug 9, 2024
e7bbcc6
typo
Aug 9, 2024
347dbb1
propagate sampling errors to usage_reporter
Aug 9, 2024
b94c628
fix threading issues with usage reporter unit tests
Aug 9, 2024
12129fd
close all threads created during a test
Aug 9, 2024
a5dc792
opt in to at least once sampling
Aug 9, 2024
efd8b65
remove instance variables from module
Aug 9, 2024
1b3fc68
update readme
Aug 9, 2024
ab59116
update param name
Aug 9, 2024
3341259
update readme option documentation
Aug 9, 2024
2f0466f
remove redundant calls in tests
Aug 9, 2024
62e0b29
refactor to avoid redundant calls
Aug 9, 2024
69014cc
fix lint issues
Aug 12, 2024
56ac231
avoid using singleton in unit test
Aug 12, 2024
861347f
formatting fixes
Aug 12, 2024
a7c036f
cover test cases for both types of samplers
Aug 12, 2024
487c3d0
delete testing file
charcoalyy Aug 12, 2024
4d7d9e3
memoize fn in unit tests
Aug 12, 2024
b91db6b
reduce hash key size of default keygen
Aug 13, 2024
7dde9dd
use built-in parsing for queries
Aug 13, 2024
3728528
update readme, fix faulty unit tests
Aug 13, 2024
3d3a8b7
move mocks closer to examples
Aug 13, 2024
038ee1e
rubocop fix
Aug 13, 2024
dc0ea0d
new config and code readability for at least once sampling
Aug 13, 2024
44bbb03
separate responsibility of initializing sampler
Aug 13, 2024
ec8a109
fix faulty unit tests
Aug 13, 2024
220299c
sampling object config
Aug 14, 2024
df98c92
Merge pull request #2 from charcoalyy/grouped-config
charcoalyy Aug 14, 2024
b94c8c6
rm unnecessary comment
Aug 14, 2024
2af0d8f
cleanup
Aug 14, 2024
d612b32
naming consistency
Aug 14, 2024
1a1b042
naming consistency fix
Aug 14, 2024
a9a9438
update benchmark gem lockfile
Aug 19, 2024
060e5f1
minor ver bump
Aug 19, 2024
2 changes: 1 addition & 1 deletion Gemfile.lock
@@ -1,7 +1,7 @@
PATH
remote: .
specs:
graphql-hive (0.3.4)
graphql-hive (0.4.0)
graphql (>= 2.3, < 3)

GEM
43 changes: 29 additions & 14 deletions README.md
@@ -146,26 +146,39 @@ class MySchema < GraphQL::Schema
use(
GraphQL::Hive,
{
# mandatory
token: 'YOUR-TOKEN',
collect_usage: true, # optional
report_schema: true, # optional
enabled: true, # Enable/Disable Hive Client (optional)

# optional
enabled: true, # enable/disable Hive Client
debug: false, # verbose logs
logger: MyLogger.new, # optional
endpoint: 'app.graphql-hive.com', # optional
port: 80, # optional
logger: MyLogger.new,
endpoint: 'app.graphql-hive.com',
port: 80,
buffer_size: 50, # forward the operations data to Hive every 50 requests
collect_usage_sampling: 1.0,
reporting: { # mandatory if `report_schema: true`
# mandatory member of `reporting`

collect_usage: true, # report usage to Hive
collect_usage_sampling: {
# optional members of `collect_usage_sampling`
sample_rate: 0.5, # fraction of operations reported (0.5 = 50%)
sampler: proc { |context| context[:operation_name].include?('someQuery') ? 1 : 0.5 }, # assign custom sampling rates (overrides `sample_rate`)
at_least_once: true, # sample every distinct operation at least once
key_generator: proc { |context| context[:operation_name] } # assign custom keys to distinguish between distinct operations
},

report_schema: true, # publish schema to Hive
# mandatory if `report_schema: true`
reporting: {
# mandatory members of `reporting`
author: 'Author of the latest change',
# mandatory member of `reporting`
commit: 'git sha or any identifier',
service_name: '', # optional
service_url: '', # optional
# optional members of `reporting`
service_name: '',
service_url: '',
},
# you can pass an optional proc that will help identify the client (ex: Apollo web app) that performed the query
client_info: Proc.new { |context| { name: context.client_name, version: context.client_version } }

# pass an optional proc to client_info to help identify the client (ex: Apollo web app) that performed the query
client_info: proc { |context| { name: context.client_name, version: context.client_version } }
}
)

@@ -174,6 +187,8 @@ class MySchema < GraphQL::Schema
end
```

See default options for the optional parameters [here](https://github.com/charlypoly/graphql-ruby-hive/blob/01407d8fed80912a7006fee503bf2967fa20a79c/lib/graphql-hive.rb#L53).
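
As a rough illustration of how the `sampler` and `key_generator` procs are invoked, the sketch below calls both with a hand-built sample context. It assumes the context is a plain hash with an `:operation_name` key, as built in `sampling_context.rb`; the `sample_context` value itself is hypothetical.

```ruby
# Hypothetical sample context, shaped like the hash built in sampling_context.rb.
sample_context = { operation_name: 'someQuery', document: nil, context_value: {} }

# Custom rate: always report operations whose name contains 'someQuery', half of the rest.
sampler = proc { |context| context[:operation_name].include?('someQuery') ? 1 : 0.5 }

# Custom key: operations that share a name count as the same operation.
key_generator = proc { |context| context[:operation_name] }

rate = sampler.call(sample_context)
other_rate = sampler.call(sample_context.merge(operation_name: 'otherQuery'))
key = key_generator.call(sample_context)
```

With `at_least_once: true`, the key decides what counts as "distinct": a coarser key means fewer operations get the guaranteed first report.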

<br/>

**A note on `buffer_size` and performance**
2 changes: 1 addition & 1 deletion k6/graphql-api/Gemfile.lock
@@ -1,7 +1,7 @@
PATH
remote: ../..
specs:
graphql-hive (0.3.4)
graphql-hive (0.4.0)
graphql (>= 2.3, < 3)

GEM
33 changes: 4 additions & 29 deletions lib/graphql-hive.rb
@@ -7,31 +7,9 @@
require 'graphql-hive/usage_reporter'
require 'graphql-hive/client'

# class MySchema < GraphQL::Schema
# use(
# GraphQL::Hive,
# {
# token: 'YOUR-TOKEN',
# collect_usage: true,
# report_schema: true,
# enabled: true, // Enable/Disable Hive Client
# debug: true, // Debugging mode
# logger: MyLogger.new,
# endpoint: 'app.graphql-hive.com',
# port: 80,
# reporting: {
# author: 'Author of the latest change',
# commit: 'git sha or any identifier',
# service_name: '',
# service_url: '',
# },
# client_info: Proc.new { |context| { name: context.client_name, version: context.client_version } }
# }
# )
#
# # ...
#
# end
require 'graphql-hive/sampler'
require 'graphql-hive/sampling/basic_sampler'
require 'graphql-hive/sampling/dynamic_sampler'

module GraphQL
# GraphQL Hive usage collector and schema reporter
@@ -116,10 +94,7 @@ def platform_trace(platform_key, _key, data)
elapsed = ending - starting
duration = (elapsed.to_f * (10**9)).to_i

# rubocop:disable Layout/LineLength
report_usage(timestamp, queries, results, duration) if !queries.empty? && SecureRandom.random_number <= @options[:collect_usage_sampling]
# rubocop:enable Layout/LineLength

report_usage(timestamp, queries, results, duration) unless queries.empty?
results
else
yield
40 changes: 40 additions & 0 deletions lib/graphql-hive/sampler.rb
@@ -0,0 +1,40 @@
# frozen_string_literal: true

module GraphQL
class Hive < GraphQL::Tracing::PlatformTracing
# Sampler instance for usage reporter
class Sampler
def initialize(sampling_options, logger = nil)
# backwards compatibility with old `collect_usage_sampling` field
if sampling_options.is_a?(Numeric)
logger&.warn(
'`collect_usage_sampling` is deprecated for fixed sampling rates, ' \
'use `collect_usage_sampling: { sample_rate: XX }` instead'
)
passed_sampling_rate = sampling_options
sampling_options = { sample_rate: passed_sampling_rate }
end

sampling_options ||= {}

@sampler = if sampling_options[:sampler]
Sampling::DynamicSampler.new(
sampling_options[:sampler],
sampling_options[:at_least_once],
sampling_options[:key_generator]
)
else
Sampling::BasicSampler.new(
sampling_options[:sample_rate],
sampling_options[:at_least_once],
sampling_options[:key_generator]
)
end
end

def sample?(operation)
@sampler.sample?(operation)
end
end
end
end
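
The option handling in `Sampler#initialize` can be sketched in isolation. This is a minimal sketch, not the PR's code: `normalize_sampling_options` and `sampler_kind` are hypothetical helpers that only mirror the branching shown in the file (a numeric value becomes `{ sample_rate: ... }`, `nil` becomes `{}`, and the presence of `:sampler` selects the dynamic sampler).

```ruby
# Hypothetical helper (not in the PR): mirrors the normalization in Sampler#initialize.
def normalize_sampling_options(sampling_options)
  # Legacy form: a bare number is treated as a fixed sample rate.
  return { sample_rate: sampling_options } if sampling_options.is_a?(Numeric)

  sampling_options || {}
end

# Hypothetical helper: reports which sampler class the options would select.
def sampler_kind(sampling_options)
  normalize_sampling_options(sampling_options)[:sampler] ? :dynamic : :basic
end

legacy = normalize_sampling_options(0.25)
missing = normalize_sampling_options(nil)
kind = sampler_kind({ sampler: proc { 1 }, sample_rate: 0.1 })
legacy_kind = sampler_kind(0.25)
```

Note that when both `sampler` and `sample_rate` are given, the dynamic sampler wins and `sample_rate` is ignored.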
34 changes: 34 additions & 0 deletions lib/graphql-hive/sampling/basic_sampler.rb
@@ -0,0 +1,34 @@
# frozen_string_literal: true

require 'graphql-hive/sampling/sampling_context'

module GraphQL
class Hive
module Sampling
# Basic sampling for operations reporting
class BasicSampler
include GraphQL::Hive::Sampling::SamplingContext

def initialize(client_sample_rate, at_least_once, key_generator)
@sample_rate = client_sample_rate || 1
@tracked_operations = {}
@key_generator = key_generator || DEFAULT_SAMPLE_KEY if at_least_once
end

def sample?(operation)
if @key_generator
sample_context = get_sample_context(operation)
operation_key = @key_generator.call(sample_context)

unless @tracked_operations.key?(operation_key)
@tracked_operations[operation_key] = true
return true
end
end

SecureRandom.random_number <= @sample_rate
end
end
end
end
end
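
The at-least-once behaviour of `BasicSampler#sample?` can be tried without the GraphQL gem. The sketch below is a simplified stand-in under stated assumptions: `AtLeastOnceSampler` is a hypothetical class that takes plain string keys instead of operations, and its `random:` argument is an illustrative addition that makes the draw deterministic.

```ruby
require 'securerandom'

# Hypothetical stand-in for BasicSampler with at_least_once enabled.
class AtLeastOnceSampler
  def initialize(sample_rate, random: -> { SecureRandom.random_number })
    @sample_rate = sample_rate
    @seen = {}
    @random = random
  end

  def sample?(key)
    # The first sighting of a key is always sampled.
    unless @seen.key?(key)
      @seen[key] = true
      return true
    end

    # Later sightings fall back to the fixed sample rate.
    @random.call <= @sample_rate
  end
end

sampler = AtLeastOnceSampler.new(0.5, random: -> { 0.9 })
first = sampler.sample?('getUser')  # first sighting of this key
second = sampler.sample?('getUser') # draw of 0.9 exceeds the 0.5 rate
```

As in the PR, the set of seen keys only grows, so memory use scales with the number of distinct keys the key generator produces.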
34 changes: 34 additions & 0 deletions lib/graphql-hive/sampling/dynamic_sampler.rb
@@ -0,0 +1,34 @@
# frozen_string_literal: true

require 'graphql-hive/sampling/sampling_context'

module GraphQL
class Hive
module Sampling
# Dynamic sampling for operations reporting
class DynamicSampler
include GraphQL::Hive::Sampling::SamplingContext

def initialize(client_sampler, at_least_once, key_generator)
@sampler = client_sampler
@tracked_operations = {}
@key_generator = key_generator || DEFAULT_SAMPLE_KEY if at_least_once
end

def sample?(operation)
sample_context = get_sample_context(operation)

if @key_generator
operation_key = @key_generator.call(sample_context)
unless @tracked_operations.key?(operation_key)
@tracked_operations[operation_key] = true
return true
end
end

SecureRandom.random_number <= @sampler.call(sample_context)
end
end
end
end
end
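
The final comparison in `DynamicSampler#sample?` (a random draw against the rate returned by the client proc) can be sketched on its own. Everything here is illustrative: `RATES`, `rate_for`, and `sampled?` are hypothetical names, and the draw is passed in explicitly instead of coming from `SecureRandom`.

```ruby
# Hypothetical per-operation rate table; unknown operations default to 10%.
RATES = { 'checkout' => 1.0, 'healthcheck' => 0.0 }.freeze
rate_for = proc { |context| RATES.fetch(context[:operation_name], 0.1) }

# Mirrors `SecureRandom.random_number <= @sampler.call(sample_context)` with an injected draw.
def sampled?(sampler, context, draw)
  draw <= sampler.call(context)
end

always = sampled?(rate_for, { operation_name: 'checkout' }, 0.99)
never = sampled?(rate_for, { operation_name: 'healthcheck' }, 0.01)
sometimes_hit = sampled?(rate_for, { operation_name: 'search' }, 0.05)
sometimes_miss = sampled?(rate_for, { operation_name: 'search' }, 0.5)
```

The proc is expected to return a number between 0 and 1; the PR's code does not clamp or validate the value at this point.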
39 changes: 39 additions & 0 deletions lib/graphql-hive/sampling/sampling_context.rb
@@ -0,0 +1,39 @@
# frozen_string_literal: true

module GraphQL
class Hive
module Sampling
# Helper methods for sampling
module SamplingContext
private

DEFAULT_SAMPLE_KEY = proc { |sample_context|
md5 = Digest::MD5.new
md5.update sample_context[:document].to_query_string
md5.hexdigest
}

def get_sample_context(operation)
_, queries, results, = operation

operation_name = queries.map(&:operations).map(&:keys).flatten.compact.join(', ')

parsed_definitions = []
queries.each do |query|
query_document = query.document
parsed_definitions.concat(query_document.definitions) if query_document
end
document = GraphQL::Language::Nodes::Document.new(definitions: parsed_definitions)

context_value = results[0].query.context

{
operation_name: operation_name,
document: document,
context_value: context_value
}
end
end
end
end
end
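
The default key is an MD5 digest of the document's query string. The sketch below shows the idea with raw strings; `default_key` is a hypothetical stand-in that skips parsing, whereas the PR hashes the output of `to_query_string` on a document rebuilt from all queries in the operation.

```ruby
require 'digest'

# Hypothetical stand-in for DEFAULT_SAMPLE_KEY: hashes a raw query string
# instead of a parsed document's `to_query_string` output.
default_key = proc { |query_string| Digest::MD5.hexdigest(query_string) }

key_a = default_key.call('query GetUser { user { id } }')
key_b = default_key.call('query GetUser { user { id } }')
key_c = default_key.call('query GetUser { user { name } }')
```

Identical query text yields the same fixed-length key, and any change to the selection yields a different one. Because `get_sample_context` merges the definitions of every query in the operation, the default key identifies the whole request rather than each query separately.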
13 changes: 6 additions & 7 deletions lib/graphql-hive/usage_reporter.rb
@@ -10,11 +10,6 @@ class Hive < GraphQL::Tracing::PlatformTracing
class UsageReporter
@@instance = nil

@queue = nil
@thread = nil
@operations_buffer = nil
@client = nil

def self.instance
@@instance
end
@@ -28,6 +23,8 @@ def initialize(options, client)
@options_mutex = Mutex.new
@queue = Queue.new

@sampler = Sampler.new(options[:collect_usage_sampling], options[:logger]) # NOTE: logs for deprecated field

start_thread
end

@@ -55,8 +52,9 @@ def start_thread
@thread = Thread.new do
buffer = []
while (operation = @queue.pop(false))
@options[:logger].debug("add operation to buffer: #{operation}")
buffer << operation
@options[:logger].debug("processing operation from queue: #{operation}")
buffer << operation if @sampler.sample?(operation)

@options_mutex.synchronize do
if buffer.size >= @options[:buffer_size]
@options[:logger].debug('buffer is full, sending!')
@@ -65,6 +63,7 @@ def start_thread
end
end
end

unless buffer.empty?
@options[:logger].debug('shutting down with buffer, sending!')
process_operations(buffer)
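
The worker loop above moves sampling off the request path: every operation is queued, and the background thread decides what enters the buffer. A minimal sketch of that flow follows, with hypothetical stand-ins: `sample` replaces `@sampler.sample?`, `flushed` collects what `process_operations` would send, and locking and logging are left out.

```ruby
queue = Queue.new
flushed = []    # stand-in for batches handed to process_operations
buffer_size = 2
sample = ->(operation) { operation[:name] != 'skipped' } # stand-in for @sampler.sample?

worker = Thread.new do
  buffer = []
  # Queue#pop blocks until an item arrives and returns nil once the queue is closed and drained.
  while (operation = queue.pop(false))
    buffer << operation if sample.call(operation)

    if buffer.size >= buffer_size
      flushed << buffer.dup
      buffer.clear
    end
  end

  # Send whatever is left when the queue shuts down.
  flushed << buffer unless buffer.empty?
end

%w[a skipped b c].each { |name| queue.push({ name: name }) }
queue.close
worker.join

batches = flushed.map { |batch| batch.map { |operation| operation[:name] } }
```

Only sampled operations count toward `buffer_size`, so a low sample rate also lowers how often batches are sent.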
2 changes: 1 addition & 1 deletion lib/graphql-hive/version.rb
@@ -2,6 +2,6 @@

module Graphql
module Hive
VERSION = '0.3.4'
VERSION = '0.4.0'
end
end
51 changes: 51 additions & 0 deletions spec/graphql/graphql-hive/sampler/basic_sampler_spec.rb
@@ -0,0 +1,51 @@
# frozen_string_literal: true

require 'spec_helper'

RSpec.describe GraphQL::Hive::Sampling::BasicSampler do
let(:sampler_instance) { described_class.new(sample_rate, at_least_once, key_generator) }
let(:sample_rate) { 0 }
let(:at_least_once) { false }
let(:key_generator) { nil }

describe '#initialize' do
it 'sets the sample rate' do
expect(sampler_instance.instance_variable_get(:@sample_rate)).to eq(0)
end
end

describe '#sample?' do
let(:schema) { GraphQL::Schema.from_definition('type Query { test: String }') }
let(:timestamp) { 1_720_705_946_333 }
let(:queries) { [GraphQL::Query.new(schema, query: '{ test }', context: { header: 'value' })] }
let(:results) { [GraphQL::Query::Result.new(query: queries.first, values: { 'data' => { 'test' => 'test' } })] }
let(:duration) { 100 }
let(:operation) { [timestamp, queries, results, duration] }

it 'follows the sample rate for all operations' do
expect(sampler_instance.sample?(operation)).to eq(false)
end

context 'with at least once sampling' do
let(:at_least_once) { true }

it 'returns true for the first operation, then follows the sample rate for remaining operations' do
expect(sampler_instance.sample?(operation)).to eq(true)
expect(sampler_instance.sample?(operation)).to eq(false)
end

context 'when provided a custom key generator' do
let(:key_generator) { proc { |_sample_context| 'same_key' } }

it 'tracks operations by their custom keys' do
expect(sampler_instance.sample?(operation)).to eq(true)

queries = [GraphQL::Query.new(schema, query: '{ something_else }')]
different_operation = [timestamp, queries, results, duration]

expect(sampler_instance.sample?(different_operation)).to eq(false)
end
end
end
end
end