During a streamed poker game, a show will collect a number of metrics related to player performance and style. Typical metrics include:
- Cumulative winnings - The cumulative winnings (or losses) of a given player at the conclusion of the stream.
- Chip count - The size of a player's stack at the conclusion of a stream.
- Pre-flop raise - The frequency at which a player elects to raise preflop.
- VPIP (voluntarily put money in pot) - How frequently a player voluntarily enters a pot; for example, a player who voluntarily puts chips in on 30 of 120 dealt hands has a VPIP of 25%.
This project currently collects and aggregates these metrics for all players from the most popular operator (HCL), providing some insight into the on-stream performance of players over time.
This project deploys a number of microservices to coordinate the collection of these statistics:
- Pipeline - Routes events between services.
- Ingest - Queries for new video assets.
- Asset Ripper - Downloads and slices streams into individual frames for analysis.
- Frame Analysis - Detects frames of interest and extracts statistics.
- Inventory - Creates an API and useful read model for the statistics.
- Client - A front-end for consuming the statistics.
The ingest API is responsible for querying and dispatching new video assets into the pipeline. The service maintains a minimal read model to keep track of videos that have already been discovered.
To ensure downstream services have access to the full range of metadata and encoded versions of the asset, a fixed duration of time must pass after publishing before the asset is considered discoverable.
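A minimal sketch of how that discoverability check might look, assuming a simple delay constant and video shape (both are illustrative, not the service's actual implementation):

// Sketch only: the two-hour delay and DiscoveredVideo shape are assumptions.
const DISCOVERY_DELAY_MS = 2 * 60 * 60 * 1000;

type DiscoveredVideo = {
  videoId: string;
  publishedAt: string; // ISO timestamp from the operator's channel feed
};

// A video is dispatched into the pipeline only when it is new to the read model
// and has been published for at least the fixed delay.
const isDiscoverable = (
  video: DiscoveredVideo,
  alreadySeen: Set<string>,
  now = Date.now(),
): boolean =>
  !alreadySeen.has(video.videoId) &&
  now - new Date(video.publishedAt).getTime() >= DISCOVERY_DELAY_MS;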
This service is responsible for downloading a segment of the target show and slicing out a number of individual frames for further analysis. Commands dispatched to this service can be either a video ID or a URL. Once assets are stored, their location and metadata are recorded.
This service is deployed with a custom Dockerfile that brings in some additional dependencies:
- yt-dlp - A Python package written to download YouTube videos and metadata.
- ffmpeg - The swiss army knife of video, used to extract individual frames.
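As a rough illustration of how these tools fit together, the service could shell out to them along these lines; the flags, file names and one-frame-per-30-seconds rate are assumptions for the sketch, not the real invocation:

import { execFileSync } from "node:child_process";

// Sketch only: format selection, file names and frame rate are illustrative.
const ripFrames = (videoUrl: string, workDir: string) => {
  // Download the asset with yt-dlp, preferring an mp4 container.
  execFileSync("yt-dlp", ["-f", "best[ext=mp4]", "--output", `${workDir}/asset.mp4`, videoUrl]);

  // Slice the download into individual frames using ffmpeg's fps filter.
  execFileSync("ffmpeg", [
    "-i", `${workDir}/asset.mp4`,
    "-vf", "fps=1/30",
    `${workDir}/frame_%04d.png`,
  ]);
};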
This service is dispatched commands to perform analysis on individual frames. The following takes place during analysis:
- A frame is taken as input.
- The service was tested with 35 random samples from the corpus.
- The samples were pre-labelled or validated at each stage of the analysis with tests specifically suffixed with "DataBuilder".
- These tests assisted with an exploratory approach to understanding the data, and are thus distinct from the other tests, which have much more focused test cases.
- Frames are preprocessed (see the sketch after this list).
- The center area is cropped.
- A binary threshold is applied to clear up noise.
- OCR is applied to classify the frame as interesting or not.
- This service runs a classification process as a cost saving measure, since detailed analysis with the more accurate Textract service is costly.
- The OCR document is fuzzy matched to certain trigger words.
- If classified as interesting, Textract is used for a more accurate OCR.
- The results include words, tables and geometry of detected words.
- The geometry of certain words is used to locate the arrows indicating whether a figure represents a win or a loss.
- A traditional algorithm is applied to detect whether a shape is an up or down arrow.
- The extracted statistics are recorded.
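The preprocessing and classification steps might look something like the following sketch, assuming sharp for image manipulation; the crop region, threshold value and trigger words are illustrative, and the real service's fuzzy-matching approach may differ:

import sharp from "sharp";

// Sketch only: crop region, threshold and trigger words are assumptions.
const TRIGGER_WORDS = ["winnings", "stack"];

// Crop the centre of a 1920x1080 frame and apply a binary threshold to clear up noise.
const preprocess = (frame: Buffer) =>
  sharp(frame)
    .extract({ left: 480, top: 270, width: 960, height: 540 })
    .greyscale()
    .threshold(160)
    .toBuffer();

// Minimal Levenshtein distance, used to fuzzy match OCR tokens against trigger words.
const distance = (a: string, b: string): number => {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,
        d[i][j - 1] + 1,
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),
      );
    }
  }
  return d[a.length][b.length];
};

// A frame is classified as interesting when any OCR token sits close to a trigger word.
const isInteresting = (ocrTokens: string[]) =>
  ocrTokens.some((token) =>
    TRIGGER_WORDS.some((word) => distance(token.toLowerCase(), word) <= 2),
  );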
The data model fits into the following hierarchy and relationships:
The inventory builds a read model using a DynamoDB table. On-demand pricing keeps costs minimal given the low volume of writes, while a CDN with a high TTL can protect the read workload.
The schema uses a single-table design, to support one-shot fetching of related entities. Partitions are designed along the axis of operator, show, player and stat type to support the following queries:
- All shows for a given operator.
- All data for a given show.
- All data for a given player.
- All stats of a given type.
Visually, each partition is organised according to these access patterns in the following way:
The key schema to build these partitions is documented below:
export type ShowStorage = Show & {
  entity_type: "show";
  pk: `operator#${OperatorId}`;
  sk: `show#date#${Date}#slug#${ShowId}#`;
  gsi1pk: `slug#${ShowId}`;
  gsi1sk: "show#";
};

type PlayerAppearanceStorage = PlayerAppearance & {
  entity_type: "player_appearance";
  pk: `player#${PlayerId}`;
  sk: `appearance#slug#${ShowId}#`;
  gsi1pk: `slug#${ShowId}`;
  gsi1sk: `appearance#player#${PlayerId}#`;
};

export type StatStorage = Stat & {
  entity_type: "player_stat";
  pk: `player#${PlayerId}`;
  sk: `stat#stat_type#${StatType}#slug#${ShowId}#`;
  gsi1pk: `slug#${ShowId}`;
  gsi1sk: `stat#stat_type#${StatType}#player#${PlayerId}#`;
  gsi2pk: `stat_type#${StatType}`;
  gsi2sk: `stat#player#${PlayerId}#slug#${ShowId}#`;
};
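As an illustration of the first access pattern, fetching all shows for a given operator becomes a single query against the base table; the table name and the use of the AWS SDK document client are assumptions for the sketch:

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Sketch only: the "inventory" table name is an assumption.
const showsForOperator = async (operatorId: string) => {
  const { Items } = await client.send(
    new QueryCommand({
      TableName: "inventory",
      KeyConditionExpression: "pk = :pk AND begins_with(sk, :prefix)",
      ExpressionAttributeValues: { ":pk": `operator#${operatorId}`, ":prefix": "show#" },
    }),
  );
  return Items ?? [];
};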
Some of the most interesting insights come from the aggregate of data points spanning the whole dataset. For the volume of data produced by a single operator, each partition could grow by an order of magnitude before impacting query performance. Querying for all data points and aggregating on demand may eventually prove not to scale, but it works for the volume of data expected in the foreseeable future.
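For example, a cumulative-winnings leaderboard can be produced by querying the stat-type partition on the second global secondary index and reducing in memory. The index and table names, the stat-type value and the player_id/value attributes are assumptions beyond what the key schema above documents:

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Sketch only: index/table names and item attributes are assumptions.
const winningsLeaderboard = async () => {
  const { Items = [] } = await client.send(
    new QueryCommand({
      TableName: "inventory",
      IndexName: "gsi2",
      KeyConditionExpression: "gsi2pk = :pk",
      ExpressionAttributeValues: { ":pk": "stat_type#cumulative_winnings" },
    }),
  );

  // Aggregate every data point for the stat type into a per-player total.
  const totals = new Map<string, number>();
  for (const item of Items) {
    totals.set(item.player_id, (totals.get(item.player_id) ?? 0) + item.value);
  }
  return [...totals.entries()].sort((a, b) => b[1] - a[1]);
};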
The client is an SPA deployed to an S3 bucket using the following key libraries:
- swr for data fetching.
- Next.js using the export bundling mode (SPA with no SSR or server components).
- Chakra UI as a component library.
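A typical data-fetching hook in the client looks something like the following; the endpoint path and response shape are assumptions for the sketch:

import useSWR from "swr";

// Sketch only: the inventory endpoint and Leaderboard shape are assumptions.
type Leaderboard = { player: string; value: number }[];

const fetcher = (url: string) => fetch(url).then((res) => res.json());

export const useLeaderboard = (statType: string) => {
  const { data, error, isLoading } = useSWR<Leaderboard>(`/api/stats/${statType}`, fetcher);
  return { leaderboard: data, error, isLoading };
};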
The client also ships with a debug mode that can be switched on globally to show contextually relevant information, helping to debug and observe behaviour in the app. It shows responses from the inventory API for the current page and links directly to the logs of services, filtered by the content asset you are looking at:
This mode is activated by spamming the shift key in rapid succession, inspired by the activation of the Windows XP accessibility feature "sticky keys".
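A rough sketch of how that activation could be wired up as a React hook; the five-press threshold and two-second window are illustrative assumptions:

import { useEffect, useState } from "react";

// Sketch only: the press count and time window are assumptions.
export const useDebugMode = (presses = 5, windowMs = 2000) => {
  const [enabled, setEnabled] = useState(false);

  useEffect(() => {
    let timestamps: number[] = [];
    const onKeyDown = (event: KeyboardEvent) => {
      if (event.key !== "Shift") return;
      const now = Date.now();
      timestamps = [...timestamps, now].filter((t) => now - t <= windowMs);
      if (timestamps.length >= presses) {
        setEnabled((current) => !current);
        timestamps = [];
      }
    };
    window.addEventListener("keydown", onKeyDown);
    return () => window.removeEventListener("keydown", onKeyDown);
  }, [presses, windowMs]);

  return enabled;
};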
Where possible, all components use on-demand pricing (DynamoDB, ECS, Fargate), to keep costs low when infrastructure is idle.
This project deploys to AWS using infrastructure-as-code via a number of CDK Stacks. CDK provides constructs at varying degrees of abstraction for orchestrating the creation of AWS services, using CloudFormation templates as an intermediary.
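As an example of the level of abstraction involved, the inventory table from the read model above can be declared in a few lines of CDK; the stack and construct IDs are illustrative:

import { Stack, StackProps } from "aws-cdk-lib";
import { AttributeType, BillingMode, Table } from "aws-cdk-lib/aws-dynamodb";
import { Construct } from "constructs";

// Sketch only: construct IDs are illustrative; the key attributes match the schema above.
export class InventoryStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Single-table read model with on-demand pricing to keep idle costs low.
    const table = new Table(this, "InventoryTable", {
      partitionKey: { name: "pk", type: AttributeType.STRING },
      sortKey: { name: "sk", type: AttributeType.STRING },
      billingMode: BillingMode.PAY_PER_REQUEST,
    });

    table.addGlobalSecondaryIndex({
      indexName: "gsi1",
      partitionKey: { name: "gsi1pk", type: AttributeType.STRING },
      sortKey: { name: "gsi1sk", type: AttributeType.STRING },
    });
  }
}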
CloudWatch and X-Ray provide the foundation for logging and traces respectively. The "Trace Map" feature provides a useful visualisation for tracking down the root cause of errors as commands and events propagate through services:
The results are a number of leaderboards refreshed daily, with the ability to drill down on the players and shows that are interesting. Currently hosted at poker.sam152.com.