
Commit

Run with mkdocs build
pflooky committed Jul 13, 2023
1 parent 966dccb commit 2753c59
Showing 67 changed files with 12,696 additions and 11 deletions.
10 changes: 5 additions & 5 deletions docs/tech/advanced.md
@@ -6,7 +6,7 @@ There are many options available for you to use when you have a scenario when da

1. Create expression [datafaker](https://www.datafaker.net/documentation/expressions/)
1. Can be used to create names, addresses, or anything that can be found
- under [here](tech/sample/datafaker/expressions.txt)
+ under [here](sample/datafaker/expressions.txt)
2. Create regex
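
The expression option expects each faker expression in the form `#{<faker expression name>}`, and one string can combine several, such as `#{Address.city}/#{Demographic.maritalStatus}`. A minimal Python sketch for listing the expression names in such a string, assuming names are `Provider.method` pairs made of word characters (expressions that take arguments are not matched). `faker_expression_names` is a hypothetical helper, not part of Data Caterer or datafaker:

```python
import re

# Hypothetical pattern: matches "#{Provider.method}" placeholders, assuming
# provider and method names consist of word characters only.
FAKER_EXPRESSION = re.compile(r"#\{(\w+)\.(\w+)\}")


def faker_expression_names(expression: str) -> list[str]:
    """Return each "Provider.method" name referenced in a faker expression string."""
    return [f"{provider}.{method}" for provider, method in FAKER_EXPRESSION.findall(expression)]


print(faker_expression_names("#{Address.city}/#{Demographic.maritalStatus}"))
# prints ['Address.city', 'Demographic.maritalStatus']
```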

## Foreign keys across data sets
@@ -27,7 +27,7 @@ sinkOptions:
- "transaction-cassandra.transactions.account_id"
```
- [Sample can be found here.](tech/sample/plan/foreign-key-example-plan.yaml)
+ [Sample can be found here.](sample/plan/foreign-key-example-plan.yaml)
You can define any number of foreign key relationships as you want.
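
Foreign keys above are written as dotted strings such as `transaction-cassandra.transactions.account_id`; reading them as `<data source>.<table>.<column>` is an assumption drawn from that single example. The Python sketch below parses such a string and checks the property a foreign key relationship should guarantee: every value in the child column also exists in the parent column. `parse_column_ref` and `missing_foreign_keys` are hypothetical names, not Data Caterer APIs.

```python
from typing import Iterable, NamedTuple


class ColumnRef(NamedTuple):
    data_source: str
    table: str
    column: str


def parse_column_ref(ref: str) -> ColumnRef:
    # Assumed format "<data source>.<table>.<column>"; splitting from the right
    # keeps any extra dots inside the data source name.
    data_source, table, column = ref.rsplit(".", 2)
    return ColumnRef(data_source, table, column)


def missing_foreign_keys(parent_values: Iterable[str], child_values: Iterable[str]) -> set[str]:
    """Return child values with no matching parent value (empty means the relationship holds)."""
    return set(child_values) - set(parent_values)


ref = parse_column_ref("transaction-cassandra.transactions.account_id")
print(ref.table, ref.column)  # prints: transactions account_id
print(missing_foreign_keys(["ACC1", "ACC2"], ["ACC1", "ACC1", "ACC2"]))  # prints: set()
```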
## Edge cases
@@ -57,8 +57,8 @@ You can alter the `status` column in the account data to only generate `open` ac
and define a foreign key between Postgres and parquet to ensure the same `account_id` is being used.
Then in the parquet task, define 1 to 10 transactions per `account_id` to be generated.

- [Postgres account generation example task](tech/sample/task/jdbc/postgres/postgres-account-task.yaml)
- [Parquet transaction generation example task](tech/sample/task/file/parquet/parquet-transaction-task.yaml)
- [Plan](tech/sample/plan/scenario-based-plan.yaml)
+ [Postgres account generation example task](sample/task/jdbc/postgres/postgres-account-task.yaml)
+ [Parquet transaction generation example task](sample/task/file/parquet/parquet-transaction-task.yaml)
+ [Plan](sample/plan/scenario-based-plan.yaml)

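The account/transaction scenario can be sketched in plain Python to show the shape of the expected output: every account has `status` set to `open`, and each `account_id` is reused by 1 to 10 transactions. This is an illustration under assumptions, not how Data Caterer generates data; the `ACC` id format, the `amount` field and `generate_scenario` are hypothetical.

```python
import random
from collections import Counter


def generate_scenario(num_accounts: int, seed: int = 42):
    """Hypothetical sketch: open accounts plus 1 to 10 transactions per account_id."""
    rng = random.Random(seed)
    # Parent data set: every account is generated with status "open".
    accounts = [{"account_id": f"ACC{i:08d}", "status": "open"} for i in range(num_accounts)]
    transactions = []
    for account in accounts:
        # Child data set: reuse the parent's account_id (the foreign key)
        # for between 1 and 10 transactions, inclusive.
        for _ in range(rng.randint(1, 10)):
            transactions.append({
                "account_id": account["account_id"],
                "amount": round(rng.uniform(1.0, 1000.0), 2),
            })
    return accounts, transactions


accounts, transactions = generate_scenario(num_accounts=5)
per_account = Counter(t["account_id"] for t in transactions)
print(per_account)  # each count is between 1 and 10
```
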
## Generating JSON data
2 changes: 1 addition & 1 deletion docs/tech/docker.md
@@ -8,7 +8,7 @@

## Run with custom data connections

- 1. Use sample `application.conf` from [here](../../app/src/main/resources/application.conf) and put under folder `/tmp/datagen`
+ 1. Use sample `application.conf` from [here](sample/conf/application.conf) and put under folder `/tmp/datagen`
1. `cp app/src/main/resources/application.conf /tmp/datagen`
2. Fill in details of data connections as found [here](connections.md)
3. `docker run -v /tmp/datagen:/opt/app/data-caterer -e APPLICATION_CONFIG_PATH=/opt/app/datagen/application.conf pflookyy/data-caterer:0.1`
15 changes: 10 additions & 5 deletions docs/tech/generators.md
@@ -41,15 +41,18 @@ descriptions:
|------------|---------|-------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| minLen | 1 | minLen: "2" | Ensures that all generated strings have at least length `minLen` |
| maxLen | 10 | maxLen: "15" | Ensures that all generated strings have at most length `maxLen` |
- | expression | <empty> | expression: "#{Name.name}"<br/> expression:"#{Address.city}/#{Demographic.maritalStatus}" | Will generate a string based on the faker expression provided. All possible faker expressions can be found [here](tech/sample/datafaker/expressions.txt)<br/> Expression has to be in format `#{<faker expression name>}` |
+ | expression | <empty> | expression: "#{Name.name}"<br/> expression:"#{Address.city}/#{Demographic.maritalStatus}" | Will generate a string based on the faker expression provided. All possible faker expressions can be found [here](sample/datafaker/expressions.txt)<br/> Expression has to be in format `#{<faker expression name>}` |
| enableNull | false | enableNull: "true" | Enable/disable null values being generated |

**Edge cases**: ("", "\n", "\r", "\t", " ", "\\u0000", "\\ufff")
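
A minimal Python sketch of how `minLen`, `maxLen` and `enableNull` could constrain generated strings, using the defaults from the table (1, 10 and false). The letter alphabet, `null_probability` and `generate_string` are assumptions for illustration; the docs do not say how often nulls are produced.

```python
import random
import string
from typing import Optional


def generate_string(rng: random.Random, min_len: int = 1, max_len: int = 10,
                    enable_null: bool = False,
                    null_probability: float = 0.1) -> Optional[str]:
    """Hypothetical sketch of a string generator honouring minLen/maxLen/enableNull."""
    if min_len > max_len:
        raise ValueError("minLen must not exceed maxLen")
    # Only emit nulls when enableNull is on; null_probability is an assumed knob.
    if enable_null and rng.random() < null_probability:
        return None
    length = rng.randint(min_len, max_len)  # both ends inclusive
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


rng = random.Random(7)
print([generate_string(rng, min_len=2, max_len=15) for _ in range(3)])
```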

### Numeric

For all the numeric data types, there are 4 options to choose from: min, minValue, max and maxValue.
Generally speaking, you only need to define one of min or minValue, similarly with max or maxValue.
- The reason why there are 2 options for each is because of when metadata is automatically gathered, we gather the statistics of the observed min and max values. Also, it will attempt to gather any restriction on the min or max value as defined by the data source (i.e. max value as per database type).
+ The reason why there are 2 options for each is because of when metadata is automatically gathered, we gather the
+ statistics of the observed min and max values. Also, it will attempt to gather any restriction on the min or max value
+ as defined by the data source (i.e. max value as per database type).

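For the bounds, the tables state that `maxValue` decides the upper bound when both `max` and `maxValue` are set. A minimal sketch of that precedence, assuming `minValue` overrides `min` in the same way and that option values arrive as strings as in the YAML examples. `resolve_bounds` is hypothetical, and the defaults are passed in by the caller rather than guessed:

```python
from typing import Mapping, Optional, Tuple


def resolve_bounds(options: Mapping[str, str], default_min: float,
                   default_max: float) -> Tuple[float, float]:
    """Return the effective (lower, upper) bounds for a numeric column.

    Hypothetical helper: minValue/maxValue win over min/max when both are set.
    """
    def pick(preferred: str, fallback: str, default: float) -> float:
        raw: Optional[str] = options.get(preferred, options.get(fallback))
        return default if raw is None else float(raw)

    lower = pick("minValue", "min", default_min)
    upper = pick("maxValue", "max", default_max)
    if lower > upper:
        raise ValueError(f"min bound {lower} is greater than max bound {upper}")
    return lower, upper


print(resolve_bounds({"max": "25.9", "maxValue": "20.0"}, 0.0, 1000.0))  # (0.0, 20.0)
print(resolve_bounds({"min": "5"}, 0.0, 1000.0))  # (5.0, 1000.0)
```
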
#### Integer/Long/Short/Decimal

@@ -62,7 +65,7 @@ The reason why there are 2 options for each is because of when metadata is autom

**Edge cases Integer**: (2147483647, -2147483648, 0)
**Edge cases Long/Decimal**: (9223372036854775807, -9223372036854775808, 0)
- **Edge cases Short**: (32767, -32768, 0)
+ **Edge cases Short**: (32767, -32768, 0)

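The integer edge cases are the two's-complement limits of the JVM's signed 16-bit (Short), 32-bit (Integer) and 64-bit (Long) types, plus zero; the Long limits are also listed for Decimal. A small Python check that derives them from the bit width:

```python
def signed_bounds(bits: int) -> tuple[int, int]:
    """Return (min, max) of a two's-complement signed integer with the given bit width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, bits in [("Short", 16), ("Integer", 32), ("Long", 64)]:
    low, high = signed_bounds(bits)
    print(f"{name}: ({high}, {low}, 0)")
# Short: (32767, -32768, 0)
# Integer: (2147483647, -2147483648, 0)
# Long: (9223372036854775807, -9223372036854775808, 0)
```
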
#### Double/Float

@@ -73,7 +76,8 @@ The reason why there are 2 options for each is because of when metadata is autom
| maxValue | 1000.0 | maxValue: "25.9" | Ensures that all generated values are less than or equal to `maxValue` |
| max | 1000.0 | max: "25.9" | Ensures that all generated values are less than or equal to `maxValue`. If `maxValue` is defined, `maxValue` will define the largest possible generated value |

- **Edge cases Double**: (+infinity, 1.7976931348623157e+308, 4.9e-324, 0.0, -0.0, -1.7976931348623157e+308, -infinity, NaN)
+ **Edge cases Double**: (+infinity, 1.7976931348623157e+308, 4.9e-324, 0.0, -0.0, -1.7976931348623157e+308, -infinity,
+ NaN)
**Edge cases Float**: (+infinity, 3.4028235e+38, 1.4e-45, 0.0, -0.0, -3.4028235e+38, -infinity, NaN)
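
The Double and Float edge cases are the IEEE 754 limits (largest finite value, smallest positive subnormal) of 64-bit and 32-bit floats, plus the special values that often break comparisons. A Python sketch that recomputes the limits with the standard library and shows why NaN and -0.0 are worth testing; Python floats are 64-bit, so the single-precision values are decoded from their bit patterns with `struct`:

```python
import math
import struct
import sys


def float32_from_bits(bits: int) -> float:
    # Reinterpret a 32-bit pattern as an IEEE 754 single-precision float.
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Double limits: largest finite value and smallest positive subnormal.
assert sys.float_info.max == 1.7976931348623157e+308
assert math.ulp(0.0) == 4.9e-324  # Python prints this value as 5e-324

# Float limits, formatted to the precision used in the edge-case list.
float32_max = float32_from_bits(0x7F7FFFFF)
float32_min_subnormal = float32_from_bits(0x00000001)
print(f"{float32_max:.7e} {float32_min_subnormal:.1e}")  # 3.4028235e+38 1.4e-45

# Why the special values are worth testing:
nan = float("nan")
assert nan != nan                          # NaN never equals itself
assert -0.0 == 0.0                         # signed zeros compare equal...
assert math.copysign(1.0, -0.0) == -1.0    # ...but keep their sign
assert math.inf > sys.float_info.max       # infinity exceeds every finite double
```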

### Date
@@ -85,7 +89,8 @@ The reason why there are 2 options for each is because of when metadata is autom
| enableNull | false | enableNull: "true" | Enable/disable null values being generated |

**Edge cases**: (0001-01-01, 1582-10-15, 1970-01-01, 9999-12-31)
- (Reference: https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala#L206)
+ (
+ Reference: https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala#L206)

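Each date edge case marks a boundary: the earliest and latest dates with four-digit years, the first day of the Gregorian calendar, and the Unix epoch. Spark stores `DateType` values as a count of days since 1970-01-01, so the sketch below prints that offset for each date. Python's `datetime.date` uses the proleptic Gregorian calendar, as Spark 3.x does, so the offsets should agree with Spark 3.x for these dates.

```python
from datetime import date

EPOCH = date(1970, 1, 1)

# Edge-case dates from the list above, with what makes each one special.
edge_cases = {
    date(1, 1, 1): "earliest date (date.min)",
    date(1582, 10, 15): "first day of the Gregorian calendar",
    date(1970, 1, 1): "Unix epoch",
    date(9999, 12, 31): "latest date (date.max)",
}

for d, reason in edge_cases.items():
    # Days since the epoch, the same unit Spark uses internally for DateType.
    print(d.isoformat(), (d - EPOCH).days, reason)
# 0001-01-01 -719162 earliest date (date.min)
# 1582-10-15 -141427 first day of the Gregorian calendar
# 1970-01-01 0 Unix epoch
# 9999-12-31 2932896 latest date (date.max)
```
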
### Timestamp


