Add ETL exercise #76

Merged
merged 4 commits into from
Jun 11, 2024
Changes from 2 commits
8 changes: 8 additions & 0 deletions config.json
Original file line number Diff line number Diff line change
@@ -217,6 +217,14 @@
        "practices": [],
        "prerequisites": [],
        "difficulty": 8
      },
      {
        "slug": "etl",
        "name": "ETL",
        "uuid": "63792af3-0fce-4c90-8cc3-4844ad6b6861",
        "practices": [],
        "prerequisites": [],
        "difficulty": 5
      }
    ]
  },
27 changes: 27 additions & 0 deletions exercises/practice/etl/.docs/instructions.md
@@ -0,0 +1,27 @@
# Instructions

Your task is to change the data format of letters and their point values in the game.

Currently, letters are stored in groups based on their score, in a one-to-many mapping.

- 1 point: "A", "E", "I", "O", "U", "L", "N", "R", "S", "T",
- 2 points: "D", "G",
- 3 points: "B", "C", "M", "P",
- 4 points: "F", "H", "V", "W", "Y",
- 5 points: "K",
- 8 points: "J", "X",
- 10 points: "Q", "Z",

This needs to be changed to store each individual letter with its score in a one-to-one mapping.

- "a" is worth 1 point.
- "b" is worth 3 points.
- "c" is worth 3 points.
- "d" is worth 2 points.
- etc.

As part of this change, the team has also decided to change the letters to be lower-case rather than upper-case.

~~~~exercism/note
If you want to look at how the data was previously structured and how it needs to change, take a look at the examples in the test suite.
~~~~
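
The change described in these instructions can be sketched outside SQL to make the target shape concrete. The snippet below is only an illustration: the function name `transform` is borrowed from the `property` field of the canonical data, and the integer-keyed `legacy` dict is an assumption about how a general-purpose language would model the input. The exercise itself is solved in SQLite.

```python
def transform(legacy: dict[int, list[str]]) -> dict[str, int]:
    """Invert a score -> letters mapping into a lower-case letter -> score mapping."""
    return {
        letter.lower(): score
        for score, letters in legacy.items()
        for letter in letters
    }


# Hypothetical sample input, taken from the "multiple scores" test case.
legacy = {1: ["A", "E"], 2: ["D", "G"]}
print(transform(legacy))  # {'a': 1, 'e': 1, 'd': 2, 'g': 2}
```

Each letter appears under exactly one score in the legacy data, so the inversion never has to resolve conflicting values.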
16 changes: 16 additions & 0 deletions exercises/practice/etl/.docs/introduction.md
@@ -0,0 +1,16 @@
# Introduction

You work for a company that makes an online multiplayer game called Lexiconia.

To play the game, each player is given 13 letters, which they must rearrange to create words.
Different letters have different point values, since it's easier to create words with some letters than others.

The game was originally launched in English, but it is very popular, and now the company wants to expand to other languages as well.

Different languages need to support different point values for letters.
The point values are determined by how often letters are used, compared to other letters in that language.

For example, the letter 'C' is quite common in English, and is only worth 3 points.
But in Norwegian it's a very rare letter, and is worth 10 points.

To make it easier to add new languages, your team needs to change the way letters and their point values are stored in the game.
22 changes: 22 additions & 0 deletions exercises/practice/etl/.meta/config.json
@@ -0,0 +1,22 @@
{
  "authors": [
    "Steffan153"
  ],
  "files": {
    "solution": [
      "etl.sql"
    ],
    "test": [
      "etl_test.sql"
    ],
    "example": [
      ".meta/example.sql"
    ],
    "editor": [
      "data.csv"
    ]
  },
  "blurb": "Change the data format for scoring a game to more easily add other languages.",
  "source": "Based on an exercise by the JumpstartLab team for students at The Turing School of Software and Design.",
  "source_url": "https://turing.edu"
}
10 changes: 10 additions & 0 deletions exercises/practice/etl/.meta/example.sql
@@ -0,0 +1,10 @@
UPDATE etl
SET result = (
    SELECT json_group_object(LOWER(value), TRIM(path, '$."') + 0)
    FROM (
        SELECT value, path
        FROM json_tree(input)
        WHERE type = 'text'
        ORDER BY value
    )
);
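
The example solution above recovers each score by trimming the `path` column of `json_tree`. As an alternative sketch (not the PR's method), two nested `json_each` calls expose the score as `key` directly. The snippet runs the SQL through Python's `sqlite3` module and assumes the bundled SQLite build includes the JSON functions; the aliases `s` and `l` and the single sample row are illustrative choices.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE etl ("input" TEXT, "result" TEXT)')
conn.execute(
    "INSERT INTO etl (input, result) VALUES (?, '')",
    ('{"1":["A","E"],"2":["D","G"]}',),
)

# The outer json_each yields one row per score; the inner one yields one row
# per letter in that score's array. The subquery sorts by letter, mirroring
# the ORDER BY in the example solution.
conn.execute(
    """
    UPDATE etl
    SET result = (
        SELECT json_group_object(letter, score)
        FROM (
            SELECT LOWER(l.value) AS letter, CAST(s.key AS INTEGER) AS score
            FROM json_each(etl.input) AS s,
                 json_each(s.value) AS l
            ORDER BY letter
        )
    )
    """
)

# Parse the stored JSON so the check below does not depend on key order.
result = json.loads(conn.execute("SELECT result FROM etl").fetchone()[0])
print(result == {"a": 1, "d": 2, "e": 1, "g": 2})  # True
```

The `CAST` is needed because object keys come back from `json_each` as text, while the expected output stores scores as JSON numbers.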
22 changes: 22 additions & 0 deletions exercises/practice/etl/.meta/tests.toml
@@ -0,0 +1,22 @@
# This is an auto-generated file.
#
# Regenerating this file via `configlet sync` will:
# - Recreate every `description` key/value pair
# - Recreate every `reimplements` key/value pair, where they exist in problem-specifications
# - Remove any `include = true` key/value pair (an omitted `include` key implies inclusion)
# - Preserve any other key/value pair
#
# As user-added comments (using the # character) will be removed when this file
# is regenerated, comments can be added via a `comment` key.

[78a7a9f9-4490-4a47-8ee9-5a38bb47d28f]
description = "single letter"

[60dbd000-451d-44c7-bdbb-97c73ac1f497]
description = "single score with multiple letters"

[f5c5de0c-301f-4fdd-a0e5-df97d4214f54]
description = "multiple scores with multiple letters"

[5db8ea89-ecb4-4dcd-902f-2b418cc87b9d]
description = "multiple scores with differing numbers of letters"
105 changes: 105 additions & 0 deletions exercises/practice/etl/canonical-data.json
@@ -0,0 +1,105 @@
{
  "exercise": "etl",
  "comments": [
    "Transforms a set of legacy Lexiconia data stored as letters per score",
    "to a set of data stored score per letter.",
    "Note: The expected input data for these tests should have",
    "integer keys (not stringified numbers as shown in the JSON below).",
    "Unless the language prohibits that, please implement these tests",
    "such that keys are integers. e.g. in JavaScript, it might look ",
    "like `transform( { 1: ['A'] } );`"
  ],
  "cases": [
    {
      "uuid": "78a7a9f9-4490-4a47-8ee9-5a38bb47d28f",
      "description": "single letter",
      "property": "transform",
      "input": {
        "legacy": {
          "1": ["A"]
        }
      },
      "expected": {
        "a": 1
      }
    },
    {
      "uuid": "60dbd000-451d-44c7-bdbb-97c73ac1f497",
      "description": "single score with multiple letters",
      "property": "transform",
      "input": {
        "legacy": {
          "1": ["A", "E", "I", "O", "U"]
        }
      },
      "expected": {
        "a": 1,
        "e": 1,
        "i": 1,
        "o": 1,
        "u": 1
      }
    },
    {
      "uuid": "f5c5de0c-301f-4fdd-a0e5-df97d4214f54",
      "description": "multiple scores with multiple letters",
      "property": "transform",
      "input": {
        "legacy": {
          "1": ["A", "E"],
          "2": ["D", "G"]
        }
      },
      "expected": {
        "a": 1,
        "d": 2,
        "e": 1,
        "g": 2
      }
    },
    {
      "uuid": "5db8ea89-ecb4-4dcd-902f-2b418cc87b9d",
      "description": "multiple scores with differing numbers of letters",
      "property": "transform",
      "input": {
        "legacy": {
          "1": ["A", "E", "I", "O", "U", "L", "N", "R", "S", "T"],
          "2": ["D", "G"],
          "3": ["B", "C", "M", "P"],
          "4": ["F", "H", "V", "W", "Y"],
          "5": ["K"],
          "8": ["J", "X"],
          "10": ["Q", "Z"]
        }
      },
      "expected": {
        "a": 1,
        "b": 3,
        "c": 3,
        "d": 2,
        "e": 1,
        "f": 4,
        "g": 2,
        "h": 4,
        "i": 1,
        "j": 8,
        "k": 5,
        "l": 1,
        "m": 3,
        "n": 1,
        "o": 1,
        "p": 3,
        "q": 10,
        "r": 1,
        "s": 1,
        "t": 1,
        "u": 1,
        "v": 4,
        "w": 4,
        "x": 8,
        "y": 4,
        "z": 10
      }
    }
  ]
}
8 changes: 8 additions & 0 deletions exercises/practice/etl/create_fixture.sql
@@ -0,0 +1,8 @@
DROP TABLE IF EXISTS "etl";
CREATE TABLE "etl" (
    "input" TEXT,
    "result" TEXT
);

.mode csv
.import ./data.csv etl
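
The fixture loads `data.csv`, where each JSON document sits in a quoted CSV field and every inner quote is doubled. A small sketch of how that escaping decodes, using Python's `csv` module as a stand-in for the SQLite shell's CSV import (assuming both follow the usual doubled-quote convention); the `raw` string is the first row of `data.csv` from this PR.

```python
import csv
import io
import json

# First row of data.csv: a quoted JSON field followed by an empty result field.
raw = '"{""1"":[""A""]}",""\n'

# csv.reader collapses each doubled quote inside a quoted field to one quote.
input_json, result = next(csv.reader(io.StringIO(raw)))

print(input_json)              # {"1":["A"]}
print(json.loads(input_json))  # {'1': ['A']}
print(repr(result))            # ''
```

After import, the `input` column therefore holds plain JSON text and `result` starts out as an empty string for the student's solution to fill in.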
4 changes: 4 additions & 0 deletions exercises/practice/etl/data.csv
@@ -0,0 +1,4 @@
"{""1"":[""A""]}",""
Member

For the other maintainers, what are your thoughts on what is essentially a JSON exercise on a SQL track?

Member

I’m on the fence. It’s not a great fit since we’re wrangling JSON at the end of the day and not SQL data. The input data makes sense to me as a data structure but not a table of data. There are a limited number of ways to solve it too.

Contributor Author

My opinion is, since this is an SQLite track, we're trying to teach about its features, so doesn't it make sense to have an exercise where you practice handling JSON? Because JSON is actually used to store values in databases.

"{""1"":[""A"",""E"",""I"",""O"",""U""]}",""
"{""1"":[""A"",""E""],""2"":[""D"",""G""]}",""
"{""1"":[""A"",""E"",""I"",""O"",""U"",""L"",""N"",""R"",""S"",""T""],""2"":[""D"",""G""],""3"":[""B"",""C"",""M"",""P""],""4"":[""F"",""H"",""V"",""W"",""Y""],""5"":[""K""],""8"":[""J"",""X""],""10"":[""Q"",""Z""]}",""
2 changes: 2 additions & 0 deletions exercises/practice/etl/etl.sql
@@ -0,0 +1,2 @@
-- Schema: CREATE TABLE "etl" ("input" TEXT, "result" TEXT);
-- Task: update the etl table and set the result based on the input field.
29 changes: 29 additions & 0 deletions exercises/practice/etl/etl_test.sql
@@ -0,0 +1,29 @@
-- Setup test table and read in student solution:
.read ./test_setup.sql

-- Test cases:
INSERT INTO tests (name, uuid,
                   input, expected)
VALUES
    ('single letter', '78a7a9f9-4490-4a47-8ee9-5a38bb47d28f',
     '{"1":["A"]}',
     '{"a":1}'),
    ('single score with multiple letters', '60dbd000-451d-44c7-bdbb-97c73ac1f497',
     '{"1":["A","E","I","O","U"]}',
     '{"a":1,"e":1,"i":1,"o":1,"u":1}'),
    ('multiple scores with multiple letters', 'f5c5de0c-301f-4fdd-a0e5-df97d4214f54',
     '{"1":["A","E"],"2":["D","G"]}',
     '{"a":1,"d":2,"e":1,"g":2}'),
    ('multiple scores with differing numbers of letters', '5db8ea89-ecb4-4dcd-902f-2b418cc87b9d',
     '{"1":["A","E","I","O","U","L","N","R","S","T"],"2":["D","G"],"3":["B","C","M","P"],"4":["F","H","V","W","Y"],"5":["K"],"8":["J","X"],"10":["Q","Z"]}',
     '{"a":1,"b":3,"c":3,"d":2,"e":1,"f":4,"g":2,"h":4,"i":1,"j":8,"k":5,"l":1,"m":3,"n":1,"o":1,"p":3,"q":10,"r":1,"s":1,"t":1,"u":1,"v":4,"w":4,"x":8,"y":4,"z":10}');


-- Compare each student result with the expected value and update the test status:
UPDATE tests
SET status = 'pass'
FROM (SELECT input, result FROM "etl") AS actual
WHERE (actual.input, json(actual.result)) = (tests.input, json(tests.expected));

-- Write results and debug info:
.read ./test_reporter.sql
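
The status update in `etl_test.sql` above wraps both sides in `json()` before comparing them. A minimal sketch of what that implies, run through Python's `sqlite3` module (assuming the bundled SQLite build includes the JSON functions; `same_json` is a hypothetical helper, not part of the PR):

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def same_json(a: str, b: str) -> bool:
    """Return True when json(a) and json(b) are identical text in SQLite."""
    (equal,) = conn.execute("SELECT json(?) = json(?)", (a, b)).fetchone()
    return bool(equal)


# json() minifies its argument, so whitespace differences do not matter.
print(same_json('{"a": 1, "d": 2}', '{"a":1,"d":2}'))  # True

# Key order is preserved by json(), so reordered keys compare as different.
print(same_json('{"d":2,"a":1}', '{"a":1,"d":2}'))     # False
```

This is presumably why the example solution sorts the letters before aggregating: a result with the right pairs in a different key order would still fail the text comparison.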
16 changes: 16 additions & 0 deletions exercises/practice/etl/test_reporter.sql
@@ -0,0 +1,16 @@
-- Update message for failed tests to give helpful information:
UPDATE tests
SET message = 'Result for ' || actual.input || ' is: ' || actual.result || ', but should be: ' || tests.expected
FROM (SELECT input, result FROM etl) AS actual
WHERE actual.input = tests.input AND tests.status = 'fail';

-- Save results to ./output.json (needed by the online test-runner)
.mode json
.once './output.json'
SELECT name, status, message, output, test_code, task_id
FROM tests;

-- Display test results in readable form for the student:
.mode table
SELECT name, status, message
FROM tests;
25 changes: 25 additions & 0 deletions exercises/practice/etl/test_setup.sql
@@ -0,0 +1,25 @@
-- Create database:
.read ./create_fixture.sql

-- Read the student's solution and save any output as markdown in user_output.md:
.mode markdown
.output user_output.md
.read ./etl.sql
.output

-- Create a clean testing environment:
DROP TABLE IF EXISTS tests;
CREATE TABLE IF NOT EXISTS tests (
    -- uuid and name (description) are taken from the tests.toml file
    uuid TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    -- The following section is needed by the online test-runner
    status TEXT DEFAULT 'fail',
    message TEXT,
    output TEXT,
    test_code TEXT,
    task_id INTEGER DEFAULT NULL,
    -- Here are columns for the actual tests
    input TEXT NOT NULL,
    expected TEXT NOT NULL
);