WIP
This repo was put together for an informal talk about strongly typed functional programming.
I'm using Haskell, an advanced, purely functional programming language, as an elegant vehicle for communicating ideas. Unsurprisingly, it has the advantage that its syntax is particularly well suited to functional programming. This is not intended as a Haskell tutorial, and many language details are simply brushed under the carpet, so to speak.
A couple of disclaimers:
- I assume the reader has been introduced to programming and functional programming in particular. I try to explain concepts as I go along, but there is a lot of implicit knowledge here.
- I have no intention of being original here. I'll do my best to attribute credit where it's due.
This app is a very simple web service with only one route: POST /order. It accepts an order of Star Trek action figures, and runs a PlaceOrder
workflow that checks the validity of the order and publishes events for other hypothetical services to consume.
Try running the server and make a request with this payload (see below for how to run the server):
{
  "customerId": 1,
  "orderLines": [
    {
      "productName": "Picard",
      "quantity": 3
    }
  ],
  "shippingAddress": "Earth"
}
For the different characters available, you can check the file /db/products.txt.
A successful request will be answered with the events created, for demonstration purposes.
The allure of FP, to me, is threefold:
- expressiveness: it tries to separate the meaning of a program from its implementation. When this is achieved, the implementation can be optimized (be it at application or compiler level, for example) for different realities (e.g., parallelism) without changing the meaning of the program. Take the map function: "apply a function f to every value in a data structure d, leaving the structure intact" expresses meaning, but says nothing about how to do it or what the structure of d is (is it a list, a tree or something else? If a list, should we do it from left-to-right, all-at-the-same-time or any other way?). See the short sketch just after this list.
- tackles complexity through modularity: given by laziness and function composition in general, and by higher-order functions more specifically, modularity allows building big programs out of smaller ones, reusing and modifying components with a high degree of freedom. John Hughes makes a stronger case than I ever will in his paper Why Functional Programming Matters.
- correctness: this can be controversial, but I always feel software correctness is at the heart of most discussions around FP (particularly in Haskell and other languages gravitating around it) in a way I just don't see in other ecosystems. Be it via testing (FP introduced property-based testing) or via correctness-by-construction: Haskell's type system is one of the richest among industry-grade languages and has been since its beginning, and new languages like Agda are breaking new ground with regards to type-safety. Also, see the discussion around effect systems (and maybe discover a new language).
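Here is the short sketch promised in the expressiveness point above. It only uses the standard Prelude and is not code from this repo: the same meaning ("apply a function to every value, keep the structure") applies to different structures without saying how the traversal happens.
incremented :: [Int]
incremented = map (+ 1) [1, 2, 3]         -- [2, 3, 4]
-- fmap generalizes map beyond lists, e.g. to Maybe:
incrementedMaybe :: Maybe Int
incrementedMaybe = fmap (+ 1) (Just 41)   -- Just 42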
The discussion around programming styles is surely an old one. I always suggest reading John Backus' 1977 Turing Award Lecture Can Programming Be Liberated from the von Neumann Style? for putting modern languages in perspective.
Programming as if functions truly matter. (Scott Wlaschin, author of Domain Modeling Made Functional)
The picture above is a call graph of our web service. Call graphs are a very useful tool to assess the architecture of an application, as they let us see how we are composing our functions.
We can distinguish three categories of objects in the call graph above:
- in blue (or is it green?), we have the functions pertaining to the PlaceOrder workflow. These are all pure functions combined into the placeOrder function. This function sequences validateOrder, priceOrder and createEvents, piping the incoming order request through them.
- in light pink, we have some peripheral functions, such as DTO or HTTP error response constructors. These are not part of our core logic, but work at the edges of our domain. All of them are pure functions.
- finally, in orange, we perform IO to fetch dependencies, generate random ids, save events, and return an HTTP response back, among other things.
There are two very important things to note here:
- just a small, outer part of our application actually performs IO. Importantly, all our domain logic is pure.
- IO is insidious. pushEvents is colored in orange, but the call to the DB is done in saveEvents. Any region of our call graph depending on impure code is itself impure: if saveEvents is impure, so are pushEvents, handlePlaceOrder and main.
Let's start by analysing a simple function:
add :: Int -> Int -> Int
add x y = x + y
There are several things in the function definition above that may surprise you, if you're not used to it. Let's take it apart:
add :: Int -> Int -> Int
The above is a type signature. We can think of add as a function which takes 2 arguments of type Int and returns a value of the same type.
In reality that is not quite true... In Haskell, all functions take only 1 argument. That's because Haskell is based on lambda calculus. Let me show you an equivalent type definition:
add :: Int -> (Int -> Int)
add is actually a function (or a lambda abstraction) that goes from Int to another function, Int -> Int. Because -> associates to the right, there is no need for the parentheses in this case, though sometimes they are useful to convey intent.
Let's now focus on the value definition of the add function:
add x y = x + y
Above, we're saying: add takes arguments x and y, and returns x + y.
But... how come?! We've just said that add only takes 1 argument! That's because we're being (very conveniently) tricked by a sugary layer of the syntactic kind. Let's desugar it:
add = \x -> \y -> x + y
In Haskell, all functions are considered curried, and both definitions above are equivalent. \x -> y is the syntax for what is sometimes called an anonymous function (a lambda): the backslash \ stands for a λ. Equivalent code in Javascript, with similar syntax, for comparison:
// javascript
const add = x => y => x + y
By the way, there is no need in Haskell for qualifiers such as const: all values are immutable.
In a functional language, function application is the most common operation. It makes sense to unclutter it, so application is written with just a space:
result = add 9 3 -- 12
And remember what we said about curried functions:
add :: Int -> Int -> Int
add x y = x + y
add9 :: Int -> Int
add9 = add 9
result :: Int
result = add9 3 -- 12
Because we define functions as having the arguments that best express our intent (add takes arguments x and y), but they're secretly curried, the pattern of partial application arises. This is very useful for dependency injection, for example:
-- priceOrder is built to be used in a pipeline which transforms Orders
priceOrder :: ProductMap -> ValidatedOrder -> Either Text PricedOrder
-- ProductMap is a dependency, 'injected' by partial application
priceOrderWithProductMap :: ValidatedOrder -> Either Text PricedOrder
priceOrderWithProductMap = priceOrder productMap
-- ^ is now ready to be included in a pipeline through which orders flow
TODO
TODO
TODO
Let's go back to the call graph for our web service. The orange ovals are where IO is performed. Those seem like functions, but they're really not: they're Actions. Haskell is purely functional, meaning all functions are pure. So, if those were functions, someone would be lying here, and we don't want that. IO actions are computed by pure functions, and all you can do with an action is produce a new action from its output or, if its output is of no interest, simply sequence it with another action!
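As a minimal sketch of that idea (plain Prelude, not code from this repo): we never "run" IO from inside a pure function, we only combine actions into a bigger action that the runtime eventually executes.
main :: IO ()
main = do
  line <- getLine                    -- use the output of one action...
  putStrLn ("You said: " <> line)    -- ...to build the next action
  putStrLn "Done."                   -- or just sequence another action after it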
TODO
When building programs, we very often need to build data with some structure. Haskell lets us express structured, composable data with Algebraic Data Types. As we will see next, this powerful feature allows the developer to build big types out of small ones, like Lego, combining them at will to declare sophisticated domains.
We can see types as sets of values with a certain structure. We usually say a value inhabits a given type if it belongs to the set of values of that type. For instance, the type Bool is inhabited by 2 values: True and False. We say the size of Bool is 2. The type String is inhabited by all possible sequences of characters, the type Int32 by all signed integers representable with 32 bits, and so on.
Haskell's type system is a nominal type system. All values are originally produced by some data constructor, which determines their type. This is in contrast with, for example, the typing system used by TypeScript, where two values are of the same type if their structure (e.g., the entries of an object) is the same.
A nominal system is in some ways less flexible than a structural one, but brings better type-safety by preventing accidental type equivalence.
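A small sketch of what that buys us (the names below are made up for illustration, not code from this repo): two types with the same underlying structure are still distinct, so mixing them up is a compile-time error.
newtype CustomerId = CustomerId Int
newtype ProductId  = ProductId Int

describeCustomer :: CustomerId -> String
describeCustomer (CustomerId n) = "customer #" <> show n

-- describeCustomer (ProductId 1)   -- rejected by the compiler: a ProductId
--                                  -- is not a CustomerId, even though both wrap an Int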
Data constructors are functions which return a value that inhabits the type they belong to. The 'mark' of the data constructor remains with the value throughout its life and gives rise to a very convenient idiom called pattern matching, which we'll see in a moment.
The building blocks of Algebraic Data Types are Sums and Products.
Bool is a sum type, and True and False are its data constructors. In this case, True and False take no arguments, and are thus called nullary constructors. Any given value of type Bool can either be True or False, not both!
data Bool = True | False
Maybe a is also a sum type. It is used to express the possibility of something not existing. For instance, if we want to parse a string to obtain an integer, we need to account for the possibility that the parsing can fail. This could be expressed by a function that returns a Maybe Int.
data Maybe a = Nothing | Just a
The a in Maybe a is a type variable, also known sometimes as a generic. Just is a (unary) data constructor that takes a value of type a and returns a value of type Maybe a. Some of the values inhabiting Maybe Int are the following: Nothing, Just (-2345), Just 0, Just 1, Just 42, etc. Some values not inhabiting Maybe Int are: Just "foo", Just True, Just 3.14159. These last ones are not acceptable because we are fixing the type variable a (in Maybe a) to be an Int.
The size of Maybe Int is the number of possible Ints (one Just for each of them) plus 1 (for Nothing).
Either a b is another sum type.
data Either a b = Left a | Right b
Following the same logic as above, its size is the number of all possible as plus the number of all possible bs.
Either a b can be used to express a computation that can have one of two types of outcome. If we want to tokenize a string of code, for example, it may be interesting to either return, say, a list of tokens or an error message signalling at which character the parsing failed. This could be achieved with Either Error (List Token), where Error could be an alias for String.
A tuple is the canonical product type:
data (,) a b = (,) a b
It can be a bit hard to see it immediately, so let's define an equivalent type:
data Tuple a b = MkTuple a b
A Tuple is a type with only one data constructor, MkTuple. As always, this data constructor is a function. In this case, one which, given two arguments of type a and b, returns a value which inhabits the type Tuple a b.
Tuples are often used when we have heterogeneous data that we want to package together. We may want to package the arguments to a function that outputs a formatted string with the name and age of a person: Tuple Name Age, or simply (Name, Age) if we choose to use the canonical (,) type.
Naturally, we are free to produce our own product types. Also note that their data constructor can have as many parameters as we wish:
-- these are simple type aliases and are mainly
-- used to make our code more expressive
type Name = String
type Age = Int
type PlaceOfBirth = String
data Person = Person Name Age PlaceOfBirth
-- ^ type ^ data constructor
-- types and data constructors live in different namespaces and can be named the same
We can have as many Person values as there are possible combinations of Name, Age and PlaceOfBirth. Thus, the size of the type Person can be calculated as the product of the sizes of the arguments to its data constructor. Put simply: size of Person = (size of Name) * (size of Age) * (size of PlaceOfBirth).
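To make the arithmetic concrete with the smallest possible example (a sketch, not code from this repo): a pair of Bools has size 2 * 2 = 4, and we can enumerate every inhabitant.
allBoolPairs :: [(Bool, Bool)]
allBoolPairs =
  [ (False, False)
  , (False, True)
  , (True, False)
  , (True, True)
  ]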
In Haskell, records are not first class citizens. This has been a somewhat contentious topic, but some recent developments have made operating with records more ergonomic, although we won't go into detail here.
In Haskell, a record is just a product type with some syntactic sugar to allow named fields:
data Person = Person {
name :: Name,
age :: Age,
placeOfBirth :: PlaceOfBirth
}
This is the same type we declared before, only now we can access the fields of a Person value by name: name, age and placeOfBirth are functions that, when applied to a Person, return the respective field.
The :: symbol translates to "of type". name :: Name reads "name of type Name".
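For instance (a small sketch, assuming the Person record above; greet is a made-up name):
greet :: Person -> String
greet person = "Hello, " <> name person
-- name :: Person -> Name is generated from the record field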
Now that we have Sums and Products we can create more complex types by combining them!
data Role = Admin | Level0 | Level1
data Email = UnvalidatedEmail String | ValidatedEmail String
data User = User {
name :: String,
role :: Role,
email :: Email
}
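Constructing a value of such a combined type looks like this (a sketch; the user below is made up):
exampleUser :: User
exampleUser = User
  { name  = "Kathryn"
  , role  = Admin
  , email = UnvalidatedEmail "kathryn@starfleet.example"
  }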
Some data structures are recursive in the sense that their parts can have the same structure as the whole. A list is a good example; trees are another.
data List a = Nil | Cons a (List a)
myList :: List Int -- myList has type List Int
myList = Cons 3 (Cons 2 (Cons 1 Nil))
data Tree a = Leaf a | Branch (Tree a) (Tree a)
myTree :: Tree String -- myTree has type Tree String
myTree = Branch (Leaf "A") (Branch (Leaf "B") (Leaf "C"))
As said before, every value carries a "birth mark", the mark of the data constructor that produced it. As long as the data constructor is in scope (just a detail for now, but something we will come back to), we can unpack a value, match on its constructor and extract the arguments that were once fed to it.
data Maybe a = Nothing | Just a
maybeIntToString :: Maybe Int -> String
maybeIntToString mx =
  case mx of
    Nothing -> "Nothing to see here!"
    Just x  -> "The number is " <> show x
stringX = maybeIntToString Nothing -- "Nothing to see here!"
stringY = maybeIntToString (Just 42) -- "The number is 42"
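Pattern matching also lets us write recursive functions over the recursive types we defined earlier. A small sketch (not code from this repo; the List type is repeated here so the example is self-contained):
data List a = Nil | Cons a (List a)

listLength :: List a -> Int
listLength Nil           = 0
listLength (Cons _ rest) = 1 + listLength rest

-- listLength (Cons 3 (Cons 2 (Cons 1 Nil))) == 3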
TODO
TODO
There is a Dockerfile included that can be used to build a Docker image and run the app containerized.
This is a good option if you don't have, and don't want, the Haskell toolchain installed on your machine. The Dockerfile leverages caching, so you should be able to experiment with the source code without having to recompile the whole application each time.
The following command will build the image and run the container. The app will be listening for requests on port 3000.
$ docker build . -t star-trek-haskell && docker run -t -p 3000:3000 star-trek-haskell
Remember to re-run the above command every time you change the source code.
Run cabal run. This should (re)compile the application and run it immediately after.
For most systems, the easiest way to install the Haskell toolchain is by installing GHCup.
Once that is done, use it to install Cabal, the Haskell build and package system.