Recently I have been learning Idris: a functional language similar to Haskell with dependent types added to the mix. Writing a type-safe printf formatter function turns out to be a great exercise of how a language with dependent types can be utilized. I will show how writing our own printf
implementation can be type-safe at compile and runtime.
If you are familiar with printf
in other languages, you may know that printf takes a variable number of arguments of varying types matching to corresponding format specifiers inside a format string:
printf("Hello %s -- your age is: %d\n", userName, age)
In this case, the first argument userName
needs to be a string (specified by %s
), and the second argument age
needs to be an integer (specified by %d
) in that order. For printf to be type-safe, providing an incorrect number of arguments or types that do not match the corresponding format specifiers must be rejected.
While compilers for some languages may hardcode the logic for making such formatter functions type-safe, Idris allows programmers to write the type safety validation as part of the function’s implementation.
We will focus on writing a printf function that returns a resulting formatted String
rather than writing contents to a file stream as is the case in other languages. Let us begin implementing our printf:
-- Type annotation of printf
printf : (str : String) -> ?printf_type
-- Implementation of printf
printf str = ?printf_rhs
Here we are beginning to define and implement a function named printf
that takes a String
that we bind to str
name as an argument. That is all we really know currently.
Names prefixed with ?
are type holes – we have not determined what values to insert yet but the compiler will allow us to insert holes that enables us to type-check the rest of our program. In the case of printf_type
, we do not yet know what additional types (which themselves are values of type Type
) printf
may need to accept before producing an output.
We do have some intuition regarding printf
's type though. If we specify a formatter string like "Hello world!"
then ?printf_type
should be String
. On the other hand if we specify a formatter string like "%d coins"
then ?printf_type
should be Int -> String
indicating that an additional Int
argument is required before outputting a final formatted String
output.
We can see that the type of printf
depends on the value of the input str
– this a case for dependent types!
Let us leave printf
for now and start with parsing through a list of characters from a format string:
parse : List Char -> ?parse_type
(Note unlike in Haskell, List Char
is not the same as String
in Idris)
We are not sure what we want to return yet so we will insert ?parse_type
for now. Using interactive editing support, Idris can fill out a skeleton for the body:
parse : List Char -> ?parse_type
parse [] = ?parse_rhs_1
parse (x :: xs) = ?parse_rhs_2
Two cases have been filled out for us: one where the list of characters is empty, and another where we can analyze the head element being concatenated to the tail of the list. We will add two additional cases for handling %d
and %s
occurring in our list:
parse : List Char -> ?parse_type
parse [] = ?parse_rhs_1
parse ('%' :: 'd' :: xs) = ?parse_rhs_3
parse ('%' :: 's' :: xs) = ?parse_rhs_4
parse (x :: xs) = ?parse_rhs_2
(Note Idris will try to pattern match these cases on the passed argument from the order they are defined – from top to bottom)
Let us determine what ?parse_type
should be – we will define a data structure representing our format succinctly:
data Format = Number Format | Str Format | End
The three constructors for this Format
data structure are a Number
which takes a Format
argument recursively, a Str
which takes a Format
argument recursively, and lastly End
which denotes the end of the format.
Parsing "%d%s%d"
for example should result in Number (Str (Number End))
.
We still have to include string literals in our Format
structure for when we have to build our formatted string. We will achieve this by adding our own Prefix
(synonym for String
) to each constructor:
-- A type level function that takes no input and outputs String which is of type Type
Prefix : Type
Prefix = String
data Format = Number Prefix Format | Str Prefix Format | End Prefix
So now we will want to parse "%d points for %s."
to Number "" (Str " points for " (End "."))
.
(For simplicity, we are not creating a distinct constructor for string literals because only format specifiers affect the type of printf
, and we may not care about string literals when validating specific format structures as shown later when performing runtime validation)
We can now begin filling in holes to parse
:
parse : List Char -> Format
parse [] = End ?end_prefix
parse ('%' :: 'd' :: xs) = Number ?number_prefix (parse xs)
parse ('%' :: 's' :: xs) = Str ?str_prefix (parse xs)
parse (x :: xs) = parse xs
We left some holes for filling out the prefix. We will need to pass and build an accumulator to keep track of the current prefix:
parse : List Char -> String -> Format
parse [] prefix_acc = End prefix_acc
parse ('%' :: 'd' :: xs) prefix_acc = Number prefix_acc (parse xs "")
parse ('%' :: 's' :: xs) prefix_acc = Str prefix_acc (parse xs "")
parse (x :: xs) prefix_acc = parse xs (prefix_acc ++ ?to_str x)
If we parse a Number
or Str
we reset the prefix accumulator to ""
upon the next recursive call. We left a ?to_str
hole for converting x : Char
to a String
. We can use a built in function strCons : Char -> String -> String
to fill this:
parse (x :: xs) prefix_acc = parse xs (prefix_acc ++ strCons x "")
We will also create an additional function parseFormat
calling parse
with an empty prefix accumulator and using the built-in unpack
function to convert from String -> List Char
.
parseFormat : String -> Format
parseFormat xs = parse (unpack xs) ""
Back to filling the ?printf_type
hole in printf
, we now know how to go from String -> Format
, so we now need to figure out how to go from Format -> Type
. We can write such a type level function:
PrintfType : Format -> Type
Letting idris fill out a skeleton of the body for us yields:
PrintfType : Format -> Type
PrintfType (Number prefix' format) = ?PrintfType_rhs_1
PrintfType (Str prefix' format) = ?PrintfType_rhs_2
PrintfType (End prefix') = ?PrintfType_rhs_3
When we encounter a Number
we know we expect an Int
followed by rest of the type, and when we encounter a Str
we know we expect a String
followed by rest of the type. When we are at the End
, we expect a final formatted String
:
PrintfType : Format -> Type
PrintfType (Number prefix' format) = Int -> PrintfType format
PrintfType (Str prefix' format) = String -> PrintfType format
PrintfType (End prefix') = String
We can omit the prefix'
because it’s not used anywhere for evaluating PrintfType
:
PrintfType : Format -> Type
PrintfType (Number _ format) = Int -> PrintfType format
PrintfType (Str _ format) = String -> PrintfType format
PrintfType (End _) = String
Now we can finally fill in the ?printf_type
hole:
printf : (str : String) -> PrintfType (parseFormat str)
If str
is "Hello!"
then PrintfType (parseFormat str)
is String
. If str
is "Hello %d %s"
then PrintfType (parseFormat str)
is Int -> String -> String
. Makes sense.
We finally need to write a function that builds the formatted string based on the format and arguments passed in. Its type will be (format : Format) -> PrintfType format
:
printfFormat : (format : Format) -> PrintfType format
printfFormat (Number prefix' format) = ?printfFormat_rhs_1
printfFormat (Str prefix' format) = ?printfFormat_rhs_2
printfFormat (End prefix') = ?printfFormat_rhs_3
Idris tells us the following holes have types:
printfFormat_rhs_1 : Int -> PrintfType format
printfFormat_rhs_2 : String -> PrintfType format
printfFormat_rhs_3 : String
If we look back, this makes sense because it matches our definition of PrintfType
. The first two particularly are functions that still need to accept an input. We will need to also pass in an additional accumulator argument to build the formatted string:
printfFormat : (format : Format) -> String -> PrintfType format
printfFormat (Number prefix' format) acc = ?printfFormat_rhs_1
printfFormat (Str prefix' format) acc = ?printfFormat_rhs_2
printfFormat (End prefix') acc = ?printfFormat_rhs_3
Now we can fill in the holes to satisfy the types:
printfFormat : (format : Format) -> String -> PrintfType format
printfFormat (Number prefix' format) acc =
-- show num converts num from Int -> String
-- idea is: accumulator concatenated with prefix concatenated with argument value
\num => printfFormat format (acc ++ prefix' ++ show num)
printfFormat (Str prefix' format) acc =
\str => printfFormat format (acc ++ prefix' ++ str)
printfFormat (End prefix') acc = acc ++ prefix'
We can build an additional function that calls printfFormat
with an initial empty accumulator:
printfFormatter : (format : Format) -> PrintfType format
printfFormatter format = printfFormat format ""
And then finally implement printf
:
printf : (str : String) -> PrintfType (parseFormat str)
printf str = printfFormatter (parseFormat str)
Let us test several cases in the REPL:
*printf_ex> printf "hello world"
"hello world" : String
*printf_ex> printf "hello world %d" 5
"hello world 5" : String
*printf_ex> printf "hello world %s %d" "okay" 3
"hello world okay 3" : String
*printf_ex> printf "hello world %s %d" 5 3
(input):1:8:String is not a numeric type
*printf_ex> printf "hello world %d" 5 3
builtin:Type mismatch between
String (Type of printf "hello world %d" 5)
and
_ -> _ (Expected type)
*printf_ex> printf "hello world %d"
\num => prim__concat (prim__concat "" (prim__concat "hello world " (prim__toStrInt num))) "" : Int -> String
In the last case, we retrieved a partially evaluated function that still needs to take an Int
as input to produce String
. What is neat here is that our usage of printf
is validated at compile-time – we are not allowed to pass an incorrect number of arguments or incorrect types!
To show off validation at runtime where the format string is unknown at compile time, let us attempt to write a function that takes a format String
containing a single %d
specifier as input and returns a formatted string by applying a single known integer value:
formatSingleNumber : String -> String
formatSingleNumber str = printf str 999
As expected, we get a compile error because Idris has no way of knowing the input string really only contains a single number format specifier:
printf str does not have a function type (PrintfType (parseFormat str))
Idris is telling us we did not supply the expected PrintfType (parseFormat str)
type. Idris cannot evaluate this type without knowing what its dependent values are – and it needs to know them at compile time if we are to evaluate a call to printf
. We need to prove that the format contains a single Number
. We will change the type signature to return a Maybe String
type, which returns Just String
when the format can be matched and Nothing
when the format cannot be matched to a single number specifier:
formatSingleNumber : String -> Maybe String
formatSingleNumber str =
let format = parseFormat str in
case format of
-- we don't care about what the prefix values are so we will sub them with _
(Number _ (End _)) => Just (printfFormatter format 999)
_ => Nothing
Unfortunately, we still get a similar compile error:
printfFormatter format does not have a function type (PrintfType format)
Why can Idris still not evaluate PrintfType format
?
Idris turns out to be strict as part of being used for theorem proving and expressing equality relationships between data. The issue here particularly is that the format
we are passing to printfFormatter
is not proven to be equal in value to the (Number _ (End _))
pattern we matched against as far as the type system is concerned. In this case, we can resolve this simply by referring to the number format in our pattern match:
formatSingleNumber : String -> Maybe String
formatSingleNumber str =
let format = parseFormat str in
case format of
-- we don't care about what the prefix values are so we will sub them with _
-- using @ we can refer to the entire matched value
numberFormat@(Number _ (End _)) =>
Just (printfFormatter numberFormat 999)
_ => Nothing
And that is it. This function could be used for example to read a format string from a localization file (ignoring pluralization rules) and apply a single number to it if the format string is valid. We can test our function in the REPL:
*printf_ex> formatSingleNumber "Ponies: %d"
Just "Ponies: 999" : Maybe String
*printf_ex> formatSingleNumber "Ponies: %d %d"
Nothing : Maybe String
*printf_ex> formatSingleNumber "Ponies: %s"
Nothing : Maybe String
Dependent types allow us to implement a type-safe printf function in an elegant way, allowing the type of printf
depend on its input. In showing this, we also demonstrated how type holes and Idris’ interactivity support help us writing functions.
I wrote this in the spirit of helping me solidify experience in learning the language. If you are interested in learning more in better detail, I strongly suggest checking out Type-Driven Development with Idris which is what I will continue to be reading. My printf implementation (full gist) is derived from there, albeit with my own twist.