Type-safe printf in Idris

Dec 20, 2017

Recently I have been learning Idris: a functional language similar to Haskell with dependent types added to the mix. Writing a type-safe printf formatter function turns out to be a great exercise of how a language with dependent types can be utilized. I will show how writing our own printf implementation can be type-safe at compile and runtime.

Type-safety

If you are familiar with printf in other languages, you may know that printf takes a variable number of arguments of varying types matching to corresponding format specifiers inside a format string:

printf("Hello %s -- your age is: %d\n", userName, age)

In this case, the first argument userName needs to be a string (specified by %s), and the second argument age needs to be an integer (specified by %d) in that order. For printf to be type-safe, providing an incorrect number of arguments or types that do not match the corresponding format specifiers must be rejected.

While compilers for some languages may hardcode the logic for making such formatter functions type-safe, Idris allows programmers to write the type safety validation as part of the function’s implementation.

Beginning Holes

We will focus on writing a printf function that returns a resulting formatted String rather than writing contents to a file stream as is the case in other languages. Let us begin implementing our printf:

-- Type annotation of printf
printf : (str : String) -> ?printf_type
-- Implementation of printf
printf str = ?printf_rhs

Here we are beginning to define and implement a function named printf that takes a String that we bind to str name as an argument. That is all we really know currently.

Names prefixed with ? are type holes – we have not determined what values to insert yet but the compiler will allow us to insert holes that enables us to type-check the rest of our program. In the case of printf_type, we do not yet know what additional types (which themselves are values of type Type) printf may need to accept before producing an output.

We do have some intuition regarding printf's type though. If we specify a formatter string like "Hello world!" then ?printf_type should be String. On the other hand if we specify a formatter string like "%d coins" then ?printf_type should be Int -> String indicating that an additional Int argument is required before outputting a final formatted String output.

We can see that the type of printf depends on the value of the input str – this a case for dependent types!

Parsing Format

Let us leave printf for now and start with parsing through a list of characters from a format string:

parse : List Char -> ?parse_type

(Note unlike in Haskell, List Char is not the same as String in Idris)

We are not sure what we want to return yet so we will insert ?parse_type for now. Using interactive editing support, Idris can fill out a skeleton for the body:

parse : List Char -> ?parse_type
parse [] = ?parse_rhs_1
parse (x :: xs) = ?parse_rhs_2

Two cases have been filled out for us: one where the list of characters is empty, and another where we can analyze the head element being concatenated to the tail of the list. We will add two additional cases for handling %d and %s occurring in our list:

parse : List Char -> ?parse_type
parse [] = ?parse_rhs_1
parse ('%' :: 'd' :: xs) = ?parse_rhs_3
parse ('%' :: 's' :: xs) = ?parse_rhs_4
parse (x :: xs) = ?parse_rhs_2

(Note Idris will try to pattern match these cases on the passed argument from the order they are defined – from top to bottom)

Let us determine what ?parse_type should be – we will define a data structure representing our format succinctly:

data Format = Number Format | Str Format | End

The three constructors for this Format data structure are a Number which takes a Format argument recursively, a Str which takes a Format argument recursively, and lastly End which denotes the end of the format.

Parsing "%d%s%d" for example should result in Number (Str (Number End)).

We still have to include string literals in our Format structure for when we have to build our formatted string. We will achieve this by adding our own Prefix (synonym for String) to each constructor:

-- A type level function that takes no input and outputs String which is of type Type
Prefix : Type
Prefix = String

data Format = Number Prefix Format | Str Prefix Format | End Prefix

So now we will want to parse "%d points for %s." to Number "" (Str " points for " (End ".")).

(For simplicity, we are not creating a distinct constructor for string literals because only format specifiers affect the type of printf, and we may not care about string literals when validating specific format structures as shown later when performing runtime validation)

We can now begin filling in holes to parse:

parse : List Char -> Format
parse [] = End ?end_prefix
parse ('%' :: 'd' :: xs) = Number ?number_prefix (parse xs)
parse ('%' :: 's' :: xs) = Str ?str_prefix (parse xs)
parse (x :: xs) = parse xs

We left some holes for filling out the prefix. We will need to pass and build an accumulator to keep track of the current prefix:

parse : List Char -> String -> Format
parse [] prefix_acc = End prefix_acc
parse ('%' :: 'd' :: xs) prefix_acc = Number prefix_acc (parse xs "")
parse ('%' :: 's' :: xs) prefix_acc = Str prefix_acc (parse xs "")
parse (x :: xs) prefix_acc = parse xs (prefix_acc ++ ?to_str x)

If we parse a Number or Str we reset the prefix accumulator to "" upon the next recursive call. We left a ?to_str hole for converting x : Char to a String. We can use a built in function strCons : Char -> String -> String to fill this:

parse (x :: xs) prefix_acc = parse xs (prefix_acc ++ strCons x "")

We will also create an additional function parseFormat calling parse with an empty prefix accumulator and using the built-in unpack function to convert from String -> List Char.

parseFormat : String -> Format
parseFormat xs = parse (unpack xs) ""

Writing Printf Type

Back to filling the ?printf_type hole in printf, we now know how to go from String -> Format, so we now need to figure out how to go from Format -> Type. We can write such a type level function:

PrintfType : Format -> Type

Letting idris fill out a skeleton of the body for us yields:

PrintfType : Format -> Type
PrintfType (Number prefix' format) = ?PrintfType_rhs_1
PrintfType (Str prefix' format) = ?PrintfType_rhs_2
PrintfType (End prefix') = ?PrintfType_rhs_3

When we encounter a Number we know we expect an Int followed by rest of the type, and when we encounter a Str we know we expect a String followed by rest of the type. When we are at the End, we expect a final formatted String:

PrintfType : Format -> Type
PrintfType (Number prefix' format) = Int -> PrintfType format
PrintfType (Str prefix' format) = String -> PrintfType format
PrintfType (End prefix') = String

We can omit the prefix' because it’s not used anywhere for evaluating PrintfType:

PrintfType : Format -> Type
PrintfType (Number _ format) = Int -> PrintfType format
PrintfType (Str _ format) = String -> PrintfType format
PrintfType (End _) = String

Now we can finally fill in the ?printf_type hole:

printf : (str : String) -> PrintfType (parseFormat str)

If str is "Hello!" then PrintfType (parseFormat str) is String. If str is "Hello %d %s" then PrintfType (parseFormat str) is Int -> String -> String. Makes sense.

Building the String

We finally need to write a function that builds the formatted string based on the format and arguments passed in. Its type will be (format : Format) -> PrintfType format:

printfFormat : (format : Format) -> PrintfType format
printfFormat (Number prefix' format) = ?printfFormat_rhs_1
printfFormat (Str prefix' format) = ?printfFormat_rhs_2
printfFormat (End prefix') = ?printfFormat_rhs_3

Idris tells us the following holes have types:

printfFormat_rhs_1 : Int -> PrintfType format
printfFormat_rhs_2 : String -> PrintfType format
printfFormat_rhs_3 : String

If we look back, this makes sense because it matches our definition of PrintfType. The first two particularly are functions that still need to accept an input. We will need to also pass in an additional accumulator argument to build the formatted string:

printfFormat : (format : Format) -> String -> PrintfType format
printfFormat (Number prefix' format) acc = ?printfFormat_rhs_1
printfFormat (Str prefix' format) acc = ?printfFormat_rhs_2
printfFormat (End prefix') acc = ?printfFormat_rhs_3

Now we can fill in the holes to satisfy the types:

printfFormat : (format : Format) -> String -> PrintfType format
printfFormat (Number prefix' format) acc =
    -- show num converts num from Int -> String
    -- idea is: accumulator concatenated with prefix concatenated with argument value
    \num => printfFormat format (acc ++ prefix' ++ show num)
printfFormat (Str prefix' format) acc =
    \str => printfFormat format (acc ++ prefix' ++ str)
printfFormat (End prefix') acc = acc ++ prefix'

We can build an additional function that calls printfFormat with an initial empty accumulator:

printfFormatter : (format : Format) -> PrintfType format
printfFormatter format = printfFormat format ""

And then finally implement printf:

printf : (str : String) -> PrintfType (parseFormat str)
printf str = printfFormatter (parseFormat str)

Let us test several cases in the REPL:

*printf_ex> printf "hello world"
"hello world" : String

*printf_ex> printf "hello world %d" 5
"hello world 5" : String

*printf_ex> printf "hello world %s %d" "okay" 3
"hello world okay 3" : String

*printf_ex> printf "hello world %s %d" 5 3
(input):1:8:String is not a numeric type

*printf_ex> printf "hello world %d" 5 3
builtin:Type mismatch between
     String (Type of printf "hello world %d" 5)
and
     _ -> _ (Expected type)

*printf_ex> printf "hello world %d"
\num => prim__concat (prim__concat "" (prim__concat "hello world " (prim__toStrInt num))) "" : Int -> String

In the last case, we retrieved a partially evaluated function that still needs to take an Int as input to produce String. What is neat here is that our usage of printf is validated at compile-time – we are not allowed to pass an incorrect number of arguments or incorrect types!

Runtime Validation

To show off validation at runtime where the format string is unknown at compile time, let us attempt to write a function that takes a format String containing a single %d specifier as input and returns a formatted string by applying a single known integer value:

formatSingleNumber : String -> String
formatSingleNumber str = printf str 999

As expected, we get a compile error because Idris has no way of knowing the input string really only contains a single number format specifier:

printf str does not have a function type (PrintfType (parseFormat str))

Idris is telling us we did not supply the expected PrintfType (parseFormat str) type. Idris cannot evaluate this type without knowing what its dependent values are – and it needs to know them at compile time if we are to evaluate a call to printf. We need to prove that the format contains a single Number. We will change the type signature to return a Maybe String type, which returns Just String when the format can be matched and Nothing when the format cannot be matched to a single number specifier:

formatSingleNumber : String -> Maybe String
formatSingleNumber str =
    let format = parseFormat str in
    case format of
        -- we don't care about what the prefix values are so we will sub them with _
        (Number _ (End _)) => Just (printfFormatter format 999)
        _ => Nothing

Unfortunately, we still get a similar compile error:

printfFormatter format does not have a function type (PrintfType format)

Why can Idris still not evaluate PrintfType format?

Idris turns out to be strict as part of being used for theorem proving and expressing equality relationships between data. The issue here particularly is that the format we are passing to printfFormatter is not proven to be equal in value to the (Number _ (End _)) pattern we matched against as far as the type system is concerned. In this case, we can resolve this simply by referring to the number format in our pattern match:

formatSingleNumber : String -> Maybe String
formatSingleNumber str =
    let format = parseFormat str in
    case format of
        -- we don't care about what the prefix values are so we will sub them with _
        -- using @ we can refer to the entire matched value
        numberFormat@(Number _ (End _)) =>
            Just (printfFormatter numberFormat 999)
        _ => Nothing

And that is it. This function could be used for example to read a format string from a localization file (ignoring pluralization rules) and apply a single number to it if the format string is valid. We can test our function in the REPL:

*printf_ex> formatSingleNumber "Ponies: %d"
Just "Ponies: 999" : Maybe String

*printf_ex> formatSingleNumber "Ponies: %d %d"
Nothing : Maybe String

*printf_ex> formatSingleNumber "Ponies: %s"
Nothing : Maybe String

Conclusion

Dependent types allow us to implement a type-safe printf function in an elegant way, allowing the type of printf depend on its input. In showing this, we also demonstrated how type holes and Idris’ interactivity support help us writing functions.

I wrote this in the spirit of helping me solidify experience in learning the language. If you are interested in learning more in better detail, I strongly suggest checking out Type-Driven Development with Idris which is what I will continue to be reading. My printf implementation (full gist) is derived from there, albeit with my own twist.