- In a modular/reusable/extensible way
- In a self-documenting way
- Generate user documentation and parser from the same source

- We need to provide a main program,
`main :: IO ()`

- We can use
`getArgs :: IO [String]`

- We have functions to implement the desired functionality:
`ls :: Bool -> [FilePath] -> IO ()`

- How to bridge the gap?

`silly date`

Wed Feb 9 21:41:37 PST 2005

`silly bla`

Expected one of: cat, ls, date, help Found: bla

`silly help`

Usage: cat {<filename>} -- concatenate files ls [-l] {<filename>} -- list files (long) date -- print current date and time help -- show usage

- Documentation
Usage: cat {<filename>} -- concatenate files ls [-l] {<filename>} -- list files (long) date -- print current date and time help -- show usage

- Implementation
silly = cmd "cat" cat <@ files :-- "concatenate files" ! cmd "ls" ls <@ flag "-l" <@ files :-- "list files (long)" ! cmd "date" date :-- "print current date and time" ! cmd "help" help :-- "show usage"

main = run silly silly = cmd "cat" cat <@ files :-- "concatenate files" ! cmd "ls" ls <@ flag "-l" <@ files :-- "list files (long)" ! cmd "date" date :-- "print current date and time" ! cmd "help" help :-- "show usage"

- Plus the grammar for
`files`

...`files :: P [FilePath]`

`cat :: [FilePath] -> IO ()`

`ls :: Bool -> [FilePath] -> IO ()`

`date :: IO ()`

`help :: IO ()`

type P a nil :: a -> P a token :: (String->Maybe a) -> P a (!) :: P a -> P a -> P a (<@) :: P (a->b) -> P a -> P b parse :: P a -> [String] -> Either ErrorMessage a

- Note: the command line is of type [String], so the token type is String

type P a nil :: a -> P a token :: (String->Maybe a) -> String -> P a (!) :: P a -> P a -> P a (<@) :: P (a->b) -> P a -> P b (:--) :: P a -> String -> P a parse :: P a -> [String] -> Either ErrorMessage a usage :: P a -> String

- (New things highlighted)

cmd :: String -> a -> P a flag :: String -> P Bool run :: P (IO a) -> IO a kw :: String -> P () many :: P a -> P [a] some :: P a -> P [a] opt :: P a -> P (Maybe a) (#@) :: (a->b) -> P a -> P b

- We saw the first three in the example

- Possible alternatives
`ap :: P (a->b) -> P a -> P b`

`>>= :: P a -> (a->P b) -> P b`

`pair :: P a -> P b -> P (a,b)`

- Monadic parsing combinators?
- No!
- With >>= you can do more than context-free grammars.
- Static analysis of the grammar (extracting the documentation, for example) becomes difficult...

- Backtracking parsers:
`type P a = String -> Maybe (a,String)`

- How to turn a failure into a list of successes:
`type P a = String -> [(a,String)]`

- More sophisticated implemtations are needed
- to get good error messages
- for efficiency (e.g. to avoid space leaks)

- How to extract documentation?

- Command line grammars fairly simple
- Not recursive: can extract grammar without any tricks

- The commands that we parse are not particularly long
- Efficiency is not a problem: implementation can be kept simple

- To be able to extract the grammar, implement all combinators as constructors in a data type!
data P a where Nil :: a -> P a Ap :: P (b->a) -> P b -> P a Token :: (String->Maybe a) -> String -> P a (:--) :: P a -> String -> P a (:!) :: P a -> P a -> P a Many :: P a -> P [a] ...

- But...

*Existential quantification*is needed for the type of`Ap`

.*Indexed families of types*(aka*GADTs*) are needed for the type of`Many`

data P a where ... Ap :: P (b->a) -> P b -> P a ... Many :: P a -> P [a] ...

data P res = Nil res | forall arg . Ap (P (arg->res)) (P arg) | P res :-- String | Token (String->Maybe res) String | P res :! P res | forall item . Many (res->[item]) ([item]->res) (P item) many :: P a -> P [a] many = Many id id

- The extra arguments to
`Many`

express the fact that the result type is expected to be isomorphic to some list type.

`P a`

`parse :: P a -> [String] -> Either ErrorMessage a`

- Traverse and translate to conventional parsing combinators

`usage :: P a -> String`

- Traverse and pretty print

- Because of the existential quantification in the data type,
*polymorphic recursion*is required to type these traversal functions, so*type signatures*have to be given explicitly. - See the source code.

- a traditional combinator parser, and
- a representation of the grammar that can be printed:
`data P a = P Grammar (PM a)`

`data Grammar = ...`

- No need for type system extensions. Pure Haskell 98 is enough!
- See the source code for this solution.

- Supports the traditional, fairly flexible option syntax, but seems more limited otherwise
- Monolithic
- Originates in a language (C) that doesn't lead the thoughts to parsing combinators...

- Similar to what I did with marshalling for the FFI in 2001
- Less work for the programmer
- More complex implementation?
- How to add documentation?

- Implemented but not described today: support for named nonterminals
- Lessons learnt: having access to fancy language features doesn't make you a better programmer!
- Related work and links
- Introspective parsing combinators by Arthur I. Baars and S. Doaitse Swierstra: (Type Safe, Self Inspecting Code, HW2004)
- Source code for the combinators and examples in this presentation.
- Haskell Tools from the Programatica Project

- Prettier printing of grammars
- Add a combinator to accept options in arbitrary order
- Extract documentation from
*recursive*grammars - Convenient support for extra checks that requires access to an underlying monad. (E.g. checking that a file exists.)
- Report ambiguities in the grammar
- ...