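Idris 2 Language Documentation
Note
The documentation for Idris 2 has been published under the Creative Commons CC0 License. As such, to the extent possible under law, the Idris Community has waived all copyright and related or neighboring rights to the documentation for Idris.
More information concerning the CC0 can be found online at: https://creativecommons.org/publicdomain/zero/1.0/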
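A Crash Course in Idris 2
This is a crash course in Idris 2 (a bit like a tutorial, but not a particularly gentle one!). It provides a brief introduction to programming in the Idris language. It covers the core language features, and assumes some experience with an existing functional programming language such as Haskell or OCaml.
It has been revised and updated from the Idris 1 tutorial. For details of changes since Idris 1, see Changes since Idris 1.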
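Introduction
In conventional programming languages, there is a clear distinction between types and values. For example, in Haskell, the following are types, representing integers, characters, lists of characters, and lists of any value respectively:
Int, Char, [Char], [a]
Correspondingly, the following values are examples of inhabitants of those types:
42, 'a', "Hello world!", [2,3,4,5,6]
In a language with dependent types, however, the distinction is less clear. Dependent types allow types to "depend" on values; in other words, types are a first-class language construct and can be manipulated like any other value. The standard example is the type of lists of a given length [1], Vect n a, where a is the element type and n is the length of the list, which can be an arbitrary term.
When types can contain values, and where those values describe properties such as the length of a list, the type of a function can begin to describe its own properties. Take the concatenation of two lists: the length of the resulting list is the sum of the lengths of the two input lists. We can therefore give the following type to the app function, which concatenates vectors: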
app : Vect n a -> Vect m a -> Vect (n + m) a
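This tutorial introduces Idris, a general purpose functional programming language with dependent types. The goal of the Idris project is to build a dependently typed language suitable for verifiable general purpose programming. To this end, Idris is a compiled language which aims to generate efficient executable code. It also has a lightweight foreign function interface which allows easy interaction with external libraries.
Intended Audience
This tutorial is intended as a brief introduction to the language, and is aimed at readers already familiar with a functional language such as Haskell or OCaml. In particular, a certain amount of familiarity with Haskell syntax is assumed, although most concepts will at least be explained briefly. The reader is also assumed to have some interest in using dependent types for writing and verifying software.
For a more in-depth introduction to Idris, which proceeds at a much slower pace, covers interactive program development, and has many more examples, see Type-Driven Development with Idris by Edwin Brady, available from Manning.
Example Code
This tutorial includes some example code, which has been tested with Idris 2. These files are available with the Idris 2 distribution, so that you can try them out easily. They can be found under samples. It is, however, strongly recommended that you type them in yourself, rather than simply loading and reading them.
Footnotes
[1] Typically, and perhaps confusingly, referred to in the dependently typed programming literature as "vectors".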
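Getting Started
Installing from Source
Prerequisites for Windows
MSYS2
To build Idris 2 on Windows, a Unix-like environment is needed for all the tools used in the build process. MSYS2 provides this environment.
1. Download the latest version of MSYS2.
2. Run the installer. Do not install it under Program Files, because it needs to write files there (for example, the "Unix" home directory lives under the installation directory).
3. In the directory where you installed MSYS2, find the file mingw64.ini and add the line MSYS2_PATH_TYPE=inherit. This adds the Windows PATH to the MSYS2 shell.
4. Start MSYS2 by running mingw64.exe; the icon in the Start menu does not pick up MSYS2_PATH_TYPE from the ini file (although the variable can also be set in the Windows system settings).
5. Update to the latest packages with pacman -Syu.
6. Install the programs required for the build:
$ pacman -S make mingw-w64-x86_64-gcc
Chez Scheme
Chez Scheme has a ready-made installer on GitHub.
1. Download the installer and run it. Do not install into a path containing spaces, as Idris 2 currently has problems with them.
2. Add the 64-bit Chez Scheme binaries to the PATH. They are in the \bin\ta6nt subdirectory of the Chez Scheme installation, so if you installed to C:\Chez, the directory is C:\Chez\bin\ta6nt.
Building
1. Start a fresh MSYS2 shell so that it picks up your modified PATH (it is important to use MinGW64 so that the right compiler is used).
2. Navigate to the Idris2 directory.
3. Set the SCHEME environment variable that Idris 2 needs: export SCHEME=scheme. This can be set permanently in your bash profile or in the Windows settings.
4. Run make bootstrap && make install. This should build Idris 2 and install it to home/<username>/.idris2/bin under your MSYS2 installation. If you add that directory to the PATH in the Windows settings, Idris 2 will be usable from any command line you open, including PowerShell and the Command Prompt.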
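Prerequisites
Idris 2 is implemented in Idris 2 itself, so to bootstrap it you build from the generated Scheme sources. To do this you need either Chez Scheme (the default, and currently preferred since it is the fastest) or Racket. Both can be obtained from their project websites, and both are available from MacPorts/Homebrew and all major Linux distributions. Windows requires some further prerequisites; see Prerequisites for Windows.
Note: if you install Chez Scheme from source, building it locally, make sure you run ./configure --threads to build multithreading support in.
Downloading and Installing
You can download the Idris 2 source from the Idris website, or get the latest development version from idris-lang/Idris2 on GitHub. This includes the Idris 2 source code and the Scheme code generated from it. Once you have unpacked the source, you can install it as follows:
make bootstrap SCHEME=chez
where chez is the executable name of the Chez Scheme compiler. This varies from system to system, but is often one of scheme, chezscheme, or chezscheme9.5. If you are building via Racket, you can install it as follows:
make bootstrap-racket
Once you have successfully bootstrapped with either of the above commands, you can install with make install. By default this installs to ${HOME}/.idris2. You can change this by editing the options in config.mk. For example, to install to /usr/local, edit IDRIS2_PREFIX as follows:
IDRIS2_PREFIX ?= /usr/local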
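Installing from a Package Manager
Installing Using Homebrew
If you are a Homebrew user, you can install Idris 2 together with all its requirements by running:
brew install idris2
Checking the Installation
To check that the installation has succeeded, and to write your first Idris program, create a file called hello.idr containing the following text:
module Main
main : IO ()
main = putStrLn "Hello world"
If you are familiar with Haskell, it should be fairly clear what the program is doing and how it works; if not, we will explain the details later. You can compile the program to an executable by entering idris2 hello.idr -o hello at the shell prompt. By default this creates an executable called hello in the target directory build/exec, which invokes a generated and compiled Chez Scheme program. You can then run it:
$ idris2 hello.idr -o hello
$ ./build/exec/hello
Hello world
Note that the dollar sign $ indicates the shell prompt. Some useful options to the idris2 command are:
-o prog to compile to an executable called prog.
--check to type check the file and its dependencies without starting the interactive environment.
--package pkg to add a package as a dependency, e.g. --package contrib to make use of the contrib package.
--help to display a usage summary and the command-line options.
You can find out more about compiling to executables in the section Compiling to Executables.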
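The Interactive Environment
Entering idris2 at the shell prompt starts up the interactive environment. You should see something like the following:
$ idris2
     ____    __     _         ___
    /  _/___/ /____(_)____   |__ \
    / // __  / ___/ / ___/   __/ /     Version 0.5.1
  _/ // /_/ / /  / (__  )   / __/      https://www.idris-lang.org
 /___/\__,_/_/  /_/____/   /____/      Type :? for help
Welcome to Idris 2.  Enjoy yourself!
Main>
This gives a ghci-style interface which allows evaluation of expressions as well as type checking them, theorem proving, compilation, editing, and various other operations. The command :? gives a list of supported commands. Below is an example run in which hello.idr is loaded, the type of main is checked, and then the program is compiled to the executable hello, available in the target directory build/exec/. Type checking a file, if successful, creates a bytecode version of the file (in this case build/ttc/hello.ttc) to speed up loading in future. The bytecode is regenerated if the source file changes.
$ idris2 hello.idr
     ____    __     _         ___
    /  _/___/ /____(_)____   |__ \
    / // __  / ___/ / ___/   __/ /     Version 0.5.1
  _/ // /_/ / /  / (__  )   / __/      https://www.idris-lang.org
 /___/\__,_/_/  /_/____/   /____/      Type :? for help
Welcome to Idris 2.  Enjoy yourself!
Main> :t main
Main.main : IO ()
Main> :c hello main
File build/exec/hello written
Main> :q
Bye for now!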
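Types and Functions
Primitive Types
Idris defines several primitive types: Int, Integer and Double for numeric operations, Char and String for text manipulation, and Ptr which represents foreign pointers. There are also several data types declared in the library, including Bool, with values True and False. We can declare some constants with these types. Enter the following into a file Prims.idr and load it into the Idris interactive environment by typing idris2 Prims.idr: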
module Prims
x : Int
x = 94
foo : String
foo = "Sausage machine"
bar : Char
bar = 'Z'
quux : Bool
quux = False
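An Idris file consists of an optional module declaration (here module Prims) followed by an optional list of imports and a collection of declarations and definitions. In this example no imports have been specified. However, Idris programs can consist of several modules, and the definitions in each module have their own namespace. This is discussed further in the section Modules and Namespaces. When writing Idris programs both the order in which definitions are given and indentation are significant. Functions and data types must be defined before use, and each definition must have a type declaration, for example x : Int and foo : String in the listing above. New declarations must begin at the same level of indentation as the preceding declaration. Alternatively, a semicolon ; can be used to terminate declarations.
A library module prelude is automatically imported by every Idris program, including facilities for IO, arithmetic, data structures and various common functions. The prelude defines several arithmetic and comparison operators, which we can use at the prompt. Evaluating things at the prompt gives an answer, for example: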
Prims> 13+9*9
94 : Integer
Prims> x == 9*9+13
True
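All of the usual arithmetic and comparison operators are defined for the primitive types. They are overloaded using interfaces, as we will discuss in the section Interfaces, and can be extended to work on user-defined types. Boolean expressions can be tested with the if...then...else construct, for example:
Prims> if x == 8 * 8 + 30 then "Yes!" else "No!"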
"Yes!"
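Data Types
Data types are declared in a similar way and with similar syntax to Haskell. Natural numbers and lists, for example, can be declared as follows: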
data Nat = Z | S Nat -- Natural numbers
-- (zero and successor)
data List a = Nil | (::) a (List a) -- Polymorphic lists
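Data type names cannot begin with a lower case letter (we will see later why not!). The above declarations are taken from the standard library. Unary natural numbers can be either zero (Z), or the successor of another natural number (S k). Lists can either be empty (Nil) or a value added to the front of another list (x :: xs). In the declaration for List, we used an infix operator ::. New operators such as this can be added using a fixity declaration, as follows: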
infixr 10 ::
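Functions, data constructors and type constructors may all be given infix operators as names. They may be used in prefix form if enclosed in brackets, e.g. (::). Infix operators can use any of the symbols:
:+-*\/=.?|&><!@$%^~#
Some operators built from these symbols can't be user defined. These are:
%, \, :, =, |, |||, <-, ->, =>, ?, !, &, **, ..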
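Functions
Functions are implemented by pattern matching, again using a similar syntax to Haskell. The main difference is that Idris requires type declarations for all functions, using a single colon : (rather than Haskell's double colon ::). Some natural number arithmetic functions can be defined as follows, again taken from the standard library: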
-- Unary addition
plus : Nat -> Nat -> Nat
plus Z y = y
plus (S k) y = S (plus k y)
-- Unary multiplication
mult : Nat -> Nat -> Nat
mult Z y = Z
mult (S k) y = plus y (mult k y)
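The standard arithmetic operators + and * are also overloaded for use by Nat, and are implemented using the above functions. Unlike Haskell, there is no restriction on whether types and function names must begin with a capital letter or not. Function names (plus and mult above), data constructors (Z, S, Nil and ::) and type constructors (Nat and List) are all part of the same namespace. By convention, however, data types and constructor names typically begin with a capital letter. We can test these functions at the Idris prompt: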
Main> plus (S (S Z)) (S (S Z))
4
Main> mult (S (S (S Z))) (plus (S (S Z)) (S (S Z)))
12
Like arithmetic operations, integer literals are also overloaded using interfaces, meaning that we can also test the functions as follows:
Main> plus 2 2
4
Main> mult 3 (plus 2 2)
12
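You may wonder, by the way, why we have unary natural numbers when our computers have perfectly good integer arithmetic built in. The reason is primarily that unary numbers have a very convenient structure which is easy to reason about, and easy to relate to other data structures, as we will see later. Nevertheless, we do not want this convenience to be at the expense of efficiency. Fortunately, Idris knows about the relationship between Nat (and similarly structured types) and numbers. This means it can optimise the representation, and functions such as plus and mult.
where clauses
Functions can also be defined locally using where clauses. For example, to define a function which reverses a list, we can use an auxiliary function which accumulates the new, reversed list, and which does not need to be visible globally: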
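reverse : List a -> List a
reverse xs = revAcc [] xs where
  revAcc : List a -> List a -> List a
  revAcc acc [] = acc
  revAcc acc (x :: xs) = revAcc (x :: acc) xs
Indentation is significant: functions in the where block must be indented further than the outer function.
Note: Scope
Any names which are visible in the outer scope are also visible in the where clause (unless they have been redefined, such as xs here). A name which appears in the type will also be in scope in the where clause.
As well as functions, where blocks can include local data declarations, such as the following, where MyLT is not accessible outside the definition of foo: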
foo : Int -> Int
foo x = case isLT of
            Yes => x*2
            No => x*4
    where
       data MyLT = Yes | No
       isLT : MyLT
       isLT = if x < 20 then Yes else No
Functions defined in a where clause need a type declaration just like any top-level function. Here is another example of how this works in practice:
even : Nat -> Bool
even Z = True
even (S k) = odd k where
  odd : Nat -> Bool
  odd Z = False
  odd (S k) = even k
test : List Nat
test = [c (S 1), c Z, d (S Z)]
  where c : Nat -> Nat
        c x = 42 + x
        d : Nat -> Nat
        d y = c (y + 1 + z y)
              where z : Nat -> Nat
                    z w = y + w
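Totality and Covering
By default, functions in Idris must be covering. That is, there must be patterns which cover all possible values of the input types. For example, the following definition will give an error: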
fromMaybe : Maybe a -> a
fromMaybe (Just x) = x
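This gives an error because fromMaybe Nothing is not defined. Idris reports:
frommaybe.idr:1:1--2:1:fromMaybe is not covering. Missing cases:
    fromMaybe Nothing
You can override this error with a partial annotation:
partial fromMaybe : Maybe a -> a
fromMaybe (Just x) = x
However, this is not advisable; in general you should only do this during the initial development of a function, or during debugging. If you try to evaluate fromMaybe Nothing at run time you will get a run time error.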
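As an alternative to marking the function partial, the missing case can be handled explicitly. The following is a minimal sketch, assuming a hypothetical helper named fromMaybeOr (it is not part of the tutorial's code) that takes a default value for the Nothing case:

```idris
-- Hypothetical helper (an assumption, not from the tutorial):
-- a covering variant of fromMaybe with a default for Nothing.
fromMaybeOr : a -> Maybe a -> a
fromMaybeOr def Nothing  = def
fromMaybeOr _   (Just x) = x

-- Every constructor of Maybe is matched, so the `total`
-- annotation is accepted by the checker.
total
firstOrZero : Maybe Nat -> Nat
firstOrZero = fromMaybeOr 0
```

Because both constructors are covered, no partial annotation is needed and the function cannot fail at run time.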
Holes
Idris programs can contain holes which stand for incomplete parts of programs. For example, we could leave a hole for the greeting in our "Hello world" program:
main : IO ()
main = putStrLn ?greeting
The syntax ?greeting introduces a hole, which stands for a part of a program which is not yet written. This is a valid Idris program, and you can check the type of greeting:
Main> :t greeting
-------------------------------------
greeting : String
Checking the type of a hole also shows the types of any variables in scope. For example, given an incomplete definition of even:
even : Nat -> Bool
even Z = True
even (S k) = ?even_rhs
We can check the type of even_rhs and see the expected return type, and the type of the variable k:
Main> :t even_rhs
k : Nat
-------------------------------------
even_rhs : Bool
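Holes are useful because they help us write functions incrementally. Rather than writing an entire function in one go, we can leave some parts unwritten and use Idris to tell us what is still needed to complete the definition.
Dependent Types
First Class Types
In Idris, types are first class, meaning that they can be computed and manipulated (and passed to functions) just like any other language construct. For example, we could write a function which computes a type: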
isSingleton : Bool -> Type
isSingleton True = Nat
isSingleton False = List Nat
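This function calculates the appropriate type from a Bool which flags whether the type should be a singleton or not. We can use this function to calculate a type anywhere that a type can be used. For example, it can be used to calculate a return type: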
mkSingle : (x : Bool) -> isSingleton x
mkSingle True = 0
mkSingle False = []
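Or it can be used to have varying input types. The following function calculates either the sum of a list of Nat, or returns the given Nat, depending on whether the singleton flag is true: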
sum : (single : Bool) -> isSingleton single -> Nat
sum True x = x
sum False [] = 0
sum False (x :: xs) = x + sum False xs
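To see the computed argument type in action, the sketch below restates isSingleton and applies the function to both kinds of argument. The name sumSingle is an assumption made here to avoid a clash with the Prelude's own sum; the clauses are otherwise the ones above.

```idris
-- Restated from the text so that this block stands alone.
isSingleton : Bool -> Type
isSingleton True  = Nat
isSingleton False = List Nat

-- Same clauses as `sum` above, under a hypothetical name that
-- does not clash with the Prelude's `sum`.
sumSingle : (single : Bool) -> isSingleton single -> Nat
sumSingle True  x         = x
sumSingle False []        = 0
sumSingle False (x :: xs) = x + sumSingle False xs

-- With the flag True, the second argument must be a Nat...
justFive : Nat
justFive = sumSingle True 5

-- ...and with the flag False, it must be a List Nat.
listTotal : Nat
listTotal = sumSingle False [1, 2, 3]
```

Passing a list together with True, or a single number together with False, would be rejected by the type checker.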
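Vectors
A standard example of a dependent data type is the type of "lists with length", conventionally called vectors in the dependent type literature. They are available as part of the Idris library, by importing Data.Vect, or we can declare them as follows:
data Vect : Nat -> Type -> Type where
   Nil : Vect Z a
   (::) : a -> Vect k a -> Vect (S k) a
Note that we have used the same constructor names as for List. Ad-hoc name overloading such as this is accepted by Idris, provided that the names are declared in different namespaces (in practice, normally in different modules). Ambiguous constructor names can normally be resolved from context.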
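This declares a family of types, and so the form of the declaration is rather different from the simple type declarations above. We explicitly state the type of the type constructor Vect: it takes a Nat and a type as arguments, where Type stands for the type of types. We say that Vect is indexed over Nat and parameterised by Type. Each constructor targets a different part of the family of types. Nil can only be used to construct vectors with zero length, and :: to construct vectors with non-zero length. In the type of ::, we state explicitly that an element of type a and a tail of type Vect k a (i.e., a vector of length k) combine to make a vector of length S k.
We can define functions on dependent types such as Vect in the same way as on simple types such as List and Nat above, by pattern matching. The type of a function over Vect will describe what happens to the lengths of the vectors involved. For example, ++, defined as follows, appends two Vects: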
(++) : Vect n a -> Vect m a -> Vect (n + m) a
(++) Nil ys = ys
(++) (x :: xs) ys = x :: xs ++ ys
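The type of (++) states that the resulting vector's length will be the sum of the input lengths. If we get the definition wrong in such a way that this does not hold, Idris will not accept the definition. For example: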
(++) : Vect n a -> Vect m a -> Vect (n + m) a
(++) Nil ys = ys
(++) (x :: xs) ys = x :: xs ++ xs -- BROKEN
When run through the Idris type checker, this results in the following:
$ idris2 Vect.idr --check
1/1: Building Vect (Vect.idr)
Vect.idr:7:26--8:1:While processing right hand side of Main.++ at Vect.idr:7:1--8:1:
When unifying plus k k and plus k m
Mismatch between:
        k
and
        m
This error message suggests that there is a length mismatch between two vectors: we needed a vector of length k + m, but provided a vector of length k + k.
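The Finite Sets
Finite sets, as the name suggests, are sets with a finite number of elements. They are available as part of the Idris library, by importing Data.Fin, or can be declared as follows:
data Fin : Nat -> Type where
   FZ : Fin (S k)
   FS : Fin k -> Fin (S k)
From the signature, we can see that this is a type constructor that takes a Nat and produces a type. So this is not a set in the sense of a collection that is a container of objects; rather it is the canonical set of unnamed elements, as in "the set of 5 elements", for example. Effectively, it is a type that captures integers that fall into the range of zero to (n - 1), where n is the argument used to instantiate the Fin type. For example, Fin 5 can be thought of as the type of integers between 0 and 4.
Let us look at the constructors in greater detail.
FZ is the zeroth element of a finite set with S k elements; FS n is the n+1th element of a finite set with S k elements. Fin is indexed by a Nat, which represents the number of elements in the set. Since we can't construct an element of an empty set, neither constructor targets Fin Z.
As mentioned above, a useful application of the Fin family is to represent bounded natural numbers. Since the first n natural numbers form a finite set of n elements, we can treat Fin n as the set of integers greater than or equal to zero and less than n.
For example, the following function, which looks up an element in a Vect by a bounded index given as a Fin n, is defined in Data.Vect: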
index : Fin n -> Vect n a -> a
index FZ (x :: xs) = x
index (FS k) (x :: xs) = index k xs
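This function looks up a value at a given location in a vector. The location is bounded by the length of the vector (n in each case), so there is no need for a run-time bounds check. The type checker guarantees that the location is strictly less than the length of the vector, and of course no less than zero.
Note also that there is no case for Nil here. This is because it is impossible: since there is no element of Fin Z, and the location is a Fin n, n cannot be Z. As a result, attempting to look up an element in an empty vector would give a compile-time type error, since it would force n to be Z.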
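The bounded lookup described above can be tried out as follows. This is a minimal sketch using index from Data.Vect and natToFin from Data.Fin; the names colours, second and lookupNat are illustrative assumptions, not part of the library.

```idris
import Data.Fin
import Data.Vect

-- A sample vector; the name is illustrative only.
colours : Vect 3 String
colours = ["red", "green", "blue"]

-- The literal 1 is checked against Fin 3 at compile time;
-- writing `index 3 colours` would be rejected.
second : String
second = index 1 colours

-- Hypothetical helper: look up by a plain Nat by first converting
-- it to a Fin n. natToFin gives Nothing when the number is out of
-- range, so the run-time check happens once, outside `index`.
lookupNat : {n : Nat} -> Nat -> Vect n a -> Maybe a
lookupNat {n} i xs = case natToFin i n of
                          Nothing  => Nothing
                          Just idx => Just (index idx xs)
```

The design choice here is that index itself stays free of bounds checks, while any untrusted number is validated at the boundary by converting it to a Fin n.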
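Implicit Arguments
Let us take a closer look at the type of index:
index : Fin n -> Vect n a -> a
It takes two arguments, an element of the finite set of n elements, and a vector with n elements of type a. But there are also two names, n and a, which are not declared explicitly. These are implicit arguments to index. We could also write the type of index as: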
index : forall a, n . Fin n -> Vect n a -> a
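Implicit arguments, given with the forall declaration, are not given in applications of index; their values can be inferred from the types of the Fin n and Vect n a arguments. Any name beginning with a lower case letter which appears as a parameter or index in a type declaration, and which is not applied to any arguments, will always be automatically bound as an implicit argument; this is why data type names cannot begin with a lower case letter. Implicit arguments can still be given explicitly in applications, using {a=value} and {n=value}, for example: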
index {a=Int} {n=2} FZ (2 :: 3 :: Nil)
In fact, any argument, implicit or explicit, may be given a name. We could have declared the type of index as:
index : (i : Fin n) -> (xs : Vect n a) -> a
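It is a matter of taste whether you want to do this; sometimes it can help document a function by making the purpose of an argument more clear.
The names of implicit arguments are in scope in the body of the function, although they cannot be used at run time. There is much more to say about implicit arguments; we will discuss the question of what is available at run time, among other things, in the section Multiplicities.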
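As a small preview of that later discussion, the following sketch assumes that an implicit argument bound explicitly with {n : Nat} has unrestricted multiplicity and so may be used at run time. The names vlen and three are hypothetical.

```idris
import Data.Vect

-- Hypothetical example: n is bound with {n : Nat}, so (under the
-- assumption stated above) it is available at run time.
vlen : {n : Nat} -> Vect n a -> Nat
vlen {n} _ = n

-- The implicit argument can also be supplied explicitly.
three : Nat
three = vlen {n = 3} [True, False, True]
```

Had the type been written as Vect n a -> Nat with n left unbound, n would be erased and the right-hand side could not return it.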
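Note: Declaration Order and mutual blocks
In general, functions and data types must be defined before use, since dependent types allow functions to appear as part of types, and type checking can rely on how particular functions are defined (though this is only true of total functions; see Totality Checking). However, this restriction can be relaxed by using a mutual block, which allows data types and functions to be defined simultaneously: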
mutual
  even : Nat -> Bool
  even Z = True
  even (S k) = odd k
  odd : Nat -> Bool
  odd Z = False
  odd (S k) = even k
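In a mutual block, first all of the type declarations are added, then the function bodies. As a result, none of the function types can depend on the reduction behaviour of any of the functions in the block.
Forward declarations can allow you to have more fine-grained control over the order in which mutually defined concepts are declared. This can be useful if you need to mention a datatype's constructor in the type of a mutually defined function, or need to rely on the behaviour of a mutually defined function for something to typecheck.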
data V : Type
T : V -> Type
data V : Type where
  N : V
  Pi : (a : V) -> (b : T a -> V) -> V
T N = Nat
T (Pi a b) = (x : T a) -> T (b x)
data Even : Nat -> Type
data Odd : Nat -> Type
data Even : Nat -> Type where
  ZIsEven : Even Z
  SOddIsEven : Odd n -> Even (S n)
data Odd : Nat -> Type where
  SEvenIsOdd : Even n -> Odd (S n)
even : Nat -> Bool
odd : Nat -> Bool
-- or just ``even, odd : Nat -> Bool``
even Z = True
even (S k) = odd k
odd Z = False
odd (S k) = even k
Placing the signature declarations ahead of the definitions in this way lets Idris detect their corresponding mutual definitions.
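I/O
Computer programs are of little use if they do not interact with the user or the system in some way. The difficulty in a pure language such as Idris (that is, a language where expressions do not have side effects) is that I/O is inherently side-effecting. So, Idris provides a parameterised type IO which describes the interactions that the run-time system will perform when executing a function: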
data IO a -- description of an IO operation returning a value of type a
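We will leave the definition of IO abstract, but effectively it describes what the I/O operations to be executed are, rather than how to execute them. The resulting operations are executed externally, by the run-time system. We have already seen one I/O program: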
main : IO ()
main = putStrLn "Hello world"
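The type of putStrLn explains that it takes a string, and returns an I/O action which produces an element of the unit type (). There is a variant putStr which describes the output of a string without a newline: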
putStrLn : String -> IO ()
putStr : String -> IO ()
We can also read strings from user input:
getLine : IO String
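A number of other I/O operations are available. For example, by adding import System.File to your program, you get access to functions for reading and writing files, including: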
data File -- abstract
data Mode = Read | Write | ReadWrite
openFile : (f : String) -> (m : Mode) -> IO (Either FileError File)
closeFile : File -> IO ()
fGetLine : (h : File) -> IO (Either FileError String)
fPutStr : (h : File) -> (str : String) -> IO (Either FileError ())
fEOF : File -> IO Bool
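Note that several of these return Either, since they may fail.
"do" notation
I/O programs will typically need to sequence actions, feeding the output of one computation into the input of the next. IO is an abstract type, however, so we can't access the result of a computation directly. Instead, we sequence operations with do notation: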
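greet : IO ()
greet = do putStr "What is your name? "
           name <- getLine
           putStrLn ("Hello " ++ name)
The syntax x <- iovalue executes the I/O operation iovalue, of type IO a, and puts the result, of type a, into the variable x. In this case, getLine returns an IO String, so name has type String. Indentation is significant: each statement in the do block must begin in the same column. The pure operation allows us to inject a value directly into an IO operation:
pure : a -> IO a
As we will see later, do notation is more general than this, and can be overloaded.
You can try executing greet at the Idris 2 REPL by running the command :exec greet.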
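Laziness
Normally, arguments to functions are evaluated before the function itself (that is, Idris uses eager evaluation). However, this is not always the best approach. Consider the following function: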
ifThenElse : Bool -> a -> a -> a
ifThenElse True t e = t
ifThenElse False t e = e
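This function uses one of the t or e arguments, but not both. We would prefer it if only the argument which is used were evaluated. To achieve this, Idris provides a Lazy primitive, which allows evaluation to be suspended. It is a primitive, but conceptually we can think of it as follows: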
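data Lazy : Type -> Type where
     Delay : (val : a) -> Lazy a
Force : Lazy a -> a
A value of type Lazy a is unevaluated until it is forced by Force. The Idris type checker knows about the Lazy type, and inserts conversions where necessary between Lazy a and a, and vice versa. We can therefore write ifThenElse as follows, without any explicit use of Force or Delay: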
ifThenElse : Bool -> Lazy a -> Lazy a -> a
ifThenElse True t e = t
ifThenElse False t e = e
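A typical reason to want this behaviour is to guard an expression that should not be evaluated in some cases. The sketch below uses the hypothetical names choose and safeDiv; choose has the same shape as ifThenElse above, renamed to avoid clashing with the definition in the Prelude.

```idris
-- Same shape as ifThenElse above, under a hypothetical name.
choose : Bool -> Lazy a -> Lazy a -> a
choose True  t e = t
choose False t e = e

-- Only the selected branch is forced, so the division is never
-- evaluated when the divisor d is zero.
safeDiv : Integer -> Integer -> Integer
safeDiv n d = choose (d == 0) 0 (n `div` d)
```

No Delay or Force appears in the source; the type checker inserts them at the Lazy argument positions.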
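Infinite data types
Infinite data types (codata) allow us to define infinite data structures by marking recursive arguments as potentially infinite. One example of an infinite type is Stream, which is defined as follows: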
data Stream : Type -> Type where
  (::) : (e : a) -> Inf (Stream a) -> Stream a
The following is an example of how the codata type Stream can be used to form an infinite data structure. In this case we are creating an infinite stream of ones:
ones : Stream Nat
ones = 1 :: ones
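An infinite stream is only useful if a finite part of it can be observed. The following sketch, with the hypothetical names countFrom and firstThree, builds a stream of successive numbers and uses the Prelude's take on streams to extract a finite prefix as a List:

```idris
-- Hypothetical example: an infinite stream counting up from n.
-- The recursive call sits under the Inf argument of (::), so it
-- is only evaluated on demand.
countFrom : Nat -> Stream Nat
countFrom n = n :: countFrom (S n)

-- `take` from the Prelude extracts a finite prefix as a List.
firstThree : List Nat
firstThree = take 3 (countFrom 0)
```

Here firstThree evaluates to [0, 1, 2]; the rest of the stream is never computed.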
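Useful Data Types
Idris includes a number of useful data types and library functions (see the libs/ directory in the distribution, and the documentation at https://www.idris-lang.org/pages/documentation.html). This section describes a few of these, and how to import them.
List and Vect
We have already seen the List and Vect data types: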
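data List a = Nil | (::) a (List a)
data Vect : Nat -> Type -> Type where
   Nil : Vect Z a
   (::) : a -> Vect k a -> Vect (S k) a
You can get access to Vect with import Data.Vect. Note that the constructor names are the same for each: constructor names (in fact, names in general) can be overloaded, provided that they are declared in different namespaces (see the section Modules and Namespaces), and will typically be resolved according to their type. As syntactic sugar, any implementation of the names Nil and :: can be written in list form. For example:
[] means Nil
[1,2,3] means 1 :: 2 :: 3 :: Nil
Similarly, any implementation of the names Lin and :< can be written in snoc-list form:
[<] means Lin
[< 1, 2, 3] means Lin :< 1 :< 2 :< 3
The prelude includes a pre-defined datatype for snoc-lists:
data SnocList a = Lin | (:<) (SnocList a) a
The library also defines a number of functions for manipulating these types. map is overloaded both for List and Vect (we will see more details of precisely how later, when we cover interfaces in the section Interfaces) and applies a function to every element of the list or vector.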
map : (a -> b) -> List a -> List b
map f [] = []
map f (x :: xs) = f x :: map f xs
map : (a -> b) -> Vect n a -> Vect n b
map f [] = []
map f (x :: xs) = f x :: map f xs
For example, given the following vector of integers, and a function to double an integer:
intVec : Vect 5 Int
intVec = [1, 2, 3, 4, 5]
double : Int -> Int
double x = x * 2
the function map can be used as follows to double every element in the vector:
*UsefulTypes> show (map double intVec)
"[2, 4, 6, 8, 10]" : String
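For more details of the functions available on List and Vect, look in the library files:
libs/base/Data/List.idr
libs/base/Data/Vect.idr
Functions include filtering, appending, reversing, and so on.
Aside: Anonymous functions and operator sections
There are neater ways to write the above expression. One way would be to use an anonymous function: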
*UsefulTypes> show (map (\x => x * 2) intVec)
"[2, 4, 6, 8, 10]" : String
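The notation \x => val constructs an anonymous function which takes one argument, x, and returns the expression val. Anonymous functions may take several arguments, separated by commas, e.g. \x, y, z => val. Arguments may also be given explicit types, e.g. \x : Int => x * 2, and can pattern match, e.g. \(x, y) => x + y. We could also use an operator section: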
*UsefulTypes> show (map (* 2) intVec)
"[2, 4, 6, 8, 10]" : String
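(*2) is shorthand for a function which multiplies a number by 2. It expands to \x => x * 2. Similarly, (2*) would expand to \x => 2 * x.
Maybe
Maybe, defined in the Prelude, describes an optional value. Either there is a value of the given type, or there isn't:
data Maybe a = Just a | Nothing
Maybe is one way of giving a type to an operation that may fail. For example, looking something up in a List (rather than a vector) may result in an out of bounds error: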
list_lookup : Nat -> List a -> Maybe a
list_lookup _ Nil = Nothing
list_lookup Z (x :: xs) = Just x
list_lookup (S k) (x :: xs) = list_lookup k xs
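The maybe function is used to process values of type Maybe, either by applying a function to the value, if there is one, or by providing a default value:
maybe : Lazy b -> Lazy (a -> b) -> Maybe a -> b
Note that the types of the first two arguments are wrapped in Lazy. Since only one of the two arguments will actually be used, we mark them as Lazy in case they are large expressions where it would be wasteful to compute and then discard them.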
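A short usage sketch of maybe follows. The function name describe is a hypothetical example; the first argument to maybe is the default, and the second is applied to the value in the Just case.

```idris
-- Hypothetical example: describe an optional Nat using `maybe`.
describe : Maybe Nat -> String
describe m = maybe "nothing" (\n => "got " ++ show n) m
```

For instance, describe Nothing gives "nothing", and describe (Just 3) gives "got 3".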
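Tuples
Values can be paired with the following built-in data type:
data Pair a b = MkPair a b
As syntactic sugar, we can write (a, b) which, according to context, means either Pair a b or MkPair a b. Tuples can contain an arbitrary number of values, represented as nested pairs: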
fred : (String, Int)
fred = ("Fred", 42)
jim : (String, Int, String)
jim = ("Jim", 25, "Cambridge")
*UsefulTypes> fst jim
"Jim" : String
*UsefulTypes> snd jim
(25, "Cambridge") : (Int, String)
*UsefulTypes> jim == ("Jim", (25, "Cambridge"))
True : Bool
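Dependent Pairs
Dependent pairs allow the type of the second element of a pair to depend on the value of the first element:
data DPair : (a : Type) -> (p : a -> Type) -> Type where
   MkDPair : {p : a -> Type} -> (x : a) -> p x -> DPair a p
Again, there is syntactic sugar for this. (x : a ** p) is the type of a pair of A and P, where the name x can occur inside p. (x ** p) constructs a value of this type. For example, we can pair a number with a Vect of a particular length: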
vec : (n : Nat ** Vect n Int)
vec = (2 ** [3, 4])
If you like, you can write it out the long way; the two are equivalent:
vec : DPair Nat (\n => Vect n Int)
vec = MkDPair 2 [3, 4]
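The type checker could infer the value of the first element from the length of the vector. We can write an underscore _ in place of values which we expect the type checker to fill in, so the above definition could also be written as: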
vec : (n : Nat ** Vect n Int)
vec = (_ ** [3, 4])
We might also prefer to omit the type of the first element of the pair, since, again, it can be inferred:
vec : (n ** Vect n Int)
vec = (_ ** [3, 4])
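One use for dependent pairs is to return values of dependent types where the index is not necessarily known in advance. For example, if we filter elements out of a Vect according to some predicate, we will not know in advance what the length of the resulting vector will be: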
filter : (a -> Bool) -> Vect n a -> (p ** Vect p a)
If the Vect is empty, the result is:
filter p Nil = (_ ** [])
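In the :: case, we need to inspect the result of a recursive call to filter to extract the length and the vector from the result. To do this, we use a case expression, which allows pattern matching on intermediate values:
filter : (a -> Bool) -> Vect n a -> (p ** Vect p a)
filter p Nil = (_ ** [])
filter p (x :: xs)
    = case filter p xs of
           (_ ** xs') => if p x then (_ ** x :: xs')
                                else (_ ** xs')
Dependent pairs are sometimes referred to as "Sigma types".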
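Callers of such a function get at the unknown length by pattern matching on the pair. The following is a minimal sketch using the filter exported by Data.Vect, which has the type shown above; the helper name countMatches is an assumption made for illustration.

```idris
import Data.Vect

-- Hypothetical helper: count the elements satisfying a predicate
-- by reading the first component (the length) of the dependent
-- pair returned by Data.Vect's filter.
countMatches : (a -> Bool) -> Vect n a -> Nat
countMatches p xs = case filter p xs of
                         (len ** _) => len
```

The length is carried alongside the vector, so it does not need to be recomputed by traversing the result.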
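Records
Records are data types which collect several values (the record's fields) together. Idris provides syntax for defining records and automatically generating field access and update functions. Unlike the syntax used for data structures, records in Idris follow a different syntax to that seen with Haskell. For example, we can represent a person's name and age in a record: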
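record Person where
    constructor MkPerson
    firstName, middleName, lastName : String
    age : Int
fred : Person
fred = MkPerson "Fred" "Joe" "Bloggs" 30
The constructor name is provided using the constructor keyword, and the fields are then given in an indented block following the where keyword (here firstName, middleName, lastName, and age). You can declare multiple fields on a single line, provided that they have the same type. The field names can be used to access the field values: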
*Record> fred.firstName
"Fred" : String
*Record> fred.age
30 : Int
*Record> :t (.firstName)
Main.Person.(.firstName) : Person -> String
We can use prefix field projections, like in Haskell:
*Record> firstName fred
"Fred" : String
*Record> age fred
30 : Int
*Record> :t firstName
firstName : Person -> String
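Prefix field projections can be disabled per record definition using the pragma %prefix_record_projections off, which makes all subsequently defined records generate only dotted projections. This pragma has effect until the end of the module or until the closest occurrence of %prefix_record_projections on.
We can also use the field names to update a record (or, more precisely, produce a copy of the record with the given fields updated):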
*Record> { firstName := "Jim" } fred
MkPerson "Jim" "Joe" "Bloggs" 30 : Person
*Record> { firstName := "Jim", age $= (+ 1) } fred
MkPerson "Jim" "Joe" "Bloggs" 31 : Person
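The syntax { field := val, ... } generates a function which updates the given fields in a record. := assigns a new value to a field, and $= applies a function to update its value.
Each record is defined in its own namespace, which means that field names can be reused in multiple records.
Records, and fields within records, can have dependent types. Updates are allowed to change the type of a field, provided that the result is well-typed.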
record Class where
    constructor ClassInfo
    students : Vect n Person
    className : String
It is safe to update the students field to a vector of a different length because it will not affect the type of the record:
addStudent : Person -> Class -> Class
addStudent p c = { students := p :: students c } c
*Record> addStudent fred (ClassInfo [] "CS")
ClassInfo [MkPerson "Fred" "Joe" "Bloggs" 30] "CS" : Class
We could also use $= to define addStudent more concisely:
addStudent' : Person -> Class -> Class
addStudent' p c = { students $= (p ::) } c
Nested record projection
Nested record fields can be accessed using the dot notation:
x.a.b.c
map (.a.b.c) xs
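For the dot notation, there must be no spaces after the dots but there may be spaces before the dots. The composite projection must be parenthesised, otherwise map .a.b.c xs would be understood as map.a.b.c xs.
Nested record fields can also be accessed using the prefix notation: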
(c . b . a) x
map (c . b . a) xs
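Dots with spaces around them stand for the function composition operator.
Nested record update
Idris also provides a convenient syntax for accessing and updating nested records. For example, if a field is accessible with the expression x.a.b.c, it can be updated using the following syntax: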
{ a.b.c := val } x
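This returns a new record, with the field accessed by the path a.b.c set to val. The syntax is first class, i.e. { a.b.c := val } itself has a function type.
The $= notation is also valid for nested record updates.
Dependent Records
Records can also be dependent on values. Records have parameters, which cannot be updated like the other fields. The parameters appear as arguments to the resulting type, and are written following the record type name. For example, a pair type could be defined as follows: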
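record Prod a b where
    constructor Times
    fst : a
    snd : b
Using the Class record from earlier, the size of the class can be restricted using a Vect, and the size included in the type by parameterising the record with the size. For example:
record SizedClass (size : Nat) where
    constructor SizedClassInfo
    students : Vect size Person
    className : String
In the case of addStudent earlier, we can still add a student to a SizedClass since the size is implicit, and will be updated when a student is added: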
addStudent : Person -> SizedClass n -> SizedClass (S n)
addStudent p c = { students := p :: students c } c
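In fact, the dependent pair type we have just seen is, in practice, defined as a record, with fields fst and snd that allow projecting values out of the pair: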
record DPair a (p : a -> Type) where
    constructor MkDPair
    fst : a
    snd : p fst
It is possible to use record update syntax to update dependent fields, provided that all related fields are updated at once. For example:
cons : t -> (x : Nat ** Vect x t) -> (x : Nat ** Vect x t)
cons val xs
    = { fst := S (fst xs),
        snd := (val :: snd xs) } xs
Or even:
cons' : t -> (x : Nat ** Vect x t) -> (x : Nat ** Vect x t)
cons' val
    = { fst $= S,
        snd $= (val ::) }
More Expressions
let bindings
Intermediate values can be calculated using let bindings:
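mirror : List a -> List a
mirror xs = let xs' = reverse xs in
                xs ++ xs'
We can do pattern matching in let bindings too. For example, we can extract fields from a record as follows, as well as by pattern matching at the top level: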
data Person = MkPerson String Int
showPerson : Person -> String
showPerson p = let MkPerson name age = p in
                   name ++ " is " ++ show age ++ " years old"
These let bindings can be annotated with a type:
mirror : List a -> List a
mirror xs = let xs' : List a = reverse xs in
                xs ++ xs'
We can also use the symbol := instead of = to, among other things, avoid ambiguities with propositional equality:
Diag : a -> Type
Diag v = let ty : Type := v = v in ty
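Local definitions can also be introduced using let. Just like top-level definitions and those defined in a where clause, you need to:
declare the function and its type
define the function by pattern matching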
foldMap : Monoid m => (a -> m) -> Vect n a -> m
foldMap f = let fo : m -> a -> m
                fo ac el = ac <+> f el
             in foldl fo neutral
The symbol := can't be used in a local function definition. This means it can be used to interleave let bindings and local definitions without introducing ambiguities.
foldMap : Monoid m => (a -> m) -> Vect n a -> m
foldMap f = let fo : m -> a -> m
                fo ac el = ac <+> f el
                initial := neutral
                --      ^ this indicates that `initial` is a separate binding,
                --        not relevant to definition of `fo`
             in foldl fo initial
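List comprehensions
Idris provides comprehension notation as a convenient shorthand for building lists. The general form is:
[ expression | qualifiers ]
This generates the list of values produced by evaluating the expression, according to the conditions given by the comma separated qualifiers. For example, we can build a list of Pythagorean triples as follows: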
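pythag : Int -> List (Int, Int, Int)
pythag n = [ (x, y, z) | z <- [1..n], y <- [1..z], x <- [1..y],
                         x*x + y*y == z*z ]
The [a..b] notation is another shorthand which builds a list of numbers between a and b. Alternatively [a,b..c] builds a list of numbers between a and c with the increment specified by the difference between a and b. This works for type Nat, Int and Integer, and is syntactic sugar for the rangeFromTo and rangeFromThenTo functions from the prelude.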
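The range and comprehension notations can be combined freely. The following sketch uses hypothetical names (evens, squares, oddSquares) purely for illustration:

```idris
-- Step of 2, taken from the difference between 2 and 4.
evens : List Int
evens = [2,4..10]

-- Each element of the range is squared.
squares : List Int
squares = [ x * x | x <- [1..5] ]

-- A guard keeps only the elements for which the condition holds.
oddSquares : List Int
oddSquares = [ x * x | x <- [1..5], x `mod` 2 == 1 ]
```

Here evens is [2, 4, 6, 8, 10], squares is [1, 4, 9, 16, 25], and oddSquares is [1, 9, 25].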
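case expressions
Another way of inspecting intermediate values is to use a case expression. The following function, for example, splits a string into two at a given character: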
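splitAt : Char -> String -> (String, String)
splitAt c x = case break (== c) x of
                  (x, y) => (x, strTail y)
break is a library function which breaks a string into a pair of strings at the point where the given function returns true. We then deconstruct the pair it returns, and remove the first character of the second string.
A case expression can match several cases, for example, to inspect an intermediate value of type Maybe a. Recall list_lookup, which looks up an index in a list, returning Nothing if the index is out of bounds. We can use this to write lookup_default, which looks up an index and returns a default value if the index is out of bounds: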
lookup_default : Nat -> List a -> a -> a
lookup_default i xs def = case list_lookup i xs of
                              Nothing => def
                              Just x => x
If the index is in bounds, we get the value at that index, otherwise we get a default value:
*UsefulTypes> lookup_default 2 [3,4,5,6] (-1)
5 : Integer
*UsefulTypes> lookup_default 4 [3,4,5,6] (-1)
-1 : Integer
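Totality
Idris distinguishes between total and partial functions. A total function is a function that either:
Terminates for all possible inputs, or
Produces a non-empty, finite prefix of a possibly infinite result
If a function is total, we can consider its type a precise description of what that function will do. For example, if we have a function with a return type of String we know something different, depending on whether or not it is total:
If it is total, it will return a value of type String in finite time;
If it is partial, then as long as it doesn't crash or enter an infinite loop, it will return a String.
Idris makes this distinction so that it knows which functions are safe to evaluate while type checking (as we have seen with First Class Types). After all, if it tries to evaluate a function during type checking which does not terminate, then type checking will not terminate! Therefore, only total functions will be evaluated during type checking. Partial functions can still be used in types, but will not be evaluated further.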
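Interfaces
We often want to define functions which work across several different data types. For example, we would like arithmetic operators to work on Int, Integer and Double at the very least. We would like == to work on the majority of data types. We would like to be able to display different types in a uniform way.
To achieve this, we use interfaces, which are similar to type classes in Haskell or traits in Rust. To define an interface, we provide a collection of overloadable functions. A simple example is the Show interface, which is defined in the prelude and provides an interface for converting values to String: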
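interface Show a where
    show : a -> String
This generates a function of the following type (which we call a method of the Show interface):
show : Show a => a -> String
We can read this as: "under the constraint that a has an implementation of Show, take an input a and return a String." An implementation of an interface is defined by giving definitions of the methods of the interface. For example, the Show implementation for Nat could be defined as: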
Show Nat where
    show Z = "Z"
    show (S k) = "s" ++ show k
Main> show (S (S (S Z)))
"sssZ" : String
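Only one implementation of an interface can be given for a type; implementations may not overlap. Implementation declarations can themselves have constraints. To help with resolution, the arguments of an implementation must be constructors (either data or type constructors) or variables (i.e. you cannot give an implementation for a function). For example, to define a Show implementation for vectors, we need to know that there is a Show implementation for the element type, because we are going to use it to convert each element to a String: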
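Show a => Show (Vect n a) where
    show xs = "[" ++ show' xs ++ "]" where
        show' : forall n . Vect n a -> String
        show' Nil = ""
        show' (x :: Nil) = show x
        show' (x :: xs) = show x ++ ", " ++ show' xs
Note that we need the explicit forall n . in the show' function because otherwise the n is already in scope, and fixed to the value of the top level n.
Default Definitions
The Prelude defines an Eq interface which provides methods for comparing values for equality or inequality, with implementations for all of the built-in types: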
interface Eq a where
    (==) : a -> a -> Bool
    (/=) : a -> a -> Bool
To declare an implementation for a type, we have to give definitions of all of the methods. For example, for an implementation of Eq for Nat:
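Eq Nat where
    Z == Z = True
    (S x) == (S y) = x == y
    Z == (S y) = False
    (S x) == Z = False
    x /= y = not (x == y)
It is hard to imagine many cases where the /= method will be anything other than the negation of the result of applying the == method. It is therefore convenient to give a default definition for each method in the interface declaration, in terms of the other method: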
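interface Eq a where
    (==) : a -> a -> Bool
    (/=) : a -> a -> Bool
    x /= y = not (x == y)
    x == y = not (x /= y)
A minimal complete implementation of Eq requires either == or /= to be defined, but does not require both. If a method definition is missing, and there is a default definition for it, then the default is used instead.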
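The effect of default definitions can be seen with a small user-defined type. In this sketch, the Colour type and the name different are assumptions for illustration; only == is implemented, and /= is supplied by the default from the Prelude's Eq interface.

```idris
-- Hypothetical user-defined type for illustration.
data Colour = Red | Green | Blue

-- Only (==) is given; (/=) falls back to its default definition.
Eq Colour where
    Red   == Red   = True
    Green == Green = True
    Blue  == Blue  = True
    _     == _     = False

-- Uses the defaulted (/=).
different : Bool
different = Red /= Blue
```

Here different evaluates to True, since Red == Blue is False and the default /= negates it.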
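Extending Interfaces
Interfaces can also be extended. A logical next step from an equality relation Eq is to define an ordering relation Ord. We can define an Ord interface which inherits methods from Eq as well as defining some of its own: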
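data Ordering = LT | EQ | GT
interface Eq a => Ord a where
    compare : a -> a -> Ordering
    (<) : a -> a -> Bool
    (>) : a -> a -> Bool
    (<=) : a -> a -> Bool
    (>=) : a -> a -> Bool
    max : a -> a -> a
    min : a -> a -> a
The Ord interface allows us to compare two values and determine their ordering. Only the compare method is required; every other method has a default definition. Using this we can write functions such as sort, a function which sorts a list into increasing order, provided that the element type of the list is in the Ord interface. We give the constraints on the type variables left of the fat arrow =>, and the function type to the right of the fat arrow: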
sort : Ord a => List a -> List a
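Functions, interfaces and implementations can have multiple constraints. Multiple constraints are written in brackets in a comma separated list, for example: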
sortAndShow : (Ord a, Show a) => List a -> String
sortAndShow xs = show (sort xs)
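To use sortAndShow with a user-defined type, that type needs implementations of Eq, Ord and Show. The following is a minimal sketch; the Size type and the rank helper are assumptions introduced for illustration, and sort is taken from Data.List.

```idris
import Data.List

-- Hypothetical type used only for this illustration.
data Size = Small | Medium | Large

-- Map each constructor to a number so that comparisons can be
-- delegated to the existing implementations for Nat.
rank : Size -> Nat
rank Small  = 0
rank Medium = 1
rank Large  = 2

Eq Size where
    x == y = rank x == rank y

-- Only compare is defined; (<), (>), max, min and the other
-- methods use their default definitions.
Ord Size where
    compare x y = compare (rank x) (rank y)

Show Size where
    show Small  = "Small"
    show Medium = "Medium"
    show Large  = "Large"

sortAndShow : (Ord a, Show a) => List a -> String
sortAndShow xs = show (sort xs)

main : IO ()
main = putStrLn (sortAndShow [Large, Small, Medium])
```

Running main prints [Small, Medium, Large].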
Constraints are, like types, first class objects in the language. You can see this at the REPL:
Main> :t Ord
Prelude.Ord : Type -> Type
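So, (Ord a, Show a) is an ordinary pair of Types, with the two constraints as the first and second elements of the pair.
Note: Interfaces and mutual blocks
Idris is strictly "define before use", except in mutual blocks. In a mutual block, Idris elaborates in two passes: types on the first pass and definitions on the second. When the mutual block contains an interface declaration, it elaborates the interface header but none of the method types on the first pass, and elaborates the method types and any default definitions on the second pass.
Quantities for Parameters
By default, parameters that are not explicitly ascribed a type in an interface declaration are assigned the quantity 0. This means that the parameter is not available to use at runtime in the methods' definitions.
For instance, Show a gives rise to a 0-quantified type variable a in the type of the show method: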
Main> :set showimplicits
Main> :t show
Prelude.show : {0 a : Type} -> Show a => a -> String
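However, some use cases require that some of the parameters are available at runtime. We may for instance want to declare an interface for Storable types. The constraint Storable a size means that we can store values of type a in a Buffer in exactly size bytes.
If the user provides a method to read a value of type a at a given offset, then we can read the kth element stored in the buffer by computing the appropriate offset from k and size. This is demonstrated by providing a default implementation for the method peekElemOff, implemented in terms of peekByteOff and the parameter size.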
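data ForeignPtr : Type -> Type where
  MkFP : Buffer -> ForeignPtr a
interface Storable (0 a : Type) (size : Nat) | a where
  peekByteOff : HasIO io => ForeignPtr a -> Int -> io a
  peekElemOff : HasIO io => ForeignPtr a -> Int -> io a
  peekElemOff fp k = peekByteOff fp (k * cast size)
Note that a is explicitly marked as runtime irrelevant so that it is erased by the compiler. Equivalently we could have written interface Storable a (size : Nat). The meaning of | a is explained in Determining Parameters.
Functors and Applicatives
So far, we have seen single parameter interfaces, where the parameter is of type Type. In general, there can be any number of parameters (even zero), and the parameters can have any type. If the type of the parameter is not Type, we need to give an explicit type declaration. For example, the Functor interface is defined in the prelude: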
interface Functor (0 f : Type -> Type) where
    map : (m : a -> b) -> f a -> f b
A functor allows a function to be applied across a structure, for example to apply a function to every element in a List:
Functor List where
    map f [] = []
    map f (x::xs) = f x :: map f xs
Idris> map (*2) [1..10]
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20] : List Integer
Having defined Functor, we can define Applicative which abstracts the notion of function application:
infixl 2 <*>
interface Functor f => Applicative (0 f : Type -> Type) where
    pure : a -> f a
    (<*>) : f (a -> b) -> f a -> f b
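These interfaces are not specific to lists. As a minimal sketch, here are implementations for a hypothetical Tree type (the type and the name doubled are assumptions for illustration); the Applicative shown is one possible choice for this type, not the only one.

```idris
-- Hypothetical container: a binary tree with values at the leaves.
data Tree a = Leaf a | Node (Tree a) (Tree a)

-- map applies the function at every leaf and keeps the shape.
Functor Tree where
    map f (Leaf x)   = Leaf (f x)
    map f (Node l r) = Node (map f l) (map f r)

-- One possible Applicative: each function in the left tree is
-- replaced by the right tree mapped with that function.
Applicative Tree where
    pure = Leaf
    (Leaf f)   <*> t = map f t
    (Node l r) <*> t = Node (l <*> t) (r <*> t)

doubled : Tree Nat
doubled = map (* 2) (Node (Leaf 1) (Leaf 2))
```

Here doubled evaluates to Node (Leaf 2) (Leaf 4); the shape of the tree is unchanged.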
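Monads and do-notation
The Monad interface allows us to encapsulate binding and computation, and is the basis of do-notation introduced in the section "do" notation. It extends Applicative as defined above, and is defined as follows: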
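interface Applicative m => Monad (m : Type -> Type) where
    (>>=) : m a -> (a -> m b) -> m b
There is also a non-binding sequencing operator, defined for Monad as:
v >> e = v >>= \_ => e
Inside a do block, the following syntactic transformations are applied:
x <- v; e becomes v >>= (\x => e)
v; e becomes v >> e
let x = v; e becomes let x = v in e
IO has an implementation of Monad, defined using primitive functions. We can also define an implementation for Maybe, as follows: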
Monad Maybe where
    Nothing >>= k = Nothing
    (Just x) >>= k = k x
Using this we can, for example, define a function which adds two Maybe Int, using the monad to encapsulate the error handling:
m_add : Maybe Int -> Maybe Int -> Maybe Int
m_add x y = do x' <- x -- Extract value from x
               y' <- y -- Extract value from y
               pure (x' + y') -- Add them
This function will extract the values from x and y, if they are both available, or return Nothing if one or both are not ("fail fast"). Managing the Nothing cases is achieved by the >>= operator, hidden by the do notation.
Main> m_add (Just 82) (Just 22)
Just 104
Main> m_add (Just 82) Nothing
Nothing
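The translation of do notation is entirely syntactic, so there is no need for the (>>=) and (>>) operators to be the operators defined in the Monad interface. Idris will, in general, try to disambiguate which operators you mean by type, but you can explicitly choose with qualified do notation, for example: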
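m_add : Maybe Int -> Maybe Int -> Maybe Int
m_add x y = Prelude.do
               x' <- x -- Extract value from x
               y' <- y -- Extract value from y
               pure (x' + y') -- Add them
The Prelude.do means that Idris will use the (>>=) and (>>) operators defined in the Prelude.
Pattern Matching Bind
Sometimes we want to pattern match immediately on the result of a function in do notation. For example, let's say we have a function readNumber which reads a number from the console, returning a value of the form Just x if the number is valid, or Nothing otherwise: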
import Data.String
readNumber : IO (Maybe Nat)
readNumber = do
  input <- getLine
  if all isDigit (unpack input)
     then pure (Just (stringToNatOrZ input))
     else pure Nothing
If we then use it to write a function to read two numbers, returning Nothing if either of them is invalid, then we would like to pattern match on the result of readNumber:
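readNumbers : IO (Maybe (Nat, Nat))
readNumbers =
  do x <- readNumber
     case x of
          Nothing => pure Nothing
          Just x_ok => do y <- readNumber
                          case y of
                               Nothing => pure Nothing
                               Just y_ok => pure (Just (x_ok, y_ok))
If there is a lot of error handling, this could get deeply nested very quickly! Instead, we can combine the bind and the pattern match in one line. For example, we could try pattern matching on values of the form Just x_ok: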
readNumbers : IO (Maybe (Nat, Nat))
readNumbers
  = do Just x_ok <- readNumber
       Just y_ok <- readNumber
       pure (Just (x_ok, y_ok))
There is still a problem, however, because we have now omitted the case for Nothing, so readNumbers is no longer total! We can add the Nothing case back as follows:
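readNumbers : IO (Maybe (Nat, Nat))
readNumbers
  = do Just x_ok <- readNumber
            | Nothing => pure Nothing
       Just y_ok <- readNumber
            | Nothing => pure Nothing
       pure (Just (x_ok, y_ok))
The effect of this version of readNumbers is identical to the first (in fact, it is syntactic sugar for it and directly translated back into that form). The first part of each statement (Just x_ok <- and Just y_ok <-) gives the preferred binding: if this matches, execution will continue with the rest of the do block. The second part gives the alternative bindings, of which there may be more than one.
!-notation
In many cases, using do-notation can make programs unnecessarily verbose, particularly in cases such as m_add above where the value bound is used once, immediately. In these cases, we can use a shorthand version, as follows: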
m_add : Maybe Int -> Maybe Int -> Maybe Int
m_add x y = pure (!x + !y)
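The notation !expr means that the expression expr should be evaluated and then implicitly bound. Conceptually, we can think of ! as being a prefix function with the following type: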
(!) : m a -> a
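Note, however, that it is not really a function, merely syntax! A subexpression !expr will lift expr as high as possible within its current scope, bind it to a fresh name x, and replace !expr with x. Expressions are lifted depth first, left to right. In practice, !-notation allows us to program in a more direct style, while still giving a notational clue as to which expressions are monadic.
For example, the expression: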
let y = 94 in f !(g !(print y) !x)
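is lifted to:
let y = 94 in do y' <- print y
                 x' <- x
                 g' <- g y' x'
                 f g'
Monad comprehensions
The list comprehension notation we saw in the section More Expressions is more general, and applies to anything which has an implementation of both Monad and Alternative: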
interface Applicative f => Alternative (0 f : Type -> Type) where
    empty : f a
    (<|>) : f a -> f a -> f a
In general, a comprehension takes the form [ exp | qual1, qual2, …, qualn ] where quali can be one of:
A generator x <- e
A guard, which is an expression of type Bool
A let binding let x = e
To translate a comprehension [exp | qual1, qual2, ..., qualn], first any qualifier qual which is a guard is translated to guard qual, using the following function:
guard : Alternative f => Bool -> f ()
Then the comprehension is converted to do notation:
do { qual1; qual2; ...; qualn; pure exp; }
Using monad comprehensions, an alternative definition for m_add would be:
m_add : Maybe Int -> Maybe Int -> Maybe Int
m_add x y = [ x' + y' | x' <- x, y' <- y ]
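Interfaces and IO
In general, IO operations in the libraries aren't written using IO directly, but rather via the HasIO interface:
interface Monad io => HasIO io where
    liftIO : (1 _ : IO a) -> io a
HasIO explains, via liftIO, how to convert a primitive IO operation to an operation in some underlying type, as long as that type has a Monad implementation. These interfaces allow a programmer to define some more expressive notion of interactive program, while still giving direct access to IO primitives.
Idiom brackets
While do notation gives an alternative meaning to sequencing, idioms give an alternative meaning to application. The notation and larger example in this section are inspired by Conor McBride and Ross Paterson's paper "Applicative Programming with Effects" [1].
First, let us revisit m_add above. All it is really doing is applying an operator to two values extracted from Maybe Int. We could abstract out the application: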
m_app : Maybe (a -> b) -> Maybe a -> Maybe b
m_app (Just f) (Just a) = Just (f a)
m_app _ _ = Nothing
Using this, we can write an alternative m_add which uses this alternative notion of function application, with explicit calls to m_app:
m_add' : Maybe Int -> Maybe Int -> Maybe Int
m_add' x y = m_app (m_app (Just (+)) x) y
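Rather than having to insert m_app everywhere there is an application, we can use idiom brackets to do the job for us. To do this, we can give Maybe an implementation of Applicative as follows, where <*> is defined in the same way as m_app above (this is defined in the Idris library):
Applicative Maybe where
    pure = Just
    (Just f) <*> (Just a) = Just (f a)
    _        <*> _        = Nothing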
Using <*>
we can use this implementation as follows, where a function
application [| f a1 … an |]
is translated into pure f <*> a1 <*>
… <*> an
:
m_add' : Maybe Int -> Maybe Int -> Maybe Int
m_add' x y = [| x + y |]
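Idiom brackets are not limited to operators. As a sketch, the hypothetical function below applies the pair constructor MkPair inside Maybe, next to the hand-written form that the translation described above corresponds to:

```idris
-- With idiom brackets.
mkPair : Maybe Int -> Maybe String -> Maybe (Int, String)
mkPair x y = [| MkPair x y |]

-- Written out by hand with pure and <*>.
mkPairExplicit : Maybe Int -> Maybe String -> Maybe (Int, String)
mkPairExplicit x y = pure MkPair <*> x <*> y
```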
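An error-handling interpreter
Idiom notation is commonly useful when defining evaluators. McBride and Paterson describe such an evaluator 1, for a language similar to the following:
data Expr = Var String      -- variables
          | Val Int         -- values
          | Add Expr Expr   -- addition
Evaluation will take place relative to a context mapping variables (represented as Strings) to Int values, and can possibly fail. We define a data type Eval to wrap an evaluator:
data Eval : Type -> Type where
     MkEval : (List (String, Int) -> Maybe a) -> Eval a
Wrapping the evaluator in a data type means we will be able to provide implementations of interfaces for it later. We begin by defining a function to retrieve values from the context during evaluation: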
fetch : String -> Eval Int
fetch x = MkEval (\e => fetchVal e) where
fetchVal : List (String, Int) -> Maybe Int
fetchVal [] = Nothing
fetchVal ((v, val) :: xs) = if (x == v)
then (Just val)
else (fetchVal xs)
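When defining an evaluator for the language, we will be applying functions in the context of an Eval, so it is natural to give Eval an implementation of Applicative. Before Eval can have an implementation of Applicative, it is necessary for Eval to have an implementation of Functor: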
Functor Eval where
map f (MkEval g) = MkEval (\e => map f (g e))
Applicative Eval where
pure x = MkEval (\e => Just x)
(<*>) (MkEval f) (MkEval g) = MkEval (\x => app (f x) (g x)) where
app : Maybe (a -> b) -> Maybe a -> Maybe b
app (Just fx) (Just gx) = Just (fx gx)
app _ _ = Nothing
Evaluating an expression can now make use of idiom brackets to handle errors:
eval : Expr -> Eval Int
eval (Var x) = fetch x
eval (Val x) = [| x |]
eval (Add x y) = [| eval x + eval y |]
runEval : List (String, Int) -> Expr -> Maybe Int
runEval env e = case eval e of
MkEval envFn => envFn env
For example:
InterpE> runEval [("x", 10), ("y",84)] (Add (Var "x") (Var "y"))
Just 94
InterpE> runEval [("x", 10), ("y",84)] (Add (Var "x") (Var "z"))
Nothing
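Named Implementations
It can be desirable to have multiple implementations of an interface for the same type, for example to provide alternative methods for sorting or printing values. To achieve this, implementations can be named as follows: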
[myord] Ord Nat where
compare Z (S n) = GT
compare (S n) Z = LT
compare Z Z = EQ
compare (S x) (S y) = compare @{myord} x y
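This declares an implementation as normal, but with an explicit name, myord. The syntax compare @{myord} gives an explicit implementation to compare, otherwise it would use the default implementation for Nat. We can use this, for example, to sort a list of Nat in reverse. Given the following list: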
testList : List Nat
testList = [3,4,1]
We can sort it using the default Ord implementation, by using the sort function available with import Data.List, and then try the named implementation myord as follows, at the Idris prompt:
Main> show (sort testList)
"[1, 3, 4]"
Main> show (sort @{myord} testList)
"[4, 3, 1]"
Sometimes, we also need access to a named parent implementation. For example, the prelude defines the following Semigroup interface:
interface Semigroup ty where
(<+>) : ty -> ty -> ty
Then it defines Monoid, which extends Semigroup with a “neutral” value:
interface Semigroup ty => Monoid ty where
neutral : ty
We can define two different implementations of Semigroup and Monoid for Nat, one based on addition and one on multiplication:
[PlusNatSemi] Semigroup Nat where
(<+>) x y = x + y
[MultNatSemi] Semigroup Nat where
(<+>) x y = x * y
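The neutral value for addition is 0, but the neutral value for multiplication is 1. It is important, therefore, that when we define implementations of Monoid they extend the correct Semigroup implementation. We can do this with a using clause in the implementation as follows: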
[PlusNatMonoid] Monoid Nat using PlusNatSemi where
neutral = 0
[MultNatMonoid] Monoid Nat using MultNatSemi where
neutral = 1
The using PlusNatSemi clause indicates that PlusNatMonoid should extend PlusNatSemi specifically.
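To see why the parent implementation matters, here is a sketch that folds a list with whichever Monoid is passed in. The four implementations are repeated so the block stands alone; combineAll, sumOf and productOf are hypothetical helper names, not library functions:

```idris
[PlusNatSemi] Semigroup Nat where
  (<+>) x y = x + y

[MultNatSemi] Semigroup Nat where
  (<+>) x y = x * y

[PlusNatMonoid] Monoid Nat using PlusNatSemi where
  neutral = 0

[MultNatMonoid] Monoid Nat using MultNatSemi where
  neutral = 1

-- Hypothetical helper: combines all elements using the supplied Monoid,
-- whose parent Semigroup provides <+>.
combineAll : Monoid a => List a -> a
combineAll [] = neutral
combineAll (x :: xs) = x <+> combineAll xs

sumOf : List Nat -> Nat
sumOf = combineAll @{PlusNatMonoid}

productOf : List Nat -> Nat
productOf = combineAll @{MultNatMonoid}
```

If MultNatMonoid extended PlusNatSemi by mistake, productOf would add the elements to a neutral value of 1, which is presumably not what anyone wants.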
Interface Constructors
Interfaces, just like records, can be declared with a user-defined constructor.
interface A a where
  getA : a
interface A t => B t where
  constructor MkB
  getB : t
Then MkB : A t => t -> B t.
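Determining Parameters
When an interface has more than one parameter, it can help resolution if the parameters used to find an implementation are restricted. For example:
interface Monad m => MonadState s (0 m : Type -> Type) | m where
  get : m s
  put : s -> m ()
In this interface, only m needs to be known to find an implementation of this interface, and s can then be determined from the implementation. This is declared with the | m after the interface declaration. We call m a determining parameter of the MonadState interface, because it is the parameter used to find an implementation. This is similar to the concept of functional dependencies in Haskell (https://wiki.haskell.org/Functional_dependencies).
- 1
Conor McBride and Ross Paterson. 2008. Applicative programming with effects. J. Funct. Program. 18, 1 (January 2008), 1-13. DOI=10.1017/S0956796807006326 https://dx.doi.org/10.1017/S0956796807006326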
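Modules and Namespaces
An Idris program consists of a collection of modules. Each module includes an optional module declaration giving the name of the module, a list of import statements giving the other modules which are to be imported, and a collection of declarations and definitions of types, interfaces and functions. For example, the listing below gives a module which defines a binary tree type BTree (in a file BTree.idr):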
module BTree
public export
data BTree a = Leaf
| Node (BTree a) a (BTree a)
export
insert : Ord a => a -> BTree a -> BTree a
insert x Leaf = Node Leaf x Leaf
insert x (Node l v r) = if (x < v) then (Node (insert x l) v r)
else (Node l v (insert x r))
export
toList : BTree a -> List a
toList Leaf = []
toList (Node l v r) = BTree.toList l ++ (v :: BTree.toList r)
export
toTree : Ord a => List a -> BTree a
toTree [] = Leaf
toTree (x :: xs) = insert x (toTree xs)
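The modifiers export and public export say which names are visible from other namespaces. These are explained further below.
This gives a main program (in a file bmain.idr) that uses the BTree module to sort a list:
module Main
import BTree
main : IO ()
main = do let t = toTree [1,8,2,7,9,3]
          print (BTree.toList t)
The same names can be defined in multiple modules: names are qualified with the name of the module. The names defined in the BTree module are, in full: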
BTree.BTree
BTree.Leaf
BTree.Node
BTree.insert
BTree.toList
BTree.toTree
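If names are otherwise unambiguous, there is no need to give the fully qualified name. Names can be disambiguated either by giving an explicit qualification, using the with keyword, or according to their type.
The with keyword in expressions comes in two variants:
with BTree.insert (insert x empty) for one name
with [BTree.insert, BTree.empty] (insert x empty) for multiple names
This is particularly useful with do notation, where it can often improve error messages: with MyModule.(>>=) do ...
There is no formal link between the module name and its filename, although it is generally advisable to use the same name for each. An import statement refers to a filename, using dots to separate directories. For example, import foo.bar would import the file foo/bar.idr, which would conventionally have the module declaration module foo.bar. The only requirement for module names is that the main module, with the main function, must be called Main, although its filename need not be Main.idr.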
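Export Modifiers
Idris allows for fine-grained control over the visibility of a namespace's contents. By default, all names defined in a namespace are kept private. This aids in specifying a minimal interface and keeping internal details hidden. Idris allows functions, types, and interfaces to be marked as private, export, or public export. Their general meaning is as follows:
private meaning that it is not exported at all. This is the default.
export meaning that its top level type is exported.
public export meaning that the entire definition is exported.
A further restriction in modifying the visibility is that definitions must not refer to anything within a lower level of visibility. For example, public export definitions cannot use private or export names, and export types cannot use private names. This is to prevent private names leaking into a module's interface.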
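Meaning for Functions
export the type is exported
public export the type and definition are exported, and the definition can be used after it is imported. In other words, the definition itself is considered part of the module's interface. The long name public export is intended to make you think twice about doing this.
Note
Type synonyms in Idris are created by writing a function. When setting the visibility of a module, it is a good idea to public export all type synonyms if they are to be used outside the module. Otherwise, Idris won't know what the synonym is a synonym for.
Since public export means that a function's definition is exported, this effectively makes the function definition part of the module's API. Therefore, it is generally a good idea to avoid using public export for functions unless you really mean to export the full definition.
Note
For beginners: if the function needs to be accessed only at run time, use export. However, if it is also to be used at compile time (e.g. to prove a theorem), use public export. For example, consider the function plus : Nat -> Nat -> Nat discussed previously, and the following theorem: thm : plus Z m = m. In order to prove it, the type checker needs to reduce plus Z m to m (and hence obtain thm : m = m). To achieve this, it will need access to the definition of plus, which includes the equation plus Z m = m. Therefore, in this case, plus has to be marked as public export.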
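The difference can be sketched within a single file by using an explicit namespace, since visibility is applied per namespace. The names Defs, hiddenDouble and visibleDouble are hypothetical, and the rejected proof is left commented out because it is expected to fail:

```idris
namespace Defs
  -- Only the type is exported: callers can run it, but not reduce it in types.
  export
  hiddenDouble : Nat -> Nat
  hiddenDouble n = n + n

  -- Type and definition are exported: the type checker may unfold it.
  public export
  visibleDouble : Nat -> Nat
  visibleDouble n = n + n

-- Accepted: visibleDouble 2 reduces to 4 during type checking.
visibleOk : visibleDouble 2 = 4
visibleOk = Refl

-- Expected to be rejected outside Defs, because the definition of
-- hiddenDouble is not available for reduction there:
-- hiddenBad : hiddenDouble 2 = 4
-- hiddenBad = Refl
```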
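Meaning for Data Types
For data types, the meanings are:
export the type constructor is exported
public export the type constructor and data constructors are exported
Meaning for Interfaces
For interfaces, the meanings are:
export the interface name is exported
public export the interface name, method names and default definitions are exported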
Propagating Inner Module APIs
Additionally, a module can re-export a module it has imported, by using the public modifier on an import. For example:
module A
import B
import public C
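The module A will export the name a, as well as any public or abstract names in module C, but will not re-export anything from module B.
Renaming imports
Sometimes it is convenient to be able to access the names in another module via a different namespace (typically, a shorter one). For this, you can use import...as. For example: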
module A
import Data.List as L
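This module A has access to the exported names from module Data.List, but can also explicitly access them via the module name L. import...as can also be combined with import public to create a module which exports a larger API from other sub-modules: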
module Books
import public Books.Hardback as Books
import public Books.Comic as Books
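Here, any module which imports Books will have access to the exported interfaces of Books.Hardback and Books.Comic, both under the namespace Books.
Explicit Namespaces
Defining a module also defines a namespace implicitly. However, namespaces can also be given explicitly. This is most useful if you wish to overload names within the same module:
module Foo
namespace X
  export
  test : Int -> Int
  test x = x * 2
namespace Y
  export
  test : String -> String
  test x = x ++ x
This (admittedly contrived) module defines two functions with fully qualified names Foo.X.test and Foo.Y.test, which can be disambiguated by their types: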
*Foo> test 3
6 : Int
*Foo> test "foo"
"foofoo" : String
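The export rules, public export and export, are per namespace, not per file, so the two test definitions above need the export flag to be visible outside their own namespaces.
Parameterised blocks
Groups of functions can be parameterised over a number of arguments using a parameters declaration, for example:
parameters (x : Nat, y : Nat)
  addAll : Nat -> Nat
  addAll z = x + y + z
The effect of a parameters block is to add the declared parameters to every function, type and data constructor within the block. Specifically, the parameters are added at the front of the argument list. Outside the block, the parameters must be given explicitly. The addAll function, when called from the REPL, will thus have the following type signature: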
*params> :t addAll
addAll : Nat -> Nat -> Nat -> Nat
and the following definition:
addAll : (x : Nat) -> (y : Nat) -> (z : Nat) -> Nat
addAll x y z = x + y + z
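Parameters blocks can be nested, and can also include data declarations, in which case the parameters are added explicitly to all type and data constructors. They may also be dependent types with implicit arguments:
parameters (y : Nat, xs : Vect x a)
  data Vects : Type -> Type where
    MkVects : Vect y a -> Vects a
  append : Vects a -> Vect (x + y) a
  append (MkVects ys) = xs ++ ys
To use Vects or append outside the block, we must also give the xs and y arguments. Here, we can use placeholders for the values which can be inferred by the type checker: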
Main> show (append _ _ (MkVects _ [1,2,3] [4,5,6]))
"[1, 2, 3, 4, 5, 6]"
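Multiplicities
Idris 2 is based on Quantitative Type Theory (QTT), a core language developed by Bob Atkey and Conor McBride. In practice, this means that every variable in Idris 2 has a quantity associated with it. A quantity is one of:
0, meaning that the variable is erased at run time
1, meaning that the variable is used exactly once at run time
Unrestricted, which is the same behaviour as Idris 1
We can see the multiplicities of variables by inspecting holes. For example, if we have the following skeleton definition of append on vectors…
append : Vect n a -> Vect m a -> Vect (n + m) a
append xs ys = ?append_rhs
…we can look at the hole append_rhs: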
Main> :t append_rhs
0 m : Nat
0 a : Type
0 n : Nat
ys : Vect m a
xs : Vect n a
-------------------------------------
append_rhs : Vect (plus n m) a
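The 0 next to m, a and n means that they are in scope, but there will be 0 occurrences at run time. That is, it is guaranteed that they will be erased at run time.
Multiplicities can be explicitly written in function types as follows:
ignoreN : (0 n : Nat) -> Vect n a -> Nat - this function cannot look at n at run time
duplicate : (1 x : a) -> (a, a) - this function must use x exactly once (so good luck implementing it, by the way. There is no implementation, because it would need to use x twice!)
If there is no multiplicity annotation, the argument is unrestricted. If, on the other hand, a name is implicitly bound (like a in both examples above) the argument is erased. So, the above types could also be written as:
ignoreN : {0 a : _} -> (0 n : Nat) -> Vect n a -> Nat
duplicate : {0 a : _} -> (1 x : a) -> (a, a)
This section describes what this means for your Idris 2 programs in practice, with several examples. In particular, it describes linearity, erasure, and pattern matching on types.
The most important of these for most programs, and the thing you will need to know about if converting Idris 1 programs to work with Idris 2, is erasure. The most interesting, however, and the thing which gives Idris 2 much more expressivity, is linearity, so we will start there.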
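Linearity
The 1 multiplicity expresses that a variable must be used exactly once. By “used” we mean either:
if the variable is a data type or primitive value, it is pattern matched against, e.g. by being the subject of a case statement, or a function argument that is pattern matched against, etc.,
if the variable is a function, that function is applied (i.e. run with an argument)
First, we will see how this works on some small examples of functions and data types, then see how it can be used to encode resource protocols.
Above, we saw the type of duplicate. Let us try to write it interactively, and see what goes wrong. We start by giving the type and a skeleton definition with a hole: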
duplicate : (1 x : a) -> (a, a)
duplicate x = ?help
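Checking the type of a hole tells us the multiplicity of each variable in scope. If we check the type of ?help we will see that we cannot use a at run time, and we have to use x exactly once: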
Main> :t help
0 a : Type
1 x : a
-------------------------------------
help : (a, a)
If we use x for one part of the pair…
duplicate : (1 x : a) -> (a, a)
duplicate x = (x, ?help)
…then the type of the remaining hole tells us we cannot use it for the other:
Main> :t help
0 a : Type
0 x : a
-------------------------------------
help : a
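The same happens if we try defining duplicate x = (?help, x) (try it!).
In order to avoid parsing ambiguities, if you give an explicit multiplicity for a variable, as with the argument to duplicate, you need to give it a name too. But, if the name is not used in the scope of the type, you can use _ instead of a name, as follows: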
duplicate : (1 _ : a) -> (a, a)
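The intuition behind multiplicity 1 is that if we have a function with a type of the following form…
f : (1 x : a) -> b
…then the guarantee given by the type system is that if f x is used exactly once, then x is used exactly once. So, if we insist on trying to define duplicate…:
duplicate x = (x, x)
…then Idris will complain: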
pmtype.idr:2:15--8:1:While processing right hand side of Main.duplicate at pmtype.idr:2:1--8:1:
There are 2 uses of linear name x
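A similar intuition applies for data types. Consider the following types, Lin, which wraps an argument that must be used once, and Unr, which wraps an argument with unrestricted use: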
data Lin : Type -> Type where
MkLin : (1 _ : a) -> Lin a
data Unr : Type -> Type where
MkUnr : a -> Unr a
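If MkLin x is used once, then x is used once. But if MkUnr x is used once, there is no guarantee on how often x is used. We can see this a bit more clearly by starting to write projection functions for Lin and Unr to extract the argument: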
getLin : (1 _ : Lin a) -> a
getLin (MkLin x) = ?howmanyLin
getUnr : (1 _ : Unr a) -> a
getUnr (MkUnr x) = ?howmanyUnr
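Checking the types of the holes shows us that, for getLin, we must use x exactly once (because the Lin a argument is used once, by pattern matching on it as MkLin x, and if MkLin x is used once, x must be used once):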
Main> :t howmanyLin
0 a : Type
1 x : a
-------------------------------------
howmanyLin : a
For getUnr, however, we still have to use the argument once, again by pattern matching on it, but using MkUnr x once places no restriction on x. So, x has unrestricted use in the body of getUnr:
Main> :t howmanyUnr
0 a : Type
x : a
-------------------------------------
howmanyUnr : a
If getLin has an unrestricted argument…
getLin : Lin a -> a
getLin (MkLin x) = ?howmanyLin
…then x is unrestricted in howmanyLin:
Main> :t howmanyLin
0 a : Type
x : a
-------------------------------------
howmanyLin : a
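Remember that the intuition from the type of MkLin is that if MkLin x is used exactly once, x is used exactly once. But we did not say that MkLin x would be used exactly once, so there is no restriction on x.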
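Filling in the holes gives a compact sketch of the difference. The definitions of Lin and Unr are repeated so the block stands alone; dupUnr is a hypothetical extra function added here to show that the unrestricted field really can be used twice:

```idris
data Lin : Type -> Type where
  MkLin : (1 _ : a) -> Lin a

data Unr : Type -> Type where
  MkUnr : a -> Unr a

-- x is linear here, and is used exactly once.
getLin : (1 _ : Lin a) -> a
getLin (MkLin x) = x

-- x is unrestricted here; using it once is fine.
getUnr : (1 _ : Unr a) -> a
getUnr (MkUnr x) = x

-- The wrapper is consumed once by the pattern match, while the
-- unrestricted x inside it may be used twice.
dupUnr : (1 _ : Unr a) -> (a, a)
dupUnr (MkUnr x) = (x, x)
```

Writing the analogous dupLin over Lin should fail with the same "2 uses of linear name" error that duplicate produced.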
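Resource protocols
One way to take advantage of being able to express linear usage of an argument is in defining resource usage protocols. We can use linearity to ensure that any unique external resource has only one instance, and we can use functions which are linear in their arguments to represent state transitions on that resource. A door, for example, can be in one of two states, Open or Closed:
data DoorState = Open | Closed
data Door : DoorState -> Type where
  MkDoor : (doorId : Int) -> Door st
(Okay, we are just pretending here: imagine the doorId is a reference to an external resource!)
We can define functions for opening and closing the door which explicitly describe how they change the state of a door, and that they are linear in the door: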
openDoor : (1 d : Door Closed) -> Door Open
closeDoor : (1 d : Door Open) -> Door Closed
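Remember, the intuition is that if openDoor d is used exactly once, then d is used exactly once. So, provided that a door d has multiplicity 1 when it is created, we know that once we call openDoor on it, we will not be able to use d again. Given that d is an external resource, and openDoor has changed its state, this is a good thing!
We can ensure that any door we create has multiplicity 1 by creating it with a newDoor function of the following type: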
newDoor : (1 p : (1 d : Door Closed) -> IO ()) -> IO ()
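That is, newDoor takes a function, which it runs exactly once. That function takes a door, which is used exactly once. We run it in IO to suggest that there is some interaction with the outside world going on when we create the door. Since the multiplicity 1 means the door has to be used exactly once, we need to be able to delete the door when we are finished: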
deleteDoor : (1 d : Door Closed) -> IO ()
So an example of correct door protocol usage would be:
doorProg : IO ()
doorProg
= newDoor $ \d =>
let d' = openDoor d
d'' = closeDoor d' in
deleteDoor d''
It is instructive to build this program interactively, with holes along the way, and see how the multiplicities of d, d' etc. change. For example:
doorProg : IO ()
doorProg
= newDoor $ \d =>
let d' = openDoor d in
?whatnow
Checking the type of ?whatnow shows that d is now spent, but we still have to use d' exactly once:
Main> :t whatnow
0 d : Door Closed
1 d' : Door Open
-------------------------------------
whatnow : IO ()
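Note that the 0 multiplicity for d means that we can still talk about it (in particular, we can still reason about it in types) but we cannot use it again in a relevant position in the rest of the program. It is also fine to shadow the name d throughout: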
doorProg : IO ()
doorProg
= newDoor $ \d =>
let d = openDoor d
d = closeDoor d in
deleteDoor d
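If we do not follow the protocol correctly (create the door, open it, close it, then delete it) then the program will not type check. For example, we can try not to delete the door before finishing: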
doorProg : IO ()
doorProg
= newDoor $ \d =>
let d' = openDoor d
d'' = closeDoor d' in
putStrLn "What could possibly go wrong?"
This gives the following error:
Door.idr:15:19--15:38:While processing right hand side of Main.doorProg at Door.idr:13:1--17:1:
There are 0 uses of linear name d''
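For experimenting with the protocol, the declarations need bodies. The sketch below supplies stand-in implementations; these bodies (and the fixed door id 0) are assumptions made purely for illustration, since a real version would talk to an external resource:

```idris
data DoorState = Open | Closed

data Door : DoorState -> Type where
  MkDoor : (doorId : Int) -> Door st

-- Stand-in state transitions: each consumes its door by pattern matching.
openDoor : (1 d : Door Closed) -> Door Open
openDoor (MkDoor i) = MkDoor i

closeDoor : (1 d : Door Open) -> Door Closed
closeDoor (MkDoor i) = MkDoor i

-- Stand-in constructor: runs the continuation once with a fresh closed door.
newDoor : (1 p : (1 d : Door Closed) -> IO ()) -> IO ()
newDoor p = p (MkDoor 0)

deleteDoor : (1 d : Door Closed) -> IO ()
deleteDoor (MkDoor _) = putStrLn "door deleted"

-- Follows the protocol: create, open, close, delete.
doorProg : IO ()
doorProg
  = newDoor (\d =>
      let d'  = openDoor d
          d'' = closeDoor d' in
      deleteDoor d'')

main : IO ()
main = doorProg
```

Replacing deleteDoor d'' with a call that ignores d'' should reproduce the "0 uses of linear name" error shown above.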
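There is a lot more to be said about the details here! But this shows at a high level how we can use linearity to capture resource usage protocols at the type level. If we have an external resource which is guaranteed to be used linearly, like Door, we do not need to run operations on that resource in an IO monad, since we are already enforcing an ordering on operations and do not have access to any out of date resource states. This is similar to the way interactive programs work in the Clean programming language, and in fact it is how IO is implemented internally in Idris 2, with a special %World type for representing the state of the outside world that is always used linearly: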
public export
data IORes : Type -> Type where
MkIORes : (result : a) -> (1 x : %World) -> IORes a
export
data IO : Type -> Type where
MkIO : (1 fn : (1 x : %World) -> IORes a) -> IO a
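Having multiplicities in the type system raises several interesting questions, such as:
Can we use linearity information to inform memory management and, for example, have type level guarantees about functions which will not need to perform garbage collection?
How should multiplicities be incorporated into interfaces such as Functor, Applicative and Monad?
If we have 0 and 1 as multiplicities, why stop there? Why not have 2, 3 and more (as in Granule)?
What about multiplicity polymorphism, as in the Linear Haskell proposal?
Even without all of that, what can we do now?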
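Erasure
The 1 multiplicity gives us many possibilities in the kinds of properties we can express. But the 0 multiplicity is perhaps more important, in that it allows us to be precise about which values are relevant at run time, and which are compile time only (that is, which are erased). Using the 0 multiplicity means a function's type now tells us exactly what it needs at run time.
For example, in Idris 1 you could get the length of a vector as follows: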
vlen : Vect n a -> Nat
vlen {n} xs = n
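This is fine, since it runs in constant time, but the trade off is that n has to be available at run time, so at run time we always need the length of the vector to be available if we ever call vlen. Idris 1 can infer whether the length is needed, but there is no easy way for a programmer to be sure.
In Idris 2, we need to state explicitly that n is needed at run time:
vlen : {n : Nat} -> Vect n a -> Nat
vlen xs = n
(Incidentally, also note that in Idris 2, names bound in types are also available in the definition without explicitly rebinding them.)
This also means that when you call vlen, you need the length available. For example, this will give an error:
sumLengths : Vect m a -> Vect n a -> Nat
sumLengths xs ys = vlen xs + vlen ys
Idris 2 reports: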
vlen.idr:7:20--7:28:While processing right hand side of Main.sumLengths at vlen.idr:7:1--10:1:
m is not accessible in this context
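This means that it needs to use m as an argument to pass to vlen xs, where it needs to be available at run time, but m is not available in sumLengths because it has multiplicity 0.
We can see this more clearly by replacing the right hand side of sumLengths with a hole…
sumLengths : Vect m a -> Vect n a -> Nat
sumLengths xs ys = ?sumLengths_rhs
…then checking the hole's type at the REPL: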
Main> :t sumLengths_rhs
0 n : Nat
0 a : Type
0 m : Nat
ys : Vect n a
xs : Vect m a
-------------------------------------
sumLengths_rhs : Nat
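Instead, we need to give bindings for m and n with unrestricted multiplicity:
sumLengths : {m, n : _} -> Vect m a -> Vect n a -> Nat
sumLengths xs ys = vlen xs + vlen ys
Remember that giving no multiplicity on a binder, as with m and n here, means the variable has unrestricted usage.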
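Put together as one file, the working version looks like the sketch below. The function vlenWalk is a hypothetical alternative that is not part of the original text: it recomputes the length by traversing the vector, trading constant time for the ability to leave n erased:

```idris
import Data.Vect

-- n must be available at run time, so it is bound without a 0.
vlen : {n : Nat} -> Vect n a -> Nat
vlen xs = n

-- Both lengths are needed at run time, so both are bound unrestricted.
sumLengths : {m, n : _} -> Vect m a -> Vect n a -> Nat
sumLengths xs ys = vlen xs + vlen ys

-- Hypothetical alternative: n stays erased, at the cost of a linear-time walk.
vlenWalk : Vect n a -> Nat
vlenWalk [] = 0
vlenWalk (_ :: xs) = 1 + vlenWalk xs
```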
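If you are converting Idris 1 programs to work with Idris 2, this is probably the biggest thing you need to think about. It is important to note, though, that if you have bound implicits, such as…
excitingFn : {t : _} -> Coffee t -> Moonbase t
…then it is a good idea to make sure t really is needed, or performance might suffer due to the run time building the instance of t unnecessarily!
One final note on erasure: it is an error to try to pattern match on an argument with multiplicity 0, unless its value is inferrable from elsewhere. So, the following definition is rejected: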
badNot : (0 x : Bool) -> Bool
badNot False = True
badNot True = False
This is rejected with the error:
badnot.idr:2:1--3:1:Attempt to match on erased argument False in
Main.badNot
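The following, however, is fine, because in sNot, even though we appear to match on the erased argument x, its value is uniquely inferrable from the type of the second argument: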
data SBool : Bool -> Type where
SFalse : SBool False
STrue : SBool True
sNot : (0 x : Bool) -> SBool x -> Bool
sNot False SFalse = True
sNot True STrue = False
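Experience with Idris 2 so far suggests that, most of the time, as long as you are using unbound implicits in your Idris 1 programs, they will work without much modification in Idris 2. The Idris 2 type checker will point out where you require an unbound implicit argument at run time; sometimes this is both surprising and enlightening!
Pattern Matching on Types
One way to think about dependent types is to think of them as “first class” objects in the language, in that they can be assigned to variables, passed around and returned from functions, just like any other construct. But if they are truly first class, we should be able to pattern match on them too! Idris 2 allows us to do this. For example: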
showType : Type -> String
showType Int = "Int"
showType (List a) = "List of " ++ showType a
showType _ = "something else"
We can try this as follows:
Main> showType Int
"Int"
Main> showType (List Int)
"List of Int"
Main> showType (List Bool)
"List of something else"
Pattern matching on function types is interesting, because the return type may depend on the input value. For example, let us add a case to showType:
showType (Nat -> a) = ?help
Inspecting the type of help tells us:
Main> :t help
a : Nat -> Type
-------------------------------------
help : String
So, the return type a depends on the input value of type Nat, and we will need to come up with a value to use a, for example:
showType (Nat -> a) = "Function from Nat to " ++ showType (a Z)
Note that multiplicities on the binders, and the ability to pattern match on non-erased types, mean that the following two types are distinct:
id : a -> a
notId : {a : Type} -> a -> a
In the case of notId, we can match on a and get a function which is certainly not the identity function:
notId {a = Int} x = x + 1
notId x = x
Main> notId 93
94
Main> notId "???"
"???"
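There is an important consequence of being able to distinguish relevant and irrelevant type arguments, which is that a function is only parametric in a if a has multiplicity 0. So, in the case of notId, a is not a parameter, and we cannot draw any conclusions about the way the function will behave merely because it is polymorphic, since the type tells us it might pattern match on a.
On the other hand, it is merely a coincidence that, in non-dependently typed languages, types are irrelevant and get erased, and values are relevant and remain at run time. Idris 2, being based on QTT, allows us to make the distinction between relevant and irrelevant arguments precise. Types can be relevant, and values (such as the n index to vectors) can be irrelevant.
For more details on multiplicities, see Idris 2: Quantitative Type Theory in Action.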
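Packages
Idris includes a simple build system for building packages and executables from a named package description file. These files can be used with the Idris compiler to manage the development process.
Package Descriptions
A package description includes the following:
A header, consisting of the keyword package followed by a package name. Package names can be any valid Idris identifier. The iPKG format also takes a quoted version that accepts any valid filename.
Fields describing package contents, <field> = <value>.
At least one field must be the modules field, where the value is a comma separated list of modules. For example, given an idris package maths that has modules Maths.idr, Maths.NumOps.idr, Maths.BinOps.idr, and Maths.HexOps.idr, the corresponding package file would be:
package maths
modules = Maths
        , Maths.NumOps
        , Maths.BinOps
        , Maths.HexOps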
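Running idris2 --init will interactively create a new package file in the current directory. The generated package file lists all configurable fields with a brief description.
Other examples of package files can be found in the libs directory of the main Idris repository, and in third-party libraries (https://github.com/idris-lang/Idris-dev/wiki/Libraries).
Using Package files
Idris itself is aware of packages, and special commands are available to help with, for example, building packages, installing packages, and cleaning packages. For instance, given the maths package from earlier we can use Idris as follows:
idris2 --build maths.ipkg will build all modules in the package
idris2 --install maths.ipkg will install the package, making it accessible by other Idris libraries and programs.
idris2 --clean maths.ipkg will delete all intermediate code and executable files generated when building.
Once the maths package has been installed, the command line option --package maths makes it accessible (abbreviated to -p maths). For example: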
idris2 -p maths Main.idr
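Using package dependencies with Atom
If you are using the Atom editor and have a dependency on another package, corresponding to for instance import Lightyear or import Pruviloj, you need to let Atom know that it should be loaded. The easiest way to accomplish that is with a .ipkg file. The general contents of an ipkg file will be described in the next section of the tutorial, but for now here is a simple recipe for this trivial case:
Create a folder myProject.
Add a file myProject.ipkg containing just a couple of lines: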
package myProject
depends = pruviloj, lightyear
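In Atom, use the File menu to open the folder myProject.
Example: The Well-Typed Interpreter
In this section, we will use the features we have seen so far to write a larger example, an interpreter for a simple functional programming language, with variables, function application, binary operators and an if...then...else construct. We will use the dependent type system to ensure that any programs which can be represented are well-typed.
Representing Languages
First, let us define the types in the language. We have integers, booleans, and functions, represented by Ty: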
data Ty = TyInt | TyBool | TyFun Ty Ty
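We can write a function to translate these representations to a concrete Idris type. Remember that types are first class, so they can be calculated just like any other value: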
interpTy : Ty -> Type
interpTy TyInt = Integer
interpTy TyBool = Bool
interpTy (TyFun a t) = interpTy a -> interpTy t
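We are going to define a representation of our language in such a way that only well-typed programs can be represented. We will index the representations of expressions by their type, and by the types of local variables (the context). The context can be represented using the Vect data type, so we will need to import Data.Vect at the top of our source file: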
import Data.Vect
Expressions are indexed by the types of the local variables, and the type of the expression itself:
data Expr : Vect n Ty -> Ty -> Type
The full representation of expressions is:
data HasType : (i : Fin n) -> Vect n Ty -> Ty -> Type where
Stop : HasType FZ (t :: ctxt) t
Pop : HasType k ctxt t -> HasType (FS k) (u :: ctxt) t
data Expr : Vect n Ty -> Ty -> Type where
Var : HasType i ctxt t -> Expr ctxt t
Val : (x : Integer) -> Expr ctxt TyInt
Lam : Expr (a :: ctxt) t -> Expr ctxt (TyFun a t)
App : Expr ctxt (TyFun a t) -> Expr ctxt a -> Expr ctxt t
Op : (interpTy a -> interpTy b -> interpTy c) ->
Expr ctxt a -> Expr ctxt b -> Expr ctxt c
If : Expr ctxt TyBool ->
Lazy (Expr ctxt a) ->
Lazy (Expr ctxt a) -> Expr ctxt a
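The code above makes use of the Vect and Fin types from the base libraries. Fin is available as part of Data.Vect. Throughout, ctxt refers to the local variable context.
Since expressions are indexed by their type, we can read the typing rules of the language from the definitions of the constructors. Let us look at each constructor in turn.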
We use a nameless representation for variables — they are de Bruijn
indexed. Variables are represented by a proof of their membership in
the context, HasType i ctxt T
, which is a proof that variable i
in context ctxt
has type T
. This is defined as follows:
data HasType : (i : Fin n) -> Vect n Ty -> Ty -> Type where
Stop : HasType FZ (t :: ctxt) t
Pop : HasType k ctxt t -> HasType (FS k) (u :: ctxt) t
We can treat Stop as a proof that the most recently defined variable
is well-typed, and Pop n as a proof that, if the n
th most
recently defined variable is well-typed, so is the n+1
th. In
practice, this means we use Stop
to refer to the most recently
defined variable, Pop Stop
to refer to the next, and so on, via
the Var
constructor:
Var : HasType i ctxt t -> Expr ctxt t
So, in an expression \x. \y. x y
, the variable x
would have a
de Bruijn index of 1, represented as Pop Stop
, and y 0
,
represented as Stop
. We find these by counting the number of
lambdas between the definition and the use.
A value carries a concrete representation of an integer:
Val : (x : Integer) -> Expr ctxt TyInt
A lambda creates a function. In the scope of a function of type a ->
t
, there is a new local variable of type a
, which is expressed
by the context index:
Lam : Expr (a :: ctxt) t -> Expr ctxt (TyFun a t)
Function application produces a value of type t
given a function
from a
to t
and a value of type a
:
App : Expr ctxt (TyFun a t) -> Expr ctxt a -> Expr ctxt t
We allow arbitrary binary operators, where the type of the operator informs what the types of the arguments must be:
Op : (interpTy a -> interpTy b -> interpTy c) ->
Expr ctxt a -> Expr ctxt b -> Expr ctxt c
Finally, If
expressions make a choice given a boolean. Each branch
must have the same type, and we will evaluate the branches lazily so
that only the branch which is taken need be evaluated:
If : Expr ctxt TyBool ->
Lazy (Expr ctxt a) ->
Lazy (Expr ctxt a) ->
Expr ctxt a
Writing the Interpreter
When we evaluate an Expr
, we’ll need to know the values in scope,
as well as their types. Env
is an environment, indexed over the
types in scope. Since an environment is just another form of list,
albeit with a strongly specified connection to the vector of local
variable types, we use the usual ::
and Nil
constructors so
that we can use the usual list syntax. Given a proof that a variable
is defined in the context, we can then produce a value from the
environment:
data Env : Vect n Ty -> Type where
Nil : Env Nil
(::) : interpTy a -> Env ctxt -> Env (a :: ctxt)
lookup : HasType i ctxt t -> Env ctxt -> interpTy t
lookup Stop (x :: xs) = x
lookup (Pop k) (x :: xs) = lookup k xs
Given this, an interpreter is a function which
translates an Expr
into a concrete Idris value with respect to a
specific environment:
interp : Env ctxt -> Expr ctxt t -> interpTy t
The complete interpreter is defined as follows, for reference. For each constructor, we translate it into the corresponding Idris value:
interp env (Var i) = lookup i env
interp env (Val x) = x
interp env (Lam sc) = \x => interp (x :: env) sc
interp env (App f s) = interp env f (interp env s)
interp env (Op op x y) = op (interp env x) (interp env y)
interp env (If x t e) = if interp env x then interp env t
else interp env e
Let us look at each case in turn. To translate a variable, we simply look it up in the environment:
interp env (Var i) = lookup i env
To translate a value, we just return the concrete representation of the value:
interp env (Val x) = x
Lambdas are more interesting. In this case, we construct a function which interprets the scope of the lambda with a new value in the environment. So, a function in the object language is translated to an Idris function:
interp env (Lam sc) = \x => interp (x :: env) sc
For an application, we interpret the function and its argument and apply
it directly. We know that interpreting f
must produce a function,
because of its type:
interp env (App f s) = interp env f (interp env s)
Operators and conditionals are, again, direct translations into the
equivalent Idris constructs. For operators, we apply the function to
its operands directly, and for If
, we apply the Idris
if...then...else
construct directly.
interp env (Op op x y) = op (interp env x) (interp env y)
interp env (If x t e) = if interp env x then interp env t
else interp env e
Testing
We can make some simple test functions. Firstly, adding two inputs
\x. \y. y + x
is written as follows:
add : Expr ctxt (TyFun TyInt (TyFun TyInt TyInt))
add = Lam (Lam (Op (+) (Var Stop) (Var (Pop Stop))))
More interestingly, a factorial function fact
(e.g. \x. if (x == 0) then 1 else (fact (x-1) * x)
),
can be written as:
fact : Expr ctxt (TyFun TyInt TyInt)
fact = Lam (If (Op (==) (Var Stop) (Val 0))
(Val 1)
(Op (*) (App fact (Op (-) (Var Stop) (Val 1)))
(Var Stop)))
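To try add without going through main, the pieces above can be collected into one file and evaluated directly. The following is a trimmed sketch that omits the If constructor and fact to stay short; the name seven is hypothetical:

```idris
import Data.Vect

data Ty = TyInt | TyBool | TyFun Ty Ty

interpTy : Ty -> Type
interpTy TyInt       = Integer
interpTy TyBool      = Bool
interpTy (TyFun a t) = interpTy a -> interpTy t

data HasType : (i : Fin n) -> Vect n Ty -> Ty -> Type where
  Stop : HasType FZ (t :: ctxt) t
  Pop  : HasType k ctxt t -> HasType (FS k) (u :: ctxt) t

-- Same constructors as in the text, minus If.
data Expr : Vect n Ty -> Ty -> Type where
  Var : HasType i ctxt t -> Expr ctxt t
  Val : (x : Integer) -> Expr ctxt TyInt
  Lam : Expr (a :: ctxt) t -> Expr ctxt (TyFun a t)
  App : Expr ctxt (TyFun a t) -> Expr ctxt a -> Expr ctxt t
  Op  : (interpTy a -> interpTy b -> interpTy c) ->
        Expr ctxt a -> Expr ctxt b -> Expr ctxt c

data Env : Vect n Ty -> Type where
  Nil  : Env Nil
  (::) : interpTy a -> Env ctxt -> Env (a :: ctxt)

lookup : HasType i ctxt t -> Env ctxt -> interpTy t
lookup Stop    (x :: xs) = x
lookup (Pop k) (x :: xs) = lookup k xs

interp : Env ctxt -> Expr ctxt t -> interpTy t
interp env (Var i)     = lookup i env
interp env (Val x)     = x
interp env (Lam sc)    = \x => interp (x :: env) sc
interp env (App f s)   = interp env f (interp env s)
interp env (Op op x y) = op (interp env x) (interp env y)

add : Expr ctxt (TyFun TyInt (TyFun TyInt TyInt))
add = Lam (Lam (Op (+) (Var Stop) (Var (Pop Stop))))

-- Interpreting add in the empty environment gives an ordinary Idris
-- function, which is then applied to 3 and 4.
seven : Integer
seven = interp [] add 3 4
```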
Running
To finish, we write a main
program which interprets the factorial
function on user input:
main : IO ()
main = do putStr "Enter a number: "
x <- getLine
printLn (interp [] fact (cast x))
Here, cast
is an overloaded function which converts a value from
one type to another if possible. Here, it converts a string to an
integer, giving 0 if the input is invalid. An example run of this
program at the Idris interactive environment is:
$ idris2 interp.idr
____ __ _ ___
/ _/___/ /____(_)____ |__ \
/ // __ / ___/ / ___/ __/ / Version 0.5.1
_/ // /_/ / / / (__ ) / __/ https://www.idris-lang.org
/___/\__,_/_/ /_/____/ /____/ Type :? for help
Welcome to Idris 2. Enjoy yourself!
Main> :exec main
Enter a number: 6
720
Aside: cast
The prelude defines an interface Cast
which allows conversion
between types:
interface Cast from to where
cast : from -> to
It is a multi-parameter interface, defining the source type and object type of the cast. It must be possible for the type checker to infer both parameters at the point where the cast is applied. There are casts defined between all of the primitive types, as far as they make sense.
Views and the “with
” rule
Warning
NOT UPDATED FOR IDRIS 2 YET
Dependent pattern matching
Since types can depend on values, the form of some arguments can be
determined by the value of others. For example, if we were to write
down the implicit length arguments to (++)
, we’d see that the form
of the length argument was determined by whether the vector was empty
or not:
(++) : Vect n a -> Vect m a -> Vect (n + m) a
(++) {n=Z} [] ys = ys
(++) {n=S k} (x :: xs) ys = x :: xs ++ ys
If n
was a successor in the []
case, or zero in the ::
case, the definition would not be well typed.
The with
rule — matching intermediate values
Very often, we need to match on the result of an intermediate
computation. Idris provides a construct for this, the with
rule, inspired by views in Epigram
1, which takes account of
the fact that matching on a value in a dependently typed language can
affect what we know about the forms of other values. In its simplest
form, the with
rule adds another argument to the function being
defined.
We have already seen a vector filter function. This time, we define it
using with
as follows:
filter : (a -> Bool) -> Vect n a -> (p ** Vect p a)
filter p [] = ( _ ** [] )
filter p (x :: xs) with (filter p xs)
filter p (x :: xs) | ( _ ** xs' ) = if (p x) then ( _ ** x :: xs' ) else ( _ ** xs' )
Here, the with
clause allows us to deconstruct the result of
filter p xs
. The view refined argument pattern filter p (x ::
xs)
goes beneath the with
clause, followed by a vertical bar
|
, followed by the deconstructed intermediate result ( _ ** xs'
)
. If the view refined argument pattern is unchanged from the
original function argument pattern, then the left side of |
is
extraneous and may be omitted with an underscore _
:
filter p (x :: xs) with (filter p xs)
_ | ( _ ** xs' ) = if (p x) then ( _ ** x :: xs' ) else ( _ ** xs' )
with
clauses can also be nested:
foo : Int -> Int -> Bool
foo n m with (n + 1)
foo _ m | 2 with (m + 1)
foo _ _ | 2 | 3 = True
foo _ _ | 2 | _ = False
foo _ _ | _ = False
and left hand sides that are the same as their parent’s can be skipped by
using _
to focus on the patterns for the most local with
. Meaning
that the above foo
can be rewritten as follows:
foo : Int -> Int -> Bool
foo n m with (n + 1)
_ | 2 with (m + 1)
_ | 3 = True
_ | _ = False
_ | _ = False
If the intermediate computation itself has a dependent type, then the
result can affect the forms of other arguments — we can learn the form
of one value by testing another. In these cases, view refined argument
patterns must be explicit. For example, a Nat
is either even or
odd. If it is even it will be the sum of two equal Nat
.
Otherwise, it is the sum of two equal Nat
plus one:
data Parity : Nat -> Type where
Even : {n : _} -> Parity (n + n)
Odd : {n : _} -> Parity (S (n + n))
We say Parity
is a view of Nat
. It has a covering function
which tests whether it is even or odd and constructs the predicate
accordingly. Note that we’re going to need access to n
at run time, so
although it’s an implicit argument, it has unrestricted multiplicity.
parity : (n:Nat) -> Parity n
We’ll come back to the definition of parity
shortly. We can use it
to write a function which converts a natural number to a list of
binary digits (least significant first) as follows, using the with
rule:
natToBin : Nat -> List Bool
natToBin Z = Nil
natToBin k with (parity k)
natToBin (j + j) | Even = False :: natToBin j
natToBin (S (j + j)) | Odd = True :: natToBin j
The value of parity k
affects the form of k
, because the
result of parity k
depends on k
. So, as well as the patterns
for the result of the intermediate computation (Even
and Odd
)
right of the |
, we also write how the results affect the other
patterns left of the |
. That is:
When
parity k
evaluates toEven
, we can refine the original argumentk
to a refined pattern(j + j)
according toParity (n + n)
from theEven
constructor definition. So(j + j)
replacesk
on the left side of|
, and theEven
constructor appears on the right side. The natural numberj
in the refined pattern can be used on the right side of the=
sign.Otherwise, when
parity k
evaluates toOdd
, the original argumentk
is refined toS (j + j)
according toParity (S (n + n))
from theOdd
constructor definition, andOdd
now appears on the right side of|
, again with the natural numberj
used on the right side of the=
sign.
Note that there is a function in the patterns (+
) and repeated
occurrences of j
- this is allowed because another argument has
determined the form of these patterns.
Defining parity
The definition of parity
is a little tricky, and requires some knowledge of
theorem proving (see Section Theorem Proving), but for completeness, here
it is:
parity : (n : Nat) -> Parity n
parity Z = Even {n = Z}
parity (S Z) = Odd {n = Z}
parity (S (S k)) with (parity k)
parity (S (S (j + j))) | Even
= rewrite plusSuccRightSucc j j in Even {n = S j}
parity (S (S (S (j + j)))) | Odd
= rewrite plusSuccRightSucc j j in Odd {n = S j}
For full details on rewrite
in particular, please refer to the theorem
proving tutorial, in Section Theorem Proving.
- 1
Conor McBride and James McKinna. 2004. The view from the left. J. Funct. Program. 14, 1 (January 2004), 69-111. https://doi.org/10.1017/S0956796803004829
Theorem Proving
Equality
Idris allows propositional equalities to be declared, allowing theorems about programs to be stated and proved. An equality type is defined as follows in the Prelude:
data Equal : a -> b -> Type where
Refl : Equal x x
As a notational convenience, Equal x y
can be written as x = y
.
Equalities can be proposed between any values of any types, but the only
way to construct a proof of equality is if values actually are equal.
For example:
fiveIsFive : 5 = 5
fiveIsFive = Refl
twoPlusTwo : 2 + 2 = 4
twoPlusTwo = Refl
If we try…
twoPlusTwoBad : 2 + 2 = 5
twoPlusTwoBad = Refl
…then we’ll get an error:
Proofs.idr:8:17--10:1:While processing right hand side of Main.twoPlusTwoBad at Proofs.idr:8:1--10:1:
When unifying 4 = 4 and (fromInteger 2 + fromInteger 2) = (fromInteger 5)
Mismatch between:
4
and
5
The Empty Type
There is an empty type, Void
, which has no constructors. It is therefore
impossible to construct a canonical element of the empty type. We can therefore
use the empty type to prove that something is impossible, for example zero is
never equal to a successor:
disjoint : (n : Nat) -> Z = S n -> Void
disjoint n prf = replace {p = disjointTy} prf ()
where
disjointTy : Nat -> Type
disjointTy Z = ()
disjointTy (S k) = Void
Don’t worry if you don’t get all the details of how this works just yet -
essentially, it applies the library function replace
, which uses an
equality proof to transform a predicate. Here we use it to transform a
value of a type which can exist, the empty tuple, to a value of a type
which can’t, by using a proof of something which can’t exist.
Once we have an element of the empty type, we can prove anything.
void
is defined in the library, to assist with proofs by
contradiction.
void : Void -> a
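As a small illustration of proof by contradiction, the sketch below uses void to produce a value of any type from an impossible equality. The names zeroNotSucc and fromContradiction are made up for this example; zeroNotSucc restates the disjointness property above with an impossible clause rather than replace.

```idris
-- Illustrative names; not part of the Prelude.

-- Zero is never a successor: matching the proof against Refl cannot
-- type check, so the only clause is marked impossible.
zeroNotSucc : (n : Nat) -> Z = S n -> Void
zeroNotSucc n Refl impossible

-- Given such a contradiction, void lets us return a value of any type.
fromContradiction : (n : Nat) -> Z = S n -> a
fromContradiction n prf = void (zeroNotSucc n prf)
```

Since no proof of Z = S n can ever be constructed, fromContradiction can never actually be called with real arguments; it only serves to discharge impossible branches.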
Proving Theorems
When type checking dependent types, the type itself gets normalised.
So imagine we want to prove the following theorem about the reduction
behaviour of plus
:
plusReduces : (n:Nat) -> plus Z n = n
We’ve written down the statement of the theorem as a type, in just the same way as we would write the type of a program. In fact there is no real distinction between proofs and programs. A proof, as far as we are concerned here, is merely a program with a precise enough type to guarantee a particular property of interest.
We won’t go into details here, but the Curry-Howard correspondence 1
explains this relationship. The proof itself is immediate, because
plus Z n
normalises to n
by the definition of plus
:
plusReduces n = Refl
It is slightly harder if we try the arguments the other way, because
plus is defined by recursion on its first argument. The proof also works
by recursion on the first argument to plus
, namely n
.
plusReducesZ : (n:Nat) -> n = plus n Z
plusReducesZ Z = Refl
plusReducesZ (S k) = cong S (plusReducesZ k)
cong
is a function defined in the library which states that equality
respects function application:
cong : (f : t -> u) -> a = b -> f a = f b
To see more detail on what’s going on, we can replace the recursive call to
plusReducesZ
with a hole:
plusReducesZ (S k) = cong S ?help
Then inspecting the type of the hole at the REPL shows us:
Main> :t help
k : Nat
-------------------------------------
help : k = (plus k Z)
We can do the same for the reduction behaviour of plus on successors:
plusReducesS : (n:Nat) -> (m:Nat) -> S (plus n m) = plus n (S m)
plusReducesS Z m = Refl
plusReducesS (S k) m = cong S (plusReducesS k m)
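The two lemmas above are enough to sketch a proof that plus is commutative. The name plusComm is an assumption made for this example, and the lemmas are repeated so that the block stands on its own. The inductive step uses rewrite with the induction hypothesis to turn the goal S (plus k m) = plus m (S k) into S (plus m k) = plus m (S k), which is exactly plusReducesS m k.

```idris
plusReducesZ : (n : Nat) -> n = plus n Z
plusReducesZ Z = Refl
plusReducesZ (S k) = cong S (plusReducesZ k)

plusReducesS : (n : Nat) -> (m : Nat) -> S (plus n m) = plus n (S m)
plusReducesS Z m = Refl
plusReducesS (S k) m = cong S (plusReducesS k m)

-- Illustrative name. The base case needs m = plus m Z, because
-- plus Z m already normalises to m.
plusComm : (n : Nat) -> (m : Nat) -> plus n m = plus m n
plusComm Z m = plusReducesZ m
plusComm (S k) m = rewrite plusComm k m in plusReducesS m k
```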
Even for small theorems like these, the proofs are a little tricky to construct in one go. When things get even slightly more complicated, it becomes too much to think about to construct proofs in this “batch mode”.
Idris provides interactive editing capabilities, which can help with building proofs. For more details on building proofs interactively in an editor, see Theorem Proving.
Theorems in Practice
The need to prove theorems can arise naturally in practice. For example,
previously (Views and the “with” rule) we implemented natToBin
using a function
parity
:
parity : (n:Nat) -> Parity n
We provided a definition for parity
, but without explanation. We might
have hoped that it would look something like the following:
parity : (n:Nat) -> Parity n
parity Z = Even {n=Z}
parity (S Z) = Odd {n=Z}
parity (S (S k)) with (parity k)
parity (S (S (j + j))) | Even = Even {n=S j}
parity (S (S (S (j + j)))) | Odd = Odd {n=S j}
Unfortunately, this fails with a type error:
With.idr:26:17--27:3:While processing right hand side of Main.with block in 2419 at With.idr:24:3--27:3:
Can't solve constraint between:
plus j (S j)
and
S (plus j j)
The problem is that normalising S j + S j, in the type of Even, doesn't result in what we need for the type of the right hand side of Parity. We know that S (S (plus j j)) is going to be equal to S j + S j, but we need to explain it to Idris with a proof. We can begin by adding some holes (see Totality and Covering) to the definition:
parity : (n:Nat) -> Parity n
parity Z = Even {n=Z}
parity (S Z) = Odd {n=Z}
parity (S (S k)) with (parity k)
parity (S (S (j + j))) | Even = let result = Even {n=S j} in
?helpEven
parity (S (S (S (j + j)))) | Odd = let result = Odd {n=S j} in
?helpOdd
Checking the type of helpEven
shows us what we need to prove for the
Even
case:
j : Nat
result : Parity (S (plus j (S j)))
--------------------------------------
helpEven : Parity (S (S (plus j j)))
We can therefore write a helper function to rewrite the type to the form we need:
helpEven : (j : Nat) -> Parity (S j + S j) -> Parity (S (S (plus j j)))
helpEven j p = rewrite plusSuccRightSucc j j in p
The rewrite ... in
syntax allows you to change the required type of an
expression by rewriting it according to an equality proof. Here, we have
used plusSuccRightSucc
, which has the following type:
plusSuccRightSucc : (left : Nat) -> (right : Nat) -> S (left + right) = left + S right
We can see the effect of rewrite
by replacing the right hand side of
helpEven
with a hole, and working step by step. Beginning with the following:
helpEven : (j : Nat) -> Parity (S j + S j) -> Parity (S (S (plus j j)))
helpEven j p = ?helpEven_rhs
We can look at the type of helpEven_rhs
:
j : Nat
p : Parity (S (plus j (S j)))
--------------------------------------
helpEven_rhs : Parity (S (S (plus j j)))
Then we can rewrite
by applying plusSuccRightSucc j j
, which gives
an equation S (j + j) = j + S j
, thus replacing S (j + j)
(or,
in this case, S (plus j j)
since S (j + j)
reduces to that) in the
type with j + S j
:
helpEven : (j : Nat) -> Parity (S j + S j) -> Parity (S (S (plus j j)))
helpEven j p = rewrite plusSuccRightSucc j j in ?helpEven_rhs
Checking the type of helpEven_rhs now shows what has happened: the goal has been rewritten so that it matches the type of p:
Main> :t helpEven_rhs
j : Nat
p : Parity (S (plus j (S j)))
-------------------------------------
helpEven_rhs : Parity (S (plus j (S j)))
Using rewrite
and another helper for the Odd
case, we can complete
parity
as follows:
helpEven : (j : Nat) -> Parity (S j + S j) -> Parity (S (S (plus j j)))
helpEven j p = rewrite plusSuccRightSucc j j in p
helpOdd : (j : Nat) -> Parity (S (S (j + S j))) -> Parity (S (S (S (j + j))))
helpOdd j p = rewrite plusSuccRightSucc j j in p
parity : (n:Nat) -> Parity n
parity Z = Even {n=Z}
parity (S Z) = Odd {n=Z}
parity (S (S k)) with (parity k)
parity (S (S (j + j))) | Even = helpEven j (Even {n = S j})
parity (S (S (S (j + j)))) | Odd = helpOdd j (Odd {n = S j})
Full details of rewrite are beyond the scope of this introductory tutorial, but it is covered in the theorem proving tutorial (see Theorem Proving).
Totality Checking
If we really want to trust our proofs, it is important that they are defined by total functions — that is, a function which is defined for all possible inputs and is guaranteed to terminate. Otherwise we could construct an element of the empty type, from which we could prove anything:
-- making use of 'hd' being partially defined
empty1 : Void
empty1 = hd [] where
hd : List a -> a
hd (x :: xs) = x
-- not terminating
empty2 : Void
empty2 = empty2
Internally, Idris checks every definition for totality, and we can check at
the prompt with the :total
command. We see that neither of the above
definitions is total:
Void> :total empty1
Void.empty1 is not covering due to call to function empty1:hd
Void> :total empty2
Void.empty2 is possibly not terminating due to recursive path Void.empty2
Note the use of the word “possibly” — a totality check can never be certain due to the undecidability of the halting problem. The check is, therefore, conservative. It is also possible (and indeed advisable, in the case of proofs) to mark functions as total so that it will be a compile time error for the totality check to fail:
total empty2 : Void
empty2 = empty2
Reassuringly, our proof in Section The Empty Type that the zero and successor constructors are disjoint is total:
Main> :total disjoint
Main.disjoint is Total
The totality check is, necessarily, conservative. To be recorded as total, a function f must:
Cover all possible inputs
Be well-founded, i.e. by the time a sequence of (possibly mutually) recursive calls reaches f again, it must be possible to show that one of its arguments has decreased.
Not use any data types which are not strictly positive
Not call any non-total functions
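To make these conditions concrete, here is a minimal sketch with one function at each level of the totality hierarchy. The function names are invented for this example.

```idris
%default total

-- Total: covers Z and S k, and the recursive call is on the smaller k.
sumTo : Nat -> Nat
sumTo Z = Z
sumTo (S k) = S k + sumTo k

-- Covers every input but never reaches a base case, so we can only
-- claim `covering`; annotating it `total` would be rejected.
covering
forever : Nat -> Nat
forever n = forever (S n)

-- Not even covering: there is no clause for [], so it must be `partial`.
partial
unsafeHead : List a -> a
unsafeHead (x :: xs) = x
```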
Directives and Compiler Flags for Totality
Warning
Not all of this is implemented yet for Idris 2.
By default, Idris allows all well-typed definitions, whether total or not. However, it is desirable for functions to be total as far as possible, as this provides a guarantee that they provide a result for all possible inputs, in finite time. It is possible to make total functions a requirement, either:
By using the --total compiler flag.
By adding a %default total directive to a source file. All definitions after this will be required to be total, unless explicitly flagged as partial.
All functions after a %default total declaration are required to be total. Correspondingly, after a %default partial declaration, the requirement is relaxed.
Finally, the compiler flag --warnpartial causes Idris to print a warning for any undeclared partial function.
Totality checking issues
Please note that the totality checker is not perfect! Firstly, it is necessarily conservative due to the undecidability of the halting problem, so many programs which are total will not be detected as such. Secondly, the current implementation has had limited effort put into it so far, so there may still be cases where it believes a function is total which is not. Do not rely on it for your proofs yet!
Hints for totality
In cases where you believe a program is total, but Idris does not agree, it is
possible to give hints to the checker to give more detail for a termination
argument. The checker works by ensuring that all chains of recursive calls
eventually lead to one of the arguments decreasing towards a base case, but
sometimes this is hard to spot. For example, the following definition cannot be
checked as total
because the checker cannot decide that filter (< x) xs
will always be smaller than (x :: xs)
:
qsort : Ord a => List a -> List a
qsort [] = []
qsort (x :: xs)
= qsort (filter (< x) xs) ++
(x :: qsort (filter (>= x) xs))
The function assert_smaller
, defined in the prelude, is intended to
address this problem:
assert_smaller : a -> a -> a
assert_smaller x y = y
It simply evaluates to its second argument, but also asserts to the
totality checker that y
is structurally smaller than x
. This can
be used to explain the reasoning for totality if the checker cannot work
it out itself. The above example can now be written as:
total
qsort : Ord a => List a -> List a
qsort [] = []
qsort (x :: xs)
= qsort (assert_smaller (x :: xs) (filter (< x) xs)) ++
(x :: qsort (assert_smaller (x :: xs) (filter (>= x) xs)))
The expression assert_smaller (x :: xs) (filter (< x) xs) asserts that the result of the filter will always be smaller than the pattern (x :: xs).
In more extreme cases, the function assert_total
marks a
subexpression as always being total:
assert_total : a -> a
assert_total x = x
In general, this function should be avoided, but it can be very useful when reasoning about primitives or externally defined functions (for example from a C library) where totality can be shown by an external argument.
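As a hedged sketch of assert_total in use, consider a countdown over Integer, a primitive type on which the checker cannot see any structural decrease. The function name countdown is made up for this example, and the assertion is only justified by the external argument that n - 1 eventually reaches the n <= 0 guard.

```idris
-- Illustrative example. Without assert_total the checker reports this
-- as possibly not terminating, because Integer has no constructors to
-- recurse on.
total
countdown : Integer -> List Integer
countdown n =
  if n <= 0
     then []
     else n :: assert_total (countdown (n - 1))
```

With this definition, countdown 3 evaluates to [3, 2, 1]. If the guard were wrong, the assertion would silently hide a non-terminating definition, which is why it should be used sparingly.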
- 1
Timothy G. Griffin. 1989. A formulae-as-type notion of control. In Proceedings of the 17th ACM SIGPLAN-SIGACT symposium on Principles of programming languages (POPL ‘90). ACM, New York, NY, USA, 47-58. DOI=10.1145/96709.96714 https://doi.acm.org/10.1145/96709.96714
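Interactive Editing
By now, we have seen several examples of how Idris' dependent type system can give extra confidence in a function's correctness by giving a more precise description of its intended behaviour in its type. We have also seen an example of how the type system can help with embedded DSL development by allowing a programmer to describe the type system of an object language. However, precise types give us more than verification of programs: we can also use the type system to help write programs which are correct by construction, interactively.
The Idris REPL provides several commands for inspecting and modifying parts of programs, based on their types, such as case splitting on a pattern variable, inspecting the type of a hole, and even a basic proof search mechanism. In this section, we explain how these features can be exploited by a text editor, and specifically how to do so in Vim. An interactive mode for Emacs is also available, updated for Idris 2 compatibility as of 23 February 2021.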
Editing at the REPL
Note
The Idris 2 REPL does not support readline, in the interest of keeping dependencies minimal. Unfortunately this precludes some niceties such as line editing, persistent history and completion. A useful workaround is to install rlwrap; this utility provides all the aforementioned features when the Idris 2 REPL is invoked as its argument: rlwrap idris2
The REPL provides a number of commands, which we will describe shortly, which generate new program fragments based on the currently loaded module. These take the general form:
:command [line number] [name]
That is, each command acts on a specific source line, at a specific name, and outputs a new program fragment. Each command has an alternative form, which updates the source file in-place:
:command! [line number] [name]
It is also possible to invoke Idris in a mode which runs a REPL command,
displays the result, then exits, using idris2 --client
. For example:
$ idris2 --client ':t plus'
Prelude.plus : Nat -> Nat -> Nat
$ idris2 --client '2+2'
4
A text editor can take advantage of this, along with the editing commands, in order to provide interactive editing support.
Editing Commands
:addclause
The :addclause n f
command, abbreviated :ac n f
, creates a
template definition for the function named f
declared on line
n
. For example, if the code beginning on line 94 contains:
vzipWith : (a -> b -> c) ->
Vect n a -> Vect n b -> Vect n c
then :ac 94 vzipWith
will give:
vzipWith f xs ys = ?vzipWith_rhs
The names are chosen according to hints which may be given by a programmer, and then made unique by the machine by adding a digit if necessary. Hints can be given as follows:
%name Vect xs, ys, zs, ws
This declares that any names generated for types in the Vect
family
should be chosen in the order xs
, ys
, zs
, ws
.
:casesplit
The :casesplit n c x
command, abbreviated :cs n c x
, splits the
pattern variable x
on line n
at column c
into the various
pattern forms it may take, removing any cases which are impossible due
to unification errors. For example, if the code beginning on line 94 is:
vzipWith : (a -> b -> c) ->
Vect n a -> Vect n b -> Vect n c
vzipWith f xs ys = ?vzipWith_rhs
then :cs 96 12 xs
will give:
vzipWith f [] ys = ?vzipWith_rhs_1
vzipWith f (x :: xs) ys = ?vzipWith_rhs_2
That is, the pattern variable xs
has been split into the two
possible cases []
and x :: xs
. Again, the names are chosen
according to the same heuristic. If we update the file (using
:cs!
) then case split on ys
on the same line, we get:
vzipWith f [] [] = ?vzipWith_rhs_3
That is, the pattern variable ys
has been split into one case
[]
, Idris having noticed that the other possible case y ::
ys
would lead to a unification error.
:addmissing
The :addmissing n f
command, abbreviated :am n f
, adds the
clauses which are required to make the function f
on line n
cover all inputs. For example, if the code beginning on line 94 is:
vzipWith : (a -> b -> c) ->
Vect n a -> Vect n b -> Vect n c
vzipWith f [] [] = ?vzipWith_rhs_1
then :am 96 vzipWith
gives:
vzipWith f (x :: xs) (y :: ys) = ?vzipWith_rhs_2
That is, it notices that there are no cases for non-empty vectors, generates the required clauses, and eliminates the clauses which would lead to unification errors.
:proofsearch
The :proofsearch n f
command, abbreviated :ps n f
, attempts to
find a value for the hole f
on line n
by proof search,
trying values of local variables, recursive calls and constructors of
the required family. Optionally, it can take a list of hints, which
are functions it can try applying to solve the hole. For
example, if the code beginning on line 94 is:
vzipWith : (a -> b -> c) ->
Vect n a -> Vect n b -> Vect n c
vzipWith f [] [] = ?vzipWith_rhs_1
vzipWith f (x :: xs) (y :: ys) = ?vzipWith_rhs_2
then :ps 96 vzipWith_rhs_1
will give
[]
This works because it is searching for a Vect
of length 0, of
which the empty vector is the only possibility. Similarly, and perhaps
surprisingly, there is only one possibility if we try to solve :ps
97 vzipWith_rhs_2
:
f x y :: vzipWith f xs ys
This works because vzipWith
has a precise enough type: The
resulting vector has to be non-empty (a ::
); the first element
must have type c
and the only way to get this is to apply f
to
x
and y
; finally, the tail of the vector can only be built
recursively.
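Putting the results of the two proof searches together gives the complete definition. The block below simply substitutes the expressions found above for the holes, with the import that Vect needs.

```idris
import Data.Vect

vzipWith : (a -> b -> c) ->
           Vect n a -> Vect n b -> Vect n c
vzipWith f [] [] = []
vzipWith f (x :: xs) (y :: ys) = f x y :: vzipWith f xs ys
```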
:makewith
The :makewith n f
command, abbreviated :mw n f
, adds a
with
to a pattern clause. For example, recall parity
. If line
10 is:
parity (S k) = ?parity_rhs
then :mw 10 parity
will give:
parity (S k) with (_)
parity (S k) | with_pat = ?parity_rhs
If we then fill in the placeholder _
with parity k
and case
split on with_pat
using :cs 11 with_pat
we get the following
patterns:
parity (S (plus n n)) | even = ?parity_rhs_1
parity (S (S (plus n n))) | odd = ?parity_rhs_2
Note that case splitting has normalised the patterns here (giving
plus
rather than +
). In any case, we see that using
interactive editing significantly simplifies the implementation of
dependent pattern matching by showing a programmer exactly what the
valid patterns are.
Interactive Editing in Vim
The editor mode for Vim provides syntax highlighting, indentation and interactive editing support using the commands described above. Interactive editing is achieved using the following editor commands, each of which updates the buffer directly:
\a adds a template definition for the name declared on the current line (using :addclause).
\c case splits the variable at the cursor (using :casesplit).
\m adds the missing cases for the name at the cursor (using :addmissing).
\w adds a with clause (using :makewith).
\s invokes a proof search to solve the hole under the cursor (using :proofsearch).
There are also commands to invoke the type checker and evaluator:
\t displays the type of the (globally visible) name under the cursor. In the case of a hole, this displays the context and the expected type.
\e prompts for an expression to evaluate.
\r reloads and type checks the buffer.
Corresponding commands are also available in the Emacs mode. Support for other editors can be added in a relatively straightforward manner by using idris2 --client.
More sophisticated support can be added by using the IDE protocol (yet to be documented for Idris 2, but which mostly extends the protocol documented for Idris 1).
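Miscellany
In this section we discuss a variety of additional features:
auto, implicit, and default arguments;
literate programming; and
the universe hierarchy.
Implicit arguments
We have already seen implicit arguments, which allow arguments to be omitted when they can be inferred by the type checker 1, e.g.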
index : forall a, n . Fin n -> Vect n a -> a
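Auto implicit arguments
In other situations, it may be possible to infer arguments not by type checking but by searching the context for an appropriate value, or constructing a proof. For example, the following definition of head requires a proof that the list is non-empty: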
isCons : List a -> Bool
isCons [] = False
isCons (x :: xs) = True
head : (xs : List a) -> (isCons xs = True) -> a
head (x :: xs) _ = x
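If the list is statically known to be non-empty, either because its value is known or because a proof already exists in the context, then the proof can be constructed automatically. Auto implicit arguments allow this to happen silently. We define head as follows: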
head : (xs : List a) -> {auto p : isCons xs = True} -> a
head (x :: xs) = x
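The auto annotation on the implicit argument means that Idris will attempt to fill in the implicit argument by searching for a value of the appropriate type. In fact, internally, this is exactly how interface resolution works. It will try the following, in order:
Local variables, i.e. names bound in pattern matches or let bindings, with exactly the right type.
The constructors of the required type. If they have arguments, it will search recursively up to a maximum depth of 100.
Local variables with function types, searching recursively for the arguments.
Any function with the appropriate return type which is marked with the %hint annotation.
In the case that a proof is not found, it can be provided explicitly as normal: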
head xs {p = ?headProof}
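The following sketch shows the constructor search step at work. It is a variation on the example above, not the same definition: the Boolean test is replaced by a small predicate data type, and the names IsCons, ItIsCons, safeHead and firstOfThree are all invented for illustration.

```idris
-- Illustrative predicate: only non-empty lists have a proof.
data IsCons : List a -> Type where
  ItIsCons : IsCons (x :: xs)

-- The proof argument is found by auto search and erased at run time.
safeHead : (xs : List a) -> {auto 0 prf : IsCons xs} -> a
safeHead [] impossible
safeHead (x :: _) = x

-- No proof is written here: Idris tries the constructor ItIsCons,
-- which fits because the list is visibly non-empty.
firstOfThree : Nat
firstOfThree = safeHead [1, 2, 3]
```

A call such as safeHead [] would be rejected at compile time, because no constructor of IsCons fits the empty list.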
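Default implicit arguments
Besides having Idris automatically find a value of a given type, sometimes we want to have an implicit argument with a specific default value. In Idris, we can do this using the default annotation. While this is primarily intended to assist in automatically constructing a proof where auto fails, or finds an unhelpful value, it might be easier to first consider a simpler case, not involving proofs.
If we want to compute the n'th Fibonacci number (and defining the 0th Fibonacci number as 0), we could write: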
fibonacci : {default 0 lag : Nat} -> {default 1 lead : Nat} -> (n : Nat) -> Nat
fibonacci {lag} Z = lag
fibonacci {lag} {lead} (S n) = fibonacci {lag=lead} {lead=lag+lead} n
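After this definition, fibonacci 5 is equivalent to fibonacci {lag=0} {lead=1} 5, and will return the 5th Fibonacci number. Note that while this works, this is not the intended use of the default annotation. It is included here for illustrative purposes only. Usually, default is used to provide things like a custom proof search script.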
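A second, smaller sketch of the same mechanism may help. The function greet and its argument names are hypothetical; the point is only that the default is used when the implicit argument is omitted and overridden when it is given in braces.

```idris
-- Illustrative example of a default implicit argument.
greet : {default "Hello" greeting : String} -> (name : String) -> String
greet {greeting} name = greeting ++ ", " ++ name ++ "!"

-- Uses the default greeting.
plainGreeting : String
plainGreeting = greet "Idris"

-- Overrides the default explicitly.
farewell : String
farewell = greet {greeting = "Goodbye"} "Idris"
```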
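Literate programming
Like Haskell, Idris supports literate programming. If a file has an extension of .lidr then it is assumed to be a literate file. In literate programs, everything is assumed to be a comment unless the line begins with a greater than sign >, for example: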
> module literate
This is a comment. The main program is below
> main : IO ()
> main = putStrLn "Hello literate world!\n"
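An additional restriction is that there must be a blank line between a program line (beginning with >) and a comment line (beginning with any other character).
Cumulativity
Warning
NOT YET IN IDRIS 2
Since values can appear in types and vice versa, it is natural that types themselves have types. For example: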
*universe> :t Nat
Nat : Type
*universe> :t Vect
Vect : Nat -> Type -> Type
But what about the type of Type? If we ask Idris it reports:
*universe> :t Type
Type : Type 1
If Type were its own type, it would lead to an inconsistency due to Girard's paradox, so internally there is a hierarchy of types (or universes):
Type : Type 1 : Type 2 : Type 3 : ...
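Universes are cumulative, that is, if x : Type n we can also have that x : Type m, as long as n < m. The type checker generates such universe constraints and reports an error if any inconsistencies are found. Ordinarily, a programmer does not need to worry about this, but it does prevent (contrived) programs such as the following: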
myid : (a : Type) -> a -> a
myid _ x = x
idid : (a : Type) -> a -> a
idid = myid _ myid
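The application of myid to itself leads to a cycle in the universe hierarchy: myid's first argument is a Type, which cannot be at a lower level than required if it is applied to itself.
Further Reading
Further information about Idris programming, and programming with dependent types in general, can be obtained from various sources:
Type-Driven Development with Idris by Edwin Brady, available from Manning (https://www.manning.com).
The Idris web site (https://www.idris-lang.org/) and by asking questions on the mailing list.
The IRC channel #idris, on webchat.freenode.net.
The wiki (https://github.com/idris-lang/Idris-dev/wiki/), which has further user-provided information.
Examining the prelude and exploring the samples directory in the distribution.
The Idris 2 source, which can be found online.
Existing projects on the Idris Hackers web space.
Various papers (e.g. 1, 2, and 3), although these mostly describe older versions of Idris.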
- 1
Edwin Brady and Kevin Hammond. 2012. Resource-Safe systems programming with embedded domain specific languages. In Proceedings of the 14th international conference on Practical Aspects of Declarative Languages (PADL’12), Claudio Russo and Neng-Fa Zhou (Eds.). Springer-Verlag, Berlin, Heidelberg, 242-257. DOI=10.1007/978-3-642-27694-1_18 https://dx.doi.org/10.1007/978-3-642-27694-1_18
- 2
Edwin C. Brady. 2011. IDRIS —: systems programming meets full dependent types. In Proceedings of the 5th ACM workshop on Programming languages meets program verification (PLPV ‘11). ACM, New York, NY, USA, 43-54. DOI=10.1145/1929529.1929536 https://doi.acm.org/10.1145/1929529.1929536
- 3
Edwin C. Brady and Kevin Hammond. 2010. Scrapping your inefficient engine: using partial evaluation to improve domain-specific language implementation. In Proceedings of the 15th ACM SIGPLAN international conference on Functional programming (ICFP ‘10). ACM, New York, NY, USA, 297-308. DOI=10.1145/1863543.1863587 https://doi.acm.org/10.1145/1863543.1863587
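Frequently Asked Questions
What are the aims of the Idris project?
Idris aims to make advanced type-related programming techniques accessible to software practitioners. An important philosophy that we follow is that Idris allows software developers to express invariants of their data and prove properties of programs, but will not require them to do so.
Many of the answers in this FAQ demonstrate this philosophy, and we always bear this in mind when making language and library design decisions.
Idris is primarily a research project, led by Edwin Brady at the University of St Andrews, and has benefited from SICSA (https://www.sicsa.ac.uk) and EPSRC (https://www.epsrc.ac.uk/) funding. This does influence some design choices and implementation priorities, and means that some things are not as polished as we'd like. Nevertheless, we are still trying to make it as widely usable as we can!
Where can I find libraries? Is there a package manager?
We don't yet have a package manager, but you can still find the sources of libraries on the wiki: https://github.com/idris-lang/Idris2/wiki/1-%5BLanguage%5D-Libraries
Fortunately, the dependencies are currently not that complicated, but we'd still like a package manager to help! There isn't an official one yet, but two are in development.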
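Can Idris 2 compile itself?
Yes, Idris 2 is implemented in Idris 2. By default, it targets Chez Scheme, so you can bootstrap from the generated Scheme code, as described in the section Getting Started.
Why does Idris 2 target Scheme? Surely a dynamically typed target language is going to be slow?
You may be surprised at how fast Chez Scheme is! Racket, as an alternative target, also performs well. Both perform better than the Idris 1 back end, which is written in C but has not had the decades of engineering effort by run time system specialists that Chez and Racket have. Chez Scheme also allows us to turn off run time checks, which we do.
As anecdotal evidence of the performance improvement, we compared the performance of the Idris 2 runtime with the Idris 1 runtime, using a version of the compiler built with the Chez runtime and the same version built with the bootstrapping Idris 2. On a Dell XPS 13 running Ubuntu, as of 23 May 2020, the results were:
Idris 2 (with the Chez Scheme runtime) checks its own source in 93 seconds.
The bootstrapping Idris 2 (compiled with Idris 1) checks the same source in 125 seconds.
Idris 1 checks the bootstrapping Idris 2's source (the same as the above, but with minor variations due to the syntax changes) in 768 seconds.
Unfortunately we can't repeat this experiment with the latest version, since the bootstrapping Idris 2 is no longer able to build the current version.
This is, nevertheless, not intended as a long term solution, even if it is a very convenient way to bootstrap.
Can Idris 2 generate JavaScript? What about pluggable code generators?
Yes! A JavaScript code generator is built in, and can target either the browser or NodeJS.
Like Idris 1, Idris 2 supports pluggable code generators, which allow you to write a back end for the platform of your choice.
What are the main differences between Idris 1 and Idris 2?
The most important difference is that Idris 2 explicitly represents erasure in types, so that you can see at compile time which function and data type arguments are erased, and which will be present at run time. You can see more details in Multiplicities.
Idris 2 has significantly better type checking performance (perhaps even an order of magnitude!) and generates significantly better code.
Also, since Idris 2 is implemented in Idris, we've been able to take advantage of the type system to remove some significant sources of bugs!
You can find more details in the section Changes since Idris 1.
Why aren't there more linearity annotations in the library?
In theory, now that Idris 2 is based on Quantitative Type Theory (see the section Multiplicities), we can write more precise types in the Prelude and Base libraries which give more precise usage information. We have chosen not to do that (yet), however. Consider, for example, what would happen if we did: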
id : (1 _ : a) -> a
id x = x
This is definitely correct, because x is used exactly once. However, we also have:
map : (a -> b) -> List a -> List b
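We can't guarantee that the function passed to map is linear in its argument in general, and so we can no longer say map id xs, since the multiplicity of id doesn't match the multiplicity of the function passed to map.
Eventually, we hope to extend the core language with multiplicity polymorphism, which will help resolve these problems. Until then, we consider linearity an experimental new feature in the type system, and therefore we follow the general philosophy that if you don't want to use linearity, its presence mustn't impact the way you write programs.
How do I get command history in the Idris 2 REPL?
The Idris 2 REPL does not support readline, in the interest of keeping dependencies minimal. A useful workaround is to install rlwrap; this utility provides command history simply by invoking the Idris 2 REPL as an argument to the utility: rlwrap idris2.
The goal, eventually, is to use the IDE mode or the Idris API as the basis of an implementation of a sophisticated REPL, developed independently from the Idris 2 core. As far as we know, nobody is yet working on this: if you're interested, please get in touch and we can help you get started!
Why does Idris use eager evaluation rather than lazy?
Idris uses eager evaluation for more predictable performance, in particular because one of the longer term goals is to be able to write efficient and verified low level code such as device drivers and network infrastructure. Furthermore, the Idris type system allows us to state precisely the type of each value, and therefore the run-time form of each value. In a lazy language, consider a value of type Int: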
thing : Int
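What is the representation of thing at run-time? Is it a bit pattern representing an integer, or is it a pointer to some code which will compute an integer? In Idris, we have decided to make this distinction precise, in the type: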
thing_val : Int
thing_comp : Lazy Int
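Here, it is clear from the type that thing_val is guaranteed to be a concrete Int, whereas thing_comp is a computation which will produce an Int.
How can I make lazy control structures?
You can make control structures using the special Lazy type. For example, one way to implement a non-dependent if...then...else... would be via a function named ifThenElse: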
ifThenElse : Bool -> (t : Lazy a) -> (e : Lazy a) -> a
ifThenElse True t e = t
ifThenElse False t e = e
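The type Lazy a for t and e indicates that those arguments will only be evaluated if they are used, that is, they are evaluated lazily.
By the way: we don't actually implement if...then...else... this way in Idris 2! Rather, it is transformed to a case expression, which allows dependent if.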
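As a small usage sketch, the lazy arguments make it safe to guard an expression that should not be evaluated in one of the branches. The names ifThenElse' (primed to avoid clashing with the Prelude's own ifThenElse) and safeDiv are assumptions made for this example.

```idris
-- Same definition as above, under a primed name.
ifThenElse' : Bool -> (t : Lazy a) -> (e : Lazy a) -> a
ifThenElse' True t e = t
ifThenElse' False t e = e

-- The division is only evaluated when d is non-zero, because the
-- else branch is passed lazily.
safeDiv : Integer -> Integer -> Integer
safeDiv n d = ifThenElse' (d == 0) 0 (n `div` d)
```

Idris inserts the delay and force operations implicitly, so the call site looks the same as it would for strict arguments.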
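Evaluation at the REPL doesn't behave as I expect. What's going on?
Being a fully dependently typed language, Idris has two phases where it evaluates things: compile-time and run-time. At compile-time it will only evaluate things which it knows to be total (i.e. terminating and covering all possible inputs) in order to keep type checking decidable. The compile-time evaluator is part of the Idris kernel, and is implemented as an interpreter in Idris. Since everything is known to have a normal form here, the evaluation strategy doesn't actually matter because either way it will get the same answer! In practice, it uses call by name, since this avoids evaluating sub-expressions which are not needed for type checking.
The REPL, for convenience, uses the compile-time notion of evaluation. As well as being easier to implement (because we have the evaluator available), this can be very useful to show how terms evaluate in the type checker. So you can see the difference between: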
Main> \n, m => S n + m
\n, m => S (plus n m)
Main> \n, m => n + S m
\n, m => plus n (S m)
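If you want to compile and execute an expression at the REPL, you can use the :exec command. In this case, the expression must have type IO a (for any a, although it won't print the result).
Why can't I use a function with no arguments in a type?
If you use a name in a type which begins with a lower case letter, and which is not applied to any arguments, then Idris will treat it as an implicitly bound argument. For example: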
append : Vect n ty -> Vect m ty -> Vect (n + m) ty
Here, n, m, and ty are implicitly bound. This rule applies even if there are functions defined elsewhere with any of these names. For example, you may also have:
ty : Type
ty = String
Even in this case, ty is still considered implicitly bound in the definition of append, rather than making the type of append equivalent to…
append : Vect n String -> Vect m String -> Vect (n + m) String
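…which is probably not what was intended! The reason for this rule is so that it is clear just from looking at the type of append, and no other context, what the implicitly bound names are.
If you want to use an unapplied name in a type, you have three options. You can explicitly qualify it; for example, if ty is defined in the namespace Main you can do the following: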
append : Vect n Main.ty -> Vect m Main.ty -> Vect (n + m) Main.ty
Alternatively, you can use a name which does not begin with a lower case letter, which will never be implicitly bound:
Ty : Type
Ty = String
append : Vect n Ty -> Vect m Ty -> Vect (n + m) Ty
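As a convention, if a name is intended to be used as a type synonym, it is best for it to begin with a capital letter to avoid this restriction.
Finally, you can turn off the automatic binding of implicits with the directive: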
%auto_implicits off
In this case, you can bind n and m as implicits, but not ty, as follows:
append : forall n, m . Vect n ty -> Vect m ty -> Vect (n + m) ty
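Why don't the Functor, Applicative, Monad and other interfaces include the laws?
On the face of it, this sounds like a good idea, because the type system allows us to specify the laws. We don't do this in the prelude, though, for two main reasons:
It goes against the philosophy (above) that Idris allows programmers to prove properties of their programs, but does not require it.
A valid, lawful implementation may not necessarily be provably lawful within the Idris system, especially if it involves higher order functions.
There are verified versions of the interfaces in Control.Algebra, which extend interfaces with laws.
I have an obviously terminating program, but Idris says it possibly isn't total. Why is that?
Idris can't decide in general whether a program is terminating, due to the undecidability of the Halting Problem. It is possible, however, to identify some programs which are definitely terminating. Idris does this using "size change termination", which looks for recursive paths from a function back to itself. On such a path, there must be at least one argument which converges to a base case.
Mutually recursive functions are supported.
However, all functions on the path must be fully applied. In addition, Idris does not support higher order applications.
Idris identifies arguments which converge to a base case by looking for recursive calls to syntactically smaller arguments of inputs. For example, k is syntactically smaller than S (S k) because k is a subterm of S (S k), but (k, k) is not syntactically smaller than (S k, S k).
If you have a function which you believe to be terminating, but Idris does not, you can either restructure the program, or use the assert_total function.
Does Idris have universe polymorphism? What is the type of Type?
Idris 2 currently implements Type : Type. Don't worry, this will not be the case forever! For Idris 1, the FAQ answered this question as follows:
Rather than universe polymorphism, Idris has a cumulative hierarchy of universes; Type : Type 1, Type 1 : Type 2, etc. Cumulativity means that if x : Type n and n <= m, then x : Type m. Universe levels are always inferred by Idris, and cannot be specified explicitly. The REPL command :type Type 1 will result in an error, as will attempting to specify the universe level of any type.
What does the name "Idris" mean?
British people of a certain age may be familiar with this singing dragon. If that doesn't help, maybe you can invent a suitable acronym :-).
Where can I find the community standards for the Idris community?
The Idris community standards are stated here.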
Compiling to Executables
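Idris 2 (the language) is designed to be independent of any specific code generator. Still, since the point of writing a program is to be able to run it, it's important to know how to do so! By default, Idris compiles to executables via Chez Scheme.
You can compile to an executable as follows, at the REPL: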
Main> :c execname expr
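…where execname is the name of the executable file to generate, and expr is the Idris expression which will be executed. expr must have type IO (). This will result in an executable file execname, in a directory build/exec, relative to the current working directory.
You can also execute expressions directly: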
Main> :exec expr
Again, expr must have type IO ().
Finally, you can compile to an executable from the command line by adding the -o <output file> option:
$ idris2 hello.idr -o hello
This will compile the expression Main.main, generating an executable hello (possibly with a file extension, depending on the code generator) in the build/exec directory.
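For reference, a minimal hello.idr that the command above could compile might look like the following. The file contents are an assumption for illustration; the text only names the file and requires that Main.main exists with type IO ().

```idris
module Main

-- Entry point compiled by: idris2 hello.idr -o hello
main : IO ()
main = putStrLn "Hello, world!"
```

With the default Chez Scheme code generator, running build/exec/hello should then print the greeting.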
Incremental Code Generation
By default, Idris 2 is a whole program compiler - that is, it finds all the necessary function definitions and compiles them only when you build an executable. This gives plenty of optimisation opportunities, but can also be slow for rebuilding. However, if the backend supports it, you can build modules and executables incrementally. To do so, you can either:
Set the --inc <backend> flag at the command line, for each backend you want to use incrementally.
Set the IDRIS2_INC_CGS environment variable with a comma separated list of backends to use incrementally.
At the moment, only the Chez backend supports incremental builds.
Building modules incrementally
If either of the above are set, building a module will produce compiled binary code for all of the definitions in the module, as well as the usual checked TTC file. e.g.:
$ idris2 --inc chez Foo.idr
$ IDRIS2_INC_CGS=chez idris2 Foo.idr
On successful type checking, each of these will produce a Chez Scheme file
(Foo.ss
) and compiled code for it (Foo.so
) as well as the usual
Foo.ttc
, in the same build directory as Foo.ttc
.
In incremental mode, you will see a warning for any holes in the module, even if those holes will be defined in a different module.
Building executables incrementally
If either --inc
is used or IDRIS2_INC_CGS
is set, compiling to
an executable will attempt to link all of the compiled modules together,
rather than generating code for all of the functions at once. For this
to work, all the imported modules must have been built with incremental
compilation for the current back end (Idris will revert to whole program
compilation if any are missing, and you will see a warning.)
Therefore, all packages used by the executable must also have been built
incrementally for the current back end. The prelude
, base
,
contrib
, network
and test
packages are all built with
incremental compilation support for Chez by default.
When switching between incremental and whole program compilation, it is
recommended that you remove the build
directory first. This is
particularly important when switching to incremental compilation, since there
may be stale object files that Idris does not currently detect!
Overriding incremental compilation
The --whole-program
flag overrides any incremental compilation settings
when building an executable.
Performance note
Incremental compilation means that executables are generated much quicker,
especially when only a small proportion of modules have changed. However,
it means that there are fewer optimisation opportunities, so the resulting
executable will not perform as well. For deployment, --whole-program
compilation is recommended.
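If the backend supports it, you can generate profiling data by setting the profile flag, either by starting Idris with --profile, or by running :set profile at the REPL. The profile data generated will depend on the back end you are using. Currently, the Chez and Racket back ends support generating profile data.
There are five code generators provided in Idris 2, and there is a system for plugging in new code generators for a variety of target languages. The default is to compile via Chez Scheme, with an alternative via Racket or Gambit. You can set the code generator at the REPL with the :set codegen command, or via the IDRIS2_CG environment variable.
Chez Scheme Code Generator
The Chez Scheme code generator is the default, or it can be accessed via a REPL command: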
Main> :set cg chez
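By default, therefore, to run Idris programs you will need to install Chez Scheme. Chez Scheme is open source, and available via most OS package managers.
You can compile an expression expr of type IO () to an executable as follows, at the REPL: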
Main> :c execname expr
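…where execname is the name of the executable file. This will generate the following:
A shell script build/exec/execname which invokes the program
A subdirectory build/exec/execname_app which contains all the data necessary to run the program. This includes the Chez Scheme source (execname.ss), the compiled Chez Scheme code (execname.so) and any shared libraries needed for foreign function definitions.
The executable execname is relocatable to any subdirectory, provided that execname_app is also in the same subdirectory.
You can also execute an expression directly: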
Main> :exec expr
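Again, expr must have type IO (). This will generate a temporary executable script _tmpchez in the build/exec directory, and execute that.
Chez Scheme is the default code generator, so if you invoke idris2 with the -o execname flag, it will generate an executable script build/exec/execname, with support files in build/exec/execname_app.
Chez Directives
--directive extraRuntime=<path>
Embed Scheme source from <path> directly into the generated output. Can be specified more than once, in which case all given files will be included in the order specified.
; extensions.scm
(define (my-mul a b) (* a b))
-- Main.idr
%foreign "scheme:my-mul"
myMul : Int -> Int -> Int
$ idris2 --codegen chez --directive extraRuntime=/path/to/extensions.scm -o main Main.idr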
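Making a freestanding executable
It's possible to embed the Chez Scheme system and the built Idris 2 program into a freestanding executable with chez-exe.
Build and install the compile-chez-program-tool by running the configuration script, then run:
$ scheme --script gen-config.ss --bootpath <bootpath>
where <bootpath> is the path where the Chez Scheme boot files (petite.boot and scheme.boot) and scheme.h are located. More configuration is described in the chez-exe installation instructions.
Invoke compile-chez-program:
$ compile-chez-program --optimize-level 3 build/exec/my_idris_prog_app/my_idris_prog.ss
Note that it can only use the .ss file and not the .so file. To embed the full Chez Scheme system including the compiler, add the --full-chez option.
The finished executable still requires the libidris_support shared library. It's also possible to eliminate that dependency by linking it statically.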
Racket Code Generator
The Racket code generator is accessed via a REPL command:
Main> :set cg racket
Alternatively, you can set it via the IDRIS2_CG environment variable:
$ export IDRIS2_CG=racket
You can compile an expression expr of type IO () to an executable as follows, at the REPL:
Main> :c execname expr
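…where execname is the name of the executable file. This will generate the following:
A shell script build/exec/execname which invokes the program
A subdirectory build/exec/execname_app which contains all the data necessary to run the program. This includes the Racket source (execname.rkt), the compiled Racket code (execname, or execname.exe on Windows) and any shared libraries needed for foreign function definitions.
The executable execname is relocatable to any subdirectory, provided that execname_app is also in the same subdirectory.
You can also execute an expression directly: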
Main> :exec expr
同样, expr
必须具有 IO ()
类型。这将在 build/exec
目录中生成一个临时可执行脚本 _tmpracket
,并执行该脚本,而无需先编译为二进制文件(因此会解释生成的 Racket 代码)。
Racket 指令
--directive extraRuntime=<path>
将来自
<path>
的 Scheme 源代码直接嵌入到生成的输出中。可以多次指定,在这种情况下,所有给定的文件都将按指定的顺序包含。; extensions.scm (define (my-mul a b) (* a b))
-- Main.idr %foreign "scheme:my-mul" myMul : Int -> Int -> Int
$ idris2 --codegen chez --directive extraRuntime=/path/to/extensions.scm -o main Main.idr
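Gambit Scheme Code Generator
The Gambit Scheme code generator can be accessed via the REPL command:
Main> :set cg gambit
Alternatively, you can set it via the IDRIS2_CG environment variable:
$ export IDRIS2_CG=gambit
To run Idris programs with this generator, you will need to install Gambit Scheme. Gambit Scheme is free software, and available via most package managers.
You can compile an expression expr of type IO () to an executable as follows, at the REPL:
Main> :c execname expr
…where execname is the name of the executable file. This will generate the following:
An executable binary build/exec/execname of the program.
A Gambit Scheme source file build/exec/execname.scm, from which the binary is generated.
You can also execute an expression directly:
Main> :exec expr
Again, expr must have type IO (). This will generate a temporary Scheme file, and execute the Gambit interpreter on it.
Gambit Directives
--directive extraRuntime=<path>
Embed Scheme source from <path> directly into the generated output. Can be specified more than once, in which case all given files will be included in the order specified.
; extensions.scm
(define (my-mul a b) (* a b))
-- Main.idr
%foreign "scheme:my-mul"
myMul : Int -> Int -> Int
$ idris2 --codegen gambit --directive extraRuntime=/path/to/extensions.scm -o main Main.idr
--directive C
Compile to C.
Gambit Environment Configurations
GAMBIT_GSC_BACKEND
The GAMBIT_GSC_BACKEND variable controls which C compiler Gambit will use during compilation. For example, to use clang:
$ export GAMBIT_GSC_BACKEND=clang
Gambit versions after v4.9.3 support the -cc option, which configures the compiler backend Gambit will use to build the binary. Currently, to get this functionality Gambit needs to be built from source, since it is not yet available in a released version.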
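Javascript and Node Code Generators
There are two javascript code generators, node and javascript. There are two differences between them: the javascript code generator, when asked to output an HTML file, also generates a basic HTML document with the generated code inside a <script> tag; the other difference is in the ffi, which is explained below.
Javascript FFI Specifiers
There are three main kinds of javascript ffi specifiers: javascript, node and browser. javascript is for foreigns that are available on both node and the browser, node is for foreigns that are only available on node, and browser is for foreigns that are only available in the browser.
For node, the following two examples show how to define a foreign function: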
%foreign "node:lambda: n => process.env[n]"
prim_getEnv : String -> PrimIO (Ptr String)
Here lambda means that we are providing the definition as a lambda expression.
%foreign "node:lambda:fp=>require('fs').fstatSync(fp.fd, {bigint: false}).size"
prim__fileSize : FilePtr -> PrimIO Int
require can be used to import javascript modules.
For completeness, below is an example of a foreign function that is available only with the browser codegen:
%foreign "browser:lambda:x=>{document.body.innerHTML = x}"
prim__setBodyInnerHTML : String -> PrimIO ()
Short Example
An interesting example is creating a foreign for the setTimeout function:
%foreign "javascript:lambda:(callback, delay)=>setTimeout(callback, delay)"
prim__setTimeout : (PrimIO ()) -> Int -> PrimIO ()
setTimeout : HasIO io => IO () -> Int -> io ()
setTimeout callback delay = primIO $ prim__setTimeout (toPrim callback) delay
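Note: Previous versions of the javascript backend treated Int as a 64-bit signed integer, represented by BigInt in javascript land. This is no longer the case. Int is now treated as a 32-bit signed integer represented by Number. This should facilitate interop between Idris2 and the backend.
However, unless you have good reasons to do otherwise, consider using one of the other fixed-precision integral types. They should behave the same on all backends. All signed and unsigned integral types of up to 32 bits of precision (Int8, Int16, Int32, Bits8, Bits16, and Bits32) are represented by Number, while Int64, Bits64, and Integer are represented by BigInt. The example above can therefore be improved by using Int32 instead of Int: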
%foreign "javascript:lambda:(callback, delay)=>setTimeout(callback, delay)"
prim__setTimeout : (PrimIO ()) -> Int32 -> PrimIO ()
setTimeout : HasIO io => IO () -> Int32 -> io ()
setTimeout callback delay = primIO $ prim__setTimeout (toPrim callback) delay
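Browser Example
To build JavaScript aimed at use in the browser, the code must be compiled with the javascript codegen option. The compiler produces a JavaScript or an HTML file. The browser needs an HTML file to load. This HTML file can be created in two ways:
If the .html suffix is given for the output file, the compiler generates an HTML file which includes a wrapper around the generated JavaScript.
If no .html suffix is given, the generated file contains only the JavaScript code. In this case manual wrapping is needed.
Example of the wrapper HTML: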
<html>
<head><meta charset='utf-8'></head>
<body>
<script type='text/javascript'>
JS code goes here
</script>
</body>
</html>
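As our intention is to develop something that runs in the browser, some questions naturally arise:
How to interact with HTML elements?
More importantly, when does the generated Idris code start executing?
Starting point of the Idris generated code
The generated JavaScript for your program contains an entry point. The main function is compiled to a JavaScript top-level expression, which is evaluated when the script tag is loaded. That is the entry point for the Idris generated program in the browser.
Interaction with HTML elements
As sketched in the Short Example section, the FFI must be used whenever Idris generated code interacts with the rest of the browser/JS ecosystem. Information handled by the FFI falls into two categories. The first is the primitive types of the Idris FFI, such as Int. The second is everything else, which is accessed via AnyPtr. The %foreign construction should be used to give the implementation on the JavaScript side, together with an Idris function declaration that gives the type on the Idris side. The syntax is %foreign "browser:lambda:js-lambda-expression". On the Idris side, primitive types and PrimIO t types should be used as parameters when defining a %foreign. This declaration is a helper function which needs to be called behind the primIO function. More information on this can be found in the FFI chapter.
Examples for JavaScript FFI
console.log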
%foreign "browser:lambda: x => console.log(x)"
prim__consoleLog : String -> PrimIO ()
consoleLog : HasIO io => String -> io ()
consoleLog x = primIO $ prim__consoleLog x
String is a primitive type in Idris and it is represented as a JavaScript String. There is no need for any conversion between Idris and JavaScript.
setInterval
%foreign "browser:lambda: (a,i)=>setInterval(a,i)"
prim__setInterval : PrimIO () -> Int32 -> PrimIO ()
setInterval : (HasIO io) => IO () -> Int32 -> io ()
setInterval a i = primIO $ prim__setInterval (toPrim a) i
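The setInterval JavaScript function executes the given function every i milliseconds, where i is the interval argument. We can use Idris generated functions in the callback as long as they have the type IO ().
HTML Dom elements
Let's turn our attention to Dom elements and events. As said above, anything that is not a primitive type should be handled via the AnyPtr type in the FFI. Anything complex that is returned by a JavaScript function should be captured in an AnyPtr value. It is advisable to separate AnyPtr values into categories.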
data DomNode = MkNode AnyPtr
%foreign "browser:lambda: () => document.body"
prim__body : () -> PrimIO AnyPtr
body : HasIO io => io DomNode
body = map MkNode $ primIO $ prim__body ()
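We create a DomNode type which holds an AnyPtr. The prim__body function wraps a lambda function with no parameters. In the Idris function body we pass an extra () parameter, and we wrap the result in the DomNode type using the MkNode data constructor.
Primitive values originating in JavaScript
As a continuation of the previous example, the width attribute of a DOM element can be retrieved via the FFI.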
%foreign "browser:lambda: n=>(n.width)"
prim__width : AnyPtr -> PrimIO Bits32
width : HasIO io => DomNode -> io Bits32
width (MkNode p) = primIO $ prim__width p
Handling JavaScript events
data DomEvent = MkEvent AnyPtr
%foreign "browser:lambda: (event, callback, node) => node.addEventListener(event, x=>callback(x)())"
prim__addEventListener : String -> (AnyPtr -> PrimIO ()) -> AnyPtr -> PrimIO ()
addEventListener : HasIO io => String -> DomNode -> (DomEvent -> IO ()) -> io ()
addEventListener event (MkNode n) callback =
  primIO $ prim__addEventListener event (\ptr => toPrim $ callback $ MkEvent ptr) n
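This example shows how to attach an event handler to a particular DOM element. Values of events are also associated with AnyPtr on the Idris side. To separate DomNode from DomEvent we create two different types. It also demonstrates how a simple callback function defined in Idris can be used on the JavaScript side.
Directives
The javascript code generators accept three different directives that control how compact and how obfuscated the generated code should be. The following examples show the code generated for the putStr function from the prelude for each of the three directives. (--cg node is used in the examples below, but the behavior is the same when generating code to be run in browsers with --cg javascript.)
With idris2 --cg node --directive pretty (the default, if no directive is given), a basic pretty printer is used to generate properly indented javascript code.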
function Prelude_IO_putStr($0, $1) {
return $0.a2(undefined)($7 => Prelude_IO_prim__putStr($1, $7));
}
With idris2 --cg node --directive compact, every toplevel function is declared on a single line, and unneeded spaces are removed:
function Prelude_IO_putStr($0,$1){return $0.a2(undefined)($7=>Prelude_IO_prim__putStr($1,$7));}
Finally, with idris2 --cg node --directive minimal, toplevel function names (with a few exceptions, such as the ones from the static preamble) are mangled to reduce the size of the generated javascript file:
function $R3a($0,$1){return $0.a2(undefined)($7=>$R3b($1,$7));}
C with Reference Counting
There is an experimental code generator which compiles to an executable via C, using a reference counting garbage collector. This is intended as a lightweight (i.e. minimal dependencies) code generator that can be ported to multiple platforms, especially those with memory constraints.
Performance is not as good as the Scheme based code generators, partly because the reference counting has not yet had any optimisation, and partly because of the limitations of C. However, the main goal is portability: the generated code should run on any platform that supports a C compiler.
This code generator can be accessed via the REPL command:
Main> :set cg refc
Alternatively, you can set it via the IDRIS2_CG environment variable:
$ export IDRIS2_CG=refc
The C compiler it invokes is determined by either the IDRIS2_CC
or CC
environment variables. If neither is set, it uses cc
.
This code generator does not yet support :exec, just :c.
Also note that, if you link with any dynamic libraries for interfacing with
C, you will need to arrange for them to be accessible via LD_LIBRARY_PATH
when running the executable. The default Idris 2 support libraries are
statically linked.
Extending RefC
RefC can be extended to produce a new backend for languages that support C foreign functions. For example, a Python backend for Idris.
In your backend, use the Compiler.RefC
functions generateCSourceFile
,
compileCObjectFile {asLibrary = True}
, and
compileCFile {asShared = True}
to generate a .so
shared object file.
_ <- generateCSourceFile defs cSourceFile
_ <- compileCObjectFile {asLibrary = True} cSourceFile cObjectFile
_ <- compileCFile {asShared = True} cObjectFile cSharedObjectFile
To run a compiled Idris program, call the int main(int argc, char *argv[])
function in the compiled .so
file, with the arguments you wish to pass to
the running program.
For example, in Python:
import ctypes
import sys
argc = len(sys.argv)
argv = (ctypes.c_char_p * argc)(*map(str.encode, sys.argv))
cdll = ctypes.CDLL("main.so")
cdll.main(argc, argv)
Extending RefC FFIs
To make the generated C code recognize additional FFI languages beyond the
standard RefC FFIs, pass the additionalFFILangs
option to
generateCSourceFile
, with a list of the language identifiers your backend
recognizes.
_ <- generateCSourceFile {additionalFFILangs = ["python"]} defs cSourceFile
This will generate stub FFI function pointers in the generated C file, which
your backend should set to the appropriate C functions before main
is
called.
Each %foreign "lang: foreignFuncName, opts"
definition for a function
will produce a stub, of the appropriate function pointer type. This stub will
be called cName $ NS (mkNamespace lang) funcName
, where funcName
is the
fully qualified Idris name of that function.
So the %foreign
function
%foreign "python: abs"
abs : Int -> Int
produces a stub python_Main_abs
, which can be backpatched in Python by:
abs_ptr = ctypes.CFUNCTYPE(ctypes.c_int64, ctypes.c_int64)(abs)
ctypes.c_void_p.in_dll(cdll, "python_Main_abs").value = ctypes.cast(abs_ptr, ctypes.c_void_p).value
Building Idris 2 with new backends
The way to extend Idris 2 with new backends is to use it as
a library. The module Idris.Driver
exports the function
mainWithCodegens
, that takes a list of (String, Codegen)
,
starting idris with these codegens in addition to the built-in ones. The first
codegen in the list will be set as the default codegen.
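Getting started
To use Idris 2 as a library you need a self-hosting installation, and then you need to install the idris2api library (at the top level of the Idris2 repo):
make install-api
Now create a file containing the following: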
module Main
import Core.Context
import Compiler.Common
import Idris.Driver
import Idris.Syntax
compile :
  Ref Ctxt Defs -> Ref Syn SyntaxInfo ->
  (tmpDir : String) -> (execDir : String) ->
  ClosedTerm -> (outfile : String) -> Core (Maybe String)
compile defs syn tmp dir term file
  = do coreLift $ putStrLn "I'd rather not."
       pure Nothing
execute :
  Ref Ctxt Defs -> Ref Syn SyntaxInfo ->
  (execDir : String) -> ClosedTerm -> Core ()
execute defs syn dir term = do coreLift $ putStrLn "Maybe in an hour."
lazyCodegen : Codegen
lazyCodegen = MkCG compile execute Nothing Nothing
main : IO ()
main = mainWithCodegens [("lazy", lazyCodegen)]
Build it:
$ idris2 -p idris2 -p contrib -p network Lazy.idr -o lazy-idris2
Now you have an idris2 with an added backend.
$ ./build/exec/lazy-idris2
____ __ _ ___
/ _/___/ /____(_)____ |__ \
/ // __ / ___/ / ___/ __/ / Version 0.2.0-bd9498c00
_/ // /_/ / / / (__ ) / __/ https://www.idris-lang.org
/___/\__,_/_/ /_/____/ /____/ Type :? for help
Welcome to Idris 2. Enjoy yourself!
With codegen for: lazy
Main>
It will not be overly eager to actually compile any code with the new backend, though:
$ ./build/exec/lazy-idris2 --cg lazy Hello.idr -o hello
I'd rather not.
$
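About the directories
The code generator can assume that both tmpDir and outputDir exist. tmpDir is intended for temporary files, while outputDir is the location to put the desired output files. By default, tmpDir and outputDir point to the same directory (build/exec). The directories can be set from the package description (see the Packages section) or via command line options (listed in idris2 --help).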
Custom backend cookbook
This document addresses the details on how to implement a custom code generation backend for the Idris compiler.
This part has no insights about how to implement the dependently typed bits. For that part of the compiler Edwin Brady gave lectures at SPLV20 which are available online.
The architecture of the Idris2 compiler makes it easy to implement a custom code generation back-end.
The way to extend Idris with new back-ends is to use it as a library.
The module Idris.Driver
exports the function mainWithCodegens
,
that takes a list of (String, Codegen)
, starting idris with these codegens
in addition to the built-in ones.
The first codegen in the list will be set as the default codegen.
Anyone who is interested in implementing a custom back-end needs to answer the following questions:
Which Intermediate Representation (IR) should be consumed by the custom back-end?
How to represent primitive values defined by the Core.TT.Constant type?
How to represent Algebraic Data Types?
How to implement special values?
How to implement primitive operations?
How to compile IR expressions?
How to compile Definitions?
How to implement Foreign Function Interface?
How to compile modules?
How to embed code snippets?
What should the runtime system support?
First of all, we should know that Idris2 is not an optimizing compiler. Currently its focus is only to compile dependently typed functional code in a timely manner. Its main purpose is to check whether the given program is correct in a dependently typed setting and to generate code in the form of a lambda-calculus-like IR where higher-order functions are present. Idris has four intermediate representations for code generation. At every level we get a simpler representation, closer to machine code, but it should be stressed that all aggressive code optimizations should happen in the custom back-ends. The quality and readability of the generated back-end code rests on the shoulders of the implementor of the back-end. Idris erases type information in the IRs, because it compiles to Scheme by default and there is no need to keep the type information around. With this in mind let’s answer the questions above.
The architecture of an Idris back-end
Idris compiles its dependently typed front-end language into a representation
called Core.TT.Term.
This data type has a few constructors and it represents a dependently typed
term.
This Term
is transformed to Core.CompileExpr.CExp
which has more
constructors than Term
and it is a very similar construct to a lambda
calculus with let bindings, structured and tagged data representation,
primitive operations, external operations, and case expressions.
The CExp
is closer in the compiling process to code generation.
The custom code generation back-end gets:
a context of definitions,
a temporary directory and an output directory,
a Core.TT.ClosedTerm to compile and a path to an output file.
compile : Ref Ctxt Defs -> (tmpDir : String) -> (outputDir : String)
        -> ClosedTerm -> (outfile : String) -> Core (Maybe String)
compile defs tmpDir outputDir term file = ?
The ClosedTerm
is a special Term
where the list of the unbound
variables is empty.
This technicality is not important for the code generation of the custom
back-end as the back-end needs to call the getCompileData
function
which produces the Compiler.Common.CompileData
record.
The CompileData
contains:
A main expression that will be the entry point for the program in
CExp
A list of
Core.CompileExpr.NamedDef
A list of lambda-lifted definitions
Compiler.LambdaLift.LiftedDef
A list of
Compiler.ANF.ANFDef
A list of
Compiler.VMCode.VMDef
definitions
These lists contain:
函数
Top-level data definitions
Runtime crashes which represent unfilled holes, explicit calls by the user to idris_crash, and unreachable branches in case trees
Foreign call constructs
The job of the custom code generation back-end is to transform one of the phase
encoded definitions (NamedDef
, LiftedDef
, CExp
, ANF
, or VM
)
into the intermediate representation of the code generator.
It can then run optimizations and generate some form of executable.
In summary, the code generator has to understand how to represent tagged data
and function applications (even if the function application is partial), how
to handle let expressions, how to implement and invoke primitive operations,
how to handle Erased
arguments, and how to do runtime crashes.
The implementor of the custom back-end should pick the Idris IR that best fits the abstractions of the technology being targeted. The implementor should also consider how to transform the simple main expression, which is represented in CExp. As Idris does not focus on memory management and threading, the custom back-end should model these concepts for the compiled program. One possible approach is to target a fairly high level language and reuse as much as possible from it for the custom back-end. Another possibility is to implement a runtime that is capable of handling memory management and threading.
Which Intermediate Representation (IR) should be consumed by the custom back-end?
Now let's turn our attention to the different intermediate representations (IRs)
that Idris provides.
When the getCompileData function is invoked with the Phase parameter,
it produces a CompileData record, which contains lists of top-level
definitions that need to be compiled. These are:
NamedDef
LiftedDef
ANFDef
VMDef
The question to answer here is: Which one should be picked? Which one fits to the custom back-end?
How to represent primitive values defined by the Core.TT.Constant
type?
After one selects the IR to be used during code generation, the next question to answer is how primitive types should be represented in the back-end. Idris has the following primitive types:
Int
Integer (arbitrary precision)
Bits(8/16/32/64)
Char
String
Double
WorldVal (token for IO computations)
Because Idris allows pattern matching on types, each primitive type has a counterpart constant that describes the type itself:
IntType
IntegerType
Bits(8/16/32/64)Type
StringType
CharType
DoubleType
WorldType
The representation of these primitive types should be a well-thought-out design decision, as it affects many parts of code generation: values must be converted to and from back-end values when the FFI is involved, and a large part of the data at runtime is represented in these forms. The representation of primitive types also affects the possible optimisation techniques, as well as memory management and garbage collection.
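As one possible illustration of such a design decision, the sketch below shows how a hypothetical back-end hosted in Python might keep fixed-width integers in range, given that Python integers are arbitrary precision. The helper names wrap_bits, wrap_int and add_bits8 are assumptions made for this example and are not part of Idris; wrap-around semantics on overflow are also assumed.

```python
# Hypothetical helpers for a Python-hosted back-end (all names are assumptions).
# Python ints are arbitrary precision, so Integer maps onto them directly,
# while fixed-width types need explicit wrapping after each operation.

def wrap_bits(value: int, width: int) -> int:
    """Reduce value to an unsigned integer of the given bit width."""
    return value & ((1 << width) - 1)


def wrap_int(value: int, width: int) -> int:
    """Reduce value to a two's-complement signed integer of the given width."""
    unsigned = wrap_bits(value, width)
    sign_bit = 1 << (width - 1)
    return unsigned - (1 << width) if unsigned & sign_bit else unsigned


def add_bits8(x: int, y: int) -> int:
    """Addition on an 8-bit unsigned value, wrapping around at 256."""
    return wrap_bits(x + y, 8)
```

A host with native fixed-width integers would not need such helpers, which is one reason the choice of representation depends so heavily on the target.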
There are two special primitive types: String and World.
String
As its name suggests, this type represents a string of characters. As mentioned in Primitive FFI Types, Strings are encoded in UTF-8.
It is not always clear who is responsible for freeing a String created by
a component other than the Idris runtime. Strings created in Idris always
have a value, unlike some String representations in the host technology,
where for example a NULL pointer can stand for a string; this cannot happen on the Idris side.
This constrains the possible representations of Strings in the
custom back-end, and diverging from the Idris representation is not a good idea.
The best approach here is to build a conversion layer between the string
representation of the custom back-end and the runtime.
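A conversion layer of this kind could look roughly like the following sketch for a hypothetical Python-hosted back-end. The function names are invented for illustration, and mapping a missing host string to the empty string is only one possible policy; raising an error would be equally defensible.

```python
from typing import Optional


def host_to_idris_string(raw: Optional[bytes]) -> str:
    """Convert a host-side UTF-8 buffer, which may be missing, to a string value."""
    if raw is None:
        # Assumption: a NULL-like host value becomes the empty string,
        # because an Idris String always has a value.
        return ""
    return raw.decode("utf-8")


def idris_to_host_string(text: str) -> bytes:
    """Encode a string value as UTF-8 for the host side."""
    return text.encode("utf-8")
```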
World
In pure functional programming, causality needs to be represented whenever we want to maintain the order in which subexpressions are executed. In Idris a token is used to chain IO function calls. This is an abstract notion about the state of the world. For example this information could be the information that the runtime needs for bookkeeping of the running program.
The WorldVal
value in Idris programs is accessed via the primIO
construction which leads us to the PrimIO
module.
Let’s see the relevant snippets:
data IORes : Type -> Type where
     MkIORes : (result : a) -> (1 x : %World) -> IORes a
fromPrim : (1 fn : (1 x : %World) -> IORes a) -> IO a
fromPrim op = MkIO op
primIO : HasIO io => (1 fn : (1 x : %World) -> IORes a) -> io a
primIO op = liftIO (fromPrim op)
The world value is referenced as %World
in Idris.
It is created by the runtime when the program starts.
Its content is changed by the custom runtime.
More precisely, the World is created when the WorldVal
is evaluated during
the execution of the program.
This can happen when the program gets initialized or when an unsafePerformIO
function is executed.
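To make the token-passing idea concrete, here is a minimal sketch in Python of how a back-end might model PrimIO actions as functions that take a world token and return a result paired with the token. WORLD, put_line, bind and run_main are hypothetical names for this illustration; a real back-end may well optimise the token away entirely.

```python
WORLD = object()  # stand-in for %World; it carries no data in this sketch


def put_line(line, log):
    """A PrimIO-style action: a function that still awaits the world token."""
    def run(world):
        log.append(line)          # the side effect happens only when run
        return (None, world)      # plays the role of MkIORes result world
    return run


def bind(action, continuation):
    """Sequence two actions by threading the world token through both."""
    def run(world):
        result, next_world = action(world)
        return continuation(result)(next_world)
    return run


def run_main(action):
    """Create the world token once and hand it to the main action."""
    result, _ = action(WORLD)
    return result
```

Because the second action can only run once it has received the token returned by the first, the order of effects is fixed by data dependency alone.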
How to represent Algebraic Data Types?
In Idris there are two different ways to define a data type: tagged unions are
introduced using the data
keyword while structs are declared via the
record
keyword.
Declaring a record
amounts to defining a named collection of fields.
Let’s see examples for both:
data Either a b
  = Left a
  | Right b
record Pair a b where
  constructor MkPair
  fst : a
  snd : b
Idris offers not only algebraic data types but also indexed families. These
are tagged unions where different constructors may have different return types.
Vect is an example of an indexed family: it corresponds to a linked list
whose length is known at compile time.
It has one index (of type Nat) representing the length of the list (the
value of this index is therefore different for the [] and (::)
constructors) and a parameter (of type Type) corresponding to the type
of values stored in the list.
data Vect : (size : Nat) -> Type -> Type where
  Nil  : Vect 0 a                         -- empty list: size is 0
  (::) : a -> Vect n a -> Vect (1 + n) a  -- extending a list of size n: size is 1+n
Both data and record are compiled to constructors in the intermediate
representations. Two examples of such Constructors are
Core.CompileExpr.CExp.CCon
and Core.CompileExpr.CDef.MkCon
.
Compiling the Either data type will produce three constructor definitions in the IR:
One for the Either type itself, with an arity of two. The arity tells how many parameters the constructor should have. Two is reasonable in this case, as the original Idris Either type has two parameters.
One for the Left constructor, with an arity of three. Three may be surprising, as the constructor has only one argument in Idris, but we should keep in mind the type parameters of the data type too.
One for the Right constructor, with an arity of three.
In the IR, constructors have unique names. For efficiency reasons,
Idris assigns a unique integer tag to each data constructor so that constructor
matching is reduced to comparisons of integers instead of strings.
In the Either
example above Left
gets tag 0 and Right
gets tag 1.
Constructors can be considered structured information: a name
together with parameters.
The custom back-end needs to decide how to represent such data.
For example using Dict
in Python, JSON
in JavaScript, etc.
The most important aspect to consider is that these structured values
are heap-allocated values, which should be created and stored dynamically.
If there is an easy way to map them onto the host technology, its memory
management can be inherited for these values. If not, the custom back-end is
responsible for implementing appropriate memory management.
For example, RefC is a C backend that implements its own memory management
based on reference counting.
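As a hedged illustration of one such representation, the sketch below models constructor values in Python as a small record holding the integer tag, the name and the arguments. The Con class and the left and right helpers are assumptions for this example, and erased type arguments are assumed to have been dropped already. Memory management is inherited from the Python runtime here.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Con:
    """A constructor value: integer tag, readable name, runtime arguments."""
    tag: int
    name: str
    args: Tuple[Any, ...]


def left(value):
    # Left gets tag 0, matching the Either example in the text.
    return Con(0, "Left", (value,))


def right(value):
    # Right gets tag 1.
    return Con(1, "Right", (value,))
```

Keeping the name alongside the tag is optional; it only helps with debugging output, since matching can rely on the tag alone.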
How to implement special values?
Apart from data constructors, there are two special kinds of values present
in the Idris IRs: type constructors and Erased.
Type constructors
Type and data constructors that are not relevant to the program’s runtime behaviour may be used at compile time but will be erased from the intermediate representation.
However some type constructors need to be kept around even at runtime because pattern matching on types is allowed in Idris:
notId : {a : Type} -> a -> a
notId {a=Int} x = x + 1
notId x = x
Here we can pattern match on a
and ensure that notId
behaves differently
on Int
than all the other types.
This will generate an IR that will contain a Case
expression with two
branches:
one Alt
matching on the Int
type constructor
and a default for the non-Int
matching part of the notId
function.
This is not that special: Type
is a bit like an infinite data type that
contains all of the types a user may ever declare or use.
This can be handled in the back-end and host language using the same mechanisms
that were mobilised to deal with data constructors.
The reason for using the same approach is that in dependently typed languages,
the same language is used to form both type and value level expressions.
Compilation of type level terms will be the same as that of value level terms.
This is one of the things that make dependently typed abstraction elegant.
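The notId example can be mimicked in a host language by passing the type as an ordinary runtime value. The following Python sketch assumes, purely for illustration, that primitive types are represented as one-element tuples named after the type constants; not_id, INT_TYPE and STRING_TYPE are hypothetical names.

```python
# Assumed runtime representation of two primitive type constants.
INT_TYPE = ("IntType",)
STRING_TYPE = ("StringType",)


def not_id(a, x):
    """Mirror of notId: the implicit type argument a is passed explicitly."""
    # One alternative matches the Int type constructor...
    if a == INT_TYPE:
        return x + 1
    # ...and the default alternative covers every other type.
    return x
```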
Erased
The other kind of special value is Erased
.
This is generated by the Idris compiler and part of the IR if the original value
is only needed during the type elaboration process. For example:
data Subset : (type : Type)
           -> (pred : type -> Type)
           -> Type
  where
    Element : (value : type)
           -> (0 prf : pred value)
           -> Subset type pred
Because prf
has quantity 0
, it is guaranteed to be erased during
compilation and thus not present at runtime.
Therefore prf
will be represented as Erased
in the IR.
The custom back-end needs to represent this value like any other data value,
as it can occur in place of normal values.
The simplest approach is to implement it as a special data constructor and let
the optimizations provided by the host technology take care of its removal.
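One simple way to follow this advice, sketched here in Python under the assumption of a Python-hosted back-end, is a singleton placeholder object. The names Erased, ERASED, element and subset_value are invented for this example.

```python
class Erased:
    """Placeholder for arguments that were erased at compile time."""
    _instance = None

    def __new__(cls):
        # Keep a single shared instance so erased values cost no allocation.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "Erased"


ERASED = Erased()


def element(value, proof=ERASED):
    """Build a Subset value; the proof slot only ever holds the placeholder."""
    return ("Element", value, proof)


def subset_value(subset):
    """Project the runtime-relevant field out of a Subset value."""
    _name, value, _proof = subset
    return value
```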
How to implement primitive operations?
Primitive operations are defined in the module Core.TT.PrimFn
.
The constructors of this data type represent the primitive operations that
the custom back-end needs to implement.
These primitive operations can be grouped as:
Arithmetic operations (Add, Sub, Mul, Div, Mod, Neg)
Bit operations (ShiftL, ShiftR, BAnd, BOr, BXor)
Comparison operations (LT, LTE, EQ, GTE, GT)
String operations (Length, Head, Tail, Index, Cons, Append, Reverse, Substr)
Double precision floating point operations (Exp, Log, Sin, Cos, Tan, ASin, ACos, ATan, Sqrt, Floor, Ceiling)
Casting of numeric and string values
An unsafe cast operation BelieveMe
A Crash operation taking a type and a string and creating a value at that type by raising an error.
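A back-end often implements these groups through a lookup from operation to host function. The Python sketch below covers only a handful of the operations listed above and keys them by plain strings; the table layout, the apply_prim helper and the decision to ignore the type each operation is instantiated at are all simplifying assumptions.

```python
import operator


def crash(_type, message):
    """Crash takes a type and a string; the type is unused at runtime here."""
    raise RuntimeError(message)


# Partial table: only a few of the listed primitives, keyed by name.
PRIM_OPS = {
    "Add": operator.add,
    "Sub": operator.sub,
    "Mul": operator.mul,
    "Neg": operator.neg,
    "BAnd": operator.and_,
    "ShiftL": operator.lshift,
    "Length": len,
    "Append": operator.add,
    "Reverse": lambda s: s[::-1],
    "Crash": crash,
}


def apply_prim(name, *args):
    """Look up a primitive by name and apply it to its arguments."""
    try:
        implementation = PRIM_OPS[name]
    except KeyError:
        raise NotImplementedError(f"primitive {name} is not supported") from None
    return implementation(*args)
```

A complete back-end would also have to respect the width of each numeric type, which this table does not attempt.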
BelieveMe
The primitive believe_me
is an unsafe cast that allows users to bypass the
typechecker when they know something to be true even though it cannot be proven.
For instance, assuming that Idris’ primitives are correctly implemented, it
should be true that if a boolean equality test on two Int
i
and j
returns True
then i
and j
are equal.
Such a theorem can be implemented by using believe_me
to cast Refl
(the constructor for proofs of a propositional equality) from i === i
to
i === j
. In this case, it should be safe to implement.
Boxing
Idris assumes that the back-end representation of the data is not strongly typed and that all data types have the same kind of representation. This can constrain the representation of primitives and of constructor-based data types. One possible solution is for the custom back-end to represent primitive data types the same way it does constructors, using special tags. This is called boxing.
Official backends represent primitive data types as boxed ones.
RefC: Boxes the primitives, which makes them easy to put on the heap.
Scheme: Prints the values that are a
Constant
as Scheme literals.
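Boxing can be sketched in a few lines of Python. The Box class and its helpers are hypothetical and only illustrate the idea of giving primitives the same tagged shape as constructor values; they do not describe how RefC or the Scheme back-ends are actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A primitive value wrapped with a tag, like a constructor value."""
    tag: str
    value: object


def box_int(n):
    return Box("Int", n)


def unbox(box, expected_tag):
    """Extract the payload, checking that the tag is the expected one."""
    if box.tag != expected_tag:
        raise TypeError(f"expected {expected_tag}, got {box.tag}")
    return box.value


def add_int(a, b):
    """Primitive addition on boxed Int values: unbox, compute, box again."""
    return box_int(unbox(a, "Int") + unbox(b, "Int"))
```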
How to compile top-level definitions?
As mentioned earlier, Idris has 4 different IRs that are available in
the CompileData
record: Named
, LambdaLifted
, ANF
, and VMDef
.
When assembling the CompileData
we have to tell the Idris compiler which
level we are interested in.
The CompileData contains lists of definitions that can be considered
top-level definitions, for which the custom back-end needs to generate functions.
There are four types of top-level definitions that the code generation back-end needs to support:
Function
Constructor
Foreign call
Error
Function contains a lambda calculus like expression.
Constructor represents a data or a type constructor, and it should be implemented as a function creating the corresponding data structure in the custom back-end.
A top-level foreign call defines an entry point for calling functions implemented outside the Idris program under compilation. The Foreign construction contains a list of Strings which are the snippets defined by the programmer, the types of the arguments and the return type of the foreign function. The custom back-end should generate a wrapper function. More on this in How to implement the Foreign Function Interface?
A top-level error definition represents holes in Idris programs, uses of
idris_crash
, or unreachable branches in a case tree.
Users may want to execute incomplete programs for testing purposes which is
fine as long as we never actually need the value of any of the holes.
Library writers may want to raise an exception if an unrecoverable error has
happened.
Finally, Idris compiles the unreachable branches of a case tree to runtime
error as it is dead code anyway.
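The four kinds of top-level definition can be pictured as a dispatch in the code generator. The following Python sketch is deliberately simplified: definitions are plain dictionaries with invented keys, and "compiling" just produces a Python callable rather than emitting source text.

```python
def compile_def(definition):
    """Turn one simplified top-level definition into a host callable."""
    kind = definition["kind"]
    if kind == "function":
        # In this sketch the body is already a Python callable.
        return definition["body"]
    if kind == "constructor":
        tag, arity = definition["tag"], definition["arity"]

        def make(*args):
            if len(args) != arity:
                raise TypeError(f"constructor expects {arity} arguments")
            return (tag,) + args
        return make
    if kind == "foreign":
        # A wrapper around a function that lives outside the compiled program.
        host_function = definition["impl"]
        return lambda *args: host_function(*args)
    if kind == "error":
        message = definition["message"]

        def fail(*_args):
            raise RuntimeError(message)
        return fail
    raise ValueError(f"unknown definition kind {kind}")
```

Note that an error definition only fails when it is called, which matches the observation that incomplete programs run fine as long as no hole is actually evaluated.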
How to compile IR expressions?
The custom back-end should decide which intermediate representation is used as a starting point. The result of the transformation should be expressions and functions of the host technology.
Definitions in ANF and Lifted are represented as tree-like expressions,
where control flow is based on the Let and Case expressions.
Case expressions
There are two types of case expressions,
one for matching and branching on primitive values such as Int
,
and the second one is matching and branching on constructor values.
The two types of case expressions have two different representations for
their alternatives. These are ConstCase (for matching on constant
values) and ConCase (for matching on constructors).
Matching on constructors can be implemented as matching on their tags or, less efficiently, as matching on the name of the constructor. In both cases a match should bind the values of the constructor’s arguments to variables in the body of the matching branch. This can be implemented in various ways depending on the host technology: switch expressions, case with pattern matching, or if-then-else chains.
When pattern matching binds variables, the number of arguments can be different
from the arity of the constructor defined in top-level definitions and in
GlobalDef
. This is because all the arguments are kept around at typechecking
time, but the code generator for the case tree removes the ones which are marked
as erased. The code generator of the custom back-end also needs to remove the
erased arguments in the constructor implementation.
In GlobalDef, eraseArg contains this information, which can be used to
determine the number of arguments that need to be kept around.
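The two kinds of case expression might be evaluated along the following lines in a Python-hosted back-end. Constructor values are assumed to be tuples whose first component is the integer tag, with erased arguments already removed; con_case, const_case and describe are hypothetical names.

```python
def con_case(scrutinee, alternatives, default=None):
    """Match on a constructor tag and bind the fields to the branch's parameters."""
    tag, *fields = scrutinee
    branch = alternatives.get(tag)
    if branch is not None:
        return branch(*fields)
    if default is None:
        raise RuntimeError(f"no alternative for tag {tag}")
    return default()


def const_case(value, alternatives, default):
    """Match on a primitive constant; every branch takes no arguments."""
    return alternatives.get(value, default)()


def describe(either):
    """Example: a case tree over Either, with Left as tag 0 and Right as tag 1."""
    return con_case(either, {
        0: lambda err: "Left " + str(err),
        1: lambda val: "Right " + str(val),
    })
```

Using a dictionary keyed by tag is just one option; an if-then-else chain or a native switch would serve equally well.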
Creating values
Values can be created in two ways.
If the value is a primitive value, it will be handed to the back-end as
a PrimVal. It should be compiled to a constant in the host language,
following the design decisions made in the
‘How to represent primitive values defined by the Core.TT.Constant type?’ section.
If it is a structured value (i.e. a Con), it should be compiled to a function
in the host language which creates a dynamic value. The design decisions made in
‘How to represent Algebraic Data Types?’ apply here.
Function calls
There are four types of function calls:
- Saturated function calls (all the arguments are there)
- Under-applied function calls (some arguments are missing)
- Primitive function calls (necessarily saturated, PrimFn
constructor)
- Foreign Function calls (referred to by its name)
The ANF
and Lifted
intermediate representations support under-applied
function calls (using the UnderApp
constructor in both IR).
The custom back-end needs to support partial application of functions and
creating closures in the host technology.
This is not a problem with back-ends like Scheme where we get the partial
application of a function for free.
But if the host language does not have this tool in its toolbox, the custom
back-end needs to simulate closures.
One possible solution is to manufacture a closure as a special object storing
the function and the values it is currently applied to and wait until all the
necessary arguments have been received before evaluating it.
The same approach is needed if the VMCode
IR was chosen for code generation.
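The closure object described above can be sketched as follows. This Python version is an assumption-laden illustration: the Closure class is invented, and over-application (supplying more arguments than the arity) is simply rejected rather than handled.

```python
class Closure:
    """A function together with the arguments collected so far."""

    def __init__(self, function, arity, args=()):
        self.function = function
        self.arity = arity
        self.args = tuple(args)

    def apply(self, *more):
        """Add arguments; evaluate only once all of them have been received."""
        args = self.args + more
        if len(args) < self.arity:
            # Still under-applied: return a new closure holding more arguments.
            return Closure(self.function, self.arity, args)
        if len(args) > self.arity:
            raise TypeError("too many arguments in this simplified sketch")
        return self.function(*args)
```

For example, a three-argument function wrapped as Closure(f, 3) can receive its arguments one at a time or in groups, and f runs exactly once, when the third argument arrives.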
Let bindings
Both the ANF and Lifted intermediate representations have a Let construct that lets users assign values to local variables. These two IRs differ in their representation of bound variables.
Lifted is a type family indexed by the List Name of local variables in scope. A variable is represented using LLocal, a constructor that stores a Nat together with a proof that it points to a valid name in the local scope.
ANF is a lower-level representation where these kinds of guarantees are no longer present. A local variable is represented using the AV constructor, which stores an AVar whose definition we include below.
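The definition itself appears to be missing from this copy of the text. As a hedged reconstruction based only on the two constructors described next (ALocal carrying an Int, ANull carrying nothing), it has roughly the following shape. The authoritative version is in the compiler sources.

```idris
-- Reconstructed sketch of AVar. The constructor names and the Int payload
-- come from the surrounding text; the exact layout is an assumption.
data AVar : Type where
  ALocal : Int -> AVar   -- index of a local variable
  ANull  : AVar          -- stands for an erased variable
```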
The ALocal constructor stores an Int that corresponds to the Nat we would have seen in Lifted.
The ANull constructor refers to an erased variable, and its representation in the host language will depend on the design choices made in the ‘How to represent Erased values’ section.
VMDef specificities
VMDef is meant to be the closest IR to machine code. In VMDef, all the definitions have been compiled to instructions for a small virtual machine with registers and closures.
Instead of Let expressions, there are only ASSIGN statements at this level.
Instead of Case expressions binding variables when they successfully match on a data constructor, CASE picks a branch based on the constructor itself. An extra operation called PROJECT is introduced to explicitly extract a constructor’s argument based on its position.
There are no App or UnderApp. Both are replaced by APPLY, which applies only one value and creates a closure from the application. For erased values, the operation NULL assigns an empty/null value to the register.
How to implement the Foreign Function Interface?
The Foreign Function Interface (FFI) plays a big role in running Idris programs.
The primitive operations which are mentioned above are functions for
manipulating values and those functions aren’t meant for complex interaction
with the runtime system.
Many of the primitive types can be thought of as abstract types provided via external, with foreign functions to manipulate them.
The responsibility of the custom back-end and the host technology is to represent these computations in an operationally correct way. The design decisions with respect to representing primitive types in the host technology will inevitably affect the design of the FFI.
Foreign Types
Originally Idris had an official back-end implementation in C. Even though this has changed, the names in the types for the FFI kept their C prefix. Core.CompileExpr.CFType contains the following definitions. Many of them map one-to-one to the corresponding primitive type, but some of them need explanation.
The foreign types are:
- CFUnit
- CFInt
- CFUnsigned(8/16/32/64)
- CFString
- CFDouble
- CFChar
- CFFun, of type CFType -> CFType -> CFType. Callbacks can be registered in the host technology via parameters that have CFFun type. The back-end should be able to handle functions that are defined on the Idris side and compiled to the host technology. If the custom back-end supports higher-order functions, they should be used to implement support for this kind of FFI type.
- CFIORes, of type CFType -> CFType. Any computation defined with PrimIO will have this extra layer. Pure functions shouldn’t have any observable IO effect on the program state in the runtime implemented in the host technology. NOTE: IORes is also used when callback functions are registered in the host technology.
- CFWorld. Represents the current state of the world. This should refer to a token that is passed around between function calls. The implementation of the World value should contain back-end specific values and information about the state of the Idris runtime.
- CFStruct, of type String -> List (String, CFType) -> CFType. This is the foreign type associated with System.FFI.Struct. It represents a C-like structure in the custom back-end. The prim__getField and prim__setField primitives should be implemented to support this CFType.
- CFUser, of type Name -> List CFType -> CFType. Types defined with [external] are represented with CFUser. For example, data MyType : Type where [external] will be represented as CFUser Module.MyType [].
- CFBuffer. The foreign type defined for Data.Buffer. Although this is an external type, Idris builds on a random access buffer.
- CFPtr. Ptr t and AnyPtr are compiled to CFPtr. Any complex structured data that cannot be represented as a simple primitive can use CFPtr to keep track of where the value is used. In Idris, Ptr t is defined as an external type.
- CFGCPtr. GCPtr t and GCAnyPtr are compiled to CFGCPtr. A GCPtr is obtained from a Ptr value by calling the onCollect function, and it has a special property: onCollect attaches a finalizer to the pointer, which should run when the pointer is freed.
Examples
Let’s step back and look at how this is represented at the Idris source level.
The simplest form of a definition involving the FFI is a function definition with a %foreign pragma. The pragma is passed a list of strings corresponding to a mapping from back-ends to names for the foreign calls. For instance:
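The declaration that belongs here is missing from this copy. A sketch consistent with the description that follows (a C function add, compiled from smallc.c into a library assumed here to be called libsmallc) would look like this:

```idris
-- Assumed reconstruction: "C:add,libsmallc" names the C function `add`
-- and the shared library that provides it. The library name libsmallc is
-- a guess derived from the file name smallc.c mentioned in the text.
%foreign "C:add,libsmallc"
add : Int -> Int -> Int
```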
This function should be translated by the C back end as a call to the add function defined in the smallc.c file. In the FFI, Int is translated to CFInt. The back-end assumes that the data representation specified in the library file corresponds to that of normal Idris values.
We can also define external types, as in the following example:
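The example itself did not survive in this copy. Judging from the discussion that follows (an external ThreadID type in module Main and a Scheme-backed prim__fork), it would have been along these lines. The foreign name string and the exact signature are assumptions, and the type is written fully qualified in case a library in scope already exports a ThreadID.

```idris
module Main

-- Assumed reconstruction: an external type with no Idris-side constructors...
data ThreadID : Type where [external]

-- ...and a primitive, implemented by the Scheme runtime, that produces it.
-- "scheme:blodwen-thread" is an assumed foreign name.
%foreign "scheme:blodwen-thread"
prim__fork : (1 prog : PrimIO ()) -> PrimIO Main.ThreadID
```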
Here ThreadID is defined as an external type, and this type will be represented as CFUser "ThreadID" [] internally. The value which is created by the Scheme runtime will be considered a black box.
The type of prim__fork, once translated as a foreign type, is
[%World -> IORes Unit, %World] -> IORes Main.ThreadID
Here we see that %World is added to the IO computations. The %World parameter is always the last in the argument list.
For the FFI functions, the type information and the user-defined string can be found in the top-level definitions. The custom back-end should use the definitions to generate wrapper code, which should convert the types that are described by the CFType to the types that the function in the %foreign directive needs.
How to compile modules?
The Idris compiler generates intermediate files for modules, but the content of these files is not in Lifted, ANF, or VMCode form. Because of this, when the compilation pipeline enters the code generation stage, all the information will be in one instance of the CompileData record, and the custom code generator back-end can process it as if it were seeing the whole program.
The custom back-end has the option to introduce some hierarchy for the functions in different namespaces and organize some module structure to let the host technology process the bits and pieces in different sized chunks. However, this feature is not in the scope of the Idris compiler.
It is worth noting that modules can be mutually recursive in Idris. So a direct compilation of Idris modules to modules in the host language may be unsuccessful.
How to embed code snippets?
A possible motivation for implementing a custom back-end for Idris is to generate code that is meant to be used in a larger project. This project may be bound to another language that has many useful libraries but could benefit from relying on Idris’ strong type system in places.
When writing a code generator for this purpose, interoperability between the host technology and Idris based on the Foreign Function Interface can be inconvenient. In this situation, the need to embed code of the host technology arises naturally. Elaboration can be an answer to that.
Elaboration is a typechecking-time code generation technique. It relies on the Elab monad to write scripts that can interact with the typechecking machinery to generate Idris code in Core.TT.
When code snippets need to be embedded, a custom library should be provided with the custom back-end to turn the valid code snippets into their representation in Core.TT.
What should the runtime system support?
In summary, a custom back-end for the Idris compiler should create an environment in the host technology that is able to run Idris programs. As Idris is part of the family of functional programming languages, its computation model is based on graph reduction. Programs are represented as simple graphs in memory, built by the closure creation mechanism during evaluation. Closure creation exists even at the lowest levels of the IRs. For that reason, any runtime in any host technology needs to support some representation of closures and be able to store them on the heap, so the responsibility for memory management falls to the implementor of the custom back-end. If the host technology has memory management, the problem is not difficult. It is also likely that storing closures can be easily implemented with the tools of the host technology.
It is not clear, however, how much functionality a back-end should support. Tools from the Scheme back-end are brought into the Idris world via external types and primitive operations around them. This is a good practice and gives the community the ability to focus on the implementation of a quick compiler for a dependently typed language. One of these hidden features is the set of concurrency primitives. These are part of the different libraries that could be part of the compiler or part of the contrib package. If the threading model of the host technology differs from the one that the default Idris back-end currently inherits from Scheme, supporting it could be a bigger piece of work.
IO in Idris is implemented using an abstract %World value, which serves as a token for functions that operate interactively with the world through simple calls to the underlying runtime system. The entry point of the program is the main function, which has the type of the IO unit: main : IO (). This means that every program that runs starts as part of some IO computation. Under the hood this is implemented by creating the abstract %World value and invoking the main function, which is compiled to pass the abstract %World value to IO-related foreign or external operations.
There is an operation called unsafePerformIO in the PrimIO module. The type signature of unsafePerformIO tells us that it is capable of evaluating an IO computation in a pure context. Under the hood it is run in exactly the same way the main function is. It manufactures a fresh %World token and passes it to the IO computations.
This leads to a design decision: how to represent the state of the world, and how to represent the world that is instantiated for the sake of the unsafePerformIO operation via unsafeCreateWorld? Both main and unsafeCreateWorld use the %MkWorld constructor, which will be compiled to WorldVal and its type to WorldType, which means the implementation of the runtime is responsible for creating the abstraction around the world. The implementation of an abstract World value could be based on a singleton pattern, where we can have just one world, or we could have more than one world, resulting in parallel universes for unsafePerformIO.
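There are also other code generators that are not part of the main Idris 2 repository. You can find them on the Idris 2 wiki.
Work is currently in progress to support generating libraries for other languages from Idris 2 code.
Libraries
The %nomangle pragma tells the back-end what name to use for a given function.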
%nomangle
foo : Int -> Int
foo x = x + 1
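On back-ends that support this feature, the function will be called foo rather than being mangled and prefixed with its namespace.
If the name you want to use isn't a valid Idris identifier, you can use one name for the Idris function and a different name for the function as it appears in the compiled code, e.g.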
%nomangle "$_baz"
baz : Int
baz = 42
You can also specify different names for different back-ends, in a similar way to %foreign:
%nomangle "refc:idr_add_one"
"node:add_one"
plusOne : Bits32 -> Bits32
plusOne x = x + 1
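Changes since Idris 1
Idris 2 is mostly backwards compatible with Idris 1, with some minor exceptions. This document describes the changes, approximately in order of likelihood of encountering them in practice. New features are described at the end, in the section New features.
The section Type Driven Development with Idris: Updates Required describes how these changes affect the code in the book Type-Driven Development with Idris (https://www.manning.com/books/type-driven-development-with-idris) by Edwin Brady, available from Manning.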
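New Core Language: Quantities in Types
Idris 2 is based on Quantitative Type Theory (QTT), a core language developed by Bob Atkey and Conor McBride. In practice, this means that every variable in Idris 2 has a quantity associated with it. A quantity is one of:
- 0, meaning that the variable is erased at run time
- 1, meaning that the variable is used exactly once at run time
- Unrestricted, which is the same behaviour as Idris 1
For more details on this, see the section Multiplicities. In practice, this might cause some Idris 1 programs not to type check in Idris 2, because they attempt to use an argument that is erased at run time.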
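Erasure
In Idris, names which begin with a lower case letter are automatically bound as implicit arguments in types. For example, in the following skeleton definition, n, a and m are implicitly bound: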
append : Vect n a -> Vect m a -> Vect (n + m) a
append xs ys = ?append_rhs
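One of the difficulties in compiling a dependently typed programming language is deciding which arguments are used at run time and which can safely be erased. More importantly, it is also one of the difficulties when programming: how can a programmer know when an argument will be erased?
In Idris 2, a variable's quantity tells us whether it will be available at run time or not. We can see the quantities of the variables in scope in append_rhs by inspecting the hole at the REPL: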
Main> :t append_rhs
0 m : Nat
0 a : Type
0 n : Nat
ys : Vect m a
xs : Vect n a
-------------------------------------
append_rhs : Vect (plus n m) a
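The 0 next to m, a and n means that they are in scope, but there will be 0 occurrences at run time. That is, it is guaranteed that they will be erased at run time.
This does lead to some potential difficulties when converting Idris 1 programs, if you are using implicit arguments at run time. For example, in Idris 1 you can get the length of a vector as follows: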
vlen : Vect n a -> Nat
vlen {n} xs = n
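This might seem like a good idea, since it runs in constant time and takes advantage of the type-level information, but the trade-off is that n has to be available at run time, so at run time we always need the length of the vector to be available if we ever call vlen. Idris 1 can infer whether the length is needed, but there's no easy way for a programmer to be sure.
In Idris 2, we need to state explicitly that n is needed at run time: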
vlen : {n : Nat} -> Vect n a -> Nat
vlen xs = n
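(Incidentally, also note that in Idris 2, names bound in types are also available in the definition without explicitly rebinding them.)
This also means that when you call vlen, you need the length available. For example, this will give an error:
sumLengths : Vect m a -> Vect n a -> Nat
sumLengths xs ys = vlen xs + vlen ys
Idris 2 will report: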
vlen.idr:7:20--7:28:While processing right hand side of Main.sumLengths at vlen.idr:7:1--10:1:
m is not accessible in this context
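This means that it needs to use m as an argument to pass to vlen xs, where it needs to be available at run time, but m is not available in sumLengths because it has multiplicity 0.
We can see this more clearly by replacing the right-hand side of sumLengths with a hole…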
sumLengths : Vect m a -> Vect n a -> Nat
sumLengths xs ys = ?sumLengths_rhs
…and then checking the hole's type at the REPL:
Main> :t sumLengths_rhs
0 n : Nat
0 a : Type
0 m : Nat
ys : Vect n a
xs : Vect m a
-------------------------------------
sumLengths_rhs : Nat
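Instead, we need to give bindings for m and n with unrestricted multiplicity:
sumLengths : {m, n : _} -> Vect m a -> Vect n a -> Nat
sumLengths xs ys = vlen xs + vlen ys
Remember that giving no multiplicity on a binder, as with m and n here, means the variable has unrestricted usage.
If you're converting Idris 1 programs to work with Idris 2, this is probably the biggest thing you need to think about. It is important to note, though, that if you have bound implicits, such as…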
excitingFn : {t : _} -> Coffee t -> Moonbase t
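…then it's a good idea to make sure t really is needed, or performance might suffer because the run time unnecessarily builds the instance of t!
One final note on erasure: it is an error to try to pattern match on an argument with multiplicity 0, unless its value is inferrable from elsewhere. So, the following definition is rejected: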
badNot : (0 x : Bool) -> Bool
badNot False = True
badNot True = False
This is rejected with the error:
badnot.idr:2:1--3:1:Attempt to match on erased argument False in
Main.badNot
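The following, however, is fine, because in sNot, even though we appear to match on the erased argument x, its value is uniquely inferrable from the type of the second argument: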
data SBool : Bool -> Type where
SFalse : SBool False
STrue : SBool True
sNot : (0 x : Bool) -> SBool x -> Bool
sNot False SFalse = True
sNot True STrue = False
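Experience with Idris 2 so far suggests that, most of the time, as long as you're using unbound implicits in your Idris 1 programs, they will work without much modification in Idris 2. The Idris 2 type checker will point out where you require an unbound implicit argument at run time. Sometimes this is both surprising and enlightening!
Linearity
Full details on linear arguments with multiplicity 1 are given in the section Multiplicities. In brief, the intuition behind multiplicity 1 is that if we have a function with a type of the following form…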
f : (1 x : a) -> b
…then the guarantee given by the type system is that if f x is used exactly once, then x is used exactly once in the process.
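As a small illustration of that guarantee, consider the following sketch. The names linId and dup are hypothetical and chosen only for this example.

```idris
-- linId uses its argument exactly once, so it is accepted.
linId : (1 x : a) -> a
linId x = x

-- Uncommenting dup should make type checking fail, because the linear
-- variable x would be used twice on the right-hand side.
-- dup : (1 x : a) -> (a, a)
-- dup x = (x, x)
```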
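Prelude and base libraries
The Prelude in Idris 1 contained a lot of definitions, many of them rarely necessary. The philosophy in Idris 2 is different. The (rather vaguely specified) rule of thumb is that it should contain the basic functions required by almost any non-trivial program.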
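This is a vague specification since different programmers will consider different things absolutely necessary, but the result is that it contains:
- Anything the elaborator can desugar to (e.g. tuples, (), =)
- Basic types Bool, Nat, List, Stream, Dec, Maybe, Either
- The most important utility functions: id, the, composition, etc.
- Interfaces for arithmetic and implementations for the primitives and basic types
- Basic Char and String manipulation
- Show, Eq, Ord, and implementations for all types in the Prelude
- Interfaces and functions for basic proofs (cong, Uninhabited, etc.)
- Semigroup, Monoid
- Functor, Applicative, Monad and related functions
- Foldable, Alternative and Traversable
- Range, for list range syntax
- Console IO
Anything which doesn't fit in here has been moved to the base libraries. Among other places, you can find some of the functions which used to be in the Prelude in:
- Data.List and Data.Nat
- Data.Maybe and Data.Either
- System.File and System.Directory (file management was previously part of the Prelude)
- Decidable.Equality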
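Smaller Changes
Ambiguous Name Resolution
Idris 1 worked very hard to resolve ambiguous names by type, even if this involved some complicated interaction with interface resolution. This could sometimes be the cause of long type checking times. Idris 2 simplifies this, at the cost of sometimes requiring more programmer annotations on ambiguous names.
As a general rule, Idris 2 will be able to disambiguate between names which have different concrete return types (such as data constructors), or which have different concrete argument types (such as record projections). It may struggle to resolve ambiguities if one name requires an interface to be resolved. It will postpone resolution if a name can't be resolved immediately, but unlike Idris 1, it won't attempt significant backtracking. If you have deeply nested ambiguous names (beyond a small threshold, default 3), Idris 2 will report an error. You can change this threshold with a directive, for example: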
%ambiguity_depth 10
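However, in such circumstances it is almost certainly a better idea to disambiguate explicitly.
Indeed, in general, if you get an ambiguous name error, the best approach is to give the namespace explicitly. You can also rebind names locally: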
Main> let (::) = Prelude.(::) in [1,2,3]
[1, 2, 3]
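One difficulty which remains is resolving ambiguous names where one possibility is an interface method, and another is a concrete top-level function. For example, we may have: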
Prelude.(>>=) : Monad m => m a -> (a -> m b) -> m b
LinearIO.(>>=) : (1 act : IO a) -> (1 k : a -> IO b) -> IO b
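As a pragmatic choice, if type checking in a context where the more concrete name is valid (LinearIO.(>>=) here, so if type checking an expression known to have type IO t for some t), the more concrete name will be chosen.
This is somewhat unsatisfying, so we may revisit this in future!
Modules, namespaces and export
The visibility rules, as controlled by the private, export and public export modifiers, now refer to the visibility of names from other namespaces, rather than other files.
So if you have the following, all in the same file…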
namespace A
private
aHidden : Int -> Int
aHidden x = x * 2
export
aVisible : Int -> Int
aVisible x = aHidden x
namespace B
export
bVisible : Int -> Int
bVisible x = aVisible (x * 2)
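…then bVisible can access aVisible, but not aHidden.
Records are, as before, defined in their own namespace, but fields are always visible from the parent namespace.
Also, module names must now match the filename in which they're defined, with the exception of the module Main, which can be defined in a file with any name.
%language pragmas
There were several %language pragmas in Idris 1, which defined various experimental extensions. None of these are available in Idris 2, although extensions may be defined in the future.
Also removed was the %access pragma for default visibility. Use visibility modifiers on each declaration instead.
let bindings
let bindings, i.e. expressions of the form let x = val in e, have slightly different behaviour. Previously, you could rely on the computational behaviour of x in the scope of e, so type checking could take into account that x reduces to val. Unfortunately, this leads to complications with case and with clauses: if we want to preserve the computational behaviour, we would need to make significant changes to the way case and with are elaborated.
So, for simplicity and consistency (and, realistically, because I don't have enough time to resolve the problem for case and with), the above expression let x = val in e is equivalent to (\x => e) val.
So, let now effectively generalises a complex subexpression. If you do need the computational behaviour of a definition, it is now possible using local function definitions instead. See the section Local function definitions below.
Also, an alternative syntax let x := val in e is available. See the section on let bindings for more information.
auto-implicits and Interfaces
Interfaces and auto-implicit arguments are similar, in that they invoke an expression search mechanism to find the value of an argument. In Idris 1, they were implemented separately, but in Idris 2, they use the same mechanism. Consider the following total definition of fromMaybe: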
data IsJust : Maybe a -> Type where
ItIsJust : IsJust (Just val)
fromMaybe : (x : Maybe a) -> {auto p : IsJust x} -> a
fromMaybe (Just x) {p = ItIsJust} = x
Since interface resolution and auto-implicits are now the same thing, the type of fromMaybe can be written as:
fromMaybe : (x : Maybe a) -> IsJust x => a
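So now, the constraint arrow => means that the argument will be found by auto-implicit search.
When defining a data type, there are ways to control how auto-implicit search will proceed, by giving options to the data type. For example: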
data Elem : (x : a) -> (xs : List a) -> Type where
[search x]
Here : Elem x (x :: xs)
There : Elem x xs -> Elem x (y :: xs)
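The search x option means that auto-implicit search for a value of type Elem t ts will start as soon as the type checker has resolved the value t, even if ts is still unknown.
By default, auto-implicit search uses the constructors of a data type as search hints. The noHints option on a data type turns this behaviour off.
You can add your own search hints with the %hint option on a function. For example: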
data MyShow : Type -> Type where
[noHints]
MkMyShow : (myshow : a -> String) -> MyShow a
%hint
showBool : MyShow Bool
showBool = MkMyShow (\x => if x then "True" else "False")
myShow : MyShow a => a -> String
myShow @{MkMyShow myshow} = myshow
In this case, searching for MyShow Bool will find showBool, as we can see if we try evaluating myShow True at the REPL:
Main> myShow True
"True"
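In fact, this is how interfaces are elaborated. However, %hint should be used with care. Too many hints can lead to a large search space!
Record fields
Record fields can now be accessed via a dot (.). For example, if you have: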
record Person where
constructor MkPerson
firstName, middleName, lastName : String
age : Int
and you have a record fred : Person, then you can use fred.firstName to access the firstName field.
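A minimal, self-contained sketch of this syntax follows. The record is the one shown above, while the value fred and the helper fullName are hypothetical and only there to show the dot in use.

```idris
record Person where
  constructor MkPerson
  firstName, middleName, lastName : String
  age : Int

-- An illustrative value; the field contents are made up.
fred : Person
fred = MkPerson "Fred" "Joe" "Bloggs" 30

-- Field access with the postfix dot syntax.
fullName : Person -> String
fullName p = p.firstName ++ " " ++ p.lastName
```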
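Totality and Coverage
%default covering is now the default status, so all functions must cover all their inputs unless stated otherwise with a partial annotation, or by switching to %default partial (which is not recommended: use a partial annotation instead, so that partiality is confined to the smallest possible place).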
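For instance, a function that deliberately leaves out a case needs the annotation. The following is a sketch, and firstElem is a hypothetical name used only for illustration.

```idris
-- Rejected under the default covering status, because the [] case is missing:
-- firstElem : List a -> a
-- firstElem (x :: _) = x

-- Accepted: the function is explicitly marked as partial.
partial
firstElem : List a -> a
firstElem (x :: _) = x
```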
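Build artefacts
This is not really a language change, but a change in the way Idris saves checked files, and still useful to know. All checked modules are now saved in a directory build/ttc, in the root of the source tree, with the directory structure following the directory structure of the source. Executables are placed in build/exec.
Packages
Dependencies on other packages are now indicated with the depends field. The pkgs field is no longer recognized. Also, fields with URLs or other string data (other than module or package names) must be enclosed in double quotes. For example: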
package lightyear
sourceloc = "git://git@github.com:ziman/lightyear.git"
bugtracker = "http://www.github.com/ziman/lightyear/issues"
depends = effects
modules = Lightyear
, Lightyear.Position
, Lightyear.Core
, Lightyear.Combinators
, Lightyear.StringFile
, Lightyear.Strings
, Lightyear.Char
, Lightyear.Testing
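New features
As well as the change to the core language to use Quantitative Type Theory, described above, there are several other new features.
Local function definitions
Functions can now be defined locally, using a let block. For example, greet in the following example, which is defined in the scope of a local variable x: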
chat : IO ()
chat
= do putStr "Name: "
x <- getLine
let greet : String -> String
greet msg = msg ++ " " ++ x
putStrLn (greet "Hello")
putStrLn (greet "Bye")
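These let blocks can be used anywhere (in the middle of do blocks as above, but also in any function, or in type declarations). where blocks are now elaborated by translation into a local let.
However, Idris no longer attempts to infer types for functions defined in where blocks, because this was fragile. This may be reinstated, if we can come up with a good, predictable approach.
Scope of implicit arguments
Implicit arguments in a type are now in scope in the body of a definition. We've already seen this above, where n is in scope automatically in the body of vlen: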
vlen : {n : Nat} -> Vect n a -> Nat
vlen xs = n
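This is important to remember when using where blocks, or local definitions, since the names in scope will also be in scope when declaring the type of the local definition. For example, the following definition, where we attempt to define our own version of Show for Vect, will fail to type check: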
showVect : Show a => Vect n a -> String
showVect xs = "[" ++ showBody xs ++ "]"
where
showBody : Vect n a -> String
showBody [] = ""
showBody [x] = show x
showBody (x :: xs) = show x ++ ", " ++ showBody xs
This fails because n is in scope already, from the type of showVect, in the type declaration for showBody, and so the first clause showBody [] will fail to type check because [] has length Z, not n. We can fix this by locally binding n:
showVect : Show a => Vect n a -> String
showVect xs = "[" ++ showBody xs ++ "]"
where
showBody : forall n . Vect n a -> String
...
Or, alternatively, using a new name:
showVect : Show a => Vect n a -> String
showVect xs = "[" ++ showBody xs ++ "]"
where
showBody : Vect n' a -> String
...
Idris 1 took a different approach here: names which were parameters to data types were in scope, other names were not. The Idris 2 approach is, we hope, more consistent and easier to understand.
Function application syntax additions
From now on you can utilise the new syntax of function applications:
f {x1 [= e1], x2 [= e2], ...}
There are three additions here:
1. More than one argument can be written in braces, separated with commas:
record Dog where
constructor MkDog
name : String
age : Nat
-- Notice that `name` and `age` are explicit arguments.
-- See paragraph (2)
haveADog : Dog
haveADog = MkDog {name = "Max", age = 3}
pairOfStringAndNat : (String, Nat)
pairOfStringAndNat = MkPair {x = "year", y = 2020}
myPlus : (n : Nat) -> (k : Nat) -> Nat
myPlus {n = Z , k} = k
myPlus {n = S n', k} = S (myPlus n' k)
twoPlusTwoIsFour : myPlus {n = 2, k = 2} === 4
twoPlusTwoIsFour = Refl
2. Arguments in braces can now correspond to explicit, implicit and auto implicit dependent function types (Pi types), provided the domain type is named:
myPointlessFunction : (exp : String) -> {imp : String} -> {auto aut : String} -> String
myPointlessFunction exp = exp ++ imp ++ aut
callIt : String
callIt = myPointlessFunction {imp = "a ", exp = "Just ", aut = "test"}
Order of the arguments doesn’t matter as long as they are in braces and the names are distinct. It is better to stick named arguments in braces at the end of your argument list, because regular unnamed explicit arguments are processed first and take priority:
myPointlessFunction' : (a : String) -> String -> (c : String) -> String
myPointlessFunction' a b c = a ++ b ++ c
badCall : String
badCall = myPointlessFunction' {a = "a", c = "c"} "b"
This snippet won’t type check, because "b" in badCall is passed first, although logically we want it to be second. Idris will tell you that it couldn’t find a spot for a = "a" (because "b" took its place), so the application is ill-formed.
Thus if you want to use the new syntax, it is worth naming your Pi types.
3. Multiple explicit arguments can be “skipped” more easily with the following syntax:
f {x1 [= e1], x2 [= e2], ..., xn [= en], _}
or
f {}
in case none of the named arguments are wanted.
Examples:
import Data.Nat
record Four a b c d where
constructor MkFour
x : a
y : b
z : c
w : d
firstTwo : Four a b c d -> (a, b)
firstTwo $ MkFour {x, y, _} = (x, y)
-- firstTwo $ MkFour {x, y, z = _, w = _} = (x, y)
dontCare : (x : Nat) -> Nat -> Nat -> Nat -> (y : Nat) -> x + y = y + x
dontCare {} = plusCommutative {}
--dontCare _ _ _ _ _ = plusCommutative _ _
The last rule worth noting concerns named applications with repeated argument names, e.g.:
data WeirdPair : Type -> Type -> Type where
MkWeirdPair : (x : a) -> (x : b) -> WeirdPair a b
weirdSnd : WeirdPair a b -> b
--weirdSnd $ MkWeirdPair {x, x} = x
-- ^
-- Error: "Non linear pattern variable"
-- But that one is okay:
weirdSnd $ MkWeirdPair {x = _, x} = x
In this example the name x is given repeatedly to the Pi types of the data constructor MkWeirdPair. In order to deconstruct the WeirdPair a b in weirdSnd, while writing the left-hand side of the pattern-matching clause in a named manner (via the new syntax), we have to rename the first occurrence of x to any fresh name or to _, as we did. Then the definition type checks normally.
In general, duplicate names are bound sequentially on the left-hand side and must be renamed for the pattern expression to be valid.
The situation is similar on the right-hand side of pattern-matching clauses:
0 TypeOf : a -> Type
TypeOf _ = a
weirdId : {0 a : Type} -> (1 a : a) -> TypeOf a
weirdId a = a
zero : Nat
-- zero = weirdId { a = Z }
-- ^
-- Error: "Mismatch between: Nat and Type"
-- But this works:
zero = weirdId { a = Nat, a = Z }
Named arguments should be passed sequentially in the order they were defined in the Pi types, regardless of whether they are implicit or explicit.
Better inference
In Idris 1, holes (that is, unification variables arising from implicit arguments) were local to an expression, and if they were not resolved while checking the expression, they would not be resolved at all. In Idris 2, they are global, so inference works better. For example, we can now say:
test : Vect ? Int
test = [1,2,3,4]
Main> :t test
Main.test : Vect (S (S (S (S Z)))) Int
The ?, incidentally, differs from _ in that _ will be bound as an implicit argument if unresolved after checking the type of test, but ? will be left as a hole to be resolved later. Otherwise, they can be used interchangeably.
Dependent case
case blocks were available in Idris 1, but with some restrictions. Having better inference means that case blocks work more effectively in Idris 2, and dependent case analysis is supported.
append : Vect n a -> Vect m a -> Vect (n + m) a
append xs ys
= case xs of
[] => ys
(x :: xs) => x :: append xs ys
The implicit arguments and original values are still available in the body of the case. Somewhat contrived, but the following is valid:
info : {n : _} -> Vect n a -> (Vect n a, Nat)
info xs
= case xs of
[] => (xs, n)
(y :: ys) => (xs, n)
Record updates
Dependent record updates work, provided that all relevant fields are updated
at the same time. Dependent record update is implemented via dependent case
blocks rather than by generating a specific update function for each field as
in Idris 1, so you will no longer get mystifying errors when trying to update
dependent records!
For example, we can wrap a vector in a record, with an explicit length field:
record WrapVect a where
constructor MkVect
purpose : String
length : Nat
content : Vect length a
Then, we can safely update the content, provided we update the length correspondingly:
addEntry : String -> WrapVect String -> WrapVect String
addEntry val = { length $= S,
content $= (val :: ) }
Another novelty - new update syntax (previous one still functional):
record Three a b c where
constructor MkThree
x : a
y : b
z : c
-- Yet another contrived example
mapSetMap : Three a b c -> (a -> a') -> b' -> (c -> c') -> Three a' b' c'
mapSetMap three@(MkThree x y z) f y' g = {x $= f, y := y', z $= g} three
The record keyword has been discarded for brevity, and the symbol := replaces = in order not to introduce any ambiguity.
Generate definition
A new feature of the IDE protocol supports generating complete definitions from a type signature. You can try this at the REPL, for example, given our favourite introductory example…
append : Vect n a -> Vect m a -> Vect (n + m) a
…assuming this is defined on line 3, you can use the :gd command as follows:
Main> :gd 3 append
append [] ys = ys
append (x :: xs) ys = x :: append xs ys
This works by a fairly simple brute force search, which tries searching for a valid right hand side, and case splitting on the left if that fails, but is remarkably effective in a lot of situations. Some other examples which work:
my_cong : forall f . (x : a) -> (y : a) -> x = y -> f x = f y
my_curry : ((a, b) -> c) -> a -> b -> c
my_uncurry : (a -> b -> c) -> (a, b) -> c
append : Vect n a -> Vect m a -> Vect (n + m) a
lappend : (1 xs : List a) -> (1 ys : List a) -> List a
zipWith : (a -> b -> c) -> Vect n a -> Vect n b -> Vect n c
This is available in the IDE protocol via the generate-def command.
Chez Scheme target
The default code generator is, for the moment, Chez Scheme. Racket and Gambit code generators are also available. Like Idris 1, Idris 2 supports plug-in code generation to allow you to write a back end for the platform of your choice. To change the code generator, you can use the :set cg command:
Main> :set cg racket
Early experience shows that both are much faster than the Idris 1 C code generator, in both compile time and execution time (but we haven’t done any formal study on this yet, so it’s just anecdotal evidence).
Type Driven Development with Idris: Updates Required
The code in the book Type-Driven Development with Idris by Edwin Brady, available from Manning, will mostly work in Idris 2, with some small changes as detailed in this document. The updated code is also [going to be] part of the test suite (see tests/typedd-book in the Idris 2 source).
If you are new to Idris, and learning from the book, we recommend working through the first 3-4 chapters with Idris 1, to avoid the need to worry about the changes described here. After that, refer to this document for any necessary changes.
Chapter 1
No changes necessary
Chapter 2
The Prelude is smaller than Idris 1, and many functions have been moved to the base libraries instead. So:
In Average.idr, add:
import Data.String -- for `words`
import Data.List -- for `length` on lists
In AveMain.idr and Reverse.idr, add:
import System.REPL -- for 'repl'
Chapter 3
Unbound implicits have multiplicity 0, so we can’t match on them at run time. Therefore, in Matrix.idr, we need to change the type of createEmpties and transposeMat so that the length of the inner vector is available to match on:
createEmpties : {n : _} -> Vect n (Vect 0 elem)
transposeMat : {n : _} -> Vect m (Vect n elem) -> Vect n (Vect m elem)
Chapter 4
For the reasons described above:
In DataStore.idr, add import System.REPL and import Data.String
In SumInputs.idr, add import System.REPL
In TryIndex.idr, add an implicit argument:
tryIndex : {n : _} -> Integer -> Vect n a -> Maybe a
In exercise 5 of 4.2, add an implicit argument:
sumEntries : Num a => {n : _} -> (pos : Integer) -> Vect n a -> Vect n a -> Maybe a
Chapter 5
There is no longer a Cast instance from String to Nat, because its behaviour of returning Z if the String wasn’t numeric was thought to be confusing and potentially error prone. Instead, there is stringToNatOrZ in Data.String, which at least has a clearer name. So:
In Loops.idr and ReadNum.idr, add import Data.String and change cast to stringToNatOrZ.
In ReadNum.idr, since functions must now be covering by default, add a partial annotation to readNumber_v2.
Chapter 6
In DataStore.idr and DataStoreHoles.idr, add import Data.String and import System.REPL. Also in DataStore.idr, the schema argument to display is required for matching, so change the type to:
display : {schema : _} -> SchemaType schema -> String
In TypeFuns.idr, add import Data.String
Chapter 7
Abs is now a separate interface from Neg. So, change the type of eval to include Abs specifically:
eval : (Abs num, Neg num, Integral num) => Expr num -> num
Also, take abs out of the Neg implementation for Expr and add an implementation of Abs as follows:
Abs ty => Abs (Expr ty) where
abs = Abs
Chapter 8
In AppendVec.idr, add import Data.Nat for the Nat proofs.
cong now takes an explicit argument for the function to apply. So, in CheckEqMaybe.idr change the last case to:
checkEqNat (S k) (S j) = case checkEqNat k j of
Nothing => Nothing
Just prf => Just (cong S prf)
A similar change is necessary in CheckEqDec.idr.
In ExactLength.idr, the m argument to exactLength is needed at run time, so change its type to:
exactLength : {m : _} ->
(len : Nat) -> (input : Vect m a) -> Maybe (Vect len a)
A similar change is necessary in ExactLengthDec.idr. Also, DecEq is no longer part of the Prelude, so add import Decidable.Equality.
In ReverseVec.idr, add import Data.Nat for the Nat proofs.
In Void.idr, since functions must now be covering by default, add a partial annotation to nohead and its helper function getHead.
In Exercise 2 of 8.2.5, the definition of reverse' should be changed to reverse' : Vect k a -> Vect m a -> Vect (k + m) a, because the n in reverse' is otherwise bound to the same value as the n in the signature of myReverse.
Chapter 9
In ElemType.idr, add import Decidable.Equality
In Elem.idr, add import Data.Vect.Elem
In Hangman.idr:
Add import Data.String, import Data.Vect.Elem and import Decidable.Equality
removeElem pattern matches on n, so it needs to be written in its type:
removeElem : {n : _} ->
(value : a) -> (xs : Vect (S n) a) ->
{auto prf : Elem value xs} ->
Vect n a
letters is used by processGuess, because it’s passed to removeElem:
processGuess : {letters : _} ->
(letter : Char) -> WordState (S guesses) (S letters) ->
Either (WordState guesses (S letters))
(WordState (S guesses) letters)
guesses and letters are implicit arguments to game, but are used by the definition, so add them to its type:
game : {guesses : _} -> {letters : _} ->
WordState (S guesses) (S letters) -> IO Finished
In RemoveElem.idr:
Add import Data.Vect.Elem.
removeElem needs to be updated as above.
Chapter 10
Lots of changes necessary here, at least when constructing views, due to Idris 2 having a better (that is, more precise and correct!) implementation of unification, and the rules for recursive with application being tightened up.
In MergeSort.idr, add import Data.List.
In MergeSortView.idr, add import Data.List, and make the arguments to the views explicit:
mergeSort : Ord a => List a -> List a
mergeSort input with (splitRec input)
  mergeSort [] | SplitRecNil = []
  mergeSort [x] | SplitRecOne x = [x]
  mergeSort (lefts ++ rights) | (SplitRecPair lefts rights lrec rrec)
    = merge (mergeSort lefts | lrec)
            (mergeSort rights | rrec)
In problem 1 of exercise 10-1, the rest argument of the data constructor Exact of TakeN must be made explicit.
data TakeN : List a -> Type where
  Fewer : TakeN xs
  Exact : (n_xs : List a) -> {rest : _} -> TakeN (n_xs ++ rest)
In SnocList.idr, in my_reverse, the link between Snoc rec and xs ++ [x] needs to be made explicit. Idris 1 would happily decide that xs and x were the relevant implicit arguments to Snoc but this was little more than a guess based on what would make it type check, whereas Idris 2 is more precise in what it allows to unify. So, x and xs need to be explicit arguments to Snoc:
data SnocList : List a -> Type where
     Empty : SnocList []
     Snoc : (x, xs : _) -> (rec : SnocList xs) -> SnocList (xs ++ [x])
Correspondingly, they need to be explicit when matching. For example:
my_reverse : List a -> List a
my_reverse input with (snocList input)
  my_reverse [] | Empty = []
  my_reverse (xs ++ [x]) | (Snoc x xs rec) = x :: my_reverse xs | rec
Similar changes are necessary in snocListHelp and my_reverse_help. See tests/typedd-book/chapter10/SnocList.idr for the full details.
Also, in snocListHelp, input is used at run time so needs to be bound in the type:
snocListHelp : {input : _} ->
(snoc : SnocList input) -> (rest : List a) -> SnocList (input +
It’s no longer necessary to give {input} explicitly in the patterns for snocListHelp, although it’s harmless to do so.
In IsSuffix.idr, the matching has to be written slightly differently. The recursive with application in Idris 1 probably shouldn’t have allowed this! Note that the Snoc - Snoc case has to be written first otherwise Idris generates a case tree splitting on input1 and input2 instead of the SnocList objects and this leads to a lot of cases being detected as missing.
isSuffix : Eq a => List a -> List a -> Bool
isSuffix input1 input2 with (snocList input1, snocList input2)
  isSuffix _ _ | (Snoc x xs xsrec, Snoc y ys ysrec)
     = (x == y) && (isSuffix _ _ | (xsrec, ysrec))
  isSuffix _ _ | (Empty, s) = True
  isSuffix _ _ | (s, Empty) = False
This doesn’t yet get past the totality checker, however, because it doesn’t know about looking inside pairs.
For the VList view in exercise 4 after Chapter 10-2, import Data.List.Views.Extra from the contrib library.
In DataStore.idr: Well this is embarrassing - I’ve no idea how Idris 1 lets this through! I think perhaps it’s too “helpful” when solving unification problems. To fix it, add an extra parameter schema to StoreView, and change the type of SNil to be explicit that the empty is the function defined in DataStore. Also add entry and store as explicit arguments to SAdd:
data StoreView : (schema : _) -> DataStore schema -> Type where
     SNil : StoreView schema DataStore.empty
     SAdd : (entry, store : _) -> (rec : StoreView schema store) ->
            StoreView schema (addToStore entry store)
Since size is an explicit argument in the DataStore record, it also needs to be relevant in the type of storeViewHelp:
storeViewHelp : {size : _} ->
                (items : Vect size (SchemaType schema)) ->
                StoreView schema (MkData size items)
In TestStore.idr:
In listItems, empty needs to be DataStore.empty to be explicit that you mean the function.
In filterKeys, there is an error in the SNil case, which wasn’t caught because of the type of SNil above. It should be:
filterKeys test DataStore.empty | SNil = []
Chapter 11
In Streams.idr, add import Data.Stream for iterate.
In Arith.idr and ArithTotal.idr, the Divides view now has explicit arguments for the dividend and remainder, so they need to be explicit in bound:
bound : Int -> Int
bound x with (divides x 12)
  bound ((12 * div) + rem) | (DivBy div rem prf) = rem + 1
In addition, import Data.Bits has to be added for shiftR, which now uses a safer type for the number of shifts:
randoms : Int -> Stream Int
randoms seed = let seed' = 1664525 * seed + 1013904223 in
                   (seed' `shiftR` 2) :: randoms seed'
In ArithCmd.idr, update DivBy, randoms, and import Data.Bits as above. Also add import Data.String for String.toLower.
In ArithCmd.idr, update DivBy, randoms, import Data.Bits and import Data.String as above. Also, since export rules are per-namespace now, rather than per-file, you need to export (>>=) from the namespaces CommandDo and ConsoleDo.
In ArithCmdDo.idr, since (>>=) is export, Command and ConsoleIO also have to be export. Also, update randoms and import Data.Bits as above.
In StreamFail.idr, add a partial annotation to labelWith.
In order to support do notation for custom types (like RunIO), you need to implement (>>=) for binding values in a do block and (>>) for sequencing computations without binding values. See tests for complete implementations.
For instance, the following do block is desugared to foo >>= (\x => bar >>= (\y => baz x y)):
do
  x <- foo
  y <- bar
  baz x y
while the following is converted to foo >> bar >> baz:
do
  foo
  bar
  baz
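The desugaring rules above can be sketched with a small custom type. Everything in this example (the Cmd type, its constructors, the CmdDo namespace and the run and greet functions) is hypothetical and introduced only for illustration; it is not taken from the book's sources.

```idris
module Main

-- A hypothetical command type with its own notion of sequencing.
public export
data Cmd : Type -> Type where
  Say  : String -> Cmd ()
  Pure : a -> Cmd a
  Bind : Cmd a -> (a -> Cmd b) -> Cmd b

namespace CmdDo
  -- Used by do blocks for statements of the form x <- action.
  export
  (>>=) : Cmd a -> (a -> Cmd b) -> Cmd b
  (>>=) = Bind

  -- Used by do blocks for statements that bind no value.
  export
  (>>) : Cmd () -> Cmd b -> Cmd b
  (>>) ma mb = Bind ma (\_ => mb)

-- Interpret a command in IO; this do block uses the Prelude operators.
run : Cmd a -> IO a
run (Say msg) = putStrLn msg
run (Pure x) = pure x
run (Bind c f) = do x <- run c
                    run (f x)

-- This do block is resolved to the operators in CmdDo, by type.
greet : Cmd Nat
greet = do Say "hello"
           Say "world"
           Pure 2

main : IO ()
main = do n <- run greet
          printLn n
```

Because Cmd has no Monad implementation, the two statements without a binder in greet can only be elaborated with the (>>) defined in CmdDo; removing that definition should make greet fail to type check.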
Chapter 12
For reasons described above: In ArithState.idr, add import Data.String and import Data.Bits and update randoms. Also the (>>=) operators need to be set as export since they are in their own namespaces, and in getRandom, DivBy needs to take additional arguments div and rem.
In ArithState.idr, since (>>=) is export, Command and ConsoleIO also have to be export.
evalState from Control.Monad.State now takes the stateType argument first.
Chapter 13
In StackIO.idr:
tryAdd pattern matches on height, so it needs to be written in its type:
tryAdd : {height : _} -> StackIO height
height is also an implicit argument to stackCalc, but is used by the definition, so add it to its type:
stackCalc : {height : _} -> StackIO height
In StackDo namespace, export (>>=):
namespace StackDo
  export
  (>>=) : StackCmd a height1 height2 ->
          (a -> Inf (StackIO height2)) -> StackIO height1
  (>>=) = Do
In Vending.idr:
Add import Data.String and change cast to stringToNatOrZ in strToInput.
In MachineCmd type, add an implicit argument to (>>=) data constructor:
(>>=) : {state2 : _} ->
MachineCmd a state1 state2 ->
(a -> MachineCmd b state2 state3) ->
MachineCmd b state1 state3
In MachineIO type, add an implicit argument to Do data constructor:
data MachineIO : VendState -> Type where
  Do : {state1 : _} ->
       MachineCmd a state1 state2 ->
       (a -> Inf (MachineIO state2)) -> MachineIO state1
runMachine pattern matches on inState, so it needs to be written in its type:
runMachine : {inState : _} -> MachineCmd ty inState outState -> IO ty
In MachineDo namespace, add an implicit argument to (>>=) and export it:
namespace MachineDo
  export
  (>>=) : {state1 : _} ->
          MachineCmd a state1 state2 ->
          (a -> Inf (MachineIO state2)) -> MachineIO state1
  (>>=) = Do
vend and refill pattern match on pounds and chocs, so they need to be written in their type:
vend : {pounds : _} -> {chocs : _} -> MachineIO (pounds, chocs)
refill : {pounds : _} -> {chocs : _} -> (num : Nat) -> MachineIO (pounds, chocs)
pounds and chocs are implicit arguments to machineLoop, but are used by the definition, so add them to its type:
machineLoop : {pounds : _} -> {chocs : _} -> MachineIO (pounds, chocs)
Chapter 14
In ATM.idr:
Add import Data.String and change cast to stringToNatOrZ in runATM.
In Hangman.idr, add:
import Data.Vect.Elem -- `Elem` now has its own submodule
import Data.String -- for `toUpper`
import Data.List -- for `nub`
In Loop namespace, export GameLoop type and its data constructors:
namespace Loop
  public export
  data GameLoop : (ty : Type) -> GameState -> (ty -> GameState) -> Type where
    (>>=) : GameCmd a state1 state2_fn ->
            ((res : a) -> Inf (GameLoop b (state2_fn res) state3_fn)) ->
            GameLoop b state1 state3_fn
    Exit : GameLoop () NotRunning (const NotRunning)
letters and guesses are used by gameLoop, so they need to be written in its type:
gameLoop : {letters : _} -> {guesses : _} ->
GameLoop () (Running (S guesses) (S letters)) (const NotRunning)
In Game type, add an implicit argument letters to InProgress data constructor:
data Game : GameState -> Type where
  GameStart : Game NotRunning
  GameWon : (word : String) -> Game NotRunning
  GameLost : (word : String) -> Game NotRunning
  InProgress : {letters : _} -> (word : String) -> (guesses : Nat) ->
               (missing : Vect letters Char) -> Game (Running guesses letters)
removeElem pattern matches on n, so it needs to be written in its type:
removeElem : {n : _} ->
(value : a) -> (xs : Vect (S n) a) ->
{auto prf : Elem value xs} ->
Vect n a
Chapter 15
Packages
Idris includes a system for building packages from a package description file. These files can be used with the Idris compiler to manage the development process of your Idris programs and packages.
Package Descriptions
A package description includes the following:
A header, consisting of the keyword package followed by the package name. Package names can be any valid Idris identifier. The iPKG format also takes a quoted version that accepts any valid filename.
Fields describing package contents, <field> = <value>
Packages can describe libraries, executables, or both, and should include a version number. For library packages, one field must be the modules field, where the value is a comma separated list of modules to be installed. For example, a library test which has two modules Foo.idr and Bar.idr as source files would be written as follows:
package test
version = 0.0.1
modules = Foo, Bar
When installed, this will be in a directory test-0.0.1. If the version number is missing, it will default to 0.
Other examples of package files can be found in the libs directory of the main Idris repository, and in third-party libraries.
Metadata
The iPKG format supports additional metadata associated with the package. The added fields are:
brief = "<text>", a string literal containing a brief description of the package.
version = <version number>, a semantic version number, which must be in the form of integers separated by dots (e.g. 1.0.0, 0.3.0, 3.1.4 etc)
langversion <version constraints>, see depends below for a list of allowable constraints. For example, langversion >= 0.5.1 && < 1.0.0
readme = "<file>", location of the README file.
license = "<text>", a string description of the licensing information.
authors = "<text>", the author information.
maintainers = "<text>", Maintainer information.
homepage = "<url>", the website associated with the package.
sourceloc = "<url>", the location of the DVCS where the source can be found.
bugtracker = "<url>", the location of the project’s bug tracker.
Directories
sourcedir = "<dir>", the directory to look for Idris source files.
builddir = "<dir>", the directory to put the checked modules and the artefacts from the code generator.
outputdir = "<dir>", the directory where the code generator should output the executable.
Common Fields
Other common fields which may be present in an ipkg file are:
executable = <output>, which takes the name of the executable file to generate. Executable names can be any valid Idris identifier. The iPKG format also takes a quoted version that accepts any valid filename.
Executables are placed in build/exec by default. The location can be changed by specifying the outputdir field.
main = <module>, which takes the name of the main module, and must be present if the executable field is present.
opts = "<idris options>", which allows options to be passed to Idris.
depends = <pkg description> (',' <pkg description>)+, a comma separated list of package names that the Idris package requires.
The pkg_description is the package name, followed by an optional list of version constraints. Version constraints are separated by && and can use operators <, <=, >, >=, ==. For example, the following are valid package descriptions:
contrib (no constraints)
contrib == 0.3.0 (an exact version constraint)
contrib >= 0.3.0 (an inclusive lower bound)
contrib >= 0.3.0 && < 0.4 (an inclusive lower bound, and exclusive upper bound)
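Putting the fields above together, a package file for a small executable might look like the following sketch. The package name, module names, directory and version bounds are all hypothetical; only the field names come from the description above.

```ipkg
package hello
version = 0.1.0
brief = "A small example application"

-- Source files are looked up under this directory.
sourcedir = "src"

-- Library modules to install, and the dependency they need.
modules = Hello.Util
depends = contrib >= 0.3.0 && < 0.4

-- main must be present because executable is present.
main = Main
executable = hello
```

With the defaults described above, building this package would be expected to place the executable under build/exec.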
Using Package files
Given an Idris package file test.ipkg it can be used with the Idris compiler as follows:
idris2 --build test.ipkg will build all modules in the package
idris2 --install test.ipkg will install the package to the global Idris library directory (that is $IDRIS2_PREFIX/idris-<version>/), making the modules in its modules field accessible by other Idris libraries and programs. Note that this doesn’t install any executables, just library modules.
idris2 --clean test.ipkg will clean the intermediate build files.
idris2 --mkdoc test.ipkg will generate HTML documentation for the package, output to build/docs
Once the test package has been installed, the command line option --package test makes it accessible (abbreviated to -p test).
For example:
idris2 -p test Main.idr
Where does Idris look for packages?
Compiled packages are directories with compiled TTC files (see the Build artefacts section). The directory structure of the source *.idr files is preserved for TTC files.
Compiled packages can be installed globally (under $IDRIS2_PREFIX/idris-<version>/ as described above) or locally (under a depends subdirectory in the top level working directory of a project).
Packages specified using -p pkgname or with the depends field of a package will then be located as follows:
First, Idris looks in depends/pkgname-<version>, for a package which satisfies the version constraint.
If no package is found locally, Idris looks in $IDRIS2_PREFIX/idris-<version>/pkgname-<version>.
In each case, if more than one version satisfies the constraint, it will choose the one with the highest version number.
If package versions are omitted in directory names, they are treated as the version 0.
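Where to find libraries
You can find a list of Idris libraries on the wiki on GitHub: https://github.com/idris-lang/Idris2/wiki/1-%5BLanguage%5D-Libraries
Please feel free to contribute your own libraries there! Eventually, our aim is to have a package manager to manage libraries and dependencies. We don’t have an official one yet, but there are (at least) two in development:
Structuring Idris 2 Applications
A tutorial on structuring Idris 2 applications using Control.App.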
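Idris applications have main : IO () as an entry point, and the type IO a is a description of interactive actions which produce a value of type a. This is fine for primitives, but IO does not support exceptions, so we have to be explicit about how an operation handles failure. Also, if we do want to support exceptions, we also want to explain how exceptions and linearity (see the section Multiplicities) interact.
In this tutorial, we describe a parameterised type App and a related parameterised type App1, which together allow us to structure larger applications, taking into account both exceptions and linearity. The aims of App and App1 are that they should:
make it possible to express what interactions a function does, in its type, without too much notational overhead.
have little or no performance overhead compared to writing in IO.
be compatible with other libraries and techniques for describing effects, such as algebraic effects or monad transformers.
be sufficiently easy to use and performant that they can be the basis of all libraries that make foreign function calls, much as IO is in Idris 1 and Haskell.
be compatible with linear types, meaning that they should express whether a section of code is linear (guaranteed to execute exactly once without throwing an exception) or whether it might throw an exception.
We begin by introducing App, with some small example programs, then show how to extend it with exceptions, state, and other interfaces.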
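Introducing App
App is declared as below, in the module Control.App, which is part of the base library. It is parameterised by an implicit Path (which states whether the program’s execution path is linear or might throw exceptions), which has a default value stating that the program might throw, and a List Error (which gives a list of exception types which can be thrown; Error is a synonym for Type):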
data App : {default MayThrow l : Path} ->
(es : List Error) -> Type -> Type
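It serves the same purpose as IO, but supports throwing and catching exceptions, and allows us to define more constrained interfaces parameterised by the list of errors es. For example, a program which supports console IO: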
hello : Console es => App es ()
hello = putStrLn "Hello, App world!"
We can use this in a complete program as follows:
module Main
import Control.App
import Control.App.Console
hello : Console es => App es ()
hello = putStrLn "Hello, App world!"
main : IO ()
main = run hello
Or, a program which supports console IO and carries an Int state, labelled Counter:
data Counter : Type where
helloCount : (Console es, State Counter Int es) => App es ()
helloCount = do c <- get Counter
                put Counter (c + 1)
                putStrLn "Hello, counting world"
                c <- get Counter
                putStrLn ("Counter " ++ show c)
To run this as part of a complete program, we need to initialise the state.
main : IO ()
main = run (new 93 helloCount)
For convenience, we can list multiple interfaces in one go, using a function Has, defined in Control.App, to compute the interface constraints:
helloCount : Has [Console, State Counter Int] es => App es ()
0 Has : List (a -> Type) -> a -> Type
Has [] es = ()
Has (e :: es') es = (e es, Has es' es)
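The purpose of Path is to state whether a program can throw exceptions, so that we can know where it is safe to reference linear resources. It is declared as follows: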
data Path = MayThrow | NoThrow
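The type of App states that MayThrow is the default. We expect this to be the most common case. After all, realistically, most operations have possible failure modes, especially those which interact with the outside world.
The 0 on the declaration of Has indicates that it can only be run in an erased context, so it will never be run at run time. To run an App inside IO, we use an initial list of errors Init (recall that Error is a synonym for Type):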
Init : List Error
Init = [AppHasIO]
run : App {l} Init a -> IO a
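Generalising the Path parameter with l means that we can invoke run for any application, whether the Path is NoThrow or MayThrow. But, in practice, all applications given to run will not throw at the top level, because the only exception type available is the type AppHasIO. Any exceptions will have been introduced and handled inside the App.
Exceptions and State
Control.App is primarily intended to make it easier to manage the common cases of applications with exceptions and state. We can throw and catch exceptions listed in the list of errors (the es parameter to App) and introduce new global state.
Exceptions
The List Error is a list of error types, which can be thrown and caught using the Exception interface defined in Control.App:
interface Exception err e where
  throw : err -> App e a
  catch : App e a -> (err -> App e a) -> App e a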
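We can use throw and catch for some exception type err as long as the exception type exists in the list of errors. This is checked with the HasErr predicate, also defined in Control.App:
data HasErr : Error -> List Error -> Type where
     Here : HasErr e (e :: es)
     There : HasErr e es -> HasErr e (e' :: es)
HasErr err es => Exception err es where ...
Note the HasErr constraint on Exception: this is one place where it is notationally convenient that the auto implicit mechanism and the interface resolution mechanism are identical in Idris 2. Finally, we can introduce new exception types via handle, which runs a block of code which might throw, handling any exceptions: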
handle : App (err :: e) a ->
(onok : a -> App e b) ->
(onerr : err -> App e b) -> App e b
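As a minimal sketch of how throw and handle fit together, the following program introduces an exception type locally and handles it before returning to the top level. The DivError type and the safeDiv and divProg functions are hypothetical names chosen for this illustration; only throw, handle, run and the Console interface come from the library as described here.

```idris
module Main

import Control.App
import Control.App.Console

-- A hypothetical exception type, introduced only for this sketch.
data DivError = DivByZero

-- May throw DivError, so DivError must be in the list of errors.
safeDiv : Exception DivError es => Int -> Int -> App es Int
safeDiv x y = if y == 0
                 then throw DivByZero
                 else pure (x `div` y)

-- handle adds DivError to the list of errors for safeDiv only,
-- so divProg itself needs nothing beyond Console.
divProg : Console es => Int -> Int -> App es ()
divProg x y = handle (safeDiv x y)
                     (\res => putStrLn ("Result: " ++ show res))
                     (\err : DivError => putStrLn "Error: division by zero")

main : IO ()
main = run (do divProg 10 2
               divProg 10 0)
```

The first call should take the onok branch and the second the onerr branch. Since both handlers work in the smaller list of errors, nothing can escape to run.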
Adding State
Applications will typically need to keep track of state, and we support this primitively in App using a State type, defined in Control.App:
data State : (tag : a) -> Type -> List Error -> Type
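The tag is used purely to distinguish between different states, and is not required at run time, as explicitly stated in the types of get and put, which are used to access and update a state: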
get : (0 tag : _) -> State tag t e => App {l} e t
put : (0 tag : _) -> State tag t e => (1 val : t) -> App {l} e ()
These use an auto-implicit to pass around a State with the relevant tag implicitly, so we refer to states by tag alone. In helloCount earlier, we used an empty type Counter as the tag:
data Counter : Type where -- complete definition
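The list of errors e is used to ensure that states are only usable in the list of errors in which they are introduced. States are introduced using new: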
new : t -> (1 p : State tag t e => App {l} e a) -> App {l} e a
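Note that the type tells us new runs the program with the state exactly once. Rather than using State and Exception directly, however, we typically use interfaces to constrain the operations which are allowed in a list of errors. Internally, State is implemented via an IORef, primarily for performance reasons.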
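The following sketch follows the same pattern as helloCount to show a state being introduced with new and then accessed by tag. The Acc tag and the addAll and sumProg functions are hypothetical, and the initial value 0 is an arbitrary choice for the example.

```idris
module Main

import Control.App
import Control.App.Console

-- An empty type used purely as a tag for the state.
data Acc : Type where

-- Adds every element of the list to the state tagged Acc.
addAll : State Acc Int es => List Int -> App es ()
addAll [] = pure ()
addAll (x :: xs) = do t <- get Acc
                      put Acc (t + x)
                      addAll xs

sumProg : (Console es, State Acc Int es) => App es ()
sumProg = do addAll [1, 2, 3]
             t <- get Acc
             putStrLn ("Total: " ++ show t)

main : IO ()
main = run (new 0 sumProg)
```

Only code run under new can see the state, and the type of sumProg records that it needs one, so forgetting the call to new in main should be reported as an unsolved State constraint.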
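Defining Interfaces
The only way provided by Control.App to run an App is via the run function, which takes a concrete list of errors Init. All concrete extensions to this list of errors are via either handle, to introduce a new exception, or new, to introduce a new state. In order to compose App programs effectively, rather than introducing concrete exceptions and state in general, we define interfaces for collections of operations which work in a specific list of errors.
Example: Console I/O
We have seen an initial example using the Console interface, which is declared as follows, in Control.App.Console:
interface Console e where
  putChar : Char -> App {l} e ()
  putStr : String -> App {l} e ()
  getChar : App {l} e Char
  getLine : App {l} e String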
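It provides primitives for writing to and reading from the console, and generalising the path parameter to l means that neither can throw an exception, because they have to work in both the NoThrow and MayThrow contexts.
To implement this for use in a top level IO program, we need access to primitive IO operations. The Control.App library defines a primitive interface for this:
interface PrimIO e where
  primIO : IO a -> App {l} e a
  fork : (forall e' . PrimIO e' => App {l} e' ()) -> App e ()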
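We use primIO to invoke an IO function. We also have a fork primitive, which starts a new thread in a new list of errors supporting PrimIO. Note that fork starts a new list of errors e' so that states are only available in a single thread.
There is an implementation of PrimIO for a list of errors which can throw the empty type as an exception. This means that if PrimIO is the only interface available, we cannot throw an exception, which is consistent with the definition of IO. This also allows us to use PrimIO in the initial list of errors Init.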
HasErr AppHasIO e => PrimIO e where ...
Given this, we can implement Console and run our hello program in IO. It is implemented as follows in Control.App.Console:
PrimIO e => Console e where
  putChar c = primIO $ putChar c
  putStr str = primIO $ putStr str
  getChar = primIO getChar
  getLine = primIO getLine
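Example: File I/O
Console I/O can be implemented directly, but most I/O operations can fail. For example, opening a file can fail for several reasons: the file does not exist, the user has the wrong permissions, and so on. In Idris, the IO primitive reflects this in its type: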
openFile : String -> Mode -> IO (Either FileError File)
While precise, this becomes unwieldy when there are long sequences of IO operations. Using App, we can provide an interface which throws an exception when an operation fails, and guarantee that any exceptions are handled at the top level using handle. We begin by defining the FileIO interface, in Control.App.FileIO:
interface Has [Exception IOError] e => FileIO e where
  withFile : String -> Mode ->
             (onError : IOError -> App e a) ->
             (onOpen : File -> App e a) ->
             App e a
  fGetStr : File -> App e String
  fGetChars : File -> Int -> App e String
  fGetChar : File -> App e Char
  fPutStr : File -> String -> App e ()
  fPutStrLn : File -> String -> App e ()
  fflush : File -> App e ()
  fEOF : File -> App e Bool
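We use resource bracketing - passing a function to withFile for working with the opened file - rather than an explicit open operation, to ensure that the file handle is cleaned up on completion.
One could also imagine an interface using a linear resource for the file, which might be appropriate in some safety critical contexts, but for most programming tasks, exceptions should suffice. All of the operations can fail, and the interface makes this explicit by saying we can only implement FileIO if the list of errors supports throwing and catching the IOError exception. IOError is defined in Control.App.
For example, we can use this interface to implement readFile, throwing an exception if opening the file fails in withFile:
readFile : FileIO e => String -> App e String
readFile f = withFile f Read throw $ \h =>
               do content <- read [] h
                  pure (concat content)
  where
    read : List String -> File -> App e (List String)
    read acc h = do eof <- fEOF h
                    if eof then pure (reverse acc)
                           else do str <- fGetStr h
                                   read (str :: acc) h
Again, this is defined in Control.App.FileIO.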
To implement FileIO, we need access to the primitive operations via PrimIO, and the ability to throw exceptions if any of the operations fail. With this, we can implement withFile as follows, for example:
Has [PrimIO, Exception IOError] e => FileIO e where
  withFile fname m onError proc
      = do Right h <- primIO $ openFile fname m
              | Left err => onError (FileErr (toFileEx err))
           res <- catch (proc h) onError
           primIO $ closeFile h
           pure res
  ...
Given this implementation of FileIO, we can run readFile, provided that we wrap it in a top level handle function to deal with any errors thrown by readFile:
readMain : String -> App Init ()
readMain fname = handle (readFile fname)
    (\str => putStrLn $ "Success:\n" ++ show str)
    (\err : IOError => putStrLn $ "Error: " ++ show err)
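Linear Resources
We have introduced App for writing interactive programs, using interfaces to constrain which operations are permitted, but have not yet seen the Path parameter in action. Its purpose is to constrain when programs can throw exceptions, to know where linear resource usage is allowed. The bind operator for App is defined as follows (not via Monad):
data SafeBind : Path -> (l' : Path) -> Type where
     SafeSame : SafeBind l l
     SafeToThrow : SafeBind NoThrow MayThrow
(>>=) : SafeBind l l' =>
        App {l} e a -> (a -> App {l=l'} e b) -> App {l=l'} e b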
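The intuition behind this type is that, when sequencing two App programs:
If the first action might throw an exception, then the whole program might throw.
If the first action cannot throw an exception, the second action can still throw, and the program as a whole can throw.
If neither action can throw an exception, the program as a whole cannot throw.
The reason for the detail in the type is that it is useful to be able to sequence programs with a different Path, but in doing so, we must calculate the resulting Path accurately. Then, if we want to sequence subprograms with linear variables, we can use an alternative bind operator which guarantees to run the continuation exactly once:
bindL : App {l=NoThrow} e a ->
        (1 k : a -> App {l} e b) -> App {l} e b
To illustrate the need for bindL, we can try writing a program which tracks the state of a secure data store, which requires logging in before reading the data.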
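Example: a data store requiring a login
Many software components rely on some form of state, and there may be operations which are only valid in specific states. For example, consider a secure data store in which a user must log in before getting access to some secret data. This system can be in one of two states:
LoggedIn, in which the user is allowed to read the secret
LoggedOut, in which the user has no access to the secret
We can provide commands to log in, log out, and read the data, as illustrated in the following diagram:
The login command, if it succeeds, moves the overall system state from LoggedOut to LoggedIn. The logout command moves the state from LoggedIn to LoggedOut. Most importantly, the readSecret command is only valid when the system is in the LoggedIn state.
We can represent the state transitions using functions with linear types. To begin, we define an interface for connecting to and disconnecting from a store:
interface StoreI e where
  connect : (1 prog : (1 d : Store LoggedOut) ->
            App {l} e ()) -> App {l} e ()
  disconnect : (1 d : Store LoggedOut) -> App {l} e ()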
Neither connect nor disconnect throw, as shown by generalising over l. Once we have a connection, we can use the following functions to access the resource directly:
data Res : (a : Type) -> (a -> Type) -> Type where
     (#) : (val : a) -> (1 resource : r val) -> Res a r
login : (1 s : Store LoggedOut) -> (password : String) ->
        Res Bool (\ok => Store (if ok then LoggedIn else LoggedOut))
logout : (1 s : Store LoggedIn) -> Store LoggedOut
readSecret : (1 s : Store LoggedIn) ->
             Res String (const (Store LoggedIn))
Res is defined in the Prelude, since it is commonly useful. It is a dependent pair type, which associates a value with a linear resource. We’ll leave the other definitions abstract, for the purposes of this introductory example.
The following listing shows a complete program accessing the store, which reads a password, accesses the store if the password is correct and prints the secret data. It uses let (>>=) = bindL to redefine do-notation locally.
storeProg : Has [Console, StoreI] e => App e ()
storeProg = let (>>=) = bindL in
      do putStr "Password: "
         password <- getStr
         connect $ \s =>
           do let True # s = login s password
                | False # s => do putStrLn "Wrong password"
                                  disconnect s
              let str # s = readSecret s
              putStrLn $ "Secret: " ++ show str
              let s = logout s
              disconnect s
If we omit the let (>>=) = bindL, it will use the default (>>=) operator, which allows the continuation to be run multiple times, which would mean that s is not guaranteed to be accessed linearly, and storeProg would not type check. We can safely use getStr and putStr because they are guaranteed not to throw by the Path parameter in their types.
App1: Linear Interfaces
Adding the bindL function to allow locally rebinding the (>>=) operator allows us to combine existing linear resource programs with operations in App - at least, those that don’t throw. It would nevertheless be nice to interoperate more directly with App. One advantage of defining interfaces is that we can provide multiple implementations for different contexts, but our implementation of the data store uses primitive functions (which we left undefined in any case) to access the store.
To allow control over linear resources, Control.App provides an alternative parameterised type App1:
data App1 : {default One u : Usage} ->
            (es : List Error) -> Type -> Type
There is no need for a Path argument, since linear programs can never throw. The Usage argument states whether the value returned is to be used once, or has unrestricted usage, with the default in App1 being to use once:
data Usage = One | Any
The main difference from App is the (>>=) operator, which has a different multiplicity for the variable bound by the continuation depending on the usage of the first action:
Cont1Type : Usage -> Type -> Usage -> List Error -> Type -> Type
Cont1Type One a u e b = (1 x : a) -> App1 {u} e b
Cont1Type Any a u e b = (x : a) -> App1 {u} e b
(>>=) : {u : _} -> (1 act : App1 {u} e a) ->
        (1 k : Cont1Type u a u' e b) -> App1 {u=u'} e b
Cont1Type returns a continuation which uses the argument linearly, if the first App1 program has usage One, otherwise it returns a continuation where argument usage is unrestricted. Either way, because there may be linear resources in scope, the continuation is run exactly once and there can be no exceptions thrown.
Using App1, we can define all of the data store operations in a single interface, as shown in the following listing. Each operation other than disconnect returns a linear resource.
interface StoreI e where
  connect : App1 e (Store LoggedOut)
  login : (1 d : Store LoggedOut) -> (password : String) ->
          App1 e (Res Bool (\ok => Store (if ok then LoggedIn
                                                else LoggedOut)))
  logout : (1 d : Store LoggedIn) -> App1 e (Store LoggedOut)
  readSecret : (1 d : Store LoggedIn) ->
               App1 e (Res String (const (Store LoggedIn)))
  disconnect : (1 d : Store LoggedOut) -> App {l} e ()
We can explicitly move between App and App1:
app : (1 p : App {l=NoThrow} e a) -> App1 {u=Any} e a
app1 : (1 p : App1 {u=Any} e a) -> App {l} e a
We can run an App program using app, inside App1, provided that it is guaranteed not to throw. Similarly, we can run an App1 program using app1, inside App, provided that the value it returns has unrestricted usage. So, for example, we can write:
storeProg : Has [Console, StoreI] e => App e ()
storeProg = app1 $ do
     store <- connect
     app $ putStr "Password: "
     ?what_next
This uses app1 to state that the body of the program is linear, then app to state that the putStr operation is in App. We can see that connect returns a linear resource by inspecting the hole what_next, which also shows that we are running inside App1:
0 e : List Type
1 store : Store LoggedOut
-------------------------------------
what_next : App1 e ()
For completeness, one way to implement the interface is as follows, with hard coded password and internal data:
Has [Console] e => StoreI e where
  connect
      = do app $ putStrLn "Connect"
           pure1 (MkStore "xyzzy")
  login (MkStore str) pwd
      = if pwd == "Mornington Crescent"
           then pure1 (True # MkStore str)
           else pure1 (False # MkStore str)
  logout (MkStore str) = pure1 (MkStore str)
  readSecret (MkStore str) = pure1 (str # MkStore str)
  disconnect (MkStore _)
      = putStrLn "Disconnect"
Then we can run it in main:
main : IO ()
main = run storeProg
Foreign Function Interface
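Idris 2 is designed to support multiple code generators. The default target is Chez Scheme, with Racket and Gambit code generators also supported. However, the intention is, as with Idris 1, to support multiple targets on multiple platforms, including for example JavaScript, the JVM, .NET, and others yet to be invented. This makes the design of a foreign function interface (FFI), which calls functions in other languages, a little challenging, since ideally it will support all possible targets!
To this end, the Idris 2 FFI aims to be flexible and adaptable, while still supporting the most common requirements without too much need for “glue” code in the foreign language.
FFI Overview
Foreign functions are declared with the %foreign directive, which takes the following general form: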
%foreign [specifiers]
name : t
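The specifier is an Idris String which says in which language the foreign function is written, what it’s called, and where to find it. There may be more than one specifier, and a code generator is free to choose any specifier it understands - or even ignore the specifiers completely and use its own approach. In general, a specifier has the form “Language:name,library”. For example, in C: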
%foreign "C:puts,libc"
puts : String -> PrimIO Int
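It is up to specific code generators to decide how to locate the functions and the library. In this document, we will assume the default Chez Scheme code generator (the examples also work with the Racket or Gambit code generators) and that the foreign language is C.
Scheme Sidenote
Scheme foreign specifiers can be written to target particular flavors.
The following example shows a foreign declaration that allocates memory in a way specific to the choice of code generator. In this example there is no general scheme specifier present that matches every flavor, such as scheme:foo, so it will only match the specific flavors listed:
%foreign "scheme,chez:foreign-alloc"
         "scheme,racket:malloc"
         "C:malloc,libc"
allocMem : (bytes : Int) -> PrimIO AnyPtr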
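Note
If your backend (code generator) is not specified but defines a C FFI, it will be able to make use of the C:malloc,libc specifier.
C Sidenote
The C language specifier is used for common functions that may be used by any backend which can in turn FFI out to C. For example, Scheme.
Common C functions do no automatic memory management, deferring that to the individual backends.
The standard C backend is known as “RefC”, and uses the RefC language specifier.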
FFI Example
As a running example, we are going to work with a small C file. Save the following content to a file smallc.c
#include <stdio.h>
int add(int x, int y) {
    return x+y;
}
int addWithMessage(char* msg, int x, int y) {
    printf("%s: %d + %d = %d\n", msg, x, y, x+y);
    return x+y;
}
Then, compile it to a shared library with:
cc -shared smallc.c -o libsmall.so
We can now write an Idris program which calls each of these. First, we’ll write a small program which uses add to add two integers:
%foreign "C:add,libsmall"
add : Int -> Int -> Int
main : IO ()
main = printLn (add 70 24)
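The %foreign specifier says that add is written in C, with the name add in the library libsmall. As long as the run time is able to locate libsmall.so (in practice it looks in the current directory and the system library paths) we can run this at the REPL: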
Main> :exec main
94
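Note that it is the programmer’s responsibility to make sure that the Idris function and C function have corresponding types. There is no way for the machine to check this! If you get it wrong, you will get unpredictable behaviour.
Since add has no side effects, we’ve given it a return type of Int. But what if the function has some effect on the outside world, like addWithMessage? In this case, we use PrimIO Int to say that it returns a primitive IO action: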
%foreign "C:addWithMessage,libsmall"
prim__addWithMessage : String -> Int -> Int -> PrimIO Int
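Internally, PrimIO Int is a function which takes the current (linear) state of the world, and returns an Int with an updated state of the world. In general, IO operations in an Idris program are defined as instances of the HasIO interface. We can convert a primitive operation to one usable in HasIO using primIO: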
primIO : HasIO io => PrimIO a -> io a
So, we can extend our program as follows:
addWithMessage : HasIO io => String -> Int -> Int -> io Int
addWithMessage s x y = primIO $ prim__addWithMessage s x y
main : IO ()
main
    = do printLn (add 70 24)
         addWithMessage "Sum" 70 24
         pure ()
It is up to the programmer to declare which functions are pure, and which have side effects, via PrimIO. Executing this gives:
Main> :exec main
94
Sum: 70 + 24 = 94
We have seen two specifiers for foreign functions:
%foreign "C:add,libsmall"
%foreign "C:addWithMessage,libsmall"
These both have the same form: "C:[name],libsmall", so instead of writing the concrete String, we can write a function to compute the specifier, and use that instead:
libsmall : String -> String
libsmall fn = "C:" ++ fn ++ ",libsmall"
%foreign (libsmall "add")
add : Int -> Int -> Int
%foreign (libsmall "addWithMessage")
prim__addWithMessage : String -> Int -> Int -> PrimIO Int
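Primitive FFI Types
The types which can be passed to and returned from foreign functions are restricted to those which it is reasonable to assume any back end can handle. In practice, this means most primitive types, and a limited selection of others. Argument types can be any of the following primitives:
Int
Char
Double (as double in C)
Bits8
Bits16
Bits32
Bits64
String (as char* in C)
Ptr t and AnyPtr (both as void* in C)
Return types can be any of the above, plus:
()
PrimIO t, where t is a valid return type other than a PrimIO.
Handling String leads to some complications, for a number of reasons:
Strings can have multiple encodings. In the Idris run time, Strings are encoded as UTF-8, but C makes no assumptions.
It is not always clear who is responsible for freeing a String allocated by a C function.
In C, strings can be NULL, but Idris strings always have a value.
So, when passing String to and from C, remember the following:
A char* returned by a C function will be copied to the Idris heap, and the Idris run time immediately calls free with the returned char*.
If a char* might be NULL in C, use Ptr String rather than String.
When using Ptr String, the value will be passed as a void*, and therefore not accessible directly by Idris code. This is to protect against accidentally trying to use NULL as a String. You can nevertheless work with them and convert to String via foreign functions of the following form:
char* getString(void *p) {
    return (char*)p;
}
void* mkString(char* str) {
    return (void*)str;
}
int isNullString(void* str) {
    return str == NULL;
}
For an example, see the sample Example: Minimal Readline Bindings.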
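On the Idris side, the three C helpers above could be bound roughly as follows. This is a sketch which assumes the helpers have been added to smallc.c and libsmall.so has been rebuilt; the toMaybeString wrapper is a hypothetical convenience function, not part of any library.

```idris
libsmall : String -> String
libsmall fn = "C:" ++ fn ++ ",libsmall"

-- Only safe to call once the pointer is known not to be NULL.
%foreign (libsmall "getString")
getString : Ptr String -> String

%foreign (libsmall "mkString")
mkString : String -> Ptr String

-- The C function returns an int, so the raw binding returns Int.
%foreign (libsmall "isNullString")
prim__isNullString : Ptr String -> Int

isNullString : Ptr String -> Bool
isNullString p = prim__isNullString p /= 0

-- Hypothetical wrapper: make the NULL case explicit in the type.
toMaybeString : Ptr String -> Maybe String
toMaybeString p = if isNullString p
                     then Nothing
                     else Just (getString p)
```

Wrapping the check in a Maybe keeps the possibility of NULL visible to callers, instead of relying on them to remember to call isNullString first.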
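Additionally, foreign functions can take callbacks, and take and return C struct pointers.
Callbacks
It is often useful in C for a function to take a callback, that is, a function which is called after doing some work. For example, we can write a function which takes a callback that takes a char* and an int and returns a char*, in C, as follows (added to smallc.c above):
typedef char*(*StringFn)(char*, int);
char* applyFn(char* x, int y, StringFn f) {
    printf("Applying callback to %s %d\n", x, y);
    return f(x, y);
}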
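Then, we can access this from Idris by declaring it as a %foreign function and wrapping it in the HasIO interface, with the C function calling the Idris function as the callback: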
%foreign (libsmall "applyFn")
prim__applyFn : String -> Int -> (String -> Int -> String) -> PrimIO String
applyFn : HasIO io =>
          String -> Int -> (String -> Int -> String) -> io String
applyFn c i f = primIO $ prim__applyFn c i f
For example, we can try this as follows:
pluralise : String -> Int -> String
pluralise str x
    = show x ++ " " ++
             if x == 1
                then str
                else str ++ "s"
main : IO ()
main
    = do str1 <- applyFn "Biscuit" 10 pluralise
         putStrLn str1
         str2 <- applyFn "Tree" 1 pluralise
         putStrLn str2
As a variant, the callback could have a side effect:
%foreign (libsmall "applyFn")
prim__applyFnIO : String -> Int -> (String -> Int -> PrimIO String) ->
                  PrimIO String
This is a little more fiddly to lift to a HasIO function, due to the callback, but we can do so using toPrim : IO a -> PrimIO a:
applyFnIO : HasIO io =>
            String -> Int -> (String -> Int -> IO String) -> io String
applyFnIO c i f = primIO $ prim__applyFnIO c i (\s, i => toPrim $ f s i)
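Note that the callback is explicitly in IO here, since HasIO does not have a general method for extracting the primitive IO operation.
For example, we can extend the pluralise example above to print a message in the callback: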
pluralise : String -> Int -> IO String
pluralise str x
    = do putStrLn "Pluralising"
         pure $ show x ++ " " ++
                if x == 1
                   then str
                   else str ++ "s"
main : IO ()
main
    = do str1 <- applyFnIO "Biscuit" 10 pluralise
         putStrLn str1
         str2 <- applyFnIO "Tree" 1 pluralise
         putStrLn str2
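Structs
Many C APIs pass around more complex data structures, such as a struct. We do not aim to be completely general in the C types we support, because this would make it harder to write code which is portable across multiple back ends. However, it is still often useful to be able to access a struct directly. For example, add the following to the top of smallc.c, and rebuild libsmall.so: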
#include <stdlib.h>
typedef struct {
int x;
int y;
} point;
point* mkPoint(int x, int y) {
point* pt = malloc(sizeof(point));
pt->x = x;
pt->y = y;
return pt;
}
void freePoint(point* pt) {
free(pt);
}
We can define a type for accessing point in Idris by importing System.FFI and using the Struct type, as follows:
Point : Type
Point = Struct "point" [("x", Int), ("y", Int)]
%foreign (libsmall "mkPoint")
mkPoint : Int -> Int -> Point
%foreign (libsmall "freePoint")
prim__freePoint : Point -> PrimIO ()
freePoint : Point -> IO ()
freePoint p = primIO $ prim__freePoint p
The Point type in Idris now corresponds to point* in C. Fields can be read and written using the following functions, also from System.FFI:
getField : Struct s fs -> (n : String) ->
           FieldType n ty fs => ty
setField : Struct s fs -> (n : String) ->
           FieldType n ty fs => ty -> IO ()
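Notice that fields are accessed by name, and must be available in the struct, given the constraint FieldType n ty fs, which states that the field named n has type ty in the structure fields fs. So, we can display a Point by accessing the fields directly, as follows: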
showPoint : Point -> String
showPoint pt
    = let x : Int = getField pt "x"
          y : Int = getField pt "y" in
          show (x, y)
And, as a complete example, we can initialise, update, display and delete a Point as follows:
main : IO ()
main = do let pt = mkPoint 20 30
          setField pt "x" (the Int 40)
          putStrLn $ showPoint pt
          freePoint pt
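Reading and writing fields can be combined into small helpers. The sketch below assumes the same point struct as above; translate is a hypothetical helper, not part of the original example, which shifts a point in place. It type checks on its own, but calling it needs a Point obtained from mkPoint in libsmall.

```idris
module Main

import System.FFI

-- Same struct description as in the text above.
Point : Type
Point = Struct "point" [("x", Int), ("y", Int)]

-- Hypothetical helper: shift a point in place by (dx, dy).
-- `the Int` fixes the field type so that the FieldType constraint can be solved.
translate : Point -> Int -> Int -> IO ()
translate pt dx dy
    = do let x = the Int (getField pt "x")
         let y = the Int (getField pt "y")
         setField pt "x" (x + dx)
         setField pt "y" (y + dy)
```

Since the struct lives on the C heap, the update is visible through every reference to the same Point, which is why translate returns IO () rather than a new Point.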
The field types of a Struct can be any of the following:
Int
Char
Double (double in C)
Bits8
Bits16
Bits32
Bits64
Ptr a or AnyPtr (void* in C)
Another Struct, which is a pointer to a struct in C
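Note that this does not include String or function types! This is primarily because these are not directly supported by the Chez back end. However, you can use another pointer type and convert. For example, assuming you have, in C: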
typedef struct {
char* name;
point* pt;
} namedpoint;
You can represent this in Idris as:
NamedPoint : Type
NamedPoint
    = Struct "namedpoint"
             [("name", Ptr String),
              ("pt", Point)]
That is, using a Ptr String instead of a String directly. Then you can convert between a void* and a char* in C:
char* getString(void *p) {
return (char*)p;
}
…and use this to convert to a String in Idris:
%foreign (libsmall "getString")
getString : Ptr String -> String
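Finalisers
In some libraries, a foreign function creates a pointer and the caller is responsible for freeing it. In this case, you can make an explicit foreign call to free. However, this is not always convenient, or even possible. Instead, you can ask the Idris run time to be responsible for freeing the pointer when it is no longer accessible, using onCollect (or its typeless variant onCollectAny) defined in the Prelude: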
onCollect : Ptr t -> (Ptr t -> IO ()) -> IO (GCPtr t)
onCollectAny : AnyPtr -> (AnyPtr -> IO ()) -> IO GCAnyPtr
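A GCPtr t behaves exactly like a Ptr t when passed to a foreign function (and, similarly, GCAnyPtr behaves like AnyPtr). A foreign function cannot return a GCPtr, however, because then we could no longer assume the pointer is completely managed by the Idris run time.
The finaliser is called either when the garbage collector determines that the pointer is no longer accessible, or at the end of execution.
Note that finalisers might not be supported by all back ends, since they depend on the facilities offered by a specific back end's run time system. They are certainly supported in the Chez Scheme and Racket back ends.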
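As an illustration, a raw allocation can be handed over to the garbage collector as soon as it is created. This is a sketch only: it assumes the malloc and free wrappers exported by System.FFI, and allocManaged is a hypothetical name introduced here.

```idris
module Main

import System.FFI

-- Hypothetical helper: allocate `size` bytes and register `free` as the
-- finaliser, so the caller never has to release the memory explicitly.
allocManaged : Int -> IO GCAnyPtr
allocManaged size
    = do ptr <- malloc size
         onCollectAny ptr (\p => free p)
```

The resulting GCAnyPtr can be passed to foreign functions wherever an AnyPtr is expected, and the finaliser runs once the value becomes unreachable (on back ends that support finalisers).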
Example: Minimal Readline Bindings
In this section, we’ll see how to create bindings for a C library (the GNU
Readline library) in
Idris, and make them available in a package. We’ll only create the most minimal
bindings, but nevertheless they demonstrate some of the trickier problems in
creating bindings to a C library, in that they need to handle memory allocation
of String
.
You can find the example in full in the Idris 2 source repository, in samples/FFI-readline. As a minimal example, this can be used as a starting point for other C library bindings.
We are going to provide bindings to the following functions in the Readline
API, available via #include <readline/readline.h>
:
char* readline (const char *prompt);
void add_history(const char *string);
Additionally, we are going to support tab completion, which in the Readline API is achieved by setting a global variable to a callback function (see Section Callbacks) that explains how to handle the completion:
typedef char *rl_compentry_func_t (const char *, int);
rl_compentry_func_t * rl_completion_entry_function;
A completion function takes a String
, which is the text to complete, and
an Int
, which is the number of times it has asked for a completion so far.
In Idris, this could be a function complete : String -> Int -> IO String
.
So, for example, if the text so far is "id"
, and the possible completions
are idiomatic
and idris
, then complete "id" 0
would produce the
string "idiomatic"
and complete "id" 1
would produce "idris"
.
We will define glue functions in a C file idris_readline.c
, which compiles
to a shared object libidrisreadline
, so we write a function for locating
the C functions:
rlib : String -> String
rlib fn = "C:" ++ fn ++ ",libidrisreadline"
Each of the foreign bindings will have a %foreign
specifier which locates
functions via rlib
.
Basic behaviour: Reading input, and history
We can start by writing a binding for readline
directly. It’s interactive,
so needs to return a PrimIO
:
%foreign (rlib "readline")
prim__readline : String -> PrimIO String
Then, we can write an IO
wrapper:
readline : String -> IO String
readline prompt = primIO $ prim__readline prompt
Unfortunately, this isn’t quite good enough! The C readline
function
returns a NULL
string if there is no input due to encountering an
end of file. So, we need to handle that - if we don’t, we’ll get a crash
on encountering end of file (remember: it’s the Idris programmer’s responsibility
to give an appropriate type to the C binding!)
Instead, we need to use a Ptr to say that it might be a NULL pointer (see Section Primitive FFI Types):
%foreign (rlib "readline")
prim__readline : String -> PrimIO (Ptr String)
We also need to provide a way to check whether the returned Ptr String
is
NULL
. To do so, we’ll write some glue code to convert back and forth
between Ptr String
and String
, in a file idris_readline.c
and a
corresponding header idris_readline.h
. In idris_readline.h
we have:
int isNullString(void* str); // return non-zero if str is NULL, 0 otherwise
char* getString(void* str); // turn a Ptr String into a String (assuming not NULL)
void* mkString(char* str); // turn a String into a Ptr String
void* nullString(); // create a new NULL String
Correspondingly, in idris_readline.c
:
int isNullString(void* str) {
return str == NULL;
}
char* getString(void* str) {
return (char*)str;
}
void* mkString(char* str) {
return (void*)str;
}
void* nullString() {
return NULL;
}
Now, we can use prim__readline
as follows, with a safe API, checking
whether the result it returns is NULL
or a concrete String
:
%foreign (rlib "isNullString")
prim__isNullString : Ptr String -> Int
export
isNullString : Ptr String -> Bool
isNullString str = prim__isNullString str /= 0
-- binding for the glue function getString in idris_readline.c
%foreign (rlib "getString")
export
getString : Ptr String -> String
export
readline : String -> IO (Maybe String)
readline s
    = do mstr <- primIO $ prim__readline s
         if isNullString mstr
            then pure Nothing
            else pure $ Just (getString mstr)
We’ll need nullString
and mkString
later, for dealing with completions.
Once we’ve read a string, we’ll want to add it to the input history. We can
provide a binding to add_history
as follows:
%foreign (rlib "add_history")
prim__add_history : String -> PrimIO ()
export
addHistory : String -> IO ()
addHistory s = primIO $ prim__add_history s
In this case, since Idris is in control of the String
, we know it’s not
going to be NULL
, so we can add it directly.
A small readline
program that reads input, and echoes it, recording input
history for non-empty inputs, can be written as follows:
echoLoop : IO ()
echoLoop
    = do Just x <- readline "> "
              | Nothing => putStrLn "EOF"
         putStrLn ("Read: " ++ x)
         when (x /= "") $ addHistory x
         if x /= "quit"
            then echoLoop
            else putStrLn "Done"
This gives us command history, and command line editing, but Readline becomes much more useful when we add tab completion. The default tab completion, which is available even in the small example above, is to tab complete file names in the current working directory. But for any realistic application, we probably want to tab complete other commands, such as function names, references to local data, or anything that is appropriate for the application.
Completions
Readline has a large API, with several ways of supporting tab completion, typically involving setting a global variable to an appropriate completion function. We’ll use the following:
typedef char *rl_compentry_func_t (const char *, int);
rl_compentry_func_t * rl_completion_entry_function;
The completion function takes the prefix of the completion, and the number
of times it has been called so far on this prefix, and returns the next
completion, or NULL
if there are no more completions. An Idris equivalent
would therefore have the following type:
setCompletionFn : (String -> Int -> IO (Maybe String)) -> IO ()
The function returns Nothing
if there are no more completions, or
Just str
for some str
if there is another one for the current
input.
We might hope that it’s a matter of defining a function to assign the completion function…
void idrisrl_setCompletion(rl_compentry_func_t* fn) {
rl_completion_entry_function = fn;
}
…then defining the Idris binding, which needs to take into account that
the Readline library expects NULL
when there are no more completions:
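-- bindings for the glue functions mkString and nullString in idris_readline.c
%foreign (rlib "mkString")
export
mkString : String -> Ptr String
%foreign (rlib "nullString")
export
nullString : Ptr String
%foreign (rlib "idrisrl_setCompletion")
prim__setCompletion : (String -> Int -> PrimIO (Ptr String)) -> PrimIO ()
export
setCompletionFn : (String -> Int -> IO (Maybe String)) -> IO ()
setCompletionFn fn
    = primIO $ prim__setCompletion $ \s, i => toPrim $
          do mstr <- fn s i
             case mstr of
                  Nothing => pure nullString -- need to return a Ptr String to readline!
                  Just str => pure (mkString str)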
So, we turn Nothing
into nullString
and Just str
into mkString
str
. Unfortunately, this doesn’t quite work. To see what goes wrong, let’s
try it for the most basic completion function that returns one completion no
matter what the input:
testComplete : String -> Int -> IO (Maybe String)
testComplete text 0 = pure $ Just "hamster"
testComplete text st = pure Nothing
We’ll try this in a small modification of echoLoop
above, setting a
completion function first:
main : IO ()
main = do setCompletionFn testComplete
          echoLoop
We see that there is a problem when we try running it, and hitting TAB before entering anything:
Main> :exec main
> free(): invalid pointer
The Idris code which sets up the completion is fine, but there is a problem with the memory allocation in the C glue code.
This problem arises because we haven’t thought carefully enough about which
parts of our program are responsible for allocating and freeing strings.
When Idris calls a foreign function that returns a string, it copies the
string to the Idris heap and frees it immediately. But, if the foreign
library also frees the string, it ends up being freed twice. This is what’s
happening here: the callback passed to prim__setCompletion
frees the string
and puts it onto the Idris heap, but Readline also frees the string returned
by prim__setCompletion
once it has processed it. We can solve this
problem by writing a wrapper for the completion function which reallocates
the string, and using that in idrisrl_setCompletion
instead.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <readline/readline.h>
rl_compentry_func_t* my_compentry;
char* compentry_wrapper(const char* text, int i) {
    char* res = my_compentry(text, i); // my_compentry is an Idris function, so res is on the Idris heap,
                                       // and freed on return
    if (res != NULL) {
        char* comp = malloc(strlen(res)+1); // comp is passed back to readline, which frees it when
                                            // it is finished with it
        strcpy(comp, res);
        return comp;
    }
    else {
        return NULL;
    }
}
void idrisrl_setCompletion(rl_compentry_func_t* fn) {
    rl_completion_entry_function = compentry_wrapper;
    my_compentry = fn; // fn is an Idris function, called by compentry_wrapper
}
So, we define the completion function in C, which calls the Idris completion function then makes sure the string returned by the Idris function is copied to the C heap.
We now have a primitive API that covers the most fundamental features of the readline API:
readline : String -> IO (Maybe String)
addHistory : String -> IO ()
setCompletionFn : (String -> Int -> IO (Maybe String)) -> IO ()
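A completion function for this API has to return the n-th candidate on the n-th call and Nothing once the candidates are exhausted. The following is a minimal sketch of such a function over a fixed word list; commands, nth and completeCommand are hypothetical names introduced for illustration and are not part of the sample package.

```idris
module Main

import Data.List
import Data.String

-- Hypothetical list of words the application wants to complete.
commands : List String
commands = ["help", "history", "quit"]

-- Look up the element at a given position, if there is one.
nth : Nat -> List a -> Maybe a
nth _ [] = Nothing
nth Z (x :: _) = Just x
nth (S k) (_ :: xs) = nth k xs

-- The n-th call for the same text yields the n-th matching command,
-- and Nothing once no further matches are left.
completeCommand : String -> Int -> IO (Maybe String)
completeCommand text n
    = let matches = filter (\c => Data.String.isPrefixOf text c) commands in
          pure (nth (integerToNat (cast n)) matches)
```

Under these assumptions, completeCommand has exactly the type expected by setCompletionFn, so it could be installed with setCompletionFn completeCommand before entering echoLoop.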
Theorem Proving
A tutorial on theorem proving in Idris 2.
Before we discuss the details of theorem proving in Idris, we will describe some fundamental concepts:
Propositions and judgments
Boolean and constructive logic
Curry-Howard correspondence
Definitional and propositional equalities
Axiomatic and constructive approaches
Propositions and Judgments
Propositions are the subject of our proofs. Before the proof, we can’t formally say if they are true or not. If the proof is successful then the result is a ‘judgment’. For instance, if the proposition is,
1+1=2
When we prove it, the judgment is,
1+1=2 true
Or if the proposition is,
1+1=3
we can’t prove it is true, but it is still a valid proposition and perhaps we can prove it is false so the judgment is,
1+1=3 false
This may seem a bit pedantic but it is important to be careful: in mathematics not every proposition is true or false. For instance, a proposition may be unproven or even unprovable.
So the logic here is different from the logic that comes from boolean algebra. In that case what is not true is false and what is not false is true. The logic we are using here does not have this law, the “Law of Excluded Middle”, so we cannot use it.
A false proposition is taken to be a contradiction and if we have a contradiction then we can prove anything, so we need to avoid this. Some languages, used in proof assistants, prevent contradictions.
The logic we are using is called constructive (or sometimes intuitionistic) because we are constructing a ‘database’ of judgments.
Curry-Howard correspondence
So how do we relate these proofs to Idris programs? It turns out that there is a correspondence between constructive logic and type theory. They have the same structure and we can switch back and forth between the two notations.
The way that this works is that a proposition is a type so…
Main> 1 + 1 = 2
2 = 2
Main> :t 1 + 1 = 2
(fromInteger 1 + fromInteger 1) === fromInteger 2 : Type
…is a proposition and it is also a type. The following will also produce an equality type:
Main> 1 + 1 = 3
2 = 3
Both of these are valid propositions so both are valid equality types. But how do we represent a true judgment? That is, how do we denote 1+1=2 is true but not 1+1=3? A type that is true is inhabited, that is, it can be constructed. An equality type has only one constructor ‘Refl’ so a proof of 1+1=2 is
onePlusOne : 1+1=2
onePlusOne = Refl
Now that we can represent propositions as types, other aspects of propositional logic can also be translated to types as follows:
          | propositions | example of possible type
A         | x=y          |
B         | y=z          |
and       | A /\ B       | Pair(x=y,y=z)
or        | A \/ B       | Either(x=y,y=z)
implies   | A -> B       | (x=y) -> (y=z)
for all   | y=z          |
exists    | y=z          |
And (conjunction)
We can have a type which corresponds to conjunction:
data And : Type -> Type -> Type where
    AndIntro : a -> b -> And a b
There is a built in type called ‘Pair’.
Or (disjunction)
We can have a type which corresponds to disjunction:
data Or : Type -> Type -> Type where
    OrIntroLeft : a -> Or a b
    OrIntroRight : b -> Or a b
There is a built in type called ‘Either’.
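To make the correspondence concrete, here is a small sketch using the built in Pair and Either types rather than the And and Or types above. The names bothHold, eitherHolds and succEq are hypothetical; plus is used so that the literals are read as natural numbers.

```idris
module Main

-- Conjunction: a proof of "A and B" is a pair of proofs.
bothHold : Pair (plus 1 1 = 2) (plus 2 2 = 4)
bothHold = (Refl, Refl)

-- Disjunction: a proof of "A or B" picks one side and proves it.
-- The right-hand proposition is false, but only the left one is needed.
eitherHolds : Either (plus 1 1 = 2) (plus 1 1 = 3)
eitherHolds = Left Refl

-- Implication: a function turning a proof of x = y into a proof of S x = S y.
succEq : (x, y : Nat) -> x = y -> S x = S y
succEq x y prf = cong S prf
```

Note that eitherHolds could not be written as Right Refl, since the type plus 1 1 = 3 has no inhabitants.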
Definitional and Propositional Equalities
We have seen that we can ‘prove’ a type by finding a way to construct a term.
In the case of equality types there is only one constructor which is Refl
.
We have also seen that each side of the equation does not have to be identical
like ‘2=2’. It is enough that both sides are definitionally equal like this:
onePlusOne : 1+1=2
onePlusOne = Refl
Both sides of this equation normalise to 2 and so Refl matches and the proposition is proved.
We don’t have to stick to terms; we can also use symbolic parameters so the following type checks:
varIdentity : m = m
varIdentity = Refl
If a proposition/equality type is not definitionally equal but is still true then it is propositionally equal. In this case we may still be able to prove it but some steps in the proof may require us to add something into the terms or at least to take some sideways steps to get to a proof.
Especially when working with equalities containing variable terms (inside functions) it can be hard to know which equality types are definitionally equal. In this example plusReducesL is definitionally equal but plusReducesR is not (although it is propositionally equal). The only difference between them is the order of the operands.
plusReducesL : (n:Nat) -> plus Z n = n
plusReducesL n = Refl
plusReducesR : (n:Nat) -> plus n Z = n
plusReducesR n = Refl
Checking plusReducesR
gives the following error:
Proofs.idr:21:18--23:1:While processing right hand side of Main.plusReducesR at Proofs.idr:21:1--23:1:
Can't solve constraint between:
plus n Z
and
n
So why is Refl
able to prove some equality types but not others?
The first answer is that plus
is defined by recursion on its first
argument. So, when the first argument is Z
, it reduces, but not when the
second argument is Z
.
If an equality type can be proved/constructed by using Refl
alone it is known
as a definitional equality. In order to be definitionally equal both sides
of the equation must normalise to the same value.
So when we type 1+1 in Idris it is immediately reduced to 2 because definitional equality is built in:
Main> 1+1
2
In the following pages we discuss how to resolve propositional equalities.
Running example: Addition of Natural Numbers
Throughout this tutorial, we will be working with the following function, defined in the Idris prelude, which defines addition on natural numbers:
plus : Nat -> Nat -> Nat
plus Z m = m
plus (S k) m = S (plus k m)
It is defined by the above equations, meaning that we have for free the
properties that adding m
to zero always results in m
, and that
adding m
to any non-zero number S k
always results in
S (plus k m)
. We can see this by evaluation at the Idris REPL (i.e.
the prompt, the read-eval-print loop):
Main> \m => plus Z m
\m => m
Main> \k,m => plus (S k) m
\k => \m => S (plus k m)
Note that unlike many other language REPLs, the Idris REPL performs
evaluation on open terms, meaning that it can reduce terms which
appear inside lambda bindings, like those above. Therefore, we can
introduce unknowns k
and m
as lambda bindings and see how
plus
reduces.
The plus
function has a number of other useful properties, for
example:
It is commutative, that is for all Nat inputs n and m, we know that plus n m = plus m n.
It is associative, that is for all Nat inputs n, m and p, we know that plus n (plus m p) = plus (plus n m) p.
We can use these properties in an Idris program, but in order to do so we must prove them.
Equality Proofs
Idris defines a propositional equality type as follows:
data Equal : a -> b -> Type where
     Refl : Equal x x
As syntactic sugar, Equal x y
can be written as x = y
.
It is propositional equality, where the type states that any two
values in different types a
and b
may be proposed to be equal.
There is only one way to prove equality, however, which is by
reflexivity (Refl
).
We have a type for propositional equality here, and correspondingly a
program inhabiting an instance of this type can be seen as a proof of
the corresponding proposition 1. So, trivially, we can prove that
4
equals 4
:
four_eq : 4 = 4
four_eq = Refl
However, trying to prove that 4 = 5
results in failure:
four_eq_five : 4 = 5
four_eq_five = Refl
The type 4 = 5
is a perfectly valid type, but is uninhabited, so
when trying to type check this definition, Idris gives the following
error:
When unifying 4 = 4 and (fromInteger 4) = (fromInteger 5)
Mismatch between:
4
and
5
Type checking equality proofs
An important step in type checking Idris programs is unification,
which attempts to resolve implicit arguments such as the implicit
argument x
in Refl
. As far as our understanding of type checking
proofs is concerned, it suffices to know that unifying two terms
involves reducing both to normal form then trying to find an assignment
to implicit arguments which will make those normal forms equal.
When type checking Refl
, Idris requires that the type is of the form
x = x
, as we see from the type of Refl
. In the case of
four_eq_five
, Idris will try to unify the expected type 4 = 5
with the type of Refl
, x = x
, notice that a solution requires
that x
be both 4
and 5
, and therefore fail.
Since type checking involves reduction to normal form, we can write the following equalities directly:
twoplustwo_eq_four : 2 + 2 = 4
twoplustwo_eq_four = Refl
plus_reduces_Z : (m : Nat) -> plus Z m = m
plus_reduces_Z m = Refl
plus_reduces_Sk : (k, m : Nat) -> plus (S k) m = S (plus k m)
plus_reduces_Sk k m = Refl
Heterogeneous Equality
Equality in Idris is heterogeneous, meaning that we can even propose equalities between values in different types:
idris_not_php : Z = "Z"
The type Z = "Z"
is uninhabited, and one might wonder why it is useful to
be able to propose equalities between values in different types. However, with
dependent types, such equalities can arise naturally. For example, if two
vectors are equal, their lengths must be equal:
vect_eq_length : (xs : Vect n a) -> (ys : Vect m a) ->
                 (xs = ys) -> n = m
In the above declaration, xs
and ys
have different types because
their lengths are different, but we would still like to draw a
conclusion about the lengths if they happen to be equal. We can define
vect_eq_length
as follows:
vect_eq_length xs xs Refl = Refl
By matching on Refl
for the third argument, we know that the only
valid value for ys
is xs
, because they must be equal, and
therefore their types must be equal, so the lengths must be equal.
Alternatively, we can put an underscore for the second xs
, since
there is only one value which will type check:
vect_eq_length xs _ Refl = Refl
Properties of plus
Using the (=)
type, we can now state the properties of plus
given above as Idris type declarations:
plus_commutes : (n, m : Nat) -> plus n m = plus m n
plus_assoc : (n, m, p : Nat) -> plus n (plus m p) = plus (plus n m) p
Both of these properties (and many others) are proved for natural number
addition in the Idris standard library, using (+)
from the Num
interface rather than using plus
directly. They have the names
plusCommutative
and plusAssociative
respectively.
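As a preview of where such proofs are needed in practice, the following sketch uses plusCommutative from Data.Nat to convert between two vector types whose lengths are equal only propositionally. The function swapLength is a hypothetical example, not part of the standard library.

```idris
module Main

import Data.Nat
import Data.Vect

-- The goal type mentions m + n, while xs has length n + m.
-- Rewriting with plusCommutative m n : m + n = n + m turns the goal
-- into Vect (n + m) a, which is exactly the type of xs.
swapLength : (n, m : Nat) -> Vect (n + m) a -> Vect (m + n) a
swapLength n m xs = rewrite plusCommutative m n in xs
```

Without the rewrite, Idris would reject the definition, because n + m and m + n do not reduce to the same normal form for unknown n and m.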
In the remainder of this tutorial, we will explore several different
ways of proving plus_commutes
(or, to put it another way, writing
the function.) We will also discuss how to use such equality proofs, and
see where the need for them arises in practice.
- 1
This is known as the Curry-Howard correspondence.
Inductive Proofs
Before embarking on proving plus_commutes
in Idris itself, let us
consider the overall structure of a proof of some property of natural
numbers. Recall that they are defined recursively, as follows:
data Nat : Type where
     Z : Nat
     S : Nat -> Nat
A total function over natural numbers must both terminate, and cover
all possible inputs. Idris checks functions for totality by checking that
all inputs are covered, and that all recursive calls are on
structurally smaller values (so recursion will always reach a base
case). Recalling plus
:
plus : Nat -> Nat -> Nat
plus Z m = m
plus (S k) m = S (plus k m)
This is total because it covers all possible inputs (the first argument
can only be Z
or S k
for some k
, and the second argument
m
covers all possible Nat
) and in the recursive call, k
is structurally smaller than S k
so the first argument will always
reach the base case Z
in any sequence of recursive calls.
In some sense, this resembles a mathematical proof by induction (and
this is no coincidence!). For some property P
of a natural number
x
, we can show that P
holds for all x
if:
P holds for zero (the base case).
Assuming that P holds for k, we can show P also holds for S k (the inductive step).
In plus
, the property we are trying to show is somewhat trivial (for
all natural numbers x
, there is a Nat
which need not have any
relation to x
). However, it still takes the form of a base case and
an inductive step. In the base case, we show that there is a Nat
arising from plus n m
when n = Z
, and in the inductive step we
show that there is a Nat
arising when n = S k
and we know we can
get a Nat
inductively from plus k m
. We could even write a
function capturing all such inductive definitions:
nat_induction :
    (prop : Nat -> Type) ->                -- Property to show
    (prop Z) ->                            -- Base case
    ((k : Nat) -> prop k -> prop (S k)) -> -- Inductive step
    (x : Nat) ->                           -- Show for all x
    prop x
nat_induction prop p_Z p_S Z = p_Z
nat_induction prop p_Z p_S (S k) = p_S k (nat_induction prop p_Z p_S k)
Using nat_induction
, we can implement an equivalent inductive
version of plus
:
plus_ind : Nat -> Nat -> Nat
plus_ind n m
    = nat_induction (\x => Nat)
                    m                      -- Base case, plus_ind Z m
                    (\k, k_rec => S k_rec) -- Inductive step plus_ind (S k) m
                                           -- where k_rec = plus_ind k m
                    n
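One way to gain some confidence that plus_ind behaves like plus is to state a few equalities about it and let the type checker verify them by evaluation. The sketch below repeats the two definitions so that it is self-contained; the names plus_ind_2_3 and plus_ind_Z are hypothetical.

```idris
module Main

nat_induction : (prop : Nat -> Type) -> prop Z ->
                ((k : Nat) -> prop k -> prop (S k)) ->
                (x : Nat) -> prop x
nat_induction prop p_Z p_S Z = p_Z
nat_induction prop p_Z p_S (S k) = p_S k (nat_induction prop p_Z p_S k)

plus_ind : Nat -> Nat -> Nat
plus_ind n m = nat_induction (\x => Nat) m (\k, k_rec => S k_rec) n

-- Both sides normalise to the same value, so Refl is accepted.
plus_ind_2_3 : plus_ind 2 3 = 5
plus_ind_2_3 = Refl

-- The base case holds for an arbitrary m, just as it does for plus.
plus_ind_Z : (m : Nat) -> plus_ind Z m = m
plus_ind_Z m = Refl
```

These checks are only spot tests of particular equalities; a full proof that plus_ind n m = plus n m for all n would itself proceed by induction on n.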
To prove that plus n m = plus m n
for all natural numbers n
and
m
, we can also use induction. Either we can fix m
and perform
induction on n
, or vice versa. We can sketch an outline of a proof;
performing induction on n
, we have:
Property prop is \x => plus x m = plus m x.
Show that prop holds in the base case and inductive step:
- Base case: prop Z, i.e. plus Z m = plus m Z, which reduces to m = plus m Z due to the definition of plus.
- Inductive step: Inductively, we know that prop k holds for a specific, fixed k, i.e. plus k m = plus m k (the induction hypothesis). Given this, show prop (S k), i.e. plus (S k) m = plus m (S k), which reduces to S (plus k m) = plus m (S k). From the induction hypothesis, we can rewrite this to S (plus m k) = plus m (S k).
To complete the proof we therefore need to show that m = plus m Z
for all natural numbers m
, and that S (plus m k) = plus m (S k)
for all natural numbers m
and k
. Each of these can also be
proved by induction, this time on m
.
We are now ready to embark on a proof of commutativity of plus
formally in Idris.
Pattern Matching Proofs
In this section, we will provide a proof of plus_commutes
directly,
by writing a pattern matching definition. We will use interactive
editing features extensively, since it is significantly easier to
produce a proof when the machine can give the types of intermediate
values and construct components of the proof itself. The commands we
will use are summarised below. Where we refer to commands
directly, we will use the Vim version, but these commands have a direct
mapping to Emacs commands.
Command             | Vim binding | Emacs binding | Explanation
Check type          | \t          | C-c C-t       | Show type of identifier or hole under the cursor.
Proof search        | \s          | C-c C-a       | Attempt to solve hole under the cursor by applying simple proof search.
Make new definition | \a          | C-c C-s       | Add a template definition for the type defined under the cursor.
Make lemma          | \l          | C-c C-e       | Add a top level function with a type which solves the hole under the cursor.
Split cases         | \c          | C-c C-c       | Create new constructor patterns for each possible case of the variable under the cursor.
Creating a Definition
To begin, create a file pluscomm.idr
containing the following type
declaration:
plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
To create a template definition for the proof, press \a
(or the
equivalent in your editor of choice) on the line with the type
declaration. You should see:
plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes n m = ?plus_commutes_rhs
To prove this by induction on n
, as we sketched in Section
Inductive Proofs, we begin with a case split on n
(press
\c
with the cursor over the n
in the definition.) You
should see:
plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = ?plus_commutes_rhs_1
plus_commutes (S k) m = ?plus_commutes_rhs_2
If we inspect the types of the newly created holes,
plus_commutes_rhs_1
and plus_commutes_rhs_2
, we see that the
type of each reflects that n
has been refined to Z
and S k
in each respective case. Pressing \t
over
plus_commutes_rhs_1
shows:
m : Nat
-------------------------------------
plus_commutes_rhs_1 : m = plus m Z
Similarly, for plus_commutes_rhs_2
:
k : Nat
m : Nat
--------------------------------------
plus_commutes_rhs_2 : (S (plus k m)) = (plus m (S k))
It is a good idea to give these slightly more meaningful names:
plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = ?plus_commutes_Z
plus_commutes (S k) m = ?plus_commutes_S
Base Case
We can create a separate lemma for the base case interactively, by
pressing \l
with the cursor over plus_commutes_Z
. This
yields:
plus_commutes_Z : (m : Nat) -> m = plus m Z
plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = plus_commutes_Z m
plus_commutes (S k) m = ?plus_commutes_S
That is, the hole has been filled with a call to a top level
function plus_commutes_Z
, applied to the variable in scope m
.
Unfortunately, we cannot prove this lemma directly, since plus
is
defined by matching on its first argument, and here plus m Z
has a
concrete value for its second argument (in fact, the left hand side of
the equality has been reduced from plus Z m
.) Again, we can prove
this by induction, this time on m
.
First, create a template definition with \a:
plus_commutes_Z : (m : Nat) -> m = plus m Z
plus_commutes_Z m = ?plus_commutes_Z_rhs
Now, case split on m
with \c
:
plus_commutes_Z : (m : Nat) -> m = plus m Z
plus_commutes_Z Z = ?plus_commutes_Z_rhs_1
plus_commutes_Z (S k) = ?plus_commutes_Z_rhs_2
Checking the type of plus_commutes_Z_rhs_1
shows the following,
which is provable by Refl
:
--------------------------------------
plus_commutes_Z_rhs_1 : Z = Z
For such immediate proofs, we can let Idris write the proof automatically by pressing \s with the cursor over plus_commutes_Z_rhs_1.
This yields:
plus_commutes_Z : (m : Nat) -> m = plus m Z
plus_commutes_Z Z = Refl
plus_commutes_Z (S k) = ?plus_commutes_Z_rhs_2
For plus_commutes_Z_rhs_2
, we are not so lucky:
k : Nat
-------------------------------------
plus_commutes_Z_rhs_2 : S k = S (plus k Z)
Inductively, we should know that k = plus k Z
, and we can get access
to this inductive hypothesis by making a recursive call on k, as
follows:
plus_commutes_Z : (m : Nat) -> m = plus m Z
plus_commutes_Z Z = Refl
plus_commutes_Z (S k)
    = let rec = plus_commutes_Z k in
          ?plus_commutes_Z_rhs_2
For plus_commutes_Z_rhs_2
, we now see:
k : Nat
rec : k = plus k Z
-------------------------------------
plus_commutes_Z_rhs_2 : S k = S (plus k Z)
So we know that k = plus k Z
, but how do we use this to update the goal to
S k = S k
?
To achieve this, Idris provides a replace
function as part of the
prelude:
Main> :t replace
Builtin.replace : (0 rule : x = y) -> p x -> p y
Given a proof that x = y
, and a property p
which holds for
x
, we can get a proof of the same property for y
, because we
know x
and y
must be the same. Note the multiplicity on rule
means that it’s guaranteed to be erased at run time.
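To see what replace does when the property is spelled out by hand, here is a minimal sketch. The function succBoth is a hypothetical example in which the property p is given explicitly as a lambda, so nothing has to be inferred by unification.

```idris
module Main

-- With p = \x => S k = S x we have:
--   p k          is  S k = S k           (proved by Refl)
--   p (plus k Z) is  S k = S (plus k Z)  (the result we want)
succBoth : (k : Nat) -> k = plus k Z -> S k = S (plus k Z)
succBoth k prf = replace {p = \x => S k = S x} prf Refl
```

Having to write p out like this for every step is what makes direct use of replace tedious in larger proofs.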
In practice, this function can be
a little tricky to use because in general the implicit argument p
can be hard to infer by unification, so Idris provides a high level
syntax which calculates the property and applies replace
:
rewrite prf in expr
If we have prf : x = y
, and the required type for expr
is some
property of x
, the rewrite ... in
syntax will search for all
occurrences of x
in the required type of expr
and replace them with y
. We want
to replace plus k Z
with k
, so we need to apply our rule
rec
in reverse, which we can do using sym
from the Prelude
Main> :t sym
Builtin.sym : (0 rule : x = y) -> y = x
Concretely, in our example, we can say:
plus_commutes_Z (S k)
= let rec = plus_commutes_Z k in
rewrite sym rec in ?plus_commutes_Z_rhs_2
Checking the type of plus_commutes_Z_rhs_2
now gives:
k : Nat
rec : k = plus k Z
-------------------------------------
plus_commutes_Z_rhs_2 : S k = S k
Using the rewrite rule rec
, the goal type has been updated with plus k Z
replaced by k
.
We can use proof search (\s
) to complete the proof, giving:
plus_commutes_Z : (m : Nat) -> m = plus m Z
plus_commutes_Z Z = Refl
plus_commutes_Z (S k)
= let rec = plus_commutes_Z k in
rewrite sym rec in Refl
The base case of plus_commutes
is now complete.
Inductive Step
Our main theorem, plus_commutes
should currently be in the following
state:
plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = plus_commutes_Z m
plus_commutes (S k) m = ?plus_commutes_S
Looking again at the type of plus_commutes_S
, we have:
k : Nat
m : Nat
-------------------------------------
plus_commutes_S : S (plus k m) = plus m (S k)
Conveniently, by induction we can immediately tell that
plus k m = plus m k
, so let us rewrite directly by making a
recursive call to plus_commutes
. We add this directly, by hand, as
follows:
plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = plus_commutes_Z m
plus_commutes (S k) m = rewrite plus_commutes k m in ?plus_commutes_S
Checking the type of plus_commutes_S
now gives:
k : Nat
m : Nat
-------------------------------------
plus_commutes_S : S (plus m k) = plus m (S k)
The good news is that m
and k
now appear in the correct order.
However, we still have to show that the successor symbol S
can be
moved to the front in the right hand side of this equality. This
remaining lemma takes a similar form to the plus_commutes_Z
; we
begin by making a new top level lemma with \l
. This gives:
plus_commutes_S : (k : Nat) -> (m : Nat) -> S (plus m k) = plus m (S k)
Again, we make a template definition with \a
:
plus_commutes_S : (k : Nat) -> (m : Nat) -> S (plus m k) = plus m (S k)
plus_commutes_S k m = ?plus_commutes_S_rhs
Like plus_commutes_Z
, we can define this by induction over m
, since
plus
is defined by matching on its first argument. The complete definition
is:
total
plus_commutes_S : (k : Nat) -> (m : Nat) -> S (plus m k) = plus m (S k)
plus_commutes_S k Z = Refl
plus_commutes_S k (S j) = rewrite plus_commutes_S k j in Refl
All holes have now been solved.
The total
annotation means that we require the final function to
pass the totality checker; i.e. it will terminate on all possible
well-typed inputs. This is important for proofs, since it provides a
guarantee that the proof is valid in all cases, not just those for
which it happens to be well-defined.
Now that plus_commutes
has a total
annotation, we have completed the
proof of commutativity of addition on natural numbers.
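For reference, the pieces developed above can be collected into a single file. This is a sketch under two assumptions: the hole left in plus_commutes is filled by applying the lemma as plus_commutes_S k m, and %default total is used in place of individual total annotations.

```idris
%default total

-- Base case lemma: m = plus m Z, proved by induction on m.
plus_commutes_Z : (m : Nat) -> m = plus m Z
plus_commutes_Z Z = Refl
plus_commutes_Z (S k)
   = let rec = plus_commutes_Z k in
         rewrite sym rec in Refl

-- Inductive step lemma: S can be moved out of the second argument of plus.
plus_commutes_S : (k : Nat) -> (m : Nat) -> S (plus m k) = plus m (S k)
plus_commutes_S k Z = Refl
plus_commutes_S k (S j) = rewrite plus_commutes_S k j in Refl

-- Main theorem. After the rewrite the goal is S (plus m k) = plus m (S k),
-- which is exactly the statement of plus_commutes_S k m.
plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = plus_commutes_Z m
plus_commutes (S k) m = rewrite plus_commutes k m in plus_commutes_S k m
```

If the file type checks with %default total in effect, all three definitions have passed the totality checker.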
This page attempts to explain some of the techniques used in Idris to prove propositional equalities.
Proving Propositional Equality
We have seen that definitional equalities can be proved using Refl
since they
always normalise to values that can be compared directly.
However with propositional equalities we are using symbolic variables, which do not always normalise.
So to take the previous example:
plusReducesR : (n : Nat) -> plus n Z = n
In this case plus n Z
does not normalise to n. Even though both sides of
the equality are provably equal we cannot claim Refl
as a proof.
If the pattern match cannot match for all n
then we need to
match all possible values of n
. In this case
plusReducesR : (n : Nat) -> plus n Z = n
plusReducesR Z = Refl
plusReducesR (S k)
= let rec = plusReducesR k in
rewrite rec in Refl
we can’t use Refl
to prove plus n 0 = n
for all n
. Instead, we call
it for each case separately. So, in the second line for example, the type checker
substitutes Z
for n
in the type being matched, and reduces the type
accordingly.
Replace
This implements the ‘indiscernibility of identicals’ principle: if two terms
are equal then they have the same properties. In other words, if x=y
, then we
can substitute y for x in any expression. In our proofs we can express this as:
if x=y then prop x = prop y
where prop is a pure function representing the property. In the examples below
prop is an expression in some variable with a type like this: prop: n -> Type
So if n
is a natural number variable then prop
could be something
like \n => 2*n + 3
.
To use this in our proofs there is the following function in the prelude:
||| Perform substitution in a term according to some equality.
replace : forall x, y, prop . (0 rule : x = y) -> prop x -> prop y
replace Refl prf = prf
If we supply an equality (x=y) and a proof of a property of x (prop x
) then we get
a proof of a property of y (prop y
).
So, in the following example, if we supply p1 x
which is a proof that x=2
and
the equality x=y
then we get a proof that y=2
.
p1: Nat -> Type
p1 n = (n=2)
testReplace: (x=y) -> (p1 x) -> (p1 y)
testReplace a b = replace a b
Rewrite
In practice, replace
can be
a little tricky to use because in general the implicit argument prop
can be hard to infer for the machine, so Idris provides a high level
syntax which calculates the property and applies replace
.
Example: again we supply p1 x
which is a proof that x=2
and the equality
y=x
then we get a proof that y=2
.
p1: Nat -> Type
p1 x = (x=2)
testRewrite: (y=x) -> (p1 x) -> (p1 y)
testRewrite a b = rewrite a in b
We can think of rewrite
as working in this way:
Start with an equation x=y and a property prop : x -> Type.
Search for x in prop.
Replace all occurrences of x with y in prop.
That is, we are doing a substitution.
Notice that here we need to supply the reverse equality, i.e. y=x instead of x=y. This is because rewrite replaces the left-hand side of the equality with the right-hand side, and this substitution is done in the return type. Thus, here in the return type y=2 we need to apply y=x in order to match the type of the argument x=2.
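When the equality is only available in the other direction, it can be flipped with sym before rewriting. The following is a small sketch of that idea; testRewriteSym is a hypothetical name, and p1 is repeated so that the fragment is self-contained.

```idris
p1 : Nat -> Type
p1 n = (n = 2)

-- The argument has type x = y, but the goal mentions y, so we
-- rewrite with sym a : y = x, which turns the goal p1 y into p1 x.
testRewriteSym : (x = y) -> p1 x -> p1 y
testRewriteSym a b = rewrite sym a in b
```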
Symmetry and Transitivity
In addition to ‘reflexivity’ equality also obeys ‘symmetry’ and ‘transitivity’ and these are also included in the prelude:
||| Symmetry of propositional equality
sym : forall x, y . (0 rule : x = y) -> y = x
sym Refl = Refl
||| Transitivity of propositional equality
trans : forall a, b, c . (0 l : a = b) -> (0 r : b = c) -> a = c
trans Refl Refl = Refl
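As an illustration of how trans chains two equalities, here is a minimal sketch built on the plusReducesR lemma from earlier on this page. The name plusReducesTwice is hypothetical and introduced only for this example.

```idris
%default total

plusReducesR : (n : Nat) -> plus n Z = n
plusReducesR Z = Refl
plusReducesR (S k) = rewrite plusReducesR k in Refl

-- First step:  plus (plus n Z) Z = plus n Z   (plusReducesR at plus n Z)
-- Second step: plus n Z = n                   (plusReducesR at n)
-- trans joins the two steps through the shared middle term plus n Z.
plusReducesTwice : (n : Nat) -> plus (plus n Z) Z = n
plusReducesTwice n = trans (plusReducesR (plus n Z)) (plusReducesR n)
```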
Heterogeneous Equality
Also included in the prelude:
||| Explicit heterogeneous ("John Major") equality. Use this when Idris
||| incorrectly chooses homogeneous equality for `(=)`.
||| @ a the type of the left side
||| @ b the type of the right side
||| @ x the left side
||| @ y the right side
(~=~) : (x : a) -> (y : b) -> Type
(~=~) x y = (x = y)
Implementation Notes
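This section contains (or will contain) various notes on aspects of the Idris 2 implementation, in the hope that they help with debugging and future contributions.
Implementation Overview
These are some unsorted notes on aspects of the implementation. They are sketchy, and not always completely up to date, but they should give some hints about what is going on and some ideas of where to look in the code to see how certain features work.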
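Introduction
The core language TT (defined in Core.TT) is based on quantitative type theory (see https://bentnib.org/quantitative-type-theory.html). Binders have "multiplicities", which are either 0, 1 or unlimited.
Terms are indexed over the names in scope, so we know that terms are always well scoped. Values (i.e. normal forms) are defined in Core.Value as NF; constructors do not evaluate their arguments until explicitly requested.
We elaborate to TT from a higher-level language, TTImp (defined in TTImp.TTImp), which is TT with implicit arguments, local function definitions, case blocks, as-patterns, qualified names with automatic type-directed disambiguation, and proof search.
Elaboration relies on unification (in Core.Unify), which allows unification problems to be postponed. It works essentially the same way as Agda, as described in Ulf Norell's thesis.
The general idea is that high-level languages provide a translation to TT. In the Idris/ namespace we define the high-level syntax for Idris, which translates to TTImp by desugaring operators, do notation, and so on.
There is a separate linearity check after elaboration, which updates the types of holes (and is aware of case blocks). This is implemented in Core.LinearCheck. During this check we also recalculate the multiplicities in hole applications so that they are displayed correctly (e.g. if a linear variable is unused elsewhere, it will always appear with multiplicity 1 in holes).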
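Directory structure:
Core/ – anything related to the core TT, typechecking and unification
TTImp/ – anything related to the implicit TT and its elaboration
  TTImp/Elab/ – elaboration state and elaboration of terms
  TTImp/Interactive/ – interactive editing infrastructure
Parser/ – various utilities for parsing and lexing TT and TTImp (and other things)
Utils/ – some generally useful utilities
Idris/ – anything related to the high-level language, translating to TTImp
  Idris/Elab/ – elaboration machinery for high-level constructs (e.g. interfaces)
Compiler/ – compiler back ends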
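The Core Type, and Ref
Core is a "monad" (not really, for efficiency reasons, at the moment…) supporting Error and IO (I did originally plan to allow restricting this to some specific IO operations, but have not done so yet). The raw syntax is defined by the type RawImp, which has a source location at each node; any error in elaboration records the location at which it occurred, as a file context FC.
Ref is essentially an IORef. Typically we pass them implicitly and use labels to disambiguate which one we mean. See Core.Core for their definition. Again, IORef is used for efficiency: even though a state monad would be neater, this turned out to be about 2-3 times faster, so I am going with the "ugly" choice…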
Term representation
Terms in the core language are indexed by a list of the names in scope, most recently defined first:
data Term : List Name -> Type
This means that terms are always well scoped, and we can use the type system to keep us right when manipulating names. For example, we have:
Local : FC -> (isLet : Maybe Bool) ->
(idx : Nat) -> (0 p : IsVar name idx vars) -> Term vars
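So local variables are represented by an index into the local context (a de Bruijn index, idx), together with a proof, erased at run time, that the index is valid. Everything is therefore de Bruijn indexed, but the type checker still keeps track of the indices so that we don't have to think too hard!
Core.TT contains various handy tools for manipulating terms with their indices, such as: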
weaken : Term vars -> Term (n :: vars) -- actually in an interface, Weaken
embed : Term vars -> Term (ns ++ vars)
refToLocal : (x : Name) -> -- explicit name of a reference
(new : Name) -> -- name to bind as
Term vars -> Term (new :: vars)
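Note that the types are explicit about when vars is needed at run time and when it isn't. Mostly, where it is needed, it is to help with displaying names or with name generation, rather than for any fundamental reason in the core. In general, this isn't expensive at run time.
Environments, defined in Core.Env, map local variables to binders: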
data Env : (tm : List Name -> Type) -> List Name -> Type
A binder is typically a lambda, a pi, or a let (with a value), but can
also be a pattern variable. See the definition of TT
for more details.
Where we have a term, we usually also need an Env
.
We also have values, which are in head normal form, and defined in
Core.Value
:
data NF : List Name -> Type
We can convert a term to a value by normalising…
nf : {vars : _} ->
Defs -> Env Term vars -> Term vars -> Core (NF vars)
…and back again, by quoting:
quote : {vars : _} ->
Defs -> Env Term vars -> tm vars -> Core (Term vars)
Both nf
and quote
are defined in Core.Normalise
. We don’t
always know whether we’ll need to work with NF
or Term
, so
we also have a “glued” representation, Glued vars
, again defined in
Core.Normalise
, which lazily computes either a NF
or Term
as
required. Elaborating a term returns the type as a Glued vars
.
Term
separates Ref
(global user defined names) from Meta
, which
are globally defined metavariables. For efficiency, metavariables are only
substituted into terms if they have non-0 multiplicity, to preserve sharing as
much as possible.
Unification
Unification is probably the most important part of the elaboration process,
and infers values for implicit arguments. That is, it finds values for the
things which are referred to by Meta
in Term
. It is defined in
Core.Unify
, where the top level unification function has the following
type:
unify : Unify tm =>
{vars : _} ->
{auto c : Ref Ctxt Defs} ->
{auto u : Ref UST UState} ->
UnifyInfo ->
FC -> Env Term vars ->
tm vars -> tm vars ->
Core UnifyResult
The Unify
interface is there because it is convenient to be able to
define unification on Term
and NF
, as well as Closure
(which
is part of NF
to represent unevaluated arguments to constructors).
This is one place where indexing over vars
is extremely valuable: we
have to keep the environment consistent, so unification won’t accidentally
introduce any scoping bugs!
Idris 2 implements pattern unification - see Adam Gundry’s thesis for an accessible introduction.
Context
Core.Context
defines all the things needed for TT. Most importantly: Def
gives definitions of names (case trees, builtins, constructors and
holes, mostly); GlobalDef
is a definition with all the other information
about it (type, visibility, totality, etc); Context
is a context mapping names
to GlobalDef
, and Defs
is the core data structure with everything needed to
typecheck more definitions.
The main Context type stores definitions in an array, indexed by a “resolved
name id”, an integer, for fast look up. This means that it also needs to be
able to convert between resolved names and full names. The HasNames
interface defines methods for going back and forth between structures with
human readable names, and structures with resolved integer names.
Since we store names in an array, all the lookup functions need to be in the
Core
monad. This also turns out to help with loading checked files (see
below).
Elaboration Overview
Elaboration of RawImp
to TT
is driven by TTImp.Elab
, with the
top level function for elaborating terms defined in TTImp.Elab.Term
,
support functions defined in TTImp.Elab.Check
, and elaborators for the
various TTImp constructs defined in separate files under TTImp.Elab.*
.
Laziness
Like Idris 1, laziness is marked in types using Lazy
, Delay
and Force
, or
Inf
(instead of Lazy
) for codata. Unlike Idris 1, these are language primitives
rather than special purpose names.
Implicit laziness resolution is handled during unification (in Core.Unify
).
When unification is invoked (by convert
in TTImp.Elab.Check
) with
the withLazy
flag set, it checks whether it is converting a lazy type
with a non-lazy type. If so, it continues with unification, but returning
that either a Force
or Delay
needs inserting as appropriate.
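As an illustration of what this resolution means for user code, here is a minimal sketch. The names pick and example are hypothetical. Neither Force nor Delay is written by hand; the elaborator is expected to insert them where a lazy type meets a non-lazy one.

```idris
-- Both branches are lazy arguments, so only the selected one is evaluated.
pick : Bool -> Lazy a -> Lazy a -> a
pick True  t e = t   -- t : Lazy a is used at type a, so a Force is inserted
pick False t e = e

example : Nat
example = pick True 1 2   -- 1 and 2 are passed at type Lazy Nat, so Delay is inserted
```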
TTC format
We can save things to binary if we have an implementation of the TTC interface
for it. See Utils.Binary
to see how this is done. It uses a global reference
Ref Bin Binary
which uses Data.Buffer
underneath.
When we load checked TTC files, we don’t process the definitions immediately,
but rather store them as a ContextEntry
, which is either a Binary
blob, or
a processed definition. We only process the definitions the first time they
are looked up, since converting Binary to the definition is fairly costly
(due to having to construct a lot of AST nodes), and often definitions in an
imported file are never used.
Bound Implicits
The RawImp
type has a constructor IBindVar
. The first time we encounter an
IBindVar
, we record the name as one which will be implicitly bound. At the
end of elaboration, we decide which holes should turn into bound variables
(Pi bound in types, Pattern bound on a LHS, still holes on the RHS) by
looking at the list of names bound as IBindVar
, the things they depend on,
and sorting them so that they are bound in dependency order. This happens
in TTImp.Implicit.getToBind
.
Once we know what the bound implicits need to be, we bind them in
bindImplicits
. Any application of a hole which stands for a bound implicit
gets turned into a local binding (either Pi or Pat as appropriate, or PLet for
@-patterns).
Unbound Implicits
Any name beginning with a lower case letter is considered an unbound implicit. They are elaborated as holes, which may depend on the initial environment of the elaboration, and after elaboration they are converted to an implicit pi binding, with multiplicity 0. So, for example:
map : {f : _} -> (a -> b) -> f a -> f b
becomes:
map : {f : _} -> {0 a : _} -> {0 b : _} -> (a -> b) -> f a -> f b
Bindings are ordered according to dependency. It’ll infer any additional names, e.g. in:
lookup : HasType i xs t -> Env xs -> t
… where xs
is a Vect n a
, it infers bindings for n
and a
.
The %unbound_implicits
directive means that it will no longer automatically
bind names (that is, a
and b
in map
above) but it will still
infer the types for any additional names, e.g. if you write:
lookup : forall i, x, t . HasType i xs t -> Env xs -> t
… it will still infer a type for xs
and infer bindings for n
and
a
.
Implicit arguments
When we encounter an implicit argument (_
in the raw syntax, or added when
we elaborate an application and see that there is an implicit needed) we
make a new hole which is a fresh name applied to the current environment,
and return that as the elaborated term. This happens in TTImp.Elab.Check
,
with the function metaVar
. If there’s enough information elsewhere we’ll
find the definition of the hole by unification.
We never substitute holes in a term during elaboration, and rely on normalisation if we need to look inside it. If there are holes remaining after elaboration of a definition, we report an error (a hole in a type is fine as long as it is resolved by the time the definition is done).
See Elab.App.makeImplicit
, Elab.App.makeAutoImplicit
to see where we
add holes for the implicit arguments in applications.
Elab.App
does quite a lot of tricky stuff! In an attempt to help with
resolving ambiguous names and record updates, it will sometimes delay
elaboration of an argument (see App.checkRestApp
) so that it can get more
information about its type first.
Core.Unify.solveConstraints
revisits all of the currently unsolved holes
and constrained definitions, and tries again to unify any constraints which
they require. It also tries to resolve anything defined by proof search.
The current state of unification is defined in Core.UnifyState
, and
unification constraints record which metavariables are blocking them. This
improves performance, since we’ll only retry a constraint if one of the
blocking metavariables has been resolved.
Additional type inference
A ?
in a type means “infer this part of the type”. This is distinct from
_
in types, which means “I don’t care what this is”. The distinction is in
what happens when inference fails. If inference fails for _
, we implicitly
bind a new name (just like pattern matching on the lhs - i.e. it means match
anything). If inference fails for ?
, we leave it as a hole and try to fill
it in later. As a result, we can say:
foo : Vect ? Int
foo = [1,2,3,4]
… and the ?
will be inferred to be 4. But if we say:
foo : Vect _ Int
foo = [1,2,3,4]
… we’ll get an error, because the _
has been bound as a new name.
Both ?
and _
are represented in RawImp
by the Implicit
constructor, which has a boolean flag meaning “bind if unresolved”.
So the meaning of _
is now consistent on the lhs and in types (i.e. it
means infer a value and bind a variable on failure to infer anything). In
practice, using _
will get you the old Idris behaviour, but ?
might
get you a bit more type inference.
Auto Implicits
Auto implicits are resolved by proof search, and can be given explicit
arguments in the same way as ordinary implicits: i.e. {x = exp}
to give
exp
as the value for auto implicit x
. Interfaces are syntactic sugar for
auto implicits (it is the same resolution mechanism - interfaces translate into
records, and implementations translate into hints for the search).
The argument syntax @{exp}
means that the value of the next auto implicit
in the application should be exp
- this is the same as the syntax for
invoking named implementations in Idris 1, but interfaces and auto implicits
have been combined now.
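The three ways of supplying an auto implicit described above can be sketched as follows. The type Positive and the function predPos are hypothetical names introduced only for this illustration.

```idris
-- A proof that a number is a successor (hypothetical example type).
data Positive : Nat -> Type where
  IsPositive : Positive (S k)

-- The auto implicit prf is found by proof search at each call site.
predPos : (n : Nat) -> {auto prf : Positive n} -> Nat
predPos Z impossible
predPos (S k) = k

bySearch : Nat
bySearch = predPos 3                      -- prf resolved by proof search

byName : Nat
byName = predPos 3 {prf = IsPositive}     -- given explicitly, by name

byPosition : Nat
byPosition = predPos 3 @{IsPositive}      -- given as the next auto implicit
```

All three definitions should denote the same value; they differ only in how the proof argument is provided.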
Implicit search is defined in Core.AutoSearch
. It will only begin a
search if all the determining arguments of the goal are defined, meaning
that they don’t contain any holes. This avoids committing too early to
the solution of a hole by resolving it by search, rather than unification,
unless a programmer has explicitly said (via a search
option on a data
type) that that’s what they want.
Dot Patterns
IMustUnify
is a constructor of RawImp
. When we elaborate this, we generate a
hole, then elaborate the term, and add a constraint that the generated hole
must unify with the term which was explicitly given (in UnifyState.addDot
),
without resolving any holes. This is finally checked in UnifyState.checkDots
.
Proof Search
A definition constructed with Core.Context.BySearch
is a hole which will
be resolved by searching for something which fits the type. This happens in
Core.AutoSearch
. It checks all possible hints for a term, to ensure that
only one is possible.
@-Patterns
Names which are bound in types are also bound as @-patterns, meaning that functions have access to them. For example, we can say:
vlength : {n : Nat} -> Vect n a -> Nat
vlength [] = n
vlength (x :: xs) = n
As patterns are implemented as a constructor of TT
, which makes a lot
of things more convenient (especially case tree compilation).
Linear Types
Following Conor McBride and Bob Atkey’s work, all binders have a multiplicity
annotation (RigCount
). After elaboration in TTImp.Elab
, we do a
separate linearity check which: a) makes sure that linear variables are used
exactly once; b) updates hole types to properly reflect usage information.
Local definitions
We elaborate relative to an environment, meaning that we can elaborate local function definitions. We keep track of the names being defined in a nested block of declarations, and ensure that they are lifted to top level definitions in TT by applying them to every name in scope.
Since we don’t know how many times a local definition will be applied, in general, anything bound with multiplicity 1 is passed to the local definition with multiplicity 0, so if you want to use it in a local definition, you need to pass it explicitly.
Case blocks
Similar to local definitions, these are lifted to top level definitions which represent the case block, which is immediately applied to the scrutinee of the case. We don’t attempt to calculate the multiplicities of arguments when elaborating the case block, since we’ll probably get it wrong - instead, these are checked during linearity checking, which knows about case functions.
Case blocks in the scope of local definitions are tricky, because the names
need to match up, and the types might be refined, but we also still need to
apply the local names to the scope in which they were defined. This is a bit
fiddly, and dealt with by the ICaseLocal
constructor of RawImp
.
Various parts of the system treat case blocks specially, even though they aren’t strictly part of the core. In particular, these are linearity checking and totality checking.
Parameters
The parameters to a data type are taken to be the arguments which appear, unchanged, in the same position, everywhere across a data definition.
Erasure
Unbound implicits are given 0
multiplicity, so the rule is now that if you
don’t explicitly write it in the type of a function or constructor, the
argument is erased at run time.
Elaboration and the case tree compiler check that 0-multiplicity arguments are not inspected in case trees. In the compiler, 0-multiplicity arguments to constructors are erased completely, whereas 0-multiplicity arguments to functions are replaced with a placeholder erased value.
Namespaces and name visibility
Mostly the same rules apply as in Idris 1. The difference is that visibility is per namespace, not per file (that is, files have no relevance other than that they introduce their own namespace and allow separate typechecking).
One effect of this is that when a file defines nested namespaces, the inner
namespace can see what’s in the outer namespace, but not vice versa unless
names defined in the inner namespace are explicitly exported. The visibility
modifiers export
, public export
, and private
control whether the name
can be seen in any other namespace, and it’s nothing to do with the file
they’re defined in at all.
Unlike Idris 1, there is no restriction on whether public definitions can
refer to private names. The only restriction on private
names is that
they can’t be referred to directly (i.e. in code) outside the namespace.
Records
Records are part of TTImp (rather than the surface language). Elaborating a
record declaration creates a data type and associated projection functions.
Record setters are generated on demand while elaborating TTImp (in
TTImp.Elab.Record
). Setters are translated directly to case
blocks,
which means that update of dependent fields works as one might expect (i.e.
it’s safe as long as all of the fields are updated at the same time
consistently).
The IDE Protocol
The Idris REPL has two modes of interaction: a human-readable syntax designed for direct use in a terminal, and a machine-readable syntax designed for using Idris as a backend for external tools.
The IDE-Protocol is versioned separately from the Idris compiler. The first version of Idris (written in Haskell, at v1.3.3) implements version one of the IDE Protocol, and Idris 2 (self-hosting, at v0.3.0) implements version two of the IDE Protocol.
The protocol and its serialisation/deserialisation routines are part of the Protocol submodule hierarchy and are packaged in the idris2protocols.ipkg package.
Starting IDE Mode
To initiate the IDE-Protocol on stdin/stdout, use the --ide-mode
command line option.
To run the protocol over a TCP socket, use the --ide-mode-socket
option:
idris2 --ide-mode-socket
53864
By default this will choose an open port, print the number of the port to stdout followed by a newline, and listen to that socket on localhost. You may optionally specify the hostname and port to listen to:
idris2 --ide-mode-socket localhost:12345
12345
The IDE-Protocol will run on that socket, and Idris will exit when the client disconnects from the socket.
Protocol Overview
The communication protocol is of asynchronous request-reply style: a single request from the client is handled by Idris at a time. Idris waits for a request on its standard input stream, and outputs the answer or answers to standard output. The result of a request can be either success, failure, or intermediate output; and furthermore, before the result is delivered, there might be additional meta-messages.
A reply can consist of multiple messages: any number of messages to inform the user about the progress of the request or other informational output, and finally a result, either ok
or error
.
The wire format is the length of the message in characters, encoded as 6 hexadecimal characters, followed by the message encoded as an S-expression (sexp). Additionally, each request includes a unique integer (counting upwards), which is repeated in all messages corresponding to that request.
An example interaction from loading the file /home/hannes/empty.idr
looks as follows on the wire:
00002a((:load-file "/home/hannes/empty.idr") 1)
000039(:write-string "Type checking /home/hannes/empty.idr" 1)
000025(:set-prompt "/home/hannes/empty" 1)
000032(:return (:ok "Loaded /home/hannes/empty.idr") 1)
The first message is the request from idris-mode to load the specific file, whose length is hex 2a, decimal 42 (including the newline at the end).
The request identifier is set to 1.
The first message from Idris is to write the string Type checking /home/hannes/empty.idr
, another is to set the prompt to /home/hannes/empty
.
The answer, starting with :return
is ok
, and additional information is that the file was loaded.
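The framing used in this example can be sketched in a few lines. This is only an illustration, not part of the Idris IDE mode implementation: hexDigit, hex6 and frame are hypothetical helper names, and the sketch assumes, as in the example above, that the trailing newline is counted in the length.

```idris
-- Hypothetical helper: one lowercase hexadecimal digit for a value in 0..15.
hexDigit : Integer -> Char
hexDigit d = if d < 10 then chr (ord '0' + cast d) else chr (ord 'a' + cast (d - 10))

-- Hypothetical helper: zero-padded, six-character hexadecimal rendering.
hex6 : Integer -> String
hex6 n = pack (go 6 n [])
  where
    go : Nat -> Integer -> List Char -> List Char
    go Z _ acc = acc
    go (S k) m acc = go k (m `div` 16) (hexDigit (m `mod` 16) :: acc)

-- Prefix an S-expression with the length of the message, newline included.
frame : String -> String
frame sexp = hex6 (cast (length (sexp ++ "\n"))) ++ sexp ++ "\n"

main : IO ()
main = putStr (frame "((:load-file \"/home/hannes/empty.idr\") 1)")
```

The request string is 41 characters long, or 42 with the newline, so the prefix computed here should be 00002a, in agreement with the first line of the example.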
There are three atoms in the wire language: numbers, strings, and symbols. The only compound object is a list, which is surrounded by parentheses. The syntax is:
A ::= NUM | '"' STR '"' | ':' ALPHA+
S ::= A | '(' S* ')' | nil
where NUM
is either 0 or a positive integer, ALPHA
is an alphabetical character, and STR
is the contents of a string, with "
escaped by a backslash.
The atom nil
is accepted instead of ()
for compatibility with some regexp pretty-printing routines.
The state of the Idris process is mainly the active file, which needs to be kept synchronised between the editor and Idris.
This is achieved by the already seen :load-file
command.
Protocol Versioning
When interacting with Idris through the IDE Protocol the initial message sent by the running Idris Process is the version (major and minor) of the IDE Protocol being used.
The expected message has the following format:
(:protocol-version MAJOR MINOR)
IDE Clients can use this to help support multiple Idris versions.
Commands
The available commands are listed below. They are compatible with Version 1 and 2.0 of the protocol unless otherwise stated.
(:load-file FILENAME [LINE])
Load the named file. If a LINE number is provided, the file is only loaded up to that line. Otherwise, the entire file is loaded. Version 2 of the IDE Protocol requires that the file name be a quoted string, as in (:load-file "MyFile.idr") and not (:load-file MyFile.idr).
(:cd FILEPATH)
Change the working directory to the given FILEPATH. Version 2 of the IDE Protocol requires that the path is quoted, as in (:cd "a/b/c") and not (:cd a/b/c).
(:interpret STRING)
Interpret STRING at the Idris REPL, returning a highlighted result.
(:type-of STRING)
Return the type of the name, written with Idris syntax in the STRING. The reply may contain highlighting information.
(:case-split LINE NAME)
Generate a case-split for the pattern variable NAME on program line LINE. The pattern-match cases to be substituted are returned as a string with no highlighting.
(:add-clause LINE NAME)
Generate an initial pattern-match clause for the function declared as NAME on program line LINE. The initial clause is returned as a string with no highlighting.
(:add-proof-clause LINE NAME)
Add a clause driven by the <== syntax.
(:add-missing LINE NAME)
Add the missing cases discovered by totality checking the function declared as NAME on program line LINE. The missing clauses are returned as a string with no highlighting.
(:make-with LINE NAME)
Create a with-rule pattern match template for the clause of function NAME on line LINE. The new code is returned with no highlighting.
(:make-case LINE NAME)
Create a case pattern match template for the clause of function NAME on line LINE. The new code is returned with no highlighting.
(:make-lemma LINE NAME)
Create a top level function with a type which solves the hole named NAME on line LINE.
(:proof-search LINE NAME HINTS)
Attempt to fill out the hole on LINE named NAME by proof search. HINTS is a possibly-empty list of additional things to try while searching. This operation is also called ExprSearch in the Idris 2 API.
(:refine LINE NAME TM)
Refine the hole on LINE named NAME by using the term TM.
(:docs-for NAME [MODE])
Look up the documentation for NAME, and return it as a highlighted string. If MODE is :overview, only the first paragraph of documentation is provided for NAME. If MODE is :full, or omitted, the full documentation is returned for NAME.
(:apropos STRING)
Search the documentation for mentions of STRING, and return any found as a list of highlighted strings.
(:metavariables WIDTH)
List the currently-active holes, with their types pretty-printed in WIDTH columns.
(:who-calls NAME)
Get a list of callers of NAME.
(:calls-who NAME)
Get a list of callees of NAME.
(:browse-namespace NAMESPACE)
Return the contents of NAMESPACE, like :browse at the command-line REPL.
(:normalise-term TM)
Return a highlighted string consisting of the results of normalising the serialised term TM (which would previously have been sent as the tt-term property of a string).
(:show-term-implicits TM)
Return a highlighted string, consisting of the results of making all arguments in serialised term TM explicit. The arguments in TM would previously have been sent as the tt-term property of a string.
(:hide-term-implicits TM)
Return a highlighted string, consisting of the results of making all arguments in serialised term TM follow their usual implicitness setting. The arguments in TM would previously have been sent as the tt-term property of a string.
(:elaborate-term TM)
Return a highlighted string, consisting of the core language term corresponding to serialised term TM. The arguments in TM would previously have been sent as the tt-term property of a string.
(:print-definition NAME)
Return the definition of NAME as a highlighted string.
(:repl-completions NAME)
Search names, types and documentation which contain NAME. Return the result of tab-completing NAME as a REPL command.
:version
Return the version information of the Idris compiler.
New For Version 2
New in Version 2 of the protocol are:
(:generate-def LINE NAME)
Attempt to generate a complete definition from a type.
(:generate-def-next)
Replace the previous generated definition with the next generated definition.
(:proof-search-next)
Replace the previous proof search result with the next proof search result.
(:intro LINE NAME)
Returns the non-empty list of valid saturated constructors that can be used in the hole at line LINE named NAME.
Possible Replies
Possible replies include a normal final reply:
(:return (:ok SEXP [HIGHLIGHTING]) ID)
(:return (:error String [HIGHLIGHTING]) ID)
A normal intermediate reply:
(:output (:ok SEXP [HIGHLIGHTING]) ID)
(:output (:error String [HIGHLIGHTING]) ID)
Informational and/or abnormal replies:
(:write-string String ID)
(:set-prompt String ID)
(:warning (FilePath (LINE COL) (LINE COL) String [HIGHLIGHTING]) ID)
Warnings include compiler errors that don’t cause the compiler to stop.
Output Highlighting
Idris mode supports highlighting the output from Idris. In reality, this highlighting is controlled by the Idris compiler. Some of the return forms from Idris support an optional extra parameter: a list mapping spans of text to metadata about that text. Clients can then use this list both to highlight the displayed output and to enable richer interaction by having more metadata present. For example, the Emacs mode allows right-clicking identifiers to get a menu with access to documentation and type signatures.
A particular semantic span is a three element list. The first element of the list is the index at which the span begins, the second element is the number of characters included in the span, and the third is the semantic data itself. The semantic data is a list of lists. The head of each list is a key that denotes what kind of metadata is in the list, and the tail is the metadata itself.
The following keys are available:
name
gives a reference to the fully-qualified Idris name
implicit
provides a Boolean value that is True if the region is the name of an implicit argument
decor
describes the category of a token, which can be: type (type constructors), function (defined functions), data (data constructors), bound (bound variables), or keyword
source-loc
states that the region refers to a source code location. Its body is a collection of key-value pairs, with the following possibilities:
filename
provides the filename
start
provides the line and column that the source location starts at as a two-element tail
end
provides the line and column that the source location ends at as a two-element tail
text-formatting
provides an attribute of formatted text. This is for use with natural-language text, not code, and is presently emitted only from inline documentation. The potential values are bold, italic, and underline.
link-href
provides a URL that the corresponding text is a link to.
quasiquotation
states that the region is quasiquoted.
antiquotation
states that the region is antiquoted.
tt-term
A serialised representation of the Idris core term corresponding to the region of text.
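As a rough illustration of the three-element span described above, the following sketch models a span in Idris and extracts the text it covers. The Span record, its field names and spanText are hypothetical names introduced here. The metadata is simplified to key/value pairs rather than the protocol's list of lists, and the start index is assumed to count from zero.

```idris
module Main

-- Hypothetical, simplified model of a semantic span. The real metadata is
-- a list of lists; here each entry is flattened to a (key, value) pair.
record Span where
  constructor MkSpan
  start : Nat
  len   : Nat
  meta  : List (String, String)

-- The text covered by a span, assuming the start index counts from zero.
spanText : String -> Span -> String
spanText output sp = substr sp.start sp.len output

main : IO ()
main = do
  let output = "plus : Nat -> Nat -> Nat"
  let sp = MkSpan 0 4 [("decor", "function")]
  -- prints plus
  putStrLn (spanText output sp)
```

A client would typically use the extracted region together with the keys, for example to colour the region according to its decor value.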
Source Code Highlighting
Idris supports instructing editors how to colour their code. When elaborating source code or REPL input, Idris will locate regions of the source code corresponding to names, and emit information about these names using the same metadata as output highlighting.
These messages will arrive as replies to the command that caused elaboration to occur, such as :load-file or :interpret.
They have the format:
(:output (:ok (:highlight-source POSNS)) ID)
where POSNS is a list of positions to highlight. Each of these is a two-element list whose first element is a position (encoded as for the source-loc property above) and whose second element is highlighting metadata in the same format used for output.
Idris2 Reference Guide
This is a placeholder, to get set up with readthedocs.
Documenting Idris Code
Idris documentation comes in two major forms: comments, which exist for a reader’s edification and are ignored by the compiler, and inline API documentation, which the compiler parses and stores for future reference. To consult the documentation for a declaration f, write :doc f at the REPL or use the appropriate command in your editor (C-c C-d in Emacs, <LocalLeader>h in Vim).
Comments
Use comments to explain why code is written the way that it is. Idris’s comment syntax is the same as that of Haskell: lines beginning with -- are comments, and regions bracketed by {- and -} are comments even if they extend across multiple lines. These can be used to comment out lines of code or provide simple documentation for the readers of Idris code.
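A minimal sketch of both comment forms follows. The function double is a hypothetical example introduced only for illustration.

```idris
module Main

-- A line comment: addition is used here rather than multiplication
-- purely for the sake of the illustration.
double : Nat -> Nat
double n = n + n

{- A block comment can span several lines, which makes it convenient
   for commenting out code:
double n = 2 * n
-}

main : IO ()
main = printLn (double 21)
```

Neither comment is visible to :doc; only the ||| documentation described in the next section is stored by the compiler.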
Inline Documentation
Idris also supports a rich inline syntax from which documentation for Idris code can be generated. This syntax also allows named parameters and variables within type signatures to be individually annotated, using a syntax similar to Javadoc parameter annotations.
Documentation always comes before the declaration being documented. Inline documentation applies to either top-level declarations or to constructors. Documentation for specific arguments to constructors, type constructors, or functions can be associated with these arguments using their names.
The inline documentation for a declaration is an unbroken string of lines, each of which begins with ||| (three pipe symbols). The first paragraph of the documentation is taken to be an overview, and in some contexts, only this overview will be shown. After the documentation for the declaration as a whole, it is possible to associate documentation with specific named parameters, which can either be explicitly named or the results of converting free variables to implicit parameters. Annotations follow the Javadoc style: for the named parameter (n : T), the corresponding annotation is ||| @ n Some description, placed before the declaration.
Documentation is written in Markdown, though not all contexts will display all possible formatting (for example, images are not displayed when viewing documentation in the REPL, and only some terminals render italics correctly). A comprehensive set of examples is given below.
||| Modules can also be documented.
module Docs
import Data.Vect
||| Add some numbers.
|||
||| Addition is really great. This paragraph is not part of the overview.
||| Still the same paragraph.
|||
||| You can even provide examples which are inlined in the documentation:
||| ```idris example
||| add 4 5
||| ```
|||
||| Lists are also nifty:
||| * Really nifty!
||| * Yep!
||| * The name `add` is a **bold** choice
||| @ n is the recursive param
||| @ m is not
add : (n, m : Nat) -> Nat
add Z m = m
add (S n) m = S (add n m)
||| Append some vectors
||| @ a the contents of the vectors
||| @ xs the first vector (recursive param)
||| @ ys the second vector (not analysed)
appendV : (xs : Vect n a) -> (ys : Vect m a) -> Vect (add n m) a
appendV [] ys = ys
appendV (x::xs) ys = x :: appendV xs ys
||| Here's a simple datatype
data Ty =
  ||| Unit
  UNIT |
  ||| Functions
  ARR Ty Ty
||| Points to a place in a typing context
data Elem : Vect n Ty -> Ty -> Type where
  Here : {ts : Vect n Ty} -> Elem (t::ts) t
  There : {ts : Vect n Ty} -> Elem ts t -> Elem (t'::ts) t
||| A more interesting datatype
||| @ n the number of free variables
||| @ ctxt a typing context for the free variables
||| @ ty the type of the term
data Term : (ctxt : Vect n Ty) -> (ty : Ty) -> Type where
  ||| The constructor of the unit type
  ||| More comment
  ||| @ ctxt the typing context
  UnitCon : {ctxt : Vect n Ty} -> Term ctxt UNIT
  ||| Function application
  ||| @ f the function to apply
  ||| @ x the argument
  App : {ctxt : Vect n Ty} -> (f : Term ctxt (ARR t1 t2)) -> (x : Term ctxt t1) -> Term ctxt t2
  ||| Lambda
  ||| @ body the function body
  Lam : {ctxt : Vect n Ty} -> (body : Term (t1::ctxt) t2) -> Term ctxt (ARR t1 t2)
  ||| Variables
  ||| @ i de Bruijn index
  Var : {ctxt : Vect n Ty} -> (i : Elem ctxt t) -> Term ctxt t
||| We can document records, including their fields and constructors
record Yummy where
  ||| Make a yummy
  constructor MkYummy
  ||| What to eat
  food : String
Environment Variables
Idris 2 recognises a number of environment variables, to decide where to look for packages, external libraries, code generators, etc. It currently recognises, in approximately the order you’re likely to need them:
EDITOR - Sets the editor used in REPL :e command
IDRIS2_CG - Sets which code generator to use when compiling programs
IDRIS2_PACKAGE_PATH - Lists the directories where Idris2 looks for packages, in addition to the defaults (which are under the IDRIS2_PREFIX and in the depends subdirectory of the current working directory). Directories are separated by a :, or a ; on Windows
IDRIS2_PATH - Places Idris2 looks for import files, in addition to the imports in packages
IDRIS2_DATA - Places Idris2 looks for its data files. These are typically support code for code generators.
IDRIS2_LIBS - Places Idris2 looks for libraries used by code generators.
IDRIS2_PREFIX - Gives the Idris2 installation prefix
CHEZ - Sets the location of the chez executable used in Chez codegen
RACKET - Sets the location of the racket executable used in Racket codegen
RACKET_RACO - Sets the location of the raco executable used in Racket codegen
GAMBIT_GSI - Sets the location of the gsi executable used in Gambit codegen
GAMBIT_GSC - Sets the location of the gsc executable used in Gambit codegen
GAMBIT_GSC_BACKEND - Sets the gsc executable backend argument
IDRIS2_CC - Sets the location of the C compiler executable used in RefC codegen
CC - Sets the location of the C compiler executable used in RefC codegen
NODE - Sets the location of the node executable used in Node codegen
PATH - used to search for executables in certain codegens
Dot syntax for records
Long story short, .field is a postfix projection operator that binds tighter than function application.
Lexical structure
.foo is a valid name, which stands for record fields (new Name constructor RF "foo")
Foo.bar.baz starting with uppercase F is one lexeme, a namespaced identifier: DotSepIdent ["baz", "bar", "Foo"]
foo.bar.baz starting with lowercase f is three lexemes: foo, .bar, .baz
.foo.bar.baz is three lexemes: .foo, .bar, .baz
If you want Constructor.field, you have to write (Constructor).field.
All module names must start with an uppercase letter.
New syntax of simpleExpr
Expressions binding tighter than application (simpleExpr), such as variables or parenthesised expressions, have been renamed to simplerExpr, and an extra layer of syntax has been inserted.
simpleExpr ::= (.field)+               -- parses as PPostfixAppPartial
             | simplerExpr (.field)+   -- parses as PPostfixApp
             | simplerExpr             -- (parses as whatever it used to)
(.foo) is a name, so you can use it to e.g. define a function called .foo (see .squared below)
(.foo.bar) is a parenthesised expression
Desugaring rules
(.field1 .field2 .field3) desugars to (\x => .field3 (.field2 (.field1 x)))
(simpleExpr .field1 .field2 .field3) desugars to ((.field1 .field2 .field3) simpleExpr)
Record elaboration
There is a new pragma %prefix_record_projections, which is on by default.
For every field f of a record R, we get:
* projection f in namespace R (exactly like now), unless %prefix_record_projections is off
* projection .f in namespace R with the same definition
Example code
record Point where
  constructor MkPoint
  x : Double
  y : Double
This record creates two projections:
* .x : Point -> Double
* .y : Point -> Double
Because %prefix_record_projections is on by default, we also get:
* x : Point -> Double
* y : Point -> Double
To prevent cluttering the ordinary global name space with short identifiers, we can do this:
%prefix_record_projections off
record Rect where
  constructor MkRect
  topLeft : Point
  bottomRight : Point
For Rect, we don’t get the prefix projections:
Main> :t topLeft
(interactive):1:4--1:11:Undefined name topLeft
Main> :t .topLeft
\{rec:0} => .topLeft rec : ?_ -> Point
Let’s define some constants:
pt : Point
pt = MkPoint 4.2 6.6
rect : Rect
rect =
  MkRect
    (MkPoint 1.1 2.5)
    (MkPoint 4.3 6.3)
User-defined projections work, too. (Should they?)
(.squared) : Double -> Double
(.squared) x = x * x
Finally, the examples:
main : IO ()
main = do
  -- desugars to (.x pt)
  -- prints 4.2
  printLn $ pt.x
  -- prints 4.2, too
  -- maybe we want to make this a parse error?
  printLn $ pt .x
  -- prints 10.8
  printLn $ pt.x + pt.y
  -- works fine with namespacing
  -- prints 4.2
  printLn $ (Main.pt).x
  -- the LHS can be an arbitrary expression
  -- prints 4.2
  printLn $ (MkPoint pt.y pt.x).y
  -- user-defined projection
  -- prints 17.64
  printLn $ pt.x.squared
  -- prints [1.0, 3.0]
  printLn $ map (.x) [MkPoint 1 2, MkPoint 3 4]
  -- .topLeft.y desugars to (\x => .y (.topLeft x))
  -- prints [2.5, 2.5]
  printLn $ map (.topLeft.y) [rect, rect]
  -- desugars to (.topLeft.x rect + .bottomRight.y rect)
  -- prints 7.4
  printLn $ rect.topLeft.x + rect.bottomRight.y
  -- qualified names work, too
  -- all these print 4.2
  printLn $ Main.Point.(.x) pt
  printLn $ Point.(.x) pt
  printLn $ (.x) pt
  printLn $ .x pt
  -- haskell-style projections work, too
  printLn $ Main.Point.x pt
  printLn $ Point.x pt
  printLn $ (x) pt
  printLn $ x pt
  -- record update syntax uses dots now
  -- prints 3.0
  printLn $ ({ topLeft.x := 3 } rect).topLeft.x
  -- but for compatibility, we support the old syntax, too
  printLn $ ({ topLeft->x := 3 } rect).topLeft.x
  -- prints 2.1
  printLn $ ({ topLeft.x $= (+1) } rect).topLeft.x
  printLn $ ({ topLeft->x $= (+1) } rect).topLeft.x
Parses but does not typecheck:
-- parses as: map.x [MkPoint 1 2, MkPoint 3 4]
-- maybe we should disallow spaces before dots?
--
printLn $ map .x [MkPoint 1 2, MkPoint 3 4]
Literate Programming
Idris2 supports several types of literate mode styles.
The unlit’n (the extraction of code from a literate file) has been designed on the assumption that we are parsing markdown-like languages. It is performed by a Lexer that uses a provided literate style to recognise code blocks and code lines. Anything else is ignored. Idris2 also provides support for recognising both ‘visible’ and ‘invisible’ code blocks using ‘native features’ of each literate style.
A literate style consists of:
a list of String encoded code block delimiters;
a list of line indicators; and
a list of valid file extensions.
Lexing is simple and greedy: when consuming a code block, we treat everything as code until we reach the closing delimiter. This means that use of verbatim modes in a literate file will also be treated as active code.
In future we should add support for literate LaTeX files, and potentially other common document formats. But more importantly, a more intelligent processing of literate documents using a pandoc-like library in Idris, such as Edda (https://github.com/jfdm/edda), would also be welcome.
Bird Style Literate Files
We treat files with an extension of .lidr as bird style literate files.
Bird notation is a classic literate mode found in Haskell (and Orwell) in which visible code lines begin with > and hidden lines with <. Other lines are treated as documentation.
Note
We have diverged from lhs2tex, in which < is traditionally used to display inactive code.
Bird-style is presented as is, and we recommend using the other styles for a more comprehensive literate mode.
Embedding in Markdown-like documents
While Bird Style literate mode is useful, it does not lend itself well to more modern markdown-like notations such as Org-Mode and CommonMark. Idris2 also provides support for recognising both ‘visible’ and ‘invisible’ code blocks and lines in both CommonMark and OrgMode documents using native code blocks and lines.
The idea is that:
Visible content will be kept in the pretty printer’s output;
Invisible content will be removed; and
Specifications will be displayed as is and not touched by the compiler.
OrgMode
We treat files with an extension of .org as org-style literate files.
Each of the following markup is recognised regardless of case:
Org mode source blocks for idris sans options are recognised as visible code blocks:
#+begin_src idris
data Nat = Z | S Nat
#+end_src
Comment blocks that begin with #+BEGIN_COMMENT idris are treated as invisible code blocks:
#+begin_comment idris
data Nat = Z | S Nat
#+end_comment
Visible code lines, and specifications, are not supported. Invisible code lines are denoted with #+IDRIS: as follows:
#+IDRIS: data Nat = Z | S Nat
Specifications can be given using OrgMode’s plain source or example blocks:
#+begin_src
map : (f : a -> b) -> List a -> List b
map f _ = Nil
#+end_src
CommonMark
We treat files with an extension of .md and .markdown as CommonMark style literate files.
CommonMark source blocks for idris sans options are recognised as visible code blocks:
```idris
data Nat = Z | S Nat
```
~~~idris
data Nat = Z | S Nat
~~~
Comment blocks of the form <!-- idris\n ... \n --> are treated as invisible code blocks:
<!-- idris
data Nat = Z | S Nat
-->
Code lines are not supported.
Specifications can be given using CommonMark’s pre-formatted blocks (indented by four spaces) or unlabelled code blocks:
Compare
```idris
map : (f : a -> b) -> List a -> List b
map f _ = Nil
```
with
    map : (f : a -> b) -> List a -> List b
    map f _ = Nil
LaTeX
We treat files with an extension of .tex and .ltx as LaTeX style literate files.
We treat environments named code as visible code blocks:
\begin{code}
data Nat = Z | S Nat
\end{code}
We treat environments named hidden as invisible code blocks:
\begin{hidden}
data Nat = Z | S Nat
\end{hidden}
Code lines are not supported.
Specifications can be given using user defined environments.
We do not provide definitions for these code blocks and ask the user to define them. One such example, using the fancyvrb and comment packages, is:
\usepackage{fancyvrb}
\DefineVerbatimEnvironment
{code}{Verbatim}
{}
\usepackage{comment}
\excludecomment{hidden}
Overloaded literals
The compiler provides directives for overloading literals: %stringLit <fun> for string literals and %integerLit <fun> for integer literals. During elaboration, the given function is applied to the corresponding literal. In the Prelude these functions are set to fromString and fromInteger.
The interface FromString ty provides the fromString : String -> ty function, while the Num ty interface provides the fromInteger : Integer -> ty function for all numerical types.
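As a minimal sketch of overloading through the interface, the following implements FromString for a wrapper type so that a string literal can be used where that type is expected. Name, MkName and greet are hypothetical names introduced only for this illustration.

```idris
module Main

-- Hypothetical wrapper type, introduced only for this illustration.
data Name = MkName String

-- Implementing FromString lets string literals be used at type Name.
FromString Name where
  fromString = MkName

greet : Name -> String
greet (MkName n) = "Hello, " ++ n

main : IO ()
main = putStrLn (greet "Idris")  -- the literal elaborates to fromString "Idris"
```

Running main prints Hello, Idris. The literal "Hello, " inside greet is elaborated in the same way, at type String.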
Restricted overloads
Although the overloading of literals can be achieved by implementing the interfaces described above, in principle a function with the correct signature and name is enough to achieve the desired behaviour. This can be exploited to obtain more restrictive overloading, such as converting literals to Fin n values, where integer literals greater than or equal to n are not constructible values for the type. Additional implicit arguments can be added to the function signature, in particular auto implicit arguments for searching proofs. As an example, this is the implementation of fromInteger for Fin n.
public export
fromInteger : (x : Integer) -> {n : Nat} ->
              {auto prf : (IsJust (integerToFin x n))} ->
              Fin n
fromInteger {n} x {prf} with (integerToFin x n)
  fromInteger {n} x {prf = ItIsJust} | Just y = y
The prf auto implicit is an automatically constructed proof (if possible) that the literal is suitable for the Fin n type. The restricted behaviour can be observed in the REPL, where the failure to construct a valid proof is caught during the type-checking phase and not at runtime:
Main> the (Fin 3) 2
FS (FS FZ)
Main> the (Fin 3) 5
(interactive):1:13--1:14:Can't find an implementation for IsJust (integerToFin 5 3) at:
1 the (Fin 3) 5
String literals in Idris
To facilitate the use of string literals, Idris provides three features in addition to plain string literals: multiline strings, raw strings and interpolated strings.
Plain string literals
String literals behave the way you expect from other programming languages. Use quotation marks " around the piece of text that you want to use as a string:
"hello world"
As explained in Overloaded literals, string literals can be overloaded to return a type other than String.
Multiline string literals
In some cases you will have to display a large string literal that spans multiple lines. For this you can use multiline string literals, which let a string span multiple lines while preserving its line breaks and indentation. They also allow you to indent your multiline string with the surrounding code, without breaking the intended format of the string.
To use multiline strings, start with a triple quote """ followed by a line return, then enter your text and close it with another triple quote """ with whitespace on its left. The indentation of the closing triple quote determines how much whitespace is cropped from each line of the text.
Note
Multiline strings use triple quotes to enable the automatic cropping of leading whitespace when the multiline block is indented.
welcome : String
welcome = """
    Welcome to Idris 2
    We hope you enjoy your stay
      This line will remain indented with 2 spaces
    This line has no indentation
    """
Printing the variable welcome will result in the following text:
Welcome to Idris 2
We hope you enjoy your stay
  This line will remain indented with 2 spaces
This line has no indentation
As you can see, each line has been stripped of its leading 4 spaces; that is because the closing delimiter was indented with 4 spaces.
In order to use multiline string literals, remember the following:
The starting delimiter must be followed by a line return
The ending delimiter’s indentation level must not exceed the indentation of any line
Raw string literals
It is not uncommon to write string literals that require some amount of escaping. For plain string literals the characters \ and " must be escaped; for multiline strings the characters """ must be escaped. Raw string literals allow you to dynamically change the required escape sequence in order to avoid having to escape those very common sets of characters. For this, use #" as starting delimiter and "# as closing delimiter. The number of # symbols can be increased in order to accommodate edge cases where "# would be a valid symbol.
In the following example we are able to match on \{ by using half as many \ characters as if we didn’t use raw string literals:
myRegex : Regex
myRegex = parseRegex #"\\{"#
If you need to escape characters you still can, by using a \ followed by the same number of # that you used for your string delimiters. In the following example we are using two # characters as our escape sequence and want to print a line return:
markdownExample : String
markdownExample = ##"markdown titles look like this: \##n"# Title \##n body""##
This last example could be implemented by combining raw string literals with multiline strings:
markdownExample : String
markdownExample = ##"""
    markdown titles look like this:
    "# Title
    body"
    """##
Interpolated strings
Concatenating string literals with runtime values happens all the time, but sprinkling our code with lots of " and ++ symbols sometimes hurts legibility, which in turn can introduce bugs that are hard for human eyes to detect. Interpolated strings let you inline expressions that evaluate to strings within a string literal, so that you do not have to write out the concatenation of those expressions manually. To use interpolated strings, use \{ to start an interpolation slice in which you can write an Idris expression. Close it with }.
print : Expr -> String
print (Var name expr) = "let \{name} = \{print expr}"
print (Lam arg body) = #"\\#{arg} => \#{print body}"#
print (Decl fname fargs body) = """
    func \{fname}(\{commasep fargs}) {
        \{unlines (map print body)}
    }
    """
print (Multi lns) = #"""
    """
    \#{unlines lns}
    """
    """#
As you can see in the Lam clause, raw string literals and interpolated strings can be combined. The starting and closing delimiters indicate how many # must be used as escape sequence in the string. Since interpolated strings require the first { to be escaped, an interpolated slice in a raw string uses \#{ as starting delimiter.
Additionally, multiline strings can be combined with string interpolation in the way you expect, as shown with the Decl pattern. Finally, all three features can be combined, as in the last branch of the example, where a multiline string has a custom escape sequence and includes an interpolated slice.
Interpolation Interface
The Prelude exposes an Interpolation interface with one function interpolate. This function is used within every interpolation slice to convert an arbitrary expression into a string that can be concatenated with the rest of the interpolated string.
In more detail, when you write "hello \{username}" the compiler translates the expression into concat [interpolate "hello ", interpolate username], so that the concatenation is fast and so that, if the type of username implements the Interpolation interface, you don’t have to convert it to a string manually.
Here is an example where we reuse the Expr type but, instead of implementing a print function, we implement Interpolation:
Interpolation Expr where
  interpolate (Var name expr) = "let \{name} = \{expr}"
  interpolate (Lam arg body) = #"\\#{arg} => \#{body}"#
  interpolate (Decl fname fargs body) = """
    func \{fname}(\{commasep fargs}) {
        \{unlines (map interpolate body)}
    }
    """
  interpolate (Multi lns) = #"""
    """
    \#{unlines lns}
    """
    """#
As you can see, we avoid repeated calls to print since interpolate is automatically applied to each slice.
We use Interpolation instead of Show for interpolation slices because the semantics of show are not necessarily the same as those of interpolate. Typically the implementation of show for String adds double quotes around the text, but for interpolate what we want is to return the string as is. In the previous example, "hello \{username}", if we were to use show we would end up with the string "hello \"Susan\"", which displays an extra pair of double quotes. That is why the implementation of interpolate for String is the identity function: interpolate x = x. This way the desugared code looks like: concat [id "hello ", interpolate username].
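For a smaller, self-contained illustration than the Expr example, the sketch below implements Interpolation for a record and then uses a value of that record directly in a slice. Person, MkPerson and the field names are hypothetical and introduced only for this example.

```idris
module Main

-- Hypothetical record, introduced only for this illustration.
record Person where
  constructor MkPerson
  name : String
  age : Nat

-- Slices of type Person are converted with this implementation.
Interpolation Person where
  interpolate p = "\{p.name} (\{show p.age})"

main : IO ()
main = do
  let susan = MkPerson "Susan" 30
  -- prints Hello, Susan (30)!
  putStrLn "Hello, \{susan}!"
```

The age field still needs an explicit show here, because the slice must have a type with an Interpolation implementation and the sketch does not assume one for Nat.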
Pragmas
Idris2 supports a number of pragmas (identifiable by the % prefix). Some pragmas change compiler behavior until the behavior is changed back using the same pragma while others apply to the following declaration. A small niche of pragmas apply directly to one or more arguments instead of the code following the pragma (like the %name pragma described below).
Note
This page is a work in progress. If you know about a pragma that is not described yet, please consider submitting a pull request!
%builtin
The %builtin Natural pragma converts recursive/unary representations of natural numbers into primitive Integer representations.
This pragma is explained in detail on its own page. For more, see Builtins.
%deprecate
Mark the following definition as deprecated. Whenever the function is used, Idris will show a deprecation warning.
%deprecate
foo : String -> String
foo x = x ++ "!"
bar : String
bar = foo "hello"
Warning: Deprecation warning: Man.foo is deprecated and will be removed in a future version.
You can use code documentation (triple vertical bar ||| docs) to suggest a strategy for removing the deprecated function call and that documentation will be displayed alongside the warning.
||| Please use the @altFoo@ function from now on.
%deprecate
foo : String -> String
foo x = x ++ "!"
bar : String
bar = foo "hello"
Warning: Deprecation warning: Man.foo is deprecated and will be removed in a future version.
Please use the @altFoo@ function from now on.
%inline
Instruct the compiler to inline the following definition when it is applied. It is generally best to let the compiler and the backend you are using optimize code based on its predetermined rules, but if you want to force a function to be inlined when it is called, this pragma will force it.
%inline
foo : String -> String
foo x = x ++ "!"
%noinline
Instruct the compiler not to inline the following definition when it is applied. It is generally best to let the compiler and the backend you are using optimize code based on its predetermined rules, but if you want to force a function to never be inlined when it is called, this pragma will force it.
%noinline
foo : String -> String
foo x = x ++ "!"
%name
Give the compiler some suggested names to use for a particular type when it is asked to generate names for values. You can specify any number of suggested names; they will be used in-order when more than one is needed for a single definition.
data Foo = X | Y
%name Foo foo,bar
Builtins
Natural numbers
Idris2 supports an optimized runtime representation of natural numbers (non-negative integers). This optimization is automatic; however, it only works when natural numbers are represented in a specific way.
Here is an example of a natural number that would be optimized:
data Natural
    = Zero
    | Succ Natural
Natural numbers are generally represented as either zero or the successor (1 more than) of another natural number. These are called Peano numbers.
At runtime, Idris2 will automatically represent this the same as the Integer type. This will massively reduce the memory usage.
There are a few rules governing when this optimization occurs:
The data type must have 2 constructors
After erasure of runtime irrelevant arguments:
+ One must have no arguments
+ One must have exactly 1 argument (called Succ)
The type of the argument to Succ must have the same type constructor as the parent type. This means indexed data types, like Fin, can be optimised.
The argument to Succ must be strict, ie not Lazy Natural
To ensure that a type is optimized to an Integer, use %builtin Natural, ie
data MyNat
    = Succ MyNat
    | Zero
%builtin Natural MyNat
Casting between natural numbers and integer
Idris optimizes functions which convert between natural numbers and integers, so that it takes constant time rather than linear time.
Such functions must be written in a specific way, so that Idris can detect that it can be optimised.
Here is an example of a natural to Integer function.
cast : Natural -> Integer
cast Zero = 0
cast (Succ k) = 1 + cast k
This optimization is applied late in the compilation process, so it may be sensitive to seemingly insignificant changes.
However here are roughly the rules governing this optimisation:
Exactly one argument must be pattern matched on (any other forced or dotted patterns are allowed)
The right hand side of the ‘Zero’ case must be 0
The right hand side of the ‘Succ’ case must be 1 + cast k where k is the predecessor of the pattern matched argument
Casting from an Integer to a natural is a little more complex.
castNonNegative : Integer -> Natural
castNonNegative x = case x of
    0 => Zero
    _ => Succ $ castNonNegative (x - 1)
cast : Integer -> Natural
cast x = if x < 0 then Zero else castNonNegative x
For now you must manually check the given integer is non-negative.
If you are using an indexed data type it may be very hard to write your Integer to natural cast in such a way, so you can use %builtin IntegerToNatural to assert to the compiler that a function is correct. It is your responsibility to make sure this is correct.
module ComplexNat
import Data.Maybe
data ComplexNat
    = Zero
    | Succ ComplexNat
integerToMaybeNat : Integer -> Maybe ComplexNat
integerToMaybeNat _ = ...
integerToNat :
    (x : Integer) ->
    {auto 0 prf : IsJust (ComplexNat.integerToMaybeNat x)} ->
    ComplexNat
integerToNat x {prf} = fromJust (integerToMaybeNat x) @{prf}
%builtin IntegerToNatural ComplexNat.integerToNat
Other operations
This can be used with %transform to allow many other operations to be O(1) too.
eqNat : Nat -> Nat -> Bool
eqNat Z Z = True
eqNat (S j) (S k) = eqNat j k
eqNat _ _ = False
%transform "eqNat" eqNat j k = natToInteger j == natToInteger k
plus : Nat -> Nat -> Nat
plus Z y = y
plus (S x) y = S $ plus x y
%transform "plus" plus j k = integerToNat (natToInteger j + natToInteger k)
Compilation
Here are the details of how natural numbers are compiled to Integers.
Note: a numeric literal here is an Integer.
Zero => 0
Succ k => 1 + k
case k of
    Z => zexp
    S k' => sexp
=>
case k of
    0 => zexp
    _ => let k' = k - 1 in sexp
Debugging The Compiler
Performance
The compiler has the --timing flag to dump timing information collected during operation.
The output documents, in reverse chronological order, the cumulative time taken for the operation (and sub operations) to complete. Sub levels are indicated by successive repetitions of +.
Logging
The compiler logs various categories of information during operation at various levels.
Log levels are characterised by two things:
a dot-separated path of ever finer topics of interest e.g. scope.let
a natural number corresponding to the verbosity level e.g. 5
If the user asks for some logs by writing:
%logging "scope" 5
they will get all of the logs whose path starts with scope and whose verbosity level is less than or equal to 5. By combining different logging directives, users can request information about everything (at a low level of detail) and at the same time focus on a particular subsystem they want a lot of information about. For instance:
%logging 1
%logging "scope.let" 10
will deliver basic information about the various phases the compiler goes through and deliver a lot of information about scope-checking let binders.
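Because logging directives take effect from the point where they appear, they can also be placed around a single definition. The following is a minimal sketch, assuming that the topic "elab" is available in your compiler version (the list printed by --help logging is the authority) and that setting a topic back to level 0 silences it again. The module and function names are hypothetical.

```idris
module LogSketch

-- Ask for elaborator logs up to verbosity 5 from here on.
%logging "elab" 5

double : Nat -> Nat
double n = n + n

-- Drop the verbosity for the same topic back to 0 for the rest of the file.
%logging "elab" 0
```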
You can set the logging level at the command line using:
--log <level>
and through the REPL using:
:log <string category> <level>
:logging <string category> <level>
The supported logging categories can be found using the command line flag:
--help logging
REPL Commands
To see more debug information from the REPL there are several options one can set.
| command | description |
|---|---|
| :di <name> | show debugging information for a name |
| | show values of implicit arguments |
Compiler Flags
There are several ‘hidden’ compiler flags that can help expose Idris’ inner workings.
| command | description |
|---|---|
| | dump case trees to the given file |
| | dump lambda lifted trees to the given file |
| | dump ANF to the given file |
| | dump VM Code to the given file |
| | do more elaborator checks (currently conversion in LinearCheck) |
Output Formats
Debug Output
Calling :di <name> dumps debugging information about the selected term.
Specifically dumped are:
| topic | description |
|---|---|
| Full Name(s) | The fully qualified name of the term. |
| Multiplicity | The term's multiplicity. |
| Erasable Arguments | Things that are erased. |
| Detaggable argument types | |
| Specialised arguments | |
| Inferrable arguments | |
| Compiled version | |
| Compile time linked terms | |
| Runtime linked terms | |
| Flags | |
| Size change graph | |
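Cookbook
The cookbook provides recipes for common patterns and applications in Idris 2.
Parsing
Idris 2 comes with a lexing library and a parsing library, both built into the contrib package.
In this recipe we will write a very simple parser for the lambda calculus that accepts the following language:
let name = world in (\x.hello x) name
Once we have written the lambda calculus parser, we will also see how to take advantage of the powerful expression parser built into Idris 2 to write a small calculator, which should be smart enough to parse the following expression:
1 + 2 - 3 * 4 / 5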
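Lexer
The main lexer module is Text.Lexer. It contains toTokenMap, a function of type List (Lexer, k) -> TokenMap (Token k), where k is a token kind. This function can be used for simple mappings from lexers to tokens. The module also includes high-level lexers for specifying quantities and for common programming primitives such as alphas, intLit, lineComment and blockComment.
The Text.Lexer module also re-exports Text.Lexer.Core, Text.Quantity and Text.Token.
Text.Lexer.Core provides the building blocks of the lexer, including a type called Recognise, which is the underlying data type for lexers. The other important function this module provides is lex, which takes a lexer (in the examples below, a TokenMap) together with an input string and returns the tokens.
Text.Quantity provides a data type Quantity that can be used with certain lexers to specify how many times something is expected to appear.
Text.Token provides a data type Token that represents a lexed token together with its kind and its text. This module also provides an important interface called TokenKind, which tells the lexer how to map token kinds to Idris 2 types and how to convert the text of each kind from a string to a value.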
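As a small illustration of toTokenMap and lex, the following sketch lexes a toy language made only of words and spaces. The names WordKind, WKWord, WKSpace, wordTokenMap and lexWords are hypothetical and exist only for this example. It assumes the contrib package is available (for instance by starting the compiler with -p contrib) and that lex returns the tokens together with the final line, column and remaining input, which is the shape the lambda calculus example in this recipe matches on.

```idris
import Text.Lexer

-- Hypothetical token kinds for a toy language of words and spaces.
data WordKind = WKWord | WKSpace

-- Pair each lexer with the kind of token it produces.
wordTokenMap : TokenMap (Token WordKind)
wordTokenMap = toTokenMap [(spaces, WKSpace), (alphas, WKWord)]

-- Lex a string, succeeding only if the whole input was consumed
-- (that is, the remaining input is the empty string).
lexWords : String -> Maybe (List (WithBounds (Token WordKind)))
lexWords str =
  case lex wordTokenMap str of
    (tokens, _, _, "") => Just tokens
    _ => Nothing
```

Under these assumptions, an input containing a character that neither lexer accepts, such as a digit, leaves some input unconsumed, so lexWords returns Nothing.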
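Parser
The main parser module is Text.Parser. It contains different grammar parsers; the main one is match, which takes a TokenKind and returns the value defined in the TokenKind interface. There are other grammar parsers as well, but for our example we will only use match.
The Text.Parser module re-exports Text.Parser.Core, Text.Quantity and Text.Token.
Text.Parser.Core provides the building blocks of the parser, including a type called Grammar, which is the underlying data type for parsers. The other important function this module provides is parse, which takes a Grammar and returns the parsed expression.
We covered Text.Quantity and Text.Token in the Lexer section, so we will not repeat what they do here.
Lexer and Parser for the Lambda Calculus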
import Data.List
import Data.List1
import Text.Lexer
import Text.Parser

%default total

data Expr = App Expr Expr | Abs String Expr | Var String | Let String Expr Expr

Show Expr where
  showPrec d (App e1 e2) = showParens (d == App) (showPrec (User 0) e1 ++ " " ++ showPrec App e2)
  showPrec d (Abs v e) = showParens (d > Open) ("\\" ++ v ++ "." ++ show e)
  showPrec d (Var v) = v
  showPrec d (Let v e1 e2) = showParens (d > Open) ("let " ++ v ++ " = " ++ show e1 ++ " in " ++ show e2)

data LambdaTokenKind
  = LTLambda
  | LTIdentifier
  | LTDot
  | LTOParen
  | LTCParen
  | LTIgnore
  | LTLet
  | LTEqual
  | LTIn

Eq LambdaTokenKind where
  (==) LTLambda LTLambda = True
  (==) LTDot LTDot = True
  (==) LTIdentifier LTIdentifier = True
  (==) LTOParen LTOParen = True
  (==) LTCParen LTCParen = True
  (==) LTLet LTLet = True
  (==) LTEqual LTEqual = True
  (==) LTIn LTIn = True
  (==) _ _ = False

Show LambdaTokenKind where
  show LTLambda = "LTLambda"
  show LTDot = "LTDot"
  show LTIdentifier = "LTIdentifier"
  show LTOParen = "LTOParen"
  show LTCParen = "LTCParen"
  show LTIgnore = "LTIgnore"
  show LTLet = "LTLet"
  show LTEqual = "LTEqual"
  show LTIn = "LTIn"

LambdaToken : Type
LambdaToken = Token LambdaTokenKind

Show LambdaToken where
  show (Tok kind text) = "Tok kind: " ++ show kind ++ " text: " ++ text

TokenKind LambdaTokenKind where
  TokType LTIdentifier = String
  TokType _ = ()

  tokValue LTLambda _ = ()
  tokValue LTIdentifier s = s
  tokValue LTDot _ = ()
  tokValue LTOParen _ = ()
  tokValue LTCParen _ = ()
  tokValue LTIgnore _ = ()
  tokValue LTLet _ = ()
  tokValue LTEqual _ = ()
  tokValue LTIn _ = ()

ignored : WithBounds LambdaToken -> Bool
ignored (MkBounded (Tok LTIgnore _) _ _) = True
ignored _ = False

identifier : Lexer
identifier = alpha <+> many alphaNum

keywords : List (String, LambdaTokenKind)
keywords = [
  ("let", LTLet),
  ("in", LTIn)
]

lambdaTokenMap : TokenMap LambdaToken
lambdaTokenMap = toTokenMap [(spaces, LTIgnore)] ++
  [(identifier, \s =>
      case lookup s keywords of
        (Just kind) => Tok kind s
        Nothing => Tok LTIdentifier s
    )
  ] ++ toTokenMap [
    (exact "\\", LTLambda),
    (exact ".", LTDot),
    (exact "(", LTOParen),
    (exact ")", LTCParen),
    (exact "=", LTEqual)
  ]

lexLambda : String -> Maybe (List (WithBounds LambdaToken))
lexLambda str =
  case lex lambdaTokenMap str of
    (tokens, _, _, "") => Just tokens
    _ => Nothing

mutual
  expr : Grammar state LambdaToken True Expr
  expr = do
    t <- term
    app t <|> pure t

  term : Grammar state LambdaToken True Expr
  term = abs
    <|> var
    <|> paren
    <|> letE

  app : Expr -> Grammar state LambdaToken True Expr
  app e1 = do
    e2 <- term
    app1 $ App e1 e2

  app1 : Expr -> Grammar state LambdaToken False Expr
  app1 e = app e <|> pure e

  abs : Grammar state LambdaToken True Expr
  abs = do
    match LTLambda
    commit
    argument <- match LTIdentifier
    match LTDot
    e <- expr
    pure $ Abs argument e

  var : Grammar state LambdaToken True Expr
  var = map Var $ match LTIdentifier

  paren : Grammar state LambdaToken True Expr
  paren = do
    match LTOParen
    e <- expr
    match LTCParen
    pure e

  letE : Grammar state LambdaToken True Expr
  letE = do
    match LTLet
    commit
    argument <- match LTIdentifier
    match LTEqual
    e1 <- expr
    match LTIn
    e2 <- expr
    pure $ Let argument e1 e2

parseLambda : List (WithBounds LambdaToken) -> Either String Expr
parseLambda toks =
  case parse expr $ filter (not . ignored) toks of
    Right (l, []) => Right l
    Right e => Left "contains tokens that were not consumed"
    Left e => Left (show e)

parse : String -> Either String Expr
parse x =
  case lexLambda x of
    Just toks => parseLambda toks
    Nothing => Left "Failed to lex."
Testing our parser gives the following output:
$ idris2 -p contrib LambdaCalculus.idr
Main> :exec printLn $ parse "let name = world in (\\x.hello x) name"
Right (let name = world in (\x.hello x) name)
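Expression Parser
Idris 2 also comes with a very convenient expression parser in Text.Parser.Expression that is aware of precedence and associativity.
The main function, buildExpressionParser, takes an OperatorTable and a Grammar that represents the terms, and returns a parsed expression. The magic comes from the OperatorTable, since this table defines all the operators together with their grammars, precedence and associativity.
An OperatorTable is a list of lists of Op values. The Op type lets you specify Prefix, Postfix and Infix operators along with their grammars. Infix also carries an associativity, called Assoc, which can be left associative (AssocLeft), right associative (AssocRight) or non-associative (AssocNone).
An example of the operator table we will use in the calculator is:
[
  [ Infix (match CTMultiply >> pure (*)) AssocLeft
  , Infix (match CTDivide >> pure (/)) AssocLeft
  ],
  [ Infix (match CTPlus >> pure (+)) AssocLeft
  , Infix (match CTMinus >> pure (-)) AssocLeft
  ]
]
This table defines 4 operators, for multiplication, division, addition and subtraction. Multiplication and division appear in the first inner list because they have higher precedence than addition and subtraction, which appear in the second. We also define them as infix operators, each with a specific grammar, and all of them are left associative via AssocLeft.
Building a Calculator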
import Data.List1
import Text.Lexer
import Text.Parser
import Text.Parser.Expression

%default total

data CalculatorTokenKind
  = CTNum
  | CTPlus
  | CTMinus
  | CTMultiply
  | CTDivide
  | CTOParen
  | CTCParen
  | CTIgnore

Eq CalculatorTokenKind where
  (==) CTNum CTNum = True
  (==) CTPlus CTPlus = True
  (==) CTMinus CTMinus = True
  (==) CTMultiply CTMultiply = True
  (==) CTDivide CTDivide = True
  (==) CTOParen CTOParen = True
  (==) CTCParen CTCParen = True
  (==) _ _ = False

Show CalculatorTokenKind where
  show CTNum = "CTNum"
  show CTPlus = "CTPlus"
  show CTMinus = "CTMinus"
  show CTMultiply = "CTMultiply"
  show CTDivide = "CTDivide"
  show CTOParen = "CTOParen"
  show CTCParen = "CTCParen"
  show CTIgnore = "CTIgnore"

CalculatorToken : Type
CalculatorToken = Token CalculatorTokenKind

Show CalculatorToken where
  show (Tok kind text) = "Tok kind: " ++ show kind ++ " text: " ++ text

TokenKind CalculatorTokenKind where
  TokType CTNum = Double
  TokType _ = ()

  tokValue CTNum s = cast s
  tokValue CTPlus _ = ()
  tokValue CTMinus _ = ()
  tokValue CTMultiply _ = ()
  tokValue CTDivide _ = ()
  tokValue CTOParen _ = ()
  tokValue CTCParen _ = ()
  tokValue CTIgnore _ = ()

ignored : WithBounds CalculatorToken -> Bool
ignored (MkBounded (Tok CTIgnore _) _ _) = True
ignored _ = False

number : Lexer
number = digits

calculatorTokenMap : TokenMap CalculatorToken
calculatorTokenMap = toTokenMap [
  (spaces, CTIgnore),
  (digits, CTNum),
  (exact "+", CTPlus),
  (exact "-", CTMinus),
  (exact "*", CTMultiply),
  (exact "/", CTDivide)
]

lexCalculator : String -> Maybe (List (WithBounds CalculatorToken))
lexCalculator str =
  case lex calculatorTokenMap str of
    (tokens, _, _, "") => Just tokens
    _ => Nothing

mutual
  term : Grammar state CalculatorToken True Double
  term = do
    num <- match CTNum
    pure num

  expr : Grammar state CalculatorToken True Double
  expr = buildExpressionParser [
    [ Infix ((*) <$ match CTMultiply) AssocLeft
    , Infix ((/) <$ match CTDivide) AssocLeft
    ],
    [ Infix ((+) <$ match CTPlus) AssocLeft
    , Infix ((-) <$ match CTMinus) AssocLeft
    ]
  ] term

parseCalculator : List (WithBounds CalculatorToken) -> Either String Double
parseCalculator toks =
  case parse expr $ filter (not . ignored) toks of
    Right (l, []) => Right l
    Right e => Left "contains tokens that were not consumed"
    Left e => Left (show e)

parse1 : String -> Either String Double
parse1 x =
  case lexCalculator x of
    Just toks => parseCalculator toks
    Nothing => Left "Failed to lex."
Testing our calculator gives the following output:
$ idris2 -p contrib Calculator.idr
Main> :exec printLn $ parse1 "1 + 2 - 3 * 4 / 5"
Right 0.6000000000000001
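The result can be cross-checked by writing out by hand the grouping that the operator table implies: multiplication and division bind tighter than addition and subtraction, and every operator associates to the left. The following sketch uses a hypothetical name, grouped, and plain Double arithmetic from the Prelude. It does not depend on the contrib package, and the value it prints should agree with the parser's answer above.

```idris
module GroupingSketch

-- "1 + 2 - 3 * 4 / 5" with the precedence and associativity from the
-- operator table made explicit using parentheses:
--   * and / form the tighter level, grouped left to right: (3 * 4) / 5
--   + and - form the looser level, grouped left to right: (1 + 2) - ...
grouped : Double
grouped = (1 + 2) - ((3 * 4) / 5)

main : IO ()
main = printLn grouped
```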
Comments
Package files support comments using the standard Idris single-line -- and multi-line {- -} formats.