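Idris 2 Language Documentation

Note

The documentation for Idris 2 has been published under the Creative Commons CC0 License. As such, to the extent possible under law, the Idris Community has waived all copyright and related or neighboring rights to the documentation for Idris.

More information about CC0 can be found at: https://creativecommons.org/publicdomain/zero/1.0/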

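A Crash Course in Idris 2

This is a crash course in Idris 2 (something like a tutorial, but not quite so gentle!). It gives a brief introduction to programming in the Idris language. It covers the core language features and assumes some experience with an existing functional programming language such as Haskell or OCaml.

It has been revised and updated from the Idris 1 tutorial. For details of what has changed, see the section Changes since Idris 1.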

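Introduction

In conventional programming languages, there is a clear distinction between types and values. For example, in Haskell, the following are types, representing integers, characters, lists of characters, and lists of any value respectively:

  • Int, Char, [Char], [a]

Correspondingly, the following values are examples of inhabitants of those types:

  • 42, 'a', "Hello world!", [2,3,4,5,6]

In a language with dependent types, however, the distinction is less clear. Dependent types allow types to "depend" on values; in other words, types are a first class language construct and can be manipulated like any other value. The standard example is the type of lists of a given length [1], Vect n a, where a is the element type and n is the length of the list, which can be an arbitrary term.

When types can contain values, and those values describe properties (for example, the length of a list), the type of a function can begin to describe its own properties. Take the concatenation of two lists as an example. This operation has the property that the length of the resulting list is the sum of the lengths of the two input lists. We can therefore give the following type to the app function, which concatenates vectors: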

app : Vect n a -> Vect m a -> Vect (n + m) a

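This tutorial introduces Idris, a general purpose functional programming language with dependent types. The goal of the Idris project is to build a dependently typed language suitable for verifiable general purpose programming. To this end, Idris is a compiled language which aims to generate efficient executable code. It also has a lightweight foreign function interface which allows easy interaction with external libraries.

Intended Audience

This tutorial is intended as a brief introduction to the language, and is aimed at readers already familiar with a functional language such as Haskell or OCaml. In particular, a certain amount of familiarity with Haskell syntax is assumed, although most concepts will at least be explained briefly. The reader is also assumed to have some interest in using dependent types for writing and verifying software.

For a more in-depth introduction to Idris, which proceeds at a much slower pace, covers interactive program development, and has many more examples, see Type-Driven Development with Idris by Edwin Brady, available from Manning.

Example Code

This tutorial includes some example code, which has been tested against Idris 2. These files are available with the Idris 2 distribution, so that you can try them out easily. They can be found under samples. It is, however, strongly recommended that you type them in yourself, rather than simply loading and reading them.

Footnotes

[1] Typically, and perhaps confusingly, referred to in the dependently typed programming literature as "vectors".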

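Getting Started

Installing from Source

Prerequisites for Windows

MSYS2

To build Idris 2 on Windows, a Unix-like environment is needed for all the tools used in the build process. MSYS2 provides this environment.

  1. Download the latest version of MSYS2

  2. Run the installer. Do not install it under Program Files, because it needs to write files (for example, the "unix" home directory lives under the installation directory)

  3. In the directory where you installed MSYS2, find the file mingw64.ini and add the line MSYS2_PATH_TYPE=inherit. This adds the Windows PATH to the MSYS2 shell.

  4. Start MSYS2 by clicking mingw64.exe. The icon in the Start menu does not pick up MSYS2_PATH_TYPE from the ini file, although the variable can instead be added to the system settings.

  5. Update to the latest version with pacman -Syu

  6. Install the programs needed for the build: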

    $ pacman -S make mingw-w64-x86_64-gcc
    
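Chez Scheme

Chez Scheme has a ready-made installer available on GitHub.

  1. Download the installer and run it. Do not install into a path containing spaces, as Idris 2 currently has problems with them.

  2. Add the 64-bit scheme to the PATH. It is in the \bin\ta6nt subdirectory of the Chez Scheme installation, so if you installed to C:\Chez it will be in C:\Chez\bin\ta6nt

Building

  1. Start a new MSYS2 shell so that it picks up your modified PATH (it is important to use Mingw64 to get the right compiler).

  2. Navigate to the Idris2 directory.

  3. Set the SCHEME environment variable that Idris 2 needs: export SCHEME=scheme. This can be set permanently in the bash profile or in the Windows settings.

  4. Now make bootstrap && make install should build Idris 2 and install it in home/<username>/.idris2/bin under your MSYS2 installation. If you add this directory to the PATH in the Windows settings, Idris 2 will be usable from any command line you open (including PowerShell or DOS).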

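Prerequisites

Because Idris 2 is implemented in Idris 2 itself, to bootstrap it you build from the generated Scheme sources. To do this, you need either Chez Scheme (the default, and currently preferred because it is the fastest) or Racket. You can get one of these from:

Both are available from MacPorts/Homebrew and all major Linux distributions. Windows requires some further prerequisites; see Prerequisites for Windows.

Note: If you install Chez Scheme from source files, building it locally, make sure you run ./configure --threads to build multithreading support in.

Downloading and Installing

You can download the Idris 2 source from the Idris web site, or get the latest development version from idris-lang/Idris2 on GitHub. This includes the Idris 2 source code and the Scheme code generated from it. Once you have unpacked the source, you can install it as follows: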

make bootstrap SCHEME=chez

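where chez is the executable name of the Chez Scheme compiler. This varies from system to system, but is often one of scheme, chezscheme, or chezscheme9.5. If you are building via Racket, you can install it as follows: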

make bootstrap-racket

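Once you have successfully bootstrapped with either of the above commands, you can install with the command make install. By default this installs to ${HOME}/.idris2. You can change this by editing the options in config.mk. For example, to install to /usr/local, you can edit IDRIS2_PREFIX as follows: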

IDRIS2_PREFIX ?= /usr/local

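Installing from a Package Manager

Installing Using Homebrew

If you are a Homebrew user, you can install Idris 2 together with all its dependencies by running the following command: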

brew install idris2

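Checking Installation

To check that the installation has succeeded, and to write your first Idris program, create a file called hello.idr containing the following text: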

module Main

main : IO ()
main = putStrLn "Hello world"

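If you are familiar with Haskell, it should be fairly clear what the program is doing and how it works; if not, we will explain the details later. You can compile the program to an executable by entering idris2 hello.idr -o hello at the shell prompt. By default this creates an executable called hello in the destination directory build/exec, which invokes a generated and compiled Chez Scheme program. You can then run it: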

$ idris2 hello.idr -o hello
$ ./build/exec/hello
Hello world

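Please note that the dollar sign $ indicates the shell prompt! Some useful options to the Idris command are:

  • -o prog to compile to an executable called prog.

  • --check type check the file and its dependencies without starting the interactive environment.

  • --package pkg add package as a dependency, e.g. --package contrib to make use of the contrib package.

  • --help display usage summary and command line options.

You can find out more about compiling to executables in Section Compiling to Executables.

The Interactive Environment

Entering idris2 at the shell prompt starts up the interactive environment. You should see something like the following: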

$ idris2
     ____    __     _         ___
    /  _/___/ /____(_)____   |__ \
    / // __  / ___/ / ___/   __/ /     Version 0.5.1
  _/ // /_/ / /  / (__  )   / __/      https://www.idris-lang.org
 /___/\__,_/_/  /_/____/   /____/      Type :? for help

Welcome to Idris 2.  Enjoy yourself!
Main>

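This gives a ghci style interface which allows evaluation and type checking of expressions, theorem proving, compilation, editing, and various other operations. The command :? gives a list of supported commands. Below, we see an example run in which hello.idr is loaded, the type of main is checked and then the program is compiled to the executable file hello, available in the destination directory build/exec/. Type checking a file, if successful, creates a bytecode version of the file (in this case build/ttc/hello.ttc) to speed up loading in future. The bytecode is regenerated if the source file changes.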

$ idris2 hello.idr
     ____    __     _         ___
    /  _/___/ /____(_)____   |__ \
    / // __  / ___/ / ___/   __/ /     Version 0.5.1
  _/ // /_/ / /  / (__  )   / __/      https://www.idris-lang.org
 /___/\__,_/_/  /_/____/   /____/      Type :? for help

Welcome to Idris 2.  Enjoy yourself!
Main> :t main
Main.main : IO ()
Main> :c hello main
File build/exec/hello written
Main> :q
Bye for now!

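Types and Functions

Primitive Types

Idris defines several primitive types: Int, Integer and Double for numeric operations, Char and String for text manipulation, and Ptr which represents foreign pointers. There are also several data types declared in the library, including Bool, with values True and False. We can declare some constants with these types. Enter the following into a file Prims.idr and load it into the Idris interactive environment by typing idris2 Prims.idr: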

module Prims

x : Int
x = 94

foo : String
foo = "Sausage machine"

bar : Char
bar = 'Z'

quux : Bool
quux = False

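An Idris file consists of an optional module declaration (here module Prims) followed by an optional list of imports and a collection of declarations and definitions. In this example no imports have been specified. However, Idris programs can consist of several modules, and the definitions in each module have their own namespace. This is discussed further in Section Modules and Namespaces. When writing Idris programs, both the order in which definitions are given and indentation are significant. Functions and data types must be defined before use. Each definition must also have a type declaration, for example x : Int and foo : String in the above listing. New declarations must begin at the same level of indentation as the preceding declaration. Alternatively, a semicolon ; can be used to terminate declarations.

A library module prelude is automatically imported by every Idris program, including facilities for IO, arithmetic, data structures and various common functions. The prelude defines several arithmetic and comparison operators, which we can use at the prompt. Evaluating things at the prompt gives an answer, for example: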

Prims> 13+9*9
94 : Integer
Prims> x == 9*9+13
True

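All of the usual arithmetic and comparison operators are defined for the primitive types. They are overloaded using interfaces, as we will discuss in Section Interfaces, and can be extended to work on user defined types. Boolean expressions can be tested with the if...then...else construct, for example: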

*prims> if x == 8 * 8 + 30 then "Yes!" else "No!"
"Yes!"

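Data Types

Data types are declared in a similar way and with similar syntax to Haskell. Natural numbers and lists, for example, can be declared as follows: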

data Nat    = Z   | S Nat           -- Natural numbers
                                    -- (zero and successor)
data List a = Nil | (::) a (List a) -- Polymorphic lists

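Data type names cannot begin with a lower case letter (we will see later why not!). The above declarations are taken from the standard library. Unary natural numbers can be either zero (Z), or the successor of another natural number (S k). Lists can either be empty (Nil) or a value added to the front of another list (x :: xs). In the declaration for List, we used an infix operator ::. New operators such as this can be added using a fixity declaration, as follows: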

infixr 10 ::

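Functions, data constructors and type constructors may all be given infix operators as names. They may be used in prefix form if enclosed in brackets, e.g. (::). Infix operators can use any of the symbols: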

:+-*\/=.?|&><!@$%^~#

Some operators built from these symbols cannot be user defined. These are

%, \, :, =, |, |||, <-, ->, =>, ?, !, &, **, ..

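Functions

Functions are implemented by pattern matching, again using a similar syntax to Haskell. The main difference is that Idris requires type declarations for all functions, using a single colon : (rather than Haskell's double colon ::). Some natural number arithmetic functions can be defined as follows, again taken from the standard library: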

-- Unary addition
plus : Nat -> Nat -> Nat
plus Z     y = y
plus (S k) y = S (plus k y)

-- Unary multiplication
mult : Nat -> Nat -> Nat
mult Z     y = Z
mult (S k) y = plus y (mult k y)

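The standard arithmetic operators + and * are also overloaded for use by Nat, and are implemented using the above functions. Unlike Haskell, there is no restriction on whether function names must begin with a capital letter or not. Function names (plus and mult above), data constructors (Z, S, Nil and ::) and type constructors (Nat and List) are all part of the same namespace. By convention, however, data types and constructor names typically begin with a capital letter. We can test these functions at the Idris prompt: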

Main> plus (S (S Z)) (S (S Z))
4
Main> mult (S (S (S Z))) (plus (S (S Z)) (S (S Z)))
12

Like arithmetic operations, integer literals are also overloaded using interfaces, meaning that we can also test the functions as follows:

Idris> plus 2 2
4
Idris> mult 3 (plus 2 2)
12
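As a further illustration of pattern matching on Nat, here is a minimal sketch of a factorial function. The module name Fact and the function name fact are hypothetical and are not taken from the standard library; the definition follows the same Z / S k structure as plus and mult above.

```idris
module Fact

-- `fact` is a hypothetical name used only for this illustration.
-- It recurses on the unary structure of Nat, like `plus` and `mult`.
fact : Nat -> Nat
fact Z     = 1
fact (S k) = S k * fact k
```

Because integer literals are overloaded, fact 4 can be entered at the prompt directly instead of fact (S (S (S (S Z)))).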

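You may wonder, by the way, why we have unary natural numbers when our computers have perfectly good integer arithmetic built in. The reason is primarily that unary numbers have a very convenient structure which is easy to reason about, and easy to relate to other data structures, as we will see later. Nevertheless, we do not want this convenience to come at the expense of efficiency. Fortunately, Idris knows about the relationship between Nat (and similarly structured types) and numbers. This means it can optimise the representation, and functions such as plus and mult.

where clauses

Functions can also be defined locally using where clauses. For example, to define a function which reverses a list, we can use an auxiliary function which accumulates the new, reversed list, and which does not need to be visible globally: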

reverse : List a -> List a
reverse xs = revAcc [] xs where
  revAcc : List a -> List a -> List a
  revAcc acc [] = acc
  revAcc acc (x :: xs) = revAcc (x :: acc) xs

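Indentation is significant: functions in the where block must be indented further than the outer function.

Note

Scope

Any names which are visible in the outer scope are also visible in the where clause (unless they have been redefined, such as xs here). A name which appears in the type will also be in scope in the where clause.

As well as functions, where blocks can include local data declarations, such as the following, where MyLT is not accessible outside the definition of foo: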

foo : Int -> Int
foo x = case isLT of
            Yes => x*2
            No => x*4
    where
       data MyLT = Yes | No

       isLT : MyLT
       isLT = if x < 20 then Yes else No

Functions defined in a where clause need a type declaration just like any top level function. Here is another example of how this works in practice:

even : Nat -> Bool
even Z = True
even (S k) = odd k where
  odd : Nat -> Bool
  odd Z = False
  odd (S k) = even k

test : List Nat
test = [c (S 1), c Z, d (S Z)]
  where c : Nat -> Nat
        c x = 42 + x

        d : Nat -> Nat
        d y = c (y + 1 + z y)
              where z : Nat -> Nat
                    z w = y + w
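
Totality and Covering

By default, functions in Idris must be covering. That is, there must be patterns which cover all possible values of the input types. For example, the following definition will give an error: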

fromMaybe : Maybe a -> a
fromMaybe (Just x) = x

This gives an error because fromMaybe Nothing is not defined. Idris reports:

frommaybe.idr:1:1--2:1:fromMaybe is not covering. Missing cases:
        fromMaybe Nothing

You can override this check with a partial annotation:

partial fromMaybe : Maybe a -> a
fromMaybe (Just x) = x

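However, this is not advisable, and in general you should only do this during the initial development of a function, or during debugging. If you try to evaluate fromMaybe Nothing at run time you will get a run time error.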
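One way to avoid the partial annotation altogether is to make the function covering by asking the caller for a fallback value. The following is a minimal sketch; the name fromMaybeOr is hypothetical and chosen only for this illustration.

```idris
-- A covering alternative to the partial definition above.
-- `fromMaybeOr` is a hypothetical name: it takes a fallback value
-- that is returned when there is nothing to extract.
fromMaybeOr : a -> Maybe a -> a
fromMaybeOr def Nothing  = def
fromMaybeOr _   (Just x) = x
```

Both constructors of Maybe are now handled, so the coverage checker accepts the definition without any annotation.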

Idris programs can contain holes which stand for incomplete parts of programs. For example, we could leave a hole for the greeting in our “Hello world” program:

main : IO ()
main = putStrLn ?greeting

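The syntax ?greeting introduces a hole, which stands for a part of a program which is not yet written. This is a valid Idris program, and you can check the type of greeting: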

Main> :t greeting
-------------------------------------
greeting : String

Checking the type of a hole also shows the types of any variables in scope. For example, given an incomplete definition of even:

even : Nat -> Bool
even Z = True
even (S k) = ?even_rhs

We can check the type of even_rhs and see the expected return type, and the type of the variable k:

Main> :t even_rhs
   k : Nat
-------------------------------------
even_rhs : Bool

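Holes are useful because they help us write functions incrementally. Rather than writing an entire function in one go, we can leave some parts unwritten and use Idris to tell us what is necessary to complete the definition.

Dependent Types

First Class Types

In Idris, types are first class, meaning that they can be computed and manipulated (and passed to functions) just like any other language construct. For example, we could write a function which computes a type: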

isSingleton : Bool -> Type
isSingleton True = Nat
isSingleton False = List Nat

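This function calculates the appropriate type from a Bool which flags whether the type should be a singleton or not. We can use this function to calculate a type anywhere that a type can be used. For example, it can be used to calculate a return type: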

mkSingle : (x : Bool) -> isSingleton x
mkSingle True = 0
mkSingle False = []

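Or it can be used to have varying input types. The following function calculates either the sum of a list of Nat, or returns the given Nat, depending on whether the singleton flag is true: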

sum : (single : Bool) -> isSingleton single -> Nat
sum True x = x
sum False [] = 0
sum False (x :: xs) = x + sum False xs
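
Vectors

A standard example of a dependent data type is the type of "lists with length", conventionally called vectors in the dependent type literature. They are available as part of the Idris library, by importing Data.Vect, or we can declare them as follows: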

data Vect : Nat -> Type -> Type where
   Nil  : Vect Z a
   (::) : a -> Vect k a -> Vect (S k) a

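Note that we have used the same constructor names as for List. Ad-hoc name overloading such as this is accepted by Idris, provided that the names are declared in different namespaces (in practice, normally in different modules). Ambiguous constructor names can normally be resolved from context.

This declares a family of types, and so the form of the declaration is rather different from the simple type declarations above. We explicitly state the type of the type constructor Vect: it takes a Nat and a type as arguments, where Type stands for the type of types. We say that Vect is indexed over Nat and parameterised by Type. Each constructor targets a different part of the family of types. Nil can only be used to construct vectors with zero length, and :: to construct vectors with non-zero length. In the type of ::, we state explicitly that an element of type a and a tail of type Vect k a (i.e., a vector of length k) combine to make a vector of length S k.

We can define functions on dependent types such as Vect in the same way as on simple types such as List and Nat above, by pattern matching. The type of a function over Vect will describe what happens to the lengths of the vectors involved. For example, ++, defined as follows, appends two Vect: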

(++) : Vect n a -> Vect m a -> Vect (n + m) a
(++) Nil       ys = ys
(++) (x :: xs) ys = x :: xs ++ ys

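The type of (++) states that the resulting vector's length will be the sum of the input lengths. If we get the definition wrong in such a way that this does not hold, Idris will not accept the definition. For example: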

(++) : Vect n a -> Vect m a -> Vect (n + m) a
(++) Nil       ys = ys
(++) (x :: xs) ys = x :: xs ++ xs -- BROKEN

When run through the Idris type checker, this results in the following:

$ idris2 Vect.idr --check
1/1: Building Vect (Vect.idr)
Vect.idr:7:26--8:1:While processing right hand side of Main.++ at Vect.idr:7:1--8:1:
When unifying plus k k and plus k m
Mismatch between:
        k
and
        m

This error message suggests that there is a length mismatch between two vectors: we needed a vector of length k + m, but provided a vector of length k + k.
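The same idea, a length recorded in the type, also works when the length is an ordinary function argument. The sketch below is illustrative only: repeatN and threeZeros are hypothetical names, and the module relies on Vect from Data.Vect.

```idris
module Repeat

import Data.Vect

-- `repeatN` is a hypothetical helper: the result type mentions the
-- argument `n`, so the checker verifies that exactly n copies are built.
repeatN : (n : Nat) -> a -> Vect n a
repeatN Z     _ = []
repeatN (S k) x = x :: repeatN k x

-- The annotation fixes n = 3; a body of a different length is rejected.
threeZeros : Vect 3 Nat
threeZeros = repeatN 3 0
```

Dropping the recursive call in the second clause (returning [] instead) would be a type error for the same reason as the broken (++) above: the declared length S k would not match the length of the result.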

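Finite Sets

Finite sets, as the name suggests, are sets with a finite number of elements. They are available as part of the Idris library, by importing Data.Fin, or can be declared as follows: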

data Fin : Nat -> Type where
   FZ : Fin (S k)
   FS : Fin k -> Fin (S k)

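From the signature, we can see that this is a type constructor that takes a Nat, and produces a type. So this is not a set in the sense of a collection that is a container of objects; rather, it is the canonical set of unnamed elements, as in "the set of 5 elements", for example. Effectively, it is a type that captures integers that fall into the range of zero to (n - 1), where n is the argument used to instantiate the Fin type. For example, Fin 5 can be thought of as the type of integers between 0 and 4.

Let us look at the constructors in greater detail.

FZ is the zeroth element of a finite set with S k elements; FS n is the n+1th element of a finite set with S k elements. Fin is indexed by a Nat, which represents the number of elements in the set. Since we cannot construct an element of an empty set, neither constructor targets Fin Z.

As mentioned above, a useful application of the Fin family is to represent bounded natural numbers. Since the first n natural numbers form a finite set of n elements, we can treat Fin n as the set of integers greater than or equal to zero and less than n.

For example, the following function looks up an element in a Vect, by a bounded index given as a Fin n. It is defined in the library (Data.Vect) as follows: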

index : Fin n -> Vect n a -> a
index FZ     (x :: xs) = x
index (FS k) (x :: xs) = index k xs

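This function looks up a value at a given location in a vector. The location is bounded by the length of the vector (n in each case), so there is no need for a run-time bounds check. The type checker guarantees that the location is less than the length of the vector, and of course no less than zero.

Note also that there is no case for Nil here. This is because it is impossible. Since there is no element of Fin Z, and the location is a Fin n, n cannot be Z. As a result, attempting to look up an element in an empty vector would give a compile time type error, since it would force n to be Z.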
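A small usage sketch may make the bound concrete. The names letters and second are hypothetical; index, FZ and FS are the library definitions discussed above.

```idris
import Data.Fin
import Data.Vect

-- A three-element vector, so valid positions have type Fin 3.
letters : Vect 3 Char
letters = ['a', 'b', 'c']

-- FS FZ is position 1, which is within bounds for a Vect 3.
second : Char
second = index (FS FZ) letters

-- An out-of-range position such as FS (FS (FS FZ)) is not an element
-- of Fin 3, so `index (FS (FS (FS FZ))) letters` is rejected at
-- compile time rather than failing at run time.
```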

Implicit Arguments

Let us take a closer look at the type of index.

index : Fin n -> Vect n a -> a

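It takes two arguments, an element of the finite set of n elements, and a vector with n elements of type a. But there are also two names, n and a, which are not declared explicitly. These are implicit arguments to index. We could also write the type of index as: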

index : forall a, n . Fin n -> Vect n a -> a

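Implicit arguments, given with the forall declaration, are not given in applications of index; their values can be inferred from the types of the Fin n and Vect n a arguments. Any name beginning with a lower case letter which appears as a parameter or index in a type declaration, and which is not applied to any arguments, will always be automatically bound as an implicit argument; this is why data type names cannot begin with a lower case letter. Implicit arguments can still be given explicitly in applications, using {a=value} and {n=value}, for example: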

index {a=Int} {n=2} FZ (2 :: 3 :: Nil)

In fact, any argument, implicit or explicit, may be given a name. We could have declared the type of index as:

index : (i : Fin n) -> (xs : Vect n a) -> a

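It is a matter of taste whether you want to do this; sometimes it can help document a function by making the purpose of an argument clearer.

The names of implicit arguments are in scope in the body of the function, although they cannot be used at run time. There is much more to say about implicit arguments; we will discuss the question of what is available at run time, among other things, in Section Multiplicities.

Note: Declaration Order and mutual blocks

In general, functions and data types must be defined before use, since dependent types allow functions to appear as part of types, and type checking can rely on how particular functions are defined (though this is only true of total functions; see Section Totality Checking). However, this restriction can be relaxed by using a mutual block, which allows data types and functions to be defined simultaneously: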

mutual
  even : Nat -> Bool
  even Z = True
  even (S k) = odd k

  odd : Nat -> Bool
  odd Z = False
  odd (S k) = even k

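In a mutual block, first all of the type declarations are added, then the function bodies. As a result, none of the function types can depend on the reduction behaviour of any of the functions in the block.

Forward declarations can allow you to have more fine-grained control over the order in which mutually defined concepts are declared. This can be useful if you need to mention a datatype's constructor in the type of a mutually defined function, or need to rely on the behaviour of a mutually defined function for something to typecheck.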

data V : Type
T : V -> Type

data V : Type where
  N : V
  Pi : (a : V) -> (b : T a -> V) -> V

T N = Nat
T (Pi a b) = (x : T a) -> T (b x)

data Even : Nat -> Type
data Odd  : Nat -> Type

data Even : Nat -> Type where
  ZIsEven : Even Z
  SOddIsEven : Odd n -> Even (S n)

data Odd : Nat -> Type where
  SEvenIsOdd : Even n -> Odd (S n)

even : Nat -> Bool
odd  : Nat -> Bool

-- or just ``even, odd : Nat -> Bool``

even    Z  = True
even (S k) = odd k

odd    Z  = False
odd (S k) = even k

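Placing the signature declarations first lets Idris detect the corresponding mutual definitions.

I/O

Computer programs are of little use if they do not interact with the user or the system in some way. The difficulty in a pure language such as Idris (that is, a language where expressions do not have side-effects) is that I/O is inherently side-effecting. So, Idris provides a parameterised type IO which describes the interactions that the run-time system will perform when executing a function: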

data IO a -- description of an IO operation returning a value of type a

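We will leave the definition of IO abstract, but effectively it describes what the I/O operations to be executed are, rather than how to execute them. The resulting operations are executed externally, by the run-time system. We have already seen one I/O program: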

main : IO ()
main = putStrLn "Hello world"

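The type of putStrLn explains that it takes a string, and returns an I/O action which produces an element of the unit type (). There is a variant putStr which describes the output of a string without a newline: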

putStrLn : String -> IO ()
putStr   : String -> IO ()

We can also read strings from user input:

getLine : IO String

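A number of other I/O operations are available. For example, by adding import System.File to your program, you get access to functions for reading and writing files, including: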

data File -- abstract
data Mode = Read | Write | ReadWrite

openFile : (f : String) -> (m : Mode) -> IO (Either FileError File)
closeFile : File -> IO ()

fGetLine : (h : File) -> IO (Either FileError String)
fPutStr : (h : File) -> (str : String) -> IO (Either FileError ())
fEOF : File -> IO Bool

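Note that several of these return Either, since they may fail.

"do" notation

I/O programs will typically need to sequence actions, feeding the output of one computation into the input of the next. IO is an abstract type, however, so we cannot access the result of a computation directly. Instead, we sequence operations with do notation: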

greet : IO ()
greet = do putStr "What is your name? "
           name <- getLine
           putStrLn ("Hello " ++ name)

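The syntax x <- iovalue executes the I/O operation iovalue, of type IO a, and puts the result, of type a, into the variable x. In this case, getLine returns an IO String, so name has type String. Indentation is significant: each statement in the do block must begin in the same column. The pure operation allows us to inject a value directly into an IO operation: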

pure : a -> IO a

As we will see later, do notation is more general than this, and can be overloaded.

You can try executing greet at the Idris 2 REPL by running the command :exec greet
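To see pure and <- working together, here is a minimal sketch that splits the greeting into a reusable action returning a value and a main that consumes it. The name askName is hypothetical; putStr, getLine, putStrLn and pure are the operations introduced above.

```idris
module Main

-- `askName` is a hypothetical helper: it prompts, reads a line, and
-- uses `pure` to return the String as the result of the IO action.
askName : IO String
askName = do putStr "What is your name? "
             name <- getLine
             pure name

-- `main` binds the result of `askName` with `<-` and prints a greeting.
main : IO ()
main = do name <- askName
          putStrLn ("Hello " ++ name)
```

The final two lines of askName could be shortened to just getLine; they are written out here only to show where pure fits.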

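Laziness

Normally, arguments to functions are evaluated before the function itself (that is, Idris uses eager evaluation). However, this is not always the best approach. Consider the following function: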

ifThenElse : Bool -> a -> a -> a
ifThenElse True  t e = t
ifThenElse False t e = e

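This function uses one of the t or e arguments, but not both. We would prefer it if only the argument which was used was evaluated. To achieve this, Idris provides a Lazy primitive, which allows evaluation to be suspended. It is a primitive, but conceptually we can think of it as follows: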

data Lazy : Type -> Type where
     Delay : (val : a) -> Lazy a

Force : Lazy a -> a

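A value of type Lazy a is unevaluated until it is forced by Force. The Idris type checker knows about the Lazy type, and inserts conversions where necessary between Lazy a and a, and vice versa. We can therefore write ifThenElse as follows, without any explicit use of Force or Delay: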

ifThenElse : Bool -> Lazy a -> Lazy a -> a
ifThenElse True  t e = t
ifThenElse False t e = e
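For comparison, the conversions that the type checker inserts can also be written by hand. The following sketch assumes that Force and Delay may be used explicitly in source code, as the conceptual definition above suggests; choose and picked are hypothetical names.

```idris
-- `choose` behaves like ifThenElse, but forces the selected branch
-- explicitly instead of relying on the inserted conversion.
choose : Bool -> Lazy a -> Lazy a -> a
choose True  t _ = Force t
choose False _ e = Force e

-- The arguments are suspended explicitly with Delay; only the first
-- one is ever forced here.
picked : Nat
picked = choose True (Delay 1) (Delay 2)
```

In everyday code the implicit version is preferable; the explicit form is mainly useful for seeing what the elaborated program looks like.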

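Infinite data Types

Infinite data types (codata) allow us to define infinite data structures by marking recursive arguments as potentially infinite. One example of a codata type is Stream, which is defined as follows.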

data Stream : Type -> Type where
  (::) : (e : a) -> Inf (Stream a) -> Stream a

The following is an example of how the codata type Stream can be used to form an infinite data structure. In this case we are creating an infinite stream of ones.

ones : Stream Nat
ones = 1 :: ones
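An infinite stream only becomes useful once a finite part of it is observed. The sketch below defines a second stream and a function that takes a finite prefix; countFrom, firstN and firstThree are hypothetical names, and Stream is the prelude type shown above.

```idris
-- An infinite stream n, n+1, n+2, ... The recursive call sits under
-- the (::) constructor, so it is only unfolded on demand.
countFrom : Nat -> Stream Nat
countFrom n = n :: countFrom (S n)

-- Take the first k elements of a stream as an ordinary finite list.
firstN : Nat -> Stream a -> List a
firstN Z     _         = []
firstN (S k) (x :: xs) = x :: firstN k xs

-- Evaluates to the list of 0, 1 and 2.
firstThree : List Nat
firstThree = firstN 3 (countFrom 0)
```

Note that firstN recurses on the Nat argument, not on the stream, which is why it terminates even though its input is infinite.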

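Useful Data Types

Idris includes a number of useful data types and library functions (see the libs/ directory in the distribution, and the documentation at https://www.idris-lang.org/pages/documentation.html). This section describes a few of these, and how to import them.

List and Vect

We have already seen the List and Vect data types: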

data List a = Nil | (::) a (List a)

data Vect : Nat -> Type -> Type where
   Nil  : Vect Z a
   (::) : a -> Vect k a -> Vect (S k) a

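You can get access to Vect with import Data.Vect. Note that the constructor names are the same for each: constructor names (in fact, names in general) can be overloaded, provided that they are declared in different namespaces (see Section Modules and Namespaces), and will typically be resolved according to their type. As syntactic sugar, any implementation of the names Nil and :: can be written in list form. For example:

  • [] means Nil

  • [1,2,3] means 1 :: 2 :: 3 :: Nil

Similarly, any implementation of the names Lin and :< can be written in snoc-list form:

  • [<] means Lin

  • [< 1, 2, 3] means Lin :< 1 :< 2 :< 3

The prelude includes a pre-defined datatype for snoc-lists: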

data SnocList a = Lin | (:<) (SnocList a) a

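The library also defines a number of functions for manipulating these types. map is overloaded both for List and Vect (we will see more details of precisely how later, when we cover interfaces in Section Interfaces) and applies a function to every element of the list or vector.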

map : (a -> b) -> List a -> List b
map f []        = []
map f (x :: xs) = f x :: map f xs

map : (a -> b) -> Vect n a -> Vect n b
map f []        = []
map f (x :: xs) = f x :: map f xs

For example, given the following vector of integers, and a function to double an integer:

intVec : Vect 5 Int
intVec = [1, 2, 3, 4, 5]

double : Int -> Int
double x = x * 2

the function map can be used as follows to double every element in the vector:

*UsefulTypes> show (map double intVec)
"[2, 4, 6, 8, 10]" : String

For more details of the functions available on List and Vect, look in the library files:

  • libs/base/Data/List.idr

  • libs/base/Data/Vect.idr

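Functions include filtering, appending, reversing, and so on.

Aside: Anonymous functions and operator sections

There are neater ways to write the above expression. One way would be to use an anonymous function: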

*UsefulTypes> show (map (\x => x * 2) intVec)
"[2, 4, 6, 8, 10]" : String

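The notation \x => val constructs an anonymous function which takes one argument, x, and returns the expression val. Anonymous functions may take several arguments, separated by commas, e.g. \x, y, z => val. Arguments may also be given explicit types, e.g. \x : Int => x * 2, and can pattern match, e.g. \(x, y) => x + y. We could also use an operator section: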

*UsefulTypes> show (map (* 2) intVec)
"[2, 4, 6, 8, 10]" : String

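(*2) is shorthand for a function which multiplies a number by 2. It expands to \x => x * 2. Similarly, (2*) would expand to \x => 2 * x.

Maybe

Maybe, defined in the Prelude, describes an optional value. Either there is a value of the given type, or there is not: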

data Maybe a = Just a | Nothing

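Maybe is one way of giving a type to an operation that may fail. For example, looking something up in a List (rather than a vector) may result in an out of bounds error: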

list_lookup : Nat -> List a -> Maybe a
list_lookup _     Nil         = Nothing
list_lookup Z     (x :: xs) = Just x
list_lookup (S k) (x :: xs) = list_lookup k xs

The maybe function is used to process values of type Maybe, either by applying a function to the value, if there is one, or by providing a default value:

maybe : Lazy b -> Lazy (a -> b) -> Maybe a -> b

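Note that the types of the first two arguments are wrapped in Lazy. Since only one of the two arguments will actually be used, we mark them as Lazy in case they are large expressions where it would be wasteful to compute and then discard them.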
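A short usage sketch of maybe, with a hypothetical function name describe: the first argument is the result for Nothing, and the second is applied to the contents of a Just.

```idris
-- `describe` is a hypothetical example of using `maybe`.
-- The default string and the function are both passed lazily,
-- so only the one that is needed gets evaluated.
describe : Maybe Nat -> String
describe m = maybe "nothing" (\n => "got " ++ show n) m
```

With this definition, describe Nothing gives "nothing" and describe (Just 3) gives "got 3".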

Tuples

Values can be paired with the following built-in data type:

data Pair a b = MkPair a b

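As syntactic sugar, we can write (a, b) which, according to context, means either Pair a b or MkPair a b. Tuples can contain an arbitrary number of values, represented as nested pairs: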

fred : (String, Int)
fred = ("Fred", 42)

jim : (String, Int, String)
jim = ("Jim", 25, "Cambridge")

*UsefulTypes> fst jim
"Jim" : String
*UsefulTypes> snd jim
(25, "Cambridge") : (Int, String)
*UsefulTypes> jim == ("Jim", (25, "Cambridge"))
True : Bool

Dependent Pairs

Dependent pairs allow the type of the second element of a pair to depend on the value of the first element:

data DPair : (a : Type) -> (p : a -> Type) -> Type where
   MkDPair : {p : a -> Type} -> (x : a) -> p x -> DPair a p

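Again, there is syntactic sugar for this. (x : a ** p) is the type of a pair of A and P, where the name x can occur inside p. ( x ** p ) constructs a value of this type. For example, we can pair a number with a Vect of a particular length: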

vec : (n : Nat ** Vect n Int)
vec = (2 ** [3, 4])

If you like, you can write it out the long way; the two are equivalent:

vec : DPair Nat (\n => Vect n Int)
vec = MkDPair 2 [3, 4]

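The type checker could infer the value of the first element from the length of the vector. We can write an underscore _ in place of values which we expect the type checker to fill in, so the above definition could also be written as: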

vec : (n : Nat ** Vect n Int)
vec = (_ ** [3, 4])

We might also prefer to omit the type of the first element of the pair, since, again, it can be inferred:

vec : (n ** Vect n Int)
vec = (_ ** [3, 4])

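One use for dependent pairs is to return values of dependent types where the index is not necessarily known in advance. For example, if we filter elements out of a Vect according to some predicate, we will not know in advance what the length of the resulting vector will be: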

filter : (a -> Bool) -> Vect n a -> (p ** Vect p a)

If the Vect is empty, the result is:

filter p Nil = (_ ** [])

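In the :: case, we need to inspect the result of a recursive call to filter to extract the length and the vector from the result. To do this, we use a case expression, which allows pattern matching on intermediate values: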

filter : (a -> Bool) -> Vect n a -> (p ** Vect p a)
filter p Nil = (_ ** [])
filter p (x :: xs)
    = case filter p xs of
           (_ ** xs') => if p x then (_ ** x :: xs')
                                else (_ ** xs')

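Dependent pairs are sometimes referred to as "Sigma types".

Records

Records are data types which collect several values (the record's fields) together. Idris provides syntax for defining records and automatically generating field access and update functions. Unlike the syntax used for data structures, records in Idris follow a different syntax to that seen with Haskell. For example, we can represent a person's name and age in a record: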

record Person where
    constructor MkPerson
    firstName, middleName, lastName : String
    age : Int

fred : Person
fred = MkPerson "Fred" "Joe" "Bloggs" 30

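The constructor name is provided using the constructor keyword, and the fields are then given in an indented block following the where keyword (here, firstName, middleName, lastName, and age). You can declare multiple fields on a single line, provided that they have the same type. The field names can be used to access the field values: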

*Record> fred.firstName
"Fred" : String
*Record> fred.age
30 : Int
*Record> :t (.firstName)
Main.Person.(.firstName) : Person -> String

We can use prefix field projections, like in Haskell:

*Record> firstName fred
"Fred" : String
*Record> age fred
30 : Int
*Record> :t firstName
firstName : Person -> String

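Prefix field projections can be disabled per record definition using the pragma %prefix_record_projections off, which makes all subsequently defined records generate only dotted projections. This pragma has effect until the end of the module or until the closest occurrence of %prefix_record_projections on.

We can also use the field names to update a record (or, more precisely, produce a copy of the record with the given fields updated):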

*Record> { firstName := "Jim" } fred
MkPerson "Jim" "Joe" "Bloggs" 30 : Person
*Record> { firstName := "Jim", age $= (+ 1) } fred
MkPerson "Jim" "Joe" "Bloggs" 31 : Person

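The syntax { field := val, ... } generates a function which updates the given fields in a record. := assigns a new value to a field, and $= applies a function to update its value.

Each record is defined in its own namespace, which means that field names can be reused in multiple records.

Records, and fields within records, can have dependent types. Updates are allowed to change the type of a field, provided that the result is well-typed.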

record Class where
    constructor ClassInfo
    students : Vect n Person
    className : String

It is safe to update the students field to a vector of a different length because it will not affect the type of the record:

addStudent : Person -> Class -> Class
addStudent p c = { students := p :: students c } c

*Record> addStudent fred (ClassInfo [] "CS")
ClassInfo [MkPerson "Fred" "Joe" "Bloggs" 30] "CS" : Class

We could also use $= to define addStudent more concisely:

addStudent' : Person -> Class -> Class
addStudent' p c = { students $= (p ::) } c

Nested record projection

Nested record fields can be accessed using the dot notation:

x.a.b.c
map (.a.b.c) xs

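For the dot notation, there must be no spaces after the dots but there may be spaces before the dots. The composite projection must be parenthesised, otherwise map .a.b.c xs would be understood as map.a.b.c xs.

Nested record fields can be accessed using the prefix notation, too: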

(c . b . a) x
map (c . b . a) xs

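Dots with spaces around them stand for the function composition operator.

Nested record update

Idris also provides a convenient syntax for accessing and updating nested records. For example, if a field is accessible with the expression x.a.b.c, it can be updated using the following syntax: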

{ a.b.c := val } x

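This returns a new record, with the field accessed by the path a.b.c set to val. The syntax is first class, i.e. { a.b.c := val } itself has a function type.

The $= notation is also valid for nested record updates.

Dependent Records

Records can also be dependent on values. Records have parameters, which cannot be updated like the other fields. The parameters appear as arguments to the resulting type, and are written following the record type name. For example, a pair type could be defined as follows: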

record Prod a b where
    constructor Times
    fst : a
    snd : b

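Using the Class record from earlier, the size of the class can be restricted using a Vect, and the size included in the type by parameterising the record with the size. For example: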

record SizedClass (size : Nat) where
    constructor SizedClassInfo
    students : Vect size Person
    className : String

As with addStudent earlier, we can still add a student to a SizedClass since the size is implicit, and will be updated when a student is added:

addStudent : Person -> SizedClass n -> SizedClass (S n)
addStudent p c = { students := p :: students c } c

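In fact, the dependent pair type we have just seen is, in practice, defined as a record, with fields fst and snd that allow projecting values out of the pair: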

record DPair a (p : a -> Type) where
  constructor MkDPair
  fst : a
  snd : p fst

It is possible to use record update syntax to update dependent fields, provided that all related fields are updated at once. For example:

cons : t -> (x : Nat ** Vect x t) -> (x : Nat ** Vect x t)
cons val xs
    = { fst := S (fst xs),
        snd := (val :: snd xs) } xs

Or even:

cons' : t -> (x : Nat ** Vect x t) -> (x : Nat ** Vect x t)
cons' val
    = { fst $= S,
        snd $= (val ::) }

More Expressions

let bindings

Intermediate values can be calculated using let bindings:

mirror : List a -> List a
mirror xs = let xs' = reverse xs in
                xs ++ xs'

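We can do pattern matching in let bindings too. For example, we can extract fields from a record as follows, as well as by pattern matching at the top level: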

data Person = MkPerson String Int

showPerson : Person -> String
showPerson p = let MkPerson name age = p in
                   name ++ " is " ++ show age ++ " years old"

These let bindings can be annotated with a type:

mirror : List a -> List a
mirror xs = let xs' : List a = reverse xs in
                xs ++ xs'

We can also use the symbol := instead of = to, among other things, avoid ambiguities with propositional equality:

Diag : a -> Type
Diag v = let ty : Type := v = v in ty

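Local definitions can also be introduced using let. Just like top level definitions and those defined in a where clause, you need to:

  1. declare the function and its type

  2. define the function by pattern matching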

foldMap : Monoid m => (a -> m) -> Vect n a -> m
foldMap f = let fo : m -> a -> m
                fo ac el = ac <+> f el
             in foldl fo neutral

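The symbol := cannot be used in a local function definition. This means it can be used to interleave let bindings and local definitions without introducing ambiguities.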

foldMap : Monoid m => (a -> m) -> Vect n a -> m
foldMap f = let fo : m -> a -> m
                fo ac el = ac <+> f el
                initial := neutral
                 --     ^ this indicates that `initial` is a separate binding,
                 -- not relevant to definition of `fo`
             in foldl fo initial

List comprehensions

Idris provides comprehension notation as a convenient shorthand for building lists. The general form is:

[ expression | qualifiers ]

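This generates the list of values produced by evaluating the expression, according to the conditions given by the comma separated qualifiers. For example, we can build a list of Pythagorean triples as follows: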

pythag : Int -> List (Int, Int, Int)
pythag n = [ (x, y, z) | z <- [1..n], y <- [1..z], x <- [1..y],
                         x*x + y*y == z*z ]

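The [a..b] notation is another shorthand which builds a list of numbers between a and b. Alternatively [a,b..c] builds a list of numbers between a and c with the increment specified by the difference between a and b. This works for type Nat, Int and Integer, using the enumFromTo and enumFromThenTo function from the prelude.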
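The two range forms and a simple comprehension can be tried side by side. This is a small sketch with hypothetical names evens and squares.

```idris
-- [a,b..c]: start at 0, step by the difference 2 - 0, stop at 10,
-- giving the even numbers 0, 2, 4, 6, 8, 10.
evens : List Int
evens = [0,2..10]

-- A comprehension with a single generator drawing from [1..5],
-- giving the squares 1, 4, 9, 16, 25.
squares : List Int
squares = [ x * x | x <- [1..5] ]
```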

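case expressions

Another way of inspecting intermediate values is to use a case expression. The following function, for example, splits a string into two at a given character: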

splitAt : Char -> String -> (String, String)
splitAt c x = case break (== c) x of
                  (x, y) => (x, strTail y)

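break is a library function which breaks a string into a pair of strings at the point where the given function returns true. We then deconstruct the pair it returns, and remove the first character of the second string.

A case expression can match several cases, for example, to inspect an intermediate value of type Maybe a. Recall list_lookup, which looks up an index in a list, returning Nothing if the index is out of bounds. We can use this to write lookup_default, which looks up an index and returns a default value if the index is out of bounds: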

lookup_default : Nat -> List a -> a -> a
lookup_default i xs def = case list_lookup i xs of
                              Nothing => def
                              Just x => x

If the index is in bounds, we get the value at that index, otherwise we get a default value:

*UsefulTypes> lookup_default 2 [3,4,5,6] (-1)
5 : Integer
*UsefulTypes> lookup_default 4 [3,4,5,6] (-1)
-1 : Integer

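Totality

Idris distinguishes between total and partial functions. A total function is a function that either:

  • Terminates for all possible inputs, or

  • Produces a non-empty, finite, prefix of a possibly infinite result

If a function is total, we can consider its type a precise description of what that function will do. For example, if we have a function with a return type of String we know something different, depending on whether or not it is total:

  • If it is total, it will return a value of type String in finite time;

  • If it is partial, then as long as it does not crash or enter an infinite loop, it will return a String.

Idris makes this distinction so that it knows which functions are safe to evaluate while type checking (as we saw with First Class Types). After all, if it tries to evaluate a function during type checking which does not terminate, then type checking will not terminate! Therefore, only total functions will be evaluated during type checking. Partial functions can still be used in types, but will not be evaluated further.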

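Interfaces

We often want to define functions which work across several different data types. For example, we would like arithmetic operators to work on Int, Integer and Double at the very least. We would like == to work on the majority of data types. We would like to be able to display different types in a uniform way.

To achieve this, we use interfaces, which are similar to type classes in Haskell or traits in Rust. To define an interface, we provide a collection of overloadable functions. A simple example is the Show interface, which is defined in the prelude and provides an interface for converting values to String: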

interface Show a where
    show : a -> String

This generates a function of the following type (which we call a method of the Show interface):

show : Show a => a -> String

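We can read this as: "under the constraint that a has an implementation of Show, take an input a and return a String." An implementation of an interface is defined by giving definitions of the methods of the interface. For example, the Show implementation for Nat could be defined as: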

Show Nat where
    show Z = "Z"
    show (S k) = "s" ++ show k

Main> show (S (S (S Z)))
"sssZ" : String

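Only one implementation of an interface can be given for a type; implementations may not overlap. Implementation declarations can themselves have constraints. To help with resolution, the arguments of an implementation must be constructors (either data or type constructors) or variables (i.e. you cannot give an implementation for a function). For example, to define a Show implementation for vectors, we need to know that there is a Show implementation for the element type, because we are going to use it to convert each element to a String: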

Show a => Show (Vect n a) where
    show xs = "[" ++ show' xs ++ "]" where
        show' : forall n . Vect n a -> String
        show' Nil        = ""
        show' (x :: Nil) = show x
        show' (x :: xs)  = show x ++ ", " ++ show' xs

Note that we need the explicit forall n . in the show' function because otherwise the n is already in scope, and fixed to the value of the top level n.
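The same pattern applies to user defined types. The following sketch gives a Show implementation for a small enumeration; the type Colour and the function describeAll are hypothetical and exist only for this illustration.

```idris
-- A hypothetical enumeration type.
data Colour = Red | Green | Blue

-- One clause of `show` per constructor.
Show Colour where
  show Red   = "Red"
  show Green = "Green"
  show Blue  = "Blue"

-- Because Colour now implements Show, the existing implementation
-- for lists can be used to show a list of colours.
describeAll : List Colour -> String
describeAll cs = show cs
```

This is the constrained-implementation mechanism seen from the user's side: showing a list of Colour works only because an implementation for the element type is available.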

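Default Definitions

The Prelude defines an Eq interface which provides methods for comparing values for equality or inequality, with implementations for all of the built-in types: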

interface Eq a where
    (==) : a -> a -> Bool
    (/=) : a -> a -> Bool

To declare an implementation for a type, we have to give definitions of all of the methods. For example, for an implementation of Eq for Nat:

Eq Nat where
    Z     == Z     = True
    (S x) == (S y) = x == y
    Z     == (S y) = False
    (S x) == Z     = False

    x /= y = not (x == y)

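It is hard to imagine many cases where the /= method will be anything other than the negation of the result of applying the == method. It is therefore convenient to give a default definition for each method in the interface declaration, in terms of the other method: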

interface Eq a where
    (==) : a -> a -> Bool
    (/=) : a -> a -> Bool

    x /= y = not (x == y)
    x == y = not (x /= y)

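A minimal complete implementation of Eq requires either == or /= to be defined, but does not require both. If a method definition is missing, and there is a default definition for it, then the default is used instead.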
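As a sketch of a minimal implementation, the following defines only == for a hypothetical two-constructor type Direction, and then uses /=, which falls back to the default definition in terms of ==.

```idris
-- A hypothetical type used only for this illustration.
data Direction = North | South

-- Only (==) is given; (/=) comes from the default definition.
Eq Direction where
  North == North = True
  South == South = True
  _     == _     = False

-- Uses the default (/=): not (North == South), which is True.
differ : Bool
differ = North /= South
```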

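Extending Interfaces

Interfaces can also be extended. A logical next step from an equality relation Eq is to define an ordering relation Ord. We can define an Ord interface which inherits methods from Eq as well as defining some of its own: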

data Ordering = LT | EQ | GT
interface Eq a => Ord a where
    compare : a -> a -> Ordering

    (<) : a -> a -> Bool
    (>) : a -> a -> Bool
    (<=) : a -> a -> Bool
    (>=) : a -> a -> Bool
    max : a -> a -> a
    min : a -> a -> a

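The Ord interface allows us to compare two values and determine their ordering. Only the compare method is required; every other method has a default definition. Using this we can write functions such as sort, a function which sorts a list into increasing order, provided that the element type of the list is in the Ord interface. We give the constraints on the type variables to the left of the fat arrow =>, and the function type to the right of the fat arrow: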

sort : Ord a => List a -> List a

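Functions, interfaces and implementations can have multiple constraints. Multiple constraints are written in brackets in a comma separated list, for example: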

sortAndShow : (Ord a, Show a) => List a -> String
sortAndShow xs = show (sort xs)

Constraints are, like types, first class objects in the language. You can see this at the REPL:

Main> :t Ord
Prelude.Ord : Type -> Type

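So, (Ord a, Show a) is an ordinary pair of Types, with two constraints as the first and second element of the pair.

Note: Interfaces and mutual blocks

Idris is strictly "define before use", except in mutual blocks. In a mutual block, Idris elaborates in two passes: types on the first pass and definitions on the second. When the mutual block contains an interface declaration, it elaborates the interface header but none of the method types on the first pass, and elaborates the method types and any default definitions on the second pass.

Quantities for Parameters

By default, parameters that are not explicitly ascribed a type in an interface declaration are assigned the quantity 0. This means that the parameter is not available to use at run time in the methods' definitions.

For instance, Show a gives rise to a 0-quantified type variable a in the type of the show method: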

Main> :set showimplicits
Main> :t show
Prelude.show : {0 a : Type} -> Show a => a -> String

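However, some use cases require that some of the parameters are available at run time. We may for instance want to declare an interface for Storable types. The constraint Storable a size means that we can store values of type a in a Buffer in exactly size bytes.

If the user provides a method to read a value of such a type a at a given offset, then we can read the kth element stored in the buffer by computing the appropriate offset from k and size. This is demonstrated by providing a default implementation for the method peekElemOff, defined in terms of peekByteOff and the parameter size.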

data ForeignPtr : Type -> Type where
  MkFP : Buffer -> ForeignPtr a

interface Storable (0 a : Type) (size : Nat) | a where
  peekByteOff : HasIO io => ForeignPtr a -> Int -> io a

  peekElemOff : HasIO io => ForeignPtr a -> Int -> io a
  peekElemOff fp k = peekByteOff fp (k * cast size)

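Note that a is explicitly marked as runtime irrelevant so that it is erased by the compiler. Equivalently we could have written interface Storable a (size : Nat). The meaning of | a is explained in Determining Parameters.

Functors and Applicatives

So far, we have seen single parameter interfaces, where the parameter is of type Type. In general, there can be any number of parameters (even zero), and the parameters can have any type. If the type of the parameter is not Type, we need to give an explicit type declaration. For example, the Functor interface is defined in the prelude: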

interface Functor (0 f : Type -> Type) where
    map : (m : a -> b) -> f a -> f b

A functor allows a function to be applied across a structure, for example to apply a function to every element in a List:

Functor List where
  map f []      = []
  map f (x::xs) = f x :: map f xs

Idris> map (*2) [1..10]
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20] : List Integer
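Functor is not limited to lists. As a sketch, here is an implementation for a hypothetical binary tree type; Tree, Leaf, Node and doubled are names introduced only for this example.

```idris
-- A hypothetical binary tree with values stored at the nodes.
data Tree a = Leaf | Node (Tree a) a (Tree a)

-- map applies the function to every stored value and keeps the
-- shape of the tree unchanged.
Functor Tree where
  map f Leaf         = Leaf
  map f (Node l x r) = Node (map f l) (f x) (map f r)

-- Doubles every value in a small tree, using an operator section.
doubled : Tree Nat
doubled = map (* 2) (Node Leaf 1 (Node Leaf 2 Leaf))
```

As with List, the structure is preserved and only the elements change, which is the defining behaviour of a functor.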

Having defined Functor, we can define Applicative, which abstracts the notion of function application:

infixl 2 <*>

interface Functor f => Applicative (0 f : Type -> Type) where
    pure  : a -> f a
    (<*>) : f (a -> b) -> f a -> f b

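Monads and do-notation

The Monad interface allows us to encapsulate binding and computation, and is the basis of the do-notation introduced in Section "do" notation. It extends Applicative as defined above, and is defined as follows: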

interface Applicative m => Monad (m : Type -> Type) where
    (>>=)  : m a -> (a -> m b) -> m b

There is also a non-binding sequencing operator, defined for Monad as:

v >> e = v >>= \_ => e

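Inside a do block, the following syntactic transformations are applied:

  • x <- v; e becomes v >>= (\x => e)

  • v; e becomes v >> e

  • let x = v; e becomes let x = v in e

IO has an implementation of Monad, defined using primitive functions. We can also define an implementation for Maybe, as follows: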

Monad Maybe where
    Nothing  >>= k = Nothing
    (Just x) >>= k = k x

Using this we can, for example, define a function which adds two Maybe Int, using the monad to encapsulate the error handling:

m_add : Maybe Int -> Maybe Int -> Maybe Int
m_add x y = do x' <- x -- Extract value from x
               y' <- y -- Extract value from y
               pure (x' + y') -- Add them

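This function will extract the values from x and y, if they are both available, or return Nothing if one or both are not ("fail fast"). Managing the Nothing cases is achieved by the >>= operator, hidden by the do notation.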

Main> m_add (Just 82) (Just 22)
Just 104
Main> m_add (Just 82) Nothing
Nothing

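The translation of do notation is entirely syntactic, so there is no need for the (>>=) and (>>) operators to be the operators defined in the Monad interface. Idris will, in general, try to disambiguate which operators you mean by type, but you can explicitly choose with qualified do notation, for example: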

m_add : Maybe Int -> Maybe Int -> Maybe Int
m_add x y = Prelude.do
               x' <- x -- Extract value from x
               y' <- y -- Extract value from y
               pure (x' + y') -- Add them

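The Prelude.do means that Idris will use the (>>=) and (>>) operators defined in the Prelude.

Pattern Matching Bind

Sometimes we want to pattern match immediately on the result of a function in do notation. For example, let us say we have a function readNumber which reads a number from the console, returning a value of the form Just x if the number is valid, and Nothing otherwise: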

import Data.String

readNumber : IO (Maybe Nat)
readNumber = do
  input <- getLine
  if all isDigit (unpack input)
     then pure (Just (stringToNatOrZ input))
     else pure Nothing

If we then use it to write a function to read two numbers, returning Nothing if either is invalid, then we would like to pattern match on the result of readNumber:

readNumbers : IO (Maybe (Nat, Nat))
readNumbers =
  do x <- readNumber
     case x of
          Nothing => pure Nothing
          Just x_ok => do y <- readNumber
                          case y of
                               Nothing => pure Nothing
                               Just y_ok => pure (Just (x_ok, y_ok))

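If there is a lot of error handling, this could get deeply nested very quickly! Instead, we can combine the bind and the pattern match in one line. For example, we could try pattern matching on values of the form Just x_ok: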

readNumbers : IO (Maybe (Nat, Nat))
readNumbers
  = do Just x_ok <- readNumber
       Just y_ok <- readNumber
       pure (Just (x_ok, y_ok))

There is still a problem, however, because we have now omitted the case for Nothing, so readNumbers is no longer total! We can add the Nothing case back as follows:

readNumbers : IO (Maybe (Nat, Nat))
readNumbers
  = do Just x_ok <- readNumber
            | Nothing => pure Nothing
       Just y_ok <- readNumber
            | Nothing => pure Nothing
       pure (Just (x_ok, y_ok))

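The effect of this version of readNumbers is identical to the first (in fact, it is syntactic sugar for it and is translated directly into that form). The first part of each statement (Just x_ok <- and Just y_ok <-) gives the preferred binding: if this matches, execution continues with the rest of the do block. The second part gives the alternative bindings, of which there may be more than one.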
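As a sketch of a binding with more than one alternative, consider a hypothetical Reading type with three constructors. Neither Reading nor describe appears in the tutorial; they are assumptions made for illustration only.

```idris
-- A hypothetical three-way result type.
data Reading = Good Nat | Bad String | Missing

-- The preferred binding is Good n; the two alternatives cover the
-- remaining constructors and leave the do block early.
describe : IO Reading -> IO String
describe act
  = do Good n <- act
         | Bad msg => pure ("bad reading: " ++ msg)
         | Missing => pure "no reading"
       pure ("good reading: " ++ show n)
```

Because every constructor of Reading is handled, no case is left uncovered.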

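!-notation

In many cases, using do-notation can make programs unnecessarily verbose, particularly in cases such as m_add above, where the value bound is used once, immediately. In these cases, we can use a shorthand version, as follows: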

m_add : Maybe Int -> Maybe Int -> Maybe Int
m_add x y = pure (!x + !y)

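The notation !expr means that the expression expr should be evaluated and then implicitly bound. Conceptually, we can think of ! as being a prefix function with the following type: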

(!) : m a -> a

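Note, however, that it is not really a function, merely syntax! A subexpression !expr will lift expr as high as possible within its current scope, bind it to a fresh name x, and replace !expr with x. Expressions are lifted depth first, left to right. In practice, !-notation allows us to program in a more direct style, while still giving a notational clue as to which expressions are monadic.

For example, the expression: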

let y = 94 in f !(g !(print y) !x)

is lifted to:

let y = 94 in do y' <- print y
                 x' <- x
                 g' <- g y' x'
                 f g'
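
The same lifting applies in IO. The following sketch shows a one-line action and the do block it should be equivalent to; the names echo and echo' are hypothetical.

```idris
-- Reads a line and prints it back, binding the result of getLine implicitly.
echo : IO ()
echo = putStrLn ("You typed: " ++ !getLine)

-- The hand-written equivalent: getLine is lifted out and bound first.
echo' : IO ()
echo' = do line <- getLine
           putStrLn ("You typed: " ++ line)
```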
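Monad comprehensions

The list comprehension notation we saw in the section More Expressions is more general, and applies to anything which has an implementation of both Monad and Alternative: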

interface Applicative f => Alternative (0 f : Type -> Type) where
    empty : f a
    (<|>) : f a -> f a -> f a

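In general, a comprehension takes the form [ exp | qual1, qual2, …, qualn ] where quali can be one of:

  • A generator x <- e

  • A guard, which is an expression of type Bool

  • A let binding let x = e

To translate a comprehension [exp | qual1, qual2, ..., qualn], first any qualifier qual which is a guard is translated to guard qual, using the following function: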

guard : Alternative f => Bool -> f ()

Then the comprehension is converted to do notation:

do { qual1; qual2; ...; qualn; pure exp; }

Using monad comprehensions, an alternative definition for m_add would be:

m_add : Maybe Int -> Maybe Int -> Maybe Int
m_add x y = [ x' + y' | x' <- x, y' <- y ]

Interfaces and IO

In general, IO operations in the libraries aren't written using IO directly, but rather via the HasIO interface:

interface Monad io => HasIO io where
  liftIO : (1 _ : IO a) -> io a

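HasIO explains, via liftIO, how to convert a primitive IO operation to an operation in some underlying type, as long as that type has a Monad implementation. These interfaces allow a programmer to define some more expressive notion of interactive program, while still giving direct access to IO primitives.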
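
As a small illustration, a function can be written against HasIO rather than IO itself, and then used at type IO. The function greet below is a hypothetical example; it relies only on the Prelude's putStrLn, which is itself defined in terms of HasIO.

```idris
-- Works in any type with a HasIO implementation, not only IO.
greet : HasIO io => String -> io ()
greet name = putStrLn ("Hello, " ++ name ++ "!")

-- Here io is instantiated to IO.
main : IO ()
main = greet "Idris"
```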

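Idiom brackets

While do notation gives an alternative meaning to sequencing, idioms give an alternative meaning to application. The notation and larger example in this section are inspired by Conor McBride and Ross Paterson's paper "Applicative Programming with Effects" [1].

First, let us revisit m_add from above. All it is really doing is applying an operator to two values extracted from Maybe Int. We could abstract out the application: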

m_app : Maybe (a -> b) -> Maybe a -> Maybe b
m_app (Just f) (Just a) = Just (f a)
m_app _        _        = Nothing

Using this, we can write an alternative m_add which uses this alternative notion of function application, with explicit calls to m_app:

m_add' : Maybe Int -> Maybe Int -> Maybe Int
m_add' x y = m_app (m_app (Just (+)) x) y

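Rather than having to insert m_app everywhere there is an application, we can use idiom brackets to do the job for us. To do this, we can give Maybe an implementation of Applicative as follows, where <*> is defined in the same way as m_app above (this is defined in the Idris library):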

Applicative Maybe where
    pure = Just

    (Just f) <*> (Just a) = Just (f a)
    _        <*> _        = Nothing

Using <*> we can use this implementation as follows, where a function application [| f a1 … an |] is translated into pure f <*> a1 <*> … <*> an:

m_add' : Maybe Int -> Maybe Int -> Maybe Int
m_add' x y = [| x + y |]
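
For comparison, this is a sketch of what the idiom bracket [| x + y |] stands for once the translation rule is applied by hand. The name m_addPure is hypothetical.

```idris
-- [| x + y |] is the application (+) x y, so it becomes pure (+) <*> x <*> y.
m_addPure : Maybe Int -> Maybe Int -> Maybe Int
m_addPure x y = pure (+) <*> x <*> y
```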
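An error-handling interpreter

Idiom notation is commonly useful when defining evaluators. McBride and Paterson describe such an evaluator [1], for a language similar to the following: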

data Expr = Var String      -- variables
          | Val Int         -- values
          | Add Expr Expr   -- addition

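Evaluation will take place relative to a context mapping variables (represented as Strings) to Int values, and can possibly fail. We define a data type Eval to wrap an evaluator: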

data Eval : Type -> Type where
     MkEval : (List (String, Int) -> Maybe a) -> Eval a

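Wrapping the evaluator in a data type means we will be able to provide implementations of interfaces for it later. We begin by defining a function to retrieve values from the context during evaluation: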

fetch : String -> Eval Int
fetch x = MkEval (\e => fetchVal e) where
    fetchVal : List (String, Int) -> Maybe Int
    fetchVal [] = Nothing
    fetchVal ((v, val) :: xs) = if (x == v)
                                  then (Just val)
                                  else (fetchVal xs)

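When defining an evaluator for the language, we will be applying functions in the context of an Eval, so it is natural to give Eval an implementation of Applicative. Before Eval can have an implementation of Applicative, it is necessary for Eval to have an implementation of Functor: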

Functor Eval where
    map f (MkEval g) = MkEval (\e => map f (g e))

Applicative Eval where
    pure x = MkEval (\e => Just x)

    (<*>) (MkEval f) (MkEval g) = MkEval (\x => app (f x) (g x)) where
        app : Maybe (a -> b) -> Maybe a -> Maybe b
        app (Just fx) (Just gx) = Just (fx gx)
        app _         _         = Nothing

Evaluating an expression can now make use of idiom brackets to handle errors:

eval : Expr -> Eval Int
eval (Var x)   = fetch x
eval (Val x)   = [| x |]
eval (Add x y) = [| eval x + eval y |]

runEval : List (String, Int) -> Expr -> Maybe Int
runEval env e = case eval e of
    MkEval envFn => envFn env

For example:

InterpE> runEval [("x", 10), ("y",84)] (Add (Var "x") (Var "y"))
Just 94
InterpE> runEval [("x", 10), ("y",84)] (Add (Var "x") (Var "z"))
Nothing

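Named Implementations

It can be desirable to have multiple implementations of an interface for the same type, for example to provide alternative methods for sorting or printing values. To achieve this, implementations can be named as follows: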

[myord] Ord Nat where
   compare Z (S n)     = GT
   compare (S n) Z     = LT
   compare Z Z         = EQ
   compare (S x) (S y) = compare @{myord} x y

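This declares an implementation as normal, but with an explicit name, myord. The syntax compare @{myord} gives an explicit implementation to compare, otherwise it would use the default implementation for Nat. We can use this, for example, to sort a list of Nat in reverse. Given the following list: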

testList : List Nat
testList = [3,4,1]

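We can sort it using the default Ord implementation, by using the sort function available with import Data.List, then we can try with the named implementation myord as follows, at the Idris prompt: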

Main> show (sort testList)
"[1, 3, 4]"
Main> show (sort @{myord} testList)
"[4, 3, 1]"

Sometimes, we also need access to a named parent implementation. For example, the prelude defines the following Semigroup interface:

interface Semigroup ty where
  (<+>) : ty -> ty -> ty

Then it defines Monoid, which extends Semigroup with a "neutral" value:

interface Semigroup ty => Monoid ty where
  neutral : ty

We can define two different implementations of Semigroup and Monoid for Nat, one based on addition and one on multiplication:

[PlusNatSemi] Semigroup Nat where
  (<+>) x y = x + y

[MultNatSemi] Semigroup Nat where
  (<+>) x y = x * y

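The neutral value for addition is 0, but the neutral value for multiplication is 1. It is important, therefore, that when we define implementations of Monoid they extend the correct Semigroup implementation. We can do this with a using clause in the implementation as follows: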

[PlusNatMonoid] Monoid Nat using PlusNatSemi where
  neutral = 0

[MultNatMonoid] Monoid Nat using MultNatSemi where
  neutral = 1

The using PlusNatSemi clause indicates that PlusNatMonoid should extend PlusNatSemi specifically.
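
To see why the parent implementation matters, the sketch below folds a list with whichever Monoid is passed explicitly. The named implementations are repeated so that the block stands alone; combineAll is a hypothetical helper, not a library function.

```idris
[PlusNatSemi] Semigroup Nat where
  (<+>) x y = x + y

[MultNatSemi] Semigroup Nat where
  (<+>) x y = x * y

[PlusNatMonoid] Monoid Nat using PlusNatSemi where
  neutral = 0

[MultNatMonoid] Monoid Nat using MultNatSemi where
  neutral = 1

-- Combines all elements using the Monoid supplied by the caller.
-- Both neutral and (<+>) come from that one implementation.
combineAll : Monoid a => List a -> a
combineAll []        = neutral
combineAll (x :: xs) = x <+> combineAll xs

-- Sum of the list under the additive monoid.
total : Nat
total = combineAll @{PlusNatMonoid} [2, 3, 4]

-- Product of the list under the multiplicative monoid.
product' : Nat
product' = combineAll @{MultNatMonoid} [2, 3, 4]
```

If MultNatMonoid extended PlusNatSemi by mistake, product' would add the elements and then add 1, which is exactly the kind of mismatch the using clause rules out.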

Interface Constructors

Interfaces, just like records, can be declared with a user-defined constructor.

interface A a where
  getA : a

interface A t => B t where
  constructor MkB

  getB : t

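Then MkB : A t => t -> B t.

Determining Parameters

When an interface has more than one parameter, it can help resolution if the parameters used to find an implementation are restricted. For example: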

interface Monad m => MonadState s (0 m : Type -> Type) | m where
  get : m s
  put : s -> m ()

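In this interface, only m needs to be known to find an implementation of this interface, and s can then be determined from the implementation. This is declared with the | m after the interface declaration. We call m a determining parameter of the MonadState interface, because it is the parameter used to find an implementation. This is similar to the concept of functional dependencies in Haskell (https://wiki.haskell.org/Functional_dependencies).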

[1]

Conor McBride and Ross Paterson. 2008. Applicative programming with effects. J. Funct. Program. 18, 1 (January 2008), 1-13. DOI=10.1017/S0956796807006326 https://dx.doi.org/10.1017/S0956796807006326

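Modules and Namespaces

An Idris program consists of a collection of modules. Each module includes an optional module declaration giving the name of the module, a list of import statements giving the other modules which are to be imported, and a collection of declarations and definitions of types, interfaces and functions. For example, the listing below gives a module which defines a binary tree type BTree (in a file BTree.idr):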

module BTree

public export
data BTree a = Leaf
             | Node (BTree a) a (BTree a)

export
insert : Ord a => a -> BTree a -> BTree a
insert x Leaf = Node Leaf x Leaf
insert x (Node l v r) = if (x < v) then (Node (insert x l) v r)
                                   else (Node l v (insert x r))

export
toList : BTree a -> List a
toList Leaf = []
toList (Node l v r) = BTree.toList l ++ (v :: BTree.toList r)

export
toTree : Ord a => List a -> BTree a
toTree [] = Leaf
toTree (x :: xs) = insert x (toTree xs)

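The modifiers export and public export say which names are visible from other namespaces. These are explained further below.

Then, this gives a main program (in a file bmain.idr) which uses the BTree module to sort a list: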

module Main

import BTree

main : IO ()
main = do let t = toTree [1,8,2,7,9,3]
          print (BTree.toList t)

The same names can be defined in multiple modules: names are qualified with the name of the module. The names defined in the BTree module are, in full:

  • BTree.BTree

  • BTree.Leaf

  • BTree.Node

  • BTree.insert

  • BTree.toList

  • BTree.toTree

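If names are otherwise unambiguous, there is no need to give the fully qualified name. Names can be disambiguated either by giving an explicit qualification, using the with keyword, or according to their type.

The with keyword in expressions comes in two variants:

  • with BTree.insert (insert x empty) for one name

  • with [BTree.insert, BTree.empty] (insert x empty) for multiple names

This is particularly useful with do notation, where it can often improve error messages: with MyModule.(>>=) do ...

There is no formal link between the module name and its filename, although it is generally advisable to use the same name for each. An import statement refers to a filename, using dots to separate directories. For example, import foo.bar would import the file foo/bar.idr, which would conventionally have the module declaration module foo.bar. The only requirement for module names is that the main module, with the main function, must be called Main, although its filename need not be Main.idr.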

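Export Modifiers

Idris allows for fine-grained control over the visibility of a namespace's contents. By default, all names defined in a namespace are kept private. This aids in specification of a minimal interface and for internal details to be left hidden. Idris allows for functions, types, and interfaces to be marked as private, export, or public export. Their general meaning is as follows:

  • private means that it is not exported at all. This is the default.

  • export means that its top level type is exported.

  • public export means that the entire definition is exported.

A further restriction in modifying the visibility is that definitions must not refer to anything within a lower level of visibility. For example, public export definitions cannot use private or export names, and export types cannot use private names. This is to prevent private names leaking into a module's interface.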

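Meaning for Functions

  • export the type is exported

  • public export the type and definition are exported, and the definition can be used after it is imported. In other words, the definition itself is considered part of the module's interface. The long name public export is intended to make you think twice about doing this.

Note

Type synonyms in Idris are created by writing a function. When setting the visibility for a module, it is usually a good idea to public export all type synonyms if they are to be used outside the module. Otherwise, Idris won't know what the synonym is a synonym for.

Since public export means that a function's definition is exported, this effectively makes the function definition part of the module's API. Therefore, it is generally a good idea to avoid using public export for functions unless you really mean to export the full definition.

Note

For beginners: if the function needs to be accessed only at run time, use export. However, if it is also to be used at compile time (e.g. to prove a theorem), use public export. For example, consider the function plus : Nat -> Nat -> Nat discussed previously, and the following theorem about it: thm : plus Z m = m. In order to prove it, the type checker needs to reduce plus Z m to m (and hence obtain thm : m = m). To achieve this, it needs access to the definition of plus, which includes the equation plus Z m = m. Therefore, in this case, plus has to be marked as public export.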
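
The difference can be sketched within a single file by using an explicit namespace, since visibility is tracked per namespace. The names Lib, double and triple are hypothetical, and the commented-out proof is only expected to be rejected; it has not been checked against a particular compiler version.

```idris
namespace Lib
  -- The definition is part of the interface, so it can reduce in types.
  public export
  double : Nat -> Nat
  double n = n + n

  -- Only the type is visible outside Lib.
  export
  triple : Nat -> Nat
  triple n = n + n + n

-- Accepted: the type checker can unfold double 2 to 4.
doubleTwo : double 2 = 4
doubleTwo = Refl

-- Expected to be rejected outside Lib, because triple cannot be unfolded:
-- tripleTwo : triple 2 = 6
-- tripleTwo = Refl
```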

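Meaning for Data Types

For data types, the meanings are:

  • export the type constructor is exported

  • public export the type constructor and data constructors are exported

Meaning for Interfaces

For interfaces, the meanings are:

  • export the interface name is exported

  • public export the interface name, method names and default definitions are exported

Propagating Inner Module APIs

Additionally, a module can re-export a module it has imported, by using the public modifier on an import. For example: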

module A

import B
import public C

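The module A will export its own exported names, as well as any public export or export names in module C, but will not re-export anything from module B.

Renaming imports

Sometimes it is convenient to be able to access the names in another module via a different namespace (typically, a shorter one). For this, you can use import...as. For example: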

module A

import Data.List as L

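This module A has access to the exported names from module Data.List, but can also explicitly access them via the module name L. import...as can also be combined with import public to create a module which exports a larger API from other sub-modules: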

module Books

import public Books.Hardback as Books
import public Books.Comic as Books

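Here, any module which imports Books will have access to the exported interfaces of Books.Hardback and Books.Comic, both under the namespace Books.

Explicit Namespaces

Defining a module also defines a namespace implicitly. However, namespaces can also be given explicitly. This is most useful if you wish to overload names within the same module: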

module Foo

namespace X
  export
  test : Int -> Int
  test x = x * 2

namespace Y
  export
  test : String -> String
  test x = x ++ x

This (admittedly contrived) module defines two functions with fully qualified names Foo.X.test and Foo.Y.test, which can be disambiguated by their types:

*Foo> test 3
6 : Int
*Foo> test "foo"
"foofoo" : String

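The export rules, public export and export, are per namespace, not per file, so the two test definitions above need the export flag to be visible outside their own namespaces.

Parameterised blocks

Groups of functions can be parameterised over a number of arguments using a parameters declaration, for example: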

parameters (x : Nat, y : Nat)
  addAll : Nat -> Nat
  addAll z = x + y + z

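The effect of a parameters block is to add the declared parameters to every function, type and data constructor within the block. Specifically, the parameters are added to the front of the argument list. Outside the block, the parameters must be given explicitly. The addAll function, when called from the REPL, will thus have the following type signature: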

*params> :t addAll
addAll : Nat -> Nat -> Nat -> Nat

and the following definition:

addAll : (x : Nat) -> (y : Nat) -> (z : Nat) -> Nat
addAll x y z = x + y + z

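Parameters blocks can be nested, and can also include data declarations, in which case the parameters are added explicitly to all type and data constructors. They may also be dependent types with implicit arguments: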

parameters (y : Nat, xs : Vect x a)
  data Vects : Type -> Type where
    MkVects : Vect y a -> Vects a

  append : Vects a -> Vect (x + y) a
  append (MkVects ys) = xs ++ ys

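To use Vects or append outside the block, we must also give the xs and y arguments. Here, we can use placeholders for the values which can be inferred by the type checker: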

Main> show (append _ _ (MkVects _ [1,2,3] [4,5,6]))
"[1, 2, 3, 4, 5, 6]"

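Multiplicities

Idris 2 is based on Quantitative Type Theory (QTT), a core language developed by Bob Atkey and Conor McBride. In practice, this means that every variable in Idris 2 has a quantity associated with it. A quantity is one of:

  • 0, meaning that the variable is erased at run time

  • 1, meaning that the variable is used exactly once at run time

  • Unrestricted, which is the same behaviour as Idris 1

We can see the multiplicities of variables by inspecting holes. For example, if we have the following skeleton definition of append on vectors…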

append : Vect n a -> Vect m a -> Vect (n + m) a
append xs ys = ?append_rhs

…we can look at the hole append_rhs:

Main> :t append_rhs
 0 m : Nat
 0 a : Type
 0 n : Nat
   ys : Vect m a
   xs : Vect n a
-------------------------------------
append_rhs : Vect (plus n m) a

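The 0 next to m, a and n means that they are in scope, but there will be 0 occurrences of them at run time. That is, it is guaranteed that they will be erased at run time.

Multiplicities can be explicitly written in function types as follows:

  • ignoreN : (0 n : Nat) -> Vect n a -> Nat - this function cannot look at n at run time

  • duplicate : (1 x : a) -> (a, a) - this function must use x exactly once (so good luck implementing it, by the way. There is no implementation because it would need to use x twice!)

If there is no multiplicity annotation, the argument is unrestricted. If, on the other hand, a name is implicitly bound (like a in both examples above) the argument is erased. So, the above types could also be written as: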

  • ignoreN : {0 a : _} -> (0 n : Nat) -> Vect n a -> Nat

  • duplicate : {0 a : _} -> (1 x : a) -> (a, a)
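
As a minimal sketch, ignoreN can still be implemented even though n is erased, provided the result is computed from the vector itself. The body below is one possible implementation, not one given in the tutorial.

```idris
import Data.Vect

-- n is erased, so the length is recomputed by walking the vector.
ignoreN : (0 n : Nat) -> Vect n a -> Nat
ignoreN _ xs = length xs

-- Returning n directly would use an erased argument at run time,
-- so a definition such as ignoreN n xs = n should be rejected.
```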

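This section describes what this means for your Idris 2 programs in practice, with several examples. In particular, it describes:

  • Erasure - how to know what is relevant at run time and what is erased

  • Linearity - using the type system to encode resource usage protocols

  • Pattern Matching on Types - truly first class types

The most important of these for most programs, and the thing you will need to know about if converting Idris 1 programs to work with Idris 2, is erasure. The most interesting, however, and the thing which gives Idris 2 much more expressivity, is linearity, so we will start there.

Linearity

The 1 multiplicity expresses that a variable must be used exactly once. By "used" we mean either:

  • if the variable is a data type or primitive value, it is pattern matched against, e.g. by being the subject of a case statement, or a function argument that is pattern matched against, etc.,

  • if the variable is a function, that function is applied (i.e. run with an argument)

First, we will see how this works on some small examples of functions and data types, then see how it can be used to encode resource protocols.

Above, we saw the type of duplicate. Let us try to write it interactively, and see what goes wrong. We start by giving the type and a skeleton definition with a hole: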

duplicate : (1 x : a) -> (a, a)
duplicate x = ?help

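Checking the type of a hole tells us the multiplicity of each variable in scope. If we check the type of ?help we will see that we cannot use a at run time, and we have to use x exactly once: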

Main> :t help
 0 a : Type
 1 x : a
-------------------------------------
help : (a, a)

If we use x for one part of the pair…

duplicate : (1 x : a) -> (a, a)
duplicate x = (x, ?help)

…then the type of the remaining hole tells us we cannot use it for the other:

Main> :t help
 0 a : Type
 0 x : a
-------------------------------------
help : a

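The same happens if we try defining duplicate x = (?help, x) (try it!).

In order to avoid parsing ambiguities, if you give an explicit multiplicity for a variable, as with the argument to duplicate, you need to give it a name too. But, if the name isn't used in the scope of the type, you can use _ instead of a name, as follows: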

duplicate : (1 _ : a) -> (a, a)

The intention of multiplicity 1 is that if we have a function with a type of the following form…

f : (1 x : a) -> b

…then the guarantee given by the type system is that if f x is used exactly once, then x is used exactly once. So, if we insist on trying to define duplicate…:

duplicate x = (x, x)

…then Idris will complain:

pmtype.idr:2:15--8:1:While processing right hand side of Main.duplicate at pmtype.idr:2:1--8:1:
There are 2 uses of linear name x

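A similar intuition applies for data types. Consider the following types, Lin which wraps an argument that must be used once, and Unr which wraps an argument with unrestricted use: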

data Lin : Type -> Type where
     MkLin : (1 _ : a) -> Lin a

data Unr : Type -> Type where
     MkUnr : a -> Unr a

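If MkLin x is used once, then x is used once. But if MkUnr x is used once, there is no guarantee on how often x is used. We can see this a bit more clearly by starting to write projection functions for Lin and Unr to extract the argument: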

getLin : (1 _ : Lin a) -> a
getLin (MkLin x) = ?howmanyLin

getUnr : (1 _ : Unr a) -> a
getUnr (MkUnr x) = ?howmanyUnr

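Checking the types of the holes shows us that, for getLin, we must use x exactly once (because the Lin a argument is used once, by pattern matching on it as MkLin x, and if MkLin x is used once, x must be used once):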

Main> :t howmanyLin
 0 a : Type
 1 x : a
-------------------------------------
howmanyLin : a

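For getUnr, however, we still have to use the Unr a argument once, again by pattern matching on it, but using MkUnr x once places no restrictions on x. So, x has unrestricted use in the body of getUnr: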

Main> :t howmanyUnr
 0 a : Type
   x : a
-------------------------------------
howmanyUnr : a

If getLin has an unrestricted argument…

getLin : Lin a -> a
getLin (MkLin x) = ?howmanyLin

…then x is unrestricted in howmanyLin:

Main> :t howmanyLin
 0 a : Type
   x : a
-------------------------------------
howmanyLin : a

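Remember the intuition from the type of MkLin is that if MkLin x is used exactly once, x is used exactly once. But, we didn't say that MkLin x would be used exactly once, so there is no restriction on x.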
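
Filling the hole in the linear version is then straightforward: returning x uses it exactly once. This is a sketch of the completed projection, with the Lin declaration repeated so that the block stands alone.

```idris
data Lin : Type -> Type where
     MkLin : (1 _ : a) -> Lin a

-- The argument is consumed by the pattern match, and x is used once
-- by being returned.
getLin : (1 _ : Lin a) -> a
getLin (MkLin x) = x
```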

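Resource protocols

One way to take advantage of being able to express linear usage of an argument is in defining resource usage protocols, where we can use linearity to ensure that any unique external resource has only one instance, and we can use functions which are linear in their arguments to represent state transitions on that resource. A door, for example, can be in one of two states, Open or Closed: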

data DoorState = Open | Closed

data Door : DoorState -> Type where
     MkDoor : (doorId : Int) -> Door st

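(Okay, we are just pretending here: imagine the doorId is a reference to an external resource!)

We can define functions for opening and closing the door which explicitly describe how they change the state of a door, and that they are linear in the door: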

openDoor : (1 d : Door Closed) -> Door Open
closeDoor : (1 d : Door Open) -> Door Closed

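Remember, the intuition is that if openDoor d is used exactly once, then d is used exactly once. So, provided that a door d has multiplicity 1 when it is created, we know that once we call openDoor on it, we will not be able to use d again. Given that d is an external resource, and openDoor has changed its state, this is a good thing!

We can ensure that any door we create has multiplicity 1 by creating them with a newDoor function with the following type: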

newDoor : (1 p : (1 d : Door Closed) -> IO ()) -> IO ()

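That is, newDoor takes a function, which it runs exactly once. That function takes a door, which is used exactly once. We will run it in IO to suggest that there is some interaction with the outside world going on when we create the door. Since the multiplicity 1 means the door has to be used exactly once, we need to be able to delete the door when we are finished: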

deleteDoor : (1 d : Door Closed) -> IO ()

So an example of correct door protocol usage would be:

doorProg : IO ()
doorProg
    = newDoor $ \d =>
          let d' = openDoor d
              d'' = closeDoor d' in
              deleteDoor d''
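
The tutorial only gives type signatures for the door operations. To type check doorProg locally, some definitions are needed; the bodies below are placeholder assumptions that merely rewrap the doorId and touch no real resource.

```idris
data DoorState = Open | Closed

data Door : DoorState -> Type where
     MkDoor : (doorId : Int) -> Door st

-- Placeholder bodies: each consumes its door by pattern matching.
openDoor : (1 d : Door Closed) -> Door Open
openDoor (MkDoor i) = MkDoor i

closeDoor : (1 d : Door Open) -> Door Closed
closeDoor (MkDoor i) = MkDoor i

-- Creates a door with an arbitrary id and hands it to p exactly once.
newDoor : (1 p : (1 d : Door Closed) -> IO ()) -> IO ()
newDoor p = p (MkDoor 0)

deleteDoor : (1 d : Door Closed) -> IO ()
deleteDoor (MkDoor _) = pure ()

-- The protocol from the text: create, open, close, delete.
doorProg : IO ()
doorProg
    = newDoor $ \d =>
          let d' = openDoor d
              d'' = closeDoor d' in
              deleteDoor d''
```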

It is instructive to build this program interactively, with holes along the way, and see how the multiplicities of d, d' etc. change. For example:

doorProg : IO ()
doorProg
    = newDoor $ \d =>
          let d' = openDoor d in
              ?whatnow

Checking the type of ?whatnow shows that d is now spent, but we still have to use d' exactly once:

Main> :t whatnow
 0 d : Door Closed
 1 d' : Door Open
-------------------------------------
whatnow : IO ()

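Note that the 0 multiplicity for d means that we can still talk about it (in particular, we can still reason about it in types) but we cannot use it again in a relevant position in the rest of the program. It is also fine to shadow the name d throughout: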

doorProg : IO ()
doorProg
    = newDoor $ \d =>
          let d = openDoor d
              d = closeDoor d in
              deleteDoor d

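If we do not follow the protocol correctly (create the door, open it, close it, then delete it) then the program will not type check. For example, we can try not to delete the door before finishing: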

doorProg : IO ()
doorProg
    = newDoor $ \d =>
          let d' = openDoor d
              d'' = closeDoor d' in
              putStrLn "What could possibly go wrong?"

This gives the following error:

Door.idr:15:19--15:38:While processing right hand side of Main.doorProg at Door.idr:13:1--17:1:
There are 0 uses of linear name d''

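There is a lot more to be said about the details here! But, this shows at a high level how we can use linearity to capture resource usage protocols at the type level. If we have an external resource which is guaranteed to be used linearly, like Door, we do not need to run operations on that resource in an IO monad, since we are already enforcing an ordering on operations and do not have access to any out-of-date resource states. This is similar to the way interactive programs work in the Clean programming language, and in fact is how IO is implemented internally in Idris 2, with a special %World type for representing the state of the outside world that is always used linearly: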

public export
data IORes : Type -> Type where
     MkIORes : (result : a) -> (1 x : %World) -> IORes a

export
data IO : Type -> Type where
     MkIO : (1 fn : (1 x : %World) -> IORes a) -> IO a

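Having multiplicities in the type system raises several interesting questions, such as:

  • Can we use linearity information to inform memory management and, for example, have type level guarantees about functions which will not need to perform garbage collection?

  • How should multiplicities be incorporated into interfaces such as Functor, Applicative and Monad?

  • If we have 0 and 1 as multiplicities, why stop there? Why not have 2, 3 and more (like Granule)?

  • What about multiplicity polymorphism, as in the Linear Haskell proposal?

  • Even without all of that, what can we do now?

Erasure

The 1 multiplicity gives us many possibilities in the kinds of properties we can express. But, the 0 multiplicity is perhaps more important in that it allows us to be precise about which values are relevant at run time, and which are compile time only (that is, which are erased). Using the 0 multiplicity means a function's type now tells us exactly what it needs at run time.

For example, in Idris 1 you could get the length of a vector as follows: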

vlen : Vect n a -> Nat
vlen {n} xs = n

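This is fine, since it runs in constant time, but the trade-off is that n has to be available at run time, so at run time we always need the length of the vector to be available if we ever call vlen. Idris 1 can infer whether the length is needed, but there is no easy way for a programmer to be sure.

In Idris 2, we need to state explicitly that n is needed at run time: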

vlen : {n : Nat} -> Vect n a -> Nat
vlen xs = n

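(Incidentally, also note that in Idris 2, names bound in types are also available in the definition without explicitly rebinding them.)

This also means that when you call vlen, you need the length available. For example, this will give an error: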

sumLengths : Vect m a -> Vect n a -> Nat
sumLengths xs ys = vlen xs + vlen ys

Idris 2 reports:

vlen.idr:7:20--7:28:While processing right hand side of Main.sumLengths at vlen.idr:7:1--10:1:
m is not accessible in this context

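This means that it needs to use m as an argument to pass to vlen xs, where it needs to be available at run time, but m is not available in sumLengths because it has multiplicity 0.

We can see this more clearly by replacing the right hand side of sumLengths with a hole…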

sumLengths : Vect m a -> Vect n a -> Nat
sumLengths xs ys = ?sumLengths_rhs

…then checking the hole's type at the REPL:

Main> :t sumLengths_rhs
 0 n : Nat
 0 a : Type
 0 m : Nat
   ys : Vect n a
   xs : Vect m a
-------------------------------------
sumLengths_rhs : Nat

Instead, we need to give bindings for m and n with unrestricted multiplicity:

sumLengths : {m, n : _} -> Vect m a -> Vect n a -> Nat
sumLengths xs ys = vlen xs + vlen ys

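Remember that giving no multiplicity on a binder, as with m and n here, means the variable has unrestricted usage.

If you are converting Idris 1 programs to work with Idris 2, this is probably the biggest thing you need to think about. It is important to note, though, that if you have bound implicits, such as…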

excitingFn : {t : _} -> Coffee t -> Moonbase t

…then it is a good idea to make sure t really is needed, or performance might suffer due to the run time building the instance of t unnecessarily!
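
The trade-off can be made concrete with a sketch of a length function that keeps n erased. The name vlenRec is hypothetical; it trades the constant-time lookup of vlen for a linear-time traversal.

```idris
import Data.Vect

-- n stays erased: the length is recomputed by recursion on the vector.
vlenRec : Vect n a -> Nat
vlenRec []        = 0
vlenRec (_ :: xs) = 1 + vlenRec xs
```

A caller of vlenRec does not need n at run time, so a function like sumLengths could use it without binding m and n as unrestricted implicits.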

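One final note on erasure: it is an error to try to pattern match on an argument with multiplicity 0, unless its value is inferrable from elsewhere. So, the following definition is rejected: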

badNot : (0 x : Bool) -> Bool
badNot False = True
badNot True = False

This is rejected with the error:

badnot.idr:2:1--3:1:Attempt to match on erased argument False in
Main.badNot

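The following, however, is fine, because in sNot, even though we appear to match on the erased argument x, its value is uniquely inferrable from the type of the second argument: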

data SBool : Bool -> Type where
     SFalse : SBool False
     STrue  : SBool True

sNot : (0 x : Bool) -> SBool x -> Bool
sNot False SFalse = True
sNot True  STrue  = False

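Experience with Idris 2 so far suggests that, most of the time, as long as you are using unbound implicits in your Idris 1 programs, they will work without much modification in Idris 2. The Idris 2 type checker will point out where you require an unbound implicit argument at run time. Sometimes this is both surprising and enlightening!

Pattern Matching on Types

One way to think about dependent types is to think of them as "first class" objects in the language, in that they can be assigned to variables, passed around and returned from functions, just like any other construct. But, if they are truly first class, we should be able to pattern match on them too! Idris 2 allows us to do this. For example: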

showType : Type -> String
showType Int = "Int"
showType (List a) = "List of " ++ showType a
showType _ = "something else"

We can try this as follows:

Main> showType Int
"Int"
Main> showType (List Int)
"List of Int"
Main> showType (List Bool)
"List of something else"

Pattern matching on function types is interesting, because the return type may depend on the input value. For example, let us add a case to showType:

showType (Nat -> a) = ?help

Inspecting the type of help tells us:

Main> :t help
   a : Nat -> Type
-------------------------------------
help : String

So, the return type a depends on the input value of type Nat, and we will need to come up with a value to use a, for example:

showType (Nat -> a) = "Function from Nat to " ++ showType (a Z)

Note that multiplicities on the binders, and the ability to pattern match on non-erased types, mean that the following two types are distinct:

id : a -> a
notId : {a : Type} -> a -> a

In the case of notId, we can match on a and get a function which is certainly not the identity function:

notId {a = Int} x = x + 1
notId x = x

Main> notId 93
94
Main> notId "???"
"???"

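There is an important consequence of being able to distinguish between relevant and irrelevant type arguments, which is that a function is only parametric in a if a has multiplicity 0. So, in the case of notId, a is not a parameter, and so we cannot draw any conclusions about the way the function will behave because it is polymorphic, because the type tells us it might pattern match on a.

On the other hand, it is merely a coincidence that, in non-dependently typed languages, types are irrelevant and get erased, and values are relevant and remain at run time. Idris 2, being based on QTT, allows us to make the distinction between relevant and irrelevant arguments precise. Types can be relevant, values (such as the n index to vectors) can be irrelevant.

For more details on multiplicities, see Idris 2: Quantitative Type Theory in Action.

Idris includes a simple build system for building packages and executables from a named package description file. These files can be used with the Idris compiler to manage the development process.

Package Descriptions

A package description includes the following:

  • A header, consisting of the keyword package followed by a package name. Package names can be any valid Idris identifier. The iPKG format also takes a quoted version that accepts any valid filename.

  • Fields describing package contents, <field> = <value>.

At least one field must be the modules field, where the value is a comma separated list of modules. For example, given an Idris package maths that has modules Maths.idr, Maths.NumOps.idr, Maths.BinOps.idr, and Maths.HexOps.idr, the corresponding package file would be: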

package maths

modules = Maths
        , Maths.NumOps
        , Maths.BinOps
        , Maths.HexOps

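Running idris2 --init will interactively create a new package file in the current directory. The generated package file lists all configurable fields with a brief description.

Other examples of package files can be found in the libs directory of the main Idris repository, and in third-party libraries (https://github.com/idris-lang/Idris-dev/wiki/Libraries).

Using Package files

Idris itself is aware of packages, and special commands are available to help with, for example, building packages, installing packages, and cleaning packages. For instance, given the maths package from earlier we can use Idris as follows:

  • idris2 --build maths.ipkg will build all modules in the package

  • idris2 --install maths.ipkg will install the package, making it accessible by other Idris libraries and programs.

  • idris2 --clean maths.ipkg will delete all intermediate code and executable files generated when building.

Once the maths package has been installed, the command line option --package maths makes it accessible (abbreviated to -p maths). For example: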

idris2 -p maths Main.idr

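Using package dependencies in Atom

If you are using the Atom editor and have a dependency on another package, corresponding to for instance import Lightyear or import Pruviloj, you need to let Atom know that it should be loaded. The easiest way to accomplish that is with a .ipkg file. The general contents of an ipkg file are described in the next section of the tutorial, but for now here is a simple recipe for this trivial case:

  • Create a folder myProject.

  • Add a file myProject.ipkg containing just a couple of lines: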

package myProject

depends = pruviloj, lightyear
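
  • In Atom, use the File menu to open the folder myProject.

Example: The Well-Typed Interpreter

In this section, we will use the features we have seen so far to write a larger example, an interpreter for a simple functional programming language, with variables, function application, binary operators and an if...then...else construct. We will use the dependent type system to ensure that any programs which can be represented are well-typed.

Representing Languages

First, let us define the types in the language. We have integers, booleans, and functions, represented by Ty: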

data Ty = TyInt | TyBool | TyFun Ty Ty

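We can write a function to translate these representations to a concrete Idris type. Remember that types are first class, so can be calculated just like any other value: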

interpTy : Ty -> Type
interpTy TyInt       = Integer
interpTy TyBool      = Bool
interpTy (TyFun a t) = interpTy a -> interpTy t

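We are going to define a representation of our language in such a way that only well-typed programs can be represented. We will index the representations of expressions by their type, and the types of local variables (the context). The context can be represented using the Vect data type, so we will need to import Data.Vect at the top of our source file: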

import Data.Vect

Expressions are indexed by the types of the local variables, and the type of the expression itself:

data Expr : Vect n Ty -> Ty -> Type

The full representation of expressions is:

data HasType : (i : Fin n) -> Vect n Ty -> Ty -> Type where
    Stop : HasType FZ (t :: ctxt) t
    Pop  : HasType k ctxt t -> HasType (FS k) (u :: ctxt) t

data Expr : Vect n Ty -> Ty -> Type where
    Var : HasType i ctxt t -> Expr ctxt t
    Val : (x : Integer) -> Expr ctxt TyInt
    Lam : Expr (a :: ctxt) t -> Expr ctxt (TyFun a t)
    App : Expr ctxt (TyFun a t) -> Expr ctxt a -> Expr ctxt t
    Op  : (interpTy a -> interpTy b -> interpTy c) ->
          Expr ctxt a -> Expr ctxt b -> Expr ctxt c
    If  : Expr ctxt TyBool ->
          Lazy (Expr ctxt a) ->
          Lazy (Expr ctxt a) -> Expr ctxt a

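The code above makes use of the Vect and Fin types from the base library. Fin is available as part of Data.Vect. Throughout, ctxt refers to the local variable context.

Since expressions are indexed by their type, we can read the typing rules of the language from the definitions of the constructors. Let us look at each constructor in turn.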

We use a nameless representation for variables — they are de Bruijn indexed. Variables are represented by a proof of their membership in the context, HasType i ctxt T, which is a proof that variable i in context ctxt has type T. This is defined as follows:

data HasType : (i : Fin n) -> Vect n Ty -> Ty -> Type where
    Stop : HasType FZ (t :: ctxt) t
    Pop  : HasType k ctxt t -> HasType (FS k) (u :: ctxt) t

We can treat Stop as a proof that the most recently defined variable is well-typed, and Pop n as a proof that, if the nth most recently defined variable is well-typed, so is the n+1th. In practice, this means we use Stop to refer to the most recently defined variable, Pop Stop to refer to the next, and so on, via the Var constructor:

Var : HasType i ctxt t -> Expr ctxt t

So, in an expression \x. \y. x y, the variable x would have a de Bruijn index of 1, represented as Pop Stop, and y 0, represented as Stop. We find these by counting the number of lambdas between the definition and the use.

A value carries a concrete representation of an integer:

Val : (x : Integer) -> Expr ctxt TyInt

A lambda creates a function. In the scope of a function of type a -> t, there is a new local variable of type a, which is expressed by the context index:

Lam : Expr (a :: ctxt) t -> Expr ctxt (TyFun a t)

Function application produces a value of type t given a function from a to t and a value of type a:

App : Expr ctxt (TyFun a t) -> Expr ctxt a -> Expr ctxt t

We allow arbitrary binary operators, where the type of the operator informs what the types of the arguments must be:

Op : (interpTy a -> interpTy b -> interpTy c) ->
     Expr ctxt a -> Expr ctxt b -> Expr ctxt c

Finally, If expressions make a choice given a boolean. Each branch must have the same type, and we will evaluate the branches lazily so that only the branch which is taken need be evaluated:

If : Expr ctxt TyBool ->
     Lazy (Expr ctxt a) ->
     Lazy (Expr ctxt a) ->
     Expr ctxt a

Writing the Interpreter

When we evaluate an Expr, we’ll need to know the values in scope, as well as their types. Env is an environment, indexed over the types in scope. Since an environment is just another form of list, albeit with a strongly specified connection to the vector of local variable types, we use the usual :: and Nil constructors so that we can use the usual list syntax. Given a proof that a variable is defined in the context, we can then produce a value from the environment:

data Env : Vect n Ty -> Type where
    Nil  : Env Nil
    (::) : interpTy a -> Env ctxt -> Env (a :: ctxt)

lookup : HasType i ctxt t -> Env ctxt -> interpTy t
lookup Stop    (x :: xs) = x
lookup (Pop k) (x :: xs) = lookup k xs

Given this, an interpreter is a function which translates an Expr into a concrete Idris value with respect to a specific environment:

interp : Env ctxt -> Expr ctxt t -> interpTy t

The complete interpreter is defined as follows, for reference. For each constructor, we translate it into the corresponding Idris value:

interp env (Var i)     = lookup i env
interp env (Val x)     = x
interp env (Lam sc)    = \x => interp (x :: env) sc
interp env (App f s)   = interp env f (interp env s)
interp env (Op op x y) = op (interp env x) (interp env y)
interp env (If x t e)  = if interp env x then interp env t
                                         else interp env e

Let us look at each case in turn. To translate a variable, we simply look it up in the environment:

interp env (Var i) = lookup i env

To translate a value, we just return the concrete representation of the value:

interp env (Val x) = x

Lambdas are more interesting. In this case, we construct a function which interprets the scope of the lambda with a new value in the environment. So, a function in the object language is translated to an Idris function:

interp env (Lam sc) = \x => interp (x :: env) sc

For an application, we interpret the function and its argument and apply it directly. We know that interpreting f must produce a function, because of its type:

interp env (App f s) = interp env f (interp env s)

Operators and conditionals are, again, direct translations into the equivalent Idris constructs. For operators, we apply the function to its operands directly, and for If, we apply the Idris if...then...else construct directly.

interp env (Op op x y) = op (interp env x) (interp env y)
interp env (If x t e)  = if interp env x then interp env t
                                         else interp env e

Testing

We can make some simple test functions. Firstly, adding two inputs \x. \y. y + x is written as follows:

add : Expr ctxt (TyFun TyInt (TyFun TyInt TyInt))
add = Lam (Lam (Op (+) (Var Stop) (Var (Pop Stop))))

More interestingly, a factorial function fact (e.g. \x. if (x == 0) then 1 else (fact (x-1) * x)), can be written as:

fact : Expr ctxt (TyFun TyInt TyInt)
fact = Lam (If (Op (==) (Var Stop) (Val 0))
               (Val 1)
               (Op (*) (App fact (Op (-) (Var Stop) (Val 1)))
                       (Var Stop)))

Running

To finish, we write a main program which interprets the factorial function on user input:

main : IO ()
main = do putStr "Enter a number: "
          x <- getLine
          printLn (interp [] fact (cast x))

Here, cast is an overloaded function which converts a value from one type to another if possible. Here, it converts a string to an integer, giving 0 if the input is invalid. An example run of this program at the Idris interactive environment is:

$ idris2 interp.idr
     ____    __     _         ___
    /  _/___/ /____(_)____   |__ \
    / // __  / ___/ / ___/   __/ /     Version 0.5.1
  _/ // /_/ / /  / (__  )   / __/      https://www.idris-lang.org
 /___/\__,_/_/  /_/____/   /____/      Type :? for help

Welcome to Idris 2.  Enjoy yourself!
Main> :exec main
Enter a number: 6
720
Aside: cast

The prelude defines an interface Cast which allows conversion between types:

interface Cast from to where
    cast : from -> to

It is a multi-parameter interface, defining the source type and object type of the cast. It must be possible for the type checker to infer both parameters at the point where the cast is applied. There are casts defined between all of the primitive types, as far as they make sense.
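
As a sketch of how a user-defined conversion fits this interface, the following gives a Cast implementation for a hypothetical Celsius wrapper type. Celsius and toDouble are assumptions made for illustration; the String to Integer cast is the one used by the main program above.

```idris
-- A hypothetical wrapper around a temperature in degrees Celsius.
data Celsius = MkCelsius Double

-- Both parameters of Cast are given: the source and the target type.
Cast Celsius Double where
  cast (MkCelsius d) = d

-- The signature fixes both types, so the implementation can be found.
toDouble : Celsius -> Double
toDouble c = cast c

-- The same overloaded name, resolved to the String to Integer cast.
parsed : Integer
parsed = cast "42"
```

In both uses the surrounding type signature supplies the source and target types, which is what lets the type checker pick an implementation.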

Views and the “with” rule

Warning

NOT UPDATED FOR IDRIS 2 YET

Dependent pattern matching

Since types can depend on values, the form of some arguments can be determined by the value of others. For example, if we were to write down the implicit length arguments to (++), we’d see that the form of the length argument was determined by whether the vector was empty or not:

(++) : Vect n a -> Vect m a -> Vect (n + m) a
(++) {n=Z}   []        ys = ys
(++) {n=S k} (x :: xs) ys = x :: xs ++ ys

If n was a successor in the [] case, or zero in the :: case, the definition would not be well typed.

The with rule — matching intermediate values

Very often, we need to match on the result of an intermediate computation. Idris provides a construct for this, the with rule, inspired by views in Epigram [1], which takes account of the fact that matching on a value in a dependently typed language can affect what we know about the forms of other values. In its simplest form, the with rule adds another argument to the function being defined.

We have already seen a vector filter function. This time, we define it using with as follows:

filter : (a -> Bool) -> Vect n a -> (p ** Vect p a)
filter p [] = ( _ ** [] )
filter p (x :: xs) with (filter p xs)
  filter p (x :: xs) | ( _ ** xs' ) = if (p x) then ( _ ** x :: xs' ) else ( _ ** xs' )

Here, the with clause allows us to deconstruct the result of filter p xs. The view refined argument pattern filter p (x :: xs) goes beneath the with clause, followed by a vertical bar |, followed by the deconstructed intermediate result ( _ ** xs' ). If the view refined argument pattern is unchanged from the original function argument pattern, then the left side of | is extraneous and may be omitted with an underscore _:

filter p (x :: xs) with (filter p xs)
  _ | ( _ ** xs' ) = if (p x) then ( _ ** x :: xs' ) else ( _ ** xs' )

with clauses can also be nested:

foo : Int -> Int -> Bool
foo n m with (n + 1)
  foo _ m | 2 with (m + 1)
    foo _ _ | 2 | 3 = True
    foo _ _ | 2 | _ = False
  foo _ _ | _ = False

and left hand sides that are the same as their parent’s can be skipped by using _ to focus on the patterns for the most local with. Meaning that the above foo can be rewritten as follows:

foo : Int -> Int -> Bool
foo n m with (n + 1)
  _ | 2 with (m + 1)
    _ | 3 = True
    _ | _ = False
  _ | _ = False

If the intermediate computation itself has a dependent type, then the result can affect the forms of other arguments — we can learn the form of one value by testing another. In these cases, view refined argument patterns must be explicit. For example, a Nat is either even or odd. If it is even it will be the sum of two equal Nat. Otherwise, it is the sum of two equal Nat plus one:

data Parity : Nat -> Type where
   Even : {n : _} -> Parity (n + n)
   Odd  : {n : _} -> Parity (S (n + n))

We say Parity is a view of Nat. It has a covering function which tests whether it is even or odd and constructs the predicate accordingly. Note that we’re going to need access to n at run time, so although it’s an implicit argument, it has unrestricted multiplicity.

parity : (n:Nat) -> Parity n

We’ll come back to the definition of parity shortly. We can use it to write a function which converts a natural number to a list of binary digits (least significant first) as follows, using the with rule:

natToBin : Nat -> List Bool
natToBin Z = Nil
natToBin k with (parity k)
   natToBin (j + j)     | Even = False :: natToBin j
   natToBin (S (j + j)) | Odd  = True  :: natToBin j

The value of parity k affects the form of k, because the result of parity k depends on k. So, as well as the patterns for the result of the intermediate computation (Even and Odd) right of the |, we also write how the results affect the other patterns left of the |. That is:

  • When parity k evaluates to Even, we can refine the original argument k to a refined pattern (j + j) according to Parity (n + n) from the Even constructor definition. So (j + j) replaces k on the left side of |, and the Even constructor appears on the right side. The natural number j in the refined pattern can be used on the right side of the = sign.

  • Otherwise, when parity k evaluates to Odd, the original argument k is refined to S (j + j) according to Parity (S (n + n)) from the Odd constructor definition, and Odd now appears on the right side of |, again with the natural number j used on the right side of the = sign.

Note that there is a function in the patterns (+) and repeated occurrences of j - this is allowed because another argument has determined the form of these patterns.

Defining parity

The definition of parity is a little tricky, and requires some knowledge of theorem proving (see Section Theorem Proving), but for completeness, here it is:

parity : (n : Nat) -> Parity n
parity Z = Even {n = Z}
parity (S Z) = Odd {n = Z}
parity (S (S k)) with (parity k)
  parity (S (S (j + j))) | Even
      = rewrite plusSuccRightSucc j j in Even {n = S j}
  parity (S (S (S (j + j)))) | Odd
      = rewrite plusSuccRightSucc j j in Odd {n = S j}

For full details on rewrite in particular, please refer to the theorem proving tutorial, in Section Theorem Proving.

1

Conor McBride and James McKinna. 2004. The view from the left. J. Funct. Program. 14, 1 (January 2004), 69-111. https://doi.org/10.1017/S0956796803004829

Theorem Proving

Equality

Idris allows propositional equalities to be declared, allowing theorems about programs to be stated and proved. An equality type is defined as follows in the Prelude:

data Equal : a -> b -> Type where
     Refl : Equal x x

As a notational convenience, Equal x y can be written as x = y. Equalities can be proposed between any values of any types, but the only way to construct a proof of equality is if values actually are equal. For example:

fiveIsFive : 5 = 5
fiveIsFive = Refl

twoPlusTwo : 2 + 2 = 4
twoPlusTwo = Refl
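Refl is accepted whenever both sides normalise to the same value, so the same idea is not limited to numeric literals. The following is a minimal sketch; the names lengthExample and flipExample are made up for illustration, and sym is the library function that flips an equality proof.

```idris
twoPlusTwo : 2 + 2 = 4
twoPlusTwo = Refl

-- length [True, False] normalises to 2, so Refl type checks.
lengthExample : length [True, False] = 2
lengthExample = Refl

-- sym turns a proof of x = y into a proof of y = x.
flipExample : 4 = 2 + 2
flipExample = sym twoPlusTwo
```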

If we try…

twoPlusTwoBad : 2 + 2 = 5
twoPlusTwoBad = Refl

…then we’ll get an error:

Proofs.idr:8:17--10:1:While processing right hand side of Main.twoPlusTwoBad at Proofs.idr:8:1--10:1:
When unifying 4 = 4 and (fromInteger 2 + fromInteger 2) = (fromInteger 5)
Mismatch between:
        4
and
        5

The Empty Type

There is an empty type, Void, which has no constructors. It is therefore impossible to construct a canonical element of the empty type. We can therefore use the empty type to prove that something is impossible, for example zero is never equal to a successor:

disjoint : (n : Nat) -> Z = S n -> Void
disjoint n prf = replace {p = disjointTy} prf ()
  where
    disjointTy : Nat -> Type
    disjointTy Z = ()
    disjointTy (S k) = Void

Don’t worry if you don’t get all the details of how this works just yet - essentially, it applies the library function replace, which uses an equality proof to transform a predicate. Here we use it to transform a value of a type which can exist, the empty tuple, to a value of a type which can’t, by using a proof of something which can’t exist.

Once we have an element of the empty type, we can prove anything. void is defined in the library, to assist with proofs by contradiction.

void : Void -> a
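As a small illustration of how void is typically used, the sketch below proves the same disjointness fact with an impossible clause instead of replace, then derives an arbitrary result from it. Both names (zNotS and anythingFromZNotS) are hypothetical and not part of the library.

```idris
-- Refl can never have type Z = S n, so the only clause is impossible.
zNotS : Z = S n -> Void
zNotS Refl impossible

-- From a contradiction we can produce a value of any type with void.
anythingFromZNotS : Z = S n -> a
anythingFromZNotS prf = void (zNotS prf)
```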

Proving Theorems

When type checking dependent types, the type itself gets normalised. So imagine we want to prove the following theorem about the reduction behaviour of plus:

plusReduces : (n:Nat) -> plus Z n = n

We’ve written down the statement of the theorem as a type, in just the same way as we would write the type of a program. In fact there is no real distinction between proofs and programs. A proof, as far as we are concerned here, is merely a program with a precise enough type to guarantee a particular property of interest.

We won’t go into details here, but the Curry-Howard correspondence 1 explains this relationship. The proof itself is immediate, because plus Z n normalises to n by the definition of plus:

plusReduces n = Refl

It is slightly harder if we try the arguments the other way, because plus is defined by recursion on its first argument. The proof also works by recursion on the first argument to plus, namely n.

plusReducesZ : (n:Nat) -> n = plus n Z
plusReducesZ Z = Refl
plusReducesZ (S k) = cong S (plusReducesZ k)

cong is a function defined in the library which states that equality respects function application:

cong : (f : t -> u) -> a = b -> f a = f b
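To make the role of cong concrete, here is a short sketch with two hypothetical helpers (succCong and doubleCong are illustrative names only). Each one lifts an existing equality proof through a function.

```idris
-- Equal numbers have equal successors.
succCong : (a, b : Nat) -> a = b -> S a = S b
succCong a b prf = cong S prf

-- cong accepts any function, here a lambda that doubles its argument.
doubleCong : (a, b : Nat) -> a = b -> a + a = b + b
doubleCong a b prf = cong (\x => x + x) prf
```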

To see more detail on what’s going on, we can replace the recursive call to plusReducesZ with a hole:

plusReducesZ (S k) = cong S ?help

Then inspecting the type of the hole at the REPL shows us:

Main> :t help
   k : Nat
-------------------------------------
help : k = (plus k Z)

We can do the same for the reduction behaviour of plus on successors:

plusReducesS : (n:Nat) -> (m:Nat) -> S (plus n m) = plus n (S m)
plusReducesS Z m = Refl
plusReducesS (S k) m = cong S (plusReducesS k m)
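These two lemmas are enough to prove that plus is commutative. The sketch below repeats them so that it is self-contained, and combines them using rewrite, which this tutorial explains in Theorems in Practice. The name plusComm is chosen here for illustration.

```idris
plusReducesZ : (n : Nat) -> n = plus n Z
plusReducesZ Z = Refl
plusReducesZ (S k) = cong S (plusReducesZ k)

plusReducesS : (n : Nat) -> (m : Nat) -> S (plus n m) = plus n (S m)
plusReducesS Z m = Refl
plusReducesS (S k) m = cong S (plusReducesS k m)

plusComm : (n, m : Nat) -> plus n m = plus m n
-- Goal normalises to m = plus m Z.
plusComm Z m = plusReducesZ m
-- Goal is S (plus k m) = plus m (S k); the rewrite replaces plus k m
-- with plus m k, leaving exactly the statement of plusReducesS m k.
plusComm (S k) m = rewrite plusComm k m in plusReducesS m k
```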

Even for small theorems like these, the proofs are a little tricky to construct in one go. When things get even slightly more complicated, it becomes too much to think about to construct proofs in this “batch mode”.

Idris provides interactive editing capabilities, which can help with building proofs. For more details on building proofs interactively in an editor, see Theorem Proving.

Theorems in Practice

The need to prove theorems can arise naturally in practice. For example, previously (Views and the “with” rule) we implemented natToBin using a function parity:

parity : (n:Nat) -> Parity n

We provided a definition for parity, but without explanation. We might have hoped that it would look something like the following:

parity : (n:Nat) -> Parity n
parity Z     = Even {n=Z}
parity (S Z) = Odd {n=Z}
parity (S (S k)) with (parity k)
  parity (S (S (j + j)))     | Even = Even {n=S j}
  parity (S (S (S (j + j)))) | Odd  = Odd {n=S j}

Unfortunately, this fails with a type error:

With.idr:26:17--27:3:While processing right hand side of Main.with block in 2419 at With.idr:24:3--27:3:
Can't solve constraint between:
        plus j (S j)
and
        S (plus j j)

The problem is that normalising S j + S j in the type of Even doesn’t result in what we need for the type of the right hand side of Parity. We know that S (S (plus j j)) is going to be equal to S j + S j, but we need to explain it to Idris with a proof. We can begin by adding some holes (see Totality and Covering) to the definition:

parity : (n:Nat) -> Parity n
parity Z     = Even {n=Z}
parity (S Z) = Odd {n=Z}
parity (S (S k)) with (parity k)
  parity (S (S (j + j)))     | Even = let result = Even {n=S j} in
                                          ?helpEven
  parity (S (S (S (j + j)))) | Odd  = let result = Odd {n=S j} in
                                          ?helpOdd

Checking the type of helpEven shows us what we need to prove for the Even case:

  j : Nat
  result : Parity (S (plus j (S j)))
--------------------------------------
helpEven : Parity (S (S (plus j j)))

We can therefore write a helper function to rewrite the type to the form we need:

helpEven : (j : Nat) -> Parity (S j + S j) -> Parity (S (S (plus j j)))
helpEven j p = rewrite plusSuccRightSucc j j in p

The rewrite ... in syntax allows you to change the required type of an expression by rewriting it according to an equality proof. Here, we have used plusSuccRightSucc, which has the following type:

plusSuccRightSucc : (left : Nat) -> (right : Nat) -> S (left + right) = left + S right

We can see the effect of rewrite by replacing the right hand side of helpEven with a hole, and working step by step. Beginning with the following:

helpEven : (j : Nat) -> Parity (S j + S j) -> Parity (S (S (plus j j)))
helpEven j p = ?helpEven_rhs

We can look at the type of helpEven_rhs:

  j : Nat
  p : Parity (S (plus j (S j)))
--------------------------------------
helpEven_rhs : Parity (S (S (plus j j)))

Then we can rewrite by applying plusSuccRightSucc j j, which gives an equation S (j + j) = j + S j, thus replacing S (j + j) (or, in this case, S (plus j j) since S (j + j) reduces to that) in the type with j + S j:

helpEven : (j : Nat) -> Parity (S j + S j) -> Parity (S (S (plus j j)))
helpEven j p = rewrite plusSuccRightSucc j j in ?helpEven_rhs

Checking the type of helpEven_rhs now shows what has happened, including the type of the equation we just used (as the type of _rewrite_rule):

Main> :t helpEven_rhs
   j : Nat
   p : Parity (S (plus j (S j)))
-------------------------------------
helpEven_rhs : Parity (S (plus j (S j)))

Using rewrite and another helper for the Odd case, we can complete parity as follows:

helpEven : (j : Nat) -> Parity (S j + S j) -> Parity (S (S (plus j j)))
helpEven j p = rewrite plusSuccRightSucc j j in p

helpOdd : (j : Nat) -> Parity (S (S (j + S j))) -> Parity (S (S (S (j + j))))
helpOdd j p = rewrite plusSuccRightSucc j j in p

parity : (n:Nat) -> Parity n
parity Z     = Even {n=Z}
parity (S Z) = Odd {n=Z}
parity (S (S k)) with (parity k)
  parity (S (S (j + j)))     | Even = helpEven j (Even {n = S j})
  parity (S (S (S (j + j)))) | Odd  = helpOdd j (Odd {n = S j})

Full details of rewrite are beyond the scope of this introductory tutorial, but it is covered in the theorem proving tutorial (see Theorem Proving).

Totality Checking

If we really want to trust our proofs, it is important that they are defined by total functions — that is, a function which is defined for all possible inputs and is guaranteed to terminate. Otherwise we could construct an element of the empty type, from which we could prove anything:

-- making use of 'hd' being partially defined
empty1 : Void
empty1 = hd [] where
    hd : List a -> a
    hd (x :: xs) = x

-- not terminating
empty2 : Void
empty2 = empty2

Internally, Idris checks every definition for totality, and we can check at the prompt with the :total command. We see that neither of the above definitions is total:

Void> :total empty1
Void.empty1 is not covering due to call to function empty1:hd
Void> :total empty2
Void.empty2 is possibly not terminating due to recursive path Void.empty2

Note the use of the word “possibly” — a totality check can never be certain due to the undecidability of the halting problem. The check is, therefore, conservative. It is also possible (and indeed advisable, in the case of proofs) to mark functions as total so that it will be a compile time error for the totality check to fail:

total empty2 : Void
empty2 = empty2

Reassuringly, our proof in Section The Empty Type that the zero and successor constructors are disjoint is total:

Main> :total disjoint
Main.disjoint is Total

The totality check is, necessarily, conservative. To be recorded as total, a function f must:

  • Cover all possible inputs

  • Be well-founded — i.e. by the time a sequence of (possibly mutually) recursive calls reaches f again, it must be possible to show that one of its arguments has decreased.

  • Not use any data types which are not strictly positive

  • Not call any non-total functions

Directives and Compiler Flags for Totality

Warning

Not all of this is implemented yet for Idris 2

By default, Idris allows all well-typed definitions, whether total or not. However, it is desirable for functions to be total as far as possible, as this provides a guarantee that they provide a result for all possible inputs, in finite time. It is possible to make total functions a requirement, either:

  • By using the --total compiler flag.

  • By adding a %default total directive to a source file. All definitions after this will be required to be total, unless explicitly flagged as partial.

All functions after a %default total declaration are required to be total. Correspondingly, after a %default partial declaration, the requirement is relaxed.
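A minimal sketch of how the directive and the partial flag interact is shown below. The function names are invented for the example, and, as the warning above says, not every totality feature may be available in a given Idris 2 version.

```idris
%default total

-- Structurally recursive on the list, so it passes the totality check.
sumList : List Nat -> Nat
sumList [] = 0
sumList (x :: xs) = x + sumList xs

-- This never terminates, so it has to be flagged as partial explicitly.
-- Without the flag it would be rejected after %default total.
partial
countUp : Nat -> List Nat
countUp n = n :: countUp (S n)
```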

Finally, the compiler flag --warnpartial causes Idris to print a warning for any undeclared partial function.

Totality checking issues

Please note that the totality checker is not perfect! Firstly, it is necessarily conservative due to the undecidability of the halting problem, so many programs which are total will not be detected as such. Secondly, the current implementation has had limited effort put into it so far, so there may still be cases where it believes a function is total which is not. Do not rely on it for your proofs yet!

Hints for totality

In cases where you believe a program is total, but Idris does not agree, it is possible to give hints to the checker to give more detail for a termination argument. The checker works by ensuring that all chains of recursive calls eventually lead to one of the arguments decreasing towards a base case, but sometimes this is hard to spot. For example, the following definition cannot be checked as total because the checker cannot decide that filter (< x) xs will always be smaller than (x :: xs):

qsort : Ord a => List a -> List a
qsort [] = []
qsort (x :: xs)
   = qsort (filter (< x) xs) ++
      (x :: qsort (filter (>= x) xs))

The function assert_smaller, defined in the prelude, is intended to address this problem:

assert_smaller : a -> a -> a
assert_smaller x y = y

It simply evaluates to its second argument, but also asserts to the totality checker that y is structurally smaller than x. This can be used to explain the reasoning for totality if the checker cannot work it out itself. The above example can now be written as:

total
qsort : Ord a => List a -> List a
qsort [] = []
qsort (x :: xs)
   = qsort (assert_smaller (x :: xs) (filter (< x) xs)) ++
      (x :: qsort (assert_smaller (x :: xs) (filter (>= x) xs)))

The expression assert_smaller (x :: xs) (filter (< x) xs) asserts that the result of the filter will always be smaller than the pattern (x :: xs).

In more extreme cases, the function assert_total marks a subexpression as always being total:

assert_total : a -> a
assert_total x = x

In general, this function should be avoided, but it can be very useful when reasoning about primitives or externally defined functions (for example from a C library) where totality can be shown by an external argument.
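An alternative that avoids both assertions is to recurse on an explicit "fuel" argument. This is a different technique from the one described above, sketched here under the assumption that filter comes from Data.List; the names qsortFuel and qsortTotal are hypothetical. Because the fuel decreases structurally on every call, the checker can see that the definition terminates.

```idris
import Data.List

-- Recursion is on the fuel, which shrinks by one at every call.
total
qsortFuel : Ord a => (fuel : Nat) -> List a -> List a
qsortFuel Z xs = xs
qsortFuel (S k) [] = []
qsortFuel (S k) (x :: xs)
   = qsortFuel k (filter (< x) xs) ++
      (x :: qsortFuel k (filter (>= x) xs))

-- The length of the list is enough fuel, since every recursive call
-- works on a list that is at least one element shorter.
total
qsortTotal : Ord a => List a -> List a
qsortTotal xs = qsortFuel (length xs) xs
```

The price is an extra argument and an informal argument that the fuel suffices, but no claim is taken on trust by the totality checker.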

1

Timothy G. Griffin. 1989. A formulae-as-type notion of control. In Proceedings of the 17th ACM SIGPLAN-SIGACT symposium on Principles of programming languages (POPL ‘90). ACM, New York, NY, USA, 47-58. DOI=10.1145/96709.96714 https://doi.acm.org/10.1145/96709.96714

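Interactive Editing

By now, we have seen several examples of how Idris’ dependent type system can give extra confidence in a function’s correctness by giving a more precise description of its intended behaviour in its type. We have also seen an example of how the type system can help with embedded DSL development by allowing a programmer to describe the type system of an object language. However, precise types give us more than verification of programs: we can also use the type system to help write programs which are correct by construction, interactively.

The Idris REPL provides several commands for inspecting and modifying parts of programs, based on their types, such as case splitting on a pattern variable, inspecting the type of a hole, and even a basic proof search mechanism. In this section, we explain how these features can be exploited by a text editor, and specifically how to do so in Vim. An interactive mode for Emacs is also available, updated for Idris 2 compatibility as of 23 February 2021.

Editing at the REPL

Note

The Idris2 REPL does not support readline, in the interest of keeping dependencies minimal. Unfortunately, this precludes some niceties such as line editing, persistent history and completion. A useful workaround is to install rlwrap; this utility provides all the aforementioned features simply by invoking the Idris2 REPL as an argument to the utility: rlwrap idris2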

The REPL provides a number of commands, which we will describe shortly, which generate new program fragments based on the currently loaded module. These take the general form:

:command [line number] [name]

That is, each command acts on a specific source line, at a specific name, and outputs a new program fragment. Each command has an alternative form, which updates the source file in-place:

:command! [line number] [name]

It is also possible to invoke Idris in a mode which runs a REPL command, displays the result, then exits, using idris2 --client. For example:

$ idris2 --client ':t plus'
Prelude.plus : Nat -> Nat -> Nat
$ idris2 --client '2+2'
4

A text editor can take advantage of this, along with the editing commands, in order to provide interactive editing support.

Editing Commands

:addclause

The :addclause n f command, abbreviated :ac n f, creates a template definition for the function named f declared on line n. For example, if the code beginning on line 94 contains:

vzipWith : (a -> b -> c) ->
           Vect n a -> Vect n b -> Vect n c

then :ac 94 vzipWith will give:

vzipWith f xs ys = ?vzipWith_rhs

The names are chosen according to hints which may be given by a programmer, and then made unique by the machine by adding a digit if necessary. Hints can be given as follows:

%name Vect xs, ys, zs, ws

This declares that any names generated for types in the Vect family should be chosen in the order xs, ys, zs, ws.

:casesplit

The :casesplit n c x command, abbreviated :cs n c x, splits the pattern variable x on line n at column c into the various pattern forms it may take, removing any cases which are impossible due to unification errors. For example, if the code beginning on line 94 is:

vzipWith : (a -> b -> c) ->
           Vect n a -> Vect n b -> Vect n c
vzipWith f xs ys = ?vzipWith_rhs

then :cs 96 12 xs will give:

vzipWith f [] ys = ?vzipWith_rhs_1
vzipWith f (x :: xs) ys = ?vzipWith_rhs_2

That is, the pattern variable xs has been split into the two possible cases [] and x :: xs. Again, the names are chosen according to the same heuristic. If we update the file (using :cs!) then case split on ys on the same line, we get:

vzipWith f [] [] = ?vzipWith_rhs_3

That is, the pattern variable ys has been split into one case [], Idris having noticed that the other possible case y :: ys would lead to a unification error.

:addmissing

The :addmissing n f command, abbreviated :am n f, adds the clauses which are required to make the function f on line n cover all inputs. For example, if the code beginning on line 94 is:

vzipWith : (a -> b -> c) ->
           Vect n a -> Vect n b -> Vect n c
vzipWith f [] [] = ?vzipWith_rhs_1

then :am 96 vzipWith gives:

vzipWith f (x :: xs) (y :: ys) = ?vzipWith_rhs_2

That is, it notices that there are no cases for non-empty vectors, generates the required clauses, and eliminates the clauses which would lead to unification errors.

:proofsearch

The :proofsearch n f command, abbreviated :ps n f, attempts to find a value for the hole f on line n by proof search, trying values of local variables, recursive calls and constructors of the required family. Optionally, it can take a list of hints, which are functions it can try applying to solve the hole. For example, if the code beginning on line 94 is:

vzipWith : (a -> b -> c) ->
           Vect n a -> Vect n b -> Vect n c
vzipWith f [] [] = ?vzipWith_rhs_1
vzipWith f (x :: xs) (y :: ys) = ?vzipWith_rhs_2

then :ps 96 vzipWith_rhs_1 will give

[]

This works because it is searching for a Vect of length 0, of which the empty vector is the only possibility. Similarly, and perhaps surprisingly, there is only one possibility if we try to solve :ps 97 vzipWith_rhs_2:

f x y :: vzipWith f xs ys

This works because vzipWith has a precise enough type: The resulting vector has to be non-empty (a ::); the first element must have type c and the only way to get this is to apply f to x and y; finally, the tail of the vector can only be built recursively.
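Putting the results of the editing commands together gives the complete definition. The sketch below assumes Vect is imported from Data.Vect; the example value is added here for illustration.

```idris
import Data.Vect

vzipWith : (a -> b -> c) ->
           Vect n a -> Vect n b -> Vect n c
vzipWith f [] [] = []
vzipWith f (x :: xs) (y :: ys) = f x y :: vzipWith f xs ys

-- Pairwise addition of two vectors of length 3.
example : Vect 3 Nat
example = vzipWith (+) [1, 2, 3] [10, 20, 30]   -- [11, 22, 33]
```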

:makewith

The :makewith n f command, abbreviated :mw n f, adds a with to a pattern clause. For example, recall parity. If line 10 is:

parity (S k) = ?parity_rhs

then :mw 10 parity will give:

parity (S k) with (_)
  parity (S k) | with_pat = ?parity_rhs

If we then fill in the placeholder _ with parity k and case split on with_pat using :cs 11 with_pat we get the following patterns:

parity (S (plus n n)) | even = ?parity_rhs_1
parity (S (S (plus n n))) | odd = ?parity_rhs_2

Note that case splitting has normalised the patterns here (giving plus rather than +). In any case, we see that using interactive editing significantly simplifies the implementation of dependent pattern matching by showing a programmer exactly what the valid patterns are.

Interactive Editing in Vim

The editor mode for Vim provides syntax highlighting, indentation and interactive editing support using the commands described above. Interactive editing is achieved using the following editor commands, each of which updates the buffer directly:

  • \a adds a template definition for the name declared on the current line (using :addclause).

  • \c case splits the variable at the cursor (using :casesplit).

  • \m adds the missing cases for the name at the cursor (using :addmissing).

  • \w adds a with clause (using :makewith).

  • \s invokes a proof search to solve the hole under the cursor (using :proofsearch).

There are also commands to invoke the type checker and evaluator:

  • \t displays the type of the (globally visible) name under the cursor. In the case of a hole, this displays the context and the expected type.

  • \e prompts for an expression to evaluate.

  • \r reloads and type checks the buffer.

Corresponding commands are also available in the Emacs mode. Support for other editors can be added in a relatively straightforward manner by using idris2 --client. More sophisticated support can be added by using the IDE protocol (yet to be documented for Idris 2, but which mostly extends the protocol documented for Idris 1).

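Miscellany

In this section we discuss a variety of additional features:

  • auto, implicit, and default arguments;

  • literate programming; and

  • the universe hierarchy.

Implicit arguments

We have already seen implicit arguments, which allow arguments to be omitted when they can be inferred by the type checker 1, e.g.

index : forall a, n . Fin n -> Vect n a -> a

Auto implicit arguments

In other situations, it may be possible to infer arguments not by type checking but by searching the context for an appropriate value, or constructing a proof. For example, the following definition of head requires a proof that the list is non-empty: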

isCons : List a -> Bool
isCons [] = False
isCons (x :: xs) = True

head : (xs : List a) -> (isCons xs = True) -> a
head (x :: xs) _ = x

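If the list is statically known to be non-empty, either because its value is known or because a proof already exists in the context, then the proof can be constructed automatically. Auto implicit arguments allow this to happen silently. We define head as follows: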

head : (xs : List a) -> {auto p : isCons xs = True} -> a
head (x :: xs) = x

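The auto annotation on the implicit argument means that Idris will attempt to fill in the implicit argument by searching for a value of the appropriate type. In fact, internally, this is exactly how interface resolution works. It will try the following, in order:

  • Local variables, i.e. names bound in pattern matches or let bindings, with exactly the right type.

  • The constructors of the required type. If they have arguments, it will search recursively up to a maximum depth of 100.

  • Local variables with function types, searching recursively for the arguments.

  • Any function with the appropriate return type which is marked with the %hint annotation.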
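The second step of this search, trying constructors of the required type, can be seen in a variation that uses a dedicated predicate instead of an equation on Bool. This is a sketch with hypothetical names (IsNonEmptyList, ConsIsNonEmpty, safeHead), written in the style commonly used for this kind of precondition.

```idris
-- A list is non-empty exactly when it is built with (::).
data IsNonEmptyList : List a -> Type where
  ConsIsNonEmpty : IsNonEmptyList (x :: xs)

safeHead : (xs : List a) -> {auto prf : IsNonEmptyList xs} -> a
safeHead [] impossible
safeHead (x :: xs) = x

-- The proof argument is found by trying the constructor ConsIsNonEmpty.
firstOfThree : Nat
firstOfThree = safeHead [3, 1, 2]   -- 3
```

A call such as safeHead [] is rejected at compile time, because no constructor of IsNonEmptyList can produce a proof for the empty list.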

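In the case that a proof is not found, it can be provided explicitly as normal:

head xs {p = ?headProof}

Default implicit arguments

Besides having Idris automatically find a value of a given type, sometimes we want to have an implicit argument with a specific default value. In Idris, we can do this using the default annotation. While this is primarily intended to assist in automatically constructing a proof where auto fails, or finds an unhelpful value, it might be easier to first consider a simpler case, not involving proofs.

If we want to compute the n’th fibonacci number (and defining the 0th fibonacci number as 0), we could write: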

fibonacci : {default 0 lag : Nat} -> {default 1 lead : Nat} -> (n : Nat) -> Nat
fibonacci {lag} Z = lag
fibonacci {lag} {lead} (S n) = fibonacci {lag=lead} {lead=lag+lead} n

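After this definition, fibonacci 5 is equivalent to fibonacci {lag=0} {lead=1} 5, and will return the 5th fibonacci number. Note that while this works, this is not the intended use of the default annotation. It is included here for illustrative purposes only. Usually, default is used to provide things like a custom proof search script.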
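To see the defaults being used and overridden, consider the following sketch. The definition is repeated so that the block is self-contained; the names fifth and shifted are made up for the example.

```idris
fibonacci : {default 0 lag : Nat} -> {default 1 lead : Nat} -> (n : Nat) -> Nat
fibonacci {lag} Z = lag
fibonacci {lag} {lead} (S n) = fibonacci {lag=lead} {lead=lag+lead} n

-- Both implicit arguments take their default values (lag = 0, lead = 1).
fifth : Nat
fifth = fibonacci 5                          -- 5

-- Overriding the defaults starts the sequence one step later.
shifted : Nat
shifted = fibonacci {lag = 1} {lead = 1} 5   -- 8
```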

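Literate programming

Like Haskell, Idris supports literate programming. If a file has an extension of .lidr then it is assumed to be a literate file. In literate programs, everything is assumed to be a comment unless the line begins with a greater than sign >, for example: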

> module literate

This is a comment. The main program is below

> main : IO ()
> main = putStrLn "Hello literate world!\n"

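An additional restriction is that there must be a blank line between a program line (beginning with >) and a comment line (beginning with any other character).

Cumulativity

Warning

NOT YET IN IDRIS 2

Since values can appear in types and vice versa, it is natural that types themselves have types. For example: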

*universe> :t Nat
Nat : Type
*universe> :t Vect
Vect : Nat -> Type -> Type

But what about the type of Type? If we ask Idris it reports:

*universe> :t Type
Type : Type 1

If Type were its own type, it would lead to an inconsistency due to Girard’s paradox, so internally there is a hierarchy of types (or universes):

Type : Type 1 : Type 2 : Type 3 : ...

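Universes are cumulative, that is, if x : Type n we can also have that x : Type m, as long as n < m. The type checker generates such universe constraints and reports an error if any inconsistencies are found. Ordinarily, a programmer does not need to worry about this, but it does prevent (contrived) programs such as the following: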

myid : (a : Type) -> a -> a
myid _ x = x

idid :  (a : Type) -> a -> a
idid = myid _ myid

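The application of myid to itself leads to a cycle in the universe hierarchy: myid’s first argument is a Type, which cannot be at a lower level than required if it is applied to itself.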

1

https://github.com/david-christiansen/idris-type-providers

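Further Reading

Further information about Idris programming, and programming with dependent types in general, can be obtained from various sources: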

1

Edwin Brady and Kevin Hammond. 2012. Resource-Safe systems programming with embedded domain specific languages. In Proceedings of the 14th international conference on Practical Aspects of Declarative Languages (PADL’12), Claudio Russo and Neng-Fa Zhou (Eds.). Springer-Verlag, Berlin, Heidelberg, 242-257. DOI=10.1007/978-3-642-27694-1_18 https://dx.doi.org/10.1007/978-3-642-27694-1_18

2

Edwin C. Brady. 2011. IDRIS —: systems programming meets full dependent types. In Proceedings of the 5th ACM workshop on Programming languages meets program verification (PLPV ‘11). ACM, New York, NY, USA, 43-54. DOI=10.1145/1929529.1929536 https://doi.acm.org/10.1145/1929529.1929536

3

Edwin C. Brady and Kevin Hammond. 2010. Scrapping your inefficient engine: using partial evaluation to improve domain-specific language implementation. In Proceedings of the 15th ACM SIGPLAN international conference on Functional programming (ICFP ‘10). ACM, New York, NY, USA, 297-308. DOI=10.1145/1863543.1863587 https://doi.acm.org/10.1145/1863543.1863587

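Frequently Asked Questions

What are the aims of the Idris project?

Idris aims to make advanced type-related programming techniques accessible to software practitioners. An important philosophy that we follow is that Idris allows software developers to express invariants of their data and prove properties of programs, but will not require them to do so.

Many of the answers in this FAQ demonstrate this philosophy, and we always bear this in mind when making language and library design decisions.

Idris is primarily a research project, led by Edwin Brady at the University of St Andrews, and has benefited from SICSA (https://www.sicsa.ac.uk) and EPSRC (https://www.epsrc.ac.uk/) funding. This does influence some design choices and implementation priorities, and means that some things are not as polished as we’d like. Nevertheless, we are still trying to make it as widely usable as we can!

Where can I find libraries? Is there a package manager?

We don’t yet have a package manager, but you can still find sources of libraries on the wiki: https://github.com/idris-lang/Idris2/wiki/1-%5BLanguage%5D-Libraries

Fortunately, the dependencies are currently not that complicated, but we’d still like a package manager to help! There isn’t an official one yet, but two are in development.

Can Idris 2 compile itself?

Yes, Idris 2 is implemented in Idris 2. By default, it targets Chez Scheme, so you can bootstrap from the generated Scheme code, as described in the Getting Started section.

Why does Idris 2 target Scheme? Surely a dynamically typed target language is going to be slow?

You may be surprised at how fast Chez Scheme is! Racket, as an alternative target, also performs well. Both perform better than the Idris 1 back end, which is written in C but has not had the decades of engineering effort by run time system specialists that Chez and Racket have. Chez Scheme also allows us to turn off run time checks, which we do.

As anecdotal evidence of the performance improvement, we compared the performance of the Idris 2 runtime with the Idris 1 runtime, using a version of the compiler built with the Chez runtime and the same version built with the bootstrapping Idris 2. On a Dell XPS 13 running Ubuntu, using the versions as of 23 May 2020, the performance was:

  • Idris 2 (with the Chez Scheme runtime) checked its own source in 93 seconds.

  • The bootstrapping Idris 2 (compiled with Idris 1) checked the same source in 125 seconds.

  • Idris 1 checked the bootstrapping Idris 2’s source (the same as the above, but with minor variations due to the syntax changes) in 768 seconds.

Unfortunately we can’t repeat this experiment with the latest version, since the bootstrapping Idris 2 is no longer able to build the current version.

This is, however, not a long-term solution, even if it is a very convenient way to bootstrap.

Can Idris 2 generate Javascript? What about plug-in code generators?

Yes! A JavaScript code generator is built in, and can target either the browser or NodeJS.

Like Idris 1, Idris 2 supports plug-in code generation, which allows you to write a back end for the platform of your choice.

What are the main differences between Idris 1 and Idris 2?

The most important difference is that Idris 2 explicitly represents erasure in types, so that you can see at compile time which function and data type arguments are erased, and which will be present at run time. You can see more details in Multiplicities.

Idris 2 has significantly better type checking performance (perhaps even an order of magnitude!) and generates significantly better code.

Also, being implemented in Idris, we’ve been able to take advantage of the type system to remove some significant sources of bugs!

You can find more details in the section Changes since Idris 1.

Why aren’t there more linearity annotations in the library?

In theory, now that Idris 2 is based on Quantitative Type Theory (see Section Multiplicities), we can write more precise types in the Prelude and Base libraries which give more precise usage information. We have chosen not to do that (yet), however. Consider, for example, what would happen if we did: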

id : (1 _ : a) -> a
id x = x

This is definitely correct, because x is used exactly once. However, we also have:

map : (a -> b) -> List a -> List b

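We can’t guarantee, in general, that the function passed to map is linear in its argument, and so we can no longer say map id xs, since the multiplicity of id doesn’t match the multiplicity of the function passed to map.

Eventually, we hope to extend the core language with multiplicity polymorphism, which will help resolve these problems. Until then, we consider linearity an experimental new feature in the type system, and therefore we follow the general philosophy that if you don’t want to use linearity, its presence must not affect the way you write programs.

How do I get command history in the Idris2 REPL?

The Idris2 REPL does not support readline, in the interest of keeping dependencies minimal. A useful workaround is to install rlwrap; this utility provides command history simply by invoking the Idris2 REPL as an argument to the utility: rlwrap idris2.

The goal, eventually, is to use the IDE mode or the Idris API as the basis of an implementation of a sophisticated REPL, developed independently from the Idris 2 core. As far as we know, nobody is yet working on this: if you’re interested, please get in touch and we can help you get started!

Why does Idris use eager evaluation rather than lazy?

Idris uses eager evaluation for more predictable performance, in particular because one of the longer term goals is to be able to write efficient and verified low level code such as device drivers and network infrastructure. Furthermore, the Idris type system allows us to state precisely the type of each value, and therefore the run-time form of each value. In a lazy language, consider a value of type Int: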

thing : Int

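What is the representation of thing at run-time? Is it a bit pattern representing an integer, or is it a pointer to some code which will compute an integer? In Idris, we have decided that we would like to make this distinction precise, in the type: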

thing_val : Int
thing_comp : Lazy Int

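Here, it is clear from the type that thing_val is guaranteed to be a concrete Int, whereas thing_comp is a computation which will produce an Int.

How can I make lazy control structures?

You can make control structures using the special Lazy type. For example, one way to implement a non-dependent if...then...else... would be via a function named ifThenElse: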

ifThenElse : Bool -> (t : Lazy a) -> (e : Lazy a) -> a
ifThenElse True  t e = t
ifThenElse False t e = e

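The type Lazy a for t and e indicates that those arguments will only be evaluated if they are used, that is, they are evaluated lazily.

By the way: we don’t actually implement if...then...else... this way in Idris 2! Rather, it is transformed to a case expression, which allows dependent if.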
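As a usage sketch, the same pattern can guard a computation that should only run in one branch. The function is named lazyIf here (a hypothetical name, chosen to avoid clashing with any library definition), and safeDiv is likewise invented for the example. Idris inserts the delay and force operations for Lazy arguments automatically, so the call site looks like an ordinary function application.

```idris
lazyIf : Bool -> (t : Lazy a) -> (e : Lazy a) -> a
lazyIf True  t e = t
lazyIf False t e = e

-- The division is only evaluated when the divisor is non-zero,
-- because the third argument is passed lazily.
safeDiv : Integer -> Integer -> Integer
safeDiv n d = lazyIf (d == 0) 0 (n `div` d)
```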

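Evaluation at the REPL doesn’t behave as I expect. What’s going on?

Being a fully dependently typed language, Idris has two phases where it evaluates things, compile-time and run-time. At compile-time it will only evaluate things which it knows to be total (i.e. terminating and covering all possible inputs) in order to keep type checking decidable. The compile-time evaluator is part of the Idris kernel, and is implemented as an interpreter in Idris. Since everything is known to have a normal form here, the evaluation strategy doesn’t actually matter because either way it will get the same answer! In practice, it uses call by name, since this avoids evaluating sub-expressions which are not needed for type checking.

The REPL, for convenience, uses the compile-time notion of evaluation. As well as being easier to implement (because we have the evaluator available), this can be very useful to show how terms evaluate in the type checker. So you can see the difference between: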

Main> \n, m => S n + m
\n, m => S (plus n m)

Main> \n, m => n + S m
\n, m => plus n (S m)

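If you want to compile and execute an expression at the REPL, you can use the :exec command. In this case, the expression must have type IO a (for any a, although it won’t print the result).

Why can’t I use a function with no arguments in a type?

If you use a name in a type which begins with a lower case letter, and which is not applied to any arguments, then Idris will treat it as an implicitly bound argument. For example: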

append : Vect n ty -> Vect m ty -> Vect (n + m) ty

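Here, n, m, and ty are implicitly bound. This rule applies even if there are functions defined elsewhere with any of these names. For example, you may also have: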

ty : Type
ty = String

Even in this case, ty is still considered implicitly bound in the definition of append, rather than making the type of append equivalent to…

append : Vect n String -> Vect m String -> Vect (n + m) String

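…which is probably not what was intended! The reason for this rule is so that it is clear just from looking at the type of append, and no other context, what the implicitly bound names are.

If you want to use an unapplied name in a type, you have three options. You can explicitly qualify it, for example, if ty is defined in the namespace Main you can do the following: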

append : Vect n Main.ty -> Vect m Main.ty -> Vect (n + m) Main.ty

Alternatively, you can use a name which does not begin with a lower case letter, which will never be implicitly bound:

Ty : Type
Ty = String

append : Vect n Ty -> Vect m Ty -> Vect (n + m) Ty

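As a convention, if a name is intended to be used as a type synonym, it is best for it to begin with a capital letter to avoid this restriction.

Finally, you can turn off the automatic binding of implicits with the directive: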

%auto_implicits off

In this case, you can bind n and m as implicits, but not ty, as follows:

append : forall n, m . Vect n ty -> Vect m ty -> Vect (n + m) ty

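Why don’t the Functor, Applicative, Monad and other interfaces include the laws?

On the face of it, this sounds like a good idea, because the type system allows us to specify the laws. We don’t do this in the prelude, though, for two main reasons:

  • It goes against the philosophy (above) that Idris allows programmers to prove properties of their programs, but does not require it.

  • A valid, lawful implementation may not necessarily be provably lawful within the Idris system, especially if it involves higher order functions.

There are verified versions of the interfaces in Control.Algebra, which extend interfaces with laws.

I have an obviously terminating program, but Idris says it possibly isn’t total. Why is that?

Idris can’t decide in general whether a program is terminating, due to the undecidability of the Halting Problem. It is possible, however, to identify some programs which are definitely terminating. Idris does this using “size change termination”, which looks for recursive paths from a function back to itself. On such a path, there must be at least one argument which converges to a base case.

  • Mutually recursive functions are supported.

  • However, all functions on the path must be fully applied. In addition, Idris doesn’t support higher order applications.

  • Idris identifies arguments which converge to a base case by looking for recursive calls to syntactically smaller arguments of inputs. For example, k is syntactically smaller than S (S k) because k is a subterm of S (S k), but (k, k) is not syntactically smaller than (S k, S k).

If you have a function which you believe to be terminating, but Idris does not, you can either restructure the program, or use the assert_total function.

Does Idris have universe polymorphism? What is the type of Type?

Idris 2 currently implements Type : Type. Don’t worry, this will not be the case forever! For Idris 1, the FAQ answered this question as follows:

Rather than universe polymorphism, Idris has a cumulative hierarchy of universes; Type : Type 1, Type 1 : Type 2, etc. Cumulativity means that if x : Type n and n <= m, then x : Type m. Universe levels are always inferred by Idris, and cannot be specified explicitly. The REPL command :type Type 1 will result in an error, as will attempting to specify the universe level of any type.

What does the name “Idris” mean?

British people of a certain age may be familiar with this singing dragon. If that doesn’t help, maybe you can invent a suitable acronym :-).

Where can I find the community standards for the Idris community?

The Idris Community Standards are stated here.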

Compiling to Executables

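Idris 2 (the language) is designed to be independent of any specific code generator. Still, since the point of writing a program is to be able to run it, it is important to know how to do so. By default, Idris compiles to executables via Chez Scheme.

You can compile to an executable as follows, at the REPL: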

Main> :c execname expr

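…where execname is the name of the executable file to generate, and expr is the Idris expression which will be executed. expr must have type IO (). This will result in an executable file execname, in a directory build/exec, relative to the current working directory.

You can also execute expressions directly: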

Main> :exec expr

Again, expr must have type IO ().

Finally, you can compile to an executable from the command line by adding the -o <output file> option:

$ idris2 hello.idr -o hello

This will compile the expression Main.main, generating an executable hello (with a file extension depending on the code generator) in the build/exec directory.
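For reference, a minimal hello.idr that the command above could compile might look like the following sketch. The message text is arbitrary; the only requirement taken from the text above is that Main.main has type IO ().

```idris
module Main

-- Entry point compiled by: idris2 hello.idr -o hello
main : IO ()
main = putStrLn "Hello world"
```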

Incremental Code Generation

By default, Idris 2 is a whole program compiler - that is, it finds all the necessary function definitions and compiles them only when you build an executable. This gives plenty of optimisation opportunities, but can also be slow for rebuilding. However, if the backend supports it, you can build modules and executables incrementally. To do so, you can either:

  1. Set the --inc <backend> flag at the command line, for each backend you want to use incrementally.

  2. Set the IDRIS2_INC_CGS environment variable with a comma separated list of backends to use incrementally.

At the moment, only the Chez backend supports incremental builds.

Building modules incrementally

If either of the above are set, building a module will produce compiled binary code for all of the definitions in the module, as well as the usual checked TTC file. e.g.:

$ idris2 --inc chez Foo.idr
$ IDRIS2_INC_CGS=chez idris2 Foo.idr

On successful type checking, each of these will produce a Chez Scheme file (Foo.ss) and compiled code for it (Foo.so) as well as the usual Foo.ttc, in the same build directory as Foo.ttc.

In incremental mode, you will see a warning for any holes in the module, even if those holes will be defined in a different module.

Building executables incrementally

If either --inc is used or IDRIS2_INC_CGS is set, compiling to an executable will attempt to link all of the compiled modules together, rather than generating code for all of the functions at once. For this to work, all the imported modules must have been built with incremental compilation for the current back end (Idris will revert to whole program compilation if any are missing, and you will see a warning.)

Therefore, all packages used by the executable must also have been built incrementally for the current back end. The prelude, base, contrib, network and test packages are all built with incremental compilation support for Chez by default.

When switching between incremental and whole program compilation, it is recommended that you remove the build directory first. This is particularly important when switching to incremental compilation, since there may be stale object files that Idris does not currently detect!

Overriding incremental compilation

The --whole-program flag overrides any incremental compilation settings when building an executable.

Performance note

Incremental compilation means that executables are generated much quicker, especially when only a small proportion of modules have changed. However, it means that there are fewer optimisation opportunities, so the resulting executable will not perform as well. For deployment, --whole-program compilation is recommended.

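If the backend supports it, you can generate profiling data by setting the profile flag, either by starting Idris with --profile, or by running :set profile at the REPL. The profile data generated will depend on the back end you are using. Currently, the Chez and Racket back ends support generating profile data.

There are five code generators provided in Idris 2, and there is a system for plugging in new code generators for various targets. The default is to compile via Chez Scheme, with an alternative via Racket or Gambit. You can set the code generator at the REPL with the :set codegen command, or via the IDRIS2_CG environment variable.

Chez Scheme Code Generator

The Chez Scheme code generator is the default, or it can be accessed via a REPL command: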

Main> :set cg chez

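By default, therefore, to run Idris programs you will need to install Chez Scheme. Chez Scheme is open source, and available via most OS package managers.

You can compile an expression expr of type IO () to an executable as follows, at the REPL: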

Main> :c execname expr

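...where execname is the name of the executable file. This will generate the following:

  • A shell script build/exec/execname which invokes the program

  • A subdirectory build/exec/execname_app which contains all the data necessary to run the program. This includes the Chez Scheme source (execname.ss), the compiled Chez Scheme code (execname.so) and any shared libraries needed for foreign function definitions.

The executable execname is relocatable to any subdirectory, provided that execname_app is also in the same subdirectory.

You can also execute an expression directly: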

Main> :exec expr

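Again, expr must have type IO (). This will generate a temporary executable script _tmpchez in the build/exec directory, and execute that.

Chez Scheme is the default code generator, so if you invoke idris2 with the -o execname flag, it will generate an executable script build/exec/execname, again with support files in build/exec/execname_app.

Chez Directives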

  • --directive extraRuntime=<path>

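    Embed Scheme source from <path> directly into the generated output. Can be specified more than once, in which case all given files will be included in the order specified.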

    ; extensions.scm
    (define (my-mul a b)
      (* a b))
    
    -- Main.idr
    %foreign "scheme:my-mul"
    myMul : Int -> Int -> Int
    
    $ idris2 --codegen chez --directive extraRuntime=/path/to/extensions.scm -o main Main.idr
    

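Making a freestanding executable

It is possible to embed the Chez Scheme system and the built Idris2 program into a freestanding executable with chez-exe.

  • Build and install compile-chez-program-tool by running the configuration script: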

    $ scheme --script gen-config.ss --bootpath <bootpath>
    

    where <bootpath> is the path to where the Chez Scheme boot files (petite.boot and scheme.boot) and scheme.h are. More configuration is described in the chez-exe installation instructions.

  • Invoke compile-chez-program:

    $ compile-chez-program --optimize-level 3 build/exec/my_idris_prog_app/my_idris_prog.ss
    

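    Note that it can only use the .ss file and not the .so file. To embed the full Chez Scheme system including the compiler, add the --full-chez option.

  • The finished executable still requires the libidris_support shared library. It is possible to also eliminate that dependency by linking it statically.

Racket Code Generator

The Racket code generator is accessed via a REPL command: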

Main> :set cg racket

Alternatively, you can set it via the IDRIS2_CG environment variable:

$ export IDRIS2_CG=racket

You can compile an expression expr of type IO () to an executable as follows, at the REPL:

Main> :c execname expr

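...where execname is the name of the executable file. This will generate the following:

  • A shell script build/exec/execname which invokes the program

  • A subdirectory build/exec/execname_app which contains all the data necessary to run the program. This includes the Racket source (execname.rkt), the compiled Racket code (execname, or execname.exe on Windows) and any shared libraries needed for foreign function definitions.

The executable execname is relocatable to any subdirectory, provided that execname_app is also in the same subdirectory.

You can also execute an expression directly: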

Main> :exec expr

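Again, expr must have type IO (). This will generate a temporary executable script _tmpracket in the build/exec directory, and execute that, without compiling to a binary first (so the resulting Racket code is interpreted).

Racket Directives

  • --directive extraRuntime=<path>

    Embed Scheme source from <path> directly into the generated output. Can be specified more than once, in which case all given files will be included in the order specified.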

    ; extensions.scm
    (define (my-mul a b)
      (* a b))
    
    -- Main.idr
    %foreign "scheme:my-mul"
    myMul : Int -> Int -> Int
    
    $ idris2 --codegen racket --directive extraRuntime=/path/to/extensions.scm -o main Main.idr
    

Gambit Scheme Code Generator

The Gambit Scheme code generator can be accessed via a REPL command:

Main> :set cg gambit

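Alternatively, you can set it via the IDRIS2_CG environment variable:

$ export IDRIS2_CG=gambit

To run Idris programs with this generator, you will need to install Gambit Scheme. Gambit Scheme is free software, and available via most package managers.

You can compile an expression expr of type IO () to an executable as follows, at the REPL: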

Main> :c execname expr

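...where execname is the name of the executable file. This will generate the following:

  • An executable binary build/exec/execname of the program.

  • A Gambit Scheme source file build/exec/execname.scm, from which the binary is generated.

You can also execute an expression directly: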

Main> :exec expr

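Again, expr must have type IO (). This will generate a temporary Scheme file, and execute the Gambit interpreter on it.

Gambit Directives

  • --directive extraRuntime=<path>

    Embed Scheme source from <path> directly into the generated output. Can be specified more than once, in which case all given files will be included in the order specified.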

    ; extensions.scm
    (define (my-mul a b)
      (* a b))
    
    -- Main.idr
    %foreign "scheme:my-mul"
    myMul : Int -> Int -> Int
    
    $ idris2 --codegen gambit --directive extraRuntime=/path/to/extensions.scm -o main Main.idr
    
  • --directive C

    Compile to C.

Gambit Environment Configurations

  • GAMBIT_GSC_BACKEND

    The GAMBIT_GSC_BACKEND variable controls which C compiler Gambit will use during compilation. For example, to use clang:

    $ export GAMBIT_GSC_BACKEND=clang
    

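    Gambit after version v4.9.3 supports the -cc option, which configures the compiler backend Gambit will use to build the binary. Currently, to get this functionality Gambit needs to be built from source, since it is not yet available in a released version.

Javascript and Node Code Generators

There are two javascript code generators, node and javascript. There are two differences between them: the javascript code generator, when asked to output an HTML file, will also generate a basic HTML document with the generated code inside a <script> tag; the other difference is in the FFI, explained below.

Javascript FFI Specifiers

There are three main kinds of javascript FFI specifiers: javascript, node and browser. javascript is for foreign functions that are available in both node and the browser, node is for functions that are only available in node, and browser is for functions that only exist in the browser.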

For node there are two ways of defining a foreign function:

%foreign "node:lambda: n => process.env[n]"
prim_getEnv : String -> PrimIO (Ptr String)

Here lambda means that we are providing the definition as a lambda expression.

%foreign "node:lambda:fp=>require('fs').fstatSync(fp.fd, {bigint: false}).size"
prim__fileSize : FilePtr -> PrimIO Int

require can be used to import javascript modules.

For completeness, here is an example of a foreign function that is only available with the browser codegen:

%foreign "browser:lambda:x=>{document.body.innerHTML = x}"
prim__setBodyInnerHTML : String -> PrimIO ()

Short Example

An interesting example is creating a foreign function for the setTimeout function:

%foreign "javascript:lambda:(callback, delay)=>setTimeout(callback, delay)"
prim__setTimeout : (PrimIO ()) -> Int -> PrimIO ()

setTimeout : HasIO io => IO () -> Int -> io ()
setTimeout callback delay = primIO $ prim__setTimeout (toPrim callback) delay

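Note: previous versions of the javascript backend treated Int as a 64-bit signed integer, represented by BigInt in javascript land. This is no longer the case: Int is now treated as a 32-bit signed integer represented by Number. This should facilitate interop between Idris2 and the backend.

However, unless you have good reasons to do otherwise, consider using one of the other fixed-precision integer types instead. They are supposed to behave the same across all backends. All signed and unsigned integer types of up to 32 bits of precision (Int8, Int16, Int32, Bits8, Bits16, and Bits32) are represented by Number, while Int64, Bits64 and Integer are represented by BigInt. The example above can therefore be improved by using Int32 instead of Int: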

%foreign "javascript:lambda:(callback, delay)=>setTimeout(callback, delay)"
prim__setTimeout : (PrimIO ()) -> Int32 -> PrimIO ()

setTimeout : HasIO io => IO () -> Int32 -> io ()
setTimeout callback delay = primIO $ prim__setTimeout (toPrim callback) delay

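Browser Example

To build JavaScript aimed at the browser, the code must be compiled with the javascript codegen option. The compiler produces a JavaScript or an HTML file. The browser needs an HTML file to load. This HTML file can be created in two ways:

  • If the output file has the .html suffix, the compiler generates an HTML file which includes a wrapper around the generated JavaScript.

  • If no .html suffix is given, the generated file contains only the JavaScript code. In this case, manual wrapping is needed.

Example of wrapping in HTML: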

<html>
 <head><meta charset='utf-8'></head>
 <body>
  <script type='text/javascript'>
  JS code goes here
  </script>
 </body>
</html>

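As our intention is to develop something that runs in the browser, some questions naturally arise:

  • How to interact with HTML elements?

  • More importantly, when does the generated Idris code start?

Starting point of the Idris generated code

The generated JavaScript for your program contains an entry point. The main function is compiled to a JavaScript top-level expression, which is evaluated when the script tag is loaded; that is the entry point of the Idris generated program in the browser.

Interaction with HTML elements

As described in the Short Example section, the FFI must be used whenever Idris generated code interacts with the rest of the browser/JS ecosystem. Information handled by the FFI falls into two categories: the primitive types of the Idris FFI, such as Int, and everything else, which is accessed via AnyPtr. The %foreign construction gives the implementation on the JavaScript side, and an Idris function declaration gives the type on the Idris side. The syntax is %foreign "browser:lambda:js-lambda-expression". On the Idris side, primitive types and PrimIO t types should be used as parameters when defining a %foreign declaration. This declaration is a helper function that needs to be called through the primIO function. More on this can be found in the FFI chapter.

JavaScript FFI examples

console.log

%foreign "browser:lambda: x => console.log(x)"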
prim__consoleLog : String -> PrimIO ()

consoleLog : HasIO io => String -> io ()
consoleLog x = primIO $ prim__consoleLog x

String is a primitive type in Idris and it is represented as a JavaScript String. There is no need for any conversion between the Idris and the JavaScript side.

setInterval

%foreign "browser:lambda: (a,i)=>setInterval(a,i)"
prim__setInterval : PrimIO () -> Int32 -> PrimIO ()

setInterval : (HasIO io) => IO () -> Int32 -> io ()
setInterval a i = primIO $ prim__setInterval (toPrim a) i

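The setInterval JavaScript function executes the given function every x milliseconds. We can use Idris generated functions in the callback as long as they have the type IO ().

HTML DOM elements

Let's turn our attention to DOM elements and events. As mentioned above, anything that is not a primitive type should be handled via the AnyPtr type in the FFI. Anything complex returned by a JavaScript function should be captured in an AnyPtr value. It is advisable to separate the AnyPtr values into categories: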

data DomNode = MkNode AnyPtr

%foreign "browser:lambda: () => document.body"
prim__body : () -> PrimIO AnyPtr

body : HasIO io => io DomNode
body = map MkNode $ primIO $ prim__body ()

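We create a DomNode type which holds an AnyPtr. The prim__body function wraps a lambda function with no parameters. In the Idris function body we pass an extra () parameter, and we wrap the result in the DomNode type using the MkNode data constructor.

Primitive values originating in JavaScript

As a continuation of the previous example, the width attribute of a DOM element can be retrieved via the FFI.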

%foreign "browser:lambda: n=>(n.width)"
prim__width : AnyPtr -> PrimIO Bits32

width : HasIO io => DomNode -> io Bits32
width (MkNode p) = primIO $ prim__width p

Handling JavaScript events

data DomEvent = MkEvent AnyPtr

%foreign "browser:lambda: (event, callback, node) => node.addEventListener(event, x=>callback(x)())"
prim__addEventListener : String -> (AnyPtr -> PrimIO ()) -> AnyPtr -> PrimIO ()

addEventListener : HasIO io => String -> DomNode -> (DomEvent -> IO ()) -> io ()
addEventListener event (MkNode n) callback =
  primIO $ prim__addEventListener event (\ptr => toPrim $ callback $ MkEvent ptr) n

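This example shows how to attach an event handler to a particular DOM element. Values of events are also of type AnyPtr on the Idris side. To keep DomNode and DomEvent apart, we create two different types. It also demonstrates how a simple callback function defined in Idris can be used on the JavaScript side.

Directives

The javascript code generators accept three different directives that control how compact and obfuscated the generated code should be. The following examples show the code generated for the putStr function from the prelude for each of the three directives. (--cg node is used in the examples below, but the behaviour is the same when generating code to be run in browsers with --cg javascript.)

With idris2 --cg node --directive pretty (the default, if no directive is given), a basic pretty printer is used to generate properly indented javascript code.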

function Prelude_IO_putStr($0, $1) {
 return $0.a2(undefined)($7 => Prelude_IO_prim__putStr($1, $7));
}

With idris2 --cg node --directive compact, every top-level function is declared on a single line, and unneeded spaces are removed:

function Prelude_IO_putStr($0,$1){return $0.a2(undefined)($7=>Prelude_IO_prim__putStr($1,$7));}

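Finally, with idris2 --cg node --directive minimal, top-level function names (with a few exceptions, such as the functions in the static preamble) are obfuscated to reduce the size of the generated javascript file: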

function $R3a($0,$1){return $0.a2(undefined)($7=>$R3b($1,$7));}

C with Reference Counting

There is an experimental code generator which compiles to an executable via C, using a reference counting garbage collector. This is intended as a lightweight (i.e. minimal dependencies) code generator that can be ported to multiple platforms, especially those with memory constraints.

Performance is not as good as the Scheme based code generators, partly because the reference counting has not yet had any optimisation, and partly because of the limitations of C. However, the main goal is portability: the generated code should run on any platform that supports a C compiler.

This code generator can be accessed via the REPL command:

Main> :set cg refc

Alternatively, you can set it via the IDRIS2_CG environment variable:

$ export IDRIS2_CG=refc

The C compiler it invokes is determined by either the IDRIS2_CC or CC environment variables. If neither is set, it uses cc.
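The lookup order described above can be sketched in a few lines of Python. This is only an illustration, not the compiler's actual code: the helper name pick_c_compiler is hypothetical, and it assumes that IDRIS2_CC is consulted before CC and that an empty value counts as unset.

```python
import os


def pick_c_compiler(env=None):
    """Return the C compiler name, following the documented fallback order."""
    env = os.environ if env is None else env
    # Assumption: IDRIS2_CC takes precedence over CC; empty values are ignored.
    return env.get("IDRIS2_CC") or env.get("CC") or "cc"
```

Passing a plain dictionary instead of the real environment, for example pick_c_compiler({"CC": "gcc"}), makes it easy to check which compiler a given configuration would select.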

This code generator does not yet support :exec, just :c.

Also note that, if you link with any dynamic libraries for interfacing with C, you will need to arrange for them to be accessible via LD_LIBRARY_PATH when running the executable. The default Idris 2 support libraries are statically linked.

Extending RefC

RefC can be extended to produce a new backend for languages that support C foreign functions. For example, a Python backend for Idris.

In your backend, use the Compiler.RefC functions generateCSourceFile, compileCObjectFile {asLibrary = True}, and compileCFile {asShared = True} to generate a .so shared object file.

_ <- generateCSourceFile defs cSourceFile
_ <- compileCObjectFile {asLibrary = True} cSourceFile cObjectFile
_ <- compileCFile {asShared = True} cObjectFile cSharedObjectFile

To run a compiled Idris program, call the int main(int argc, char *argv[]) function in the compiled .so file, with the arguments you wish to pass to the running program.

For example, in Python:

import ctypes
import sys

argc = len(sys.argv)
argv = (ctypes.c_char_p * argc)(*map(str.encode, sys.argv))

cdll = ctypes.CDLL("main.so")
cdll.main(argc, argv)

Extending RefC FFIs

To make the generated C code recognize additional FFI languages beyond the standard RefC FFIs, pass the additionalFFILangs option to generateCSourceFile, with a list of the language identifiers your backend recognizes.

_ <- generateCSourceFile {additionalFFILangs = ["python"]} defs cSourceFile

This will generate stub FFI function pointers in the generated C file, which your backend should set to the appropriate C functions before main is called.

Each %foreign "lang: foreignFuncName, opts" definition for a function will produce a stub, of the appropriate function pointer type. This stub will be called cName $ NS (mkNamespace lang) funcName, where funcName is the fully qualified Idris name of that function.

So the %foreign function

%foreign "python: abs"
abs : Int -> Int

produces a stub python_Main_abs, which can be backpatched in Python by:

abs_ptr = ctypes.CFUNCTYPE(ctypes.c_int64, ctypes.c_int64)(abs)
ctypes.c_void_p.in_dll(cdll, "python_Main_abs").value = ctypes.cast(abs_ptr, ctypes.c_void_p).value

Building Idris 2 with new backends

The way to extend Idris 2 with new backends is to use it as a library. The module Idris.Driver exports the function mainWithCodegens, that takes a list of (String, Codegen), starting idris with these codegens in addition to the built-in ones. The first codegen in the list will be set as the default codegen.

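Getting started

To use Idris 2 as a library you need a self-hosting installation, and then you need to install the idris2api library (at the top level of the Idris2 repo):

make install-api

Now create a file containing: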

module Main

import Core.Context
import Compiler.Common
import Idris.Driver
import Idris.Syntax

compile :
  Ref Ctxt Defs -> Ref Syn SyntaxInfo ->
  (tmpDir : String) -> (execDir : String) ->
  ClosedTerm -> (outfile : String) -> Core (Maybe String)
compile defs syn tmp dir term file
  = do coreLift $ putStrLn "I'd rather not."
       pure Nothing

execute :
  Ref Ctxt Defs -> Ref Syn SyntaxInfo ->
  (execDir : String) -> ClosedTerm -> Core ()
execute defs syn dir term = do coreLift $ putStrLn "Maybe in an hour."

lazyCodegen : Codegen
lazyCodegen = MkCG compile execute Nothing Nothing

main : IO ()
main = mainWithCodegens [("lazy", lazyCodegen)]

Build it with:

$ idris2 -p idris2 -p contrib -p network Lazy.idr -o lazy-idris2

Now you have an idris2 with the added backend.

$ ./build/exec/lazy-idris2
     ____    __     _         ___
    /  _/___/ /____(_)____   |__ \
    / // __  / ___/ / ___/   __/ /     Version 0.2.0-bd9498c00
  _/ // /_/ / /  / (__  )   / __/      https://www.idris-lang.org
 /___/\__,_/_/  /_/____/   /____/      Type :? for help

Welcome to Idris 2.  Enjoy yourself!
With codegen for: lazy
Main>

It will not be overly eager to actually compile any code with the new backend, though:

$ ./build/exec/lazy-idris2 --cg lazy Hello.idr -o hello
I'd rather not.
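$

About the directories

The code generator can assume that both tmpDir and outputDir exist. tmpDir is intended for temporary files, while outputDir is the location to put the desired output files. By default, tmpDir and outputDir point to the same directory (build/exec). The directories can be set from the package description (see the packages section) or via command line options (listed in idris2 --help).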

Custom backend cookbook

This document addresses the details on how to implement a custom code generation backend for the Idris compiler.

This part has no insights about how to implement the dependently typed bits. For that part of the compiler Edwin Brady gave lectures at SPLV20 which are available online.

The architecture of the Idris2 compiler makes it easy to implement a custom code generation back-end.

The way to extend Idris with new back-ends is to use it as a library. The module Idris.Driver exports the function mainWithCodegens, that takes a list of (String, Codegen), starting idris with these codegens in addition to the built-in ones. The first codegen in the list will be set as the default codegen.

Anyone who is interested in implementing a custom back-end needs to answer the following questions:

  • Which Intermediate Representation (IR) should be consumed by the custom back-end?

  • How to represent primitive values defined by the Core.TT.Constant type?

  • How to represent Algebraic Data Types?

  • How to implement special values?

  • How to implement primitive operations?

  • How to compile IR expressions?

  • How to compile Definitions?

  • How to implement Foreign Function Interface?

  • How to compile modules?

  • How to embed code snippets?

  • What should the runtime system support?

First of all, we should know that Idris2 is not an optimizing compiler. Currently its focus is only to compile dependently typed functional code in a timely manner. Its main purpose is to check whether the given program is correct in a dependently typed setting and to generate code in the form of a lambda-calculus-like IR where higher-order functions are present. Idris has 4 intermediate representations for code generation. At every level we get a simpler representation, closer to machine code, but it should be stressed that all the aggressive code optimizations should happen in the custom back-ends. The quality and readability of the generated back-end code is on the shoulders of the implementor of the back-end. Idris erases type information in the IRs, as it compiles to Scheme by default and there is no need to keep the type information around. With this in mind let’s answer the questions above.

The architecture of an Idris back-end

Idris compiles its dependently typed front-end language into a representation which is called Core.TT.Term. This data type has a few constructors and it represents a dependently typed term. This Term is transformed to Core.CompileExpr.CExp, which has more constructors than Term and is very similar to a lambda calculus with let bindings, structured and tagged data representation, primitive operations, external operations, and case expressions. In the compilation process, the CExp is closer to code generation.

The custom code generation back-end gets a context of definitions, a temporary directory and an output directory, a Core.TT.ClosedTerm to compile and a path to an output file.

compile : Ref Ctxt Defs -> (tmpDir : String) -> (outputDir : String)
        -> ClosedTerm -> (outfile : String) -> Core (Maybe String)
compile defs tmpDir outputDir term file = ?compile_rhs

The ClosedTerm is a special Term where the list of the unbound variables is empty. This technicality is not important for the code generation of the custom back-end as the back-end needs to call the getCompileData function which produces the Compiler.Common.CompileData record.

The CompileData contains:

  • A main expression that will be the entry point for the program in CExp

  • A list of Core.CompileExpr.NamedDef

  • A list of lambda-lifted definitions Compiler.LambdaLift.LiftedDef

  • A list of Compiler.ANF.ANFDef

  • A list of Compiler.VMCode.VMDef definitions

These lists contain:

  • Functions

  • Top-level data definitions

  • Runtime crashes which represent unfilled holes, explicit calls by the user to idris_crash, and unreachable branches in case trees

  • Foreign call constructs

The job of the custom code generation back-end is to transform one of the phase encoded definitions (NamedDef, LiftedDef, CExp, ANF, or VM) into the intermediate representation of the code generator. It can then run optimizations and generate some form of executable. In summary, the code generator has to understand how to represent tagged data and function applications (even if the function application is partial), how to handle let expressions, how to implement and invoke primitive operations, how to handle Erased arguments, and how to do runtime crashes.

The implementor of the custom back-end should pick the Idris IR that is closest to the abstraction of the technology being targeted. The implementor should also consider how to transform the simple main expression, which is represented in CExp. As Idris does not focus on memory management and threading, the custom back-end should model these concepts for the program that is compiled. One possible approach is to target a fairly high level language and reuse as much as possible from it for the custom back-end. Another possibility is to implement a runtime that is capable of handling memory management and threading.

Which Intermediate Representation (IR) should be consumed by the custom back-end?

Now let's turn our attention to the different intermediate representations (IRs) that Idris provides. When the getCompileData function is invoked with the Phase parameter it will produce a CompileData record, which will contain lists of top-level definitions that need to be compiled. These are:

  • NamedDef

  • LiftedDef

  • ANFDef

  • VMDef

The question to answer here is: Which one should be picked? Which one fits to the custom back-end?

How to represent primitive values defined by the Core.TT.Constant type?

After one selects the IR to be used during code generation, the next question to answer is how primitive types should be represented in the back-end. Idris has the following primitive types:

  • Int

  • Integer (arbitrary precision)

  • Bits(8/16/32/64)

  • Char

  • String

  • Double

  • WorldVal (token for IO computations)

And as Idris allows pattern matching on types all the primitive types have their primitive counterpart for describing a type:

  • IntType

  • IntegerType

  • Bits(8/16/32/64)Type

  • StringType

  • CharType

  • DoubleType

  • WorldType

The representation of these primitive types should be a well-thought-out design decision, as it affects many parts of the code generation, such as conversion from the back-end values when FFI is involved, and a big part of the data at runtime is represented in these forms. The representation of primitive types affects the possible optimisation techniques, and it also affects memory management and garbage collection.

There are two special primitive types: String and World.

String

As its name suggests, this type represents a string of characters. As mentioned in Primitive FFI Types, Strings are encoded in UTF-8.

It is not always clear who is responsible for freeing up a String created by a component other than the Idris runtime. Strings created in Idris will always have a value, unlike possible String representations of the host technology, where for example a NULL pointer can be a value, which cannot happen on the Idris side. This creates constraints on the possible representations of Strings in the custom back-end, and diverging from the Idris representation is not a good idea. The best approach here is to build a conversion layer between the string representation of the custom back-end and the runtime.
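A conversion layer of this kind can be very small. The following Python sketch is an illustration only; the function names to_idris_string and from_idris_string are hypothetical, and it assumes the host side hands over raw UTF-8 bytes, with None playing the role of a NULL pointer.

```python
def to_idris_string(raw):
    """Convert host-side bytes into a string value for the Idris side."""
    # A NULL-like value has no counterpart on the Idris side, so reject it early.
    if raw is None:
        raise ValueError("NULL cannot be converted to an Idris String")
    return raw.decode("utf-8")


def from_idris_string(text):
    """Convert an Idris-side string into UTF-8 bytes for the host side."""
    return text.encode("utf-8")
```

Rejecting the NULL-like value at the boundary keeps the guarantee that every String seen by compiled Idris code has a value.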

World

In pure functional programming, causality needs to be represented whenever we want to maintain the order in which subexpressions are executed. In Idris a token is used to chain IO function calls. This is an abstract notion about the state of the world. For example this information could be the information that the runtime needs for bookkeeping of the running program.

The WorldVal value in Idris programs is accessed via the primIO construction which leads us to the PrimIO module. Let’s see the relevant snippets:

data IORes : Type -> Type where
     MkIORes : (result : a) -> (1 x : %World) -> IORes a

fromPrim : (1 fn : (1 x : %World) -> IORes a) -> IO a
fromPrim op = MkIO op

primIO : HasIO io => (1 fn : (1 x : %World) -> IORes a) -> io a
primIO op = liftIO (fromPrim op)

The world value is referenced as %World in Idris. It is created by the runtime when the program starts. Its content is changed by the custom runtime. More precisely, the World is created when the WorldVal is evaluated during the execution of the program. This can happen when the program gets initialized or when an unsafePerformIO function is executed.
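To make the role of the token more concrete, here is a minimal Python sketch of world-passing. It is an assumption-laden illustration rather than how any official back-end works: WORLD, put_line, bind and run are hypothetical names, and linearity of the token is not modelled. An action is a function from the world token to a (result, world) pair, mirroring MkIORes.

```python
WORLD = object()  # hypothetical stand-in for the %World token


def put_line(text, log):
    """Build a primitive action that records text when it is run."""
    def action(world):
        log.append(text)
        return (None, world)  # plays the role of MkIORes result world
    return action


def bind(action, k):
    """Sequence two actions by threading the world token through both."""
    def bound(world):
        result, world2 = action(world)
        return k(result)(world2)
    return bound


def run(action):
    """Create the world token, run the action, and return its result."""
    result, _world = action(WORLD)
    return result
```

Nothing is executed while a program is being assembled with bind; side effects only happen once run supplies the token, and they happen in the order in which the token is passed along.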

How to represent Algebraic Data Types?

In Idris there are two different ways to define a data type: tagged unions are introduced using the data keyword while structs are declared via the record keyword. Declaring a record amounts to defining a named collection of fields. Let’s see examples for both:

data Either a b
  = Left  a
  | Right b

record Pair a b where
  constructor MkPair
  fst : a
  snd : b

Idris offers not only algebraic data types but also indexed families. These are tagged unions where different constructors may have different return types. Here is Vect, an example of a data type which is an indexed family corresponding to a linked list whose length is known at compile time. It has one index (of type Nat) representing the length of the list (the value of this index is therefore different for the [] and (::) constructors) and a parameter (of type Type) corresponding to the type of values stored in the list.

data Vect : (size : Nat) -> Type -> Type where
  Nil  : Vect 0 a                         -- empty list: size is 0
  (::) : a -> Vect n a -> Vect (1 + n) a  -- extending a list of size n: size is 1+n

Both data and record are compiled to constructors in the intermediate representations. Two examples of such Constructors are Core.CompileExpr.CExp.CCon and Core.CompileExpr.CDef.MkCon.

Compiling the Either data type will produce three constructor definitions in the IR:

  • One for the Either type itself, with an arity of two. Arity tells how many parameters the constructor should have. Two is reasonable in this case as the original Idris Either type has two parameters.

  • One for the Left constructor with arity of three. Three may be surprising, as the constructor only has one argument in Idris, but we should keep in mind the type parameters for the data type too.

  • One for the Right constructor with arity of three.

In the IR, constructors have unique names. For efficiency reasons, Idris assigns a unique integer tag to each data constructor so that constructor matching is reduced to comparisons of integers instead of strings. In the Either example above Left gets tag 0 and Right gets tag 1.

Constructors can be considered structured information: a name together with parameters. The custom back-end needs to decide how to represent such data, for example using Dict in Python, JSON in JavaScript, etc. The most important aspect to consider is that these structured values are heap related values, which should be created and stored dynamically. If there is an easy way to map them onto the host technology, the memory management for these values can be inherited. If not, then the custom back-end is responsible for implementing appropriate memory management. For example, RefC is a C backend that implements its own memory management based on reference counting.
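As an illustration of tagged constructor values, the following Python sketch represents the Either constructors as small records and matches on the integer tag. The names Con, left, right and either are hypothetical, and the erased type arguments are left out for brevity, so each value carries a single field here.

```python
from collections import namedtuple

# A constructor value: integer tag, readable name, and its fields.
Con = namedtuple("Con", ["tag", "name", "args"])


def left(x):
    return Con(0, "Left", (x,))


def right(x):
    return Con(1, "Right", (x,))


def either(on_left, on_right, value):
    """Branch on the integer tag instead of comparing constructor names."""
    if value.tag == 0:
        return on_left(value.args[0])
    if value.tag == 1:
        return on_right(value.args[0])
    raise RuntimeError(f"unexpected constructor tag {value.tag}")
```

Because these are ordinary Python objects, their memory management is inherited from the host, which is the easy case described above.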

How to implement special values?

Apart from the data constructors there are two special kinds of values present in the Idris IRs: type constructors and Erased.

Type constructors

Type and data constructors that are not relevant for the program’s runtime behaviour may be used at compile time but will be erased from the intermediate representation.

However some type constructors need to be kept around even at runtime because pattern matching on types is allowed in Idris:

notId : {a : Type} -> a -> a
notId {a=Int} x = x + 1
notId x = x

Here we can pattern match on a and ensure that notId behaves differently on Int than all the other types. This will generate an IR that will contain a Case expression with two branches: one Alt matching on the Int type constructor and a default for the non-Int matching part of the notId function.
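A back-end could realise this case expression as sketched below in Python. The sketch is hypothetical: it assumes type constructors are kept as small runtime values (INT_TYPE and STRING_TYPE are made-up names) and that the implicit argument a is passed explicitly.

```python
# Hypothetical runtime values standing in for primitive type constructors.
INT_TYPE = ("IntType",)
STRING_TYPE = ("StringType",)


def not_id(a, x):
    """Case on the type argument: one alternative for Int, then a default."""
    if a == INT_TYPE:
        return x + 1
    return x
```

Here not_id(INT_TYPE, 41) takes the Int alternative, while any other type value falls through to the default branch and returns x unchanged.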

This is not that special: Type is a bit like an infinite data type that contains all of the types a user may ever declare or use. This can be handled in the back-end and host language using the same mechanisms that were mobilised to deal with data constructors. The reason for using the same approach is that in dependently typed languages, the same language is used to form both type and value level expressions. Compilation of type level terms will be the same as that of value level terms. This is one of the things that make dependently typed abstraction elegant.

Erased

The other kind of special value is Erased. This is generated by the Idris compiler and part of the IR if the original value is only needed during the type elaboration process. For example:

data Subset : (type : Type)
           -> (pred : type -> Type)
           -> Type
  where
    Element : (value : type)
           -> (0 prf : pred value)
           -> Subset type pred

Because prf has quantity 0, it is guaranteed to be erased during compilation and thus not present at runtime. Therefore prf will be represented as Erased in the IR. The custom back-end needs to represent this value like any other data value, as it could occur in place of normal values. The simplest approach is to implement it as a special data constructor and let the optimizations provided by the host technology take care of its removal.
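One way to follow the "special data constructor" approach is a single shared placeholder object. The Python sketch below is only an illustration under that assumption; Erased, ERASED, element and get_value are hypothetical names.

```python
class Erased:
    """Singleton placeholder for values that were erased at compile time."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "Erased"


ERASED = Erased()


def element(value, prf=ERASED):
    """Build a Subset value; the proof slot only ever holds the placeholder."""
    return ("Element", value, prf)


def get_value(subset):
    return subset[1]
```

Since every erased position shares the same object, the placeholder costs no extra allocation per value, and a later optimisation pass could drop the field entirely.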

How to implement primitive operations?

Primitive operations are defined in the module Core.TT.PrimFn. The constructors of this data type represent the primitive operations that the custom back-end needs to implement. These primitive operations can be grouped as:

  • Arithmetic operations (Add, Sub, Mul, Div, Mod, Neg)

  • Bit operations (ShiftL, ShiftR, BAnd, BOr, BXor)

  • Comparison operations (LT, LTE, EQ, GTE, GT)

  • String operations (Length, Head, Tail, Index, Cons, Append, Reverse, Substr)

  • Double precision floating point operations (Exp, Log, Sin, Cos, Tan, ASin, ACos, ATan, Sqrt, Floor, Ceiling)

  • Casting of numeric and string values

  • An unsafe cast operation BelieveMe

  • A Crash operation taking a type and a string and creating a value at that type by raising an error.
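A back-end typically needs a table from primitive operations to host implementations. The Python sketch below shows one possible shape for such a table. It is a simplified illustration: the keys are illustrative names rather than the exact PrimFn constructors, Int is assumed to be a 64-bit signed integer that wraps on overflow, and comparisons are assumed to return 1 or 0.

```python
import operator

INT_BITS = 64  # assumption: width of Int on this hypothetical back-end


def wrap_int(n, bits=INT_BITS):
    """Wrap an unbounded Python int into the signed range of the given width."""
    n &= (1 << bits) - 1
    if n >= 1 << (bits - 1):
        n -= 1 << bits
    return n


# Illustrative dispatch table: (operation, type) -> host implementation.
PRIM_OPS = {
    ("Add", "Int"): lambda a, b: wrap_int(a + b),
    ("Sub", "Int"): lambda a, b: wrap_int(a - b),
    ("Mul", "Int"): lambda a, b: wrap_int(a * b),
    ("Add", "Integer"): operator.add,  # arbitrary precision, no wrapping
    ("LT", "Int"): lambda a, b: 1 if a < b else 0,
    ("StrLength", "String"): len,
    ("StrAppend", "String"): operator.add,
}


def apply_prim(op, ty, *args):
    """Look up and apply a primitive, failing loudly if it is missing."""
    try:
        impl = PRIM_OPS[(op, ty)]
    except KeyError:
        raise NotImplementedError(f"primitive {op} at {ty} is not implemented") from None
    return impl(*args)
```

Keying the table on both the operation and the type makes the difference between fixed-width Int and arbitrary-precision Integer explicit, which is where host languages most often disagree with Idris.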

BelieveMe

The primitive believe_me is an unsafe cast that allows users to bypass the typechecker when they know something to be true even though it cannot be proven.

For instance, assuming that Idris’ primitives are correctly implemented, it should be true that if a boolean equality test on two Int i and j returns True then i and j are equal. Such a theorem can be implemented by using believe_me to cast Refl (the constructor for proofs of a propositional equality) from i === i to i === j. In this case, it should be safe to implement.

Boxing

Idris assumes that the back-end representation of the data is not strongly typed and that all data types have the same kind of representation. This could introduce a constraint on the representation of primitives and of data types represented by constructors. One possible solution is that the custom back-end should represent primitive data types the same way it does constructors, using special tags. This is called boxing.

Official backends represent primitive data types as boxed ones.

  • RefC: Boxes the primitives, which makes them easy to put on the heap.

  • Scheme: Prints the values that are a Constant as Scheme literals.
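Boxing can be pictured with a short Python sketch. This is not how RefC or the Scheme back-ends are implemented; BoxedPrim, box_int, unbox_int and add_int are hypothetical names, and the point is only that a primitive is wrapped in a tagged cell so that it can be handled like any constructor value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxedPrim:
    """A primitive wrapped in a tagged cell, like a constructor value."""
    tag: str         # which primitive type the payload has
    payload: object  # the raw host-level value


def box_int(n):
    return BoxedPrim("Int", n)


def unbox_int(value):
    if not isinstance(value, BoxedPrim) or value.tag != "Int":
        raise TypeError("expected a boxed Int")
    return value.payload


def add_int(a, b):
    """Primitive addition on boxed values: unbox, compute, box again."""
    return box_int(unbox_int(a) + unbox_int(b))
```

The uniform representation costs an allocation and an indirection per value, which is the usual trade-off of boxing.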

How to compile top-level definitions?

As mentioned earlier, Idris has 4 different IRs that are available in the CompileData record: Named, LambdaLifted, ANF, and VMDef. When assembling the CompileData we have to tell the Idris compiler which level we are interested in. The CompileData contains lists of definitions that can be considered top-level definitions, which the custom back-end needs to generate functions for.

There are four types of top-level definitions that the code generation back-end needs to support:

  • Function

  • Constructor

  • Foreign call

  • Error

Function contains a lambda calculus like expression.

Constructor represents a data or a type constructor, and it should be implemented as a function creating the corresponding data structure in the custom back-end.

A top-level foreign call defines an entry point for calling functions implemented outside the Idris program under compilation. The Foreign construction contains a list of Strings, which are the snippets defined by the programmer, together with the types of the arguments and the return type of the foreign function. The custom back-end should generate a wrapper function. More on this in ‘How to implement the Foreign Function Interface?’

A top-level error definition represents holes in Idris programs, uses of idris_crash, or unreachable branches in a case tree. Users may want to execute incomplete programs for testing purposes, which is fine as long as we never actually need the value of any of the holes. Library writers may want to raise an exception if an unrecoverable error has happened. Finally, Idris compiles the unreachable branches of a case tree to a runtime error, as it is dead code anyway.

How to compile IR expressions?

The custom back-end should decide which intermediate representation is used as a starting point. The result of the transformation should be expressions and functions of the host technology.

Definitions in ANF and Lifted are represented as tree-like expressions, where control flow is based on the Let and Case expressions.

Case expressions

There are two types of case expressions: one for matching and branching on primitive values such as Int, and one for matching and branching on constructor values. The two types of case expressions have two different representations for the alternatives of the cases. These are ConstCase (for matching on constant values) and ConCase (for matching on constructors).

Matching on constructors can be implemented as matching on their tags or, less efficiently, as matching on the name of the constructor. In both cases a match should bind the values of the constructor’s arguments to variables in the body of the matching branch. This can be implemented in various ways depending on the host technology: switch expressions, case with pattern matching, or if-then-else chains.

When pattern matching binds variables, the number of arguments can differ from the arity of the constructor defined in the top-level definitions and in GlobalDef. This is because all the arguments are kept around at typechecking time, but the code generator for the case tree removes the ones which are marked as erased. The code generator of the custom back-end also needs to remove the erased arguments in the constructor implementation. In GlobalDef, eraseArg contains this information, which can be used to work out the number of arguments that need to be kept around.

Creating values

Values can be created in two ways.

If the value is a primitive value, it will be handed to the back-end as a PrimVal. It should be compiled to a constant in the host language following the design decisions made in the ‘How to represent primitive values?’ section.

If it is a structured value (i.e. a Con) it should be compiled to a function in the host language which creates a dynamic value. The design decisions made in ‘How to represent constructor values?’ apply here.

Function calls

There are four types of function calls:

  • Saturated function calls (all the arguments are there)

  • Under-applied function calls (some arguments are missing)

  • Primitive function calls (necessarily saturated, PrimFn constructor)

  • Foreign function calls (referred to by name)

The ANF and Lifted intermediate representations support under-applied function calls (using the UnderApp constructor in both IRs). The custom back-end needs to support partial application of functions and the creation of closures in the host technology. This is not a problem with back-ends like Scheme, where we get partial application of functions for free. But if the host language does not have this tool in its toolbox, the custom back-end needs to simulate closures. One possible solution is to manufacture a closure as a special object that stores the function and the values it is currently applied to, and that waits until all the necessary arguments have been received before evaluating it. The same approach is needed if the VMCode IR is chosen for code generation.
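The closure-as-object idea can be modelled in a few lines. This is only a sketch written in Idris for consistency with the rest of the document; a real back-end would build the equivalent object in the host language. All names (Closure, Step, applyArg, feed) are hypothetical, and runtime values are simplified to Int.

```idris
module Main

-- A closure stores the function, how many arguments it expects,
-- and the arguments it has received so far.
record Closure where
  constructor MkClosure
  arity : Nat
  code  : List Int -> Int
  args  : List Int

-- Supplying an argument yields a bigger closure or a final value.
data Step = Partial Closure | Done Int

-- Store one more argument; evaluate only once the closure is saturated.
applyArg : Closure -> Int -> Step
applyArg (MkClosure n f xs) x =
  let xs' = xs ++ [x] in
      if length xs' == n
         then Done (f xs')
         else Partial (MkClosure n f xs')

-- Over-application is ignored in this sketch.
feed : Step -> Int -> Step
feed (Partial c) x = applyArg c x
feed done _ = done

showStep : Step -> String
showStep (Partial _) = "closure waiting for more arguments"
showStep (Done v) = "value " ++ show v

main : IO ()
main = putStrLn (showStep (foldl feed (Partial (MkClosure 3 sum [])) [1, 2, 3]))
```

Feeding the three arguments one at a time keeps the closure partial twice and evaluates it on the third, so the program should print value 6.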

Let bindings

Both the ANF and Lifted intermediate representations have a Let construct that lets users assign values to local variables. These two IRs differ in their representation of bound variables.

Lifted is a type family indexed by the List Name of local variables in scope. A variable is represented using LLocal, a constructor that stores a Nat together with a proof that it points to a valid name in the local scope.

ANF is a lower-level representation in which these kinds of guarantees are no longer present. A local variable is represented using the AV constructor, which stores an AVar. An AVar is either an ALocal or an ANull. The ALocal constructor stores an Int that corresponds to the Nat we would have seen in Lifted. The ANull constructor refers to an erased variable, and its representation in the host language will depend on the design choices made in the ‘How to represent Erased values’ section.
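For reference, the shape of AVar described above can be written as the following data declaration. It is reproduced from the description in this section rather than copied from the compiler, so it should be checked against the Compiler.ANF module of the Idris 2 version in use.

```idris
module Main

-- Shape of the ANF variable type as described in the text;
-- verify against Compiler.ANF before relying on it.
data AVar : Type where
     ALocal : Int -> AVar   -- index of a local variable
     ANull : AVar           -- stands for an erased variable
```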

VMDef specificities

VMDef is meant to be the closest IR to machine code. In VMDef, all the definitions have been compiled to instructions for a small virtual machine with registers and closures.

Instead of Let expressions, there are only ASSIGN statements at this level.

Instead of Case expressions binding variables when they successfully match on a data constructor, CASE picks a branch based on the constructor itself. An extra operation called PROJECT is introduced to explicitly extract a constructor’s arguments based on their position.

There is no App or UnderApp. Both are replaced by APPLY, which applies only one value and creates a closure from the application. For erased values, the operation NULL assigns an empty/null value to the register.

How to implement the Foreign Function Interface?

The Foreign Function Interface (FFI) plays a big role in running Idris programs. The primitive operations which are mentioned above are functions for manipulating values and those functions aren’t meant for complex interaction with the runtime system. Many of the primitive types can be thought of as abstract types provided via external and foreign functions to manipulate them.

The responsibility of the custom back-end and the host technology is to represent these computations in an operationally correct way. The design decisions with respect to representing primitive types in the host technology will inevitably affect the design of the FFI.

Foreign Types

Originally Idris had an official back-end implementation in C. Even though this has changed, the names in the types for the FFI kept their C prefix. Core.CompileExpr.CFType contains the following definitions. Many of them map one-to-one to the corresponding primitive type, but some of them need explanation.

The foreign types are:

  • CFUnit

  • CFInt

  • CFUnsigned(8/16/32/64)

  • CFString

  • CFDouble

  • CFChar

  • CFFun of type CFType -> CFType -> CFType. Callbacks can be registered in the host technology via parameters that have CFFun type. The back-end should be able to handle functions that are defined on the Idris side and compiled to the host technology. If the custom back-end supports higher-order functions, they should be used to implement support for this kind of FFI type.

  • CFIORes of type CFType -> CFType. Any computation defined with PrimIO will have this extra layer. Pure functions shouldn’t have any observable IO effect on the program state in the runtime implemented in the host technology. NOTE: IORes is also used when callback functions are registered in the host technology.

  • CFWorld represents the current state of the world. This should refer to a token that is passed around between function calls. The implementation of the World value should contain back-end specific values and information about the state of the Idris runtime.

  • CFStruct of type String -> List (String, CFType) -> CFType is the foreign type associated with System.FFI.Struct. It represents a C-like structure in the custom back-end. The prim__getField and prim__setField primitives should be implemented to support this CFType.

  • CFUser of type Name -> List CFType -> CFType. Types defined with [external] are represented with CFUser. For example, data MyType : Type where [external] will be represented as CFUser Module.MyType [].

  • CFBuffer is the foreign type defined for Data.Buffer. Although this is an external type, Idris builds on a random access buffer.

  • CFPtr. Ptr t and AnyPtr are compiled to CFPtr. Any complex structured data that cannot be represented as a simple primitive can use CFPtr to keep track of where the value is used. In Idris, Ptr t is defined as an external type.

  • CFGCPtr. GCPtr t and GCAnyPtr are compiled to CFGCPtr. A GCPtr is obtained from a Ptr value by calling the onCollect function, and it has a special property: onCollect attaches a finalizer to the pointer, which should run when the pointer is freed.

Examples

Let’s step back and look at how this is represented at the Idris source level. The simplest form of a definition involving the FFI is a function definition with a %foreign pragma. The pragma is passed a list of strings corresponding to a mapping from back-ends to names for the foreign calls.

For instance, a function of type Int -> Int -> Int whose %foreign pragma names the C function add should be translated by the C back-end as a call to the add function defined in the smallc.c file. In the FFI, Int is translated to CFInt. The back-end assumes that the data representation specified in the library file corresponds to that of normal Idris values.
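A declaration matching that description might look like the sketch below. The Idris name prim__add and the library name libsmallc are assumptions made for illustration; only the C function name add and the file smallc.c come from the text.

```idris
module Main

-- "C:add,libsmallc" asks the C back-end to call the C function `add`
-- from a library assumed here to be named libsmallc.
%foreign "C:add,libsmallc"
prim__add : Int -> Int -> Int
```

The declaration type checks on its own, but running code that calls prim__add requires the compiled library to be available to the chosen back-end.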

We can also define external types. For example, a type ThreadID can be declared with the [external] option, together with a foreign function prim__fork, implemented by the Scheme runtime, that returns a ThreadID.

Here ThreadID is defined as an external type and this type will be represented as CFUser "ThreadID" [] internally. The value which is created by the Scheme runtime will be considered a black box.

The type of prim__fork, once translated as a foreign type, is [%World -> IORes Unit, %World] -> IORes Main.ThreadID. Here we see that %World is added to the IO computations. The %World parameter is always the last in the argument list.
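The declarations discussed above can be sketched as follows. The foreign specifier string "scheme:blodwen-thread" is an assumption based on how the Scheme support code names its thread primitive; the rest follows from the translated type given in the text.

```idris
module Main

-- An external type: its values are created and understood only by the
-- runtime, so it has no constructors on the Idris side.
data ThreadID : Type where [external]

-- PrimIO adds the %World token to the argument and to the result, which
-- is where [%World -> IORes Unit, %World] -> IORes Main.ThreadID comes from.
-- The name is qualified to avoid a clash with any ThreadID already in scope.
%foreign "scheme:blodwen-thread"
prim__fork : (1 prog : PrimIO ()) -> PrimIO Main.ThreadID
```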

For the FFI functions, the type information and the user-defined string can be found in the top-level definitions. The custom back-end should use these definitions to generate wrapper code, which should convert the types described by the CFType to the types that the function in the %foreign directive needs.

How to compile modules?

The Idris compiler generates intermediate files for modules, but the content of these files is not part of Lifted, ANF, or VMCode. Because of this, when the compilation pipeline enters the code generation stage, all the information will be in one instance of the CompileData record, and the custom code generator back-end can process it as if it saw the whole program.

The custom back-end has the option to introduce some hierarchy for the functions in different namespaces and organize some module structure to let the host technology process the bits and pieces in different sized chunks. However, this feature is not in the scope of the Idris compiler.

It is worth noting that modules can be mutually recursive in Idris. So a direct compilation of Idris modules to modules in the host language may be unsuccessful.

How to embed code snippets?

A possible motivation for implementing a custom back-end for Idris is to generate code that is meant to be used in a larger project. This project may be bound to another language that has many useful libraries but could benefit from relying on Idris’ strong type system in places.

When writing a code generator for this purpose, the interoperability of the host technology and Idris based on the Foreign Interface can be inconvenient. In this situation, the need to embed code of the host technology arises naturally. Elaboration can be an answer for that.

Elaboration is a typechecking time code generation technique. It relies on the Elab monad to write scripts that can interact with the typechecking machinery to generate Idris code in Core.TT.

When code snippets need to be embedded a custom library should be provided with the custom back-end to turn the valid code snippets into their representation in Core.TT.

What should the runtime system support?

In summary, a custom back-end for the Idris compiler should create an environment in the host technology that is able to run Idris programs. As Idris is part of the family of functional programming languages, its computation model is based on graph reduction. Programs represented as simple graphs in memory are based on the closure creation mechanism during evaluation. Closure creation exists even at the lowest levels of the IRs. For that reason, any runtime in any host technology needs to support some kind of representation of closures and be able to store them on the heap, so the responsibility for memory management falls to the implementor of the custom back-end. If the host technology has memory management, the problem is not difficult. It is also likely that storing closures can be easily implemented via the tools of the host technology.

It is not clear, however, how much functionality a back-end should support. Tools from the Scheme back-end are brought into the Idris world via external types and primitive operations around them. This is a good practice and gives the community the ability to focus on the implementation of a quick compiler for a dependently typed language. One of these hidden features is the concurrency primitives. These are part of the different libraries that could be part of the compiler or part of the contribution package. If the threading model of the host technology differs from the one that the default Idris back-end currently inherits from Scheme, supporting these primitives could be a bigger piece of work.

IO in Idris is implemented using an abstract %World value, which serves as a token for functions that operate interactively with the World through simple calls to the underlying runtime system. The entry point of the program is the main function, which has the type of an IO computation returning unit, that is, main : IO (). This means that every program starts running as part of some IO computation. Under the hood this is implemented via the creation of the %World abstract value, and invoking the main function, which is compiled to pass the abstract %World value for IO related foreign or external operations.

There is an operation called unsafePerformIO in the PrimIO module. The type signature of unsafePerformIO tells us that it is capable of evaluating an IO computation in a pure context. Under the hood it is run in exactly the same way the main function is. It manufactures a fresh %World token and passes it to the IO computations. This leads to a design decision: How to represent the state of the World, and how to represent the world that is instantiated for the sake of the unsafePerformIO operation via the unsafeCreateWorld? Both the mechanisms of main and unsafeCreateWorld use the %MkWorld constructor, which will be compiled to WorldVal and its type to WorldType, which means the implementation of the runtime is responsible for creating the abstraction around the World. Implementation of an abstract World value could be based on a singleton pattern, where we can have just one world, or we could have more than one world, resulting in parallel universes for unsafePerformIO.

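There are also external code generators that are not part of the main Idris 2 repository. They can be found on the Idris 2 wiki:

External backends

There is work in progress to support generating libraries for other languages from Idris 2 code.

Libraries

The %nomangle pragma tells the backend what name to use for a given function.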

%nomangle
foo : Int -> Int
foo x = x + 1

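On backends that support this feature, the function will be called foo, rather than being given a mangled name that includes the namespace.

If the name you want to use is not a valid Idris identifier, you can give the name that appears in the compiled code separately from the Idris name, e.g.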

%nomangle "$_baz"
baz : Int
baz = 42

You can also specify different names for different backends, in a similar way to %foreign:

%nomangle "refc:idr_add_one"
          "node:add_one"
plusOne : Bits32 -> Bits32
plusOne x = x + 1

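Changes since Idris 1

Idris 2 is mostly backwards compatible with Idris 1, with some minor exceptions. This document describes the changes, approximately in order of likelihood of encountering them in practice. New features are described at the end, in Section New features.

The Section Type Driven Development with Idris: Updates Required describes how these changes affect the code in the book Type-Driven Development with Idris (https://www.manning.com/books/type-driven-development-with-idris) by Edwin Brady, available from Manning.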

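New Core Language: Quantities in Types

Idris 2 is based on Quantitative Type Theory (QTT), a core language developed by Bob Atkey and Conor McBride. In practice, this means that every variable in Idris 2 has a quantity associated with it. A quantity is one of:

  • 0, meaning that the variable is erased at run time

  • 1, meaning that the variable is used exactly once at run time

  • Unrestricted, which is the same behaviour as Idris 1

For more details on this, see Section Multiplicities. In practice, this might cause some Idris 1 programs not to type check in Idris 2 because they attempt to use an argument that is erased at run time.

Erasure

In Idris, names which begin with a lower case letter are automatically bound as implicit arguments in types. For example, in the following skeleton definition, n, a and m are implicitly bound: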

append : Vect n a -> Vect m a -> Vect (n + m) a
append xs ys = ?append_rhs

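One of the difficulties in compiling a dependently typed programming language is deciding which arguments are used at run time and which can safely be erased. More importantly, it is one of the difficulties when programming: how can a programmer know when an argument will be erased?

In Idris 2, a variable’s quantity tells us whether it will be available at run time or not. We can see the quantities of the variables in scope in append_rhs by inspecting the hole at the REPL: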

Main> :t append_rhs
 0 m : Nat
 0 a : Type
 0 n : Nat
   ys : Vect m a
   xs : Vect n a
-------------------------------------
append_rhs : Vect (plus n m) a

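The 0 next to m, a and n means that they are in scope, but there will be 0 occurrences at run time. That is, it is guaranteed that they will be erased at run time.

This does lead to some potential difficulties when converting Idris 1 programs, if you are using implicit arguments at run time. For example, in Idris 1 you can get the length of a vector as follows: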

vlen : Vect n a -> Nat
vlen {n} xs = n

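This might seem like a good idea, since it runs in constant time and takes advantage of the type level information, but the trade-off is that n has to be available at run time, so at run time we always need the length of the vector to be available if we ever call vlen. Idris 1 can infer whether the length is needed, but there is no easy way for a programmer to be sure.

In Idris 2, we need to state explicitly that n is needed at run time: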

vlen : {n : Nat} -> Vect n a -> Nat
vlen xs = n

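(Incidentally, also note that in Idris 2, names bound in types are also available in the definition without explicitly rebinding them.)

This also means that when you call vlen, you need the length to be available. For example, this will give an error:

sumLengths : Vect m a -> Vect n a -> Nat
sumLengths xs ys = vlen xs + vlen ys

Idris 2 reports: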

vlen.idr:7:20--7:28:While processing right hand side of Main.sumLengths at vlen.idr:7:1--10:1:
m is not accessible in this context

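This means that it needs to use m as an argument to pass to vlen xs, where it needs to be available at run time, but m is not available in sumLengths because it has multiplicity 0.

We can see this more clearly by replacing the right hand side of sumLengths with a hole…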

sumLengths : Vect m a -> Vect n a -> Nat
sumLengths xs ys = ?sumLengths_rhs

…and then checking the hole’s type at the REPL:

Main> :t sumLengths_rhs
 0 n : Nat
 0 a : Type
 0 m : Nat
   ys : Vect n a
   xs : Vect m a
-------------------------------------
sumLengths_rhs : Nat

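Instead, we need to give bindings for m and n with unrestricted multiplicity:

sumLengths : {m, n : _} -> Vect m a -> Vect n a -> Nat
sumLengths xs ys = vlen xs + vlen ys

Remember that giving no multiplicity on a binder, as with m and n here, means the variable has unrestricted usage.

If you are converting Idris 1 programs to work with Idris 2, this is probably the biggest thing you need to think about. It is important to note, though, that if you have bound implicits, such as…

excitingFn : {t : _} -> Coffee t -> Moonbase t

…then it is a good idea to make sure t really is needed, or performance might suffer because the run time builds the instance of t unnecessarily!

One final note on erasure: it is an error to try to pattern match on an argument with multiplicity 0, unless its value is inferrable from elsewhere. So, the following definition is rejected: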

badNot : (0 x : Bool) -> Bool
badNot False = True
badNot True = False

This is rejected with the error:

badnot.idr:2:1--3:1:Attempt to match on erased argument False in
Main.badNot

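The following, however, is fine, because in sNot, even though we appear to match on the erased argument x, its value is uniquely inferrable from the type of the second argument: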

data SBool : Bool -> Type where
     SFalse : SBool False
     STrue  : SBool True

sNot : (0 x : Bool) -> SBool x -> Bool
sNot False SFalse = True
sNot True  STrue  = False

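Experience with Idris 2 so far suggests that, most of the time, as long as you are using unbound implicits in your Idris 1 programs, they will work without much modification in Idris 2. The Idris 2 type checker will point out where you require an unbound implicit argument at run time - sometimes this is both surprising and enlightening!

Linearity

Full details on linear arguments with multiplicity 1 are given in Section Multiplicities. In brief, the intuition behind multiplicity 1 is that if we have a function with a type of the following form…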

f : (1 x : a) -> b

…then the guarantee given by the type system is that if f x is used exactly once, then x is used exactly once in the process.
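As a small illustration of that guarantee, the sketch below contrasts a definition that respects the multiplicity with one that does not. The names idLin and dup are hypothetical.

```idris
module Main

-- Accepted: x is used exactly once on the right hand side.
idLin : (1 x : a) -> a
idLin x = x

-- Rejected if uncommented: x would be used twice.
-- dup : (1 x : a) -> (a, a)
-- dup x = (x, x)
```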

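Prelude and base libraries

The Prelude in Idris 1 contained a lot of definitions, many of them rarely necessary. The philosophy in Idris 2 is different. The (rather vaguely specified) rule of thumb is that it should contain the basic functions required by almost any non-trivial program.

This is a vague specification, since different programmers will consider different things absolutely necessary, but the result is that it contains:

  • Anything the elaborator can desugar to (e.g. tuples, (), =)

  • Basic types Bool, Nat, List, Stream, Dec, Maybe, Either

  • The most important utility functions: id, the, composition, etc

  • Interfaces for arithmetic and implementations for the primitives and basic types

  • Basic Char and String manipulation

  • Show, Eq, Ord, and implementations for all types in the prelude

  • Interfaces and functions for basic proofs (cong, Uninhabited, etc)

  • Semigroup, Monoid

  • Functor, Applicative, Monad and related functions

  • Foldable, Alternative and Traversable

  • Range, for list range syntax

  • Console IO

Anything which does not fit in here has been moved to the base libraries. Among other places, you can find some of the functions which used to be in the prelude in:

  • Data.List and Data.Nat

  • Data.Maybe and Data.Either

  • System.File and System.Directory (file management was previously part of the prelude)

  • Decidable.Equality

Smaller Changes

Ambiguous Name Resolution

Idris 1 worked very hard to resolve ambiguous names by type, even if this involved some complicated interaction with interface resolution. This could sometimes be the cause of long type checking times. Idris 2 simplifies this, at the cost of sometimes requiring more programmer annotations on ambiguous names.

As a general rule, Idris 2 will be able to disambiguate between names which have different concrete return types (such as data constructors), or which have different concrete argument types (such as record projections). It may struggle to resolve ambiguities if one name requires an interface to be resolved. It will postpone resolution if a name cannot be resolved immediately, but unlike Idris 1, it will not attempt significant backtracking. If you have deeply nested ambiguous names (beyond a small threshold, default 3) Idris 2 will report an error. You can change this threshold with a directive, for example: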

%ambiguity_depth 10

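However, in such circumstances it is almost certainly a better idea to disambiguate explicitly.

Indeed, in general, if you get an ambiguous name error, the best approach is to give the namespace explicitly. You can also rebind names locally: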

Main> let (::) = Prelude.(::) in [1,2,3]
[1, 2, 3]

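One difficulty which remains is resolving ambiguous names where one possibility is an interface method and another is a concrete top level function. For example, we may have: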

Prelude.(>>=) : Monad m => m a -> (a -> m b) -> m b
LinearIO.(>>=) : (1 act : IO a) -> (1 k : a -> IO b) -> IO b

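As a pragmatic choice, if type checking in a context where the more concrete name is valid (LinearIO.(>>=) here, so if type checking an expression known to have type IO t for some t), the more concrete name will be chosen.

This is somewhat unsatisfying, so we may revisit this in future!

Modules, namespaces and export

The visibility rules, as controlled by the private, export and public export modifiers, now refer to the visibility of names from other namespaces, rather than other files.

So if you have the following, all in the same file…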

namespace A
  private
  aHidden : Int -> Int
  aHidden x = x * 2

  export
  aVisible : Int -> Int
  aVisible x = aHidden x

namespace B
  export
  bVisible : Int -> Int
  bVisible x = aVisible (x * 2)

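…then bVisible can access aVisible, but not aHidden.

Records are, as before, defined in their own namespace, but fields are always visible from the parent namespace.

Also, module names must now match the filename in which they are defined, with the exception of the module Main, which can be defined in a file with any name.

%language pragmas

There were several %language pragmas in Idris 1, which defined various experimental extensions. None of these are available in Idris 2, although extensions may be defined in the future.

Also removed was the %access pragma for default visibility; use visibility modifiers on each declaration instead.

let bindings

let bindings, i.e. expressions of the form let x = val in e, have slightly different behaviour. Previously, you could rely on the computational behaviour of x in the scope of e, so type checking could take into account that x reduces to val. Unfortunately, this leads to complications with case and with clauses: if we want to preserve the computational behaviour, we would need to make significant changes to the way case and with are elaborated.

So, for simplicity and consistency (and, realistically, because I don’t have enough time to resolve the problem for case and with) the above expression let x = val in e is equivalent to (\x => e) val.

So, let now effectively generalises a complex subexpression. If you do need the computational behaviour of a definition, it is now possible using local function definitions instead - see Section Local function definitions below.

Also, an alternative syntax let x := val in e is available. See Section Let bindings for more info.

auto-implicits and Interfaces

Interfaces and auto-implicit arguments are similar, in that they invoke an expression search mechanism to find the value of an argument. In Idris 1, they were implemented separately, but in Idris 2, they use the same mechanism. Consider the following total definition of fromMaybe: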

data IsJust : Maybe a -> Type where
     ItIsJust : IsJust (Just val)

fromMaybe : (x : Maybe a) -> {auto p : IsJust x} -> a
fromMaybe (Just x) {p = ItIsJust} = x

Since interface resolution and auto-implicits are now the same thing, the type of fromMaybe can be written as:

fromMaybe : (x : Maybe a) -> IsJust x => a

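So now, the constraint arrow => means that the argument will be found by auto-implicit search.

When defining a data type, there are ways to control how auto-implicit search will proceed, by giving options to the data type. For example: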

data Elem : (x : a) -> (xs : List a) -> Type where
     [search x]
     Here : Elem x (x :: xs)
     There : Elem x xs -> Elem x (y :: xs)

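The search x option means that auto-implicit search for a value of type Elem t ts will start as soon as the type checker has resolved the value t, even if ts is still unknown.

By default, auto-implicit search uses the constructors of a data type as search hints. The noHints option on a data type turns this behaviour off.

You can add your own search hints with the %hint option on a function. For example: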

data MyShow : Type -> Type where
     [noHints]
     MkMyShow : (myshow : a -> String) -> MyShow a

%hint
showBool : MyShow Bool
showBool = MkMyShow (\x => if x then "True" else "False")

myShow : MyShow a => a -> String
myShow @{MkMyShow myshow} = myshow

In this case, searching for MyShow Bool will find showBool, as we can see if we try evaluating myShow True at the REPL:

Main> myShow True
"True"

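In fact, this is how interfaces are elaborated. However, %hint should be used with care. Too many hints can lead to a large search space!

Record fields

Record fields can now be accessed via a . (dot). For example, if you have: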

record Person where
    constructor MkPerson
    firstName, middleName, lastName : String
    age : Int

and you have a record fred : Person, then you can use fred.firstName to access the firstName field.
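Put together as a complete program, this looks like the sketch below. The field values given to fred are invented for illustration.

```idris
module Main

record Person where
    constructor MkPerson
    firstName, middleName, lastName : String
    age : Int

-- Example values, made up for this sketch.
fred : Person
fred = MkPerson "Fred" "Joe" "Bloggs" 30

main : IO ()
main = putStrLn (fred.firstName ++ " is " ++ show (fred.age))
```

The projections are parenthesised where they appear as function arguments, which keeps the example independent of how tightly the dot binds relative to application.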

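Totality and Coverage

%default covering is now the default status, so all functions must cover all their inputs unless stated otherwise with a partial annotation, or by switching to %default partial (which is not recommended - use a partial annotation instead, so that partiality is confined to the smallest possible place).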
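A minimal sketch of the recommended approach, using a hypothetical function firstOf: without the annotation the definition is rejected by the coverage check, and with it the partiality is limited to this one definition.

```idris
module Main

-- Without the `partial` annotation this definition is rejected,
-- because the [] case is missing.
partial
firstOf : List a -> a
firstOf (x :: _) = x
```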

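Build artefacts

This is not really a language change, but a change in the way Idris saves checked files, and still useful to know. All checked modules are now saved in a directory build/ttc, in the root of the source tree, with the directory structure following the directory structure of the source. Executables are placed in build/exec.

Dependencies on other packages are now indicated with the depends field; the pkgs field is no longer recognized. Also, fields with URLs or other string data (other than module or package names) must be enclosed in double quotes. For example: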

package lightyear

sourceloc  = "git://git@github.com:ziman/lightyear.git"
bugtracker = "http://www.github.com/ziman/lightyear/issues"

depends = effects

modules = Lightyear
        , Lightyear.Position
        , Lightyear.Core
        , Lightyear.Combinators
        , Lightyear.StringFile
        , Lightyear.Strings
        , Lightyear.Char
        , Lightyear.Testing

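New features

As well as the change to the core language to use quantitative type theory, described above, there are several other new features.

Local function definitions

Functions can now be defined locally, using a let block. For example, greet in the following example is defined in the scope of a local variable x: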

chat : IO ()
chat
    = do putStr "Name: "
         x <- getLine
         let greet : String -> String
             greet msg = msg ++ " " ++ x
         putStrLn (greet "Hello")
         putStrLn (greet "Bye")

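These let blocks can be used anywhere (in the middle of do blocks as above, but also in any function, or in type declarations). where blocks are now elaborated by translation into a local let.

However, Idris no longer attempts to infer types for functions defined in where blocks, because this was fragile. This may be reinstated, if we can come up with a good, predictable approach.

Scope of implicit arguments

Implicit arguments in a type are now in scope in the body of a definition. We have already seen this above, where n is in scope automatically in the body of vlen: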

vlen : {n : Nat} -> Vect n a -> Nat
vlen xs = n

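This is important to remember when using where blocks, or local definitions, since the names in scope will also be in scope when declaring the type of the local definition. For example, the following definition, where we attempt to define our own version of Show for Vect, will fail to type check: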

showVect : Show a => Vect n a -> String
showVect xs = "[" ++ showBody xs ++ "]"
  where
    showBody : Vect n a -> String
    showBody [] = ""
    showBody [x] = show x
    showBody (x :: xs) = show x ++ ", " ++ showBody xs

This fails because n is in scope already, from the type of showVect, in the type declaration for showBody, and so the first clause showBody [] will fail to type check because [] has length Z, not n. We can fix this by locally binding n:

showVect : Show a => Vect n a -> String
showVect xs = "[" ++ showBody xs ++ "]"
  where
    showBody : forall n . Vect n a -> String
    ...

Or, alternatively, using a new name:

showVect : Show a => Vect n a -> String
showVect xs = "[" ++ showBody xs ++ "]"
  where
    showBody : Vect n' a -> String
    ...

Idris 1 took a different approach here: names which were parameters to data types were in scope, other names were not. The Idris 2 approach is, we hope, more consistent and easier to understand.

Function application syntax additions

From now on you can utilise the new syntax of function applications:

f {x1 [= e1], x2 [= e2], ...}

There are three additions here:

  1. More than one argument can be written in braces, separated with commas:

record Dog where
  constructor MkDog
  name : String
  age : Nat

-- Notice that `name` and `age` are explicit arguments.
-- See paragraph (2)
haveADog : Dog
haveADog = MkDog {name = "Max", age = 3}

pairOfStringAndNat : (String, Nat)
pairOfStringAndNat = MkPair {x = "year", y = 2020}

myPlus : (n : Nat) -> (k : Nat) -> Nat
myPlus {n = Z   , k} = k
myPlus {n = S n', k} = S (myPlus n' k)

twoPlusTwoIsFour : myPlus {n = 2, k = 2} === 4
twoPlusTwoIsFour = Refl

  2. Arguments in braces can now correspond to explicit, implicit and auto implicit dependent function types (Pi types), provided the domain type is named:

myPointlessFunction : (exp : String) -> {imp : String} -> {auto aut : String} -> String
myPointlessFunction exp = exp ++ imp ++ aut

callIt : String
callIt = myPointlessFunction {imp = "a ", exp = "Just ", aut = "test"}

Order of the arguments doesn’t matter as long as they are in braces and the names are distinct. It is better to stick named arguments in braces at the end of your argument list, because regular unnamed explicit arguments are processed first and take priority:

myPointlessFunction' : (a : String) -> String -> (c : String) -> String
myPointlessFunction' a b c = a ++ b ++ c

badCall : String
badCall = myPointlessFunction' {a = "a", c = "c"} "b"

This snippet won’t type check, because “b” in badCall is passed first, although logically we want it to be second. Idris will tell you that it couldn’t find a spot for a = "a" (because “b” took its place), so the application is ill-formed.

Thus if you want to use the new syntax, it is worth naming your Pi types.

  3. Multiple explicit arguments can be “skipped” more easily with the following syntax:

f {x1 [= e1], x2 [= e2], ..., xn [= en], _}

or

f {}

in case none of the named arguments are wanted.

Examples:

import Data.Nat

record Four a b c d where
  constructor MkFour
  x : a
  y : b
  z : c
  w : d

firstTwo : Four a b c d -> (a, b)
firstTwo $ MkFour {x, y, _} = (x, y)
-- firstTwo $ MkFour {x, y, z = _, w = _} = (x, y)

dontCare : (x : Nat) -> Nat -> Nat -> Nat -> (y : Nat) -> x + y = y + x
dontCare {} = plusCommutative {}
--dontCare _ _ _ _ _ = plusCommutative _ _

The last rule worth noting is the case of named applications with repeated argument names, e.g.:

data WeirdPair : Type -> Type -> Type where
  MkWeirdPair : (x : a) -> (x : b) -> WeirdPair a b

weirdSnd : WeirdPair a b -> b
--weirdSnd $ MkWeirdPair {x, x} = x
--                        ^
-- Error: "Non linear pattern variable"
-- But that one is okay:
weirdSnd $ MkWeirdPair {x = _, x} = x

In this example the name x is given repeatedly to the Pi types of the data constructor MkWeirdPair. In order to deconstruct the WeirdPair a b in weirdSnd, while writing the left-hand side of the pattern-matching clause in a named manner (via the new syntax), we have to rename the first occurrence of x to any fresh name or the _ as we did. Then the definition type checks normally.

In general, duplicate names are bound sequentially on the left-hand side and must be renamed for the pattern expression to be valid.

The situation is similar on the right-hand side of pattern-matching clauses:

0 TypeOf : a -> Type
TypeOf _ = a

weirdId : {0 a : Type} -> (1 a : a) -> TypeOf a
weirdId a = a

zero : Nat
-- zero = weirdId { a = Z }
--                      ^
-- Error: "Mismatch between: Nat and Type"
-- But this works:
zero = weirdId { a = Nat, a = Z }

Named arguments should be passed sequentially in the order they were defined in the Pi types, regardless of whether they are implicit or explicit.

Better inference

In Idris 1, holes (that is, unification variables arising from implicit arguments) were local to an expression, and if they were not resolved while checking the expression, they would not be resolved at all. In Idris 2, they are global, so inference works better. For example, we can now say:

test : Vect ? Int
test = [1,2,3,4]

Main> :t test
Main.test : Vect (S (S (S (S Z)))) Int

The ?, incidentally, differs from _ in that _ will be bound as an implicit argument if unresolved after checking the type of test, but ? will be left as a hole to be resolved later. Otherwise, they can be used interchangeably.

Dependent case

case blocks were available in Idris 1, but with some restrictions. Having better inference means that case blocks work more effectively in Idris 2, and dependent case analysis is supported.

append : Vect n a -> Vect m a -> Vect (n + m) a
append xs ys
    = case xs of
           [] => ys
           (x :: xs) => x :: append xs ys

The implicit arguments and original values are still available in the body of the case. Somewhat contrived, but the following is valid:

info : {n : _} -> Vect n a -> (Vect n a, Nat)
info xs
    = case xs of
           [] => (xs, n)
           (y :: ys) => (xs, n)

Record updates

Dependent record updates work, provided that all relevant fields are updated at the same time. Dependent record update is implemented via dependent case blocks rather than by generating a specific update function for each field as in Idris 1, so you will no longer get mystifying errors when trying to update dependent records!

For example, we can wrap a vector in a record, with an explicit length field:

record WrapVect a where
  constructor MkVect
  purpose : String
  length : Nat
  content : Vect length a

Then, we can safely update the content, provided we update the length correspondingly:

addEntry : String -> WrapVect String -> WrapVect String
addEntry val = { length $= S,
                 content $= (val :: ) }

Another novelty is the new update syntax (the previous one still works):

record Three a b c where
  constructor MkThree
  x : a
  y : b
  z : c

-- Yet another contrived example
mapSetMap : Three a b c -> (a -> a') -> b' -> (c -> c') -> Three a' b' c'
mapSetMap three@(MkThree x y z) f y' g = {x $= f, y := y', z $= g} three

The record keyword has been dropped for brevity, and the symbol := replaces = so as not to introduce any ambiguity.

Generate definition

A new feature of the IDE protocol supports generating complete definitions from a type signature. You can try this at the REPL, for example, given our favourite introductory example…

append : Vect n a -> Vect m a -> Vect (n + m) a

…assuming this is defined on line 3, you can use the :gd command as follows:

Main> :gd 3 append
append [] ys = ys
append (x :: xs) ys = x :: append xs ys

This works by a fairly simple brute force search, which tries searching for a valid right hand side, and case splitting on the left if that fails, but is remarkably effective in a lot of situations. Some other examples which work:

my_cong : forall f . (x : a) -> (y : a) -> x = y -> f x = f y
my_curry : ((a, b) -> c) -> a -> b -> c
my_uncurry : (a -> b -> c) -> (a, b) -> c
append : Vect n a -> Vect m a -> Vect (n + m) a
lappend : (1 xs : List a) -> (1 ys : List a) -> List a
zipWith : (a -> b -> c) -> Vect n a -> Vect n b -> Vect n c

This is available in the IDE protocol via the generate-def command.

Chez Scheme target

The default code generator is, for the moment, Chez Scheme. Racket and Gambit code generators are also available. Like Idris 1, Idris 2 supports plug-in code generation to allow you to write a back end for the platform of your choice. To change the code generator, you can use the :set cg command:

Main> :set cg racket

Early experience shows that both are much faster than the Idris 1 C code generator, in both compile time and execution time (but we haven’t done any formal study on this yet, so it’s just anecdotal evidence).

Type Driven Development with Idris: Updates Required

The code in the book Type-Driven Development with Idris by Edwin Brady, available from Manning, will mostly work in Idris 2, with some small changes as detailed in this document. The updated code is also [going to be] part of the test suite (see tests/typedd-book in the Idris 2 source).

If you are new to Idris, and learning from the book, we recommend working through the first 3-4 chapters with Idris 1, to avoid the need to worry about the changes described here. After that, refer to this document for any necessary changes.

Chapter 1

No changes necessary

Chapter 2

The Prelude is smaller than Idris 1, and many functions have been moved to the base libraries instead. So:

In Average.idr, add:

import Data.String -- for `words`
import Data.List -- for `length` on lists

In AveMain.idr and Reverse.idr add:

import System.REPL -- for 'repl'

Chapter 3

Unbound implicits have multiplicity 0, so we can’t match on them at run-time. Therefore, in Matrix.idr, we need to change the type of createEmpties and transposeMat so that the length of the inner vector is available to match on:

createEmpties : {n : _} -> Vect n (Vect 0 elem)
transposeMat : {n : _} -> Vect m (Vect n elem) -> Vect n (Vect m elem)
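The same rule can be seen in a much smaller setting. The following sketch uses two hypothetical functions, vlength and vhead, which are not part of the book's code: the first needs n at run time and so must bind it, while the second only mentions n in its type.

```idris
import Data.Vect

-- Without {n : _}, n would have multiplicity 0 and could not be returned
vlength : {n : _} -> Vect n a -> Nat
vlength _ = n

-- An erased index is still fine when it is only used in the type
vhead : Vect (S n) a -> a
vhead (x :: _) = x
```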

Chapter 4

For the reasons described above:

  • In DataStore.idr, add import System.REPL and import Data.String

  • In SumInputs.idr, add import System.REPL

  • In TryIndex.idr, add an implicit argument:

tryIndex : {n : _} -> Integer -> Vect n a -> Maybe a

  • In exercise 5 of 4.2, add an implicit argument:

sumEntries : Num a => {n : _} -> (pos : Integer) -> Vect n a -> Vect n a -> Maybe a

Chapter 5

There is no longer a Cast instance from String to Nat, because its behaviour of returning Z if the String wasn’t numeric was thought to be confusing and potentially error prone. Instead, there is stringToNatOrZ in Data.String which at least has a clearer name. So:

In Loops.idr and ReadNum.idr add import Data.String and change cast to stringToNatOrZ

In ReadNum.idr, since functions must now be covering by default, add a partial annotation to readNumber_v2.
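Both changes can be sketched in isolation. The names readCount and firstOrCrash below are hypothetical and only serve as illustration; stringToNatOrZ is the function from Data.String mentioned above.

```idris
import Data.String

-- Non-numeric input falls back to Z, as the name of the function says
readCount : String -> Nat
readCount input = stringToNatOrZ input

-- Not covering: there is no clause for the empty list,
-- so the definition has to be marked partial explicitly
partial
firstOrCrash : List a -> a
firstOrCrash (x :: _) = x
```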

Chapter 6

In DataStore.idr and DataStoreHoles.idr, add import Data.String and import System.REPL. Also in DataStore.idr, the schema argument to display is required for matching, so change the type to:

display : {schema : _} -> SchemaType schema -> String

In TypeFuns.idr add import Data.String

Chapter 7

Abs is now a separate interface from Neg. So, change the type of eval to include Abs specifically:

eval : (Abs num, Neg num, Integral num) => Expr num -> num

Also, take abs out of the Neg implementation for Expr and add an implementation of Abs as follows:

Abs ty => Abs (Expr ty) where
    abs = Abs

Chapter 8

In AppendVec.idr, add import Data.Nat for the Nat proofs

cong now takes an explicit argument for the function to apply. So, in CheckEqMaybe.idr change the last case to:

checkEqNat (S k) (S j) = case checkEqNat k j of
                              Nothing => Nothing
                              Just prf => Just (cong S prf)

A similar change is necessary in CheckEqDec.idr.
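Outside the context of the book, the explicit function argument to cong looks like this. This is a minimal sketch; succCong and justCong are hypothetical names.

```idris
-- The function being applied to both sides is now given explicitly
succCong : (k, j : Nat) -> k = j -> S k = S j
succCong k j prf = cong S prf

-- The same works for any function, for example a data constructor
justCong : (x, y : a) -> x = y -> Just x = Just y
justCong x y prf = cong Just prf
```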

In ExactLength.idr, the m argument to exactLength is needed at run time, so change its type to:

exactLength : {m : _} ->
              (len : Nat) -> (input : Vect m a) -> Maybe (Vect len a)

A similar change is necessary in ExactLengthDec.idr. Also, DecEq is no longer part of the prelude, so add import Decidable.Equality.

In ReverseVec.idr, add import Data.Nat for the Nat proofs.

In Void.idr, since functions must now be covering by default, add a partial annotation to nohead and its helper function getHead.

In Exercise 2 of 8.2.5, the definition of reverse' should be changed to reverse' : Vect k a -> Vect m a -> Vect (k + m) a, because the n in reverse' is otherwise bound to the same value as the n in the signature of myReverse.

Chapter 9

  • In ElemType.idr, add import Decidable.Equality

  • In Elem.idr, add import Data.Vect.Elem

In Hangman.idr:

  • Add import Data.String, import Data.Vect.Elem and import Decidable.Equality

  • removeElem pattern matches on n, so it needs to be written in its type:

removeElem : {n : _} ->
             (value : a) -> (xs : Vect (S n) a) ->
             {auto prf : Elem value xs} ->
             Vect n a

  • letters is used by processGuess, because it’s passed to removeElem:

processGuess : {letters : _} ->
               (letter : Char) -> WordState (S guesses) (S letters) ->
               Either (WordState guesses (S letters))
                      (WordState (S guesses) letters)

  • guesses and letters are implicit arguments to game, but are used by the definition, so add them to its type:

game : {guesses : _} -> {letters : _} ->
       WordState (S guesses) (S letters) -> IO Finished

In RemoveElem.idr:

  • Add import Data.Vect.Elem

  • removeElem needs to be updated as above.

Chapter 10

Lots of changes necessary here, at least when constructing views, due to Idris 2 having a better (that is, more precise and correct!) implementation of unification, and the rules for recursive with application being tightened up.

In MergeSort.idr, add import Data.List

In MergeSortView.idr, add import Data.List, and make the arguments to the views explicit:

mergeSort : Ord a => List a -> List a
mergeSort input with (splitRec input)
  mergeSort [] | SplitRecNil = []
  mergeSort [x] | SplitRecOne x = [x]
  mergeSort (lefts ++ rights) | (SplitRecPair lefts rights lrec rrec)
       = merge (mergeSort lefts | lrec)
               (mergeSort rights | rrec)

In problem 1 of exercise 10-1, the rest argument of the data constructor Exact of TakeN must be bound explicitly in its type:

data TakeN : List a -> Type where
  Fewer : TakeN xs
  Exact : (n_xs : List a) -> {rest : _} -> TakeN (n_xs ++ rest)

In SnocList.idr, in my_reverse, the link between Snoc rec and xs ++ [x] needs to be made explicit. Idris 1 would happily decide that xs and x were the relevant implicit arguments to Snoc but this was little more than a guess based on what would make it type check, whereas Idris 2 is more precise in what it allows to unify. So, x and xs need to be explicit arguments to Snoc:

data SnocList : List a -> Type where
     Empty : SnocList []
     Snoc : (x, xs : _) -> (rec : SnocList xs) -> SnocList (xs ++ [x])

Correspondingly, they need to be explicit when matching. For example:

my_reverse : List a -> List a
my_reverse input with (snocList input)
  my_reverse [] | Empty = []
  my_reverse (xs ++ [x]) | (Snoc x xs rec) = x :: my_reverse xs | rec

Similar changes are necessary in snocListHelp and my_reverse_help. See tests/typedd-book/chapter10/SnocList.idr for the full details.

Also, in snocListHelp, input is used at run time so needs to be bound in the type:

snocListHelp : {input : _} ->
               (snoc : SnocList input) -> (rest : List a) -> SnocList (input ++ rest)

It’s no longer necessary to give {input} explicitly in the patterns for snocListHelp, although it’s harmless to do so.

In IsSuffix.idr, the matching has to be written slightly differently. The recursive with application in Idris 1 probably shouldn’t have allowed this! Note that the Snoc - Snoc case has to be written first otherwise Idris generates a case tree splitting on input1 and input2 instead of the SnocList objects and this leads to a lot of cases being detected as missing.

isSuffix : Eq a => List a -> List a -> Bool
isSuffix input1 input2 with (snocList input1, snocList input2)
  isSuffix _ _ | (Snoc x xs xsrec, Snoc y ys ysrec)
     = (x == y) && (isSuffix _ _ | (xsrec, ysrec))
  isSuffix _ _ | (Empty, s) = True
  isSuffix _ _ | (s, Empty) = False

This doesn’t yet get past the totality checker, however, because it doesn’t know about looking inside pairs.

For the VList view in exercise 4 after Chapter 10-2, import Data.List.Views.Extra from the contrib library.

In DataStore.idr: Well this is embarrassing - I’ve no idea how Idris 1 lets this through! I think perhaps it’s too “helpful” when solving unification problems. To fix it, add an extra parameter schema to StoreView, and change the type of SNil to be explicit that the empty is the function defined in DataStore. Also add entry and store as explicit arguments to SAdd:

data StoreView : (schema : _) -> DataStore schema -> Type where
     SNil : StoreView schema DataStore.empty
     SAdd : (entry, store : _) -> (rec : StoreView schema store) ->
            StoreView schema (addToStore entry store)

Since size is an explicit argument in the DataStore record, it also needs to be relevant in the type of storeViewHelp:

storeViewHelp : {size : _} ->
                (items : Vect size (SchemaType schema)) ->
                StoreView schema (MkData size items)

In TestStore.idr:

  • In listItems, empty needs to be DataStore.empty to be explicit that you mean the function

  • In filterKeys, there is an error in the SNil case, which wasn’t caught because of the type of SNil above. It should be:

filterKeys test DataStore.empty | SNil = []

Chapter 11

In Streams.idr add import Data.Stream for iterate.

In Arith.idr and ArithTotal.idr, the Divides view now has explicit arguments for the dividend and remainder, so they need to be explicit in bound:

bound : Int -> Int
bound x with (divides x 12)
  bound ((12 * div) + rem) | (DivBy div rem prf) = rem + 1

In addition, import Data.Bits has to be added for shiftR, which now uses a safer type for the number of shifts:

randoms : Int -> Stream Int
randoms seed = let seed' = 1664525 * seed + 1013904223 in
                   (seed' `shiftR` 2) :: randoms seed'

In ArithCmd.idr, update DivBy, randoms, and import Data.Bits as above, and add import Data.String for String.toLower. Also, since export rules are per-namespace now, rather than per-file, you need to export (>>=) from the namespaces CommandDo and ConsoleDo.

In ArithCmdDo.idr, since (>>=) is export, Command and ConsoleIO also have to be export. Also, update randoms and import Data.Bits as above.

In StreamFail.idr, add a partial annotation to labelWith.

In order to support do notation for custom types (like RunIO), you need to implement (>>=) for binding values in a do block and (>>) for sequencing computations without binding values. See tests for complete implementations.

For instance, the following do block is desugared to foo >>= (\x => bar >>= (\y => baz x y)):

do
  x <- foo
  y <- bar
  baz x y

while the following is converted to foo >> bar >> baz:

do
  foo
  bar
  baz
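To make the desugaring concrete, here is a minimal sketch of a custom type with its own do notation. The type Cmd, its constructors, the namespace CmdDo and the program greet are all hypothetical names chosen for this example; they are not taken from the book or from the test suite.

```idris
-- Hypothetical command type, for illustration only
data Cmd : Type -> Type where
  Say  : String -> Cmd ()
  Ask  : Cmd String
  Bind : Cmd a -> (a -> Cmd b) -> Cmd b

namespace CmdDo
  -- Used for lines of the form x <- action
  export
  (>>=) : Cmd a -> (a -> Cmd b) -> Cmd b
  (>>=) = Bind

  -- Used for lines that do not bind a value
  export
  (>>) : Cmd () -> Cmd b -> Cmd b
  (>>) first next = Bind first (\_ => next)

-- The first line is sequenced with (>>), the second is bound with (>>=)
greet : Cmd ()
greet = do Say "Name?"
           name <- Ask
           Say ("Hello, " ++ name)
```

Under the rules above, greet should desugar to Say "Name?" >> (Ask >>= (\name => Say ("Hello, " ++ name))).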

Chapter 12

For reasons described above: In ArithState.idr, add import Data.String and import Data.Bits and update randoms. Also the (>>=) operators need to be set as export since they are in their own namespaces, and in getRandom, DivBy needs to take additional arguments div and rem.

In ArithState.idr, since (>>=) is export, Command and ConsoleIO also have to be export.

evalState from Control.Monad.State now takes the stateType argument first.

Chapter 13

In StackIO.idr:

  • tryAdd pattern matches on height, so it needs to be written in its type:

tryAdd : {height : _} -> StackIO height

  • height is also an implicit argument to stackCalc, but is used by the definition, so add it to its type:

stackCalc : {height : _} -> StackIO height

  • In StackDo namespace, export (>>=):

namespace StackDo
  export
  (>>=) : StackCmd a height1 height2 ->
          (a -> Inf (StackIO height2)) -> StackIO height1
  (>>=) = Do

In Vending.idr:

  • Add import Data.String and change cast to stringToNatOrZ in strToInput

  • In MachineCmd type, add an implicit argument to (>>=) data constructor:

(>>=) : {state2 : _} ->
        MachineCmd a state1 state2 ->
        (a -> MachineCmd b state2 state3) ->
        MachineCmd b state1 state3

  • In MachineIO type, add an implicit argument to Do data constructor:

data MachineIO : VendState -> Type where
  Do : {state1 : _} ->
       MachineCmd a state1 state2 ->
       (a -> Inf (MachineIO state2)) -> MachineIO state1

  • runMachine pattern matches on inState, so it needs to be written in its type:

runMachine : {inState : _} -> MachineCmd ty inState outState -> IO ty

  • In MachineDo namespace, add an implicit argument to (>>=) and export it:

namespace MachineDo
  export
  (>>=) : {state1 : _} ->
          MachineCmd a state1 state2 ->
          (a -> Inf (MachineIO state2)) -> MachineIO state1
  (>>=) = Do

  • vend and refill pattern match on pounds and chocs, so they need to be written in their type:

vend : {pounds : _} -> {chocs : _} -> MachineIO (pounds, chocs)
refill : {pounds : _} -> {chocs : _} -> (num : Nat) -> MachineIO (pounds, chocs)

  • pounds and chocs are implicit arguments to machineLoop, but are used by the definition, so add them to its type:

machineLoop : {pounds : _} -> {chocs : _} -> MachineIO (pounds, chocs)

Chapter 14

In ATM.idr:

  • Add import Data.String and change cast to stringToNatOrZ in runATM

In Hangman.idr, add:

import Data.Vect.Elem -- `Elem` now has its own submodule
import Data.String -- for `toUpper`
import Data.List -- for `nub`

  • In Loop namespace, export GameLoop type and its data constructors:

namespace Loop
  public export
  data GameLoop : (ty : Type) -> GameState -> (ty -> GameState) -> Type where
    (>>=) : GameCmd a state1 state2_fn ->
            ((res : a) -> Inf (GameLoop b (state2_fn res) state3_fn)) ->
            GameLoop b state1 state3_fn
    Exit : GameLoop () NotRunning (const NotRunning)

  • letters and guesses are used by gameLoop, so they need to be written in its type:

gameLoop : {letters : _} -> {guesses : _} ->
           GameLoop () (Running (S guesses) (S letters)) (const NotRunning)

  • In Game type, add an implicit argument letters to InProgress data constructor:

data Game : GameState -> Type where
  GameStart : Game NotRunning
  GameWon : (word : String) -> Game NotRunning
  GameLost : (word : String) -> Game NotRunning
  InProgress : {letters : _} -> (word : String) -> (guesses : Nat) ->
               (missing : Vect letters Char) -> Game (Running guesses letters)

  • removeElem pattern matches on n, so it needs to be written in its type:

removeElem : {n : _} ->
             (value : a) -> (xs : Vect (S n) a) ->
             {auto prf : Elem value xs} ->
             Vect n a

Chapter 15

Packages

Idris includes a system for building packages from a package description file. These files can be used with the Idris compiler to manage the development process of your Idris programs and packages.

Package Descriptions

A package description includes the following:

  • A header, consisting of the keyword package followed by the package name. Package names can be any valid Idris identifier. The iPKG format also takes a quoted version that accepts any valid filename.

  • Fields describing package contents, <field> = <value>

Packages can describe libraries, executables, or both, and should include a version number. For library packages, one field must be the modules field, where the value is a comma separated list of modules to be installed. For example, a library test which has two modules Foo.idr and Bar.idr as source files would be written as follows:

package test
version = 0.0.1

modules = Foo, Bar

When installed, this will be in a directory test-0.0.1. If the version number is missing, it will default to 0.

Other examples of package files can be found in the libs directory of the main Idris repository, and in third-party libraries.

Metadata

The iPKG format supports additional metadata associated with the package. The added fields are:

  • brief = "<text>", a string literal containing a brief description of the package.

  • version = <version number>, a semantic version number, which must be in the form of integers separated by dots (e.g. 1.0.0, 0.3.0, 3.1.4 etc)

  • langversion <version constraints>, see depends below for a list of allowable constraints. For example, langversion >= 0.5.1 && < 1.0.0

  • readme = "<file>", location of the README file.

  • license = "<text>", a string description of the licensing information.

  • authors = "<text>", the author information.

  • maintainers = "<text>", the maintainer information.

  • homepage = "<url>", the website associated with the package.

  • sourceloc = "<url>", the location of the DVCS where the source can be found.

  • bugtracker = "<url>", the location of the project’s bug tracker.

Directories

  • sourcedir = "<dir>", the directory to look for Idris source files.

  • builddir = "<dir>", the directory to put the checked modules and the artefacts from the code generator.

  • outputdir = "<dir>", the directory where the code generator should output the executable.

Common Fields

Other common fields which may be present in an ipkg file are:

  • executable = <output>, which takes the name of the executable file to generate. Executable names can be any valid Idris identifier. The iPKG format also takes a quoted version that accepts any valid filename.

    Executables are placed in build/exec by default. The location can be changed by specifying the outputdir field.

  • main = <module>, which takes the name of the main module, and must be present if the executable field is present.

  • opts = "<idris options>", which allows options to be passed to Idris.

  • depends = <pkg description> (',' <pkg description>)+, a comma separated list of package names that the Idris package requires. The pkg_description is the package name, followed by an optional list of version constraints. Version constraints are separated by && and can use operators <, <=, >, >=, ==. For example, the following are valid package descriptions:

    • contrib (no constraints)

    • contrib == 0.3.0 (an exact version constraint)

    • contrib >= 0.3.0 (an inclusive lower bound)

    • contrib >= 0.3.0 && < 0.4 (an inclusive lower bound, and exclusive upper bound)

Comments

Package files support comments using the standard Idris single-line (--) and multi-line ({- -}) formats.

Using Package files

Given an Idris package file test.ipkg it can be used with the Idris compiler as follows:

  • idris2 --build test.ipkg will build all modules in the package

  • idris2 --install test.ipkg will install the package to the global Idris library directory (that is $IDRIS2_PREFIX/idris-<version>/), making the modules in its modules field accessible by other Idris libraries and programs. Note that this doesn’t install any executables, just library modules.

  • idris2 --clean test.ipkg will clean the intermediate build files.

  • idris2 --mkdoc test.ipkg will generate HTML documentation for the package, output to build/docs

Once the test package has been installed, the command line option --package test makes it accessible (abbreviated to -p test). For example:

idris2 -p test Main.idr

Where does Idris look for packages?

Compiled packages are directories with compiled TTC files (see the Build artefacts section). The directory structure of the source *.idr files is preserved for TTC files.

Compiled packages can be installed globally (under $IDRIS2_PREFIX/idris-<version>/ as described above) or locally (under a depends subdirectory in the top level working directory of a project). Packages specified using -p pkgname or with the depends field of a package will then be located as follows:

  • First, Idris looks in depends/pkgname-<version>, for a package which satisfies the version constraint.

  • If no package is found locally, Idris looks in $IDRIS2_PREFIX/idris-<version>/pkgname-<version>.

In each case, if more than one version satisfies the constraint, it will choose the one with the highest version number. If package versions are omitted in directory names, they are treated as the version 0.

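Where to find libraries

You can find a list of Idris libraries on the wiki on GitHub: https://github.com/idris-lang/Idris2/wiki/1-%5BLanguage%5D-Libraries

Please feel free to contribute your own libraries there! Eventually, we aim to have a package manager to manage libraries and dependencies. We do not have an official one yet, but there are (at least) two in development.

Structuring Idris 2 Applications

A tutorial on structuring Idris 2 applications using Control.App.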

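Idris applications have main : IO () as an entry point, and the type IO a is a description of interactive actions which produce a value of type a. This is fine for primitives, but IO does not support exceptions, so we have to be explicit about how an operation handles failure. Also, if we do want to support exceptions, we need to explain how exceptions and linearity (see the section Multiplicities) interact.

In this tutorial, we describe a parameterised type App and a related parameterised type App1, which together allow us to structure larger applications, taking into account both exceptions and linearity. The aims of App and App1 are that they should:

  • make it possible to express what interactions a function does, in its type, without too much notational overhead.

  • have little or no performance overhead compared to writing in IO.

  • be compatible with other libraries and techniques for describing effects, such as algebraic effects or monad transformers.

  • be sufficiently easy to use and performant that they can be the basis of all libraries that make foreign function calls, much as IO is in Idris 1 and Haskell.

  • be compatible with linear types, meaning that they should express whether a section of code is linear (guaranteed to execute exactly once without throwing an exception) or whether it might throw an exception.

We start by introducing App, with some small example programs, then show how to extend it with exceptions, state, and other interfaces.

Introducing App

App is declared in the module Control.App, which is part of the base libraries. It is parameterised by an implicit Path (which states whether the program’s execution path is linear or might throw exceptions), whose default value says that the program might throw, and a List Error (which is a list of exception types which can be thrown; Error is a synonym for Type):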

data App : {default MayThrow l : Path} ->
           (es : List Error) -> Type -> Type

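It serves the same purpose as IO, but supports throwing and catching exceptions, and allows us to define more constrained interfaces parameterised by the list of errors es. For example, a program which supports console IO: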

hello : Console es => App es ()
hello = putStrLn "Hello, App world!"

We can use this in a complete program as follows:

module Main

import Control.App
import Control.App.Console

hello : Console es => App es ()
hello = putStrLn "Hello, App world!"

main : IO ()
main = run hello

Or, a program which supports console IO and carries an Int state, labelled Counter:

data Counter : Type where

helloCount : (Console es, State Counter Int es) => App es ()
helloCount = do c <- get Counter
                put Counter (c + 1)
                putStrLn "Hello, counting world"
                c <- get Counter
                putStrLn ("Counter " ++ show c)

To run this as part of a complete program, we need to initialise the state:

main : IO ()
main = run (new 93 helloCount)
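Putting the fragments above together, a complete program might look as follows. This is a sketch assembled from the pieces in this section rather than a file from the distribution.

```idris
module Main

import Control.App
import Control.App.Console

-- Empty type, used only as a tag to name the state
data Counter : Type where

helloCount : (Console es, State Counter Int es) => App es ()
helloCount = do c <- get Counter
                put Counter (c + 1)
                putStrLn "Hello, counting world"
                c <- get Counter
                putStrLn ("Counter " ++ show c)

-- new introduces the Counter state with initial value 93
main : IO ()
main = run (new 93 helloCount)
```

Since the state starts at 93 and is incremented once before being read back, this should print the greeting followed by Counter 94.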

For convenience, we can list multiple interfaces in one go, using a function Has, defined in Control.App, to compute the interface constraints:

helloCount : Has [Console, State Counter Int] es => App es ()

0 Has : List (a -> Type) -> a -> Type
Has [] es = ()
Has (e :: es') es = (e es, Has es' es)

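The purpose of Path is to state whether a program can throw exceptions, so that we can know where it is safe to reference linear resources. It is declared as follows: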

data Path = MayThrow | NoThrow

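The type of App states that MayThrow is the default. We expect this to be the most common case. After all, realistically, most operations have possible failure modes, especially those which interact with the outside world.

The 0 on the declaration of Has indicates that it can only be run in an erased context, so it will never be run at run time. To run an App inside IO, we use an initial list of errors Init (recall that Error is a synonym for Type):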

Init : List Error
Init = [AppHasIO]

run : App {l} Init a -> IO a

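Generalising the Path parameter with l means that we can invoke run for any application, whether the Path is NoThrow or MayThrow. In practice, however, no application given to run will throw at the top level, because the only exception type available is AppHasIO. Any exceptions will have been introduced and handled inside the App.

Exceptions and State

Control.App is primarily intended to make it easier to manage the common cases of applications with exceptions and state. We can throw and catch exceptions listed in the list of errors (the es parameter to App) and introduce new global state.

Exceptions

The List Error is a list of error types, usable via the Exception interface defined in Control.App: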

interface Exception err e where
  throw : err -> App e a
  catch : App e a -> (err -> App e a) -> App e a

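We can use throw and catch for some exception type err as long as the exception type exists in the list of errors. This is checked with the HasErr predicate, also defined in Control.App: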

data HasErr : Error -> List Error -> Type where
     Here : HasErr e (e :: es)
     There : HasErr e es -> HasErr e (e' :: es)

HasErr err es => Exception err es where ...

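Note the HasErr constraint on Exception: this is one place where it is notationally convenient that the auto implicit mechanism and the interface resolution mechanism are identical in Idris 2. Finally, we can introduce new exception types via handle, which runs a block of code which might throw, handling any exceptions: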

handle : App (err :: e) a ->
         (onok : a -> App e b) ->
         (onerr : err -> App e b) -> App e b
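As a small illustration of throw and handle working together, consider the following sketch. The exception type DivError and the functions safeDiv and divMain are hypothetical names introduced for this example only; throw, handle, run and putStrLn are the library functions described in this tutorial.

```idris
module Main

import Control.App
import Control.App.Console

-- Hypothetical exception type, for illustration only
data DivError = DivByZero

-- Can only be used where DivError is in the list of errors
safeDiv : Exception DivError es => Int -> Int -> App es Int
safeDiv _ 0 = throw DivByZero
safeDiv x y = pure (x `div` y)

-- handle adds DivError to the list of errors for the block it runs,
-- and removes it again by dealing with both outcomes
divMain : App Init ()
divMain = handle (safeDiv 10 0)
            (\res => putStrLn ("Result: " ++ show res))
            (\err : DivError => putStrLn "Division by zero")

main : IO ()
main = run divMain
```

Because divMain has type App Init (), the type guarantees that DivError cannot escape to the top level: it has to be handled before run is called.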

Adding State

Applications will typically need to keep track of state, and we support this primitively in App using a State type, defined in Control.App:

data State : (tag : a) -> Type -> List Error -> Type

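The tag is used purely to distinguish between different states, and is not required at run time, as explicitly stated in the types of get and put, which are used to access and update a state: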

get : (0 tag : _) -> State tag t e => App {l} e t
put : (0 tag : _) -> State tag t e => (1 val : t) -> App {l} e ()

These use an auto-implicit to pass around a State with the relevant tag implicitly, so we refer to states by tag alone. In helloCount earlier, we used an empty type Counter as the tag:

data Counter : Type where -- complete definition

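The list of errors e is used to ensure that states are only usable in the list of errors in which they are introduced. States are introduced using new: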

new : t -> (1 p : State tag t e => App {l} e a) -> App {l} e a

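Note that the type tells us new runs the program with the state exactly once. Rather than using State and Exception directly, however, we typically use interfaces to constrain the operations allowed in the list of errors. Internally, State is implemented via an IORef, primarily for performance reasons.

Defining Interfaces

The only way provided by Control.App to run an App is via the run function, which takes a concrete list of errors Init. All concrete extensions to this list of errors are via either handle, to introduce a new exception, or new, to introduce a new state. In order to compose App programs effectively, rather than introducing concrete exceptions and state in general, we define interfaces for collections of operations which work in a specific list of errors.

Example: Console I/O

We have seen an initial example using the Console interface, which is declared as follows, in Control.App.Console: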

interface Console e where
  putChar : Char -> App {l} e ()
  putStr : String -> App {l} e ()
  getChar : App {l} e Char
  getLine : App {l} e String

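It provides primitives for writing to and reading from the console, and generalising the path parameter to l means that none of them can throw an exception, because they have to work in both the NoThrow and MayThrow contexts.

To implement this for use in a top level IO program, we need access to primitive IO operations. The Control.App library defines a primitive interface for this: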

interface PrimIO e where
  primIO : IO a -> App {l} e a
  fork : (forall e' . PrimIO e' => App {l} e' ()) -> App e ()

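We use primIO to invoke an IO function. We also have a fork primitive, which starts a new thread in a new list of errors supporting PrimIO. Note that fork starts a new list of errors e' so that states are only available in a single thread.

There is an implementation of PrimIO for a list of errors which can throw the empty type as an exception. This means that if PrimIO is the only interface available, we cannot throw an exception, which is consistent with the definition of IO. This also allows us to use PrimIO in the initial list of errors Init.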

HasErr AppHasIO e => PrimIO e where ...

Given this, we can implement Console and run our hello program in IO. It is implemented as follows in Control.App.Console:

PrimIO e => Console e where
  putChar c = primIO $ putChar c
  putStr str = primIO $ putStr str
  getChar = primIO getChar
  getLine = primIO getLine

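Example: File I/O

Console I/O can be implemented directly, but most I/O operations can fail. For example, opening a file can fail for several reasons: the file does not exist, the user has the wrong permissions, and so on. In Idris, the IO primitive reflects this in its type: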

openFile : String -> Mode -> IO (Either FileError File)

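While precise, this becomes unwieldy when there are long sequences of IO operations. Using App, we can provide an interface which throws an exception when an operation fails, and guarantee that any exceptions are handled at the top level using handle. We begin by defining the FileIO interface, in Control.App.FileIO: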

interface Has [Exception IOError] e => FileIO e where
  withFile : String -> Mode ->
             (onError : IOError -> App e a) ->
             (onOpen : File -> App e a) ->
             App e a
  fGetStr : File -> App e String
  fGetChars : File -> Int -> App e String
  fGetChar : File -> App e Char
  fPutStr : File -> String -> App e ()
  fPutStrLn : File -> String -> App e ()
  fflush : File -> App e ()
  fEOF : File -> App e Bool

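We use resource bracketing - passing a function to withFile for working with the opened file - rather than an explicit open operation, to ensure that the file handle is cleaned up on completion.

One could also imagine an interface using a linear resource for the file, which might be appropriate in some safety critical contexts, but for most programming tasks, exceptions should suffice. All of the operations can fail, and the interface makes this explicit by saying we can only implement FileIO if the list of errors supports throwing and catching the IOError exception. IOError is defined in Control.App.

For example, we can use this interface to implement readFile, throwing an exception if opening the file fails in withFile: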

readFile : FileIO e => String -> App e String
readFile f = withFile f Read throw $ \h =>
               do content <- read [] h
                  pure (concat content)
  where
    read : List String -> File -> App e (List String)
    read acc h = do eof <- fEOF h
                    if eof then pure (reverse acc)
                           else do str <- fGetStr h
                                   read (str :: acc) h

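Again, this is defined in Control.App.FileIO.

To implement FileIO, we need access to the primitive operations via PrimIO, and the ability to throw exceptions if any of the operations fail. With this, we can implement withFile as follows, for example: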

Has [PrimIO, Exception IOError] e => FileIO e where
  withFile fname m onError proc
      = do Right h <- primIO $ openFile fname m
              | Left err => onError (FileErr (toFileEx err))
           res <- catch (proc h) onError
           primIO $ closeFile h
           pure res
  ...

Given this implementation of FileIO, we can run readFile, provided that we wrap it in a top level handle function to deal with any errors thrown by readFile:

readMain : String -> App Init ()
readMain fname = handle (readFile fname)
       (\str => putStrLn $ "Success:\n" ++ show str)
       (\err : IOError => putStrLn $ "Error: " ++ show err)

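Linear Resources

We have introduced App for writing interactive programs, using interfaces to constrain which operations are permitted, but have not yet seen the Path parameter in action. Its purpose is to constrain when programs can throw exceptions, so that we know where linear resource usage is allowed. The bind operator for App is defined as follows (not via Monad):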

data SafeBind : Path -> (l' : Path) -> Type where
     SafeSame : SafeBind l l
     SafeToThrow : SafeBind NoThrow MayThrow

(>>=) : SafeBind l l' =>
        App {l} e a -> (a -> App {l=l'} e b) -> App {l=l'} e b

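The intuition behind this type is that, when sequencing two App programs:

  • If the first action might throw an exception, then the whole program might throw.

  • If the first action cannot throw an exception, the second action can still throw, and the program as a whole can throw.

  • If neither action can throw an exception, the program as a whole cannot throw.

The reason for the detail in the type is that it is useful to be able to sequence programs with a different Path, but in doing so, we must calculate the resulting Path accurately. Then, if we want to sequence subprograms with linear variables, we can use an alternative bind operator which guarantees to run the continuation exactly once: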

bindL : App {l=NoThrow} e a ->
        (1 k : a -> App {l} e b) -> App {l} e b

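To illustrate the need for bindL, we can try writing a program which tracks the state of a secure data store, which requires logging in before reading the data.

Example: a data store requiring a login

Many software components rely on some form of state, and there may be operations which are only valid in specific states. For example, consider a secure data store in which a user must log in before getting access to some secret data. This system can be in one of two states:

  • LoggedIn, in which the user is allowed to read the secret

  • LoggedOut, in which the user has no access to the secret

We can provide commands to log in, log out, and read the data, as illustrated in the following diagram:

[Figure: login]

The login command, if it succeeds, moves the overall system state from LoggedOut to LoggedIn. The logout command moves the state from LoggedIn to LoggedOut. Most importantly, the readSecret command is only valid when the system is in the LoggedIn state.

We can represent the state transitions using functions with linear types. To begin, we define an interface for connecting to and disconnecting from a store: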

interface StoreI e where
    connect : (1 prog : (1 d : Store LoggedOut) ->
              App {l} e ()) -> App {l} e ()
    disconnect : (1 d : Store LoggedOut) -> App {l} e ()

Neither connect nor disconnect throws, as shown by generalising over l. Once we have a connection, we can use the following functions to access the resource directly:

data Res : (a : Type) -> (a -> Type) -> Type where
     (#) : (val : a) -> (1 resource : r val) -> Res a r

login : (1 s : Store LoggedOut) -> (password : String) ->
        Res Bool (\ok => Store (if ok then LoggedIn else LoggedOut))
logout : (1 s : Store LoggedIn) -> Store LoggedOut
readSecret : (1 s : Store LoggedIn) ->
             Res String (const (Store LoggedIn))

Res is defined in the Prelude, since it is commonly useful. It is a dependent pair type, which associates a value with a linear resource. We’ll leave the other definitions abstract, for the purposes of this introductory example.

The following listing shows a complete program accessing the store, which reads a password, accesses the store if the password is correct and prints the secret data. It uses let (>>=) = bindL to redefine do-notation locally.

storeProg : Has [Console, StoreI] e => App e ()
storeProg = let (>>=) = bindL in
      do putStr "Password: "
         password <- getLine
         connect $ \s =>
           do let True # s = login s password
                | False # s => do putStrLn "Wrong password"
                                  disconnect s
              let str # s = readSecret s
              putStrLn $ "Secret: " ++ show str
              let s = logout s
              disconnect s

If we omit the let (>>=) = bindL, it will use the default (>>=) operator, which allows the continuation to be run multiple times, which would mean that s is not guaranteed to be accessed linearly, and storeProg would not type check. We can safely use getLine and putStr because they are guaranteed not to throw by the Path parameter in their types.

App1: Linear Interfaces

Adding the bindL function to allow locally rebinding the (>>=) operator allows us to combine existing linear resource programs with operations in App - at least, those that don’t throw. It would nevertheless be nice to interoperate more directly with App. One advantage of defining interfaces is that we can provide multiple implementations for different contexts, but our implementation of the data store uses primitive functions (which we left undefined in any case) to access the store.

To allow control over linear resources, Control.App provides an alternative parameterised type App1:

data App1 : {default One u : Usage} ->
            (es : List Error) -> Type -> Type

There is no need for a Path argument, since linear programs can never throw. The Usage argument states whether the value returned is to be used once, or has unrestricted usage, with the default in App1 being to use once:

data Usage = One | Any

The main difference from App is the (>>=) operator, which has a different multiplicity for the variable bound by the continuation depending on the usage of the first action:

Cont1Type : Usage -> Type -> Usage -> List Error -> Type -> Type
Cont1Type One a u e b = (1 x : a) -> App1 {u} e b
Cont1Type Any a u e b = (x : a) -> App1 {u} e b

(>>=) : {u : _} -> (1 act : App1 {u} e a) ->
        (1 k : Cont1Type u a u' e b) -> App1 {u=u'} e b

Cont1Type returns a continuation which uses the argument linearly, if the first App1 program has usage One, otherwise it returns a continuation where argument usage is unrestricted. Either way, because there may be linear resources in scope, the continuation is run exactly once and there can be no exceptions thrown.

Using App1, we can define all of the data store operations in a single interface, as shown in the following listing. Each operation other than disconnect returns a linear resource.

interface StoreI e where
  connect : App1 e (Store LoggedOut)
  login : (1 d : Store LoggedOut) -> (password : String) ->
          App1 e (Res Bool (\ok => Store (if ok then LoggedIn
                                                else LoggedOut)))
  logout : (1 d : Store LoggedIn) -> App1 e (Store LoggedOut)
  readSecret : (1 d : Store LoggedIn) ->
               App1 e (Res String (const (Store LoggedIn)))
  disconnect : (1 d : Store LoggedOut) -> App {l} e ()

We can explicitly move between App and App1:

app : (1 p : App {l=NoThrow} e a) -> App1 {u=Any} e a
app1 : (1 p : App1 {u=Any} e a) -> App {l} e a

We can run an App program using app, inside App1, provided that it is guaranteed not to throw. Similarly, we can run an App1 program using app1, inside App, provided that the value it returns has unrestricted usage. So, for example, we can write:

storeProg : Has [Console, StoreI] e => App e ()
storeProg = app1 $ do
     store <- connect
     app $ putStr "Password: "
     ?what_next

This uses app1 to state that the body of the program is linear, then app to state that the putStr operation is in App. We can see that connect returns a linear resource by inspecting the hole what_next, which also shows that we are running inside App1:

 0 e : List Type
 1 store : Store LoggedOut
-------------------------------------
what_next : App1 e ()

For completeness, one way to implement the interface is as follows, with hard coded password and internal data:

Has [Console] e => StoreI e where
  connect
      = do app $ putStrLn "Connect"
           pure1 (MkStore "xyzzy")

  login (MkStore str) pwd
      = if pwd == "Mornington Crescent"
           then pure1 (True # MkStore str)
           else pure1 (False # MkStore str)
  logout (MkStore str) = pure1 (MkStore str)
  readSecret (MkStore str) = pure1 (str # MkStore str)

  disconnect (MkStore _)
      = putStrLn "Disconnect"

Then we can run it in main:

main : IO ()
main = run storeProg

Foreign Function Interface

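Idris 2 is designed to support multiple code generators. The default target is Chez Scheme, with Racket and Gambit code generators also supported. However, as with Idris 1, the intention is to support multiple targets on multiple platforms, including e.g. JavaScript, JVM, .NET, and others yet to be invented. This makes the design of a foreign function interface (FFI), which calls functions in other languages, a little challenging, since ideally it will support all possible targets!

To this end, the Idris 2 FFI aims to be flexible and adaptable, while still supporting the most common requirements without too much need for “glue” code in the foreign language.

FFI Overview

Foreign functions are declared with the %foreign directive, which takes the following general form: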

%foreign [specifiers]
name : t

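The specifier is an Idris String which says in which language the foreign function is written, what it is called, and where to find it. There may be more than one specifier, and a code generator is free to choose any specifier it understands - or even to ignore the specifiers completely and use its own approach. In general, a specifier has the form “Language:name,library”. For example, in C: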

%foreign "C:puts,libc"
puts : String -> PrimIO Int

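It is up to specific code generators to decide how to locate the functions and the library. In this document, we will assume the default Chez Scheme code generator (the examples also work with the Racket or Gambit code generators) and that the foreign language is C.

Scheme Sidenote

Scheme foreign specifiers can be written to target particular flavours.

The following example shows a foreign declaration that allocates memory in a way specific to the choice of code generator. In this example there is no general scheme specifier that matches every flavour, such as scheme:foo, so it will only match the specific flavours listed: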

%foreign "scheme,chez:foreign-alloc"
         "scheme,racket:malloc"
         "C:malloc,libc"
allocMem : (bytes : Int) -> PrimIO AnyPtr

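Note

If your backend (code generator) is not listed but defines a C FFI, it will be able to make use of the C:malloc,libc specifier.

C Sidenote

The C language specifier is used for common functions that may be used by any backend which can in turn FFI out to C. For example, Scheme.

Common C functions do not do automatic memory management, deferring this to the individual backends.

The standard C backend is known as “RefC”, and uses the RefC language specifier.

FFI Example

As a running example, we are going to work with a small C file. Save the following content to a file smallc.c: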

#include <stdio.h>

int add(int x, int y) {
    return x+y;
}

int addWithMessage(char* msg, int x, int y) {
    printf("%s: %d + %d = %d\n", msg, x, y, x+y);
    return x+y;
}

Then, compile it to a shared library with:

cc -shared smallc.c -o libsmall.so

We can now write an Idris program which calls each of these functions. First, we’ll write a small program which uses add to add two integers:

%foreign "C:add,libsmall"
add : Int -> Int -> Int

main : IO ()
main = printLn (add 70 24)

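The %foreign specifier says that add is written in C, with the name add in the library libsmall. As long as the run time is able to locate libsmall.so (in practice it looks in the current directory and the system library paths) we can run this at the REPL: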

Main> :exec main
94

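Note that it is the programmer’s responsibility to make sure that the Idris function and the C function have corresponding types. There is no way for the machine to check this! If you get it wrong, you will get unpredictable behaviour.

Since add has no side effects, we have given it a return type of Int. But what if the function has some effect on the outside world, like addWithMessage? In this case, we use PrimIO Int to say that it returns a primitive IO action: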

%foreign "C:addWithMessage,libsmall"
prim__addWithMessage : String -> Int -> Int -> PrimIO Int

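Internally, PrimIO Int is a function which takes the current (linear) state of the world, and returns an Int with an updated state of the world. In general, IO operations in an Idris program are defined as instances of the HasIO interface. We can convert a primitive operation to one usable in HasIO using primIO: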

primIO : HasIO io => PrimIO a -> io a

So, we can extend our program as follows:

addWithMessage : HasIO io => String -> Int -> Int -> io Int
addWithMessage s x y = primIO $ prim__addWithMessage s x y

main : IO ()
main
    = do printLn (add 70 24)
         _ <- addWithMessage "Sum" 70 24
         pure ()

It is up to the programmer to declare which functions are pure, and which have side effects, via PrimIO. Executing this gives:

Main> :exec main
94
Sum: 70 + 24 = 94
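
The same two-step pattern (a prim__ declaration returning PrimIO, then a HasIO wrapper built with primIO) applies to any side-effecting C function. As a minimal sketch, here it is for the puts binding shown in the overview. The names prim__puts and putsLine are our own, chosen for illustration, and we assume the backend can locate libc:

```idris
-- puts writes to standard output, so its binding returns a PrimIO.
%foreign "C:puts,libc"
prim__puts : String -> PrimIO Int

-- Lift the primitive operation into any HasIO context.
-- (putsLine is a hypothetical name, not part of any library.)
putsLine : HasIO io => String -> io Int
putsLine str = primIO $ prim__puts str

main : IO ()
main = do _ <- putsLine "Hello from C"  -- discard the Int result
          pure ()
```

Keeping the prim__ declaration separate from the wrapper means only the wrapper needs to be exported, and the unchecked foreign type appears in exactly one place.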

We have seen two specifiers for foreign functions:

%foreign "C:add,libsmall"
%foreign "C:addWithMessage,libsmall"

Both have the same form, "C:[name],libsmall", so instead of writing a concrete String each time, we can write a function which computes the specifier, and use that instead:

libsmall : String -> String
libsmall fn = "C:" ++ fn ++ ",libsmall"

%foreign (libsmall "add")
add : Int -> Int -> Int

%foreign (libsmall "addWithMessage")
prim__addWithMessage : String -> Int -> Int -> PrimIO Int

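Primitive FFI Types

The types which can be passed to and returned from foreign functions are restricted to those which it is reasonable to assume any back end can handle. In practice, this means most primitive types, and a limited selection of others. Argument types can be any of the following primitives: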

  • Int

  • Char

  • Double (as double in C)

  • Bits8

  • Bits16

  • Bits32

  • Bits64

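  • String (as char* in C)

  • Ptr t and AnyPtr (both as void* in C)

Return types can be any of the above, plus:

  • ()

  • PrimIO t, where t is a valid return type other than a PrimIO.

Handling String leads to some complications, for a number of reasons:

  • Strings can have multiple encodings. In the Idris run time, Strings are encoded as UTF-8, but C makes no assumptions.

  • It is not always clear who is responsible for freeing a String allocated by a C function.

  • In C, strings can be NULL, but Idris strings always have a value.

So, when passing String to and from C, remember the following:

  • A char* returned by a C function will be copied to the Idris heap, and the Idris run time immediately calls free on the returned char*.

  • If a char* might be NULL in C, use Ptr String rather than String.

When using Ptr String, the value will be passed as a void*, and is therefore not directly accessible from Idris code. This is to protect against accidentally trying to use NULL as a String. You can nevertheless work with such values and convert them to String via foreign functions of the following form: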

char* getString(void *p) {
    return (char*)p;
}

void* mkString(char* str) {
    return (void*)str;
}

int isNullString(void* str) {
    return str == NULL;
}

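For an example, see the section Example: Minimal Readline Bindings.

Additionally, foreign functions can take callbacks, and take and return C struct pointers.

Callbacks

It is often useful in C for a function to take a callback, that is, a function which is called after doing some work. For example, we can write a function which takes a callback that takes a char* and an int and returns a char*, in C, as follows (added to smallc.c above):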

typedef char*(*StringFn)(char*, int);

char* applyFn(char* x, int y, StringFn f) {
    printf("Applying callback to %s %d\n", x, y);
    return f(x, y);
}

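Then, we can access this from Idris by declaring it as a %foreign function and wrapping it in the HasIO interface, with the C function calling the Idris function as the callback: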

%foreign (libsmall "applyFn")
prim__applyFn : String -> Int -> (String -> Int -> String) -> PrimIO String

applyFn : HasIO io =>
          String -> Int -> (String -> Int -> String) -> io String
applyFn c i f = primIO $ prim__applyFn c i f

For example, we can try this as follows:

pluralise : String -> Int -> String
pluralise str x
    = show x ++ " " ++
             if x == 1
                then str
                else str ++ "s"

main : IO ()
main
    = do str1 <- applyFn "Biscuit" 10 pluralise
         putStrLn str1
         str2 <- applyFn "Tree" 1 pluralise
         putStrLn str2

As a variant, the callback could have a side effect:

%foreign (libsmall "applyFn")
prim__applyFnIO : String -> Int -> (String -> Int -> PrimIO String) ->
                 PrimIO String

This is a little more fiddly to lift to a HasIO function, due to the callback, but we can do so using toPrim : IO a -> PrimIO a:

applyFnIO : HasIO io =>
            String -> Int -> (String -> Int -> IO String) -> io String
applyFnIO c i f = primIO $ prim__applyFnIO c i (\s, i => toPrim $ f s i)

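Note that the callback is explicitly in IO here, since HasIO doesn’t have a general method for extracting the primitive IO operation.

For example, we can extend the above pluralise example to print a message in the callback: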

pluralise : String -> Int -> IO String
pluralise str x
    = do putStrLn "Pluralising"
         pure $ show x ++ " " ++
                if x == 1
                   then str
                   else str ++ "s"

main : IO ()
main
    = do str1 <- applyFnIO "Biscuit" 10 pluralise
         putStrLn str1
         str2 <- applyFnIO "Tree" 1 pluralise
         putStrLn str2

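Structs

Many C APIs pass around more complex data structures, as a struct. We do not aim to be completely general in the C types we support, because this would make it harder to write code which is portable across multiple back ends. However, it is still often useful to be able to access a struct directly. For example, add the following to the top of smallc.c, and rebuild libsmall.so: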

#include <stdlib.h>

typedef struct {
    int x;
    int y;
} point;

point* mkPoint(int x, int y) {
    point* pt = malloc(sizeof(point));
    pt->x = x;
    pt->y = y;
    return pt;
}

void freePoint(point* pt) {
    free(pt);
}

We can define a type for accessing point in Idris by importing System.FFI and using the Struct type, as follows:

Point : Type
Point = Struct "point" [("x", Int), ("y", Int)]

%foreign (libsmall "mkPoint")
mkPoint : Int -> Int -> Point

%foreign (libsmall "freePoint")
prim__freePoint : Point -> PrimIO ()

freePoint : Point -> IO ()
freePoint p = primIO $ prim__freePoint p

The Point type in Idris now corresponds to point* in C. Fields can be read and written using the following functions, also from System.FFI:

getField : Struct s fs -> (n : String) ->
           FieldType n ty fs => ty
setField : Struct s fs -> (n : String) ->
           FieldType n ty fs => ty -> IO ()

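Notice that fields are accessed by name, and must be available in the struct, given the constraint FieldType n ty fs, which states that the field named n has type ty in the structure fields fs. So, we can display a Point as follows by accessing the fields directly: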

showPoint : Point -> String
showPoint pt
    = let x : Int = getField pt "x"
          y : Int = getField pt "y" in
          show (x, y)

And, as a complete example, we can initialise, update, display and delete a Point as follows:

main : IO ()
main = do let pt = mkPoint 20 30
          setField pt "x" (the Int 40)
          putStrLn $ showPoint pt
          freePoint pt

The field types of a Struct can be any of the following:

  • Int

  • Char

  • Double (as double in C)

  • Bits8

  • Bits16

  • Bits32

  • Bits64

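  • Ptr a or AnyPtr (void* in C)

  • Another Struct, which is a pointer to a struct in C

Note that this doesn’t include String or function types! This is primarily because these aren’t directly supported by the Chez back end. However, you can use another pointer type and convert. For example, assuming you have, in C: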

typedef struct {
    char* name;
    point* pt;
} namedpoint;

You can represent this in Idris as:

NamedPoint : Type
NamedPoint
    = Struct "namedpoint"
               [("name", Ptr String),
               ("pt", Point)]

That is, using a Ptr String instead of a String directly. Then you can convert between a void* and a char* in C:

char* getString(void *p) {
    return (char*)p;
}

…and use this to convert to a String in Idris (here pfn is assumed to be a specifier helper analogous to libsmall above):

%foreign (pfn "getString")
getString : Ptr String -> String

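Finalisers

In some libraries, a foreign function creates a pointer and the caller is responsible for freeing it. In this case, you can make an explicit foreign call to free. However, this is not always convenient, or even possible. Instead, you can ask the Idris run time to be responsible for freeing the pointer when it is no longer accessible, using onCollect (or its typeless variant onCollectAny) defined in the Prelude: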

onCollect : Ptr t -> (Ptr t -> IO ()) -> IO (GCPtr t)
onCollectAny : AnyPtr -> (AnyPtr -> IO ()) -> IO GCAnyPtr

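A GCPtr t behaves exactly like Ptr t when passed to a foreign function (and, similarly, GCAnyPtr behaves like AnyPtr). A foreign function cannot return a GCPtr, however, because then we could no longer assume that the pointer is completely managed by the Idris run time.

The finaliser is called either when the garbage collector determines that the pointer is no longer accessible, or at the end of execution.

Note that finalisers might not be supported by all back ends, since they depend on the facilities offered by a specific back end’s run time system. They are certainly supported in the Chez Scheme and Racket back ends.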
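
As a minimal sketch of how a finaliser might be attached, the following allocates a buffer with the C malloc and registers free to run when the pointer is collected. The names prim__malloc, prim__free and allocManaged are hypothetical, introduced only for this illustration, and we assume a back end that supports finalisers and can locate libc:

```idris
-- Raw bindings to the C allocator (assumed to be found in libc).
%foreign "C:malloc,libc"
prim__malloc : Int -> PrimIO AnyPtr

%foreign "C:free,libc"
prim__free : AnyPtr -> PrimIO ()

-- Allocate a buffer and hand responsibility for freeing it to the
-- Idris run time. (allocManaged is a hypothetical helper.)
allocManaged : (bytes : Int) -> IO GCAnyPtr
allocManaged bytes
    = do ptr <- primIO $ prim__malloc bytes
         -- free is called once ptr is no longer accessible
         onCollectAny ptr (\p => primIO $ prim__free p)
```

After this, the program should use only the returned GCAnyPtr. Keeping and freeing the original AnyPtr by hand as well would free the buffer twice.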

Example: Minimal Readline Bindings

In this section, we’ll see how to create bindings for a C library (the GNU Readline library) in Idris, and make them available in a package. We’ll only create the most minimal bindings, but nevertheless they demonstrate some of the trickier problems in creating bindings to a C library, in that they need to handle memory allocation of String.

You can find the example in full in the Idris 2 source repository, in samples/FFI-readline. As a minimal example, this can be used as a starting point for other C library bindings.

We are going to provide bindings to the following functions in the Readline API, available via #include <readline/readline.h>:

char* readline (const char *prompt);
void add_history(const char *string);

Additionally, we are going to support tab completion, which in the Readline API is achieved by setting a global variable to a callback function (see Section Callbacks) which explains how to handle the completion:

typedef char *rl_compentry_func_t (const char *, int);
rl_compentry_func_t * rl_completion_entry_function;

A completion function takes a String, which is the text to complete, and an Int, which is the number of times it has asked for a completion so far. In Idris, this could be a function complete : String -> Int -> IO String. So, for example, if the text so far is "id", and the possible completions are idiomatic and idris, then complete "id" 0 would produce the string "idiomatic" and complete "id" 1 would produce "idris".

We will define glue functions in a C file idris_readline.c, which compiles to a shared object libidrisreadline, so we write a function for locating the C functions:

rlib : String -> String
rlib fn = "C:" ++ fn ++ ",libidrisreadline"

Each of the foreign bindings will have a %foreign specifier which locates functions via rlib.

Basic behaviour: Reading input, and history

We can start by writing a binding for readline directly. It’s interactive, so needs to return a PrimIO:

%foreign (rlib "readline")
prim__readline : String -> PrimIO String

Then, we can write an IO wrapper:

readline : String -> IO String
readline prompt = primIO $ prim__readline prompt

Unfortunately, this isn’t quite good enough! The C readline function returns a NULL string if there is no input due to encountering an end of file. So, we need to handle that - if we don’t, we’ll get a crash on encountering end of file (remember: it’s the Idris programmer’s responsibility to give an appropriate type to the C binding!)

Instead, we need to use a Ptr to say that it might be a NULL pointer (see Section Primitive FFI Types):

%foreign (rlib "readline")
prim__readline : String -> PrimIO (Ptr String)

We also need to provide a way to check whether the returned Ptr String is NULL. To do so, we’ll write some glue code to convert back and forth between Ptr String and String, in a file idris_readline.c and a corresponding header idris_readline.h. In idris_readline.h we have:

int isNullString(void* str); // return non-zero if the string is NULL, 0 otherwise
char* getString(void* str); // turn a Ptr String into a String (assuming it is not NULL)
void* mkString(char* str); // turn a String into a Ptr String
void* nullString(); // create a new NULL String

Correspondingly, in idris_readline.c:

int isNullString(void* str) {
    return str == NULL;
}

char* getString(void* str) {
    return (char*)str;
}

void* mkString(char* str) {
    return (void*)str;
}

void* nullString() {
    return NULL;
}

Now, we can use prim__readline as follows, with a safe API, checking whether the result it returns is NULL or a concrete String:

%foreign (rlib "isNullString")
prim__isNullString : Ptr String -> Int

export
isNullString : Ptr String -> Bool
isNullString str = if prim__isNullString str == 0 then False else True

export
readline : String -> IO (Maybe String)
readline s
    = do mstr <- primIO $ prim__readline s
         if isNullString mstr
            then pure $ Nothing
            else pure $ Just (getString mstr)

We’ll need nullString and mkString later, for dealing with completions.
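
The readline wrapper above calls getString, and the completion code later uses mkString and nullString, so each C glue function needs an Idris binding. A minimal sketch of what these declarations might look like follows. It assumes the glue functions are compiled into libidrisreadline, and that a foreign declaration with no arguments is accepted for nullString, as its later use in pure nullString suggests:

```idris
-- Specifier helper, repeated so that the sketch is self-contained.
rlib : String -> String
rlib fn = "C:" ++ fn ++ ",libidrisreadline"

-- Read a Ptr String as a String; only safe if it is not NULL.
%foreign (rlib "getString")
getString : Ptr String -> String

-- View an Idris String as a Ptr String.
%foreign (rlib "mkString")
mkString : String -> Ptr String

-- The NULL pointer, used to signal "no more completions".
%foreign (rlib "nullString")
nullString : Ptr String
```

None of these has a side effect visible to Idris, so they are given pure types rather than PrimIO.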

Once we’ve read a string, we’ll want to add it to the input history. We can provide a binding to add_history as follows:

%foreign (rlib "add_history")
prim__add_history : String -> PrimIO ()

export
addHistory : String -> IO ()
addHistory s = primIO $ prim__add_history s

In this case, since Idris is in control of the String, we know it’s not going to be NULL, so we can add it directly.

A small readline program that reads input, and echoes it, recording input history for non-empty inputs, can be written as follows:

echoLoop : IO ()
echoLoop
    = do Just x <- readline "> "
              | Nothing => putStrLn "EOF"
         putStrLn ("Read: " ++ x)
         when (x /= "") $ addHistory x
         if x /= "quit"
            then echoLoop
            else putStrLn "Done"

This gives us command history, and command line editing, but Readline becomes much more useful when we add tab completion. The default tab completion, which is available even in the small example above, is to tab complete file names in the current working directory. But for any realistic application, we probably want to tab complete other commands, such as function names, references to local data, or anything that is appropriate for the application.

Completions

Readline has a large API, with several ways of supporting tab completion, typically involving setting a global variable to an appropriate completion function. We’ll use the following:

typedef char *rl_compentry_func_t (const char *, int);
rl_compentry_func_t * rl_completion_entry_function;

The completion function takes the prefix of the completion, and the number of times it has been called so far on this prefix, and returns the next completion, or NULL if there are no more completions. An Idris equivalent would therefore have the following type:

setCompletionFn : (String -> Int -> IO (Maybe String)) -> IO ()

The function returns Nothing if there are no more completions, or Just str for some str if there is another one for the current input.

We might hope that it’s a matter of defining a function to assign the completion function…

void idrisrl_setCompletion(rl_compentry_func_t* fn) {
    rl_completion_entry_function = fn;
}

…then defining the Idris binding, which needs to take into account that the Readline library expects NULL when there are no more completions:

%foreign (rlib "idrisrl_setCompletion")
prim__setCompletion : (String -> Int -> PrimIO (Ptr String)) -> PrimIO ()

export
setCompletionFn : (String -> Int -> IO (Maybe String)) -> IO ()
setCompletionFn fn
    = primIO $ prim__setCompletion $ \s, i => toPrim $
          do mstr <- fn s i
             case mstr of
                  Nothing => pure nullString -- need to return a Ptr String to readline!
                  Just str => pure (mkString str)

So, we turn Nothing into nullString and Just str into mkString str. Unfortunately, this doesn’t quite work. To see what goes wrong, let’s try it for the most basic completion function that returns one completion no matter what the input:

testComplete : String -> Int -> IO (Maybe String)
testComplete text 0 = pure $ Just "hamster"
testComplete text st = pure Nothing

We’ll try this in a small modification of echoLoop above, setting a completion function first:

main : IO ()
main = do setCompletionFn testComplete
          echoLoop

We see that there is a problem when we try running it, and hitting TAB before entering anything:

Main> :exec main
> free(): invalid pointer

The Idris code which sets up the completion is fine, but there is a problem with the memory allocation in the C glue code.

This problem arises because we haven’t thought carefully enough about which parts of our program are responsible for allocating and freeing strings. When Idris calls a foreign function that returns a string, it copies the string to the Idris heap and frees it immediately. But, if the foreign library also frees the string, it ends up being freed twice. This is what’s happening here: the callback passed to prim__setCompletion frees the string and puts it onto the Idris heap, but Readline also frees the string returned by prim__setCompletion once it has processed it. We can solve this problem by writing a wrapper for the completion function which reallocates the string, and using that in idrisrl_setCompletion instead.

rl_compentry_func_t* my_compentry;

char* compentry_wrapper(const char* text, int i) {
    char* res = my_compentry(text, i); // my_compentry is an Idris function, so res is on the Idris heap,
                                       // and freed on return
    if (res != NULL) {
        char* comp = malloc(strlen(res)+1); // comp is passed back to readline, which frees it when
                                            // it is finished with it
        strcpy(comp, res);
        return comp;
    }
    else {
        return NULL;
    }
}

void idrisrl_setCompletion(rl_compentry_func_t* fn) {
    rl_completion_entry_function = compentry_wrapper;
    my_compentry = fn; // fn is an Idris function, called by compentry_wrapper
}

So, we define the completion function in C, which calls the Idris completion function then makes sure the string returned by the Idris function is copied to the C heap.

We now have a primitive API that covers the most fundamental features of the readline API:

readline : String -> IO (Maybe String)
addHistory : String -> IO ()
setCompletionFn : (String -> Int -> IO (Maybe String)) -> IO ()

Theorem Proving

A tutorial on theorem proving in Idris 2.

Before we discuss the details of theorem proving in Idris, we will describe some fundamental concepts:

  • Propositions and judgments

  • Boolean and constructive logic

  • Curry-Howard correspondence

  • Definitional and propositional equalities

  • Axiomatic and constructive approaches

Propositions and Judgments

Propositions are the subject of our proofs. Before the proof, we can’t formally say if they are true or not. If the proof is successful then the result is a ‘judgment’. For instance, if the proposition is,

1+1=2

When we prove it, the judgment is,

1+1=2 true

Or if the proposition is,

1+1=3

we can’t prove it is true, but it is still a valid proposition and perhaps we can prove it is false so the judgment is,

1+1=3 false

This may seem a bit pedantic but it is important to be careful: in mathematics not every proposition is true or false. For instance, a proposition may be unproven or even unprovable.

So the logic here is different from the logic that comes from boolean algebra. In that case what is not true is false and what is not false is true. The logic we are using here does not have this law, the “Law of Excluded Middle”, so we cannot use it.

A false proposition is taken to be a contradiction and if we have a contradiction then we can prove anything, so we need to avoid this. Some languages, used in proof assistants, prevent contradictions.

The logic we are using is called constructive (or sometimes intuitionistic) because we are constructing a ‘database’ of judgments.
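
To make the remark about contradictions concrete, here is a small sketch in Idris, anticipating the correspondence between propositions and types described in the next section. A false proposition is a type with no values, and a function out of such a type can return anything. The names zeroNotSucc and fromContradiction are our own, chosen for illustration:

```idris
-- Z = S n is a false proposition: no proof of it can be constructed,
-- so there are no cases to write.
zeroNotSucc : Not (Z = S n)
zeroNotSucc Refl impossible

-- From a contradiction we can prove anything: given a (non-existent)
-- proof that Z = S n, we can produce a value of any type a.
fromContradiction : Z = S n -> a
fromContradiction prf = void (zeroNotSucc prf)
```

Here Not p is the Prelude’s abbreviation for p -> Void, where Void is the empty type.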

Curry-Howard correspondence

So how do we relate these proofs to Idris programs? It turns out that there is a correspondence between constructive logic and type theory. They have the same structure and we can switch back and forth between the two notations.

The way that this works is that a proposition is a type so…

Main> 1 + 1 = 2
2 = 2

Main> :t 1 + 1 = 2
(fromInteger 1 + fromInteger 1) === fromInteger 2 : Type

…is a proposition and it is also a type. The following will also produce an equality type:

Main> 1 + 1 = 3
2 = 3

Both of these are valid propositions so both are valid equality types. But how do we represent a true judgment? That is, how do we denote 1+1=2 is true but not 1+1=3? A type that is true is inhabited, that is, it can be constructed. An equality type has only one constructor ‘Refl’ so a proof of 1+1=2 is

onePlusOne : 1+1=2
onePlusOne = Refl

Now that we can represent propositions as types other aspects of propositional logic can also be translated to types as follows:

            propositions    example of possible type
            A               x=y
            B               y=z
and         A /\ B          Pair(x=y,y=z)
or          A \/ B          Either(x=y,y=z)
implies     A -> B          (x=y) -> (y=z)
for all                     y=z
exists                      y=z

And (conjunction)

We can have a type which corresponds to conjunction:

data And : Type -> Type -> Type where
    AndIntro : a -> b -> And a b

There is a built in type called ‘Pair’.

Or (disjunction)

We can have a type which corresponds to disjunction:

data Or : Type -> Type -> Type where
    OrIntroLeft : a -> Or a b
    OrIntroRight : b -> Or a b

There is a built in type called ‘Either’.
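
As a small illustration of the rows for ‘and’, ‘or’ and ‘implies’, the following sketch proves three simple logical facts using the built in Pair and Either types. The function names are our own, chosen for illustration. Each function is a proof: it turns evidence for its hypotheses into evidence for its conclusion.

```idris
-- A /\ B implies B /\ A: swap the two pieces of evidence.
andSwap : (a, b) -> (b, a)
andSwap (x, y) = (y, x)

-- A \/ B implies B \/ A: relabel whichever evidence we were given.
orSwap : Either a b -> Either b a
orSwap (Left x) = Right x
orSwap (Right y) = Left y

-- Modus ponens: from A -> B and A, conclude B.
modusPonens : (a -> b) -> a -> b
modusPonens f x = f x
```

Because a and b are type variables, each proof holds for every pair of propositions, not just for particular equalities such as x=y.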

Definitional and Propositional Equalities

We have seen that we can ‘prove’ a type by finding a way to construct a term. In the case of equality types there is only one constructor which is Refl. We have also seen that each side of the equation does not have to be identical like ‘2=2’. It is enough that both sides are definitionally equal like this:

onePlusOne : 1+1=2
onePlusOne = Refl

Both sides of this equation normalise to 2 and so Refl matches and the proposition is proved.

We don’t have to stick to terms; we can also use symbolic parameters so the following type checks:

varIdentity : m = m
varIdentity = Refl

If a proposition/equality type is not definitionally equal but is still true then it is propositionally equal. In this case we may still be able to prove it but some steps in the proof may require us to add something into the terms or at least to take some sideways steps to get to a proof.

Especially when working with equalities containing variable terms (inside functions), it can be hard to know which equality types are definitionally equal. In this example, plusReducesL is definitionally equal but plusReducesR is not (although it is propositionally equal). The only difference between them is the order of the operands.

plusReducesL : (n:Nat) -> plus Z n = n
plusReducesL n = Refl

plusReducesR : (n:Nat) -> plus n Z = n
plusReducesR n = Refl

Checking plusReducesR gives the following error:

Proofs.idr:21:18--23:1:While processing right hand side of Main.plusReducesR at Proofs.idr:21:1--23:1:
Can't solve constraint between:
        plus n Z
and
        n

So why is Refl able to prove some equality types but not others?

The first answer is that plus is defined by recursion on its first argument. So, when the first argument is Z, it reduces, but not when the second argument is Z.

If an equality type can be proved/constructed by using Refl alone it is known as a definitional equality. In order to be definitionally equal both sides of the equation must normalise to the same value.
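
As a preview of one of the ‘sideways steps’ mentioned earlier, here is a sketch of how the propositional equality plus n Z = n can be proved by case analysis and recursion on n, so that plus can reduce in each case. The name plusReducesRInd is our own, chosen to avoid clashing with the failing definition above. We use the Prelude function cong, which applies a function to both sides of an equality:

```idris
plusReducesRInd : (n : Nat) -> plus n Z = n
-- Base case: plus Z Z reduces to Z, so Refl suffices.
plusReducesRInd Z = Refl
-- Inductive step: plus (S k) Z reduces to S (plus k Z). The recursive
-- call proves plus k Z = k, and cong S wraps both sides in S.
plusReducesRInd (S k) = cong S (plusReducesRInd k)
```

Once we match on n, the first argument of plus starts with a constructor in each clause, which is exactly what the definition of plus needs in order to reduce.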

So when we type 1+1 in Idris it is immediately reduced to 2, because definitional equality is built in:

Main> 1+1
2

In the following pages we discuss how to resolve propositional equalities.

Running example: Addition of Natural Numbers

Throughout this tutorial, we will be working with the following function, defined in the Idris prelude, which defines addition on natural numbers:

plus : Nat -> Nat -> Nat
plus Z     m = m
plus (S k) m = S (plus k m)

It is defined by the above equations, meaning that we have for free the properties that adding m to zero always results in m, and that adding m to any non-zero number S k always results in S (plus k m). We can see this by evaluation at the Idris REPL (i.e. the prompt, the read-eval-print loop):

Main> \m => plus Z m
\m => m

Main> \k,m => plus (S k) m
\k => \m => S (plus k m)

Note that unlike many other language REPLs, the Idris REPL performs evaluation on open terms, meaning that it can reduce terms which appear inside lambda bindings, like those above. Therefore, we can introduce unknowns k and m as lambda bindings and see how plus reduces.

The plus function has a number of other useful properties, for example:

  • It is commutative, that is for all Nat inputs n and m, we know that plus n m = plus m n.

  • It is associative, that is for all Nat inputs n, m and p, we know that plus n (plus m p) = plus (plus n m) p.

We can use these properties in an Idris program, but in order to do so we must prove them.

Equality Proofs

Idris defines a propositional equality type as follows:

data Equal : a -> b -> Type where
   Refl : Equal x x

As syntactic sugar, Equal x y can be written as x = y.

It is propositional equality, where the type states that any two values, possibly of different types a and b, may be proposed to be equal. There is only one way to prove equality, however, which is by reflexivity (Refl).

We have a type for propositional equality here, and correspondingly a program inhabiting an instance of this type can be seen as a proof of the corresponding proposition 1. So, trivially, we can prove that 4 equals 4:

four_eq : 4 = 4
four_eq = Refl

However, trying to prove that 4 = 5 results in failure:

four_eq_five : 4 = 5
four_eq_five = Refl

The type 4 = 5 is a perfectly valid type, but is uninhabited, so when trying to type check this definition, Idris gives the following error:

When unifying 4 = 4 and (fromInteger 4) = (fromInteger 5)
Mismatch between:
        4
and
        5

Type checking equality proofs

An important step in type checking Idris programs is unification, which attempts to resolve implicit arguments such as the implicit argument x in Refl. As far as our understanding of type checking proofs is concerned, it suffices to know that unifying two terms involves reducing both to normal form then trying to find an assignment to implicit arguments which will make those normal forms equal.

When type checking Refl, Idris requires that the type is of the form x = x, as we see from the type of Refl. In the case of four_eq_five, Idris will try to unify the expected type 4 = 5 with the type of Refl, x = x, notice that a solution requires that x be both 4 and 5, and therefore fail.

Since type checking involves reduction to normal form, we can write the following equalities directly:

twoplustwo_eq_four : 2 + 2 = 4
twoplustwo_eq_four = Refl

plus_reduces_Z : (m : Nat) -> plus Z m = m
plus_reduces_Z m = Refl

plus_reduces_Sk : (k, m : Nat) -> plus (S k) m = S (plus k m)
plus_reduces_Sk k m = Refl
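
Unification also does the work when we pattern match on a proof. Since Refl is the only constructor, matching on it forces both sides of the equation to be the same term, after which Refl proves the goal. A minimal sketch, using our own names symEq and transEq (the Prelude provides sym and trans for the same purpose):

```idris
-- Equality is symmetric: matching on Refl unifies x with y,
-- so the goal y = x becomes x = x.
symEq : {0 a : Type} -> {0 x, y : a} -> x = y -> y = x
symEq Refl = Refl

-- Equality is transitive: after matching both proofs,
-- x, y and z are all the same term.
transEq : {0 a : Type} -> {0 x, y, z : a} -> x = y -> y = z -> x = z
transEq Refl Refl = Refl
```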

Heterogeneous Equality

Equality in Idris is heterogeneous, meaning that we can even propose equalities between values in different types:

idris_not_php : Z = "Z"

The type Z = "Z" is uninhabited, and one might wonder why it is useful to be able to propose equalities between values in different types. However, with dependent types, such equalities can arise naturally. For example, if two vectors are equal, their lengths must be equal:

vect_eq_length : (xs : Vect n a) -> (ys : Vect m a) ->
                 (xs = ys) -> n = m

In the above declaration, xs and ys have different types because their lengths are different, but we would still like to draw a conclusion about the lengths if they happen to be equal. We can define vect_eq_length as follows:

vect_eq_length xs xs Refl = Refl

By matching on Refl for the third argument, we know that the only valid value for ys is xs, because they must be equal, and therefore their types must be equal, so the lengths must be equal.

Alternatively, we can put an underscore for the second xs, since there is only one value which will type check:

vect_eq_length xs _ Refl = Refl

Properties of plus

Using the (=) type, we can now state the properties of plus given above as Idris type declarations:

plus_commutes : (n, m : Nat) -> plus n m = plus m n
plus_assoc : (n, m, p : Nat) -> plus n (plus m p) = plus (plus n m) p

Both of these properties (and many others) are proved for natural number addition in the Idris standard library, using (+) from the Num interface rather than using plus directly. They have the names plusCommutative and plusAssociative respectively.

In the remainder of this tutorial, we will explore several different ways of proving plus_commutes (or, to put it another way, writing the function.) We will also discuss how to use such equality proofs, and see where the need for them arises in practice.
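
To give a first impression of where such proofs are needed in practice, consider converting a vector of length n + m into one of length m + n. The two types are not definitionally equal, so we must use a proof of commutativity to change the type. This is a minimal sketch using plusCommutative from Data.Nat; the function name swapLength is our own:

```idris
import Data.Nat
import Data.Vect

-- The goal type is Vect (m + n) a. Rewriting with
-- plusCommutative m n : m + n = n + m turns the goal into
-- Vect (n + m) a, which is exactly the type of xs.
swapLength : {n, m : Nat} -> Vect (n + m) a -> Vect (m + n) a
swapLength xs = rewrite plusCommutative m n in xs
```

Without the rewrite, Idris would reject returning xs directly, since it cannot see by reduction alone that n + m and m + n are the same length.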

1

This is known as the Curry-Howard correspondence.

Inductive Proofs

Before embarking on proving plus_commutes in Idris itself, let us consider the overall structure of a proof of some property of natural numbers. Recall that they are defined recursively, as follows:

data Nat : Type where
     Z : Nat
     S : Nat -> Nat

A total function over natural numbers must both terminate, and cover all possible inputs. Idris checks functions for totality by checking that all inputs are covered, and that all recursive calls are on structurally smaller values (so recursion will always reach a base case). Recalling plus:

plus : Nat -> Nat -> Nat
plus Z     m = m
plus (S k) m = S (plus k m)

This is total because it covers all possible inputs (the first argument can only be Z or S k for some k, and the second argument m covers all possible Nat) and in the recursive call, k is structurally smaller than S k so the first argument will always reach the base case Z in any sequence of recursive calls.

In some sense, this resembles a mathematical proof by induction (and this is no coincidence!). For some property P of a natural number x, we can show that P holds for all x if:

  • P holds for zero (the base case).

  • Assuming that P holds for k, we can show P also holds for S k (the inductive step).

In plus, the property we are trying to show is somewhat trivial (for all natural numbers x, there is a Nat which need not have any relation to x). However, it still takes the form of a base case and an inductive step. In the base case, we show that there is a Nat arising from plus n m when n = Z, and in the inductive step we show that there is a Nat arising when n = S k and we know we can get a Nat inductively from plus k m. We could even write a function capturing all such inductive definitions:

nat_induction :
    (prop : Nat -> Type) ->                -- Property to show
    (prop Z) ->                            -- Base case
    ((k : Nat) -> prop k -> prop (S k)) -> -- Inductive step
    (x : Nat) ->                           -- Show for all x
    prop x
nat_induction prop p_Z p_S Z = p_Z
nat_induction prop p_Z p_S (S k) = p_S k (nat_induction prop p_Z p_S k)

Using nat_induction, we can implement an equivalent inductive version of plus:

plus_ind : Nat -> Nat -> Nat
plus_ind n m
   = nat_induction (\x => Nat)
                   m                      -- Base case, plus_ind Z m
                   (\k, k_rec => S k_rec) -- Inductive step plus_ind (S k) m
                                          -- where k_rec = plus_ind k m
                   n

To prove that plus n m = plus m n for all natural numbers n and m, we can also use induction. Either we can fix m and perform induction on n, or vice versa. We can sketch an outline of a proof; performing induction on n, we have:

  • Property prop is \x => plus x m = plus m x.

  • Show that prop holds in the base case and inductive step:

    • Base case: prop Z, i.e.
      plus Z m = plus m Z, which reduces to
      m = plus m Z due to the definition of plus.
    • Inductive step: Inductively, we know that prop k holds for a specific, fixed k, i.e.
      plus k m = plus m k (the induction hypothesis). Given this, show prop (S k), i.e.
      plus (S k) m = plus m (S k), which reduces to
      S (plus k m) = plus m (S k). From the induction hypothesis, we can rewrite this to
      S (plus m k) = plus m (S k).

To complete the proof we therefore need to show that m = plus m Z for all natural numbers m, and that S (plus m k) = plus m (S k) for all natural numbers m and k. Each of these can also be proved by induction, this time on m.
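The first of these two lemmas can be sketched with the nat_induction function from earlier in this section, repeated here so that the block stands alone. The name plus_zero_ind is hypothetical, and the use of cong from the prelude is our choice rather than something the outline prescribes.

```idris
nat_induction :
    (prop : Nat -> Type) ->                -- Property to show
    (prop Z) ->                            -- Base case
    ((k : Nat) -> prop k -> prop (S k)) -> -- Inductive step
    (x : Nat) ->                           -- Show for all x
    prop x
nat_induction prop p_Z p_S Z = p_Z
nat_induction prop p_Z p_S (S k) = p_S k (nat_induction prop p_Z p_S k)

-- m = plus m Z, by induction on m.
plus_zero_ind : (m : Nat) -> m = plus m Z
plus_zero_ind m
   = nat_induction (\x => (x = plus x Z))
                   Refl                  -- Base case: Z = plus Z Z reduces to Z = Z
                   (\k, ih => cong S ih) -- Step: from k = plus k Z to S k = S (plus k Z)
                   m
```

The next section proves the same lemma by direct pattern matching, which is usually easier to develop interactively.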

We are now ready to embark on a proof of commutativity of plus formally in Idris.

Pattern Matching Proofs

In this section, we will provide a proof of plus_commutes directly, by writing a pattern matching definition. We will use interactive editing features extensively, since it is significantly easier to produce a proof when the machine can give the types of intermediate values and construct components of the proof itself. The commands we will use are summarised below. Where we refer to commands directly, we will use the Vim version, but these commands have a direct mapping to Emacs commands.

Command              Vim binding  Emacs binding  Explanation
Check type           \t           C-c C-t        Show type of identifier or hole under the cursor.
Proof search         \s           C-c C-a        Attempt to solve hole under the cursor by applying simple proof search.
Make new definition  \a           C-c C-s        Add a template definition for the type defined under the cursor.
Make lemma           \l           C-c C-e        Add a top level function with a type which solves the hole under the cursor.
Split cases          \c           C-c C-c        Create new constructor patterns for each possible case of the variable under the cursor.

Creating a Definition

To begin, create a file pluscomm.idr containing the following type declaration:

plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n

To create a template definition for the proof, press \a (or the equivalent in your editor of choice) on the line with the type declaration. You should see:

plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes n m = ?plus_commutes_rhs

To prove this by induction on n, as we sketched in Section Inductive Proofs, we begin with a case split on n (press \c with the cursor over the n in the definition.) You should see:

plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = ?plus_commutes_rhs_1
plus_commutes (S k) m = ?plus_commutes_rhs_2

If we inspect the types of the newly created holes, plus_commutes_rhs_1 and plus_commutes_rhs_2, we see that the type of each reflects that n has been refined to Z and S k in each respective case. Pressing \t over plus_commutes_rhs_1 shows:

   m : Nat
-------------------------------------
plus_commutes_rhs_1 : m = plus m Z

Similarly, for plus_commutes_rhs_2:

  k : Nat
  m : Nat
--------------------------------------
plus_commutes_rhs_2 : (S (plus k m)) = (plus m (S k))

It is a good idea to give these slightly more meaningful names:

plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = ?plus_commutes_Z
plus_commutes (S k) m = ?plus_commutes_S

Base Case

We can create a separate lemma for the base case interactively, by pressing \l with the cursor over plus_commutes_Z. This yields:

plus_commutes_Z : (m : Nat) -> m = plus m Z

plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = plus_commutes_Z m
plus_commutes (S k) m = ?plus_commutes_S

That is, the hole has been filled with a call to a top level function plus_commutes_Z, applied to the variable in scope m.

Unfortunately, we cannot prove this lemma directly, since plus is defined by matching on its first argument, and here plus m Z has a concrete value for its second argument (in fact, the left hand side of the equality has been reduced from plus Z m.) Again, we can prove this by induction, this time on m.

First, create a template definition with \a:

plus_commutes_Z : (m : Nat) -> m = plus m Z
plus_commutes_Z m = ?plus_commutes_Z_rhs

Now, case split on m with \c:

plus_commutes_Z : (m : Nat) -> m = plus m Z
plus_commutes_Z Z = ?plus_commutes_Z_rhs_1
plus_commutes_Z (S k) = ?plus_commutes_Z_rhs_2

Checking the type of plus_commutes_Z_rhs_1 shows the following, which is provable by Refl:

--------------------------------------
plus_commutes_Z_rhs_1 : Z = Z

For such immediate proofs, we can let Idris write the proof automatically by pressing \s with the cursor over plus_commutes_Z_rhs_1. This yields:

plus_commutes_Z : (m : Nat) -> m = plus m Z
plus_commutes_Z Z = Refl
plus_commutes_Z (S k) = ?plus_commutes_Z_rhs_2

For plus_commutes_Z_rhs_2, we are not so lucky:

   k : Nat
-------------------------------------
plus_commutes_Z_rhs_2 : S k = S (plus k Z)

Inductively, we should know that k = plus k Z, and we can get access to this inductive hypothesis by making a recursive call on k, as follows:

plus_commutes_Z : (m : Nat) -> m = plus m Z
plus_commutes_Z Z = Refl
plus_commutes_Z (S k)
   = let rec = plus_commutes_Z k in
         ?plus_commutes_Z_rhs_2

For plus_commutes_Z_rhs_2, we now see:

   k : Nat
   rec : k = plus k Z
-------------------------------------
plus_commutes_Z_rhs_2 : S k = S (plus k Z)

So we know that k = plus k Z, but how do we use this to update the goal to S k = S k?

To achieve this, Idris provides a replace function as part of the prelude:

Main> :t replace
Builtin.replace : (0 rule : x = y) -> p x -> p y

Given a proof that x = y, and a property p which holds for x, we can get a proof of the same property for y, because we know x and y must be the same. Note the multiplicity on rule means that it’s guaranteed to be erased at run time. In practice, this function can be a little tricky to use because in general the implicit argument p can be hard to infer by unification, so Idris provides a high level syntax which calculates the property and applies replace:

rewrite prf in expr

If we have prf : x = y, and the required type for expr is some property of x, the rewrite ... in syntax will search for all occurrences of x in the required type of expr and replace them with y. We want to replace plus k Z with k, so we need to apply our rule rec in reverse, which we can do using sym from the Prelude:

Main> :t sym
Builtin.sym : (0 rule : x = y) -> y = x

Concretely, in our example, we can say:

plus_commutes_Z (S k)
   = let rec = plus_commutes_Z k in
         rewrite sym rec in ?plus_commutes_Z_rhs_2

Checking the type of plus_commutes_Z_rhs_2 now gives:

   k : Nat
   rec : k = plus k Z
-------------------------------------
plus_commutes_Z_rhs_2 : S k = S k

Using the rewrite rule rec, the goal type has been updated with plus k Z replaced by k.

We can use proof search (\s) to complete the proof, giving:

plus_commutes_Z : (m : Nat) -> m = plus m Z
plus_commutes_Z Z = Refl
plus_commutes_Z (S k)
   = let rec = plus_commutes_Z k in
         rewrite sym rec in Refl

The base case of plus_commutes is now complete.
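As an aside, the same lemma can also be written without rewrite. The sketch below uses cong from the prelude to apply S to both sides of the inductive hypothesis; the primed name plus_commutes_Z' is hypothetical and only avoids a clash with the definition above.

```idris
-- Alternative proof of the base-case lemma using cong instead of rewrite.
plus_commutes_Z' : (m : Nat) -> m = plus m Z
plus_commutes_Z' Z = Refl
-- The recursive call gives k = plus k Z; cong S lifts it to
-- S k = S (plus k Z), which is the goal after plus (S k) Z reduces.
plus_commutes_Z' (S k) = cong S (plus_commutes_Z' k)
```

Both versions are accepted for the same reason: the recursive call is on the structurally smaller k.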

Inductive Step

Our main theorem, plus_commutes, should currently be in the following state:

plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = plus_commutes_Z m
plus_commutes (S k) m = ?plus_commutes_S

Looking again at the type of plus_commutes_S, we have:

   k : Nat
   m : Nat
-------------------------------------
plus_commutes_S : S (plus k m) = plus m (S k)

Conveniently, by induction we can immediately tell that plus k m = plus m k, so let us rewrite directly by making a recursive call to plus_commutes. We add this directly, by hand, as follows:

plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = plus_commutes_Z m
plus_commutes (S k) m = rewrite plus_commutes k m in ?plus_commutes_S

Checking the type of plus_commutes_S now gives:

   k : Nat
   m : Nat
-------------------------------------
plus_commutes_S : S (plus m k) = plus m (S k)

The good news is that m and k now appear in the correct order. However, we still have to show that the successor symbol S can be moved to the front in the right hand side of this equality. This remaining lemma takes a similar form to plus_commutes_Z; we begin by making a new top level lemma with \l. This gives:

plus_commutes_S : (k : Nat) -> (m : Nat) -> S (plus m k) = plus m (S k)

Again, we make a template definition with \a:

plus_commutes_S : (k : Nat) -> (m : Nat) -> S (plus m k) = plus m (S k)
plus_commutes_S k m = ?plus_commutes_S_rhs

Like plus_commutes_Z, we can define this by induction over m, since plus is defined by matching on its first argument. The complete definition is:

total
plus_commutes_S : (k : Nat) -> (m : Nat) -> S (plus m k) = plus m (S k)
plus_commutes_S k Z = Refl
plus_commutes_S k (S j) = rewrite plus_commutes_S k j in Refl

All holes have now been solved.

The total annotation means that we require the final function to pass the totality checker; i.e. it will terminate on all possible well-typed inputs. This is important for proofs, since it provides a guarantee that the proof is valid in all cases, not just those for which it happens to be well-defined.

Now that plus_commutes has a total annotation, we have completed the proof of commutativity of addition on natural numbers.
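For reference, the pieces developed in this section can be assembled into one file. This is a sketch of how pluscomm.idr might look at the end: it uses %default total instead of annotating each definition (our choice), fills the remaining hole with a call to plus_commutes_S, and adds a small usage example whose name, five_commutes, is hypothetical.

```idris
%default total

-- Base case lemma: m = plus m Z, by induction on m.
plus_commutes_Z : (m : Nat) -> m = plus m Z
plus_commutes_Z Z = Refl
plus_commutes_Z (S k)
   = let rec = plus_commutes_Z k in
         rewrite sym rec in Refl

-- Inductive step lemma: S can be moved out of the second argument of plus.
plus_commutes_S : (k : Nat) -> (m : Nat) -> S (plus m k) = plus m (S k)
plus_commutes_S k Z = Refl
plus_commutes_S k (S j) = rewrite plus_commutes_S k j in Refl

-- Main theorem, by induction on n.
plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = plus_commutes_Z m
plus_commutes (S k) m = rewrite plus_commutes k m in plus_commutes_S k m

-- Usage: instantiate the theorem at concrete numbers.
five_commutes : 2 + 3 = 3 + 2
five_commutes = plus_commutes 2 3
```

The lemmas come first because a definition must be in scope before it is used.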

This page attempts to explain some of the techniques used in Idris to prove propositional equalities.

Proving Propositional Equality

We have seen that definitional equalities can be proved using Refl since they always normalise to values that can be compared directly.

However with propositional equalities we are using symbolic variables, which do not always normalise.

So to take the previous example:

plusReducesR : (n : Nat) -> plus n Z = n

In this case plus n Z does not normalise to n. Even though both sides of the equality are provably equal we cannot claim Refl as a proof.

Since Refl does not type check for an arbitrary n, we need to pattern match on all possible values of n:

plusReducesR : (n : Nat) -> plus n Z = n
plusReducesR Z = Refl
plusReducesR (S k)
    = let rec = plusReducesR k in
          rewrite rec in Refl

We can't use Refl to prove plus n Z = n for all n at once. Instead, we prove each case separately. So, in the second line for example, the type checker substitutes Z for n in the type being matched, and reduces the type accordingly.

Replace

This implements the ‘indiscernibility of identicals’ principle: if two terms are equal then they have the same properties. In other words, if x=y, then we can substitute y for x in any expression. In our proofs we can express this as:

if x=y then prop x = prop y

where prop is a pure function representing the property. In the examples below prop is an expression in some variable, with a type like this: prop : n -> Type

So if n is a natural number variable then prop could be something like \n => (2*n + 3 = 7), that is, a statement about n which may or may not have a proof.

To use this in our proofs there is the following function in the prelude:

||| Perform substitution in a term according to some equality.
replace : forall x, y, prop . (0 rule : x = y) -> prop x -> prop y
replace Refl prf = prf

If we supply an equality (x=y) and a proof of a property of x (prop x) then we get a proof of a property of y (prop y). So, in the following example, if we supply p1 x which is a proof that x=2 and the equality x=y then we get a proof that y=2.

p1: Nat -> Type
p1 n = (n=2)

testReplace: (x=y) -> (p1 x) -> (p1 y)
testReplace a b = replace a b

Rewrite

In practice, replace can be a little tricky to use because in general the implicit argument prop can be hard to infer for the machine, so Idris provides a high level syntax which calculates the property and applies replace.

Example: again we supply p1 x which is a proof that x=2 and the equality y=x then we get a proof that y=2.

p1: Nat -> Type
p1 x = (x=2)

testRewrite: (y=x) -> (p1 x) -> (p1 y)
testRewrite a b = rewrite a in b

We can think of rewrite as working in this way:

  • Start with an equation x=y and a property prop : x -> Type

  • Search for x in prop

  • Replace all occurrences of x with y in prop.

That is, we are doing a substitution.

Notice that here we need to supply the reverse equality, i.e. y=x instead of x=y. This is because rewrite replaces the left hand side of the equality with the right hand side, and this substitution is done in the return type. Thus, here in the return type y=2 we need to apply y=x in order to match the type of the argument x=2.

Symmetry and Transitivity

In addition to ‘reflexivity’ equality also obeys ‘symmetry’ and ‘transitivity’ and these are also included in the prelude:

||| Symmetry of propositional equality
sym : forall x, y . (0 rule : x = y) -> y = x
sym Refl = Refl

||| Transitivity of propositional equality
trans : forall a, b, c . (0 l : a = b) -> (0 r : b = c) -> a = c
trans Refl Refl = Refl
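A short sketch of how these combinators are typically used: sym flips a library equation into the direction we need, and trans chains two equations that share a middle term. The lemma names plusZeroRightNeutral, plusCommutative and plusSuccRightSucc are assumed to come from Data.Nat in base; zero_right and succ_swap are hypothetical names.

```idris
import Data.Nat

-- The library states n + 0 = n; sym gives the equation the other way round.
zero_right : (n : Nat) -> n = n + 0
zero_right n = sym (plusZeroRightNeutral n)

-- Chain two steps that meet at S (m + n):
--   S (n + m) = S (m + n)   by commutativity under S
--   S (m + n) = m + S n     by plusSuccRightSucc
succ_swap : (n, m : Nat) -> S (n + m) = m + S n
succ_swap n m = trans (cong S (plusCommutative n m)) (plusSuccRightSucc m n)
```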

Heterogeneous Equality

Also included in the prelude:

||| Explicit heterogeneous ("John Major") equality. Use this when Idris
||| incorrectly chooses homogeneous equality for `(=)`.
||| @ a the type of the left side
||| @ b the type of the right side
||| @ x the left side
||| @ y the right side
(~=~) : (x : a) -> (y : b) -> Type
(~=~) x y = (x = y)

Implementation Notes

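This section contains (or will contain) assorted notes on implementation aspects of Idris 2, which will hopefully help with debugging and future contributions.

Implementation Overview

These are some unsorted notes on aspects of the implementation. They are sketchy and not always completely up to date, but they should give some hints as to what is going on, and some ideas of where to look in the code to see how certain features work.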

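Introduction

The core language is TT (defined in Core.TT), based on quantitative type theory (see https://bentnib.org/quantitative-type-theory.html). Binders have “multiplicities”, which are either 0, 1 or unlimited.

Terms are indexed over the names in scope, so that we know terms are always well scoped. Values (i.e. normal forms) are defined in Core.Value as NF; constructors do not evaluate their arguments until explicitly requested.

We elaborate to TT from a higher level language TTImp (defined in TTImp.TTImp), which is TT with implicit arguments, local function definitions, case blocks, as-patterns, qualified names with automatic type-directed disambiguation, and proof search.

Elaboration relies on unification (in Core.Unify), which allows unification problems to be postponed. It works essentially the same way as Agda, as described in Ulf Norell’s thesis.

The general idea is that high level languages will provide a translation to TT. In the Idris/ namespace we define the high level syntax for Idris, which translates to TTImp by desugaring operators, do notation, etc.

There is a separate linearity check after elaboration, which updates the types of holes (and is aware of case blocks). This is implemented in Core.LinearCheck. During this check, we also recalculate the multiplicities in hole applications so that they are displayed appropriately (e.g. if a linear variable is unused elsewhere, it will always appear with multiplicity 1 in holes).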

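Where to find things:

  • Core/ – anything related to the core TT, typechecking and unification

  • TTImp/ – anything related to the implicit TT and its elaboration

    • TTImp/Elab/ – elaboration state and elaboration of terms

    • TTImp/Interactive/ – interactive editing infrastructure

  • Parser/ – various utilities for parsing and lexing TT and TTImp (and other things)

  • Utils/ – some generally useful utilities

  • Idris/ – anything relating to the high level language, translating to TTImp

    • Idris/Elab/ – high level construct elaboration machinery (e.g. interfaces)

  • Compiler/ – back ends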

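The Core Type, and Ref

Core is a “monad” (not really, for efficiency reasons, at the moment…) supporting Errors and IO (I did originally plan to allow restricting this to some specific IO operations, but haven’t yet). The raw syntax is defined by a type RawImp, which has a source location at each node, and any errors in elaboration note the location at the point where the error occurred, as a file context FC.

Ref is essentially an IORef. Typically we pass them implicitly and use labels to disambiguate which one we mean. See Core.Core for their definition. Again, IORef is used for efficiency: even though it would be neater to use a state monad, this turned out to be about 2-3 times faster, so I’m going with the “ugly” choice…

Term representation

Terms in the core language are indexed by a list of the names in scope, most recently defined first: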

data Term : List Name -> Type

This means that terms are always well scoped, and we can use the type system to keep us right when manipulating names. For example, we have:

Local : FC -> (isLet : Maybe Bool) ->
        (idx : Nat) -> (0 p : IsVar name idx vars) -> Term vars

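So local variables are represented by an index into the local context (a de Bruijn index, idx), together with a proof, erased at run time, that the index is valid. Everything is de Bruijn indexed, but the type checker still keeps track of the indices so that we don’t have to think too hard!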
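To make the idea concrete, here is a deliberately tiny model of well-scoped terms. It is a sketch only, not the compiler’s definition: names are plain Strings rather than Name, there is no FC or isLet field, and the types Tm and its constructors Lam and App are hypothetical stand-ins for the much richer Term.

```idris
-- Proof that a name sits at position idx in the list of names in scope.
data IsVar : String -> Nat -> List String -> Type where
  First : IsVar n Z (n :: ns)
  Later : IsVar n i ns -> IsVar n (S i) (m :: ns)

-- Terms indexed by the names in scope, most recently bound first.
data Tm : List String -> Type where
  -- A de Bruijn index plus an erased proof that it is in range.
  Local : (idx : Nat) -> (0 p : IsVar name idx vars) -> Tm vars
  -- A lambda brings one more name into scope for its body.
  Lam   : (x : String) -> Tm (x :: vars) -> Tm vars
  App   : Tm vars -> Tm vars -> Tm vars

-- \x => x : index 0 refers to the nearest binder.
identity : Tm []
identity = Lam "x" (Local 0 First)

-- \x => \y => x : index 1 skips over y.
konst : Tm []
konst = Lam "x" (Lam "y" (Local 1 (Later First)))
```

In this model a term such as Lam "x" (Local 1 First) is rejected by the type checker, because First only proves that index 0 is in scope.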

Core.TT contains various handy tools for manipulating terms with their indices, such as:

weaken : Term vars -> Term (n :: vars) -- actually in an interface, Weaken
embed : Term vars -> Term (ns ++ vars)
refToLocal : (x : Name) -> -- explicit name of a reference
             (new : Name) -> -- name to bind as
             Term vars -> Term (new :: vars)

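Note that the types are explicit about when the vars are needed at run time, and when they aren’t. Mostly, where they are needed, it is to help with displaying names or with name generation, rather than for any fundamental reason in the core. In general, this isn’t expensive at run time.

Environments, defined in Core.Env, map local variables to binders: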

data Env : (tm : List Name -> Type) -> List Name -> Type

A binder is typically a lambda, a pi, or a let (with a value), but can also be a pattern variable. See the definition of TT for more details. Where we have a term, we usually also need an Env.

We also have values, which are in head normal form, and defined in Core.Value:

data NF : List Name -> Type

We can convert a term to a value by normalising…

nf : {vars : _} ->
     Defs -> Env Term vars -> Term vars -> Core (NF vars)

…and back again, by quoting:

quote : {vars : _} ->
        Defs -> Env Term vars -> tm vars -> Core (Term vars)

Both nf and quote are defined in Core.Normalise. We don’t always know whether we’ll need to work with NF or Term, so we also have a “glued” representation, Glued vars, again defined in Core.Normalise, which lazily computes either a NF or Term as required. Elaborating a term returns the type as a Glued vars.

Term separates Ref (global user defined names) from Meta, which are globally defined metavariables. For efficiency, metavariables are only substituted into terms if they have non-0 multiplicity, to preserve sharing as much as possible.

Unification

Unification is probably the most important part of the elaboration process, and infers values for implicit arguments. That is, it finds values for the things which are referred to by Meta in Term. It is defined in Core.Unify, and the top level unification function has the following type:

unify : Unify tm =>
        {vars : _} ->
        {auto c : Ref Ctxt Defs} ->
        {auto u : Ref UST UState} ->
        UnifyInfo ->
        FC -> Env Term vars ->
        tm vars -> tm vars ->
        Core UnifyResult

The Unify interface is there because it is convenient to be able to define unification on Term and NF, as well as Closure (which is part of NF to represent unevaluated arguments to constructors).

This is one place where indexing over vars is extremely valuable: we have to keep the environment consistent, so unification won’t accidentally introduce any scoping bugs!

Idris 2 implements pattern unification - see Adam Gundry’s thesis for an accessible introduction.

Context

Core.Context defines all the things needed for TT. Most importantly: Def gives definitions of names (case trees, builtins, constructors and holes, mostly); GlobalDef is a definition with all the other information about it (type, visibility, totality, etc); Context is a context mapping names to GlobalDef, and Defs is the core data structure with everything needed to typecheck more definitions.

The main Context type stores definitions in an array, indexed by a “resolved name id”, an integer, for fast look up. This means that it also needs to be able to convert between resolved names and full names. The HasNames interface defines methods for going back and forth between structures with human readable names, and structures with resolved integer names.

Since we store names in an array, all the lookup functions need to be in the Core monad. This also turns out to help with loading checked files (see below).

Elaboration Overview

Elaboration of RawImp to TT is driven by TTImp.Elab, with the top level function for elaborating terms defined in TTImp.Elab.Term, support functions defined in TTImp.Elab.Check, and elaborators for the various TTImp constructs defined in separate files under TTImp.Elab.*.

Laziness

Like Idris 1, laziness is marked in types using Lazy, Delay and Force, or Inf (instead of Lazy) for codata. Unlike Idris 1, these are language primitives rather than special purpose names.

Implicit laziness resolution is handled during unification (in Core.Unify). When unification is invoked (by convert in TTImp.Elab.Check) with the withLazy flag set, it checks whether it is converting a lazy type with a non-lazy type. If so, it continues with unification, but reports that either a Force or a Delay needs to be inserted, as appropriate.

TTC format

We can save things to binary if we have an implementation of the TTC interface for it. See Utils.Binary to see how this is done. It uses a global reference Ref Bin Binary which uses Data.Buffer underneath.

When we load checked TTC files, we don’t process the definitions immediately, but rather store them as a ContextEntry, which is either a Binary blob, or a processed definition. We only process the definitions the first time they are looked up, since converting Binary to the definition is fairly costly (due to having to construct a lot of AST nodes), and often definitions in an imported file are never used.

Bound Implicits

The RawImp type has a constructor IBindVar. The first time we encounter an IBindVar, we record the name as one which will be implicitly bound. At the end of elaboration, we decide which holes should turn into bound variables (Pi bound in types, Pattern bound on a LHS, still holes on the RHS) by looking at the list of names bound as IBindVar, the things they depend on, and sorting them so that they are bound in dependency order. This happens in TTImp.Implicit.getToBind.

Once we know what the bound implicits need to be, we bind them in bindImplicits. Any application of a hole which stands for a bound implicit gets turned into a local binding (either Pi or Pat as appropriate, or PLet for @-patterns).

Unbound Implicits

Any name beginning with a lower case letter is considered an unbound implicit. They are elaborated as holes, which may depend on the initial environment of the elaboration, and after elaboration they are converted to an implicit pi binding, with multiplicity 0. So, for example:

map : {f : _} -> (a -> b) -> f a -> f b

becomes:

map : {f : _} -> {0 a : _} -> {0 b : _} -> (a -> b) -> f a -> f b

Bindings are ordered according to dependency. It’ll infer any additional names, e.g. in:

lookup : HasType i xs t -> Env xs -> t

… where xs is a Vect n a, it infers bindings for n and a.

The %unbound_implicits directive means that it will no longer automatically bind names (that is, a and b in map above) but it will still infer the types for any additional names, e.g. if you write:

lookup : forall i, x, t . HasType i xs t -> Env xs -> t

… it will still infer a type for xs and infer bindings for n and a.

Implicit arguments

When we encounter an implicit argument (_ in the raw syntax, or added when we elaborate an application and see that there is an implicit needed) we make a new hole which is a fresh name applied to the current environment, and return that as the elaborated term. This happens in TTImp.Elab.Check, with the function metaVar. If there’s enough information elsewhere we’ll find the definition of the hole by unification.

We never substitute holes in a term during elaboration and rely on normalisation if we need to look inside it. If there are holes remaining after elaboration of a definition, report an error (it’s okay for a hole in a type as long as it’s resolved by the time the definition is done).

See Elab.App.makeImplicit, Elab.App.makeAutoImplicit to see where we add holes for the implicit arguments in applications.

Elab.App does quite a lot of tricky stuff! In an attempt to help with resolving ambiguous names and record updates, it will sometimes delay elaboration of an argument (see App.checkRestApp) so that it can get more information about its type first.

Core.Unify.solveConstraints revisits all of the currently unsolved holes and constrained definitions, and tries again to unify any constraints which they require. It also tries to resolve anything defined by proof search. The current state of unification is defined in Core.UnifyState, and unification constraints record which metavariables are blocking them. This improves performance, since we’ll only retry a constraint if one of the blocking metavariables has been resolved.

Additional type inference

A ? in a type means “infer this part of the type”. This is distinct from _ in types, which means “I don’t care what this is”. The distinction is in what happens when inference fails. If inference fails for _, we implicitly bind a new name (just like pattern matching on the lhs - i.e. it means match anything). If inference fails for ?, we leave it as a hole and try to fill it in later. As a result, we can say:

foo : Vect ? Int
foo = [1,2,3,4]

… and the ? will be inferred to be 4. But if we say:

foo : Vect _ Int
foo = [1,2,3,4]

… we’ll get an error, because the _ has been bound as a new name. Both ? and _ are represented in RawImp by the Implicit constructor, which has a boolean flag meaning “bind if unresolved”.

So the meaning of _ is now consistent on the lhs and in types (i.e. it means infer a value and bind a variable on failure to infer anything). In practice, using _ will get you the old Idris behaviour, but ? might get you a bit more type inference.

Auto Implicits

Auto implicits are resolved by proof search, and can be given explicit arguments in the same way as ordinary implicits: i.e. {x = exp} to give exp as the value for auto implicit x. Interfaces are syntactic sugar for auto implicits (it is the same resolution mechanism - interfaces translate into records, and implementations translate into hints for the search).

The argument syntax @{exp} means that the value of the next auto implicit in the application should be exp - this is the same as the syntax for invoking named implementations in Idris 1, but interfaces and auto implicits have been combined now.
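A small sketch of both ways of supplying an auto implicit explicitly. The named implementation shouty and the functions describe and describe2 are hypothetical; Show and Bool are from the prelude.

```idris
-- A named implementation: never chosen by search, only when named explicitly.
[shouty] Show Bool where
  show True  = "TRUE"
  show False = "FALSE"

-- The interface constraint is an auto implicit argument under the hood.
describe : Show a => a -> String
describe x = "value: " ++ show x

-- Resolved by search, using the default Show Bool implementation.
usesDefault : String
usesDefault = describe True

-- @{exp} supplies the next auto implicit in the application.
usesNamed : String
usesNamed = describe @{shouty} True

-- The same constraint written as a named auto implicit...
describe2 : {auto s : Show a} -> a -> String
describe2 x = "value: " ++ show x

-- ...which can then be given by name, like an ordinary implicit.
usesNamed2 : String
usesNamed2 = describe2 {s = shouty} True
```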

Implicit search is defined in Core.AutoSearch. It will only begin a search if all the determining arguments of the goal are defined, meaning that they don’t contain any holes. This avoids committing too early to the solution of a hole by resolving it by search, rather than unification, unless a programmer has explicitly said (via a search option on a data type) that that’s what they want.

Dot Patterns

IMustUnify is a constructor of RawImp. When we elaborate this, we generate a hole, then elaborate the term, and add a constraint that the generated hole must unify with the term which was explicitly given (in UnifyState.addDot), without resolving any holes. This is finally checked in UnifyState.checkDots.

@-Patterns

Names which are bound in types are also bound as @-patterns, meaning that functions have access to them. For example, we can say:

vlength : {n : Nat} -> Vect n a -> Nat
vlength [] = n
vlength (x :: xs) = n

As-patterns are implemented as a constructor of TT, which makes a lot of things more convenient (especially case tree compilation).

Linear Types

Following Conor McBride and Bob Atkey’s work, all binders have a multiplicity annotation (RigCount). After elaboration in TTImp.Elab, we do a separate linearity check which: a) makes sure that linear variables are used exactly once; b) updates hole types to properly reflect usage information.

Local definitions

We elaborate relative to an environment, meaning that we can elaborate local function definitions. We keep track of the names being defined in a nested block of declarations, and ensure that they are lifted to top level definitions in TT by applying them to every name in scope.

Since we don’t know how many times a local definition will be applied, in general, anything bound with multiplicity 1 is passed to the local definition with multiplicity 0, so if you want to use it in a local definition, you need to pass it explicitly.

Case blocks

Similar to local definitions, these are lifted to top level definitions which represent the case block, which is immediately applied to the scrutinee of the case. We don’t attempt to calculate the multiplicities of arguments when elaborating the case block, since we’ll probably get it wrong - instead, these are checked during linearity checking, which knows about case functions.

Case blocks in the scope of local definitions are tricky, because the names need to match up, and the types might be refined, but we also still need to apply the local names to the scope in which they were defined. This is a bit fiddly, and dealt with by the ICaseLocal constructor of RawImp.

Various parts of the system treat case blocks specially, even though they aren’t strictly part of the core. In particular, these are linearity checking and totality checking.

Parameters

The parameters to a data type are taken to be the arguments which appear, unchanged, in the same position, everywhere across a data definition.

Erasure

Unbound implicits are given 0 multiplicity, so the rule is now that if you don’t explicitly write it in the type of a function or constructor, the argument is erased at run time.

Elaboration and the case tree compiler check ensure that 0-multiplicity arguments are not inspected in case trees. In the compiler, 0-multiplicity arguments to constructors are erased completely, whereas 0-multiplicity arguments to functions are replaced with a placeholder erased value.

Namespaces and name visibility

Mostly the same rules apply as in Idris 1. The difference is that visibility is per namespace, not per file (that is, files have no relevance except in that they introduce their own namespace, and in that they allow separate typechecking).

One effect of this is that when a file defines nested namespaces, the inner namespace can see what’s in the outer namespace, but not vice versa unless names defined in the inner namespace are explicitly exported. The visibility modifiers export, public export, and private control whether the name can be seen in any other namespace, and it’s nothing to do with the file they’re defined in at all.

Unlike Idris 1, there is no restriction on whether public definitions can refer to private names. The only restriction on private names is that they can’t be referred to directly (i.e. in code) outside the namespace.

Records

Records are part of TTImp (rather than the surface language). Elaborating a record declaration creates a data type and associated projection functions. Record setters are generated on demand while elaborating TTImp (in TTImp.Elab.Record). Setters are translated directly to case blocks, which means that update of dependent fields works as one might expect (i.e. it’s safe as long as all of the fields are updated at the same time consistently).
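For illustration, a declaration and two setters might look like the sketch below. The record Point and the functions moveRight and reset are hypothetical, and the brace-only update syntax assumes a reasonably recent Idris 2; older releases write record { ... } instead.

```idris
-- Declaring a record creates the data type, its constructor and
-- the projection functions x and y.
record Point where
  constructor MkPoint
  x : Nat
  y : Nat

-- Modify one field with a function ($=).
moveRight : Point -> Point
moveRight p = { x $= S } p

-- Set several fields at once (:=).
reset : Point -> Point
reset p = { x := 0, y := 0 } p
```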

The IDE Protocol

The Idris REPL has two modes of interaction: a human-readable syntax designed for direct use in a terminal, and a machine-readable syntax designed for using Idris as a backend for external tools.

The IDE-Protocol is versioned separately from the Idris compiler. The first version of Idris (written in Haskell, at v1.3.3) implements version one of the IDE Protocol, and Idris2 (self-hosting, at v0.3.0) implements version two of the IDE Protocol.

The protocol and its serialisation/deserialisation routines are part of the Protocol submodule hierarchy and are packaged in the idris2protocols.ipkg package.

Starting IDE Mode

To initiate the IDE-Protocol on stdin/stdout, use the --ide-mode command line option. To run the protocol over a TCP socket, use the --ide-mode-socket option:

idris2 --ide-mode-socket
53864

By default this will choose an open port, print the number of the port to stdout followed by a newline, and listen on that socket on localhost. You may optionally specify the hostname and port to listen on:

idris2 --ide-mode-socket localhost:12345
12345

The IDE-Protocol will run on that socket, and Idris will exit when the client disconnects from the socket.
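
A client typically has to read the announced port before it can connect. The following Python sketch parses that first line and opens a TCP connection. The helper names read_port and connect are hypothetical, and the sketch assumes that the compiler's standard output has already been captured as a text stream.

```python
import io
import socket


def read_port(stream):
    """Read the port number that Idris prints as the first line of stdout."""
    line = stream.readline()
    if not line:
        raise ValueError("no port line received")
    return int(line.strip())


def connect(port, host="localhost"):
    """Open a TCP connection to a running `idris2 --ide-mode-socket`."""
    # Assumes an Idris process is already listening on host:port.
    return socket.create_connection((host, port))


if __name__ == "__main__":
    # Stand-in for the compiler's stdout; a real client would read the pipe.
    print(read_port(io.StringIO("53864\n")))
```

Because Idris exits when the client disconnects, a client built along these lines would keep the returned socket open for the whole editing session.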

Protocol Overview

The communication protocol is of asynchronous request-reply style: a single request from the client is handled by Idris at a time. Idris waits for a request on its standard input stream, and outputs the answer or answers to standard output. The result of a request can be either success, failure, or intermediate output; and furthermore, before the result is delivered, there might be additional meta-messages.

A reply can consist of multiple messages: any number of messages to inform the user about the progress of the request or other informational output, and finally a result, either ok or error.

The wire format is the length of the message in characters, encoded in 6 characters hexadecimal, followed by the message encoded as S-expression (sexp). Additionally, each request includes a unique integer (counting upwards), which is repeated in all messages corresponding to that request.

An example interaction from loading the file /home/hannes/empty.idr looks as follows on the wire:

00002a((:load-file "/home/hannes/empty.idr") 1)
000039(:write-string "Type checking /home/hannes/empty.idr" 1)
000025(:set-prompt "/home/hannes/empty" 1)
000032(:return (:ok "Loaded /home/hannes/empty.idr") 1)

The first message is the request from idris-mode to load the specific file; its length is hex 2a, decimal 42 (including the newline at the end). The request identifier is set to 1. The first message from Idris is to write the string Type checking /home/hannes/empty.idr, another is to set the prompt to /home/hannes/empty. The answer, starting with :return, is ok, and the additional information is that the file was loaded.
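
The framing described above can be sketched in a few lines of Python. The function names encode_message and decode_messages are hypothetical. The sketch counts characters, as the text states, which is unambiguous for ASCII input; how non-ASCII text is counted is not modelled here.

```python
def encode_message(sexp):
    """Frame one S-expression: 6 hex digits of length, then the text."""
    body = sexp + "\n"           # the length includes the trailing newline
    return f"{len(body):06x}{body}"


def decode_messages(buffer):
    """Split a buffer of framed messages into (messages, unconsumed_rest)."""
    messages = []
    pos = 0
    while pos + 6 <= len(buffer):
        length = int(buffer[pos:pos + 6], 16)
        end = pos + 6 + length
        if end > len(buffer):
            break                # incomplete message; wait for more input
        messages.append(buffer[pos + 6:end].rstrip("\n"))
        pos = end
    return messages, buffer[pos:]
```

Encoding the load request from the example yields the prefix 00002a, matching the first line of the interaction shown above.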

There are three atoms in the wire language: numbers, strings, and symbols. The only compound object is a list, which is surrounded by parentheses. The syntax is:

A ::= NUM | '"' STR '"' | ':' ALPHA+
S ::= A | '(' S* ')' | nil

where NUM is either 0 or a positive integer, ALPHA is an alphabetical character, and STR is the contents of a string, with " escaped by a backslash. The atom nil is accepted instead of () for compatibility with some regexp pretty-printing routines.
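
As an illustration of this grammar, here is a minimal recursive-descent reader in Python. It is a sketch, not the parser used by Idris or by any editor mode. The Symbol class and the function names are hypothetical. Although the grammar gives symbols as ALPHA+, the symbols shown in this document (such as :load-file) contain hyphens, so the sketch accepts hyphens as well.

```python
class Symbol(str):
    """A wire-language symbol such as :ok, kept distinct from strings."""


def parse_sexp(text):
    """Parse one S-expression and return the corresponding Python value."""
    value, pos = _parse(text, _skip_ws(text, 0))
    if _skip_ws(text, pos) != len(text):
        raise ValueError("trailing input after S-expression")
    return value


def _skip_ws(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def _parse(text, pos):
    if pos >= len(text):
        raise ValueError("unexpected end of input")
    ch = text[pos]
    if ch == "(":                                   # list: '(' S* ')'
        items = []
        pos = _skip_ws(text, pos + 1)
        while pos < len(text) and text[pos] != ")":
            item, pos = _parse(text, pos)
            items.append(item)
            pos = _skip_ws(text, pos)
        if pos >= len(text):
            raise ValueError("unclosed list")
        return items, pos + 1
    if ch == '"':                                   # string with \" escapes
        chars = []
        pos += 1
        while pos < len(text) and text[pos] != '"':
            if text[pos] == "\\" and pos + 1 < len(text):
                pos += 1                            # keep the escaped char
            chars.append(text[pos])
            pos += 1
        if pos >= len(text):
            raise ValueError("unclosed string")
        return "".join(chars), pos + 1
    if ch == ":":                                   # symbol
        end = pos + 1
        while end < len(text) and (text[end].isalpha() or text[end] == "-"):
            end += 1
        return Symbol(text[pos:end]), end
    if ch.isdigit():                                # non-negative integer
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return int(text[pos:end]), end
    if text.startswith("nil", pos):                 # nil is the empty list
        return [], pos + 3
    raise ValueError(f"unexpected character {ch!r} at {pos}")
```

Lists become Python lists, numbers become int, strings become str, and symbols become Symbol values, so a client can tell the symbol :ok apart from the string ":ok" with isinstance.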

The state of the Idris process is mainly the active file, which needs to be kept synchronised between the editor and Idris. This is achieved by the already seen :load-file command.

Protocol Versioning

When interacting with Idris through the IDE Protocol the initial message sent by the running Idris Process is the version (major and minor) of the IDE Protocol being used.

The expected message has the following format:

(:protocol-version MAJOR MINOR)

IDE Clients can use this to help support multiple Idris versions.
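
A client might extract the version along the following lines. This is a Python sketch with hypothetical function names. It searches for the message anywhere in the received text, so it does not depend on whether a length prefix precedes it.

```python
import re

_VERSION_RE = re.compile(r"\(:protocol-version\s+(\d+)\s+(\d+)\)")


def parse_protocol_version(message):
    """Return (major, minor) from a :protocol-version message, else None."""
    match = _VERSION_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_version_two_commands(version):
    """Commands that are new in version 2 need a major version of at least 2."""
    return version is not None and version[0] >= 2
```

A client could use the second helper to decide whether to offer commands that only exist in version 2 of the protocol.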

Commands

The available commands are listed below. They are compatible with Version 1 and 2.0 of the protocol unless otherwise stated.

(:load-file FILENAME [LINE])

Load the named file. If a LINE number is provided, the file is only loaded up to that line. Otherwise, the entire file is loaded. Version 2 of the IDE Protocol requires that the file name be a quoted string, as in (:load-file "MyFile.idr") and not (:load-file MyFile.idr).

(:cd FILEPATH)

Change the working directory to the given FILEPATH. Version 2 of the IDE Protocol requires that the path is quoted, as in (:cd "a/b/c") and not (:cd a/b/c).

(:interpret STRING)

Interpret STRING at the Idris REPL, returning a highlighted result.

(:type-of STRING)

Return the type of the name, written with Idris syntax in the STRING. The reply may contain highlighting information.

(:case-split LINE NAME)

Generate a case-split for the pattern variable NAME on program line LINE. The pattern-match cases to be substituted are returned as a string with no highlighting.

(:add-clause LINE NAME)

Generate an initial pattern-match clause for the function declared as NAME on program line LINE. The initial clause is returned as a string with no highlighting.

(:add-proof-clause LINE NAME)

Add a clause driven by the <== syntax.

(:add-missing LINE NAME)

Add the missing cases discovered by totality checking the function declared as NAME on program line LINE. The missing clauses are returned as a string with no highlighting.

(:make-with LINE NAME)

Create a with-rule pattern match template for the clause of function NAME on line LINE. The new code is returned with no highlighting.

(:make-case LINE NAME)

Create a case pattern match template for the clause of function NAME on line LINE. The new code is returned with no highlighting.

(:make-lemma LINE NAME)

Create a top level function with a type which solves the hole named NAME on line LINE.

(:proof-search LINE NAME HINTS)

Attempt to fill out the hole on LINE named NAME by proof search. HINTS is a possibly-empty list of additional things to try while searching. This operation is also called ExprSearch in the Idris 2 API.

(:refine LINE NAME TM)

Refine the hole on LINE named NAME by using the term TM.

(:docs-for NAME [MODE])

Look up the documentation for NAME, and return it as a highlighted string. If MODE is :overview, only the first paragraph of documentation is provided for NAME. If MODE is :full, or omitted, the full documentation is returned for NAME.

(:apropos STRING)

Search the documentation for mentions of STRING, and return any found as a list of highlighted strings.

(:metavariables WIDTH)

List the currently-active holes, with their types pretty-printed in WIDTH columns.

(:who-calls NAME)

Get a list of callers of NAME.

(:calls-who NAME)

Get a list of callees of NAME.

(:browse-namespace NAMESPACE)

Return the contents of NAMESPACE, like :browse at the command-line REPL.

(:normalise-term TM)

Return a highlighted string consisting of the results of normalising the serialised term TM (which would previously have been sent as the tt-term property of a string).

(:show-term-implicits TM)

Return a highlighted string, consisting of the results of making all arguments in serialised term TM explicit. The arguments in TM would previously have been sent as the tt-term property of a string.

(:hide-term-implicits TM)

Return a highlighted string, consisting of the results of making all arguments in serialised term TM follow their usual implicitness setting. The arguments in TM would previously have been sent as the tt-term property of a string.

(:elaborate-term TM)

Return a highlighted string, consisting of the core language term corresponding to serialised term TM. The arguments in TM would previously have been sent as the tt-term property of a string.

(:print-definition NAME)

Return the definition of NAME as a highlighted string.

(:repl-completions NAME)

Search names, types and documentation which contain NAME. Return the result of tab-completing NAME as a REPL command.

:version

Return the version information of the Idris compiler.
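
Requests for the parenthesised commands above can be assembled mechanically. The Python sketch below uses the hypothetical helpers quote and request. It quotes string arguments as Version 2 requires for :load-file and :cd, and wraps the command together with the request identifier. Escaping a backslash with another backslash is an assumption; the text above only specifies that " is escaped by a backslash.

```python
def quote(value):
    """Render a Python string as a quoted wire-format string."""
    # Assumption: backslashes are doubled; the protocol text only
    # specifies that a double quote is escaped by a backslash.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def request(command, *args, request_id):
    """Build the S-expression for one request: ((COMMAND ARGS...) ID)."""
    rendered = []
    for arg in args:
        # Strings are quoted; integers (such as LINE) are written as-is.
        rendered.append(quote(arg) if isinstance(arg, str) else str(arg))
    inner = " ".join([command] + rendered)
    return f"(({inner}) {request_id})"
```

For example, request(":cd", "a/b/c", request_id=4) produces ((:cd "a/b/c") 4). The sketch does not cover commands written without parentheses, such as :version.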

New For Version 2

New in Version 2 of the protocol are:

(:generate-def LINE NAME)

Attempt to generate a complete definition from a type.

(:generate-def-next)

Replace the previous generated definition with the next generated definition.

(:proof-search-next)

Replace the previous proof search result with the next proof search result.

(:intro LINE NAME)

Returns the non-empty list of valid saturated constructors that can be used in the hole at line LINE named NAME.

Possible Replies

Possible replies include a normal final reply:

(:return (:ok SEXP [HIGHLIGHTING]) ID)
(:return (:error String [HIGHLIGHTING]) ID)

A normal intermediate reply:

(:output (:ok SEXP [HIGHLIGHTING]) ID)
(:output (:error String [HIGHLIGHTING]) ID)

Informational and/or abnormal replies:

(:write-string String ID)
(:set-prompt String ID)
(:warning (FilePath (LINE COL) (LINE COL) String [HIGHLIGHTING]) ID)

Warnings include compiler errors that don’t cause the compiler to stop.
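
The reply forms above lend themselves to a simple dispatch on the leading symbol. The following Python sketch works on replies that have already been parsed into nested lists, with symbols represented as strings such as ":return". The function names and the dictionary layout are hypothetical.

```python
def classify_reply(reply):
    """Classify a parsed reply (nested lists) by its leading symbol."""
    head = reply[0]
    if head in (":return", ":output"):
        status, payload = reply[1][0], reply[1][1]
        kind = "final" if head == ":return" else "intermediate"
        return {"kind": kind, "ok": status == ":ok",
                "payload": payload, "id": reply[-1]}
    if head in (":write-string", ":set-prompt", ":warning"):
        return {"kind": "info", "type": head,
                "payload": reply[1], "id": reply[-1]}
    raise ValueError(f"unknown reply form: {head!r}")


def collect_until_final(replies, request_id):
    """Gather the replies for one request up to and including :return."""
    collected = []
    for reply in replies:
        info = classify_reply(reply)
        if info["id"] != request_id:
            continue             # belongs to a different request
        collected.append(info)
        if info["kind"] == "final":
            break
    return collected
```

Filtering on the request identifier reflects the rule that the identifier of a request is repeated in every message that belongs to it.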

Output Highlighting

Idris mode supports highlighting the output from Idris. In reality, this highlighting is controlled by the Idris compiler. Some of the return forms from Idris support an optional extra parameter: a list mapping spans of text to metadata about that text. Clients can then use this list both to highlight the displayed output and to enable richer interaction by having more metadata present. For example, the Emacs mode allows right-clicking identifiers to get a menu with access to documentation and type signatures.

A particular semantic span is a three element list. The first element of the list is the index at which the span begins, the second element is the number of characters included in the span, and the third is the semantic data itself. The semantic data is a list of lists. The head of each list is a key that denotes what kind of metadata is in the list, and the tail is the metadata itself.
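
To make the span layout concrete, here is a small Python sketch that extracts the text covered by a span and looks up one metadata key. The helper names are hypothetical. The sketch assumes that the start index is zero-based and that keys arrive as symbols written with a leading colon (for example ":decor"); neither detail is stated above.

```python
def span_text(text, span):
    """Return the substring covered by a (start, length, metadata) span."""
    start, length, _metadata = span
    return text[start:start + length]      # assumes a zero-based start


def metadata_value(span, key):
    """Look up KEY in the span's metadata; None when the key is absent."""
    for entry in span[2]:
        if entry[0] == key:
            tail = entry[1:]
            # A single-element tail is unwrapped for convenience.
            return tail[0] if len(tail) == 1 else tail
    return None
```

With the output "add : Nat -> Nat" and the span [6, 3, [[":decor", ":type"]]], span_text returns "Nat" and metadata_value(span, ":decor") returns ":type".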

The following keys are available:

name

gives a reference to the fully-qualified Idris name

implicit

provides a Boolean value that is True if the region is the name of an implicit argument

decor

describes the category of a token, which can be:

type: type constructors

function: defined functions

data: data constructors

bound: bound variables, or

keyword

source-loc

states that the region refers to a source code location. Its body is a collection of key-value pairs, with the following possibilities:

filename

provides the filename

start

provides the line and column that the source location starts at as a two-element tail

end

provides the line and column that the source location ends at as a two-element tail

text-formatting

provides an attribute of formatted text. This is for use with natural-language text, not code, and is presently emitted only from inline documentation. The potential values are bold, italic, and underline.

link-href

provides a URL that the corresponding text is a link to.

quasiquotation

states that the region is quasiquoted.

antiquotation

states that the region is antiquoted.

tt-term

A serialised representation of the Idris core term corresponding to the region of text.

Source Code Highlighting

Idris supports instructing editors how to colour their code. When elaborating source code or REPL input, Idris will locate regions of the source code corresponding to names, and emit information about these names using the same metadata as output highlighting.

These messages will arrive as replies to the command that caused elaboration to occur, such as :load-file or :interpret. They have the format:

(:output (:ok (:highlight-source POSNS)) ID)

where POSNS is a list of positions to highlight. Each of these is a two-element list whose first element is a position (encoded as for the source-loc property above) and whose second element is highlighting metadata in the same format used for output.

Idris2 Reference Guide

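Note

The documentation for Idris 2 has been published under the Creative Commons CC0 License. As such, to the extent possible under law, the Idris Community has waived all copyright and related or neighboring rights to the documentation for Idris.

More information concerning the CC0 can be found online at: https://creativecommons.org/publicdomain/zero/1.0/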

This is a placeholder, to get set up with readthedocs.

Documenting Idris Code

Idris documentation comes in two major forms: comments, which exist for a reader’s edification and are ignored by the compiler, and inline API documentation, which the compiler parses and stores for future reference. To consult the documentation for a declaration f, write :doc f at the REPL or use the appropriate command in your editor (C-c C-d in Emacs, <LocalLeader>h in Vim).

Comments

Use comments to explain why code is written the way that it is. Idris’s comment syntax is the same as that of Haskell: lines beginning with -- are comments, and regions bracketed by {- and -} are comments even if they extend across multiple lines. These can be used to comment out lines of code or provide simple documentation for the readers of Idris code.

Inline Documentation

Idris also supports a rich inline syntax from which documentation for Idris code can be generated. This syntax also allows named parameters and variables within type signatures to be individually annotated, using a syntax similar to Javadoc parameter annotations.

Documentation always comes before the declaration being documented. Inline documentation applies to either top-level declarations or to constructors. Documentation for specific arguments to constructors, type constructors, or functions can be associated with these arguments using their names.

The inline documentation for a declaration is an unbroken string of lines, each of which begins with ||| (three pipe symbols). The first paragraph of the documentation is taken to be an overview, and in some contexts, only this overview will be shown. After the documentation for the declaration as a whole, it is possible to associate documentation with specific named parameters, which can either be explicitly named or the results of converting free variables to implicit parameters. Annotations follow the same pattern as Javadoc annotations: for the named parameter (n : T), the corresponding annotation is ||| @ n Some description, placed before the declaration.

Documentation is written in Markdown, though not all contexts will display all possible formatting (for example, images are not displayed when viewing documentation in the REPL, and only some terminals render italics correctly). A comprehensive set of examples is given below.

||| Modules can also be documented.
module Docs

||| Add some numbers.
|||
||| Addition is really great. This paragraph is not part of the overview.
||| Still the same paragraph.
|||
||| You can even provide examples which are inlined in the documentation:
||| ```idris example
||| add 4 5
||| ```
|||
||| Lists are also nifty:
||| * Really nifty!
||| * Yep!
||| * The name `add` is a **bold** choice
||| @ n is the recursive param
||| @ m is not
add : (n, m : Nat) -> Nat
add Z     m = m
add (S n) m = S (add n m)


||| Append some vectors
||| @ a the contents of the vectors
||| @ xs the first vector (recursive param)
||| @ ys the second vector (not analysed)
appendV : (xs : Vect n a) -> (ys : Vect m a) -> Vect (add n m) a
appendV []      ys = ys
appendV (x::xs) ys = x :: appendV xs ys

||| Here's a simple datatype
data Ty =
  ||| Unit
  UNIT |
  ||| Functions
  ARR Ty Ty

||| Points to a place in a typing context
data Elem : Vect n Ty -> Ty -> Type where
  Here : {ts : Vect n Ty} -> Elem (t::ts) t
  There : {ts : Vect n Ty} -> Elem ts t -> Elem (t'::ts) t

||| A more interesting datatype
||| @ n the number of free variables
||| @ ctxt a typing context for the free variables
||| @ ty the type of the term
data Term : (ctxt : Vect n Ty) -> (ty : Ty) -> Type where

  ||| The constructor of the unit type
  ||| More comment
  ||| @ ctxt the typing context
  UnitCon : {ctxt : Vect n Ty} -> Term ctxt UNIT

  ||| Function application
  ||| @ f the function to apply
  ||| @ x the argument
  App : {ctxt : Vect n Ty} -> (f : Term ctxt (ARR t1 t2)) -> (x : Term ctxt t1) -> Term ctxt t2

  ||| Lambda
  ||| @ body the function body
  Lam : {ctxt : Vect n Ty} -> (body : Term (t1::ctxt) t2) -> Term ctxt (ARR t1 t2)

  ||| Variables
  ||| @ i de Bruijn index
  Var : {ctxt : Vect n Ty} -> (i : Elem ctxt t) -> Term ctxt t

||| We can document records, including their fields and constructors
record Yummy where
  ||| Make a yummy
  constructor MkYummy
  ||| What to eat
  food : String

Environment Variables

Idris 2 recognises a number of environment variables, to decide where to look for packages, external libraries, code generators, etc. It currently recognises, in approximately the order you’re likely to need them:

  • EDITOR - Sets the editor used in REPL :e command

  • IDRIS2_CG - Sets which code generator to use when compiling programs

  • IDRIS2_PACKAGE_PATH - Lists the directories where Idris2 looks for packages, in addition to the defaults (which are under the IDRIS2_PREFIX and in the depends subdirectory of the current working directory). Directories are separated by a :, or a ; on Windows

  • IDRIS2_PATH - Places Idris2 looks for import files, in addition to the imports in packages

  • IDRIS2_DATA - Places Idris2 looks for its data files. These are typically support code for code generators.

  • IDRIS2_LIBS - Places Idris2 looks for libraries used by code generators.

  • IDRIS2_PREFIX - Gives the Idris2 installation prefix

  • CHEZ - Sets the location of the chez executable used in Chez codegen

  • RACKET - Sets the location of the racket executable used in Racket codegen

  • RACKET_RACO - Sets the location of the raco executable used in Racket codegen

  • GAMBIT_GSI - Sets the location of the gsi executable used in Gambit codegen

  • GAMBIT_GSC - Sets the location of the gsc executable used in Gambit codegen

  • GAMBIT_GSC_BACKEND - Sets the gsc executable backend argument

  • IDRIS2_CC - Sets the location of the C compiler executable used in RefC codegen

  • CC - Sets the location of the C compiler executable used in RefC codegen

  • NODE - Sets the location of the node executable used in Node codegen

  • PATH - used to search for executables in certain codegens
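
As an illustration of the separator rule for IDRIS2_PACKAGE_PATH, the following Python sketch splits a value into its directories. The function name is hypothetical, and dropping empty entries is an assumption rather than documented behaviour.

```python
import os


def split_package_path(value, windows=False):
    """Split an IDRIS2_PACKAGE_PATH value into its directories."""
    separator = ";" if windows else ":"
    # Assumption: empty entries (e.g. from a trailing separator) are dropped.
    return [entry for entry in value.split(separator) if entry]


if __name__ == "__main__":
    current = os.environ.get("IDRIS2_PACKAGE_PATH", "")
    print(split_package_path(current, windows=(os.name == "nt")))
```

The separator has to differ on Windows because : already appears in drive letters such as C:.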

Dot syntax for records

Long story short, .field is a postfix projection operator that binds tighter than function application.

Lexical structure

  • .foo is a valid name, which stands for record fields (new Name constructor RF "foo")

  • Foo.bar.baz starting with uppercase F is one lexeme, a namespaced identifier: DotSepIdent ["baz", "bar", "Foo"]

  • foo.bar.baz starting with lowercase f is three lexemes: foo, .bar, .baz

  • .foo.bar.baz is three lexemes: .foo, .bar, .baz

  • If you want Constructor.field, you have to write (Constructor).field.

  • All module names must start with an uppercase letter.

New syntax of simpleExpr

Expressions binding tighter than application (simpleExpr), such as variables or parenthesised expressions, have been renamed to simplerExpr, and an extra layer of syntax has been inserted.

simpleExpr ::= (.field)+               -- parses as PPostfixAppPartial
             | simplerExpr (.field)+   -- parses as PPostfixApp
             | simplerExpr             -- (parses as whatever it used to)

  • (.foo) is a name, so you can use it to e.g. define a function called .foo (see .squared below)

  • (.foo.bar) is a parenthesised expression

Desugaring rules

  • (.field1 .field2 .field3) desugars to (\x => .field3 (.field2 (.field1 x)))

  • (simpleExpr .field1 .field2 .field3) desugars to ((.field1 .field2 .field3) simpleExpr)

Record elaboration

  • there is a new pragma %prefix_record_projections, which is on by default

  • for every field f of a record R, we get:

    • projection f in namespace R (exactly like now), unless %prefix_record_projections is off

    • projection .f in namespace R with the same definition

Example code

record Point where
  constructor MkPoint
  x : Double
  y : Double

This record creates two projections:

  • .x : Point -> Double

  • .y : Point -> Double

Because %prefix_record_projections is on by default, we also get:

  • x : Point -> Double

  • y : Point -> Double

To prevent cluttering the ordinary global name space with short identifiers, we can do this:

%prefix_record_projections off

record Rect where
  constructor MkRect
  topLeft : Point
  bottomRight : Point

For Rect, we don’t get the prefix projections:

Main> :t topLeft
(interactive):1:4--1:11:Undefined name topLeft
Main> :t .topLeft
\{rec:0} => .topLeft rec : ?_ -> Point

Let’s define some constants:

pt : Point
pt = MkPoint 4.2 6.6

rect : Rect
rect =
  MkRect
    (MkPoint 1.1 2.5)
    (MkPoint 4.3 6.3)

User-defined projections work, too. (Should they?)

(.squared) : Double -> Double
(.squared) x = x * x

Finally, the examples:

main : IO ()
main = do
  -- desugars to (.x pt)
  -- prints 4.2
  printLn $ pt.x

  -- prints 4.2, too
  -- maybe we want to make this a parse error?
  printLn $ pt .x

  -- prints 10.8
  printLn $ pt.x + pt.y

  -- works fine with namespacing
  -- prints 4.2
  printLn $ (Main.pt).x

  -- the LHS can be an arbitrary expression
  -- prints 4.2
  printLn $ (MkPoint pt.y pt.x).y

  -- user-defined projection
  -- prints 17.64
  printLn $ pt.x.squared

  -- prints [1.0, 3.0]
  printLn $ map (.x) [MkPoint 1 2, MkPoint 3 4]

  -- .topLeft.y desugars to (\x => .y (.topLeft x))
  -- prints [2.5, 2.5]
  printLn $ map (.topLeft.y) [rect, rect]

  -- desugars to (.topLeft.x rect + .bottomRight.y rect)
  -- prints 7.4
  printLn $ rect.topLeft.x + rect.bottomRight.y

  -- qualified names work, too
  -- all these print 4.2
  printLn $ Main.Point.(.x) pt
  printLn $ Point.(.x) pt
  printLn $ (.x) pt
  printLn $ .x pt

  -- haskell-style projections work, too
  printLn $ Main.Point.x pt
  printLn $ Point.x pt
  printLn $ (x) pt
  printLn $ x pt

  -- record update syntax uses dots now
  -- prints 3.0
  printLn $ ({ topLeft.x := 3 } rect).topLeft.x

  -- but for compatibility, we support the old syntax, too
  printLn $ ({ topLeft->x := 3 } rect).topLeft.x

  -- prints 2.1
  printLn $ ({ topLeft.x $= (+1) } rect).topLeft.x
  printLn $ ({ topLeft->x $= (+1) } rect).topLeft.x

Parses but does not typecheck:

-- parses as: map.x [MkPoint 1 2, MkPoint 3 4]
-- maybe we should disallow spaces before dots?
--
printLn $ map .x [MkPoint 1 2, MkPoint 3 4]

Literate Programming

Idris2 supports several literate styles.

Unlitting is designed on the assumption that we are parsing markdown-like languages. It is performed by a lexer that uses a provided literate style to recognise code blocks and code lines; anything else is ignored. Idris2 also provides support for recognising both ‘visible’ and ‘invisible’ code blocks using ‘native features’ of each literate style.

A literate style consists of:

  • a list of String-encoded code block delimiters;

  • a list of line indicators; and

  • a list of valid file extensions.

Lexing is simple and greedy: when consuming a code block, we treat everything as code until we reach the closing delimiter. This means that use of verbatim modes in a literate file will also be treated as active code.

In future we should add support for literate LaTeX files, and potentially other common document formats. More importantly, more intelligent processing of literate documents using a pandoc-like library in Idris, such as Edda <https://github.com/jfdm/edda>, would also be welcome.

Bird Style Literate Files

We treat files with an extension of .lidr as bird style literate files.

Bird notation is a classic literate mode found in Haskell (and Orwell), in which visible code lines begin with > and hidden lines with <. Other lines are treated as documentation.
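
The line-based rule can be modelled in a few lines of Python. This is a sketch of the idea only, not the lexer that Idris2 uses. The function name is hypothetical, and two details are assumptions: one optional space after the marker is removed, and documentation lines are replaced by empty lines so that line numbers stay aligned with the source file.

```python
def unlit_bird(source):
    """Extract code from Bird-style text, keeping line positions stable."""
    code_lines = []
    for line in source.splitlines():
        if line.startswith(">") or line.startswith("<"):
            # Visible (>) and hidden (<) lines are both active code.
            rest = line[1:]
            code_lines.append(rest[1:] if rest.startswith(" ") else rest)
        else:
            code_lines.append("")    # documentation is blanked out
    return "\n".join(code_lines)
```

Both markers yield active code here; the difference between visible and hidden lines only matters when the document is rendered for readers.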

Note

We have diverged from lhs2tex, in which < is traditionally used to display inactive code. Bird style is presented as is, and we recommend using the other styles for a more comprehensive literate mode.

Embedding in Markdown-like documents

While Bird Style literate mode is useful, it does not lend itself well to more modern markdown-like notations such as Org-Mode and CommonMark. Idris2 also provides support for recognising both ‘visible’ and ‘invisible’ code blocks and lines in both CommonMark and OrgMode documents using native code blocks and lines.

The idea is that:

  1. Visible content will be kept in the pretty printer’s output;

  2. Invisible content will be removed; and

  3. Specifications will be displayed as is and not touched by the compiler.

OrgMode

We treat files with an extension of .org as org-style literate files. Each of the following forms of markup is recognised regardless of case:

  • Org mode source blocks for idris sans options are recognised as visible code blocks:

    #+begin_src idris
    data Nat = Z | S Nat
    #+end_src
    
  • Comment blocks that begin with #+BEGIN_COMMENT idris are treated as invisible code blocks:

    #+begin_comment idris
    data Nat = Z | S Nat
    #+end_comment
    
  • Visible code lines, and specifications, are not supported. Invisible code lines are denoted with #+IDRIS: as follows:

    #+IDRIS: data Nat = Z | S Nat
    
  • Specifications can be given using OrgMode’s plain source or example blocks:

    #+begin_src
    map : (f : a -> b)
       -> List a
       -> List b
    map f _ = Nil
    #+end_src
    
CommonMark

We treat files with an extension of .md or .markdown as CommonMark style literate files.

  • CommonMark source blocks for idris sans options are recognised as visible code blocks:

    ```idris
    data Nat = Z | S Nat
    ```
    
    ~~~idris
    data Nat = Z | S Nat
    ~~~
    
  • Comment blocks of the form <!-- idris\n ... \n --> are treated as invisible code blocks:

    <!-- idris
    data Nat = Z | S Nat
    -->
    
  • Code lines are not supported.

  • Specifications can be given using CommonMark’s pre-formatted blocks (indented by four spaces) or unlabelled code blocks:

    Compare
    
    ```idris
    map : (f : a -> b)
       -> List a
       -> List b
    map f _ = Nil
    ```
    
    with
    
        map : (f : a -> b)
           -> List a
           -> List b
        map f _ = Nil
    
LaTeX

We treat files with an extension of .tex or .ltx as LaTeX style literate files.

  • We treat environments named code as visible code blocks:

    \begin{code}
    data Nat = Z | S Nat
    \end{code}
    
  • We treat environments named hidden as invisible code blocks:

    \begin{hidden}
    data Nat = Z | S Nat
    \end{hidden}
    
  • Code lines are not supported.

  • Specifications can be given using user defined environments.

We do not provide definitions for these code blocks and ask the user to define them. One such example, using the fancyvrb and comment packages, is:

\usepackage{fancyvrb}
\DefineVerbatimEnvironment
  {code}{Verbatim}
  {}

\usepackage{comment}

\excludecomment{hidden}

Overloaded literals

The compiler provides directives for overloading literals: %stringLit <fun> for string literals and %integerLit <fun> for integer literals. During elaboration, the given function is applied to the corresponding literal. In the Prelude these functions are set to fromString and fromInteger.

The interface FromString ty provides the fromString : String -> ty function, while the Num ty interface provides the fromInteger : Integer -> ty function for all numerical types.

Restricted overloads

Although literals can be overloaded by implementing the interfaces described above, in principle any function with the correct signature and name is enough to achieve the desired behaviour. This can be exploited to obtain more restrictive overloading, such as converting literals to Fin n values, where integer literals greater than or equal to n are not constructible values of the type. Additional implicit arguments can be added to the function signature, in particular auto implicit arguments for searching proofs. As an example, this is the implementation of fromInteger for Fin n:

public export
fromInteger : (x : Integer) -> {n : Nat} ->
              {auto prf : (IsJust (integerToFin x n))} ->
              Fin n
fromInteger {n} x {prf} with (integerToFin x n)
fromInteger {n} x {prf = ItIsJust} | Just y = y

The prf auto implicit is an automatically constructed proof (if possible) that the literal is suitable for the Fin n type. The restricted behaviour can be observed in the REPL, where the failure to construct a valid proof is caught during the type-checking phase and not at runtime:

Main> the (Fin 3) 2
FS (FS FZ)
Main> the (Fin 3) 5
(interactive):1:13--1:14:Can't find an implementation for IsJust (integerToFin 5 3) at:
1   the (Fin 3) 5

String literals in Idris

To facilitate the use of string literals, Idris provides three features in addition to plain string literals: multiline strings, raw strings and interpolated strings.

Plain string literals

String literals behave the way you expect from other programming languages. Use quotation marks " around the piece of text that you want to use as a string:

"hello world"

As explained in Overloaded literals, string literals can be overloaded to return a type other than String.

Multiline string literals

In some cases you will have to display a large string literal that spans multiple lines. For this you can use multiline string literals: they allow a string to span multiple lines, preserving line breaks and indentation. Additionally, they allow you to indent your multiline string with the surrounding code without breaking the intended format of the string.

To use multiline strings, start with a triple quote """ followed by a line return, then enter your text and close it with another triple quote """ with whitespace on its left. The indentation of the closing triple quote will determine how much whitespace should be cropped from each line of the text.

Note

Multiline strings use triple quotes to enable the automatic cropping of leading whitespace when the multiline block is indented.

welcome : String
welcome = """
    Welcome to Idris 2

    We hope you enjoy your stay
      This line will remain indented with 2 spaces
    This line has no indentation
    """

Printing the variable welcome will result in the following text:

Welcome to Idris 2

We hope you enjoy your stay
  This line will remain indented with 2 spaces
This line has no indentation

As you can see, each line has been stripped of its leading 4 spaces, because the closing delimiter was indented by 4 spaces.

In order to use multiline string literals, remember the following:

  • The starting delimiter must be followed by a line return

  • The ending delimiter’s indentation level must not exceed the indentation of any line
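
The cropping rule can be modelled outside the compiler. The Python sketch below is a model of the behaviour described in this section, not the compiler's implementation. The function name is hypothetical. Treating blank lines as exempt from the indentation check is an assumption, and the sketch does not model whether a final line return is kept.

```python
def crop_multiline(body_lines, closing_indent):
    """Strip `closing_indent` leading spaces from each line of the body."""
    cropped = []
    for line in body_lines:
        if not line.strip():
            cropped.append("")       # assumption: blank lines are exempt
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent < closing_indent:
            # Mirrors the second rule: the closing delimiter must not be
            # indented further than any line of the text.
            raise ValueError("closing delimiter is indented past a line")
        cropped.append(line[closing_indent:])
    return "\n".join(cropped)
```

With a closing delimiter indented by 4 spaces, a body line indented by 6 spaces keeps 2 spaces of indentation, while a body line indented by only 2 spaces is rejected.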

Raw string literals

It is not uncommon to write string literals that require some amount of escaping. For plain string literals the backslash \ and the quotation mark " must be escaped; for multiline strings the characters """ must be escaped. Raw string literals allow you to change the required escape sequence, so that you do not have to escape those very common characters. For this, use #" as the starting delimiter and "# as the closing delimiter. The number of # symbols can be increased to accommodate edge cases where "# would be a valid symbol. In the following example we are able to match on \{ by using half as many backslashes as we would need without raw string literals:

myRegex : Regex
myRegex = parseRegex #"\\{"#

If you need to escape characters you still can, by using a backslash \ followed by the same number of # that you used for your string delimiters. In the following example we use two # characters in our escape sequence and want to print a line return:

markdownExample : String
markdownExample = ##"markdown titles look like this: \##n"# Title \##n body""##

This last example could be implemented by combining raw string literals with multiline strings:

markdownExample : String
markdownExample = ##"""
    markdown titles look like this:
    "# Title
    body"
    """##

Interpolated strings

Concatenating string literals with runtime values happens all the time, but sprinkling code with many " and ++ symbols can hurt legibility, which in turn can introduce bugs that are hard for human eyes to detect. Interpolated strings let you embed expressions that evaluate to strings directly inside a string literal, so that you do not have to write out the concatenation manually. To use interpolated strings, start an interpolation slice with \{, write an Idris expression inside it, and close it with }:

print : Expr -> String
print (Var name expr) = "let \{name} = \{print expr}"
print (Lam arg body) = #"\\#{arg} => \#{print body}"#
print (Decl fname fargs body) = """
    func \{fname}(\{commasep fargs}) {
        \{unlines (map print body)}
    }
    """
print (Multi lns) = #"""
    """
    \#{unlines lns}
    """
    """#

As you can see in the second clause, raw string literals and interpolated strings can be combined. The starting and closing delimiters determine how many # symbols must be used in the escape sequence inside the string. Since interpolated strings require the first { to be escaped, an interpolation slice in a raw string uses \#{ as its starting delimiter.

Additionally, multiline strings can be combined with string interpolation in the way you would expect, as shown in the Decl clause. Finally, all three features are combined in the last clause of the example, where a multiline string has a custom escape sequence and includes an interpolation slice.

Interpolation Interface

The Prelude exposes an Interpolation interface with one function interpolate. This function is used within every interpolation slice to convert an arbitrary expression into a string that can be concatenated with the rest of the interpolated string.

In more detail, when you write "hello \{username}" the compiler translates the expression into concat [interpolate "hello ", interpolate username]. This keeps the concatenation fast and, if the type of username implements the Interpolation interface, saves you from converting it to a string manually.

Here is an example where we reuse the Expr type but instead of implementing a print function we implement Interpolation:

Interpolation Expr where
    interpolate (Var name expr) = "let \{name} = \{expr}"
    interpolate (Lam arg body) = #"\\#{arg} => \#{body}"#
    interpolate (Decl fname fargs body) = """
        func \{fname}(\{commasep fargs}) {
            \{unlines (map interpolate body)}
        }
        """
    interpolate (Multi lns) = #"""
        """
        \#{unlines lns}
        """
        """#

As you can see, we avoid repeated calls to print, since interpolate is automatically applied to each slice.

We use Interpolation instead of Show for interpolation slices because the semantics of show are not necessarily the same as those of interpolate. Typically the implementation of show for String adds double quotes around the text, whereas interpolate should return the string as is. In the previous example, "hello \{username}", using show would produce the string hello "Susan", which displays an extra pair of double quotes. That is why the implementation of interpolate for String is the identity function: interpolate x = x. This way the desugared code looks like: concat [id "hello ", interpolate username].
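
The following is a small sketch of implementing the interface for a user-defined type, so that its values can appear directly in a slice. The Colour type and the describe function are hypothetical names introduced only for this illustration.

```idris
-- Hypothetical type used only for illustration.
data Colour = Red | Green | Blue

-- interpolate is the single function of the Interpolation interface.
Interpolation Colour where
  interpolate Red = "red"
  interpolate Green = "green"
  interpolate Blue = "blue"

-- The slice \{c} is converted with interpolate rather than show.
describe : Colour -> String
describe c = "the colour is \{c}"

main : IO ()
main = putStrLn (describe Green)
```

Because the conversion goes through interpolate, no quotes or constructor names are added around the inserted text unless the implementation adds them itself.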

Pragmas

Idris 2 supports a number of pragmas (identifiable by the % prefix). Some pragmas change compiler behavior until it is changed back using the same pragma, while others apply only to the following declaration. A few pragmas apply directly to one or more arguments instead of to the code following the pragma (like the %name pragma described below).

Note

This page is a work in progress. If you know about a pragma that is not described yet, please consider submitting a pull request!

%builtin

The %builtin Natural pragma converts recursive/unary representations of natural numbers into primitive Integer representations.

This pragma is explained in detail on its own page. For more, see Builtins.

%deprecate

Mark the following definition as deprecated. Whenever the function is used, Idris will show a deprecation warning.

%deprecate
foo : String -> String
foo x = x ++ "!"

bar : String
bar = foo "hello"
Warning: Deprecation warning: Man.foo is deprecated and will be removed in a future version.

You can use code documentation (triple vertical bar ||| docs) to suggest a strategy for removing the deprecated function call and that documentation will be displayed alongside the warning.

||| Please use the @altFoo@ function from now on.
%deprecate
foo : String -> String
foo x = x ++ "!"

bar : String
bar = foo "hello"
Warning: Deprecation warning: Man.foo is deprecated and will be removed in a future version.
  Please use the @altFoo@ function from now on.

%inline

Instruct the compiler to inline the following definition when it is applied. It is generally best to let the compiler and the backend you are using optimize code based on its predetermined rules, but if you want to force a function to be inlined when it is called, this pragma will force it.

%inline
foo : String -> String
foo x = x ++ "!"

%noinline

Instruct the compiler _not_ to inline the following definition when it is applied. It is generally best to let the compiler and the backend you are using optimize code based on its predetermined rules, but if you want to force a function to never be inlined when it is called, this pragma will force it.

%noinline
foo : String -> String
foo x = x ++ "!"

%name

Give the compiler some suggested names to use for a particular type when it is asked to generate names for values. You can specify any number of suggested names; they will be used in order when more than one is needed for a single definition.

data Foo = X | Y

%name Foo foo,bar

Builtins

Natural numbers

Idris 2 supports an optimized runtime representation of natural numbers (non-negative integers). This optimization is automatic; however, it only works when natural numbers are represented in a specific way.

Here is an example of a natural number that would be optimized:

data Natural
    = Zero
    | Succ Natural

Natural numbers are generally represented as either zero or the successor (1 more than) of another natural number. These are called Peano numbers.

At runtime, Idris 2 automatically represents such a type in the same way as the Integer type, which massively reduces memory usage.
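
To make the Peano representation concrete, here is a small sketch in which the number 3 takes three nested constructors and addition walks over the first argument one Succ at a time. The names three and add are hypothetical and chosen only for this illustration.

```idris
-- Same shape as the Natural type above.
data Natural
    = Zero
    | Succ Natural

-- The number 3 as a Peano numeral: Succ applied three times to Zero.
three : Natural
three = Succ (Succ (Succ Zero))

-- Structural addition: peels one Succ off the first argument per step.
add : Natural -> Natural -> Natural
add Zero n = n
add (Succ m) n = Succ (add m n)
```

Written this way, both the size of a value and the cost of add grow linearly with the number being represented, which is the overhead that the Integer representation avoids.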

There are a few rules governing when this optimization occurs:

  • The data type must have 2 constructors. After erasure of runtime-irrelevant arguments:

    • One must have no arguments

    • One must have exactly 1 argument (called Succ)

  • The type of the argument to Succ must have the same type constructor as the parent type. This means indexed data types, like Fin, can be optimised.

  • The argument to Succ must be strict, i.e. not Lazy Natural

To ensure that a type is optimized to an Integer, use %builtin Natural, i.e.:

data MyNat
    = Succ MyNat
    | Zero

%builtin Natural MyNat

Casting between natural numbers and integer

Idris optimizes functions that convert between natural numbers and integers, so that they take constant time rather than linear time.

Such functions must be written in a specific way, so that Idris can detect that they can be optimised.

Here is an example of a Natural to Integer function.

cast : Natural -> Integer
cast Zero = 0
cast (Succ k) = 1 + cast k

This optimization is applied late in the compilation process, so it may be sensitive to seemingly insignificant changes.

However, these are roughly the rules governing this optimisation:

  • Exactly one argument must be pattern matched on (any other forced or dotted patterns are allowed)

  • The right hand side of the ‘Zero’ case must be 0

  • The right hand side of the ‘Succ’ case must be 1 + cast k where k is the predecessor of the pattern matched argument

Casting from an Integer to a natural is a little more complex.

castNonNegative : Integer -> Natural
castNonNegative x = case x of
    0 => Zero
    _ => Succ $ castNonNegative (x - 1)

cast : Integer -> Natural
cast x = if x < 0 then Zero else castNonNegative x

For now you must manually check the given integer is non-negative.

If you are using an indexed data type it may be very hard to write your Integer to natural cast in such a way, so you can use %builtin IntegerToNatural to assert to the compiler that a function is correct. It is your responsibility to make sure this is correct.

module ComplexNat

import Data.Maybe

data ComplexNat
    = Zero
    | Succ ComplexNat

integerToMaybeNat : Integer -> Maybe ComplexNat
integerToMaybeNat _ = ...

integerToNat :
    (x : Integer) ->
    {auto 0 prf : IsJust (ComplexNat.integerToMaybeNat x)} ->
    ComplexNat
integerToNat x {prf} = fromJust (integerToMaybeNat x) @{prf}

%builtin IntegerToNatural ComplexNat.integerToNat

Other operations

These conversion functions can be used with %transform to make many other operations O(1) too.

eqNat : Nat -> Nat -> Bool
eqNat Z Z = True
eqNat (S j) (S k) = eqNat j k
eqNat _ _ = False

%transform "eqNat" eqNat j k = natToInteger j == natToInteger k

plus : Nat -> Nat -> Nat
plus Z y = y
plus (S x) y = S $ plus x y

%transform "plus" plus j k = integerToNat (natToInteger j + natToInteger k)

Compilation

Here are the details of how natural numbers are compiled to Integers. Note: a numeric literal here is an Integer.

Zero => 0

Succ k => 1 + k

case k of
    Z => zexp
    S k' => sexp

=>

case k of
    0 => zexp
    _ => let k' = k - 1 in sexp

Debugging The Compiler

Performance

The compiler has the --timing flag to dump timing information collected during operation.

The output documents, in reverse chronological order, the cumulative time taken for the operation (and sub operations) to complete. Sub levels are indicated by successive repetitions of +.

Logging

The compiler logs various categories of information during operation at various levels.

Log levels are characterised by two things:

  • a dot-separated path of ever finer topics of interest e.g. scope.let

  • a natural number corresponding to the verbosity level e.g. 5

If the user asks for some logs by writing:

%logging "scope" 5

they will get all of the logs whose path starts with scope and whose verbosity level is less than or equal to 5. By combining different logging directives, users can request information about everything (with a low level of detail) and at the same time focus on a particular subsystem they want a lot of information about. For instance:

%logging 1
%logging "scope.let" 10

will deliver basic information about the various phases the compiler goes through and deliver a lot of information about scope-checking let binders.

You can set the logging level at the command line using:

--log <level>

and through the REPL using:

:log <string category> <level>

:logging <string category> <level>

The supported logging categories can be found using the command line flag:

--help logging

REPL Commands

To see more debug information from the REPL there are several options one can set.

Logging Categories

command               description
:di <name>            show debugging information for a name
:set showimplicits    show values of implicit arguments

Compiler Flags

There are several ‘hidden’ compiler flags that can help expose Idris’ inner workings.

Logging Categories

command               description
--dumpcases <file>    dump case trees to the given file
--dumplifted <file>   dump lambda lifted trees to the given file
--dumpanf <file>      dump ANF to the given file
--dumpvmcode <file>   dump VM Code to the given file
--debug-elab-check    do more elaborator checks (currently conversion in LinearCheck)

Output Formats

Debug Output

Calling :di <name> dumps debugging information about the selected term. Specifically dumped are:

Debugging Information

topic                        description
Full Name(s)                 The fully qualified name of the term.
Multiplicity                 The term's multiplicity.
Erasable Arguments           Things that are erased.
Detaggable argument types
Specialised arguments
Inferrable arguments
Compiled version
Compile time linked terms
Runtime linked terms
Flags
Size change graph

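Cookbook

The cookbook is a collection of recipes for common patterns and applications in Idris 2.

Parsing

Idris 2 comes with a lexing library and a parsing library, built into the contrib package.

In this recipe, we will write a very simple parser for the lambda calculus that accepts the following language: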

let name = world in (\x.hello x) name

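Once we have written the lambda calculus parser, we will also see how to take advantage of the powerful built-in expression parser in Idris 2 to write a small calculator, which should be smart enough to parse the following expression: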

1 + 2 - 3 * 4 / 5

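Lexer

The main lexer module is Text.Lexer. This module contains toTokenMap, a function that converts a List (Lexer, k) into a TokenMap (Token k), where k is a token kind. This function can be used for simple mappings from lexers to tokens. The module also includes high-level lexers for specifying quantities and common programming primitives such as alphas, intLit, lineComment and blockComment.

The Text.Lexer module also re-exports Text.Lexer.Core, Text.Quantity and Text.Token.

Text.Lexer.Core provides the building blocks of the lexer, including a type called Recognise, which is the underlying data type of the lexer. The other important function that this module provides is lex, which takes a token map and an input string and returns the tokens.

Text.Quantity provides a data type Quantity, which can be used with certain lexers to specify how many times something is expected to appear.

Text.Token provides a data type Token that represents a parsed token together with its kind and its text. This module also provides an important interface called TokenKind, which tells the lexer how to map token kinds to Idris 2 types and how to convert each kind from a string to a value.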
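
As a minimal sketch of how these pieces fit together, the following toy lexer splits an input into words and runs of spaces using only toTokenMap, lex, alphas and spaces. The WordKind type and the names wordTokenMap and countTokens are hypothetical and exist only for this illustration; the file is assumed to be loaded with the contrib package enabled (idris2 -p contrib).

```idris
import Text.Lexer

-- Hypothetical token kinds for a toy "words and spaces" language.
data WordKind = WKWord | WKSpace

-- toTokenMap pairs each Lexer with the kind of token it produces.
wordTokenMap : TokenMap (Token WordKind)
wordTokenMap = toTokenMap [(spaces, WKSpace), (alphas, WKWord)]

-- lex returns the tokens together with the position and the unconsumed
-- remainder of the input; an empty remainder means that the whole
-- string was tokenised.
countTokens : String -> Maybe Nat
countTokens str =
  case lex wordTokenMap str of
    (tokens, _, _, "") => Just (length tokens)
    _ => Nothing
```

Input containing a character that neither lexer accepts, such as a digit, leaves a non-empty remainder, so countTokens returns Nothing in that case.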

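Parser

The main parser module is Text.Parser. This module contains various grammar parsers, the main one being match, which takes a TokenKind and returns the value as defined in the TokenKind interface. There are other grammar parsers as well, but for our example we will only use match.

The Text.Parser module re-exports Text.Parser.Core, Text.Quantity and Text.Token.

Text.Parser.Core provides the building blocks of the parser, including a type called Grammar, which is the underlying data type of the parser. The other important function that this module provides is parse, which takes a Grammar and returns the parsed expression.

We covered Text.Quantity and Text.Token in the Lexer section, so we will not repeat what they do here.

Lambda Calculus Lexer and Parser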

LambdaCalculus.idr
import Data.List
import Data.List1
import Text.Lexer
import Text.Parser

%default total

data Expr = App Expr Expr | Abs String Expr | Var String | Let String Expr Expr

Show Expr where
  showPrec d (App e1 e2) = showParens (d == App) (showPrec (User 0) e1 ++ " " ++ showPrec App e2)
  showPrec d (Abs v e) = showParens (d > Open) ("\\" ++ v ++ "." ++ show e)
  showPrec d (Var v) = v
  showPrec d (Let v e1 e2) = showParens (d > Open) ("let " ++ v ++ " = " ++ show e1 ++ " in " ++ show e2)

data LambdaTokenKind
  = LTLambda
  | LTIdentifier
  | LTDot
  | LTOParen
  | LTCParen
  | LTIgnore
  | LTLet
  | LTEqual
  | LTIn

Eq LambdaTokenKind where
  (==) LTLambda LTLambda = True
  (==) LTDot LTDot = True
  (==) LTIdentifier LTIdentifier = True
  (==) LTOParen LTOParen = True
  (==) LTCParen LTCParen = True
  (==) LTLet LTLet = True
  (==) LTEqual LTEqual = True
  (==) LTIn LTIn = True
  (==) _ _ = False

Show LambdaTokenKind where
  show LTLambda = "LTLambda"
  show LTDot = "LTDot"
  show LTIdentifier = "LTIdentifier"
  show LTOParen = "LTOParen"
  show LTCParen = "LTCParen"
  show LTIgnore = "LTIgnore"
  show LTLet = "LTLet"
  show LTEqual = "LTEqual"
  show LTIn = "LTIn"

LambdaToken : Type
LambdaToken = Token LambdaTokenKind

Show LambdaToken where
  show (Tok kind text) = "Tok kind: " ++ show kind ++ " text: " ++ text

TokenKind LambdaTokenKind where
  TokType LTIdentifier = String
  TokType _ = ()

  tokValue LTLambda _ = ()
  tokValue LTIdentifier s = s
  tokValue LTDot _ = ()
  tokValue LTOParen _ = ()
  tokValue LTCParen _ = ()
  tokValue LTIgnore _ = ()
  tokValue LTLet _ = ()
  tokValue LTEqual _ = ()
  tokValue LTIn _ = ()

ignored : WithBounds LambdaToken -> Bool
ignored (MkBounded (Tok LTIgnore _) _ _) = True
ignored _ = False

identifier : Lexer
identifier = alpha <+> many alphaNum

keywords : List (String, LambdaTokenKind)
keywords = [
  ("let", LTLet),
  ("in", LTIn)
]

lambdaTokenMap : TokenMap LambdaToken
lambdaTokenMap = toTokenMap [(spaces, LTIgnore)] ++
  [(identifier, \s =>
      case lookup s keywords of
        (Just kind) => Tok kind s
        Nothing => Tok LTIdentifier s
    )
  ] ++ toTokenMap [
    (exact "\\", LTLambda),
    (exact ".", LTDot),
    (exact "(", LTOParen),
    (exact ")", LTCParen),
    (exact "=", LTEqual)
  ]

lexLambda : String -> Maybe (List (WithBounds LambdaToken))
lexLambda str =
  case lex lambdaTokenMap str of
    (tokens, _, _, "") => Just tokens
    _ => Nothing

mutual
  expr : Grammar state LambdaToken True Expr
  expr = do
    t <- term
    app t <|> pure t

  term : Grammar state LambdaToken True Expr
  term = abs
    <|> var
    <|> paren
    <|> letE

  app : Expr -> Grammar state LambdaToken True Expr
  app e1 = do
    e2 <- term
    app1 $ App e1 e2

  app1 : Expr -> Grammar state LambdaToken False Expr
  app1 e = app e <|> pure e

  abs : Grammar state LambdaToken True Expr
  abs = do
    match LTLambda
    commit
    argument <- match LTIdentifier
    match LTDot
    e <- expr
    pure $ Abs argument e

  var : Grammar state LambdaToken True Expr
  var = map Var $ match LTIdentifier

  paren : Grammar state LambdaToken True Expr
  paren = do
    match LTOParen
    e <- expr
    match LTCParen
    pure e

  letE : Grammar state LambdaToken True Expr
  letE = do
    match LTLet
    commit
    argument <- match LTIdentifier
    match LTEqual
    e1 <- expr
    match LTIn
    e2 <- expr
    pure $ Let argument e1 e2

parseLambda : List (WithBounds LambdaToken) -> Either String Expr
parseLambda toks =
  case parse expr $ filter (not . ignored) toks of
    Right (l, []) => Right l
    Right e => Left "contains tokens that were not consumed"
    Left e => Left (show e)

parse : String -> Either String Expr
parse x =
  case lexLambda x of
    Just toks => parseLambda toks
    Nothing => Left "Failed to lex."

Testing our parser gives the following output:

$ idris2 -p contrib LambdaCalculus.idr
Main> :exec printLn $ parse "let name = world in (\\x.hello x) name"
Right (let name = world in (\x.hello x) name)

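Expression Parser

Idris 2 also comes with a very convenient expression parser in Text.Parser.Expression that is aware of precedence and associativity.

The main function, buildExpressionParser, takes an OperatorTable and a Grammar that represents the terms, and returns a parsed expression. The magic comes from the OperatorTable, since this table defines all the operators together with their grammars, precedence, and associativity.

An OperatorTable is a list of lists of Op values. The Op type lets you specify Prefix, Postfix, and Infix operators along with their grammars. Infix also carries the associativity, called Assoc, which can be left associative (AssocLeft), right associative (AssocRight), or non-associative (AssocNone).

An example of an operator table that we will use in the calculator is: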

[
  [ Infix (match CTMultiply >> pure (*)) AssocLeft
  , Infix (match CTDivide >> pure (/)) AssocLeft
  ],
  [ Infix (match CTPlus >> pure (+)) AssocLeft
  , Infix (match CTMinus >> pure (-)) AssocLeft
  ]
]

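This table defines 4 operators, for multiplication, division, addition, and subtraction. Multiplication and division appear in the first list because they have higher precedence than addition and subtraction, which appear in the second list. We also define them as infix operators, each with its own grammar, and all of them are left associative via AssocLeft.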
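
To see what this precedence and associativity mean for the expression 1 + 2 - 3 * 4 / 5, the sketch below writes out the implied grouping with explicit parentheses. The name grouped is hypothetical, and the code uses ordinary Idris arithmetic on Double rather than the parser itself.

```idris
-- Hypothetical illustration: the grouping that the operator table implies
-- for 1 + 2 - 3 * 4 / 5. Multiplication and division bind tighter, and
-- every operator associates to the left.
grouped : Double
grouped = (1 + 2) - ((3 * 4) / 5)

main : IO ()
main = printLn grouped
```

Evaluating grouped should agree with the result of the calculator session shown in this recipe.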

Building a Calculator

Calculator.idr
import Data.List1
import Text.Lexer
import Text.Parser
import Text.Parser.Expression

%default total

data CalculatorTokenKind
  = CTNum
  | CTPlus
  | CTMinus
  | CTMultiply
  | CTDivide
  | CTOParen
  | CTCParen
  | CTIgnore

Eq CalculatorTokenKind where
  (==) CTNum CTNum = True
  (==) CTPlus CTPlus = True
  (==) CTMinus CTMinus = True
  (==) CTMultiply CTMultiply = True
  (==) CTDivide CTDivide = True
  (==) CTOParen CTOParen = True
  (==) CTCParen CTCParen = True
  (==) _ _ = False

Show CalculatorTokenKind where
  show CTNum = "CTNum"
  show CTPlus = "CTPlus"
  show CTMinus = "CTMinus"
  show CTMultiply = "CTMultiply"
  show CTDivide = "CTDivide"
  show CTOParen = "CTOParen"
  show CTCParen = "CTCParen"
  show CTIgnore = "CTIgnore"

CalculatorToken : Type
CalculatorToken = Token CalculatorTokenKind

Show CalculatorToken where
    show (Tok kind text) = "Tok kind: " ++ show kind ++ " text: " ++ text

TokenKind CalculatorTokenKind where
  TokType CTNum = Double
  TokType _ = ()

  tokValue CTNum s = cast s
  tokValue CTPlus _ = ()
  tokValue CTMinus _ = ()
  tokValue CTMultiply _ = ()
  tokValue CTDivide _ = ()
  tokValue CTOParen _ = ()
  tokValue CTCParen _ = ()
  tokValue CTIgnore _ = ()

ignored : WithBounds CalculatorToken -> Bool
ignored (MkBounded (Tok CTIgnore _) _ _) = True
ignored _ = False

number : Lexer
number = digits

calculatorTokenMap : TokenMap CalculatorToken
calculatorTokenMap = toTokenMap [
  (spaces, CTIgnore),
  (digits, CTNum),
  (exact "+", CTPlus),
  (exact "-", CTMinus),
  (exact "*", CTMultiply),
  (exact "/", CTDivide)
]

lexCalculator : String -> Maybe (List (WithBounds CalculatorToken))
lexCalculator str =
  case lex calculatorTokenMap str of
    (tokens, _, _, "") => Just tokens
    _ => Nothing

mutual
  term : Grammar state CalculatorToken True Double
  term = do
    num <- match CTNum
    pure num

  expr : Grammar state CalculatorToken True Double
  expr = buildExpressionParser [
    [ Infix ((*) <$ match CTMultiply) AssocLeft
    , Infix ((/) <$ match CTDivide) AssocLeft
    ],
    [ Infix ((+) <$ match CTPlus) AssocLeft
    , Infix ((-) <$ match CTMinus) AssocLeft
    ]
  ] term

parseCalculator : List (WithBounds CalculatorToken) -> Either String Double
parseCalculator toks =
  case parse expr $ filter (not . ignored) toks of
    Right (l, []) => Right l
    Right e => Left "contains tokens that were not consumed"
    Left e => Left (show e)

parse1 : String -> Either String Double
parse1 x =
  case lexCalculator x of
    Just toks => parseCalculator toks
    Nothing => Left "Failed to lex."

Testing our calculator gives the following output:

$ idris2 -p contrib Calculator.idr
Main> :exec printLn $ parse1 "1 + 2 - 3 * 4 / 5"
Right 0.6000000000000001