{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", " \"Universiteit\n", " \n", "\n", " \"OpenDreamKit\"\n", "\n", "
\n", "\n", "# Cython tutorial\n", "\n", "**Jeroen Demeyer** (Ghent University, OpenDreamKit)\n", "\n", "## What is Cython?\n", "\n", "Cython is a programming language which combines Python and C (or C++ but we focus on C for now).\n", "The Cython compiler **translates a Cython program to C**.\n", "Cython programs look syntactically like Python but also allow to manipulate C variables.\n", "\n", "In order to use Cython from the Jupyter notebook, we need the following command\n", "(it suffices to do this just once per session):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load_ext cython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic examples\n", "\n", "First in plain Python:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def sum_first_n(n):\n", " s = 0\n", " for i in range(n):\n", " s += i\n", " return s" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sum_first_n(100)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%time sum_first_n(10**7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to compile the above example with Cython, we just need to add `%%cython`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "def sum_first_n(n):\n", " s = 0\n", " for i in range(n):\n", " s += i\n", " return s" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sum_first_n(100)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%time sum_first_n(10**7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that even just adding `%%cython` already gives a small speed up.\n", "But that is just the beginning:\n", "The variables `i`, `n` and `s` are \"small\" integers (less than 2^63),\n", "so they fit in a C `long` (assuming a 64-bit machine).\n", "We now tell Cython to use a `long` for these variables:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "def sum_first_n(long n): # n is a C long\n", " cdef long s = 0 # s is a C long (and set to 0)\n", " cdef long i # i is a C long\n", " for i in range(n):\n", " s += i\n", " return s" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sum_first_n(100)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%time sum_first_n(10**7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We just gained 2 orders of magnitude in speed!\n", "However, using a `long` has hidden dangers, because it can overflow on large input:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sum_first_n(10**10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Seeing the generated C code\n", "\n", "Now what does this really mean anyway, \"compiling to C\"?\n", "The `--annotate` flag lets us find out:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython --annotate\n", "def sum_first_n(long n):\n", " cdef long s = 0\n", " cdef long i\n", " for i in range(n):\n", " s += i\n", " return s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compare this to the C code generated when we do not declare the types:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython --annotate\n", "def sum_first_n(n):\n", " s = 0\n", " for i in range(n):\n", " s += i\n", " return s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the standard C library" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "# A cimport is a \"C import\". This makes the\n", "# function sin() from available.\n", "from libc.math cimport sin\n", "\n", "def sum_sin_first_n(long n):\n", " cdef double s = 0 # s is a C double\n", " cdef long i\n", " for i in range(n):\n", " s += sin(i)\n", " return s" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sum_sin_first_n(10000)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "from libc.stdio cimport *\n", "\n", "def readfilename(filename):\n", " \"\"\"\n", " Return the first 10000 characters of a text file\n", " \"\"\"\n", " # Cython knows that fopen() returns a FILE*, so we don't have\n", " # to declare this explicitly. It is allowed though.\n", " f = fopen(filename, \"r\") # implicit conversion bytes -> const char*\n", " if f is NULL:\n", " raise OSError(filename)\n", " cdef char data[10001]\n", " n = fread(data, 1, sizeof(data)-1, f)\n", " data[n] = 0 # add zero terminator\n", " fclose(f)\n", " return data # implicit conversion char* -> bytes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(readfilename(\"/etc/resolv.conf\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(readfilename(\"/no/such/file\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cython functions: `def`, `cdef` and `cpdef`\n", "\n", "We already saw that you can use `def` to define functions just like in Python.\n", "You can specify the types of the arguments: these are converted from a Python object to the C type.\n", "The return value is always a Python object.\n", "\n", "A `cdef` function is a pure C function: it can be only accessed from Cython itself, not from Python.\n", "The arguments and return value can be arbitrary types.\n", "Unlike `def` functions, this includes types (pointers for example) which cannot be converted to/from Python." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython --annotate\n", "cdef long square(long x):\n", " return x*x\n", "\n", "def sum_first_n_squares(n):\n", " cdef long s = 0\n", " cdef long i\n", " for i in range(n):\n", " s += square(i)\n", " return s" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sum_first_n_squares(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, a `cpdef` function combines both a `def` and a `cdef` function:\n", "it can be called from Python and it can be called very efficiently from Cython:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython --annotate\n", "cpdef long square(long x):\n", " return x*x\n", "\n", "def sum_first_n_squares(n):\n", " cdef long s = 0\n", " cdef long i\n", " for i in range(n):\n", " s += square(i)\n", " return s" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sum_first_n_squares(3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "square(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extension types (cdef classes)\n", "\n", "So far, we have only dealt with functions, but classes are what really makes Python great.\n", "So it's only natural that Cython supports optimized classes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "cdef class Mod:\n", " cdef long val, mod # cdef attributes\n", " \n", " def __init__(self, v, m):\n", " self.mod = m\n", " self.set_value(v)\n", " \n", " cdef void set_value(self, long v): # cdef method\n", " self.val = v % self.mod\n", " \n", " def __repr__(self):\n", " return f\"Mod({self.val}, {self.mod})\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Mod(22, 8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We continue this example and add support for addition.\n", "This is different from Python:\n", "cdef classes do not support `__radd__`, you can only implement `__add__`.\n", "However, either the first of second argument can be an instance of the class.\n", "To put it in a different way, `a + b` calls `type(a).__add__(a, b)` or `type(b).__add__(a, b)`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "cdef class Mod:\n", " cdef long val, mod # cdef attributes\n", " \n", " def __init__(self, v, m):\n", " self.mod = m\n", " self.set_value(v)\n", " \n", " cdef void set_value(self, long v): # cdef method\n", " self.val = v % self.mod\n", " \n", " def __repr__(self):\n", " return f\"Mod({self.val}, {self.mod})\"\n", " \n", " def __add__(left, right): # style: don't use \"self\"\n", " cdef Mod self, other\n", " if isinstance(left, Mod) and isinstance(right, Mod):\n", " self = left\n", " other = right\n", " if self.mod != other.mod:\n", " raise TypeError(f\"different moduli ({self.mod} and {other.mod})\")\n", " return Mod(self.val + other.val, self.mod)\n", " elif isinstance(left, Mod):\n", " self = left\n", " return Mod(self.val + right, self.mod)\n", " else:\n", " self = right\n", " return Mod(self.val + left, self.mod)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Mod(5, 12) + 7" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "5 + Mod(7, 12)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Mod(1, 3) + Mod(1, 3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Mod(1, 3) + Mod(1, 4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that Cython also supports normal Python classes.\n", "These obviously do not support `cdef` or `cpdef` methods,\n", "but otherwise you can still put arbitrary Cython code in the methods." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## More on declaring types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the `Mod` example, we had code like\n", " \n", " cdef Mod self\n", " [...]\n", " self = left\n", " \n", "We now investigate what this really does." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "def retlist(x):\n", " cdef list L = x\n", " return L" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "retlist([1,2,3])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "retlist((1,2,3))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "retlist(None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So we see that Cython checks for the class `list` but it also always allows `None`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now what's the point of this `cdef list`?\n", "For one, it was needed in the `Mod` example to access its cdef attributes and efficiently call cdef/cpdef methods.\n", "For standard Python types like `list`, it allows Cython to make optimizations:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "def listsum1(L):\n", " cdef long i\n", " cdef long s = 0\n", " for i in range(len(L)):\n", " s += L[i]\n", " return s\n", "\n", "def listsum2(list L):\n", " cdef long i\n", " cdef long s = 0\n", " for i in range(len(L)):\n", " s += L[i]\n", " return s" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "L = list(range(10**6))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%timeit listsum1(L)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%timeit listsum2(L)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see a small speedup.\n", "The only reason why the gain is not larger is because Cython special-cases lists anyway\n", "when implementing `L[i]`.\n", "\n", "Some final notes about declaring types:\n", "\n", "- In some cases, Cython is able to infer the class and a `cdef` declaration is not needed.\n", " It is hard to give general rules about this though.\n", " When in doubt, add the `cdef` declaration anyway or check the generated C code.\n", "\n", "- `int`, `long`, `float` and `complex` always refer to C types.\n", " There is no way in Python to declare that something is a Python `float`\n", " (but you probably want to use a C `double` anyway).\n", " \n", "- For methods, the `self` argument is always considered an instance of the class.\n", " It is never needed to add a type to `self`, but it is allowed.\n", " As explained earlier, this is not the case for aithmetic special functions like `__add__`\n", " (but this is the only exception).\n", " \n", "- For standard classes like `list`, the class must match exactly: a subclass is not accepted.\n", " For cdef classes, subclasses *are* allowed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Type casting\n", "\n", "In the `listsum2` example, the check that `L` is really of type `list` takes a little time\n", "(implementation note: there are optimized checks for basic classes like `list` so in this specific case this slow-down is probably not significant).\n", "To avoid that, we can use an explicit cast:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "def listsum3(x):\n", " cdef list L = x\n", " cdef long i\n", " cdef long s = 0\n", " for i in range(len(L)):\n", " s += L[i]\n", " return s" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "L = list(range(1000))\n", "listsum3(L)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above is extremely dangerous however: if you pass an object of the wrong class, then Python will crash.\n", "You should only do this if you're guaranteed that the class is correct.\n", "\n", "Finally, the syntax `` is for a checked cast.\n", "It behaves very similar to simply assigning to a `cdef` variable of that class, except that `None` is not allowed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "def listsum4(x):\n", " cdef list L = x\n", " cdef long i\n", " cdef long s = 0\n", " for i in range(len(L)):\n", " s += L[i]\n", " return s" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "listsum4([1,2,3])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "listsum4((1,2,3))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "listsum4(None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `cdef` attributes\n", "\n", "Going back to the `Mod` example, we have declared attributes `cdef long val` and `cdef long mod`.\n", "By default, these cannot be accessed from Python.\n", "It is possible to automatically allow Python access by declaring them\n", "as `cdef readonly` (read-only) or `cdef public` (read-write)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "cdef class Mod:\n", " cdef public long val\n", " cdef readonly long mod\n", " \n", " def __init__(self, v, m):\n", " self.mod = m\n", " self.set_value(v)\n", " \n", " cdef void set_value(self, long v): # cdef method\n", " self.val = v % self.mod\n", " \n", " def __repr__(self):\n", " return f\"Mod({self.val}, {self.mod})\"\n", " \n", " def __add__(left, right): # style: don't use \"self\"\n", " cdef Mod self, other\n", " if isinstance(left, Mod) and isinstance(right, Mod):\n", " self = left\n", " other = right\n", " if self.mod != other.mod:\n", " raise TypeError(f\"different moduli ({self.mod} and {other.mod})\")\n", " return Mod(self.val + other.val, self.mod)\n", " elif isinstance(left, Mod):\n", " self = left\n", " return Mod(self.val + right, self.mod)\n", " else:\n", " self = right\n", " return Mod(self.val + left, self.mod)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = Mod(3,5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.val = 19; a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With Python classes, you can assign arbitrary attributes.\n", "This works because those attributes are stored in the instance `__dict__`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class A(object):\n", " pass" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = A()\n", "a.foo = 42\n", "a.__dict__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cython classes do not have a `__dict__` by default.\n", "This means that you cannot assign arbitrary attributes.\n", "You can still explicitly add a `__dict__` if you want (at the expense of performance).\n", "Compare:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "cdef class B:\n", " pass\n", "\n", "cdef class C:\n", " cdef dict __dict__" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = B()\n", "b.foo = 42" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "c = C()\n", "c.foo = 42" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interrupting Cython code\n", "\n", "Cython code cannot be interrupted by CTRL-C.\n", "This is because interrupt and signal handling is handled by the Python bytecode interpreter, which Cython bypasses.\n", "\n", "Using the `cysignals.signals` package,\n", "you can still make Cython code interruptible but it needs to be done explicitly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "from cysignals.signals cimport sig_check\n", "\n", "def infinite_loop():\n", " while True:\n", " sig_check()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `sig_check()` command checks for a pending CTRL-C.\n", "If so, it raises a `KeyboardInterrupt` as usual.\n", "\n", "**WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING**\n", "\n", "This does not actually in the IPython notebook. It does work from the command line or in the SageMath notebook.\n", "So make sure to run this under the SageMath notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "infinite_loop()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`sig_check()` is good for interrupting loops in Cython,\n", "but what if you want to interrupt code from some external library?\n", "You cannot change that library to put `sig_check()` there.\n", "Instead, wrap the code that you want to interrupt in a `sig_on()`/`sig_off()` block.\n", "With great power comes great responsibility and this is great power:\n", "when an interrupt happens inside a `sig_on()`/`sig_off()` block,\n", "the C code is immediately interrupted without any kind of cleanup.\n", "This can easily lead to memory leaks or corrupted state.\n", "In particular, no Python operations should be done inside such a block.\n", "\n", "In the example below, let's pretend for the sake of argument\n", "that `infinite_loop` is implemented in an external library." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "from posix.unistd cimport sleep\n", "\n", "cdef void external_loop():\n", " while True:\n", " sleep(1)\n", " \n", "#---------------------------------- \n", "\n", "from cysignals.signals cimport sig_on, sig_off\n", "\n", "def infinite_loop():\n", " sig_on()\n", " external_loop()\n", " sig_off()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "infinite_loop()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A note on Python 2/3 compatibility\n", "\n", "It is good to understand that the Cython language tries to be as much as possible\n", "independent from the Python version.\n", "There is no \"Cython 2\" and \"Cython 3\".\n", "For example, Cython always supports `xrange`, no matter which Python version you use.\n", "It also supports syntax specific to Python 3 like `def f(foo, *, hello=\"hola\")`, `super()`,\n", "`yield from`, `f\"strings\"`, `class X(metaclass=...)` even when compiling on Python 2.\n", "\n", "Of course, this is limited to language features.\n", "If Python behaves differently, then that difference will be seen in Cython code too.\n", "For example, Cython does not help when dealing with `bytes` vs `unicode` issues." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises\n", "\n", "(1) Compute the first value $n$ for which $\\sum_{k=1}^n 1/k > 4 \\pi$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(2) Use the funtions [time()](https://en.cppreference.com/w/c/chrono/time) and [ctime()](https://en.cppreference.com/w/c/chrono/ctime) from `` to display the current time in a human-readable form." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(3) Write a `cdef` function `readfile()` reading the contents of a file, given as `FILE*`.\n", "Use this to reimplement the `readfilename()` example." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(4) Implement `__sub__` and `__neg__` in the `Mod` example." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(5a) Change the `listsum` example such that it can deal both with `list` and `tuple`, having optimized code in both cases\n", "\n", "(5b) Do this in such a way that it can be interrupted with CTRL-C (imagine that the user gives a very long list)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.15" } }, "nbformat": 4, "nbformat_minor": 2 }