diff --git a/lectures/10-trees slides.pdf b/lectures/10-trees slides.pdf index 6d5fd52a8662e16e0138809d372ef35a633f72f5..045ef4c97789f48f5e0007aa83e0682d3c59b1ee 100644 Binary files a/lectures/10-trees slides.pdf and b/lectures/10-trees slides.pdf differ diff --git a/notebooks/08-binary-trees.ipynb b/notebooks/08-binary-trees.ipynb index b3f1f95405c04d7d96669eeacd384e7c319a273f..0c2e36db398e732613a3a72c9fbdee871f51762f 100644 --- a/notebooks/08-binary-trees.ipynb +++ b/notebooks/08-binary-trees.ipynb @@ -1004,7 +1004,7 @@ }, { "cell_type": "code", - "execution_count": 277, + "execution_count": 296, "id": "3a064bfb", "metadata": {}, "outputs": [], @@ -1013,25 +1013,18 @@ "from IPython.display import display\n", "\n", "def draw_binary_tree(tree_dict):\n", - " # Create a new graph\n", " dot = Digraph(format='png')\n", " \n", - " # Recursive function to add nodes and edges\n", " def add_nodes_and_edges(node, parent_name=None):\n", " if isinstance(node, dict):\n", " for key, value in node.items():\n", - " # Add the node\n", " dot.node(key, key)\n", - " # Add the edge to the parent (if it exists)\n", " if parent_name:\n", " dot.edge(parent_name, key)\n", - " # Recursively call the function for the children\n", " add_nodes_and_edges(value, key)\n", "\n", - " # Call the function to build the tree\n", " add_nodes_and_edges(tree_dict)\n", " \n", - " # Display the graph in the notebook\n", " display(dot)" ] }, diff --git a/notebooks/10-trees.ipynb b/notebooks/10-trees.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..5f29e68ab317339af079d125a68e01aa6770eeeb --- /dev/null +++ b/notebooks/10-trees.ipynb @@ -0,0 +1,832 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "09fc003e", + "metadata": {}, + "source": [ + "# UE5 Fundamentals of Algorithms\n", + "## Lecture 10: Trees\n", + "### Ecole Centrale de Lyon, Bachelor of Science in Data Science for Responsible Business\n", + "#### Romain Vuillemot\n", + "<center><img src=\"figures/Logo_ECL.png\" 
style=\"width:300px\"></center>" + ] + }, + { + "cell_type": "markdown", + "id": "74743087", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "f3ebe7d2", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Outline\n", + "- Definitions\n", + "- Data structures\n", + "- Weighted trees" + ] + }, + { + "cell_type": "markdown", + "id": "a4973a08", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# Trees\n", + "\n", + "\n", + "> A tree is a hierarchical data structure with nodes connected by edges\n", + "\n", + "- A non-linear data structure (multiple ways to traverse it)\n", + "- Any two nodes are connected by exactly one path (a series of edges), so trees have no cycles\n", + "- Edges are also called links; they can be traversed in both directions (no orientation)\n", + "\n", + "Examples of trees:\n", + "\n", + "- Binary trees, binary search trees, N-ary trees, recursive call trees, etc.\n", + "\n", + "- HOB (Horizontally Ordered Binary), AVL (Adelson-Velskii and Landis, self-balancing trees), ...\n", + "\n", + "- B-trees, forests, lattices, etc.\n" + ] + }, + { + "cell_type": "markdown", + "id": "e35608be", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Definitions on trees\n", + "\n", + "(similar to the ones for the binary trees)\n", + "\n", + "`Nodes` - a tree is composed of nodes that contain a `value` and `children`.\n", + "\n", + "`Edges` - the connections between nodes; edges may also carry a value (see weighted trees).\n", + "\n", + "`Root` - the topmost node in a tree; there can only be one root.\n", + "\n", + "`Parent and child` - each node except the root has a single parent and can have any number of children.\n", + "\n", + "`Leaf` - a node with no children.\n", + "\n", + "`Depth` - the number of edges on the path from the root to that node.\n", + "\n", + "`Height` - the maximum depth of any node in the tree." 
+ ] + }, + { + "cell_type": "markdown", + "id": "ce722126", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Definitions on trees (cont.)\n", + "\n", + "`N-ary Tree` - a tree in which each node can have up to $N$ children. A binary tree is the case where $N = 2$.\n", + "\n", + "`Weight` - a quantity associated with an edge.\n", + "\n", + "`Degree` - the number of children a node has. In a binary tree, every node has degree at most 2.\n", + "\n", + "`Subtree` - a portion of a tree that is itself a tree.\n", + "\n", + "`Forest` - a collection of trees not connected to each other." + ] + }, + { + "cell_type": "markdown", + "id": "0612be20", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Data structures (dicts + lists)\n", + "\n", + "A simple way is an adjacency list stored in a dictionary (Python's `dict` type)." + ] + }, + { + "cell_type": "code", + "execution_count": 147, + "id": "b66a9451", + "metadata": {}, + "outputs": [], + "source": [ + "tree = {\n", + "    \"a\": [\"b\", \"c\"],\n", + "    \"b\": [\"d\", \"e\"],\n", + "    \"c\": [\"f\"],\n", + "    \"d\": [],\n", + "    \"e\": [],\n", + "    \"f\": []\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": 148, + "id": "f2bce0eb", + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", + "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", + " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", + "<!-- Generated by graphviz version 7.1.0 (20230121.1956)\n", + " -->\n", + "<!-- Pages: 1 -->\n", + "<svg width=\"206pt\" height=\"188pt\"\n", + " viewBox=\"0.00 0.00 206.00 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", + "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n", + "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-184 202,-184 202,4 -4,4\"/>\n", + "<!-- a -->\n", + "<g 
id=\"node1\" class=\"node\">\n", + "<title>a</title>\n", + "<ellipse fill=\"none\" stroke=\"black\" cx=\"135\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n", + "<text text-anchor=\"middle\" x=\"135\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">a</text>\n", + "</g>\n", + "<!-- b -->\n", + "<g id=\"node2\" class=\"node\">\n", + "<title>b</title>\n", + "<ellipse fill=\"none\" stroke=\"black\" cx=\"99\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n", + "<text text-anchor=\"middle\" x=\"99\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">b</text>\n", + "</g>\n", + "<!-- a->b -->\n", + "<g id=\"edge1\" class=\"edge\">\n", + "<title>a->b</title>\n", + "<path fill=\"none\" stroke=\"black\" d=\"M126.65,-144.76C122.42,-136.55 117.19,-126.37 112.42,-117.09\"/>\n", + "<polygon fill=\"black\" stroke=\"black\" points=\"115.68,-115.79 108,-108.49 109.46,-118.99 115.68,-115.79\"/>\n", + "</g>\n", + "<!-- c -->\n", + "<g id=\"node3\" class=\"node\">\n", + "<title>c</title>\n", + "<ellipse fill=\"none\" stroke=\"black\" cx=\"171\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\n", + "<text text-anchor=\"middle\" x=\"171\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">c</text>\n", + "</g>\n", + "<!-- a->c -->\n", + "<g id=\"edge2\" class=\"edge\">\n", + "<title>a->c</title>\n", + "<path fill=\"none\" stroke=\"black\" d=\"M143.35,-144.76C147.58,-136.55 152.81,-126.37 157.58,-117.09\"/>\n", + "<polygon fill=\"black\" stroke=\"black\" points=\"160.54,-118.99 162,-108.49 154.32,-115.79 160.54,-118.99\"/>\n", + "</g>\n", + "<!-- d -->\n", + "<g id=\"node4\" class=\"node\">\n", + "<title>d</title>\n", + "<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n", + "<text text-anchor=\"middle\" x=\"27\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">d</text>\n", + "</g>\n", + "<!-- b->d -->\n", + "<g id=\"edge3\" class=\"edge\">\n", + "<title>b->d</title>\n", + "<path fill=\"none\" stroke=\"black\" d=\"M84.08,-74.5C74.23,-64.92 61.14,-52.19 
49.97,-41.34\"/>\n", + "<polygon fill=\"black\" stroke=\"black\" points=\"52.59,-39 42.98,-34.54 47.71,-44.02 52.59,-39\"/>\n", + "</g>\n", + "<!-- e -->\n", + "<g id=\"node5\" class=\"node\">\n", + "<title>e</title>\n", + "<ellipse fill=\"none\" stroke=\"black\" cx=\"99\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n", + "<text text-anchor=\"middle\" x=\"99\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">e</text>\n", + "</g>\n", + "<!-- b->e -->\n", + "<g id=\"edge4\" class=\"edge\">\n", + "<title>b->e</title>\n", + "<path fill=\"none\" stroke=\"black\" d=\"M99,-71.7C99,-64.41 99,-55.73 99,-47.54\"/>\n", + "<polygon fill=\"black\" stroke=\"black\" points=\"102.5,-47.62 99,-37.62 95.5,-47.62 102.5,-47.62\"/>\n", + "</g>\n", + "<!-- f -->\n", + "<g id=\"node6\" class=\"node\">\n", + "<title>f</title>\n", + "<ellipse fill=\"none\" stroke=\"black\" cx=\"171\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n", + "<text text-anchor=\"middle\" x=\"171\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">f</text>\n", + "</g>\n", + "<!-- c->f -->\n", + "<g id=\"edge5\" class=\"edge\">\n", + "<title>c->f</title>\n", + "<path fill=\"none\" stroke=\"black\" d=\"M171,-71.7C171,-64.41 171,-55.73 171,-47.54\"/>\n", + "<polygon fill=\"black\" stroke=\"black\" points=\"174.5,-47.62 171,-37.62 167.5,-47.62 174.5,-47.62\"/>\n", + "</g>\n", + "</g>\n", + "</svg>\n" + ], + "text/plain": [ + "<graphviz.graphs.Digraph at 0x1106595a0>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "draw_tree(tree)" + ] + }, + { + "cell_type": "markdown", + "id": "0aa22e17", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Data structures (dicts + named lists)\n", + "\n", + "- A variation is to use a named variable for the list." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 168, + "id": "16070d37", + "metadata": {}, + "outputs": [], + "source": [ + "tree = {\n", + "    \"a\": {\"neighbors\": [\"b\", \"c\"]},\n", + "    \"b\": {\"neighbors\": [\"d\", \"e\"]},\n", + "    \"c\": {\"neighbors\": [\"f\"]},\n", + "    \"d\": {\"neighbors\": []},\n", + "    \"e\": {\"neighbors\": []},\n", + "    \"f\": {\"neighbors\": []}\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": 169, + "id": "4c182c85", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['b', 'c']" + ] + }, + "execution_count": 169, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tree[\"a\"][\"neighbors\"]" + ] + }, + { + "cell_type": "markdown", + "id": "c3c23285", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Data structures (sets)\n", + "\n", + "- The children are not ordered\n", + "- Child names are unique" + ] + }, + { + "cell_type": "code", + "execution_count": 115, + "id": "d996b53e", + "metadata": {}, + "outputs": [], + "source": [ + "tree = {\n", + "    \"a\": set([\"b\", \"c\"]),\n", + "    \"b\": set([\"d\", \"e\"]),\n", + "    \"c\": set([\"f\"]),\n", + "    \"d\": set(),\n", + "    \"e\": set(),\n", + "    \"f\": set()\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "735ef0c3", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Data structures (lists of lists)\n", + "\n", + "- Each node is an entry in the list\n", + "- Children are sub-lists" + ] + }, + { + "cell_type": "code", + "execution_count": 122, + "id": "1cfece72", + "metadata": {}, + "outputs": [], + "source": [ + "tree_list = [\n", + "    ['a', ['b', 'c']],\n", + "    ['b', ['d', 'e']],\n", + "    ['c', ['f', 'g']],\n", + "    ['d', []],\n", + "    ['e', []],\n", + "    ['f', []],\n", + "    ['g', []] \n", + "]" + ] + }, + { + "cell_type": "markdown", + "id": "ec31a4a3", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + 
"## Data structures (tuples)\n", + "\n", + "- Each node is the first tuple\n", + "- Children are additionnal tuply entries" + ] + }, + { + "cell_type": "code", + "execution_count": 119, + "id": "0ed87f90", + "metadata": {}, + "outputs": [], + "source": [ + "tree = (\"a\", [\n", + " (\"b\", []),\n", + " (\"c\", [\n", + " (\"d\", [\n", + " (\"e\", [])\n", + " ])\n", + " ])\n", + "])" + ] + }, + { + "cell_type": "markdown", + "id": "f30c5bc6", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Class object\n", + "\n", + "- The object contains a value and an unrestricted list of children" + ] + }, + { + "cell_type": "code", + "execution_count": 176, + "id": "7d12baee", + "metadata": {}, + "outputs": [], + "source": [ + "class Node:\n", + " def __init__(self, value, children = []):\n", + " self.value = value\n", + " self.children = children\n", + "\n", + " def get_all_nodes(self):\n", + " nodes = [self.value]\n", + " for child in self.children:\n", + " nodes += child.get_all_nodes()\n", + " return nodes\n", + " \n", + " def get_all_nodes_iterative(self):\n", + " nodes = []\n", + " stack = [self]\n", + " while stack:\n", + " current_node = stack.pop()\n", + " nodes.append(current_node.value)\n", + " stack += current_node.children\n", + " return nodes" + ] + }, + { + "cell_type": "code", + "execution_count": 177, + "id": "a1b3ac88", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "root = Node(\"a\", [\n", + " Node(\"b\", [\n", + " Node(\"d\"),\n", + " Node(\"e\"),\n", + " ]),\n", + " Node(\"c\", [\n", + " Node(\"f\"),\n", + " ]),\n", + "])\n", + "\n", + "# or using root.children" + ] + }, + { + "cell_type": "code", + "execution_count": 178, + "id": "48cb2472", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['a', 'b', 'd', 'e', 'c', 'f']" + ] + }, + "execution_count": 178, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"root.get_all_nodes()" + ] + }, + { + "cell_type": "code", + "execution_count": 175, + "id": "95477007", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['a', 'c', 'f', 'b', 'e', 'd']" + ] + }, + "execution_count": 175, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "root.get_all_nodes_iterative()" + ] + }, + { + "cell_type": "markdown", + "id": "f5e99024", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# Weighted trees\n", + "\n", + "> Trees with a quantity associated to the edges\n", + "\n", + "- Since we have a tree a way to store weights is using nodes values\n", + "- Root note weight is $0$" + ] + }, + { + "cell_type": "markdown", + "id": "c4544dce", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Data structures (dicts for edges)\n", + "\n", + "- To encode values in edges we need to add an extra value" + ] + }, + { + "cell_type": "code", + "execution_count": 134, + "id": "cc40dc55", + "metadata": {}, + "outputs": [], + "source": [ + "tree = {'a': [{'b': 0}, {'c': 0}],\n", + " 'b': [{'d': 0}, {'e': 0}],\n", + " 'c': [{'f': 0}],\n", + " 'd': [],\n", + " 'e': []\n", + " }\n" + ] + }, + { + "cell_type": "code", + "execution_count": 135, + "id": "69301406", + "metadata": {}, + "outputs": [], + "source": [ + "tree = {\n", + " 'a': [('b', 0), ('c', 0)],\n", + " 'b': [('d', 0), ('e', 0)],\n", + " 'c': [('f', 0)],\n", + " 'd': [],\n", + " 'e': []\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "44b1c278", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Weigthted trees as classes" + ] + }, + { + "cell_type": "code", + "execution_count": 136, + "id": "cad00333", + "metadata": {}, + "outputs": [], + "source": [ + "class Node_weight:\n", + " def __init__(self, data, weight=0):\n", + " self.data = data\n", + " self.children = []\n", + " self.weight = weight\n", + "\n", + " \n", + "tree = 
Node_weight(1)\n", + "child1 = Node_weight(2, weight=5)\n", + "child2 = Node_weight(3, weight=7)\n", + "tree.children = [child1, child2]" + ] + }, + { + "cell_type": "markdown", + "id": "d6e804ec", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Exercise: Calculate the total weight of a tree\n", + "\n", + "Go through all the nodes, collect the edges, and sum their weights." + ] + }, + { + "cell_type": "code", + "execution_count": 137, + "id": "a3d02030", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "def get_tree_edges(root):\n", + "    edges = []\n", + "    stack = [(root, None)]\n", + "\n", + "    while stack:\n", + "        node, parent_data = stack.pop()\n", + "        \n", + "        for child in node.children:\n", + "            stack.append((child, node.data))\n", + "            edges.append((node.data, child.data, child.weight))\n", + "\n", + "    return edges" + ] + }, + { + "cell_type": "code", + "execution_count": 138, + "id": "ea86d253", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[(1, 2, 5), (1, 3, 7)]" + ] + }, + "execution_count": 138, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tree = Node_weight(1)\n", + "child1 = Node_weight(2, weight=5)\n", + "child2 = Node_weight(3, weight=7)\n", + "tree.children = [child1, child2]\n", + "get_tree_edges(tree)" + ] + }, + { + "cell_type": "code", + "execution_count": 139, + "id": "461303a6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "12" + ] + }, + "execution_count": 139, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum(tpl[2] for tpl in get_tree_edges(tree))" + ] + }, + { + "cell_type": "markdown", + "id": "b5554daf", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Exercise: Calculate the total weight of a tree\n", + "\n", + "A recursive version:" + ] + }, + { + "cell_type": "code", + "execution_count": 142, + "id": "6118a8e3", + "metadata": {}, + 
"outputs": [], + "source": [ + "def calculate_total_weight(node):\n", + " total_weight = node.weight\n", + " for child in node.children:\n", + " total_weight += calculate_total_weight(child)\n", + " return total_weight" + ] + }, + { + "cell_type": "code", + "execution_count": 143, + "id": "e9742977", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "12" + ] + }, + "execution_count": 143, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "calculate_total_weight(tree)" + ] + }, + { + "cell_type": "markdown", + "id": "2949e143", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# An Edge class for edges\n", + "\n", + "- To consider edges as objects" + ] + }, + { + "cell_type": "code", + "execution_count": 145, + "id": "6958cf7d", + "metadata": {}, + "outputs": [], + "source": [ + "class Edge:\n", + " def __init__(self, source, target):\n", + " self.source = source\n", + " self.target = target\n", + "\n", + "class Node:\n", + " def __init__(self, label):\n", + " self.label = label\n", + " self.children = []\n", + "\n", + "class Tree:\n", + " def __init__(self, root_label):\n", + " self.root = Node(root_label)\n", + " self.edges = []" + ] + }, + { + "cell_type": "markdown", + "id": "7dc8b845", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# Visualize a tree" + ] + }, + { + "cell_type": "code", + "execution_count": 146, + "id": "9054d359", + "metadata": {}, + "outputs": [], + "source": [ + "from graphviz import Digraph\n", + "from IPython.display import display\n", + "\n", + "def draw_tree(T):\n", + " dot = Digraph(format='png')\n", + "\n", + " def add_nodes_and_edges(tree, parent_name=None):\n", + " for parent, children in tree.items():\n", + " dot.node(parent, parent)\n", + " if parent_name:\n", + " dot.edge(parent_name, parent)\n", + " add_nodes_and_edges({child: [] for child in children}, parent)\n", + "\n", + " add_nodes_and_edges(T)\n", + " 
\n", + " display(dot)" + ] + } + ], + "metadata": { + "celltoolbar": "Slideshow", + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}