diff --git a/01-data-structures-complexity.ipynb b/01-data-structures-complexity.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..e1b8d697b1aecdfea66bc0e70aba06d5ce3b158d --- /dev/null +++ b/01-data-structures-complexity.ipynb @@ -0,0 +1,2203 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b50844ff", + "metadata": { + "slideshow": { + "slide_type": "slide" + }, + "tags": [ + "definition" + ] + }, + "source": [ + "# UE5 Fundamentals of Algorithms\n", + "## Lecture 1: Introduction\n", + "### Ecole Centrale de Lyon, Bachelor of Science in Data Science for Responsible Business\n", + "#### Romain Vuillemot\n", + "<center><img src=\"figures/Logo_ECL.png\" style=\"width:300px\"></center>" + ] + }, + { + "cell_type": "markdown", + "id": "2ada5ceb", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Outline\n", + "- Definition and examples of algorithms\n", + "- Algorithm properties\n", + "- Complexity analysis\n", + "- Data structures\n", + "- Empirical complexity analysis" + ] + }, + { + "cell_type": "markdown", + "id": "2f278159", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "f828e797", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## What is an algorithm?\n", + "\n", + "### Definition\n", + "\n", + "> An algorithm is a **set of unambiguous instructions** designed to solve a problem.\n", + "\n", + "\n", + "### History\n", + "\n", + "The word *algorithm* originates from the name of **Mūsā al-Khwārizmī**, a Persian mathematician from the 9th century. For more information, visit https://mathematical-tours.github.io/algorithms/.\n", + "\n", + "Long before that, ancient civilizations such as the Egyptians and Babylonians developed algorithms for **basic arithmetic operations**, like addition and multiplication. 
Euclid's algorithm, developed around 300 BCE, is **one of the earliest known algorithms** and is used to find the greatest common divisor (GCD) of two numbers.\n" + ] + }, + { + "cell_type": "markdown", + "id": "a3b03927", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Question\n", + "\n", + "- Do you know of any algorithms?" + ] + }, + { + "cell_type": "markdown", + "id": "0e1b605e", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "- Do you know how they work?\n", + "- Do you think they work perfectly? \n", + "- Can they be biased or make non-optimal decisions?" + ] + }, + { + "cell_type": "markdown", + "id": "598f0585", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Notes\n", + "\n", + "- Translating an algorithm into a programming language yields a program, but the converse does not hold: **not every program is an algorithm.**\n", + "\n", + "- For example, reactive programs (handling input/output) or those containing animations do not terminate because they are always waiting for input. They do not constitute algorithms in the strict sense.\n", + "\n", + "- Algorithms are language-agnostic; they describe the logic and steps needed to solve a problem, but not the specific coding details."
+ ] + }, + { + "cell_type": "markdown", + "id": "13950423", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Example: Euclid's algorithm\n", + "\n", + "One of the earliest algorithms: Euclid's algorithm to compute the greatest common divisor of two integers a and b: " + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "1ab6e76d", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "def gcd(a, b):\n", + "    while b != 0:\n", + "        t = b\n", + "        b = a % b\n", + "        a = t\n", + "    return a\n", + "\n", + "gcd(10, 20) # 10" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "e594e658", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "assert gcd(12, 18) == 6 # GCD of 12 and 18 is 6\n", + "assert gcd(1071, 462) == 21 # GCD of 1071 and 462 is 21\n", + "assert gcd(0, 8) == 8 # GCD of 0 and 8 is 8\n", + "assert gcd(25, 0) == 25 # GCD of 25 and 0 is 25\n", + "assert gcd(-12, 18) == 6 # GCD of -12 and 18 is 6" + ] + }, + { + "cell_type": "markdown", + "id": "d06be14a", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### How do you check an algorithm is correct?\n", + "\n", + "- **Mathematical Proof:** a formal and rigorous method of demonstrating that an algorithm is correct.\n", + "- **Code Review:** a collaborative process where one or more peers review the code implementation of an algorithm.\n", + "- **Test Cases:** sets of inputs and expected outputs used to validate that an algorithm produces correct results.\n", + "\n", + "For **test cases**, the ```assert``` statement checks whether a given condition evaluates to ```True```; if it does, the program continues to execute normally. 
If the condition is ```False```, an ```AssertionError``` exception is raised, and the program stops executing.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b4756712", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### How do you check an algorithm is correct? (cont.)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "id": "c978e5c9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "All test cases passed!\n" + ] + } + ], + "source": [ + "def add(a, b): # function to test\n", + "    return a + b\n", + "\n", + "assert add(2, 3) == 5, \"Test Case 1 Failed\" \n", + "assert add(-1, 1) == 0, \"Test Case 2 Failed\" \n", + "assert add(0, 0) == 0, \"Test Case 3 Failed\" \n", + "assert add(10, -5) == 5, \"Test Case 4 Failed\"\n", + "\n", + "print(\"All test cases passed!\")" + ] + }, + { + "cell_type": "markdown", + "id": "b3440a3c", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: x power n\n", + "\n", + "An algorithm (and tests) that calculates $x^n$:" + ] + }, + { + "cell_type": "code", + "execution_count": 84, + "id": "0eec5ad4", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "def puissance(x, n):\n", + "    if n == 0:\n", + "        return 1\n", + "    elif n % 2 == 0:\n", + "        temp = puissance(x, n // 2)\n", + "        return temp * temp\n", + "    elif n < 0:\n", + "        temp = puissance(x, -(n + 1) // 2)\n", + "        return 1 / (temp * temp * x)\n", + "    else:\n", + "        temp = puissance(x, (n - 1) // 2)\n", + "        return temp * temp * x" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "id": "265c8dbf", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "assert puissance(2, 3) == 8\n", + "assert puissance(5, 0) == 1\n", + "assert puissance(3, -2) == 1/9\n", + "assert puissance(2, 10) == 1024\n", + "assert puissance(2, -3) == 
1/8\n", + "assert puissance(2, 1) == 2" + ] + }, + { + "cell_type": "markdown", + "id": "42bd90e0", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: The sum of the first n integers\n", + "\n", + "An algorithm (and tests) that calculates $\\sum_{i=1}^{n} i$:" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "id": "c97cd26b", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "def sum_n(n):\n", + "    return n*(n+1)//2 # n*(n+1) is always even, so // is exact\n", + "\n", + "assert sum_n(1) == 1 # 1\n", + "assert sum_n(2) == 3 # 1 + 2\n", + "assert sum_n(3) == 6 # 1 + 2 + 3\n", + "assert sum_n(4) == 10 # 1 + 2 + 3 + 4\n", + "assert sum_n(5) == 15 # 1 + 2 + 3 + 4 + 5\n", + "assert sum_n(1000) == 500500 # .." + ] + }, + { + "cell_type": "markdown", + "id": "f4b2a737", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: Leap year\n", + "\n", + "Write a function `is_leap_year` that takes a year as input and returns `True` if it's a leap year and `False`\n", + " otherwise. The function follows the rules for leap year determination:\n", + "\n", + "- A year that is divisible by 4 is a leap year.\n", + "- However, a year that is divisible by 100 is not a leap year, unless...\n", + "- The year is also divisible by 400, in which case it is a leap year.\n", + "\n", + "E.g., 2000 and 2020 are leap years, but 2100 is not."
+ ] + }, + { + "cell_type": "markdown", + "id": "6086ec57", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: Leap year (cont.)" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "8630973c", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2000 is a leap year.\n", + "2020 is a leap year.\n", + "2100 is not a leap year.\n", + "2400 is a leap year.\n" + ] + } + ], + "source": [ + "def is_leap_year(year):\n", + "    if (year % 4 == 0 and year % 100 != 0) or (year % 400 == 0):\n", + "        return True\n", + "    else:\n", + "        return False\n", + "\n", + "test_years = [2000, 2020, 2100, 2400]\n", + "\n", + "for year in test_years:\n", + "    if is_leap_year(year):\n", + "        print(f\"{year} is a leap year.\")\n", + "    else:\n", + "        print(f\"{year} is not a leap year.\")" + ] + }, + { + "cell_type": "markdown", + "id": "eb56b548", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "Another possible test: compare to the Python [`isleap`](https://github.com/python/cpython/blob/607f18c89456cdc9064e27f86a7505e011209757/Lib/calendar.py#L141) function from the `calendar` module."
+ ] + }, + { + "cell_type": "code", + "execution_count": 87, + "id": "c3a28d0e", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "import calendar\n", + "\n", + "def is_leap_year(year):\n", + "    return calendar.isleap(year)" + ] + }, + { + "cell_type": "markdown", + "id": "68fa27fe", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: Find a number in a list\n", + "\n", + "Given a list of integers, check whether a specific number, provided as a parameter, is in the list." + ] + }, + { + "cell_type": "code", + "execution_count": 90, + "id": "8b2ba103", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "def search_element_in_list(element, lst):\n", + "\n", + "    for i in lst:\n", + "        if i == element:\n", + "            return True\n", + "    return False\n", + "\n", + "element_list = [1, 2, 3, 4, 5]\n", + "element_to_find = 3\n", + "result = search_element_in_list(element_to_find, element_list)\n", + "assert result == True, f\"Expected True, but got {result}\"" + ] + }, + { + "cell_type": "markdown", + "id": "6dbe39e2", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "Another type of test is to compare with a built-in Python function:" + ] + }, + { + "cell_type": "code", + "execution_count": 89, + "id": "40147ce1", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "def search_element_in_list_python(element, lst):\n", + "    return element in lst\n", + "\n", + "assert search_element_in_list(element_to_find, element_list) == search_element_in_list_python(element_to_find, element_list)" + ] + }, + { + "cell_type": "markdown", + "id": "15b9f515", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Algorithm properties" + ] + }, + { + "cell_type": "markdown", + "id": "b78724b1", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ 
+ "## Properties\n", + "\n", + "An algorithm possesses the following properties (among others):\n", + "\n", + "- Communicable\n", + "- Efficient\n", + "- Complete, terminates, and correct\n", + "- Deterministic" + ] + }, + { + "cell_type": "markdown", + "id": "8bccda63", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Communicate algorithms\n", + "\n", + "There are different ways to write algorithms. No single one is optimal; the choice depends on the context. Examples of representations are:\n", + "\n", + "- Plain language (pseudo-code)\n", + "- Formalization such as an equation\n", + "- A software specification\n", + "- Implementation in a programming language" + ] + }, + { + "cell_type": "markdown", + "id": "60a6156f", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Plain language (pseudo-code)\n", + "\n", + "Pseudocode is a way to write algorithms in a human-readable form. It is not a programming language, but it is close to one, which makes it a convenient way to communicate algorithms. E.g. for Euclid's algorithm:\n", + "\n", + "- Divide a by b, and you get the remainder r.\n", + "- Replace a with b.\n", + "- Replace b with r.\n", + "- Repeat as long as b is not 0; once b is 0, a is the GCD (Greatest Common Divisor).\n", + "\n", + "or\n", + " \n", + "```\n", + "function gcd(a, b)\n", + "    while b ≠ 0\n", + "        t := b; \n", + "        b := a mod b; \n", + "        a := t; \n", + "    return a;\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "9d16ff11", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Equation\n", + "\n", + "You can use mathematical equations and notations to describe certain aspects of the algorithm's behavior or to express mathematical relationships within the algorithm. 
\n", + "\n", + "- $\\sum_{i=1}^{n} x_i$\n", + "\n", + "- $F_n = F_{n-1} + F_{n-2}$\n", + "\n", + "- $\\mu = \\frac{\\sum x}{N}$\n", + "\n", + "- $PR_{t+1}(P_i) = \\sum_{P_j} \\frac{PR_t(P_j)}{C(P_j)}$" + ] + }, + { + "cell_type": "markdown", + "id": "b2f99051", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Graphics\n", + "\n", + "Graphical representations of algorithms are visual ways to illustrate the flow, logic, and structure of an algorithm. They are often used to aid in understanding, designing, and communicating algorithms, especially in algorithm design and computer science education. There are various types of graphical representations, and the choice depends on the complexity and purpose of the algorithm. \n", + "\n", + "<img src=\"figures/flowchart.png\" width=150></img>\n", + "\n", + "source: https://commons.wikimedia.org/wiki/File:Euclid_flowchart.svg\n" + ] + }, + { + "cell_type": "markdown", + "id": "861e2f0b", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Code (Python)\n", + "\n", + "Code (Python, Java, ...); example in Python:\n", + "\n", + "```python\n", + "def gcd(a, b):\n", + "    while b != 0:\n", + "        t = b\n", + "        b = a % b\n", + "        a = t\n", + "    return a\n", + "```\n", + "\n", + "In Java:\n", + "\n", + "```java\n", + "public class GCD {\n", + "    public static int gcd(int a, int b) {\n", + "        while (b != 0) {\n", + "            int t = b;\n", + "            b = a % b;\n", + "            a = t;\n", + "        }\n", + "        return a;\n", + "    }\n", + "}\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "55662526", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Discussion on the type of representation\n", + "\n", + "There are different ways to express an algorithm, depending on the context and the level of formalization required.\n", + "\n", + "- **Graphical representation** is more accessible and provides an overview, allowing for the detection of errors, patterns, etc. 
Humans have better perception abilities in the visual space than in text.\n", + "\n", + "- **Pseudo-language** has the characteristic of being flexible, close to both human and computer languages, and independent of a programming language. However, it is often defined ambiguously and requires additional effort for implementation.\n", + "\n", + "- Finally, **implementation (e.g., Python)** has the advantage of being immediately testable. However, it can be very strict (must be correct) and sometimes challenging to read if one is not familiar with the language. This also depends on the programmer." + ] + }, + { + "cell_type": "markdown", + "id": "39e29af0", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Efficiency\n", + "\n", + "> An algorithm is considered **efficient** if it minimizes the consumption of resources required to perform it.\n", + "\n", + "\n", + "Efficiency is relative to various criteria (values we want to measure) that need to be calculated (theoretically) or measured (empirically) in order to understand what is happening. Note that it is necessary to use large values of $n$ to obtain a representative behavior. Among these criteria:\n", + "\n", + "- Execution time\n", + "\n", + "- Required memory space\n", + "\n", + "- Disk storage space\n", + "\n", + "- Etc.\n", + "\n", + "We will see later that the concept of **Complexity** is based on one of these criteria and allows independence from the technology used (language, computer, compiler, etc.).\n" + ] + }, + { + "cell_type": "markdown", + "id": "a9849a9c", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Example:\n", + "\n", + "In genomics, it is common to compare two sequences (of genes) of lengths $N$ and $M$ (e.g., $\\texttt{TAG CAC}$ and $\\texttt{TGC TTG}$).\n", + "\n", + "- The number of comparisons is $N \\times M$.\n", + "\n", + "- If the size of the sequences doubles, then the number of comparisons... 
quadruples!\n", + "\n", + "- $(2 \\times N) \\times (2 \\times M) = 4 \\times (N \\times M)$.\n", + "\n", + "- Now, if we want to align 3 sequences of length $N$, the number of comparisons becomes $N^{3}$.\n", + "\n", + "In practice, it becomes challenging to find a solution quickly (especially when comparing more than 2 sequences).\n", + "\n", + "\n", + "$\\rightarrow$ The same applies to long sequences.\n", + "\n", + "$\\rightarrow$ Therefore, it is necessary to have an efficient algorithm (in the case of sequence comparison, consider the [BLAST algorithm](https://en.wikipedia.org/wiki/BLAST) (Basic Local Alignment Search Tool)).\n" + ] + }, + { + "cell_type": "markdown", + "id": "b48e5cc5", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Other properties\n", + "\n", + "Other qualities of an algorithm (beyond being simple and understandable):\n", + "\n", + "---\n", + "\n", + "> **Completeness**: An algorithm must be complete, meaning that for a given problem, it provides a solution for each of the inputs.\n", + "\n", + "---\n", + "\n", + "> **Termination**: An algorithm must terminate within a finite time.\n", + "\n", + "---\n", + "\n", + "> **Correctness**: An algorithm must be correct and terminate by providing a result that is the solution to the problem it is supposed to solve.\n", + "\n", + "---\n", + "\n", + "$\\rightarrow$ All of this is very difficult to prove (formal proof, etc.)!\n" + ] + }, + { + "cell_type": "markdown", + "id": "0af6998c", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Algorithm patterns\n", + "\n", + "An algorithm has a **pattern**, which is a way to classify algorithms based on their properties.\n", + "\n", + "- There are several ways to design algorithms, either based on performance constraints or based on the structural style.\n", + "\n", + "- There is not a single unique algorithm for a given problem.\n", + "\n", + "Examples of patterns (main ones):\n", + "\n", + "- By purpose\n", + 
"- By implementation (e.g., **recursion**, functional, etc.)\n", + "- By **design paradigm** (Divide and Conquer, etc.)\n", + "- By **complexity**\n" + ] + }, + { + "cell_type": "markdown", + "id": "3d5ba2a2", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Complexity" + ] + }, + { + "cell_type": "markdown", + "id": "e260b392", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### What is complexity?\n", + "\n", + "> The **complexity of an algorithm** is the formal estimation of the amount of resources required to execute an algorithm. These resources can include time, memory space, storage, etc. \n", + "\n", + "There are different types of complexity:\n", + "\n", + "- **Best Case:** The _smallest_ number of operations the algorithm will have to execute on a dataset of a fixed size.\n", + "\n", + "- **Worst Case:** This is the _largest_ number of operations the algorithm will have to execute on a dataset of a fixed size.\n", + "\n", + "- **Average Case:** This is the _average_ of the algorithm's complexities on datasets of a fixed size.\n", + "\n", + "\n", + "Note: It is often the worst-case analysis that is chosen (provides an upper performance limit). 
The complexity in terms of the number of operations is typically the most studied.\n" + ] + }, + { + "cell_type": "markdown", + "id": "41ff1b6e", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "<img src=\"figures/big-o-chart.png\" width=75%>" + ] + }, + { + "cell_type": "markdown", + "id": "37fee118", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: find the complexity\n", + "\n", + "```python\n", + "def maximum(L):\n", + "    m=L[0]\n", + "    for i in range(1,len(L)):\n", + "        if L[i]>m:\n", + "            m=L[i]\n", + "    return m\n", + "```\n" + ] + }, + { + "cell_type": "markdown", + "id": "7420338d", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + " $\\mathcal{O}(n)$\n", + " \n", + " (the loop goes through the whole list in every case)" + ] + }, + { + "cell_type": "markdown", + "id": "1c68b012", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Intuition behind the complexity calculation\n", + "\n", + "| Notation | Complexity | Intuition |\n", + "| -------------------- | ---------------- | ------------------------------------------------ |\n", + "| $\\mathcal{O}(1)$ | Constant | First or nth element of a list, ... |\n", + "| $\\mathcal{O}(\\log n)$ | Logarithmic | Divide in half and repeat, ... |\n", + "| $\\mathcal{O}(n)$ | Linear | Traverse data, ... |\n", + "| $\\mathcal{O}(n \\log n)$ | Quasi-Linear | Divide in half and combine, ... |\n", + "| $\\mathcal{O}(n^{2})$ | Quadratic | Traverse data with 2 nested loops, ... |\n", + "| $\\mathcal{O}(n^{k})$, $k > 2$ | Polynomial | Traverse data with k nested loops, ... |\n", + "| $\\mathcal{O}(2^{n})$ | Exponential | Test all combinations, ... |\n", + "| $\\mathcal{O}(n!)$ | Factorial | Test all paths (graph), ... 
|" + ] + }, + { + "cell_type": "markdown", + "id": "354c08c0", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: find the complexity" + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "id": "65be2326", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "def nocc(x,L):\n", + "    n=0\n", + "    for y in L:\n", + "        if x==y:\n", + "            n=n+1\n", + "    return n" + ] + }, + { + "cell_type": "markdown", + "id": "c2853423", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "$\\mathcal{O}(n)$\n", + " \n", + "(the loop goes through the whole list in every case)" + ] + }, + { + "cell_type": "markdown", + "id": "a667f037", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: find the complexity" + ] + }, + { + "cell_type": "code", + "execution_count": 108, + "id": "41fee19f", + "metadata": {}, + "outputs": [], + "source": [ + "def maj(L):\n", + "    xmaj=L[0]\n", + "    nmaj=nocc(xmaj,L)\n", + "    for i in range(1,len(L)):\n", + "        if nocc(L[i],L)>nmaj:\n", + "            xmaj=L[i]\n", + "            nmaj=nocc(L[i],L)\n", + "    return xmaj" + ] + }, + { + "cell_type": "markdown", + "id": "ad350594", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "$\\mathcal{O}(n^{2})$" + ] + }, + { + "cell_type": "markdown", + "id": "20906f5f", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: find the complexity\n", + "\n", + "The complexity of an `is_even(n)` algorithm that takes an integer `n` as input and returns `True` if `n` is an even number and `False` otherwise."
+ ] + }, + { + "cell_type": "code", + "execution_count": 125, + "id": "f6306bf2", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "def is_even(n):\n", + "    return n % 2 == 0" + ] + }, + { + "cell_type": "markdown", + "id": "41b3bb3c", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "$\\mathcal{O}(1)$" + ] + }, + { + "cell_type": "markdown", + "id": "872695b1", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: find the complexity" + ] + }, + { + "cell_type": "code", + "execution_count": 126, + "id": "206fae84", + "metadata": {}, + "outputs": [], + "source": [ + "def somcubes(n):\n", + "    s = 0\n", + "    while n>0:\n", + "        s = s+(n%10)**3\n", + "        n = n//10\n", + "    return s\n", + "\n", + "\n", + "def eq_somcubes(N):\n", + "    L = []\n", + "    for n in range(0, N+1):\n", + "        if n==somcubes(n):\n", + "            L.append(n)\n", + "    return L" + ] + }, + { + "cell_type": "markdown", + "id": "83fa5d5d", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "$\\mathcal{O}(n \\log n)$: the loop in `eq_somcubes` runs about $n$ times, and each call to `somcubes` takes $\\mathcal{O}(\\log n)$ steps, one per digit (we seek numbers that are equal to the sum of the cubes of their digits)." + ] + }, + { + "cell_type": "markdown", + "id": "6a617a3b", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: find the complexity\n", + "\n", + "You have two sorted lists, `[1, 3, 8, 10]` and `[2, 3, 9]`, and you want to obtain a new merged list from these two lists (without using sorting functions like sort or sorted). What is the complexity?" + ] + }, + { + "cell_type": "markdown", + "id": "01f5db81", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "We iterate through all the data once: $\\mathcal{O}(n)$, where $n$ is the total number of elements in the two lists."
+ ] + }, + { + "cell_type": "code", + "execution_count": 120, + "id": "ae749b90", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1, 2, 3, 3, 8, 9, 10]\n" + ] + } + ], + "source": [ + "def merge_sorted_lists(list1, list2):\n", + "    merged_list = []\n", + "    i = j = 0\n", + "\n", + "    while i < len(list1) and j < len(list2):\n", + "        if list1[i] < list2[j]:\n", + "            merged_list.append(list1[i])\n", + "            i += 1\n", + "        else:\n", + "            merged_list.append(list2[j])\n", + "            j += 1\n", + "\n", + "    while i < len(list1):\n", + "        merged_list.append(list1[i])\n", + "        i += 1\n", + "\n", + "    while j < len(list2):\n", + "        merged_list.append(list2[j])\n", + "        j += 1\n", + "\n", + "    return merged_list\n", + "\n", + "# Example usage:\n", + "list1 = [1, 3, 8, 10]\n", + "list2 = [2, 3, 9]\n", + "result = merge_sorted_lists(list1, list2)\n", + "print(result)" + ] + }, + { + "cell_type": "markdown", + "id": "85357f95", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Example: Selection sort\n", + "\n", + "Implement selection sort, which is described in pseudo-code below:\n", + " \n", + "1. Start with an unsorted list of elements.\n", + "2. Find the smallest element in the unsorted portion of the list.\n", + "3. Swap this smallest element with the first element in the unsorted portion.\n", + "4. Now, consider the remaining unsorted portion (excluding the element that was just swapped).\n", + "5. Repeat steps 2 to 4 until the entire list is sorted.\n", + "6. The result is a sorted list in ascending order.\n", + "- The key idea is to repeatedly select the smallest element from the unsorted part of the list and move it to the end of the sorted part of the list. 
This process continues until the entire list is sorted.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "ef34203d", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Example: Selection sort (cont.)" + ] + }, + { + "cell_type": "code", + "execution_count": 116, + "id": "311b2a95", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[17, 20, 26, 31, 44, 54, 55, 77, 93]\n" + ] + } + ], + "source": [ + "def selectionSort(l):\n", + "    for i in range(0, len(l)):\n", + "        min_index = i # index of the smallest element found so far\n", + "        for j in range(i+1, len(l)):\n", + "            if(l[j] < l[min_index]):\n", + "                min_index = j\n", + "        tmp = l[i]\n", + "        l[i] = l[min_index]\n", + "        l[min_index] = tmp\n", + "    return l \n", + "\n", + "if __name__==\"__main__\": \n", + "    liste = [54,26,93,17,77,31,44,55,20]\n", + "    selectionSort(liste)\n", + "    print(liste) # [17, 20, 26, 31, 44, 54, 55, 77, 93]\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "75b4a494", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "Complexity is on the order of $\\mathcal{O}(n^{2})$." + ] + }, + { + "cell_type": "markdown", + "id": "c1afcc1d", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Complexity Calculation\n", + "\n", + "There isn't just one but several methods to calculate the complexity of an algorithm, depending on its properties (and the desired precision of the complexity). Here are the main approaches:\n", + "\n", + "- **Reduction of the code to a known case** and combination of complexities. For example, two nested loops ($O(n^{2})$) around an $O(\\log n)$ operation result in an overall complexity of $O(n^{2} \\log(n))$.\n", + "\n", + "- **Reduction to a family of known functions** and calculation of the relative growth rate (limit).\n", + "\n", + "- **Empirical calculation by displaying execution times** as a function of the problem size. 
It's worth noting that the shape of the resulting curve (the growth rate) is largely independent of the power of the machine, even though the absolute times are not.\n" + ] + }, + { + "cell_type": "markdown", + "id": "204152e1", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Data structures\n" + ] + }, + { + "cell_type": "markdown", + "id": "282ed691", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Standard data structures\n", + "\n", + "Included in Python ([documentation](https://docs.python.org/3/tutorial/datastructures.html))\n", + "\n", + "\n", + "- `int`: Integer, of arbitrary size in Python 3.\n", + "- `long`: Long integer (Python 2 only; merged into `int` in Python 3).\n", + "- `float`: Real number (floating point).\n", + "- `str`: String, a sequence of Unicode characters.\n", + "- `bool`: Boolean, representing True or False.\n", + "- `tuple`: Tuple, an ordered and immutable collection of elements, e.g., `(1, 2, \"ECL\", 3.14)`.\n", + "- `list`: List, an ordered and mutable collection of elements.\n", + "- `set`: Set, an unordered collection of unique elements.\n", + "- `dict`: Dictionary, a collection of key-value pairs, e.g., `{'small': 1, 'large': 2}`.\n", + "\n", + "You can check the data type of a variable or object:\n", + "\n", + "```python\n", + "print(type(3)) # <class 'int'>\n", + "print(type(int)) # <class 'type'>\n", + "assert isinstance(3, int)\n", + "```\n" + ] + }, + { + "cell_type": "markdown", + "id": "b0dfbe40", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Standard data structures (cont.)\n", + "\n", + "- `range`: A range, representing a sequence of values to generate (in Python 2, `xrange()`).\n", + "- `complex`: Complex number, e.g., `1j` is one of the square roots of -1.\n", + "- `file`: File object (as returned by `open()`), for handling file input/output.\n", + "- `None`: Represents the absence of a value (equivalent to `void` in some contexts).\n", + "- `exception`: Exception, for handling errors and exceptional conditions.\n", + "- `function`: Function, a reusable block of code.\n", + "- `module`: Module, 
a file containing Python code and definitions.\n", + "- `object`: Object, a generic data type representing any Python object.\n" + ] + }, + { + "cell_type": "markdown", + "id": "01b8a90c", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Advanced data structures\n", + "\n", + "Not built-in types in Python; often implemented using standard structures and object-oriented programming (some are available in the standard library, e.g. `collections.deque` and `heapq`):\n", + "\n", + "- **Linked Lists**: A data structure where elements are linked together with pointers, allowing for efficient insertions and deletions but not direct access to elements by index.\n", + "\n", + "- **Stacks**: A linear data structure that follows the Last-In-First-Out (LIFO) principle, commonly used for managing function calls, undo operations, and parsing expressions.\n", + "\n", + "- **Queues**: A linear data structure that follows the First-In-First-Out (FIFO) principle, used for tasks such as managing jobs in a print queue or breadth-first search in graphs.\n", + "\n", + "- **Priority Queue**: A data structure that stores elements with associated priorities and allows for efficient retrieval of the element with the highest (or lowest) priority." + ] + }, + { + "cell_type": "markdown", + "id": "ce212ed1", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Advanced data structures (cont.)\n", + "- **Heaps**: A specialized tree-based data structure that is often used to implement priority queues. 
It ensures that the highest (or lowest) priority element can be efficiently accessed.\n", + "\n", + "- **Deques (Double-Ended Queues)**: A linear data structure that allows elements to be added or removed from both ends with constant-time complexity, useful for certain algorithms and data management.\n", + "\n", + "- **Trees**: A hierarchical data structure with a root node and child nodes, commonly used for various purposes such as binary search trees, AVL trees, and decision trees.\n", + "\n", + "- **Graphs**: A non-linear data structure consisting of nodes and edges, used for modeling relationships between objects or entities. Third-party libraries such as NetworkX support graph manipulation in Python.\n", + "\n", + "- **Hash Tables (Dictionaries)**: A data structure that allows efficient key-value mapping and retrieval. Python's built-in `dict` type is an example.\n" + ] + }, + { + "cell_type": "markdown", + "id": "8982a34a", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Data structures complexity\n", + "\n", + "\n", + "- **List:** Lists in Python resize dynamically and give constant-time access to elements by index. However, inserting or deleting an element in the middle of the list takes linear time, because the elements that follow must be shifted.\n", + "\n", + "- **Dictionary:** Python dictionaries, implemented as hash tables, provide constant-time average-case complexity for key-based operations such as insertion, retrieval, and deletion. However, the worst-case scenario can lead to linear time complexity.\n", + "\n", + "- **Set:** Python sets are also implemented as hash tables. Union, intersection, and difference take time proportional to the sizes of the sets involved, while membership tests, insertions, and deletions take constant time on average. 
However, in rare cases, these operations may exhibit linear time complexity.\n", + "\n", + "Understanding the complexities of these built-in data structures is essential for selecting the right one for specific programming tasks and optimizing the performance of Python programs." + ] + }, + { + "cell_type": "markdown", + "id": "87538ea2", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Dictionaries\n", + "\n", + "\n", + "A **dictionary** in Python is a collection of key-value pairs (it preserves insertion order since Python 3.7). It is a versatile data structure that allows you to store and retrieve values based on unique keys. Unlike lists or arrays, which use integer indices, dictionaries use keys to access their elements.\n", + "\n", + "- **Keys** in a dictionary must be unique and hashable (in practice, immutable), meaning you can use strings, numbers, or tuples of immutable values as keys, but not lists or other dictionaries.\n", + "- **Values** can be of any data type, including strings, numbers, lists, other dictionaries, or even functions.\n", + "\n", + "Dictionaries are useful for a wide range of applications, such as:\n", + "\n", + "- Storing and retrieving configuration settings.\n", + "- Counting the frequency of elements in a dataset.\n", + "- Representing data in a structured way, such as JSON.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "9036ba27", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Example: Creating a Dictionary in Python\n", + "\n", + "\n", + "```python\n", + ">>> phonebook = {'bob': 7387, 'alice': 3719, 'jack': 7052}\n", + ">>> phonebook['alice']\n", + "3719\n", + "```\n", + "\n", + "- The phonebook is implemented as a Python dictionary.\n", + "- Accessing an undefined key raises a `KeyError` exception (e.g., `KeyError: 'missing'`).\n", + "- A good practice is to use `.get(\"attr\", \"\")` to return a default value if the key doesn't exist.\n", + "- We will see that dictionaries are widely used for memoization to avoid recomputing certain calculations 
(e.g., dynamic programming).\n" + ] + }, + { + "cell_type": "markdown", + "id": "192c9168", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Example: Creating a Dictionary in Python\n", + "\n", + "Here's an example of how to create a dictionary in Python:\n", + "\n", + "```python\n", + "# Create a dictionary to store information about a person\n", + "person = {\n", + "    \"name\": \"John Doe\",\n", + "    \"age\": 30,\n", + "    \"city\": \"New York\"\n", + "}\n", + "\n", + "# Access values using keys\n", + "print(\"Name:\", person[\"name\"])\n", + "print(\"Age:\", person[\"age\"])\n", + "print(\"City:\", person[\"city\"])\n", + "```\n", + "\n", + "In this example, we've created a dictionary named `person` that contains information about an individual. We access the values stored in the dictionary using their respective keys.\n", + "\n", + "Output:\n", + "```\n", + "Name: John Doe\n", + "Age: 30\n", + "City: New York\n", + "```\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "3cc3715e", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Question: Count words in a list (using a dictionary)\n", + "\n", + "Write an algorithm that takes two parameters:\n", + "- `stri`: A list of words.\n", + "- `n`: An integer.\n", + "\n", + "The algorithm returns the number of distinct words that appear exactly `n` times in the list.\n" + ] + }, + { + "cell_type": "markdown", + "id": "db5cc56c", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Question: Count words in a list (using a dictionary)" + ] + }, + { + "cell_type": "code", + "execution_count": 121, + "id": "d5b559aa", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2\n" + ] + } + ], + "source": [ + "def countWords(stri, n):\n", + "\n", + "    m = dict()\n", + "    for w in stri: # m {'hate': 2, 'love': 4, 
'peace': 4}\n", + "        m[w] = m.get(w, 0) + 1\n", + "\n", + "    res = 0\n", + "    for i in m.values():\n", + "        if i == n:\n", + "            res += 1\n", + "\n", + "    return res\n", + "\n", + "if __name__==\"__main__\":\n", + "    # Driver code\n", + "    s = [ \"hate\", \"love\", \"peace\", \"love\",\n", + "          \"peace\", \"hate\", \"love\", \"peace\", \"love\", \"peace\" ]\n", + "\n", + "    print(countWords(s, 4)) # 2" + ] + }, + { + "cell_type": "markdown", + "id": "35a76856", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: detect duplicates in a list (using dicts)\n", + "\n", + "Write an algorithm that validates the following:\n", + "\n", + "```python\n", + "assert duplicatas([1,2]) == False\n", + "assert duplicatas([1,2,1]) == True\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 98, + "id": "67336f28", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "def duplicatas(L):\n", + "    d = {}\n", + "    for x in L:\n", + "        if x in d:\n", + "            return True\n", + "        d[x] = True\n", + "    return False\n", + "\n", + "assert duplicatas([1,2]) == False\n", + "assert duplicatas([1,2,1]) == True" + ] + }, + { + "cell_type": "markdown", + "id": "3fb60306", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: algorithm optimization (using dicts)\n", + "\n", + "Optimize this algorithm, which finds all integers such that $A^2 + B^2 = C^2 + D^2$ with A, B, C, D ranging from 1 to 1000.\n", + "\n", + "```python\n", + "n = 1000\n", + "for a in range(1, n+1):\n", + "    for b in range(1, n+1):\n", + "        for c in range(1, n+1):\n", + "            for d in range(1, n+1):\n", + "                if a**2 + b**2 == c**2 + d**2:\n", + "                    print(a, b, c, d)\n", + "\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "247972d9", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: algorithm optimization (using dicts) (cont.)\n", + "\n", + 
"```python\n", + "n = 1000\n", + "result_map = {}\n", + "\n", + "for c in range(1, n+1):\n", + " for d in range(1, n+1):\n", + " result = c**2 + d**2\n", + " if result in result_map:\n", + " result_map[result].append((c, d))\n", + " else:\n", + " result_map[result] = [(c, d)]\n", + "\n", + "for a in range(1, n+1):\n", + " for b in range(1, n+1):\n", + " result = a**2 + b**2\n", + " if result in result_map:\n", + " matching_pairs = result_map[result]\n", + " for pair in matching_pairs:\n", + " print(a, b, pair)\n", + "\n", + "```\n", + "\n", + "- A first loop uses a dictionary `result_map` to store pairs $(c, d)$ that yield the same result $c^2 + d^2$.\n", + "- A second loop iterates through $a^2 + b^2$ values and checks if there are matching pairs in `result_map`.\n" + ] + }, + { + "cell_type": "markdown", + "id": "c82d4d90", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Sets\n", + "\n", + "A **set** in Python is an unordered collection of unique elements. It is similar to a mathematical set and has several important characteristics:\n", + "\n", + "1. **Uniqueness**: Sets do not allow duplicate elements. If you try to add a duplicate element to a set, it will be ignored.\n", + "\n", + "2. **Unordered**: Unlike lists or tuples, sets do not have a specific order. The elements are not stored in any particular sequence, and you cannot access them by index.\n", + "\n", + "3. **Mutable**: Sets are mutable, which means you can add or remove elements after creating a set.\n", + "\n", + "4. **No Indexing**: Since sets are unordered, you cannot access elements by their index. Instead, you typically perform operations on sets as a whole.\n", + "\n", + "5. 
**Common Set Operations**: Sets support various set operations such as union, intersection, difference, and more, making them useful for mathematical and data manipulation tasks.\n" + ] + }, + { + "cell_type": "markdown", + "id": "bca7c805", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Sets (cont.)\n", + "\n", + "```python\n", + "# Creating a set\n", + "my_set = {1, 2, 3, 4, 5}\n", + "\n", + "# Creating an empty set\n", + "empty_set = set()\n", + "```\n", + "\n", + "Common set operations include:\n", + "\n", + "- **Adding Elements**: You can add elements to a set using the `add()` method.\n", + "\n", + "- **Removing Elements**: Elements can be removed from a set using `remove()` (raises a `KeyError` if the element is absent) or `discard()` (does nothing if the element is absent).\n", + "\n", + "- **Set Operations**: You can perform operations like union (`|`), intersection (`&`), difference (`-`), and more between sets.\n", + "\n", + "- **Checking Membership**: You can check if an element is in a set using the `in` operator.\n", + "\n", + "- **Iterating**: You can iterate through the elements of a set using a `for` loop.\n", + "\n", + "Sets are commonly used for tasks where uniqueness and set operations are essential." + ] + }, + { + "cell_type": "markdown", + "id": "cf4fcdf0", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Set Operations in Python\n", + "\n", + "| Method | Description |\n", + "|-----------------------|--------------------------------|\n", + "| `add()` | Adds an element to the set. |\n", + "| `clear()` | Removes all elements from the set. |\n", + "| `copy()` | Returns a shallow copy of the set. |\n", + "| `difference()` | Returns the difference of two sets. |\n", + "| `intersection()` | Returns the intersection of two sets. |\n", + "| `pop()` | Removes and returns an arbitrary element from the set. |\n", + "| `union()` | Returns the union of two sets. |\n", + "| `isdisjoint()` | Returns `True` if the sets have no elements in common. 
|\n", + "| `issubset()` | Returns `True` if the set is a subset of another set. |\n", + "| `issuperset()` | Returns `True` if the set contains another set. |\n", + "\n", + "There are many other set operations available in Python, and `frozenset` can be used to create an immutable set.\n", + "\n", + "For more details, refer to the [Python documentation](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset).\n" + ] + }, + { + "cell_type": "markdown", + "id": "e3fce712", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: detect duplicates in a list (using sets)\n", + "\n", + "Write an algorithm that validates the following:\n", + "\n", + "```python\n", + "assert duplicatas_sets([1,2]) == False\n", + "assert duplicatas_sets([1,2,1]) == True\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 135, + "id": "19e2c4ab", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "def duplicatas_sets(L):\n", + "\ts = set()\n", + "\tfor x in L:\n", + "\t\tif x in s:\n", + "\t\t\treturn True\n", + "\t\ts.add(x)\n", + "\treturn False" + ] + }, + { + "cell_type": "code", + "execution_count": 141, + "id": "aec4421f", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "def duplicatas_sets2(nums):\n", + "    return len(set(nums)) < len(nums)\n", + "\n", + "assert duplicatas_sets2([1,2]) == False\n", + "assert duplicatas_sets2([1,2,1]) == True" + ] + }, + { + "cell_type": "markdown", + "id": "61b23935", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: find duplicate pairs (using sets)\n", + "\n", + "In a list, return the values that occur exactly 2 times." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 137, + "id": "632fd6b4", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[(2, 2), (3, 3), (5, 5)]\n" + ] + } + ], + "source": [ + "def find_duplicate_pairs_optimized(lst):\n", + "    seen = set()\n", + "    duplicate_pairs = []\n", + "\n", + "    for num in lst:\n", + "        if num in seen:\n", + "            duplicate_pairs.append((num, num))  # one pair per repeated occurrence\n", + "        seen.add(num)\n", + "\n", + "    return duplicate_pairs\n", + "\n", + "# Example usage:\n", + "input_list = [2, 3, 5, 2, 7, 3, 8, 5]\n", + "result = find_duplicate_pairs_optimized(input_list)\n", + "print(result)" + ] + }, + { + "cell_type": "markdown", + "id": "7c958cd9", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Exercise: find words typed with a single row on a keyboard (using sets)\n", + "\n", + "Determine which words can be typed using a single row of letters on a keyboard, using sets in Python.\n", + "\n", + "```python\n", + "words = ['Velo', 'Ecole', 'Informatique', 'Etroit']\n", + "assert check_keyboard(words) == ['Etroit'] # for a French (AZERTY) keyboard\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 151, + "id": "e3686496", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['Etroit']\n" + ] + } + ], + "source": [ + "def check_keyboard(words):\n", + "    result = []\n", + "    for w in words:\n", + "        ws = set(w.lower())\n", + "        if not ws.difference(\"azertyuiop\") \\\n", + "        or not ws.difference(\"qsdfghjklm\") \\\n", + "        or not ws.difference(\"wxcvbn\"):\n", + "            result.append(w)\n", + "    return result\n", + "\n", + "typed_with_single_row = check_keyboard(words)\n", + "print(typed_with_single_row)" + ] + }, + { + "cell_type": "markdown", + "id": "8865e8a6", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, 
+ "source": [ + "<img src=\"figures/complexite-arrays.png\" width=75%>" + ] + }, + { + "cell_type": "markdown", + "id": "995c354c", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "<img src=\"figures/complexite-data-structures.png\" width=75%>" + ] + }, + { + "cell_type": "markdown", + "id": "b6e68dc5", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Empirical complexity analysis" + ] + }, + { + "cell_type": "markdown", + "id": "0a04e8e2", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Empirical complexity analysis\n", + "\n", + "A practical way to estimate complexity\n", + "\n", + "1. **Gather data** on the execution time of algorithms or operations for various input sizes. This data is typically collected through various random measurements.\n", + "\n", + "2. **Plot the time measures** for the various measurements, for each algorithm to assess performance scales.\n", + "\n", + "3. 
**Analyze trends** to draw conclusions about the algorithm's time complexity by observing the shape of the curves in the plotted data.\n", + "\n", + "Using the matplotlib library (to be imported as a module):" + ] + }, + { + "cell_type": "code", + "execution_count": 146, + "id": "273e0647", + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "id": "80c00461", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Example: constant time" + ] + }, + { + "cell_type": "code", + "execution_count": 145, + "id": "76f6f476", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[<matplotlib.lines.Line2D at 0x11774cd30>]" + ] + }, + "execution_count": 145, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "<Figure size 640x480 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "steps = []\n", + "def constant(n):\n", + "    return 1\n", + "\n", + "for i in range(1, 100):\n", + "    steps.append(constant(i))\n", + "plt.plot(steps)" + ] + }, + { + "cell_type": "markdown", + "id": "fe0a6c66", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### Example: linear time" + ] + }, + { + "cell_type": "code", + "execution_count": 150, + "id": "6e6c4334", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(0, 0.5, 'Steps')" + ] + }, + "execution_count": 150, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "<Figure size 640x480 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "steps = []\n", + "def linear(n):\n", + "    return n\n", + "\n", + "for i in range(1, 100):\n", + "    steps.append(linear(i))\n", + "\n", + "plt.plot(steps)\n", + "plt.xlabel('Inputs')\n", + 
"plt.ylabel('Steps')" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "id": "54dcf1fa", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "import time\n", + "import random\n", + "import numpy as np\n", + "#%matplotlib inline\n", + "\n", + "nvalues = [100, 500, 1000, 1500, 2000, 2500, 3000]\n", + "timesAlgo = []\n", + "\n", + "for i in nvalues:\n", + "\n", + " random.seed()\n", + " p = 12**2 # magnitude of values\n", + " liste = []\n", + " \n", + " for x in range(i): liste.append(random.randint(0, p))\n", + "\n", + " a=time.perf_counter()\n", + " e1 = []\n", + " for n in liste:\n", + " e1.append(n)\n", + " b = time.perf_counter()\n", + " timesAlgo.append(b-a)" + ] + }, + { + "cell_type": "code", + "execution_count": 128, + "id": "340dc177", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "<Figure size 640x480 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.plot(nvalues, timesAlgo, \"r-\", label=\"Algo 1\")\n", + "plt.title(\"Complexity/Perf comparison\")\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8593238b", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "import time\n", + "import random\n", + "\n", + "def measure_sorting_time(sorting_function, lst):\n", + " a = time.perf_counter()\n", + " sorting_function(lst)\n", + " b = time.perf_counter()\n", + " return b - a\n", + "\n", + "nvalues = [100, 500, 1000, 1500, 2000, 2500, 3000]\n", + "timesAlgo = []\n", + "\n", + "for i in nvalues:\n", + " random.seed()\n", + " p = 12**2 # Magnitude of values\n", + " lst = [random.randint(0, p) for x in range(i)]\n", + "\n", + " time_python_sort = measure_sorting_time(sorted, lst.copy())\n", + " time_selection_sort = measure_sorting_time(selectionSort, lst.copy())\n", + " # 
add more sorting algorithms\n", + " \n", + " timesAlgo.append((time_python_sort, time_selection_sort))\n", + "\n", + "python_sort_times = [t[0] for t in timesAlgo]\n", + "selection_sort_times = [t[1] for t in timesAlgo]\n" + ] + }, + { + "cell_type": "code", + "execution_count": 148, + "id": "fb711c57", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "<Figure size 640x480 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Plot the results\n", + "plt.plot(nvalues, python_sort_times, marker='o', linestyle='-', color='b', label='Python Built-in Sort')\n", + "plt.plot(nvalues, selection_sort_times, marker='o', linestyle='-', color='g', label='Selection Sort (Custom)')\n", + "plt.xlabel('Input Size (n)')\n", + "plt.ylabel('Time (seconds)')\n", + "plt.title('Comparison of Sorting Algorithms')\n", + "plt.legend()\n", + "plt.grid()\n", + "plt.show()" + ] + } + ], + "metadata": { + "celltoolbar": "Slideshow", + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/03-lists-search-sort.ipynb b/03-lists-search-sort.ipynb index b1084b9cd440f98ae9fc14c507e8fcc85612c262..cc8c8cd05f260bebfab488daa464b7188909ab52 100644 --- a/03-lists-search-sort.ipynb +++ b/03-lists-search-sort.ipynb @@ -597,25 +597,6 @@ "order_by_alphabetical_order(\"cherry\", \"banana\")" ] }, - { - "cell_type": "code", - "execution_count": 53, - "id": "a0c076fa", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "66" - ] - }, - "execution_count": 53, - "metadata": {}, - "output_type": 
"execute_result" - } - ], - "source": [] - }, { "cell_type": "markdown", "id": "ee60b8ff", @@ -764,9 +745,17 @@ " print(item)" ] }, + { + "cell_type": "markdown", + "id": "c5bf0281", + "metadata": {}, + "source": [ + "Fibonacci with generators" + ] + }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 15, "id": "ff445529", "metadata": {}, "outputs": [], @@ -832,7 +821,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 16, "id": "53534019", "metadata": { "slideshow": { @@ -849,10 +838,8 @@ } ], "source": [ - "# Initialize an empty linked list\n", "linked_list = None\n", "\n", - "# Function to append data to the linked list\n", "def append(data):\n", " global linked_list\n", " if linked_list is None:\n", @@ -863,7 +850,6 @@ " current = current[\"next\"]\n", " current[\"next\"] = {\"data\": data, \"next\": None}\n", "\n", - "# Function to traverse and print the linked list\n", "def traverse():\n", " current = linked_list\n", " while current:\n", @@ -871,13 +857,11 @@ " current = current[\"next\"]\n", " print(\"None\")\n", "\n", - "# Append some data to the linked list\n", "append(1)\n", "append(2)\n", "append(3)\n", "\n", - "# Print the linked list\n", - "traverse()\n" + "traverse()" ] }, { @@ -973,14 +957,6 @@ "list(map(lambda x : x * x, x))" ] }, - { - "cell_type": "code", - "execution_count": null, - "id": "fb30b1ce", - "metadata": {}, - "outputs": [], - "source": [] - }, { "cell_type": "code", "execution_count": 15, @@ -1075,19 +1051,19 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 17, "id": "601e1691", "metadata": {}, "outputs": [ { - "ename": "ValueError", - "evalue": "10 is not in list", + "ename": "NameError", + "evalue": "name 'L' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[34], line 
1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mL\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mindex\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m10\u001b[39;49m\u001b[43m)\u001b[49m\n", - "\u001b[0;31mValueError\u001b[0m: 10 is not in list" + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[17], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mL\u001b[49m\u001b[38;5;241m.\u001b[39mindex(\u001b[38;5;241m10\u001b[39m)\n", + "\u001b[0;31mNameError\u001b[0m: name 'L' is not defined" ] } ], diff --git a/04-05-06-programming-strategies.ipynb b/04-05-06-programming-strategies.ipynb index f1d734f362c05b95f6906995c1c4616a6889b547..4c0ca2272abf686e33a3b8b3fe4a706f78f5cd6f 100644 --- a/04-05-06-programming-strategies.ipynb +++ b/04-05-06-programming-strategies.ipynb @@ -665,7 +665,7 @@ "- Fibonacci Sequence\n", "- Rod Cutting\n", "- Sequence Alignment, Longest Subsequence Finding\n", - "- Shortest Path Findin" + "- Shortest Path Finding" ] }, { @@ -810,7 +810,7 @@ "source": [ "## Rod cutting \n", "\n", - "_Given a list of cuts and prices, identify the optimal cuts. Given the example below, what is the best cutting strategy for a rod of size 2?_\n", + "_Given a list of cuts and prices, identify the optimal cuts. Given the example below, what is the best cutting strategy for a rod of size `4`?_\n", "\n", "<img src=\"figures/rod-cutting.png\" style=\"width:500px\">" ] @@ -828,9 +828,13 @@ { "cell_type": "markdown", "id": "47119bfb", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, "source": [ - "Solution: $2$ with $5 + 5 = 10$." + "Solution: For a rod of size `4` optimal solution is 2 cuts of size 2 so $5 + 5 = 10$." 
] }, { @@ -1037,7 +1041,7 @@ "\n", "Now, we can calculate `V_3` using the values of `V_2` and `V_1`:\n", "\n", - "$$V_3 = \\max(p_1 + V_2, p_2 + V_1, p_3 + V_0) = \\max(2 + 5, 5 + 2, 9 + 0) = \\max(7, 7, 9) = 9$$" + "$$V_3 = \\max(p_1 + V_2, p_2 + V_1, p_3 + V_0) = \\max(1 + 5, 5 + 2, 9 + 0) = \\max(6, 7, 8) = 8$$" ] }, { @@ -1093,6 +1097,36 @@ " print(\"Max size cut \" + str(cutRod(arr, size)), len(arr) ) " ] }, + { + "cell_type": "markdown", + "id": "c42683cc", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Change-making problem (dynamic programming)\n", + "\n", + "\n", + "$Q_{opt}(S,M) = min \\ \\sum_{i=1}^n x_i$.\n", + " \n", + "$S$: all the available coins\n", + " \n", + "$M$: amount\n", + " \n", + "$\n", + " Q_{opt}(i,m) = min\n", + "\\begin{cases}\n", + " 1 + Q_{opt}(i, m - v_i) \\quad si \\ (m - v_i) \\geq 0 \\qquad \\text{we use a coin of type $i$ of value $v_i$}\\\\\n", + "Q_{opt}(i-1, m) \\qquad \\quad si \\ i \\geq 1 \\qquad \\qquad \\quad \\text{we do not use coin of type $i$, \n", + "we use $i-1$}\n", + "\\end{cases}\n", + "$\n", + "\n", + "<img src=\"figures/coins-changing.png\" style=\"width:500px\">\n" + ] + }, { "cell_type": "markdown", "id": "fd3e3af7", diff --git a/07-stacks-queues.ipynb b/07-stacks-queues.ipynb index e41de9135de4815f4639c2a95a4bb5861bfb8cf5..740b4b6eae83683ab8aa69dd02070d31405d62b2 100644 --- a/07-stacks-queues.ipynb +++ b/07-stacks-queues.ipynb @@ -13,9 +13,18 @@ "## Lecture 7: Stacks and queues\n", "### Ecole Centrale de Lyon, Bachelor of Science in Data Science for Responsible Business\n", "#### Romain Vuillemot\n", - "<center><img src=\"figures/Logo_ECL.png\" style=\"width:300px\"></center>\n", - "\n", - "\n", + "<center><img src=\"figures/Logo_ECL.png\" style=\"width:300px\"></center>" + ] + }, + { + "cell_type": "markdown", + "id": "4dfe56e7", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ "---" ] }, @@ -192,14 +201,20 @@ } }, "source": [ - "### 
Stacks (using OOP)" + "### Stacks (using OOP)\n", + "\n", + "_Internally, it will be based on an `Array` structure._" ] }, { "cell_type": "code", "execution_count": 17, "id": "8ae9a611", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, "outputs": [], "source": [ "class Stack():\n", diff --git a/exercises/01-data-structures-complexity-exercises.ipynb b/exercises/01-data-structures-complexity-exercises.ipynb index 0be737d823b8270c6f727b80d3d3dfc798baae8e..50f1d8fdd67cb0271978a336e741d5ad527d2efd 100644 --- a/exercises/01-data-structures-complexity-exercises.ipynb +++ b/exercises/01-data-structures-complexity-exercises.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "59aef681", + "id": "fa46e55d", "metadata": {}, "source": [ "# UE5 Fundamentals of Algorithms\n", @@ -23,7 +23,7 @@ { "cell_type": "code", "execution_count": null, - "id": "88952ac5", + "id": "4c1206d1", "metadata": {}, "outputs": [], "source": [ @@ -33,7 +33,7 @@ }, { "cell_type": "markdown", - "id": "b779c274", + "id": "6157338f", "metadata": {}, "source": [ "---" diff --git a/exercises/02-recursion-exercises.ipynb b/exercises/02-recursion-exercises.ipynb index 5f854e39a6f1e92ed01267baa4bfea467c61cfb1..5d727a2d641bef5dc48ae5be44efd40f8253ab28 100644 --- a/exercises/02-recursion-exercises.ipynb +++ b/exercises/02-recursion-exercises.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "a14c054e", + "id": "01d26f8d", "metadata": {}, "source": [ "# UE5 Fundamentals of Algorithms\n", @@ -23,7 +23,7 @@ { "cell_type": "code", "execution_count": null, - "id": "b13780a7", + "id": "a6f5bcf0", "metadata": {}, "outputs": [], "source": [ @@ -33,7 +33,7 @@ }, { "cell_type": "markdown", - "id": "67ead887", + "id": "51dc6634", "metadata": {}, "source": [ "---" diff --git a/exercises/03-lists-search-sort-exercises.ipynb b/exercises/03-lists-search-sort-exercises.ipynb index 
2ff574110dcfc0a1b629bd8cddc1b1781ed34e55..d64010d6f3467b5100f3b6b1aad2839c4abb411f 100644 --- a/exercises/03-lists-search-sort-exercises.ipynb +++ b/exercises/03-lists-search-sort-exercises.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "f8151910", + "id": "7989517f", "metadata": {}, "source": [ "# UE5 Fundamentals of Algorithms\n", @@ -23,7 +23,7 @@ { "cell_type": "code", "execution_count": null, - "id": "94c8f6d3", + "id": "c659f857", "metadata": {}, "outputs": [], "source": [ @@ -33,7 +33,7 @@ }, { "cell_type": "markdown", - "id": "9e5bf33f", + "id": "a58b61ff", "metadata": {}, "source": [ "---" diff --git a/exercises/04-05-06-programming-strategies-exercises.ipynb b/exercises/04-05-06-programming-strategies-exercises.ipynb index e79716bbd73ef9d85a0250452eb7e90e62b6d225..1a37afee2497791d8d64044d5eed09612ce1f6ff 100644 --- a/exercises/04-05-06-programming-strategies-exercises.ipynb +++ b/exercises/04-05-06-programming-strategies-exercises.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "8fa9ea2e", + "id": "c271b1aa", "metadata": {}, "source": [ "# UE5 Fundamentals of Algorithms\n", @@ -23,7 +23,7 @@ { "cell_type": "code", "execution_count": null, - "id": "6017bb23", + "id": "757d4143", "metadata": {}, "outputs": [], "source": [ @@ -33,7 +33,7 @@ }, { "cell_type": "markdown", - "id": "7dd0e72c", + "id": "451d6862", "metadata": {}, "source": [ "---" @@ -599,7 +599,7 @@ }, "outputs": [], "source": [ - "intervals = [(1, 3), (2, 4), (3, 5), (5, 7), (6, 8)]" + "interval_scheduling([(0, 2), (2, 4), (1, 3)])" ] }, { @@ -611,7 +611,7 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "a1c5ab0cb1fbb052b5567e17549a9723", + "checksum": "d8a22346b92125746d141cee329408f1", "grade": true, "grade_id": "correct_interval_scheduling", "locked": true, @@ -624,7 +624,19 @@ }, "outputs": [], "source": [ - "assert interval_scheduling(intervals) == [(1, 3), (3, 5), (5, 7)]" + "assert interval_scheduling([(0, 2), (2, 4), (1, 
3)]) == [(0, 2), (2, 4)]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de2c9857-9925-4477-96e4-341fa5bc4ec8", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "assert interval_scheduling([(0, 2), (2, 4), (1,3)]) == [(0, 2), (2, 4)]" ] }, { @@ -741,7 +753,7 @@ "deletable": false, "nbgrader": { "cell_type": "code", - "checksum": "4d258685229c335d689928bcd6c973ec", + "checksum": "959fdc68a051b0090f4ead159f3356df", "grade": false, "grade_id": "cell-0bf75b2bfe8e2e4c", "locked": false, @@ -753,7 +765,7 @@ }, "outputs": [], "source": [ - "def greedy_knapsack(W, wt):\n", + "def greedy_knapsack(W, w):\n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] @@ -849,6 +861,52 @@ "source": [ "assert dynamic_knapsack(max_weight, weights) == (5, [3, 2])" ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef953a29-7632-4475-8230-54a8a110d19a", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e7dd8b57-0d0d-43d0-b228-b9c994043f81", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a066e026-8a09-46ca-a919-53bc90c8a308", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "80c70549-29c1-49ff-a7de-f47dbce004a9", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "greedy_intevals([(0, 2), (2, 4), (1,3)])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "41e37d95-4e71-42f0-81dd-fd6122fc9023", + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { diff --git a/exercises/07-stacks-queues-exercises.ipynb b/exercises/07-stacks-queues-exercises.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..cfbe2107bfebf771dbefd59816f4083bf69beb28 --- /dev/null +++ b/exercises/07-stacks-queues-exercises.ipynb @@ -0,0 +1,444 @@ +{ + "cells": [ + { + 
"cell_type": "markdown", + "id": "49d0b44c", + "metadata": {}, + "source": [ + "# UE5 Fundamentals of Algorithms\n", + "## Exercices\n", + "### Ecole Centrale de Lyon, Bachelor of Science in Data Science for Responsible Business\n", + "#### [Romain Vuillemot](https://romain.vuillemot.net/)\n", + "\n", + "Before you turn this problem in:\n", + "- make sure everything runs as expected. \n", + " - first, **restart the kernel** (in the menubar, select Kernel$\\rightarrow$Restart) \n", + " - then **run all cells** (in the menubar, select Cell$\\rightarrow$Run All).\n", + "- make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\"\n", + "- remove `raise NotImplementedError()` to get started with your answer\n", + "- bonus points (at the end of this notebook) are optionals\n", + "- write your name (and collaborators as a list if any) below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6a884873", + "metadata": {}, + "outputs": [], + "source": [ + "ID = \"\"\n", + "COLLABORATORS_ID = []" + ] + }, + { + "cell_type": "markdown", + "id": "c7232915", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "2f1f2dcd-96a9-45ef-90a6-4ad488635679", + "metadata": {}, + "source": [ + "# Stacks and queues" + ] + }, + { + "cell_type": "markdown", + "id": "b9bd540c-dd15-49ac-bfbd-f2e758688a85", + "metadata": { + "tags": [] + }, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "03a0653e-65c2-4e79-9e83-31765cf19098", + "metadata": {}, + "source": [ + "## Exercise 1: Reverse a string using a Stack\n", + "\n", + "_Use the `Stack` below to reverse a string given as input._" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4932473d-2734-4e81-b777-ca10decfd9e8", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "class Stack:\n", + " def __init__(self):\n", + " self.items = []\n", + "\n", + " def push(self, item):\n", + " self.items.append(item)\n", 
+ "\n", + " def pop(self):\n", + " if not self.is_empty():\n", + " return self.items.pop()\n", + "\n", + " def peek(self):\n", + " if not self.is_empty():\n", + " return self.items[-1]\n", + "\n", + " def is_empty(self):\n", + " return len(self.items) == 0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8b77ae34-ef7c-4664-94e0-8928156f2224", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "128a37273e0e5da052abe4bf08bb1c27", + "grade": false, + "grade_id": "cell-5b0828e97507162e", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "def reverse_string(s):\n", + " # YOUR CODE HERE\n", + " raise NotImplementedError()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "63719c8e-f60c-4544-8e41-cb6380ae4bcf", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "reverse_string(\"Hello\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "81e93620-0664-4a9d-ba5f-894937c9769e", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "assert reverse_string(\"Hello\") == \"olleH\" " + ] + }, + { + "cell_type": "markdown", + "id": "81df9b1e-cfe5-4b69-96a5-c8065259cc7d", + "metadata": {}, + "source": [ + "## Exercise 2: Check if a word is a palindrom (using a Stack)\n", + "_A palindrome is a sequence of characters that reads the same forward and backward._" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cf6fbdd5-53c5-45c2-a0c5-a5ed845c4f81", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "048d788477e620bf78240329c0dd8771", + "grade": false, + "grade_id": "is_palindrome", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "def is_palindrome(s):\n", + " # YOUR CODE HERE\n", + " raise NotImplementedError()" + ] + }, 
+ { + "cell_type": "code", + "execution_count": null, + "id": "586bafba-2fbb-4833-b2e3-609db9b28fbf", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "is_palindrome(\"ABA\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d0005a10-9152-4aa5-a94b-fcbff1bd2281", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "4165b33ba9b75546b0edd15216e61e4f", + "grade": true, + "grade_id": "correct_is_palindrome", + "locked": true, + "points": 0, + "schema_version": 3, + "solution": false, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "assert is_palindrome(\"ABA\")" + ] + }, + { + "cell_type": "markdown", + "id": "f767bf25-9f4f-4a0d-8cb9-b729bbec5c27", + "metadata": {}, + "source": [ + "## Exercise 3: Implement a min-heap\n", + "\n", + "Use a `PriorityQueue` to implement a `MinHeap` class whose `pop` returns the smallest element." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ddcccaf6-d235-4327-826f-7a62a4c23f28", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from queue import PriorityQueue" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2da2db1e-f55d-43b4-877f-96ef944818e8", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# how to use the module\n", + "priority_queue = PriorityQueue()\n", + "priority_queue.put((3, 'apple'))\n", + "priority_queue.put((1, 'banana'))\n", + "priority_queue.put((2, 'cherry'))\n", + "element = priority_queue.get()\n", + "print(element)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "804ea32d-5bf8-42b9-ae52-6318b26f4065", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "7f6b90fc037aa2a24fa9ce3b4dfca6dd", + "grade": false, + "grade_id": "cell-4b9a5ecdee87514e", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + 
"outputs": [], + "source": [ + "# YOUR CODE HERE\n", + "raise NotImplementedError()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1b2d28c4-277b-44fa-b7e8-590aa00f8f70", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "min_heap = MinHeap()\n", + "min_heap.insert(5)\n", + "min_heap.insert(3)\n", + "min_heap.insert(8)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ed61bced-f000-41c6-8ecd-d669b4edb700", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "assert min_heap.pop() == 3\n", + "assert min_heap.peek() == 5\n", + "assert min_heap.peek() == 5" + ] + }, + { + "cell_type": "markdown", + "id": "a445d290-b04f-49b5-a8e7-2c6e259daf58", + "metadata": { + "tags": [] + }, + "source": [ + "## Exercise 4: Evaluate a postfix expression\n", + "\n", + "_Write a code that given the following expression, provides the following evaluation (using arthmetic operations over numerical values)._\n", + "\n", + "Expression: `\"3 4 +\"`\n", + "Evaluation: `3 + 4 = 7`\n", + "\n", + "First step: write a function `apply_operator` that applies an operation (ie + - * /) over two elements." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4cc7f805-0887-4422-b6b7-3d591d0df1fb", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "eb88296bc1de3c5dd7c68059e0a071e8", + "grade": false, + "grade_id": "cell-8c5106f02f243455", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "def apply_operator(op, b, a):\n", + "    # YOUR CODE HERE\n", + "    raise NotImplementedError()" + ] + }, + { + "cell_type": "markdown", + "id": "e68bdf7c-ca08-4553-9874-8bd9038fd4b5", + "metadata": {}, + "source": [ + "Solution in pseudo-code:\n", + "- Split the input expression into a list of tokens\n", + "- If the token is not an operator\n", + " - Push the value onto the stack\n", + "- If the token is an operator\n", + " - Make sure there are enough operands `a` and `b` on the stack\n", + " - Pop `a` and `b`\n", + " - Apply `apply_operator` on `a` and `b`\n", + " - Push the result onto the stack" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e792c90d-1b38-47f5-9879-399debc934b9", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "73960d3c6b85c2efc0ad8e298e2649b7", + "grade": false, + "grade_id": "cell-e9236618b265b34f", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "def evaluate_postfix(expression):\n", + "    # YOUR CODE HERE\n", + "    raise NotImplementedError()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ea6e4840-1b7e-4265-b37d-e8c45ea6b3ed", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "postfix_expression = \"3 4 + 2 *\"\n", + "result = evaluate_postfix(postfix_expression)\n", + "print(\"Result:\", result)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0dc4dff8-089b-46a6-a08d-f53ee2fe72c3", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + 
"assert evaluate_postfix(\"3 4 + 2 *\") == 14\n", + "assert evaluate_postfix(\"4 2 3 5 * + *\") == 68 # (4 * (2 + (3 * 5))\n", + "assert evaluate_postfix(\"8 4 / 6 2 * +\") == 14 # ((8 / 4) + (6 * 2))" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/solutions/04-05-06-programming-strategies-exercises.ipynb b/solutions/04-05-06-programming-strategies-exercises.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..5bee7faa259005d9bec99b1e6b4b7ba18a89307c --- /dev/null +++ b/solutions/04-05-06-programming-strategies-exercises.ipynb @@ -0,0 +1,1106 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "e58599e3-9ab7-4d43-bb22-aeccade424ce", + "metadata": {}, + "source": [ + "# Programming strategies" + ] + }, + { + "cell_type": "markdown", + "id": "691b3c38-0e83-4bb2-ac90-ef76d2dd9a7a", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "9d813205-b709-42ab-b414-6f3fc947022a", + "metadata": {}, + "source": [ + "## Exercise 1: tribonacci\n", + "\n", + "Here is an implementation of the Tribonacci sequence (similar to the Fibonacci) defined as:\n", + "\n", + "$T(n) = T(n-1) + T(n-2) + T(n-3)$\n", + "\n", + "_Explain the role of the `tab` variable; write some tests._" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "id": "f3b233ec-7077-479d-9c04-f1a4c35f3111", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "def tribonacci(n: int) -> int:\n", + " \n", + " if(n==0):\n", + " return 0\n", + " if(n==1):\n", + " return 1\n", + " if(n==2):\n", + " return 1\n", + " \n", + " tab=[0 for i in 
range(n+1)]\n", + " tab[0]=0\n", + " tab[1]=1\n", + " tab[2]=1\n", + " \n", + " for i in range(3, n+1):\n", + " tab[i]=tab[i-1]+tab[i-2]+tab[i-3]\n", + " \n", + " return tab[n]" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "id": "f280eeea-4812-4a30-80b5-3fe1cafa9283", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "274" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tribonacci(11)" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "id": "423d2637-8bd6-4e8f-95ff-2765dae5bce7", + "metadata": { + "nbgrader": { + "grade": false, + "grade_id": "cell-9e7356aa23ccfb4c", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "### BEGIN SOLUTION\n", + "assert tribonacci(0) == 0\n", + "assert tribonacci(1) == 1\n", + "assert tribonacci(2) == 1\n", + "assert tribonacci(11) == 274\n", + "### END SOLUTION" + ] + }, + { + "cell_type": "markdown", + "id": "b2b31dca-a19a-46cf-b0de-4feec6afa083", + "metadata": {}, + "source": [ + "## Exercise 2: find two numbers with a given sum\n", + "\n", + "_Find two numbers in an array `nums` that add up to a specific target value `target`. Return a list of the indices of the two values. 
Tip: make sure you do not use the same array element twice._" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "id": "f5a964df-958e-4758-8ca2-1139c59a7585", + "metadata": { + "nbgrader": { + "grade": false, + "grade_id": "two_sum", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "def two_sum(nums, target: int):\n", + " ### BEGIN SOLUTION\n", + " for i in range(len(nums)):\n", + " for j in range(i+1,len(nums)):\n", + " if(nums[i]+nums[j]==target):\n", + " return([i,j])\n", + " return -1\n", + " ### END SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "id": "1cec37b0-5d27-4973-b53a-053a46992c0c", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 2]" + ] + }, + "execution_count": 75, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "two_sum([1, 2, 3, 4, 5], 4)" + ] + }, + { + "cell_type": "code", + "execution_count": 78, + "id": "395cb69f-b99a-4c60-88f5-11216a5ec857", + "metadata": { + "nbgrader": { + "grade": true, + "grade_id": "correct_two_sum", + "locked": true, + "points": 1, + "schema_version": 3, + "solution": false, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "assert two_sum([2, 7, 11, 15], 9) == [0, 1] # 2 + 7 = 9\n", + "assert two_sum([3, 2, 4], 6) == [1, 2] # 2 + 4 = 6\n", + "assert two_sum([3, 3], 6) == [0, 1] # 3 + 3 = 6\n", + "assert two_sum([3, 3], 123) == -1 # not possible" + ] + }, + { + "cell_type": "markdown", + "id": "dc4f4361-5dee-4e19-b3fd-d18ea40d341e", + "metadata": { + "tags": [] + }, + "source": [ + "## Exercise 3: find the minimum distance between two points\n", + "\n", + "_Given a list of points, find the minimum distance over all pairs of points._\n", + "\n", + "- write a `dist` function using \n", + "- define a `closest_point_naive` function using `math.inf` as initial value\n", + "\n", + "Tip: make sure you do not use 
the same point twice!" + ] + }, + { + "cell_type": "code", + "execution_count": 79, + "id": "f73f28a4-d39c-497d-9cbb-1c4edd1c628f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import math" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "id": "0be305c3-c054-4092-b041-bd7e705e2178", + "metadata": { + "nbgrader": { + "grade": false, + "grade_id": "cell-8738237c4c6682e4", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "def dist(point1, point2):\n", + " ### BEGIN SOLUTION\n", + " return math.sqrt((point1[0] - point2[0])**2 + (point1[1] - point2[1])**2)\n", + " ### END SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 91, + "id": "c0e25948-eefd-4cb0-a893-c3d5e34ff0ee", + "metadata": { + "nbgrader": { + "grade": false, + "grade_id": "cell-9ca280911ed42034", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "def closest_point_naive(P):\n", + " ### BEGIN SOLUTION\n", + " min_dist = math.inf\n", + " n = len(P)\n", + " for i in range(n):\n", + " for j in range(i + 1, n):\n", + " if dist(P[i], P[j]) < min_dist:\n", + " min_dist = dist(P[i], P[j])\n", + " return min_dist\n", + " ### END SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 92, + "id": "965d8274-470e-4a3e-8b88-60d458de74e2", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "1.4142135623730951" + ] + }, + "execution_count": 92, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "points = [(1, 2), (4, 6), (7, 8), (3, 5)]\n", + "closest_point_naive(points)" + ] + }, + { + "cell_type": "code", + "execution_count": 95, + "id": "5d875175-45c3-4594-a434-23e15cfb88f7", + "metadata": { + "nbgrader": { + "grade": true, + "grade_id": "cell-618f956021833284", + "locked": true, + "points": 1, + "schema_version": 3, + 
"solution": false, + "task": false + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.0" + ] + }, + "execution_count": 95, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "points1 = [(1, 2), (4, 6), (0, 0), (0, 0)]\n", + "closest_point_naive(points1)" + ] + }, + { + "cell_type": "code", + "execution_count": 96, + "id": "755116cb-456e-48ad-8dfe-bd72617488d9", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "assert math.isclose(closest_point_naive(points1), 0.0, rel_tol=1e-9)" + ] + }, + { + "cell_type": "markdown", + "id": "4d521622-f4b7-4859-b501-2583b943d0e7", + "metadata": {}, + "source": [ + "## Exercice 4: display the minimum distance\n", + "\n", + "_Update the previous function to take a single list of points `P`as input and return the closest points and draw the line that connects them._" + ] + }, + { + "cell_type": "code", + "execution_count": 97, + "id": "a5902565-9c98-482a-8e63-9cc37214beb2", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from matplotlib import pyplot as plt\n", + "import random\n", + "\n", + "points_count = 10\n", + "x = [random.gauss(0, 1) for _ in range(points_count)]\n", + "y = [random.gauss(0, 1) for _ in range(points_count)]\n", + "\n", + "def draw_points(x, y):\n", + " color = \"blue\"\n", + " plt.figure(figsize=(10, 7))\n", + " _ = plt.plot(x, y, '.', markersize=14, color=color)" + ] + }, + { + "cell_type": "code", + "execution_count": 98, + "id": "e35557a8-54cb-4324-b280-4479835685db", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "<Figure size 1000x700 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "draw_points(x, y)" + ] + }, + { + "cell_type": "code", + "execution_count": 99, + "id": "6c504c53-d3fc-44a1-904c-49ceff572638", + "metadata": { + "nbgrader": { + "grade": false, + "grade_id": "closest_point_naive_pair", + "locked": 
false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "def closest_point_naive_pair(P):\n", + " ### BEGIN SOLUTION\n", + " n = len(P)\n", + " min_dist = math.inf\n", + " closest_point1 = None\n", + " closest_point2 = None\n", + "\n", + " for i in range(n):\n", + " for j in range(i + 1, n):\n", + " distance = dist(P[i], P[j])\n", + " if distance < min_dist:\n", + " min_dist = distance\n", + " closest_point1 = P[i]\n", + " closest_point2 = P[j]\n", + "\n", + " ### END SOLUTION\n", + " return closest_point1, closest_point2" + ] + }, + { + "cell_type": "code", + "execution_count": 100, + "id": "4266685c-19f9-4d2f-bb71-4d047bb54787", + "metadata": { + "nbgrader": { + "grade": true, + "grade_id": "correct_closest_point_naive_pair", + "locked": true, + "points": 1, + "schema_version": 3, + "solution": false, + "task": false + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "<Figure size 1000x700 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from matplotlib import pyplot as plt\n", + "import random\n", + "import math\n", + "\n", + "points_count = 10\n", + "color = \"blue\"\n", + "x = [random.gauss(0, 1) for _ in range(points_count)]\n", + "y = [random.gauss(0, 1) for _ in range(points_count)]\n", + "\n", + "points = list(zip(x, y))\n", + "\n", + "closest_point1, closest_point2 = closest_point_naive_pair(points)\n", + "\n", + "plt.figure(figsize=(10, 7))\n", + "_ = plt.plot(x, y, '.', markersize=14, color=color)\n", + "plt.plot([closest_point1[0], closest_point2[0]], [closest_point1[1], closest_point2[1]], '-', color='red', linewidth=2)\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 103, + "id": "bcec1dfc-6ce9-4b08-bc1c-f8d0887aa876", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "((0, 0), (1, 1))" + ] + }, + "execution_count": 103, + 
"metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "points1 = [(0, 0), (1, 1), (3, 3), (5, 5)]\n", + "closest_point_naive_pair(points1)" + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "id": "aad82edd-144d-4117-9ee6-485b9e739251", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "assert closest_point_naive_pair(points1) == ((0, 0), (1, 1))" + ] + }, + { + "cell_type": "markdown", + "id": "52ede988-1834-489d-9cca-e9882e90f9af", + "metadata": {}, + "source": [ + "## Exercice 5: implement the merge sort\n", + "\n", + "0. Find a base case\n", + "1. Finding the mid of the array \n", + "2. Divide the array elements into 2 halves \n", + "3. Sorting the first half and the second half independantly\n", + "4. merge by copying arrays into a final one" + ] + }, + { + "cell_type": "code", + "execution_count": 108, + "id": "b835049a-27a7-4d45-b819-b9e4d813fdcf", + "metadata": { + "nbgrader": { + "grade": false, + "grade_id": "merge_sort", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "def merge_sort(arr): \n", + " ### BEGIN SOLUTION\n", + " if len(arr) > 1: \n", + " mid = len(arr) // 2 \n", + " L = arr[:mid]\n", + " R = arr[mid:]\n", + " \n", + " merge_sort(L) \n", + " merge_sort(R)\n", + " \n", + " i = j = k = 0\n", + " \n", + " while i < len(L) and j < len(R): \n", + " if L[i] < R[j]: \n", + " arr[k] = L[i] \n", + " i+=1\n", + " else: \n", + " arr[k] = R[j] \n", + " j+=1\n", + " k+=1\n", + " \n", + " while i < len(L): \n", + " arr[k] = L[i] \n", + " i+=1\n", + " k+=1\n", + " \n", + " while j < len(R): \n", + " arr[k] = R[j] \n", + " j+=1\n", + " k+=1\n", + " \n", + " return arr\n", + " ### END SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 109, + "id": "4ad36fe3-7e9e-4ae8-9df6-f816e24d6c28", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 3, 6]" + ] + }, + 
"execution_count": 109, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "merge_sort([3, 6, 1])" + ] + }, + { + "cell_type": "code", + "execution_count": 110, + "id": "85865859-828f-4a6b-aad8-51ba0de8ac5b", + "metadata": { + "nbgrader": { + "grade": true, + "grade_id": "correct_merge_sort", + "locked": true, + "points": 1, + "schema_version": 3, + "solution": false, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "assert merge_sort([4, 2, 3]) == sorted([4, 2, 3])\n", + "assert merge_sort([7, 2, 1]) == [1, 2, 7]" + ] + }, + { + "cell_type": "markdown", + "id": "e5b7304d-7b64-42df-9465-8f4491525a8b", + "metadata": {}, + "source": [ + "# Exercice 6: organize a schedule\n", + "\n", + "_Propose a greedy algorithm that returns a list of time slots that do not overlap._\n", + "\n", + "In this question you may prioritize the ones that end last, so you may sort by [reverse order](https://docs.python.org/3/howto/sorting.html)." + ] + }, + { + "cell_type": "code", + "execution_count": 175, + "id": "e0ed0841-1255-4bdb-b7f8-e689c2a953af", + "metadata": { + "nbgrader": { + "grade": false, + "grade_id": "interval_scheduling", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "def interval_scheduling(intervals):\n", + " ### BEGIN SOLUTION\n", + "\n", + " intervals.sort(key=lambda x: x[1], reverse=True)\n", + " \n", + " selected_intervals = []\n", + " current_end_time = float('-inf')\n", + " \n", + " for interval in intervals:\n", + " start_time, end_time = interval\n", + " if start_time >= current_end_time:\n", + " selected_intervals.append(interval)\n", + " current_end_time = end_time\n", + " return selected_intervals\n", + " ### END SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 176, + "id": "d6a50280-f016-4686-8515-1b4a136c0fd9", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[(0, 2), 
(2, 4)]" + ] + }, + "execution_count": 176, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "interval_scheduling([(0, 2), (2, 4), (1, 3)])" + ] + }, + { + "cell_type": "code", + "execution_count": 178, + "id": "bd38d673-077d-44b1-b81c-ac7448f5c01c", + "metadata": { + "nbgrader": { + "grade": true, + "grade_id": "correct_interval_scheduling", + "locked": true, + "points": 1, + "schema_version": 3, + "solution": false, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "assert interval_scheduling([(0, 2), (2, 4), (1, 3)]) == [(0, 2), (2, 4)]" + ] + }, + { + "cell_type": "code", + "execution_count": 179, + "id": "de2c9857-9925-4477-96e4-341fa5bc4ec8", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "assert interval_scheduling([(0, 2), (2, 4), (1,3)]) == [(0, 2), (2, 4)]" + ] + }, + { + "cell_type": "markdown", + "id": "c142c3b3-2f01-48d0-8596-601bb133542b", + "metadata": { + "tags": [] + }, + "source": [ + "Now you may prioritize the ones that are the longest." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 181, + "id": "095836b8-2612-4d58-a9ce-b7968489418c", + "metadata": { + "nbgrader": { + "grade": false, + "grade_id": "interval_scheduling_longest", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "def interval_scheduling_longest(intervals):\n", + " ### BEGIN SOLUTION\n", + " intervals.sort(key=lambda x: x[1] - x[0], reverse=True)\n", + " \n", + " selected_intervals = []\n", + " current_end_time = float('-inf')\n", + " \n", + " for interval in intervals:\n", + " start_time, end_time = interval\n", + " if start_time >= current_end_time:\n", + " selected_intervals.append(interval)\n", + " current_end_time = end_time\n", + " return selected_intervals\n", + " ### END SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 182, + "id": "c87ae1e2-7497-4920-9597-d1b3cddf8580", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[(0, 4), (5, 7)]" + ] + }, + "execution_count": 182, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "interval_scheduling_longest([(0, 4), (3, 5), (5, 7)])" + ] + }, + { + "cell_type": "code", + "execution_count": 183, + "id": "afcf3a69-aa36-4b71-93a8-218337ed88b9", + "metadata": { + "nbgrader": { + "grade": true, + "grade_id": "correct_interval_scheduling_longest", + "locked": true, + "points": 1, + "schema_version": 3, + "solution": false, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "assert interval_scheduling_longest([(0, 4), (3, 5), (5, 7)]) == [(0, 4), (5, 7)]" + ] + }, + { + "cell_type": "markdown", + "id": "a559e4db-9ef2-4d20-bb9f-cb638d8c1f24", + "metadata": {}, + "source": [ + "# Exercise 7: knapsack problem\n", + "\n", + "Propose a greedy solution for the Knapsack problem defined as follows:\n", + "\n", + "$\\sum_{i=1}^n w_i x_i \\leq W$ and $x_i \\in \\{0,1,2,\\dots,c\\}$\n", + "\n", + "Which 
selects the item with the least weight that can fit within the knapsack's capacity at each step " + ] + }, + { + "cell_type": "markdown", + "id": "5833bde3-1960-43ad-b140-381ac6dd228c", + "metadata": {}, + "source": [ + "`W:` The maximum weight capacity of the knapsack.\n", + "\n", + "`w:` A list of item weights.\n", + "\n", + "`n:` The number of items available to choose from.\n", + "\n", + "Tip:\n", + "\n", + "- Initialize with all the item indices of the remaining items\n", + "- Start with an empty knapsack of total weight 0\n", + "- Create a loop that selects the best item from the remaining (with least weight)\n", + "- Add the item to the knapsack and remove it from the list of items\n", + "- Return the total weight and list of items used" + ] + }, + { + "cell_type": "code", + "execution_count": 134, + "id": "46947633-3f5f-420b-b416-500a0dab4fcb", + "metadata": { + "nbgrader": { + "grade": false, + "grade_id": "cell-0bf75b2bfe8e2e4c", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "def greedy_knapsack(W, w):\n", + " ### BEGIN SOLUTION\n", + " w.sort(reverse=True)\n", + " n = len(w)\n", + " remaining_items = list(range(n))\n", + "\n", + " total_weight = 0\n", + " knapsack = []\n", + "\n", + " while remaining_items and total_weight < W:\n", + " best_weight = float('inf')\n", + " best_item = None\n", + "\n", + " for i in remaining_items:\n", + " if w[i] < best_weight:\n", + " best_weight = w[i]\n", + " best_item = i\n", + "\n", + " if best_item is not None:\n", + " weight = w[best_item]\n", + " knapsack.append(weight)\n", + " total_weight += weight\n", + " remaining_items.remove(best_item)\n", + "\n", + " return total_weight, knapsack\n", + " ### END SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 135, + "id": "737df134-4ee1-49a9-9034-da3b0267ae84", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ 
+ "Total weight: 5 and selected weights: [2, 3]\n" + ] + } + ], + "source": [ + "weights = [5, 3, 4, 2]\n", + "max_weight = 5\n", + "\n", + "result, selected_weights = greedy_knapsack(max_weight, weights)\n", + "print(\"Total weight:\", result, \"and selected weights:\", selected_weights)" + ] + }, + { + "cell_type": "code", + "execution_count": 136, + "id": "a66c6e52-2d60-4423-bc3a-a3dcef38da26", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "assert greedy_knapsack(5, [2, 3, 4, 5]) == (5, [2, 3])" + ] + }, + { + "cell_type": "markdown", + "id": "37ff64e9-fc2c-4593-b85a-d83982445b9d", + "metadata": {}, + "source": [ + "Propose a dynamic programming solution following the general case:\n", + "\n", + "$\n", + "dp[i][w] = \n", + "\\begin{cases}\n", + "0 & \\text{if } i = 0 \\text{ or } w = 0 \\\\\n", + "dp[i-1][w] & \\text{if } w_i > w \\\\\n", + "\\max(dp[i-1][w], w_i + dp[i-1][w - w_i]) & \\text{otherwise}\n", + "\\end{cases}$" + ] + }, + { + "cell_type": "code", + "execution_count": 137, + "id": "2be5f2f7-d9a5-4cf0-afe4-5a5a54ee0434", + "metadata": { + "nbgrader": { + "grade": false, + "grade_id": "cell-5dc5571a7aca5b0f", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + }, + "tags": [] + }, + "outputs": [], + "source": [ + "def dynamic_knapsack(W, wt):\n", + " ### BEGIN SOLUTION\n", + " n = len(wt)\n", + "\n", + " dp = [[0 for _ in range(W + 1)] for _ in range(n + 1)]\n", + "\n", + " for i in range(n + 1):\n", + " for w in range(W + 1):\n", + " if i == 0 or w == 0:\n", + " dp[i][w] = 0\n", + " elif wt[i - 1] <= w:\n", + " dp[i][w] = max(wt[i - 1] + dp[i - 1][w - wt[i - 1]], dp[i - 1][w])\n", + " else:\n", + " dp[i][w] = dp[i - 1][w]\n", + "\n", + " max_weight = dp[n][W]\n", + "\n", + " selected_items = []\n", + " w = W\n", + " for i in range(n, 0, -1):\n", + " if dp[i][w] != dp[i - 1][w]:\n", + " selected_items.append(wt[i - 1])\n", + " w -= wt[i - 1]\n", + "\n", + " return max_weight, selected_items\n", + " ### 
END SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 138, + "id": "5adf0e0e-73c4-4656-b546-5b02347f1a28", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total weight: 5 and selected weights: [3, 2]\n" + ] + } + ], + "source": [ + "weights = [2, 3, 4, 5]\n", + "max_weight = 5\n", + "result, selected_weights = dynamic_knapsack(max_weight, weights)\n", + "print(\"Total weight:\", result, \"and selected weights:\", selected_weights)" + ] + }, + { + "cell_type": "code", + "execution_count": 139, + "id": "935873d5-14ee-4b79-9e6a-c692d1c73a9a", + "metadata": {}, + "outputs": [], + "source": [ + "assert dynamic_knapsack(max_weight, weights) == (5, [3, 2])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef953a29-7632-4475-8230-54a8a110d19a", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e7dd8b57-0d0d-43d0-b228-b9c994043f81", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 155, + "id": "a066e026-8a09-46ca-a919-53bc90c8a308", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 156, + "id": "80c70549-29c1-49ff-a7de-f47dbce004a9", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[(2, 4), (0, 2)]" + ] + }, + "execution_count": 156, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "greedy_intevals([(0, 2), (2, 4), (1,3)])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "41e37d95-4e71-42f0-81dd-fd6122fc9023", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + 
"mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}