{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f855094f-31d3-4951-971d-cddef0e776b9",
   "metadata": {},
   "source": [
    "# 2 APM and Apriori"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f1201cc-92f3-4f38-bcdd-7591da4851ef",
   "metadata": {},
   "source": [
    "Association pattern mining (APM) is a technique used in data mining to identify frequent patterns or associations among a set of items in large datasets. "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4b20acf7-0b6d-4fca-9eae-8e2f0c9daeed",
   "metadata": {},
   "source": [
    "## 2.1 The metrics used in APM\n",
    "\n",
    "The metrics help to quantify the strength and significance of these discovered relationships in data sets."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b7a5f1a-3d25-4fe3-8dea-24ffe6fb1244",
   "metadata": {},
   "source": [
    "### 2.1.1 Support\n",
    "\n",
    "__Support__ of an itemset is the proportion of transactions in the dataset in which the itemset appears, it indicates how frequently the itemset appears in the dataset. \n",
    "\n",
    "Transactions refer to individual instances or records in a dataset that contain a collection of items. \n",
    "\n",
    "\n",
    "$\\text{Support}(A \\Rightarrow B) = \\frac{\\text{Freq.} (A \\text{ and } B)}{\\text{Total number of transactions}}$\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "ab4512fa-ee4c-476d-9c65-ae6ea5348c9d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "# Define the items and records/transactions\n",
    "\n",
    "data = {'Bread': [0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1],\n",
    "        'Milk':  [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0],\n",
    "        'Cheese':[0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],\n",
    "        'Eggs':  [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],\n",
    "        'Yogurt':[0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1]\n",
    "       }\n",
    "\n",
    "# Define the df records/transactions\n",
    "df = pd.DataFrame(data)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "72c1cfa1-1d88-4465-b5b5-3f9c5c070840",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Bread</th>\n",
       "      <th>Milk</th>\n",
       "      <th>Cheese</th>\n",
       "      <th>Eggs</th>\n",
       "      <th>Yogurt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Bread  Milk  Cheese  Eggs  Yogurt\n",
       "0       0     0       0     0       0\n",
       "1       0     0       0     0       0\n",
       "2       1     0       1     1       1\n",
       "3       0     0       1     1       1\n",
       "4       1     1       1     1       1\n",
       "5       1     0       0     0       1\n",
       "6       1     1       1     1       1\n",
       "7       1     0       0     0       0\n",
       "8       0     0       1     1       0\n",
       "9       0     1       1     1       1\n",
       "10      1     1       0     0       1\n",
       "11      0     0       0     0       0\n",
       "12      1     1       0     0       0\n",
       "13      1     0       0     0       1\n",
       "14      0     0       0     0       0\n",
       "15      1     1       0     0       0\n",
       "16      1     1       1     1       1\n",
       "17      0     0       1     1       1\n",
       "18      0     1       1     1       0\n",
       "19      1     0       1     1       1"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41a94204-ef45-46dc-9000-99eb074677f8",
   "metadata": {},
   "source": [
    "__Example:__ $\\text{Support}(Bread \\Rightarrow Cheese)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "b685c137-9a92-4822-82cc-906fd0b5aedd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Support of Bread => Cheese: 0.25\n"
     ]
    }
   ],
   "source": [
    "# Step 1: Count transactions where Bread and Cheese both appear, i.e., the rule of Bread -> Cheese\n",
    "both_present = df[(df['Bread'] == 1) & (df['Cheese'] == 1)]\n",
    "\n",
    "# Step 2: Count these transactions\n",
    "both_present_count = len(both_present)\n",
    "\n",
    "# Step 3: Calculate\n",
    "total_transactions = len(df)\n",
    "support = both_present_count / total_transactions\n",
    "\n",
    "print(f\"Support of Bread => Cheese: {support:.2f}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3556614a-e0b1-4a49-b0db-05286cb63030",
   "metadata": {},
   "source": [
    "### 2.1.2 Confidence \n",
    "\n",
    "__Confidence__ measures the strength of association between two items in APM. It is determined by dividing the number of transactions that contain both items by the number of transactions that contain the first item alone.\n",
    "\n",
    "$\\text{Confidence}(A \\Rightarrow B) = \\frac{\\text{Freq.}(A \\text{ and } B)}{ \\text{Freq.}(A)}$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f96bef0-fe1d-4798-a824-e4db4d0acb81",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "ae788299-0629-4741-994d-65d8425e82f4",
   "metadata": {},
   "source": [
    "__Example:__ $\\text{Confidence}(Bread \\Rightarrow Cheese)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "6ee3e6ef-9537-4b82-8cf4-6d01ec1cbe73",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Confidence of Bread => Cheese: 0.45\n"
     ]
    }
   ],
   "source": [
    "# Step 1: Count transactions where Bread and Cheese both appear, i.e., the rule of Bread -> Cheese\n",
    "both_present = df[(df['Bread'] == 1) & (df['Cheese'] == 1)]\n",
    "\n",
    "# Step 2: Count these transactions\n",
    "both_present_count = len(both_present)\n",
    "\n",
    "# Step 3: Calculate\n",
    "lhs_present_count = len(df[(df['Bread'] == 1)])\n",
    "confidence = both_present_count / lhs_present_count\n",
    "\n",
    "print(f\"Confidence of Bread => Cheese: {confidence:.2f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c179eb91-8fff-4fa5-ab0f-404ecfd5cf0f",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "32faf6cb-0026-4e4c-a9ff-b3ae69a202ad",
   "metadata": {},
   "source": [
    "### 2.1.3 Completeness\n",
    "\n",
    "__Completeness__ assess the strength of the association rule (A => B) by quantifying how often B (consequence) appears relative to A (Antecedent), providing insights into the reliability and coverage of the rule in capturing the relationship between A and B in the dataset.\n",
    "\n",
    "\n",
    "$\\text{Completeness}(A \\Rightarrow B) = \\frac{\\text{Freq.}(A \\text{ and } B)}{ \\text{Freq.}(B)}$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "74d0765d-ece1-4c46-a6c6-6b21d8a095f0",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "61eae92f-df6f-4e91-a429-f576c8f1b0ca",
   "metadata": {},
   "source": [
    "__Example:__ $\\text{Completeness}(Bread \\Rightarrow Cheese)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "e4323837-dc1c-4405-8b4d-bf4548b57401",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Completeness of Bread => Cheese: 0.50\n"
     ]
    }
   ],
   "source": [
    "# Step 1: Count transactions where Bread and Cheese both appear, i.e., the rule of Bread -> Cheese\n",
    "both_present = df[(df['Bread'] == 1) & (df['Cheese'] == 1)]\n",
    "\n",
    "# Step 2: Count these transactions\n",
    "both_present_count = len(both_present)\n",
    "\n",
    "# Step 3: Calculate\n",
    "rhs_present_count = len(df[(df['Cheese'] == 1)])\n",
    "Completeness = both_present_count / rhs_present_count\n",
    "\n",
    "print(f\"Completeness of Bread => Cheese: {Completeness:.2f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2bd3145-38f9-44f4-872a-8d06ca21c530",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "b88a00bd-659a-47d8-aea0-e2fba5333e29",
   "metadata": {},
   "source": [
    "### 2.1.4 Lift\n",
    "\n",
    "__Lift__ quantifies how much more often items A and B occur together than expected if they were statistically independent.\n",
    "\n",
    "$\\text{Lift}(A \\Rightarrow B) = \\frac{\\text{Support}(A \\text{ and } B)}{  \\text{Support}(A) \\times \\text{Support}(B)   }$\n",
    "\n",
    "\n",
    "- Lift > 1: A and B are positively associated (potentially useful).\n",
    "- Lift = 1: A and B are independent (likely not useful).\n",
    "- Lift < 1: A and B are negatively associated (potentially anti-useful)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62684c5a-1bd2-4011-9e73-b8d970b00132",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "87629b02-61e6-4760-a1db-adeea7119dcf",
   "metadata": {},
   "source": [
    "__Example:__ $\\text{Lift}(Bread \\Rightarrow Cheese)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "043caf29-e024-4904-b4d8-8e100670881c",
   "metadata": {},
   "outputs": [],
   "source": [
    "lhs_rhs_support = support # see the support example: support of (bread and cheese)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "133bf868-62a6-4da0-a90f-86b5a0c86150",
   "metadata": {},
   "outputs": [],
   "source": [
    "lhs_support = lhs_present_count / total_transactions  # support of bread"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "8121d566-7ddf-444c-98ef-05f966cd66ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "rhs_support = rhs_present_count / total_transactions # support of cheese"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "37c17282-7e3d-4d65-b79c-b60673ab6977",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Lift of Bread => Cheese: 0.91\n"
     ]
    }
   ],
   "source": [
    "# Caculation\n",
    "lift = lhs_rhs_support / (lhs_support * rhs_support)\n",
    "print(f\"Lift of Bread => Cheese: {lift:.2f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2c91230-20ee-482c-9291-0c7c2ca2b7eb",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "64e47f17-fab0-403c-a424-552686b70820",
   "metadata": {},
   "source": [
    "### 2.1.5 Conviction \n",
    "\n",
    "__Conviction__ essentially asks the question: how often does A occur without B? A high conviction value indicates that the rule A ⇒ B is unlikely to happen by chance, suggesting a strong association between A and B. Conversely, a low conviction value implies that A and B occurring together might be due to chance, indicating a weaker association.\n",
    "\n",
    "$\\text{Conviction}(A \\Rightarrow B) = \\frac{1 - \\text{Support}(B)}{1 - \\text{Confidence}(A \\Rightarrow B)}$\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "986d4212-97d8-43e1-b850-bc70e00239d6",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "e2595d16-b31a-4d32-b052-dc85c03c3cb6",
   "metadata": {},
   "source": [
    "__Example:__ $\\text{Conviction}(Bread \\Rightarrow Cheese)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "91a35382-3793-4235-bc78-24f9eaf3f8f1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Conviction of Bread => Cheese: 0.92\n"
     ]
    }
   ],
   "source": [
    "conviction = (1 - rhs_support) / (1 - confidence)\n",
    "print(f\"Conviction of Bread => Cheese: {conviction:.2f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4981ad05-53c9-4123-ad72-5124d60034f3",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1b2326d-70a8-4114-9b29-3f02651d2dbb",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "8d850747-dd92-4b0f-867e-e774e8debfa3",
   "metadata": {},
   "source": [
    "## 2.2 Apriori algorithm\n",
    "\n",
    "The Apriori algorithm, introduced in [Fast algorithms for mining association rules](https://www.it.uu.se/edu/course/homepage/infoutv/ht08/vldb94_rj.pdf) by Rakesh Agrawal and Ramakrishnan Srikant in 1994, is a foundational tool in data mining for discovering frequent itemsets and generating association rules. \n",
    "\n",
    "This algorithm works on transaction databases, such as those recording items purchased by customers in a supermarket, with the primary goal of identifying itemsets that frequently appear together.\n",
    "\n",
    "[apriori in mlxtend lib](https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "3d1ca790-b61a-45cf-87b6-149f0283dd3f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# ! pip install mlxtend  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "6f5baa38-6a9b-4f4a-88fa-30a9ca094b01",
   "metadata": {},
   "outputs": [],
   "source": [
    "from mlxtend.preprocessing import TransactionEncoder\n",
    "from mlxtend.frequent_patterns import apriori, association_rules"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "379b8079-a8b2-4dd7-b208-4015a83f4aa7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# read csv as df\n",
    "df_cs = pd.read_csv('cakeshop2.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "7d04b306-eea1-4155-97ce-52b6f22f93e3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>TransactionID</th>\n",
       "      <th>BREAD</th>\n",
       "      <th>MILK</th>\n",
       "      <th>JAM</th>\n",
       "      <th>EGGS</th>\n",
       "      <th>BUTTER</th>\n",
       "      <th>SUGAR</th>\n",
       "      <th>FLOUR</th>\n",
       "      <th>CHOCOLATE</th>\n",
       "      <th>YEAST</th>\n",
       "      <th>CANDLES</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Bread</td>\n",
       "      <td>Milk</td>\n",
       "      <td>Jam</td>\n",
       "      <td>Eggs</td>\n",
       "      <td>Butter</td>\n",
       "      <td>Sugar</td>\n",
       "      <td>Flour</td>\n",
       "      <td>Chocolate</td>\n",
       "      <td>Yeast</td>\n",
       "      <td>Candles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Eggs</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Sugar</td>\n",
       "      <td>Flour</td>\n",
       "      <td>Chocolate</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Eggs</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Sugar</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Chocolate</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Eggs</td>\n",
       "      <td>Butter</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Chocolate</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Candles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Eggs</td>\n",
       "      <td>Butter</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Flour</td>\n",
       "      <td>Chocolate</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>495</th>\n",
       "      <td>496</td>\n",
       "      <td>Bread</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Butter</td>\n",
       "      <td>Sugar</td>\n",
       "      <td>Flour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Candles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>496</th>\n",
       "      <td>497</td>\n",
       "      <td>Bread</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Candles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>497</th>\n",
       "      <td>498</td>\n",
       "      <td>Bread</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Candles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>498</th>\n",
       "      <td>499</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Eggs</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Yeast</td>\n",
       "      <td>Candles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>499</th>\n",
       "      <td>500</td>\n",
       "      <td>Bread</td>\n",
       "      <td>Milk</td>\n",
       "      <td>Jam</td>\n",
       "      <td>Eggs</td>\n",
       "      <td>Butter</td>\n",
       "      <td>Sugar</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Yeast</td>\n",
       "      <td>Candles</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>500 rows × 11 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     TransactionID  BREAD  MILK  JAM  EGGS  BUTTER  SUGAR  FLOUR  CHOCOLATE  \\\n",
       "0                1  Bread  Milk  Jam  Eggs  Butter  Sugar  Flour  Chocolate   \n",
       "1                2    NaN  Milk  NaN  Eggs     NaN  Sugar  Flour  Chocolate   \n",
       "2                3    NaN  Milk  NaN  Eggs     NaN  Sugar    NaN  Chocolate   \n",
       "3                4    NaN  Milk  NaN  Eggs  Butter    NaN    NaN  Chocolate   \n",
       "4                5    NaN  Milk  NaN  Eggs  Butter    NaN  Flour  Chocolate   \n",
       "..             ...    ...   ...  ...   ...     ...    ...    ...        ...   \n",
       "495            496  Bread  Milk  NaN   NaN  Butter  Sugar  Flour        NaN   \n",
       "496            497  Bread  Milk  NaN   NaN     NaN    NaN    NaN        NaN   \n",
       "497            498  Bread  Milk  NaN   NaN     NaN    NaN    NaN        NaN   \n",
       "498            499    NaN  Milk  NaN  Eggs     NaN    NaN    NaN        NaN   \n",
       "499            500  Bread  Milk  Jam  Eggs  Butter  Sugar    NaN        NaN   \n",
       "\n",
       "     YEAST  CANDLES  \n",
       "0    Yeast  Candles  \n",
       "1      NaN      NaN  \n",
       "2      NaN      NaN  \n",
       "3      NaN  Candles  \n",
       "4      NaN      NaN  \n",
       "..     ...      ...  \n",
       "495    NaN  Candles  \n",
       "496    NaN  Candles  \n",
       "497    NaN  Candles  \n",
       "498  Yeast  Candles  \n",
       "499  Yeast  Candles  \n",
       "\n",
       "[500 rows x 11 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_cs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "5fc09292-f67c-41f8-962d-d8ba2134a15b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>BREAD</th>\n",
       "      <th>MILK</th>\n",
       "      <th>JAM</th>\n",
       "      <th>EGGS</th>\n",
       "      <th>BUTTER</th>\n",
       "      <th>SUGAR</th>\n",
       "      <th>FLOUR</th>\n",
       "      <th>CHOCOLATE</th>\n",
       "      <th>YEAST</th>\n",
       "      <th>CANDLES</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Bread</td>\n",
       "      <td>Milk</td>\n",
       "      <td>Jam</td>\n",
       "      <td>Eggs</td>\n",
       "      <td>Butter</td>\n",
       "      <td>Sugar</td>\n",
       "      <td>Flour</td>\n",
       "      <td>Chocolate</td>\n",
       "      <td>Yeast</td>\n",
       "      <td>Candles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Eggs</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Sugar</td>\n",
       "      <td>Flour</td>\n",
       "      <td>Chocolate</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Eggs</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Sugar</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Chocolate</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Eggs</td>\n",
       "      <td>Butter</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Chocolate</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Candles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Eggs</td>\n",
       "      <td>Butter</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Flour</td>\n",
       "      <td>Chocolate</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>495</th>\n",
       "      <td>Bread</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Butter</td>\n",
       "      <td>Sugar</td>\n",
       "      <td>Flour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Candles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>496</th>\n",
       "      <td>Bread</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Candles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>497</th>\n",
       "      <td>Bread</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Candles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>498</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Milk</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Eggs</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Yeast</td>\n",
       "      <td>Candles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>499</th>\n",
       "      <td>Bread</td>\n",
       "      <td>Milk</td>\n",
       "      <td>Jam</td>\n",
       "      <td>Eggs</td>\n",
       "      <td>Butter</td>\n",
       "      <td>Sugar</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Yeast</td>\n",
       "      <td>Candles</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>500 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     BREAD  MILK  JAM  EGGS  BUTTER  SUGAR  FLOUR  CHOCOLATE  YEAST  CANDLES\n",
       "0    Bread  Milk  Jam  Eggs  Butter  Sugar  Flour  Chocolate  Yeast  Candles\n",
       "1      NaN  Milk  NaN  Eggs     NaN  Sugar  Flour  Chocolate    NaN      NaN\n",
       "2      NaN  Milk  NaN  Eggs     NaN  Sugar    NaN  Chocolate    NaN      NaN\n",
       "3      NaN  Milk  NaN  Eggs  Butter    NaN    NaN  Chocolate    NaN  Candles\n",
       "4      NaN  Milk  NaN  Eggs  Butter    NaN  Flour  Chocolate    NaN      NaN\n",
       "..     ...   ...  ...   ...     ...    ...    ...        ...    ...      ...\n",
       "495  Bread  Milk  NaN   NaN  Butter  Sugar  Flour        NaN    NaN  Candles\n",
       "496  Bread  Milk  NaN   NaN     NaN    NaN    NaN        NaN    NaN  Candles\n",
       "497  Bread  Milk  NaN   NaN     NaN    NaN    NaN        NaN    NaN  Candles\n",
       "498    NaN  Milk  NaN  Eggs     NaN    NaN    NaN        NaN  Yeast  Candles\n",
       "499  Bread  Milk  Jam  Eggs  Butter  Sugar    NaN        NaN  Yeast  Candles\n",
       "\n",
       "[500 rows x 10 columns]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# delete TransactionID columns\n",
    "del df_cs['TransactionID']\n",
    "df_cs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "41694218-a61a-4c8b-b58c-8d589df30363",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convert categorical variable into dummy/indicator variables (Converting to ones and zeros): one-hot or hot-one encoding\n",
    "df_cs = pd.get_dummies(df_cs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "f15be594-5ba2-413a-be8b-682e17d33835",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>BREAD_Bread</th>\n",
       "      <th>MILK_Milk</th>\n",
       "      <th>JAM_Jam</th>\n",
       "      <th>EGGS_Eggs</th>\n",
       "      <th>BUTTER_Butter</th>\n",
       "      <th>SUGAR_Sugar</th>\n",
       "      <th>FLOUR_Flour</th>\n",
       "      <th>CHOCOLATE_Chocolate</th>\n",
       "      <th>YEAST_Yeast</th>\n",
       "      <th>CANDLES_Candles</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>495</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>496</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>497</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>498</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>499</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>500 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     BREAD_Bread  MILK_Milk  JAM_Jam  EGGS_Eggs  BUTTER_Butter  SUGAR_Sugar  \\\n",
       "0           True       True     True       True           True         True   \n",
       "1          False       True    False       True          False         True   \n",
       "2          False       True    False       True          False         True   \n",
       "3          False       True    False       True           True        False   \n",
       "4          False       True    False       True           True        False   \n",
       "..           ...        ...      ...        ...            ...          ...   \n",
       "495         True       True    False      False           True         True   \n",
       "496         True       True    False      False          False        False   \n",
       "497         True       True    False      False          False        False   \n",
       "498        False       True    False       True          False        False   \n",
       "499         True       True     True       True           True         True   \n",
       "\n",
       "     FLOUR_Flour  CHOCOLATE_Chocolate  YEAST_Yeast  CANDLES_Candles  \n",
       "0           True                 True         True             True  \n",
       "1           True                 True        False            False  \n",
       "2          False                 True        False            False  \n",
       "3          False                 True        False             True  \n",
       "4           True                 True        False            False  \n",
       "..           ...                  ...          ...              ...  \n",
       "495         True                False        False             True  \n",
       "496        False                False        False             True  \n",
       "497        False                False        False             True  \n",
       "498        False                False         True             True  \n",
       "499        False                False         True             True  \n",
       "\n",
       "[500 rows x 10 columns]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_cs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f85962a0-0fbf-4d17-a880-de583bc4ab89",
   "metadata": {},
   "source": [
    "__Support__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "7e6652c5-3305-4d9c-9173-efb346965dc0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>support</th>\n",
       "      <th>itemsets</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.712</td>\n",
       "      <td>(BREAD_Bread)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.802</td>\n",
       "      <td>(MILK_Milk)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.246</td>\n",
       "      <td>(JAM_Jam)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.336</td>\n",
       "      <td>(EGGS_Eggs)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.392</td>\n",
       "      <td>(BUTTER_Butter)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.592</td>\n",
       "      <td>(BREAD_Bread, MILK_Milk)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.240</td>\n",
       "      <td>(BREAD_Bread, EGGS_Eggs)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.246</td>\n",
       "      <td>(BREAD_Bread, BUTTER_Butter)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.284</td>\n",
       "      <td>(MILK_Milk, EGGS_Eggs)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0.258</td>\n",
       "      <td>(MILK_Milk, BUTTER_Butter)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0.208</td>\n",
       "      <td>(BREAD_Bread, MILK_Milk, EGGS_Eggs)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    support                             itemsets\n",
       "0     0.712                        (BREAD_Bread)\n",
       "1     0.802                          (MILK_Milk)\n",
       "2     0.246                            (JAM_Jam)\n",
       "3     0.336                          (EGGS_Eggs)\n",
       "4     0.392                      (BUTTER_Butter)\n",
       "5     0.592             (BREAD_Bread, MILK_Milk)\n",
       "6     0.240             (BREAD_Bread, EGGS_Eggs)\n",
       "7     0.246         (BREAD_Bread, BUTTER_Butter)\n",
       "8     0.284               (MILK_Milk, EGGS_Eggs)\n",
       "9     0.258           (MILK_Milk, BUTTER_Butter)\n",
       "10    0.208  (BREAD_Bread, MILK_Milk, EGGS_Eggs)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_frequent = apriori(df_cs, min_support=0.2, use_colnames=True)\n",
    "df_frequent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "a8e6852d-7bc7-4698-8d2a-80434ab35680",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_frequent['length'] = df_frequent.itemsets.apply(lambda x : len(x))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "5e340c1c-2229-4459-9119-dd14435031e0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>support</th>\n",
       "      <th>itemsets</th>\n",
       "      <th>length</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.712</td>\n",
       "      <td>(BREAD_Bread)</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.802</td>\n",
       "      <td>(MILK_Milk)</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.246</td>\n",
       "      <td>(JAM_Jam)</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.336</td>\n",
       "      <td>(EGGS_Eggs)</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.392</td>\n",
       "      <td>(BUTTER_Butter)</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.592</td>\n",
       "      <td>(BREAD_Bread, MILK_Milk)</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.240</td>\n",
       "      <td>(BREAD_Bread, EGGS_Eggs)</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.246</td>\n",
       "      <td>(BREAD_Bread, BUTTER_Butter)</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.284</td>\n",
       "      <td>(MILK_Milk, EGGS_Eggs)</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0.258</td>\n",
       "      <td>(MILK_Milk, BUTTER_Butter)</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0.208</td>\n",
       "      <td>(BREAD_Bread, MILK_Milk, EGGS_Eggs)</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    support                             itemsets  length\n",
       "0     0.712                        (BREAD_Bread)       1\n",
       "1     0.802                          (MILK_Milk)       1\n",
       "2     0.246                            (JAM_Jam)       1\n",
       "3     0.336                          (EGGS_Eggs)       1\n",
       "4     0.392                      (BUTTER_Butter)       1\n",
       "5     0.592             (BREAD_Bread, MILK_Milk)       2\n",
       "6     0.240             (BREAD_Bread, EGGS_Eggs)       2\n",
       "7     0.246         (BREAD_Bread, BUTTER_Butter)       2\n",
       "8     0.284               (MILK_Milk, EGGS_Eggs)       2\n",
       "9     0.258           (MILK_Milk, BUTTER_Butter)       2\n",
       "10    0.208  (BREAD_Bread, MILK_Milk, EGGS_Eggs)       3"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_frequent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f48be902-6f01-44ce-8892-e8aa71f682ec",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "47683603-7a36-4555-956a-8619f3498aa7",
   "metadata": {},
   "source": [
    "__Metrics for association_rules__\n",
    "\n",
    "[The returned columns metrics](https://rasbt.github.io/mlxtend/api_subpackages/mlxtend.frequent_patterns/#association_rules)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "7ad257d8-314d-4c49-adf8-73e5845663e4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>antecedents</th>\n",
       "      <th>consequents</th>\n",
       "      <th>antecedent support</th>\n",
       "      <th>consequent support</th>\n",
       "      <th>support</th>\n",
       "      <th>confidence</th>\n",
       "      <th>lift</th>\n",
       "      <th>leverage</th>\n",
       "      <th>conviction</th>\n",
       "      <th>zhangs_metric</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>(BREAD_Bread)</td>\n",
       "      <td>(MILK_Milk)</td>\n",
       "      <td>0.712</td>\n",
       "      <td>0.802</td>\n",
       "      <td>0.592</td>\n",
       "      <td>0.831461</td>\n",
       "      <td>1.036734</td>\n",
       "      <td>0.020976</td>\n",
       "      <td>1.174800</td>\n",
       "      <td>0.123029</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>(MILK_Milk)</td>\n",
       "      <td>(BREAD_Bread)</td>\n",
       "      <td>0.802</td>\n",
       "      <td>0.712</td>\n",
       "      <td>0.592</td>\n",
       "      <td>0.738155</td>\n",
       "      <td>1.036734</td>\n",
       "      <td>0.020976</td>\n",
       "      <td>1.099886</td>\n",
       "      <td>0.178952</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>(EGGS_Eggs)</td>\n",
       "      <td>(BREAD_Bread)</td>\n",
       "      <td>0.336</td>\n",
       "      <td>0.712</td>\n",
       "      <td>0.240</td>\n",
       "      <td>0.714286</td>\n",
       "      <td>1.003210</td>\n",
       "      <td>0.000768</td>\n",
       "      <td>1.008000</td>\n",
       "      <td>0.004819</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>(EGGS_Eggs)</td>\n",
       "      <td>(MILK_Milk)</td>\n",
       "      <td>0.336</td>\n",
       "      <td>0.802</td>\n",
       "      <td>0.284</td>\n",
       "      <td>0.845238</td>\n",
       "      <td>1.053913</td>\n",
       "      <td>0.014528</td>\n",
       "      <td>1.279385</td>\n",
       "      <td>0.077041</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>(BREAD_Bread, EGGS_Eggs)</td>\n",
       "      <td>(MILK_Milk)</td>\n",
       "      <td>0.240</td>\n",
       "      <td>0.802</td>\n",
       "      <td>0.208</td>\n",
       "      <td>0.866667</td>\n",
       "      <td>1.080632</td>\n",
       "      <td>0.015520</td>\n",
       "      <td>1.485000</td>\n",
       "      <td>0.098178</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>(MILK_Milk, EGGS_Eggs)</td>\n",
       "      <td>(BREAD_Bread)</td>\n",
       "      <td>0.284</td>\n",
       "      <td>0.712</td>\n",
       "      <td>0.208</td>\n",
       "      <td>0.732394</td>\n",
       "      <td>1.028644</td>\n",
       "      <td>0.005792</td>\n",
       "      <td>1.076211</td>\n",
       "      <td>0.038891</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                antecedents    consequents  antecedent support  \\\n",
       "0             (BREAD_Bread)    (MILK_Milk)               0.712   \n",
       "1               (MILK_Milk)  (BREAD_Bread)               0.802   \n",
       "2               (EGGS_Eggs)  (BREAD_Bread)               0.336   \n",
       "3               (EGGS_Eggs)    (MILK_Milk)               0.336   \n",
       "4  (BREAD_Bread, EGGS_Eggs)    (MILK_Milk)               0.240   \n",
       "5    (MILK_Milk, EGGS_Eggs)  (BREAD_Bread)               0.284   \n",
       "\n",
       "   consequent support  support  confidence      lift  leverage  conviction  \\\n",
       "0               0.802    0.592    0.831461  1.036734  0.020976    1.174800   \n",
       "1               0.712    0.592    0.738155  1.036734  0.020976    1.099886   \n",
       "2               0.712    0.240    0.714286  1.003210  0.000768    1.008000   \n",
       "3               0.802    0.284    0.845238  1.053913  0.014528    1.279385   \n",
       "4               0.802    0.208    0.866667  1.080632  0.015520    1.485000   \n",
       "5               0.712    0.208    0.732394  1.028644  0.005792    1.076211   \n",
       "\n",
       "   zhangs_metric  \n",
       "0       0.123029  \n",
       "1       0.178952  \n",
       "2       0.004819  \n",
       "3       0.077041  \n",
       "4       0.098178  \n",
       "5       0.038891  "
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# return all the rules with confidence above 0.7\n",
    "association_rules(df_frequent, metric='confidence', min_threshold=0.7)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "ee5fe64e-d001-4cec-9fa4-d42424910298",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>antecedents</th>\n",
       "      <th>consequents</th>\n",
       "      <th>antecedent support</th>\n",
       "      <th>consequent support</th>\n",
       "      <th>support</th>\n",
       "      <th>confidence</th>\n",
       "      <th>lift</th>\n",
       "      <th>leverage</th>\n",
       "      <th>conviction</th>\n",
       "      <th>zhangs_metric</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>(MILK_Milk)</td>\n",
       "      <td>(EGGS_Eggs)</td>\n",
       "      <td>0.802</td>\n",
       "      <td>0.336</td>\n",
       "      <td>0.284</td>\n",
       "      <td>0.354115</td>\n",
       "      <td>1.053913</td>\n",
       "      <td>0.014528</td>\n",
       "      <td>1.028046</td>\n",
       "      <td>0.258358</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>(EGGS_Eggs)</td>\n",
       "      <td>(MILK_Milk)</td>\n",
       "      <td>0.336</td>\n",
       "      <td>0.802</td>\n",
       "      <td>0.284</td>\n",
       "      <td>0.845238</td>\n",
       "      <td>1.053913</td>\n",
       "      <td>0.014528</td>\n",
       "      <td>1.279385</td>\n",
       "      <td>0.077041</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>(BREAD_Bread, EGGS_Eggs)</td>\n",
       "      <td>(MILK_Milk)</td>\n",
       "      <td>0.240</td>\n",
       "      <td>0.802</td>\n",
       "      <td>0.208</td>\n",
       "      <td>0.866667</td>\n",
       "      <td>1.080632</td>\n",
       "      <td>0.015520</td>\n",
       "      <td>1.485000</td>\n",
       "      <td>0.098178</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>(MILK_Milk)</td>\n",
       "      <td>(BREAD_Bread, EGGS_Eggs)</td>\n",
       "      <td>0.802</td>\n",
       "      <td>0.240</td>\n",
       "      <td>0.208</td>\n",
       "      <td>0.259352</td>\n",
       "      <td>1.080632</td>\n",
       "      <td>0.015520</td>\n",
       "      <td>1.026128</td>\n",
       "      <td>0.376845</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                antecedents               consequents  antecedent support  \\\n",
       "0               (MILK_Milk)               (EGGS_Eggs)               0.802   \n",
       "1               (EGGS_Eggs)               (MILK_Milk)               0.336   \n",
       "2  (BREAD_Bread, EGGS_Eggs)               (MILK_Milk)               0.240   \n",
       "3               (MILK_Milk)  (BREAD_Bread, EGGS_Eggs)               0.802   \n",
       "\n",
       "   consequent support  support  confidence      lift  leverage  conviction  \\\n",
       "0               0.336    0.284    0.354115  1.053913  0.014528    1.028046   \n",
       "1               0.802    0.284    0.845238  1.053913  0.014528    1.279385   \n",
       "2               0.802    0.208    0.866667  1.080632  0.015520    1.485000   \n",
       "3               0.240    0.208    0.259352  1.080632  0.015520    1.026128   \n",
       "\n",
       "   zhangs_metric  \n",
       "0       0.258358  \n",
       "1       0.077041  \n",
       "2       0.098178  \n",
       "3       0.376845  "
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# return all the rules with lift above 1.05\n",
    "association_rules(df_frequent, metric='lift', min_threshold=1.05) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2915ac7a-7a38-4155-ad20-a1932768ffea",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "ed9934f3-4de5-455b-a183-a827a5cef4c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# we define a df for rules for further selection\n",
    "df_rules = association_rules(df_frequent, metric='lift', min_threshold=1.05) \n",
    "df_rules['antecedent_len'] = df_rules['antecedents'].apply(lambda x: len(x))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "1c4afc83-f5fb-4260-9ccd-ea60adcbfd72",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>antecedents</th>\n",
       "      <th>consequents</th>\n",
       "      <th>antecedent support</th>\n",
       "      <th>consequent support</th>\n",
       "      <th>support</th>\n",
       "      <th>confidence</th>\n",
       "      <th>lift</th>\n",
       "      <th>leverage</th>\n",
       "      <th>conviction</th>\n",
       "      <th>zhangs_metric</th>\n",
       "      <th>antecedent_len</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>(MILK_Milk)</td>\n",
       "      <td>(EGGS_Eggs)</td>\n",
       "      <td>0.802</td>\n",
       "      <td>0.336</td>\n",
       "      <td>0.284</td>\n",
       "      <td>0.354115</td>\n",
       "      <td>1.053913</td>\n",
       "      <td>0.014528</td>\n",
       "      <td>1.028046</td>\n",
       "      <td>0.258358</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>(EGGS_Eggs)</td>\n",
       "      <td>(MILK_Milk)</td>\n",
       "      <td>0.336</td>\n",
       "      <td>0.802</td>\n",
       "      <td>0.284</td>\n",
       "      <td>0.845238</td>\n",
       "      <td>1.053913</td>\n",
       "      <td>0.014528</td>\n",
       "      <td>1.279385</td>\n",
       "      <td>0.077041</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>(BREAD_Bread, EGGS_Eggs)</td>\n",
       "      <td>(MILK_Milk)</td>\n",
       "      <td>0.240</td>\n",
       "      <td>0.802</td>\n",
       "      <td>0.208</td>\n",
       "      <td>0.866667</td>\n",
       "      <td>1.080632</td>\n",
       "      <td>0.015520</td>\n",
       "      <td>1.485000</td>\n",
       "      <td>0.098178</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>(MILK_Milk)</td>\n",
       "      <td>(BREAD_Bread, EGGS_Eggs)</td>\n",
       "      <td>0.802</td>\n",
       "      <td>0.240</td>\n",
       "      <td>0.208</td>\n",
       "      <td>0.259352</td>\n",
       "      <td>1.080632</td>\n",
       "      <td>0.015520</td>\n",
       "      <td>1.026128</td>\n",
       "      <td>0.376845</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                antecedents               consequents  antecedent support  \\\n",
       "0               (MILK_Milk)               (EGGS_Eggs)               0.802   \n",
       "1               (EGGS_Eggs)               (MILK_Milk)               0.336   \n",
       "2  (BREAD_Bread, EGGS_Eggs)               (MILK_Milk)               0.240   \n",
       "3               (MILK_Milk)  (BREAD_Bread, EGGS_Eggs)               0.802   \n",
       "\n",
       "   consequent support  support  confidence      lift  leverage  conviction  \\\n",
       "0               0.336    0.284    0.354115  1.053913  0.014528    1.028046   \n",
       "1               0.802    0.284    0.845238  1.053913  0.014528    1.279385   \n",
       "2               0.802    0.208    0.866667  1.080632  0.015520    1.485000   \n",
       "3               0.240    0.208    0.259352  1.080632  0.015520    1.026128   \n",
       "\n",
       "   zhangs_metric  antecedent_len  \n",
       "0       0.258358               1  \n",
       "1       0.077041               1  \n",
       "2       0.098178               2  \n",
       "3       0.376845               1  "
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_rules"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "d92fc331-ba33-4ca6-a888-ba10233f4084",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>antecedents</th>\n",
       "      <th>consequents</th>\n",
       "      <th>antecedent support</th>\n",
       "      <th>consequent support</th>\n",
       "      <th>support</th>\n",
       "      <th>confidence</th>\n",
       "      <th>lift</th>\n",
       "      <th>leverage</th>\n",
       "      <th>conviction</th>\n",
       "      <th>zhangs_metric</th>\n",
       "      <th>antecedent_len</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>(BREAD_Bread, EGGS_Eggs)</td>\n",
       "      <td>(MILK_Milk)</td>\n",
       "      <td>0.24</td>\n",
       "      <td>0.802</td>\n",
       "      <td>0.208</td>\n",
       "      <td>0.866667</td>\n",
       "      <td>1.080632</td>\n",
       "      <td>0.01552</td>\n",
       "      <td>1.485</td>\n",
       "      <td>0.098178</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                antecedents  consequents  antecedent support  \\\n",
       "2  (BREAD_Bread, EGGS_Eggs)  (MILK_Milk)                0.24   \n",
       "\n",
       "   consequent support  support  confidence      lift  leverage  conviction  \\\n",
       "2               0.802    0.208    0.866667  1.080632   0.01552       1.485   \n",
       "\n",
       "   zhangs_metric  antecedent_len  \n",
       "2       0.098178               2  "
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# useful length\n",
    "df_rules.query('antecedent_len >= 2')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9535fa3f-ad57-4157-9997-c538244d9a41",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "8e4c7e1d-d8f6-4eee-9cd5-821fdbc71caf",
   "metadata": {},
   "source": [
    "## 2.3 APM in road accident data\n",
    "\n",
    "In this section, we will use APM in road accident ('acciden_data_v1.0.0_2023.db) for example.\n",
    "\n",
    "Schema info can be found in 'dft-road-casualty-statistics-road-safety-open-dataset-data-guide-2023-1.xlsx'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bdf02564-2af2-4897-86d1-9a808b345248",
   "metadata": {},
   "source": [
    "### 2.3.1 Database infos "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "b7763e57-9f59-4620-9b08-a362aba55c38",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sqlite3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "bc6197c3-829d-497a-b3e4-010a58cc2fef",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Connect to accident database \n",
    "conn = sqlite3.connect('accident_data_v1.0.0_2023.db')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "984675b5-332a-4568-a491-35b66969eca4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# create cursor handling SQL \n",
    "cur = conn.cursor()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "1bfb860f-6fcb-48e0-ad1d-8d9b9b318e76",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[]\n"
     ]
    }
   ],
   "source": [
    "# output all the tables in db\n",
    "res_tables = cur.execute(\"SELECT name FROM sqlite_master WHERE type='table'\") \n",
    "table_names = res_tables.fetchall()\n",
    "print(table_names)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "8e5065c5-74b0-4347-8647-1aac52524f26",
   "metadata": {},
   "outputs": [
    {
     "ename": "IndexError",
     "evalue": "list index out of range",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mIndexError\u001b[0m                                Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[41], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mtable_names\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m]\u001b[49m[\u001b[38;5;241m0\u001b[39m]\n",
      "\u001b[0;31mIndexError\u001b[0m: list index out of range"
     ]
    }
   ],
   "source": [
    "table_names[1][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "298b7072-68e1-4ea7-bcbd-50fe09257893",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check the foreign keys to understand the relationships across tables in database\n",
    "# it is clear that accident_index column is used to link tables in db\n",
    "for tn in table_names:\n",
    "    res_fk = cur.execute(f\"PRAGMA foreign_key_list({tn[0]})\")\n",
    "    fk_info = res_fk.fetchall()\n",
    "    if len(fk_info) < 1:\n",
    "        print(f'No foreign keys found in Table: {tn[0]}')\n",
    "    \n",
    "    else:\n",
    "        print(f'Table {tn[0]} fkeys info --',  \n",
    "             'id:', fk_info[0][0], '--',\n",
    "             'seq:', fk_info[0][1],'--',\n",
    "             'table:', fk_info[0][2],'--',\n",
    "             'from:', fk_info[0][3],'--',\n",
    "             'to:', fk_info[0][4],'--',\n",
    "             'on_update:', fk_info[0][5],'--',\n",
    "             'on_delete:', fk_info[0][6],'--',\n",
    "             'match:', fk_info[0][7]) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e613e52e-d04a-47bc-973f-c5d43363eb9d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "3b40525b-c128-4ac8-bbe3-30dafc0ff3eb",
   "metadata": {},
   "source": [
    "### 2.3.2 APM in accident table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "c733b5ef-6db5-4f15-8ace-43cb3fca8764",
   "metadata": {},
   "outputs": [],
   "source": [
    "# using pandas to read table accident\n",
    "df_ac = pd.read_sql(\"SELECT * FROM accident;\", conn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "c7b92268-b601-426c-8d84-ec9d824ee5a8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>accident_index</th>\n",
       "      <th>accident_year</th>\n",
       "      <th>accident_reference</th>\n",
       "      <th>location_easting_osgr</th>\n",
       "      <th>location_northing_osgr</th>\n",
       "      <th>longitude</th>\n",
       "      <th>latitude</th>\n",
       "      <th>police_force</th>\n",
       "      <th>accident_severity</th>\n",
       "      <th>number_of_vehicles</th>\n",
       "      <th>...</th>\n",
       "      <th>pedestrian_crossing_physical_facilities</th>\n",
       "      <th>light_conditions</th>\n",
       "      <th>weather_conditions</th>\n",
       "      <th>road_surface_conditions</th>\n",
       "      <th>special_conditions_at_site</th>\n",
       "      <th>carriageway_hazards</th>\n",
       "      <th>urban_or_rural_area</th>\n",
       "      <th>did_police_officer_attend_scene_of_accident</th>\n",
       "      <th>trunk_road_flag</th>\n",
       "      <th>lsoa_of_accident_location</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2017010001708</td>\n",
       "      <td>2017</td>\n",
       "      <td>010001708</td>\n",
       "      <td>532920.0</td>\n",
       "      <td>196330.0</td>\n",
       "      <td>-0.080107</td>\n",
       "      <td>51.650061</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>E01001450</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2017010009342</td>\n",
       "      <td>2017</td>\n",
       "      <td>010009342</td>\n",
       "      <td>526790.0</td>\n",
       "      <td>181970.0</td>\n",
       "      <td>-0.173845</td>\n",
       "      <td>51.522425</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>E01004702</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2017010009344</td>\n",
       "      <td>2017</td>\n",
       "      <td>010009344</td>\n",
       "      <td>535200.0</td>\n",
       "      <td>181260.0</td>\n",
       "      <td>-0.052969</td>\n",
       "      <td>51.514096</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>E01004298</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2017010009348</td>\n",
       "      <td>2017</td>\n",
       "      <td>010009348</td>\n",
       "      <td>534340.0</td>\n",
       "      <td>193560.0</td>\n",
       "      <td>-0.060658</td>\n",
       "      <td>51.624832</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>E01001429</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2017010009350</td>\n",
       "      <td>2017</td>\n",
       "      <td>010009350</td>\n",
       "      <td>533680.0</td>\n",
       "      <td>187820.0</td>\n",
       "      <td>-0.072372</td>\n",
       "      <td>51.573408</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>E01001808</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461347</th>\n",
       "      <td>2020991027064</td>\n",
       "      <td>2020</td>\n",
       "      <td>991027064</td>\n",
       "      <td>343034.0</td>\n",
       "      <td>731654.0</td>\n",
       "      <td>-2.926320</td>\n",
       "      <td>56.473539</td>\n",
       "      <td>99</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>-1</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461348</th>\n",
       "      <td>2020991029573</td>\n",
       "      <td>2020</td>\n",
       "      <td>991029573</td>\n",
       "      <td>257963.0</td>\n",
       "      <td>658891.0</td>\n",
       "      <td>-4.267565</td>\n",
       "      <td>55.802353</td>\n",
       "      <td>99</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>-1</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461349</th>\n",
       "      <td>2020991030297</td>\n",
       "      <td>2020</td>\n",
       "      <td>991030297</td>\n",
       "      <td>383664.0</td>\n",
       "      <td>810646.0</td>\n",
       "      <td>-2.271903</td>\n",
       "      <td>57.186317</td>\n",
       "      <td>99</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>-1</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461350</th>\n",
       "      <td>2020991030900</td>\n",
       "      <td>2020</td>\n",
       "      <td>991030900</td>\n",
       "      <td>277161.0</td>\n",
       "      <td>674852.0</td>\n",
       "      <td>-3.968753</td>\n",
       "      <td>55.950940</td>\n",
       "      <td>99</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>-1</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461351</th>\n",
       "      <td>2020991032575</td>\n",
       "      <td>2020</td>\n",
       "      <td>991032575</td>\n",
       "      <td>240402.0</td>\n",
       "      <td>681950.0</td>\n",
       "      <td>-4.561040</td>\n",
       "      <td>56.003843</td>\n",
       "      <td>99</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>-1</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>461352 rows × 36 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       accident_index  accident_year accident_reference  \\\n",
       "0       2017010001708           2017          010001708   \n",
       "1       2017010009342           2017          010009342   \n",
       "2       2017010009344           2017          010009344   \n",
       "3       2017010009348           2017          010009348   \n",
       "4       2017010009350           2017          010009350   \n",
       "...               ...            ...                ...   \n",
       "461347  2020991027064           2020          991027064   \n",
       "461348  2020991029573           2020          991029573   \n",
       "461349  2020991030297           2020          991030297   \n",
       "461350  2020991030900           2020          991030900   \n",
       "461351  2020991032575           2020          991032575   \n",
       "\n",
       "        location_easting_osgr  location_northing_osgr  longitude   latitude  \\\n",
       "0                    532920.0                196330.0  -0.080107  51.650061   \n",
       "1                    526790.0                181970.0  -0.173845  51.522425   \n",
       "2                    535200.0                181260.0  -0.052969  51.514096   \n",
       "3                    534340.0                193560.0  -0.060658  51.624832   \n",
       "4                    533680.0                187820.0  -0.072372  51.573408   \n",
       "...                       ...                     ...        ...        ...   \n",
       "461347               343034.0                731654.0  -2.926320  56.473539   \n",
       "461348               257963.0                658891.0  -4.267565  55.802353   \n",
       "461349               383664.0                810646.0  -2.271903  57.186317   \n",
       "461350               277161.0                674852.0  -3.968753  55.950940   \n",
       "461351               240402.0                681950.0  -4.561040  56.003843   \n",
       "\n",
       "        police_force  accident_severity  number_of_vehicles  ...  \\\n",
       "0                  1                  1                   2  ...   \n",
       "1                  1                  3                   2  ...   \n",
       "2                  1                  3                   3  ...   \n",
       "3                  1                  3                   2  ...   \n",
       "4                  1                  2                   1  ...   \n",
       "...              ...                ...                 ...  ...   \n",
       "461347            99                  2                   2  ...   \n",
       "461348            99                  3                   1  ...   \n",
       "461349            99                  2                   2  ...   \n",
       "461350            99                  3                   2  ...   \n",
       "461351            99                  3                   1  ...   \n",
       "\n",
       "        pedestrian_crossing_physical_facilities light_conditions  \\\n",
       "0                                             0                4   \n",
       "1                                             0                4   \n",
       "2                                             0                4   \n",
       "3                                             4                4   \n",
       "4                                             5                4   \n",
       "...                                         ...              ...   \n",
       "461347                                        0                1   \n",
       "461348                                        0                1   \n",
       "461349                                        0                1   \n",
       "461350                                        0                1   \n",
       "461351                                        0                1   \n",
       "\n",
       "        weather_conditions road_surface_conditions  \\\n",
       "0                        1                       1   \n",
       "1                        1                       2   \n",
       "2                        1                       1   \n",
       "3                        2                       2   \n",
       "4                        1                       2   \n",
       "...                    ...                     ...   \n",
       "461347                   1                       1   \n",
       "461348                   1                       1   \n",
       "461349                   1                       1   \n",
       "461350                   1                       1   \n",
       "461351                   1                       1   \n",
       "\n",
       "        special_conditions_at_site carriageway_hazards urban_or_rural_area  \\\n",
       "0                                0                   0                   1   \n",
       "1                                0                   0                   1   \n",
       "2                                0                   0                   1   \n",
       "3                                0                   0                   1   \n",
       "4                                0                   0                   1   \n",
       "...                            ...                 ...                 ...   \n",
       "461347                           0                   0                   1   \n",
       "461348                           0                   0                   1   \n",
       "461349                           0                   0                   2   \n",
       "461350                           0                   0                   1   \n",
       "461351                           0                   2                   1   \n",
       "\n",
       "        did_police_officer_attend_scene_of_accident  trunk_road_flag  \\\n",
       "0                                                 1                2   \n",
       "1                                                 1                2   \n",
       "2                                                 1                2   \n",
       "3                                                 1                2   \n",
       "4                                                 1                2   \n",
       "...                                             ...              ...   \n",
       "461347                                            1               -1   \n",
       "461348                                            2               -1   \n",
       "461349                                            1               -1   \n",
       "461350                                            2               -1   \n",
       "461351                                            1               -1   \n",
       "\n",
       "        lsoa_of_accident_location  \n",
       "0                       E01001450  \n",
       "1                       E01004702  \n",
       "2                       E01004298  \n",
       "3                       E01001429  \n",
       "4                       E01001808  \n",
       "...                           ...  \n",
       "461347                         -1  \n",
       "461348                         -1  \n",
       "461349                         -1  \n",
       "461350                         -1  \n",
       "461351                         -1  \n",
       "\n",
       "[461352 rows x 36 columns]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# This is a accident-level dataset which means each row representing one accident idetified by 'addicent_index'\n",
    "df_ac"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12a09e16-0263-491a-a763-0fbbf932ed7d",
   "metadata": {},
   "source": [
    "**You can use pandas .describe() or info() to output the basic info in a df**\n",
    "\n",
    "**You can also use some libs called [ydata-profiling](https://docs.profiling.ydata.ai/latest/), [sweetviz](https://pypi.org/project/sweetviz/) and [dtale](https://pypi.org/project/dtale/) to implement the initial EDA**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "3526b049-6582-4476-80fe-42014c7c71ab",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>accident_year</th>\n",
       "      <th>location_easting_osgr</th>\n",
       "      <th>location_northing_osgr</th>\n",
       "      <th>longitude</th>\n",
       "      <th>latitude</th>\n",
       "      <th>police_force</th>\n",
       "      <th>accident_severity</th>\n",
       "      <th>number_of_vehicles</th>\n",
       "      <th>number_of_casualties</th>\n",
       "      <th>day_of_week</th>\n",
       "      <th>...</th>\n",
       "      <th>pedestrian_crossing_human_control</th>\n",
       "      <th>pedestrian_crossing_physical_facilities</th>\n",
       "      <th>light_conditions</th>\n",
       "      <th>weather_conditions</th>\n",
       "      <th>road_surface_conditions</th>\n",
       "      <th>special_conditions_at_site</th>\n",
       "      <th>carriageway_hazards</th>\n",
       "      <th>urban_or_rural_area</th>\n",
       "      <th>did_police_officer_attend_scene_of_accident</th>\n",
       "      <th>trunk_road_flag</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>461352.000000</td>\n",
       "      <td>461236.000000</td>\n",
       "      <td>4.612360e+05</td>\n",
       "      <td>461226.000000</td>\n",
       "      <td>461226.000000</td>\n",
       "      <td>461352.000000</td>\n",
       "      <td>461352.000000</td>\n",
       "      <td>461352.000000</td>\n",
       "      <td>461352.000000</td>\n",
       "      <td>461352.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>461352.000000</td>\n",
       "      <td>461352.000000</td>\n",
       "      <td>461352.000000</td>\n",
       "      <td>461352.000000</td>\n",
       "      <td>461352.000000</td>\n",
       "      <td>461352.000000</td>\n",
       "      <td>461352.000000</td>\n",
       "      <td>461352.000000</td>\n",
       "      <td>461352.000000</td>\n",
       "      <td>461352.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>2018.368378</td>\n",
       "      <td>452593.115756</td>\n",
       "      <td>2.801803e+05</td>\n",
       "      <td>-1.246998</td>\n",
       "      <td>52.408849</td>\n",
       "      <td>28.193388</td>\n",
       "      <td>2.784436</td>\n",
       "      <td>1.840441</td>\n",
       "      <td>1.301245</td>\n",
       "      <td>4.108171</td>\n",
       "      <td>...</td>\n",
       "      <td>0.262112</td>\n",
       "      <td>1.099518</td>\n",
       "      <td>2.036235</td>\n",
       "      <td>1.649918</td>\n",
       "      <td>1.385239</td>\n",
       "      <td>0.217519</td>\n",
       "      <td>0.170518</td>\n",
       "      <td>1.325463</td>\n",
       "      <td>1.334877</td>\n",
       "      <td>1.680147</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>1.091566</td>\n",
       "      <td>94822.718705</td>\n",
       "      <td>1.515501e+05</td>\n",
       "      <td>1.389702</td>\n",
       "      <td>1.365101</td>\n",
       "      <td>24.801609</td>\n",
       "      <td>0.443648</td>\n",
       "      <td>0.709869</td>\n",
       "      <td>0.746398</td>\n",
       "      <td>1.927216</td>\n",
       "      <td>...</td>\n",
       "      <td>1.460680</td>\n",
       "      <td>2.333113</td>\n",
       "      <td>1.724691</td>\n",
       "      <td>1.819426</td>\n",
       "      <td>0.955180</td>\n",
       "      <td>1.240981</td>\n",
       "      <td>1.121964</td>\n",
       "      <td>0.469057</td>\n",
       "      <td>0.556352</td>\n",
       "      <td>0.851554</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>2017.000000</td>\n",
       "      <td>64084.000000</td>\n",
       "      <td>1.023500e+04</td>\n",
       "      <td>-7.525273</td>\n",
       "      <td>49.912362</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.000000</td>\n",
       "      <td>-1.000000</td>\n",
       "      <td>-1.000000</td>\n",
       "      <td>-1.000000</td>\n",
       "      <td>-1.000000</td>\n",
       "      <td>-1.000000</td>\n",
       "      <td>-1.000000</td>\n",
       "      <td>-1.000000</td>\n",
       "      <td>-1.000000</td>\n",
       "      <td>-1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>2017.000000</td>\n",
       "      <td>388539.000000</td>\n",
       "      <td>1.755300e+05</td>\n",
       "      <td>-2.172668</td>\n",
       "      <td>51.465689</td>\n",
       "      <td>5.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>2018.000000</td>\n",
       "      <td>459198.500000</td>\n",
       "      <td>2.208650e+05</td>\n",
       "      <td>-1.126264</td>\n",
       "      <td>51.870257</td>\n",
       "      <td>22.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>2019.000000</td>\n",
       "      <td>529290.000000</td>\n",
       "      <td>3.865510e+05</td>\n",
       "      <td>-0.136389</td>\n",
       "      <td>53.372899</td>\n",
       "      <td>45.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>2020.000000</td>\n",
       "      <td>655391.000000</td>\n",
       "      <td>1.209512e+06</td>\n",
       "      <td>1.759641</td>\n",
       "      <td>60.763722</td>\n",
       "      <td>99.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>24.000000</td>\n",
       "      <td>59.000000</td>\n",
       "      <td>7.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>7.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>8 rows × 29 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       accident_year  location_easting_osgr  location_northing_osgr  \\\n",
       "count  461352.000000          461236.000000            4.612360e+05   \n",
       "mean     2018.368378          452593.115756            2.801803e+05   \n",
       "std         1.091566           94822.718705            1.515501e+05   \n",
       "min      2017.000000           64084.000000            1.023500e+04   \n",
       "25%      2017.000000          388539.000000            1.755300e+05   \n",
       "50%      2018.000000          459198.500000            2.208650e+05   \n",
       "75%      2019.000000          529290.000000            3.865510e+05   \n",
       "max      2020.000000          655391.000000            1.209512e+06   \n",
       "\n",
       "           longitude       latitude   police_force  accident_severity  \\\n",
       "count  461226.000000  461226.000000  461352.000000      461352.000000   \n",
       "mean       -1.246998      52.408849      28.193388           2.784436   \n",
       "std         1.389702       1.365101      24.801609           0.443648   \n",
       "min        -7.525273      49.912362       1.000000           1.000000   \n",
       "25%        -2.172668      51.465689       5.000000           3.000000   \n",
       "50%        -1.126264      51.870257      22.000000           3.000000   \n",
       "75%        -0.136389      53.372899      45.000000           3.000000   \n",
       "max         1.759641      60.763722      99.000000           3.000000   \n",
       "\n",
       "       number_of_vehicles  number_of_casualties    day_of_week  ...  \\\n",
       "count       461352.000000         461352.000000  461352.000000  ...   \n",
       "mean             1.840441              1.301245       4.108171  ...   \n",
       "std              0.709869              0.746398       1.927216  ...   \n",
       "min              1.000000              1.000000       1.000000  ...   \n",
       "25%              1.000000              1.000000       2.000000  ...   \n",
       "50%              2.000000              1.000000       4.000000  ...   \n",
       "75%              2.000000              1.000000       6.000000  ...   \n",
       "max             24.000000             59.000000       7.000000  ...   \n",
       "\n",
       "       pedestrian_crossing_human_control  \\\n",
       "count                      461352.000000   \n",
       "mean                            0.262112   \n",
       "std                             1.460680   \n",
       "min                            -1.000000   \n",
       "25%                             0.000000   \n",
       "50%                             0.000000   \n",
       "75%                             0.000000   \n",
       "max                             9.000000   \n",
       "\n",
       "       pedestrian_crossing_physical_facilities  light_conditions  \\\n",
       "count                            461352.000000     461352.000000   \n",
       "mean                                  1.099518          2.036235   \n",
       "std                                   2.333113          1.724691   \n",
       "min                                  -1.000000         -1.000000   \n",
       "25%                                   0.000000          1.000000   \n",
       "50%                                   0.000000          1.000000   \n",
       "75%                                   0.000000          4.000000   \n",
       "max                                   9.000000          7.000000   \n",
       "\n",
       "       weather_conditions  road_surface_conditions  \\\n",
       "count       461352.000000            461352.000000   \n",
       "mean             1.649918                 1.385239   \n",
       "std              1.819426                 0.955180   \n",
       "min             -1.000000                -1.000000   \n",
       "25%              1.000000                 1.000000   \n",
       "50%              1.000000                 1.000000   \n",
       "75%              1.000000                 2.000000   \n",
       "max              9.000000                 9.000000   \n",
       "\n",
       "       special_conditions_at_site  carriageway_hazards  urban_or_rural_area  \\\n",
       "count               461352.000000        461352.000000        461352.000000   \n",
       "mean                     0.217519             0.170518             1.325463   \n",
       "std                      1.240981             1.121964             0.469057   \n",
       "min                     -1.000000            -1.000000            -1.000000   \n",
       "25%                      0.000000             0.000000             1.000000   \n",
       "50%                      0.000000             0.000000             1.000000   \n",
       "75%                      0.000000             0.000000             2.000000   \n",
       "max                      9.000000             9.000000             3.000000   \n",
       "\n",
       "       did_police_officer_attend_scene_of_accident  trunk_road_flag  \n",
       "count                                461352.000000    461352.000000  \n",
       "mean                                      1.334877         1.680147  \n",
       "std                                       0.556352         0.851554  \n",
       "min                                      -1.000000        -1.000000  \n",
       "25%                                       1.000000         2.000000  \n",
       "50%                                       1.000000         2.000000  \n",
       "75%                                       2.000000         2.000000  \n",
       "max                                       3.000000         2.000000  \n",
       "\n",
       "[8 rows x 29 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_ac.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "bc0b2993-3204-42bb-aea6-ff102ff75d41",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 461352 entries, 0 to 461351\n",
      "Data columns (total 36 columns):\n",
      " #   Column                                       Non-Null Count   Dtype  \n",
      "---  ------                                       --------------   -----  \n",
      " 0   accident_index                               461352 non-null  object \n",
      " 1   accident_year                                461352 non-null  int64  \n",
      " 2   accident_reference                           461352 non-null  object \n",
      " 3   location_easting_osgr                        461236 non-null  float64\n",
      " 4   location_northing_osgr                       461236 non-null  float64\n",
      " 5   longitude                                    461226 non-null  float64\n",
      " 6   latitude                                     461226 non-null  float64\n",
      " 7   police_force                                 461352 non-null  int64  \n",
      " 8   accident_severity                            461352 non-null  int64  \n",
      " 9   number_of_vehicles                           461352 non-null  int64  \n",
      " 10  number_of_casualties                         461352 non-null  int64  \n",
      " 11  date                                         461352 non-null  object \n",
      " 12  day_of_week                                  461352 non-null  int64  \n",
      " 13  time                                         461352 non-null  object \n",
      " 14  local_authority_district                     461352 non-null  int64  \n",
      " 15  local_authority_ons_district                 461352 non-null  object \n",
      " 16  local_authority_highway                      461352 non-null  object \n",
      " 17  first_road_class                             461352 non-null  int64  \n",
      " 18  first_road_number                            461352 non-null  int64  \n",
      " 19  road_type                                    461352 non-null  int64  \n",
      " 20  speed_limit                                  461352 non-null  int64  \n",
      " 21  junction_detail                              461352 non-null  int64  \n",
      " 22  junction_control                             461352 non-null  int64  \n",
      " 23  second_road_class                            461352 non-null  int64  \n",
      " 24  second_road_number                           461352 non-null  int64  \n",
      " 25  pedestrian_crossing_human_control            461352 non-null  int64  \n",
      " 26  pedestrian_crossing_physical_facilities      461352 non-null  int64  \n",
      " 27  light_conditions                             461352 non-null  int64  \n",
      " 28  weather_conditions                           461352 non-null  int64  \n",
      " 29  road_surface_conditions                      461352 non-null  int64  \n",
      " 30  special_conditions_at_site                   461352 non-null  int64  \n",
      " 31  carriageway_hazards                          461352 non-null  int64  \n",
      " 32  urban_or_rural_area                          461352 non-null  int64  \n",
      " 33  did_police_officer_attend_scene_of_accident  461352 non-null  int64  \n",
      " 34  trunk_road_flag                              461352 non-null  int64  \n",
      " 35  lsoa_of_accident_location                    461352 non-null  object \n",
      "dtypes: float64(4), int64(25), object(7)\n",
      "memory usage: 126.7+ MB\n"
     ]
    }
   ],
   "source": [
    "df_ac.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37619a37-ef79-4da5-b652-107200e15967",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "7a49fe92-996d-4fc2-89e3-ac2b5596dd3e",
   "metadata": {},
   "source": [
    "**Select three columns for analysis**\n",
    "\n",
    "accident_severity, speed_limit, weather_conditions (rules can be if speed_limit, weather_conditions then accident_severity)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "8709c83e-108e-4014-aa5c-1b66ce878716",
   "metadata": {},
   "outputs": [],
   "source": [
    "# select three columns\n",
    "df_ac_s = df_ac[['speed_limit', 'weather_conditions', 'accident_severity']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "ae325315-dccc-4f1d-a295-d070cdf35369",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>speed_limit</th>\n",
       "      <th>weather_conditions</th>\n",
       "      <th>accident_severity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>30</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>30</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>30</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>30</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>20</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461347</th>\n",
       "      <td>30</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461348</th>\n",
       "      <td>30</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461349</th>\n",
       "      <td>60</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461350</th>\n",
       "      <td>30</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461351</th>\n",
       "      <td>30</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>461352 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        speed_limit  weather_conditions  accident_severity\n",
       "0                30                   1                  1\n",
       "1                30                   1                  3\n",
       "2                30                   1                  3\n",
       "3                30                   2                  3\n",
       "4                20                   1                  2\n",
       "...             ...                 ...                ...\n",
       "461347           30                   1                  2\n",
       "461348           30                   1                  3\n",
       "461349           60                   1                  2\n",
       "461350           30                   1                  3\n",
       "461351           30                   1                  3\n",
       "\n",
       "[461352 rows x 3 columns]"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_ac_s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5aee63e9-fe7a-43db-8902-8504dd84c9ff",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "e9c840e6-1a4c-4db7-b87f-4a0bf6868031",
   "metadata": {},
   "outputs": [],
   "source": [
    "# One-hot encoding transfer df colomuns \n",
    "\n",
    "df_list = []\n",
    "for c in ['speed_limit', 'weather_conditions', 'accident_severity']:\n",
    "    df_c = pd.get_dummies(df_ac_s[c], prefix=c)\n",
    "    df_list.append(df_c)\n",
    "\n",
    "# axis=1 indicates that the dataframes in df_list are being concatenated horizontally, i.e., by columns. \n",
    "# This means that the resulting dataframe df_oh will have columns from all the dataframes in df_list combined side by side, with the indices aligned.\n",
    "df_oh = pd.concat(df_list, axis=1) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "d5a8f464-7e52-4bca-8338-c78078fd6242",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>speed_limit_-1</th>\n",
       "      <th>speed_limit_20</th>\n",
       "      <th>speed_limit_30</th>\n",
       "      <th>speed_limit_40</th>\n",
       "      <th>speed_limit_50</th>\n",
       "      <th>speed_limit_60</th>\n",
       "      <th>speed_limit_70</th>\n",
       "      <th>weather_conditions_-1</th>\n",
       "      <th>weather_conditions_1</th>\n",
       "      <th>weather_conditions_2</th>\n",
       "      <th>weather_conditions_3</th>\n",
       "      <th>weather_conditions_4</th>\n",
       "      <th>weather_conditions_5</th>\n",
       "      <th>weather_conditions_6</th>\n",
       "      <th>weather_conditions_7</th>\n",
       "      <th>weather_conditions_8</th>\n",
       "      <th>weather_conditions_9</th>\n",
       "      <th>accident_severity_1</th>\n",
       "      <th>accident_severity_2</th>\n",
       "      <th>accident_severity_3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461347</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461348</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461349</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461350</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>461351</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>461352 rows × 20 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        speed_limit_-1  speed_limit_20  speed_limit_30  speed_limit_40  \\\n",
       "0                False           False            True           False   \n",
       "1                False           False            True           False   \n",
       "2                False           False            True           False   \n",
       "3                False           False            True           False   \n",
       "4                False            True           False           False   \n",
       "...                ...             ...             ...             ...   \n",
       "461347           False           False            True           False   \n",
       "461348           False           False            True           False   \n",
       "461349           False           False           False           False   \n",
       "461350           False           False            True           False   \n",
       "461351           False           False            True           False   \n",
       "\n",
       "        speed_limit_50  speed_limit_60  speed_limit_70  weather_conditions_-1  \\\n",
       "0                False           False           False                  False   \n",
       "1                False           False           False                  False   \n",
       "2                False           False           False                  False   \n",
       "3                False           False           False                  False   \n",
       "4                False           False           False                  False   \n",
       "...                ...             ...             ...                    ...   \n",
       "461347           False           False           False                  False   \n",
       "461348           False           False           False                  False   \n",
       "461349           False            True           False                  False   \n",
       "461350           False           False           False                  False   \n",
       "461351           False           False           False                  False   \n",
       "\n",
       "        weather_conditions_1  weather_conditions_2  weather_conditions_3  \\\n",
       "0                       True                 False                 False   \n",
       "1                       True                 False                 False   \n",
       "2                       True                 False                 False   \n",
       "3                      False                  True                 False   \n",
       "4                       True                 False                 False   \n",
       "...                      ...                   ...                   ...   \n",
       "461347                  True                 False                 False   \n",
       "461348                  True                 False                 False   \n",
       "461349                  True                 False                 False   \n",
       "461350                  True                 False                 False   \n",
       "461351                  True                 False                 False   \n",
       "\n",
       "        weather_conditions_4  weather_conditions_5  weather_conditions_6  \\\n",
       "0                      False                 False                 False   \n",
       "1                      False                 False                 False   \n",
       "2                      False                 False                 False   \n",
       "3                      False                 False                 False   \n",
       "4                      False                 False                 False   \n",
       "...                      ...                   ...                   ...   \n",
       "461347                 False                 False                 False   \n",
       "461348                 False                 False                 False   \n",
       "461349                 False                 False                 False   \n",
       "461350                 False                 False                 False   \n",
       "461351                 False                 False                 False   \n",
       "\n",
       "        weather_conditions_7  weather_conditions_8  weather_conditions_9  \\\n",
       "0                      False                 False                 False   \n",
       "1                      False                 False                 False   \n",
       "2                      False                 False                 False   \n",
       "3                      False                 False                 False   \n",
       "4                      False                 False                 False   \n",
       "...                      ...                   ...                   ...   \n",
       "461347                 False                 False                 False   \n",
       "461348                 False                 False                 False   \n",
       "461349                 False                 False                 False   \n",
       "461350                 False                 False                 False   \n",
       "461351                 False                 False                 False   \n",
       "\n",
       "        accident_severity_1  accident_severity_2  accident_severity_3  \n",
       "0                      True                False                False  \n",
       "1                     False                False                 True  \n",
       "2                     False                False                 True  \n",
       "3                     False                False                 True  \n",
       "4                     False                 True                False  \n",
       "...                     ...                  ...                  ...  \n",
       "461347                False                 True                False  \n",
       "461348                False                False                 True  \n",
       "461349                False                 True                False  \n",
       "461350                False                False                 True  \n",
       "461351                False                False                 True  \n",
       "\n",
       "[461352 rows x 20 columns]"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_oh"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "8d424564-8dc1-43e7-b349-c2fcee1d746b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>support</th>\n",
       "      <th>itemsets</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.595235</td>\n",
       "      <td>(speed_limit_30)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.795436</td>\n",
       "      <td>(weather_conditions_1)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.798299</td>\n",
       "      <td>(accident_severity_3)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.476458</td>\n",
       "      <td>(weather_conditions_1, speed_limit_30)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.486100</td>\n",
       "      <td>(accident_severity_3, speed_limit_30)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.631260</td>\n",
       "      <td>(accident_severity_3, weather_conditions_1)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.387476</td>\n",
       "      <td>(accident_severity_3, weather_conditions_1, sp...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    support                                           itemsets\n",
       "0  0.595235                                   (speed_limit_30)\n",
       "1  0.795436                             (weather_conditions_1)\n",
       "2  0.798299                              (accident_severity_3)\n",
       "3  0.476458             (weather_conditions_1, speed_limit_30)\n",
       "4  0.486100              (accident_severity_3, speed_limit_30)\n",
       "5  0.631260        (accident_severity_3, weather_conditions_1)\n",
       "6  0.387476  (accident_severity_3, weather_conditions_1, sp..."
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# create frequent sets for rules\n",
    "df_fre = apriori(df_oh, min_support=0.2, use_colnames=True)\n",
    "df_fre"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2dfc0168-6c0a-490a-bfec-a03f4dfd4256",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "id": "bcfc3e98-b87d-4972-a8c8-a44a16687a13",
   "metadata": {},
   "outputs": [],
   "source": [
    "# select rules with confidence >0.7\n",
    "df_rule = association_rules(df_fre, metric='confidence', min_threshold=0.7) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "id": "3d69227d-901f-47c9-b6cd-df00a1db1ab7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>antecedents</th>\n",
       "      <th>consequents</th>\n",
       "      <th>antecedent support</th>\n",
       "      <th>consequent support</th>\n",
       "      <th>support</th>\n",
       "      <th>confidence</th>\n",
       "      <th>lift</th>\n",
       "      <th>leverage</th>\n",
       "      <th>conviction</th>\n",
       "      <th>zhangs_metric</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>(speed_limit_30)</td>\n",
       "      <td>(weather_conditions_1)</td>\n",
       "      <td>0.595235</td>\n",
       "      <td>0.795436</td>\n",
       "      <td>0.476458</td>\n",
       "      <td>0.800454</td>\n",
       "      <td>1.006308</td>\n",
       "      <td>0.002987</td>\n",
       "      <td>1.025146</td>\n",
       "      <td>0.015487</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>(speed_limit_30)</td>\n",
       "      <td>(accident_severity_3)</td>\n",
       "      <td>0.595235</td>\n",
       "      <td>0.798299</td>\n",
       "      <td>0.486100</td>\n",
       "      <td>0.816651</td>\n",
       "      <td>1.022989</td>\n",
       "      <td>0.010924</td>\n",
       "      <td>1.100092</td>\n",
       "      <td>0.055518</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>(accident_severity_3, speed_limit_30)</td>\n",
       "      <td>(weather_conditions_1)</td>\n",
       "      <td>0.486100</td>\n",
       "      <td>0.795436</td>\n",
       "      <td>0.387476</td>\n",
       "      <td>0.797113</td>\n",
       "      <td>1.002109</td>\n",
       "      <td>0.000815</td>\n",
       "      <td>1.008267</td>\n",
       "      <td>0.004094</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>(weather_conditions_1, speed_limit_30)</td>\n",
       "      <td>(accident_severity_3)</td>\n",
       "      <td>0.476458</td>\n",
       "      <td>0.798299</td>\n",
       "      <td>0.387476</td>\n",
       "      <td>0.813243</td>\n",
       "      <td>1.018719</td>\n",
       "      <td>0.007120</td>\n",
       "      <td>1.080016</td>\n",
       "      <td>0.035098</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                              antecedents             consequents  \\\n",
       "0                        (speed_limit_30)  (weather_conditions_1)   \n",
       "1                        (speed_limit_30)   (accident_severity_3)   \n",
       "4   (accident_severity_3, speed_limit_30)  (weather_conditions_1)   \n",
       "5  (weather_conditions_1, speed_limit_30)   (accident_severity_3)   \n",
       "\n",
       "   antecedent support  consequent support   support  confidence      lift  \\\n",
       "0            0.595235            0.795436  0.476458    0.800454  1.006308   \n",
       "1            0.595235            0.798299  0.486100    0.816651  1.022989   \n",
       "4            0.486100            0.795436  0.387476    0.797113  1.002109   \n",
       "5            0.476458            0.798299  0.387476    0.813243  1.018719   \n",
       "\n",
       "   leverage  conviction  zhangs_metric  \n",
       "0  0.002987    1.025146       0.015487  \n",
       "1  0.010924    1.100092       0.055518  \n",
       "4  0.000815    1.008267       0.004094  \n",
       "5  0.007120    1.080016       0.035098  "
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# select rules with lift >1\n",
    "df_rule.query('lift > 1')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f18728a0-e9e6-4517-a28f-7359a6266b06",
   "metadata": {},
   "source": [
    "As 'weather_conditions_1' represents 'Fine no high winds' and 'accident_severity_3' represents 'Slight', so the useful rule can be generated as ' Weather with fine no high winds' and 'the road with speed limit is 30' $\\Rightarrow$ 'Accident severity is slight'"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}