{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "TSG036 - Controller logs\n",
    "========================\n",
    "\n",
    "Get the last \u2018n\u2019 hours of controller logs.\n",
    "\n",
    "Steps\n",
    "-----\n",
    "\n",
    "### Parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "since_hours = 2\n",
    "since_seconds = since_hours * 3600  # 3600 seconds in an hour\n",
    "\n",
    "coalesce_duplicates = True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Instantiate Kubernetes client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide_input"
    ]
   },
   "outputs": [],
   "source": [
    "# Instantiate the Python Kubernetes client into 'api' variable\n",
    "\n",
    "import os\n",
    "\n",
    "try:\n",
    "    from kubernetes import client, config\n",
    "    from kubernetes.stream import stream\n",
    "\n",
    "    if \"KUBERNETES_SERVICE_PORT\" in os.environ and \"KUBERNETES_SERVICE_HOST\" in os.environ:\n",
    "        config.load_incluster_config()\n",
    "    else:\n",
    "        config.load_kube_config()\n",
    "\n",
    "    api = client.CoreV1Api()\n",
    "\n",
    "    print('Kubernetes client instantiated')\n",
    "except ImportError:\n",
    "    from IPython.display import Markdown, display\n",
    "    display(Markdown(f'HINT: Use [SOP059 - Install Kubernetes Python module](../install/sop059-install-kubernetes-module.ipynb) to resolve this issue.'))\n",
    "    raise"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get the namespace for the big data cluster\n",
    "\n",
    "Get the namespace of the big data cluster from the Kubernetes API.\n",
    "\n",
    "NOTE: If there is more than one big data cluster in the target\n",
    "Kubernetes cluster, change the index \\[0\\] to select the correct big data\n",
    "cluster."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide_input"
    ]
   },
   "outputs": [],
   "source": [
    "# Place Kubernetes namespace name for BDC into 'namespace' variable\n",
    "\n",
    "try:\n",
    "    namespace = api.list_namespace(label_selector='MSSQL_CLUSTER').items[0].metadata.name\n",
    "except IndexError:\n",
    "    from IPython.display import Markdown, display\n",
    "    display(Markdown(f'HINT: Use [TSG081 - Get namespaces (Kubernetes)](../monitor-k8s/tsg081-get-kubernetes-namespaces.ipynb) to resolve this issue.'))\n",
    "    display(Markdown(f'HINT: Use [TSG010 - Get configuration contexts](../monitor-k8s/tsg010-get-kubernetes-contexts.ipynb) to resolve this issue.'))\n",
    "    display(Markdown(f'HINT: Use [SOP011 - Set kubernetes configuration context](../common/sop011-set-kubernetes-context.ipynb) to resolve this issue.'))\n",
    "    raise\n",
    "\n",
    "print('The kubernetes namespace for your big data cluster is: ' + namespace)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get controller logs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "container = \"controller\"\n",
    "\n",
    "pod_list = api.list_namespaced_pod(namespace, label_selector=\"app=controller\")\n",
    "\n",
    "entries_for_analysis = []\n",
    "\n",
    "def is_warning_or_error(line):\n",
    "    # The log level field (e.g. \"| ERROR |\") starts at index 25, after the timestamp\n",
    "    return line[25:34] == \"| ERROR |\" or line[25:33] == \"| WARN |\"\n",
    "\n",
    "for pod in pod_list.items:\n",
    "    print(f\"Logs for controller pod: {pod.metadata.name}\")\n",
    "    try:\n",
    "        logs = api.read_namespaced_pod_log(pod.metadata.name, namespace, container=container, since_seconds=since_seconds)\n",
    "    except Exception as err:\n",
    "        print(f\"ERROR: {err}\")\n",
    "    else:\n",
    "        if coalesce_duplicates:\n",
    "            previous_line = \"\"\n",
    "            duplicates = 1\n",
    "            for line in logs.split('\\n'):\n",
    "                # Compare lines ignoring the leading timestamp, so repeated messages are coalesced\n",
    "                if line[27:] != previous_line[27:]:\n",
    "                    if duplicates != 1:\n",
    "                        print(f\"\\t{previous_line} (x{duplicates})\")\n",
    "                    print(f\"\\t{line}\")\n",
    "                    duplicates = 1\n",
    "                else:\n",
    "                    duplicates = duplicates + 1\n",
    "                    continue\n",
    "\n",
    "                if is_warning_or_error(line):\n",
    "                    entries_for_analysis.append(line)\n",
    "\n",
    "                previous_line = line\n",
    "        else:\n",
    "            print(logs)\n",
    "            # Collect warnings and errors for analysis even when duplicates are not coalesced\n",
    "            entries_for_analysis.extend(line for line in logs.split('\\n') if is_warning_or_error(line))\n",
    "\n",
    "print(f\"There were {len(entries_for_analysis)} warnings and errors found.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Analyze log entries and suggest relevant Troubleshooting Guides"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide_input"
    ]
   },
   "outputs": [],
   "source": [
    "# Analyze log entries and suggest further relevant troubleshooting guides\n",
    "\n",
    "from IPython.display import Markdown, display\n",
    "\n",
    "import os\n",
    "import json\n",
    "import requests\n",
    "import ipykernel\n",
    "import datetime\n",
    "\n",
    "from urllib.parse import urljoin\n",
    "from notebook import notebookapp\n",
    "\n",
    "def get_notebook_name():\n",
    "    \"\"\"\n",
    "    Return the full path of the jupyter notebook. Some runtimes (e.g. Azure Data\n",
    "    Studio) have the kernel_id in the filename of the connection file. If so, the\n",
    "    notebook name at runtime can be determined using `list_running_servers`.\n",
    "    Other runtimes (e.g. azdata) do not have the kernel_id in the filename of\n",
    "    the connection file, therefore we are unable to establish the filename.\n",
    "    \"\"\"\n",
    "    connection_file = os.path.basename(ipykernel.get_connection_file())\n",
    "\n",
    "    # If the runtime has the kernel_id in the connection filename, use it to\n",
    "    # look up the real notebook name at runtime; otherwise, return None.\n",
    "    try:\n",
    "        kernel_id = connection_file.split('-', 1)[1].split('.')[0]\n",
    "    except IndexError:\n",
    "        pass\n",
    "    else:\n",
    "        for servers in list(notebookapp.list_running_servers()):\n",
    "            try:\n",
    "                response = requests.get(urljoin(servers['url'], 'api/sessions'), params={'token': servers.get('token', '')}, timeout=.01)\n",
    "            except Exception:\n",
    "                pass\n",
    "            else:\n",
    "                for nn in json.loads(response.text):\n",
    "                    if nn['kernel']['id'] == kernel_id:\n",
    "                        return nn['path']\n",
    "\n",
    "def load_json(filename):\n",
    "    with open(filename, encoding=\"utf8\") as json_file:\n",
    "        return json.load(json_file)\n",
    "\n",
    "def get_notebook_rules():\n",
    "    \"\"\"\n",
    "    Load the notebook rules from the metadata of this notebook (in the .ipynb file)\n",
    "    \"\"\"\n",
    "    file_name = get_notebook_name()\n",
    "\n",
    "    if file_name is None:\n",
    "        return None\n",
    "    else:\n",
    "        j = load_json(file_name)\n",
    "\n",
    "        if \"azdata\" not in j[\"metadata\"] or \\\n",
    "            \"expert\" not in j[\"metadata\"][\"azdata\"] or \\\n",
    "            \"log_analyzer_rules\" not in j[\"metadata\"][\"azdata\"][\"expert\"]:\n",
    "            return []\n",
    "        else:\n",
    "            return j[\"metadata\"][\"azdata\"][\"expert\"][\"log_analyzer_rules\"]\n",
    "\n",
    "rules = get_notebook_rules()\n",
    "\n",
    "if rules is None:\n",
    "    print(\"\")\n",
    "    print(\"Log analysis is only available when run in Azure Data Studio. It is not available when run in azdata.\")\n",
    "else:\n",
    "    hints = 0\n",
    "    if len(rules) > 0:\n",
    "        for entry in entries_for_analysis:\n",
    "            for rule in rules:\n",
    "                if entry.find(rule[0]) != -1:\n",
    "                    print(entry)\n",
    "\n",
    "                    display(Markdown(f'HINT: Use [{rule[2]}]({rule[3]}) to resolve this issue.'))\n",
    "                    hints = hints + 1\n",
    "\n",
    "    print(\"\")\n",
    "    print(f\"{len(entries_for_analysis)} log entries analyzed (using {len(rules)} rules). {hints} further troubleshooting hints made inline.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Notebook execution complete.')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Related\n",
    "-------\n",
    "\n",
    "- [TSG027 - Observe cluster\n",
    "  deployment](../diagnose/tsg027-observe-bdc-create.ipynb)"
   ]
  }
 ],
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3"
  },
  "azdata": {
   "side_effects": false,
   "expert": {
    "log_analyzer_rules": [
     [
      "doc is missing key: /data",
      "TSG038",
      "TSG038 - BDC create failures due to - doc is missing key",
      "../repair/tsg038-doc-is-missing-key-error.ipynb"
     ],
     [
      "Failed when starting controller service. System.TimeoutException:\nOperation timed out after 10 minutes",
      "TSG057",
      "TSG057 - Failed when starting controller service. System.TimeoutException",
      "../repair/tsg057-failed-when-starting-controller.ipynb"
     ]
    ]
   }
  }
 }
}