{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "TSG100 - The Big Data Cluster troubleshooter\n", "============================================\n", "\n", "Description\n", "-----------\n", "\n", "Follow these steps to troubleshoot Big Data Cluster (BDC) issues that\n", "are not covered by the more specific troubleshooters in this chapter.\n", "\n", "Steps\n", "-----\n", "\n", "### Get the versions of `azdata`, the BDC and Kubernetes cluster\n", "\n", "Often the first question asked is \u201cwhat version are you using\u201d. There\n", "are versions of several things that are useful to know:\n", "\n", "- [SOP007 - Version information (azdata, bdc,\n", " kubernetes)](../common/sop007-get-key-version-information.ipynb)\n", "\n", "### Verify `azdata login` works\n", "\n", "The most fundamental operation that needs to work is `login`. All the\n", "other troubleshooters in this chapter depend on `azdata login` working.\n", "Run this SOP to verify `azdata login` works. It also analyzes any\n", "error output and suggests follow-on TSGs as appropriate to help\n", "resolve any issues.\n", "\n", "- [SOP028 - azdata login](../common/sop028-azdata-login.ipynb)\n", "\n", "### Verify the cluster health monitor is reporting \u2018Healthy\u2019\n", "\n", "- [TSG078 - Is cluster\n", " healthy](../diagnose/tsg078-is-cluster-healthy.ipynb)\n", "\n", "### Verify that all the pods are \u2018Running\u2019\n", "\n", "Verify the pods for the `kube-system` namespace and the big data cluster\n", "namespace are all in the \u201cRunning\u201d state, and all the Kubernetes nodes\n", "are \u201cReady\u201d.\n", "\n", "- [TSG006 - Get system pod\n", " status](../monitor-k8s/tsg006-view-system-pod-status.ipynb)\n", "- [TSG007 - Get BDC pod\n", " status](../monitor-k8s/tsg007-view-bdc-pod-status.ipynb)\n", "- [TSG009 - Get nodes\n", " (Kubernetes)](../monitor-k8s/tsg009-get-nodes.ipynb)\n", "\n", "### Verify there are no crash dumps in the cluster\n", "\n", "The Big Data 
Cluster should run without any process crashing. Run this\n", "TSG to analyze the entire cluster to verify that no crash dumps have\n", "been created.\n", "\n", "- [TSG029 - Find dumps in the\n", " cluster](../diagnose/tsg029-find-dumps-in-the-cluster.ipynb)\n", "\n", "### Next steps\n", "\n", "This troubleshooter has helped verify the cluster itself is responding\n", "to logins. Use the troubleshooters linked below to drill down into\n", "specific functionality in the cluster that may not be working correctly.\n", "\n", "Related\n", "-------\n", "\n", "- [TSG101 - SQL Server\n", " troubleshooter](../troubleshooters/tsg101-troubleshoot-sql-server.ipynb)\n", "\n", "- [TSG102 - HDFS\n", " troubleshooter](../troubleshooters/tsg102-troubleshoot-hdfs.ipynb)\n", "\n", "- [TSG103 - Spark\n", " troubleshooter](../troubleshooters/tsg103-troubleshoot-spark.ipynb)\n", "\n", "- [TSG104 - Control\n", " troubleshooter](../troubleshooters/tsg104-troubleshoot-control.ipynb)\n", "\n", "- [TSG105 - Gateway\n", " troubleshooter](../troubleshooters/tsg105-troubleshoot-gateway.ipynb)\n", "\n", "- [TSG106 - App\n", " troubleshooter](../troubleshooters/tsg106-troubleshoot-app.ipynb)" ] } ], "nbformat": 4, "nbformat_minor": 5, "metadata": { "kernelspec": { "name": "python3", "display_name": "Python 3" }, "azdata": { "side_effects": false } } }