{"id":146,"date":"2022-05-31T21:48:44","date_gmt":"2022-05-31T21:48:44","guid":{"rendered":"https:\/\/kindsonthegenius.com\/apache-spark\/?p=146"},"modified":"2022-06-01T17:03:53","modified_gmt":"2022-06-01T17:03:53","slug":"spark-installation-and-setup","status":"publish","type":"post","link":"https:\/\/www.kindsonthegenius.com\/apache-spark\/spark-installation-and-setup\/","title":{"rendered":"Spark &#8211; Installation and Setup"},"content":{"rendered":"<p>In this tutorial, you will learn how to install Apache Spark on Mac and Windows.<\/p>\n<p>We will take the following steps:<\/p>\n<ol>\n<li><a href=\"#t1\">Install Homebrew<\/a><\/li>\n<li><a href=\"#t2\">Install Java<\/a><\/li>\n<li><a href=\"#t3\">Install Scala<\/a><\/li>\n<li><a href=\"#t4\">Install Apache Spark<\/a><\/li>\n<li><a href=\"#t5\">Verify Apache Spark Installation<\/a><\/li>\n<li><a href=\"#t6\">Install Spark on Windows<\/a><\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<p>We&#8217;ll first go through the Spark installation on macOS, then the Windows installation.<\/p>\n<p>&nbsp;<\/p>\n<h4><strong id=\"t1\">1. Install Homebrew<\/strong><\/h4>\n<p>Homebrew is a package manager for macOS that you use to install packages. Install it using the command:<\/p>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #333333;\">\/<\/span>bin<span style=\"color: #333333;\">\/<\/span>bash <span style=\"color: #333333;\">-<\/span>c <span style=\"background-color: #fff0f0;\">\"$(curl -fsSL https:\/\/raw.githubusercontent.com\/Homebrew\/install\/master\/install.sh)\"<\/span>\r\n<\/pre>\n<p>This command installs the Xcode Command Line Tools and Homebrew.<\/p>\n<p>&nbsp;<\/p>\n<h4><strong id=\"t2\">2. Install Java<\/strong><\/h4>\n<p>Spark uses Java tools to perform various operations. 
Therefore, you need to have Java installed on your system.<\/p>\n<p>Install Java using the command:<\/p>\n<pre style=\"margin: 0; line-height: 125%;\">brew install openjdk\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p>Check the Java installation using the command:<\/p>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #333333;\">java --version<\/span><\/pre>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<h4><strong id=\"t3\">3. Install Scala<\/strong><\/h4>\n<p>Spark is written in Scala, so we need Scala installed as well.<\/p>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #333333;\">brew install scala<\/span>\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p>You can find <a href=\"https:\/\/www.kindsonthegenius.com\/scala\/\" target=\"_blank\" rel=\"noopener\">Scala tutorials here<\/a>.<\/p>\n<h4><strong id=\"t4\">4. Install Apache Spark<\/strong><\/h4>\n<p>Now install Spark using this command:<\/p>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #333333;\">brew install apache-spark<\/span><\/pre>\n<p>&nbsp;<\/p>\n<p>This command installs Apache Spark. You can then launch the Spark shell using the command below:<\/p>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #333333;\">spark-shell<\/span><\/pre>\n<p>&nbsp;<\/p>\n<h4><strong id=\"t5\">5. 
Verify Apache Spark Installation<\/strong><\/h4>\n<p>Let&#8217;s now validate the Spark installation by running a few commands in the Spark shell that create a Spark DataFrame:<\/p>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008800; font-weight: bold;\">import<\/span> <span style=\"color: #0e84b5; font-weight: bold;\">spark.implicits._<\/span>\r\n<span style=\"color: #008800; font-weight: bold;\">val<\/span> data <span style=\"color: #008800; font-weight: bold;\">=<\/span> <span style=\"color: #bb0066; font-weight: bold;\">Seq<\/span><span style=\"color: #333333;\">((<\/span><span style=\"background-color: #fff0f0;\">\"Java\"<\/span><span style=\"color: #333333;\">,<\/span> <span style=\"background-color: #fff0f0;\">\"Programmin\"<\/span><span style=\"color: #333333;\">,<\/span> <span style=\"background-color: #fff0f0;\">\"60800\"<\/span><span style=\"color: #333333;\">),<\/span> <span style=\"color: #333333;\">(<\/span><span style=\"background-color: #fff0f0;\">\"Python\"<\/span><span style=\"color: #333333;\">,<\/span> <span style=\"background-color: #fff0f0;\">\"Analysis\"<\/span><span style=\"color: #333333;\">,<\/span> <span style=\"background-color: #fff0f0;\">\"150000\"<\/span><span style=\"color: #333333;\">),<\/span> <span style=\"color: #333333;\">(<\/span><span style=\"background-color: #fff0f0;\">\"Scala\"<\/span><span style=\"color: #333333;\">,<\/span> <span style=\"background-color: #fff0f0;\">\"Coding\"<\/span><span style=\"color: #333333;\">,<\/span> <span style=\"background-color: #fff0f0;\">\"3500\"<\/span><span style=\"color: #333333;\">))<\/span>\r\n<span style=\"color: #008800; font-weight: bold;\">val<\/span> df <span style=\"color: #008800; font-weight: bold;\">=<\/span> data<span style=\"color: #333333;\">.<\/span>toDF<span style=\"color: #333333;\">()<\/span> \r\ndf<span style=\"color: #333333;\">.<\/span>show<span style=\"color: #333333;\">()<\/span>\r\n<\/pre>\n<p>Note: You can type the code one line at a time.<\/p>\n<p>The output 
screen is as shown below:<\/p>\n<figure id=\"attachment_147\" aria-describedby=\"caption-attachment-147\" style=\"width: 1024px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-147 size-large\" src=\"https:\/\/kindsonthegenius.com\/apache-spark\/wp-content\/uploads\/sites\/13\/2022\/05\/Screenshot-2022-05-31-at-22.13.29-e1654028167147-1024x390.png\" alt=\"Creating a DataFrame in Spark\" width=\"1024\" height=\"390\" srcset=\"https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-content\/uploads\/sites\/13\/2022\/05\/Screenshot-2022-05-31-at-22.13.29-e1654028167147-1024x390.png 1024w, https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-content\/uploads\/sites\/13\/2022\/05\/Screenshot-2022-05-31-at-22.13.29-e1654028167147-300x114.png 300w, https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-content\/uploads\/sites\/13\/2022\/05\/Screenshot-2022-05-31-at-22.13.29-e1654028167147-768x292.png 768w, https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-content\/uploads\/sites\/13\/2022\/05\/Screenshot-2022-05-31-at-22.13.29-e1654028167147.png 1127w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption id=\"caption-attachment-147\" class=\"wp-caption-text\">Creating a DataFrame in Spark<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<h4><strong id=\"t6\">6. 
Spark Installation on Windows<\/strong><\/h4>\n<p>Follow the steps below to install Apache Spark on Windows.<\/p>\n<p><strong>Step 1<\/strong> &#8211; Navigate to the Apache Spark download page using this link &#8211; <a href=\"https:\/\/spark.apache.org\/downloads.html\" target=\"_blank\" rel=\"noopener\">https:\/\/spark.apache.org\/downloads.html<\/a><\/p>\n<p><strong>Step 2<\/strong> &#8211; Download the tgz file<\/p>\n<p><strong>Step 3<\/strong> &#8211; Extract the file into a local directory<\/p>\n<p><strong>Step 4<\/strong> &#8211; Create a directory on your C drive and copy the contents of the Spark folder into it<\/p>\n<p><strong>Step 5<\/strong> &#8211; Set the JAVA_HOME, SPARK_HOME, HADOOP_HOME and PATH environment variables. Use the values below:<\/p>\n<pre style=\"margin: 0; line-height: 125%;\">JAVA_HOME = C:\\Program Files\\Java\\jdk18.0.1.1\r\nPATH = %PATH%;%JAVA_HOME%\\bin\r\n\r\nSPARK_HOME  = C:\\apps\\opt\\spark-3.0.0-bin-hadoop2.7\r\nHADOOP_HOME = C:\\apps\\opt\\spark-3.0.0-bin-hadoop2.7\r\nPATH = %PATH%;%SPARK_HOME%\\bin\r\n<\/pre>\n<p><strong>Note<\/strong>: The directory paths should exactly match the locations on your local drive.<\/p>\n<p><strong>Step 6<\/strong> &#8211; Download <strong>winutils.exe<\/strong> for Hadoop and copy it to the bin folder in the Spark path. Get it from here: <a href=\"https:\/\/github.com\/cdarlint\/winutils\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/cdarlint\/winutils<\/a><\/p>\n<p><strong>Step 7<\/strong> &#8211; Open Command Prompt and navigate into the bin directory of the Spark installation.<\/p>\n<p><strong>Step 8<\/strong> &#8211; Start spark-shell just like we did on Mac<\/p>\n<p>&nbsp;<\/p>\n<h4><strong>7. Access the Spark Web UI<\/strong><\/h4>\n<p>Apache Spark comes with a suite of web user interfaces (UIs) to help you monitor your Spark application. 
While spark-shell is running, the Spark Web UI can be accessed via the link: <strong>http:\/\/localhost:4040\/<\/strong><\/p>\n<p>This is shown below:<\/p>\n<figure id=\"attachment_153\" aria-describedby=\"caption-attachment-153\" style=\"width: 1024px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.kindsonthegenius.com\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-153 size-large\" src=\"https:\/\/kindsonthegenius.com\/apache-spark\/wp-content\/uploads\/sites\/13\/2022\/05\/Screenshot-2022-06-01-at-18.59.09-1024x620.png\" alt=\"Spark Web UI\" width=\"1024\" height=\"620\" srcset=\"https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-content\/uploads\/sites\/13\/2022\/05\/Screenshot-2022-06-01-at-18.59.09-1024x620.png 1024w, https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-content\/uploads\/sites\/13\/2022\/05\/Screenshot-2022-06-01-at-18.59.09-300x182.png 300w, https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-content\/uploads\/sites\/13\/2022\/05\/Screenshot-2022-06-01-at-18.59.09-768x465.png 768w, https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-content\/uploads\/sites\/13\/2022\/05\/Screenshot-2022-06-01-at-18.59.09.png 1156w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption id=\"caption-attachment-153\" class=\"wp-caption-text\">Spark Web UI<\/figcaption><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>In this tutorial, you will learn how to install Apache Spark on Mac and Windows. 
We will take the following steps: Install Homebrew Install Java &hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,3],"tags":[32,33,35,34],"class_list":["post-146","post","type-post","status-publish","format-standard","hentry","category-pyspark","category-spark","tag-hadoop","tag-java","tag-scala","tag-winutils"],"_links":{"self":[{"href":"https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-json\/wp\/v2\/posts\/146","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-json\/wp\/v2\/comments?post=146"}],"version-history":[{"count":5,"href":"https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-json\/wp\/v2\/posts\/146\/revisions"}],"predecessor-version":[{"id":155,"href":"https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-json\/wp\/v2\/posts\/146\/revisions\/155"}],"wp:attachment":[{"href":"https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-json\/wp\/v2\/media?parent=146"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-json\/wp\/v2\/categories?post=146"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kindsonthegenius.com\/apache-spark\/wp-json\/wp\/v2\/tags?post=146"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}