<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>AZKABAN on Coinidea's Blog</title><link>https://blog.coinidea.com/en/tags/azkaban/</link><description>Recent content in AZKABAN on Coinidea's Blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Sat, 15 Jun 2019 00:47:43 +0000</lastBuildDate><atom:link href="https://blog.coinidea.com/en/tags/azkaban/index.xml" rel="self" type="application/rss+xml"/><item><title>AZKABAN - Open Source Task Scheduling System</title><link>https://blog.coinidea.com/en/p/azkaban-open-source-task-scheduling-system/</link><pubDate>Sat, 15 Jun 2019 00:47:43 +0000</pubDate><guid>https://blog.coinidea.com/en/p/azkaban-open-source-task-scheduling-system/</guid><description>&lt;p&gt;Azkaban is an open-source task scheduling system used for scheduling and running tasks (such as data warehouse scheduling), serving as a replacement for crontab in Linux. Official website: &lt;a class="link" href="https://azkaban.github.io/" target="_blank" rel="noopener"
&gt;https://azkaban.github.io/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Azkaban mainly consists of three components:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;MySQL: Azkaban uses MySQL to store projects and execution information.&lt;/li&gt;
&lt;li&gt;Azkaban Web Server: Azkaban uses Jetty as the web server, serving as the controller and providing the web interface.&lt;/li&gt;
&lt;li&gt;Azkaban Executor Server: The Azkaban executor server executes submitted workflows.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This article primarily uses Azkaban 3.43. If you need help generating compiled tar packages from GitHub source code, feel free to leave a comment, and I can share the installation package files that I have already tested.&lt;br&gt;
The installation package files mainly include:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;azkaban-db-3.43.0.tar.gz&lt;/li&gt;
&lt;li&gt;azkaban-solo-server-3.43.0.tar.gz&lt;/li&gt;
&lt;li&gt;azkaban-exec-server-3.43.0.tar.gz&lt;/li&gt;
&lt;li&gt;azkaban-web-server-3.43.0.tar.gz&lt;/li&gt;
&lt;li&gt;azkaban-hadoop-security-plugin-3.43.0.tar.gz&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Among these, solo is the standalone version. I set up a version with multiple executor nodes and a single web server node.&lt;/p&gt;
&lt;p&gt;Note: This article assumes the Azkaban directory is located at /azkaban.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Database&lt;/strong&gt;&lt;br&gt;
First, you need to install the MySQL database, then create a database called azkaban. Execute the following in MySQL:&lt;/p&gt;
&lt;pre class="brush: bash; title: ; notranslate" title=""&gt;source /azkaban/azkaban-db/create-all-sql-0.1.0-SNAPSHOT.sql
&lt;/pre&gt;
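&lt;p&gt;The script above needs the azkaban database to exist and be selected before it runs. A minimal sketch of that step, assuming a MySQL client session whose user is allowed to create databases (character set and other options are left at the server defaults):&lt;/p&gt;
&lt;pre class="brush: bash; title: ; notranslate" title=""&gt;-- Create the database if it is not there yet, then select it
CREATE DATABASE IF NOT EXISTS azkaban;
USE azkaban;
&lt;/pre&gt;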
&lt;p&gt;&lt;strong&gt;Configuration Files&lt;/strong&gt;&lt;br&gt;
0. Configure the keystore. The keystore is located at /azkaban/azkaban-web/conf/ (the path must match the one in the configuration file).&lt;/p&gt;
&lt;pre class="brush: bash; title: ; notranslate" title=""&gt;keytool -keystore keystore -alias jetty -genkey -keyalg RSA
Enter keystore password: azkaban
Re-enter new password: azkaban
What is your first and last name? [Unknown]: (press Enter to skip)
What is the name of your organizational unit? [Unknown]: (press Enter to skip)
What is the name of your organization? [Unknown]: (press Enter to skip)
What is the name of your City or Locality? [Unknown]: (press Enter to skip)
What is the name of your State or Province? [Unknown]: (press Enter to skip)
What is the two-letter country code for this unit? [Unknown]: CN
Is CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=CN correct?
[no]: Y
Enter key password for &amp;lt;jetty&amp;gt; (RETURN if same as keystore password):
&lt;/pre&gt;
&lt;ol&gt;
&lt;li&gt;/azkaban/azkaban-web/conf/azkaban.properties&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class="brush: bash; title: ; notranslate" title=""&gt;# Azkaban Personalization Settings
azkaban.name=Allin
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=/azkaban/azkaban-web-server/web/
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=/azkaban/azkaban-web-server/conf/azkaban-users.xml
# Loader for projects
executor.global.properties=/azkaban/azkaban-web-server/conf/global.properties
azkaban.project.dir=/azkaban/azkaban-web-server/projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.maxThreads=25
jetty.port=8443
jetty.keystore=/azkaban/azkaban-web-server/conf/keystore
jetty.password=yourpassword
jetty.keypassword=yourpassword
jetty.truststore=/azkaban/azkaban-web-server/conf/keystore
jetty.trustpassword=yourpassword
# Azkaban Executor settings
executor.port=12321
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -&gt; myazkabanhost:443 -&gt; proxy -&gt; localhost:8081
# When these parameters are set, they are used to generate email links.
# If these parameters are not set, jetty.hostname and jetty.port (or jetty.ssl.port if SSL is configured) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban plugin settings
azkaban.jobtype.plugin.dir=/azkaban/azkaban-web-server/plugins/jobtypes
database.type=mysql
mysql.port=3306
mysql.host=127.0.0.1
mysql.database=azkaban
mysql.user=root
mysql.password=yourmysqlpassword
mysql.numconnections=100
&lt;/pre&gt;
&lt;ol start="2"&gt;
&lt;li&gt;/azkaban/azkaban-exec/conf/azkaban.properties&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class="brush: bash; title: ; notranslate" title=""&gt;# Azkaban Personalization Settings
default.timezone.id=Asia/Shanghai
# Loader for projects
executor.global.properties=/azkaban/azkaban-exec-server/conf/global.properties
azkaban.project.dir=/azkaban/azkaban-exec-server/projects
azkaban.jobtype.plugin.dir=/azkaban/azkaban-exec-server/plugins/jobtypes
database.type=mysql
mysql.port=3306
mysql.host=127.0.0.1
mysql.database=azkaban
mysql.user=root
mysql.password=yourpassword
mysql.numconnections=100
# Azkaban Executor settings
executor.maxThreads=50
executor.port=12321
executor.flow.threads=30
&lt;/pre&gt;
&lt;ol start="3"&gt;
&lt;li&gt;/azkaban/azkaban-web/conf/log4j.properties and /azkaban/azkaban-exec/conf/log4j.properties (the same content in both files)&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class="brush: bash; title: ; notranslate" title=""&gt;log4j.rootLogger=INFO,C
log4j.appender.C=org.apache.log4j.ConsoleAppender
log4j.appender.C.Target=System.err
log4j.appender.C.layout=org.apache.log4j.PatternLayout
log4j.appender.C.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
&lt;/pre&gt;
&lt;ol start="4"&gt;
&lt;li&gt;For multiple executor nodes, you need to add the following configuration to /azkaban/azkaban-web/conf/azkaban.properties, and insert the IP addresses and port numbers of the corresponding executor nodes into the database.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class="brush: bash; title: ; notranslate" title=""&gt;azkaban.use.multiple.executors=true
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1
&lt;/pre&gt;
&lt;p&gt;Insert executor node IP addresses and port numbers:&lt;/p&gt;
&lt;pre class="brush: bash; title: ; notranslate" title=""&gt;insert into executors(host,port) values("your ip1",12321);
insert into executors(host,port) values("your ip2",12321);
&lt;/pre&gt;
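&lt;p&gt;To confirm that the rows were inserted, the table can be queried afterwards. This is only a sketch: it assumes the executors table created by the schema script has id, host, port, and active columns, which should be checked against your own schema. The id value is presumably the EXECUTOR_ID that is needed when pinning a flow to a specific node.&lt;/p&gt;
&lt;pre class="brush: bash; title: ; notranslate" title=""&gt;-- List the registered executor nodes (column names are an assumption)
SELECT id, host, port, active FROM executors;
&lt;/pre&gt;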
&lt;ol start="5"&gt;
&lt;li&gt;
&lt;p&gt;Configure the username and password in /azkaban/azkaban-web/conf/azkaban-users.xml&lt;br&gt;
The file is short, and its format is self-explanatory once you open it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create log directories&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class="brush: bash; title: ; notranslate" title=""&gt;mkdir /azkaban/azkaban-web/logs
mkdir /azkaban/azkaban-exec/logs
&lt;/pre&gt;
&lt;ol start="7"&gt;
&lt;li&gt;Start the services&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class="brush: bash; title: ; notranslate" title=""&gt;/azkaban/azkaban-exec/bin/start-exec.sh
/azkaban/azkaban-web/bin/start-web.sh
&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Common Pitfalls&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The keystore location must match the file path set in the configuration file.&lt;/li&gt;
&lt;li&gt;Azkaban 3 and above supports multiple executor nodes.&lt;/li&gt;
&lt;li&gt;The startup scripts in step 7 run in silent mode, so errors are not shown in the console. During initial testing, it is recommended to use:&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class="brush: bash; title: ; notranslate" title=""&gt;/azkaban/azkaban-exec/bin/azkaban-exec-start.sh
/azkaban/azkaban-web/bin/azkaban-web-start.sh
&lt;/pre&gt;
&lt;p&gt;This way you can see whether errors occur and what causes them.&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;If the error messages mention the logs, the logs directory was probably not created correctly or its path is wrong.&lt;/li&gt;
&lt;li&gt;Since the web server checks the exec server, it is recommended to start the exec server first.&lt;/li&gt;
&lt;li&gt;If tasks are not executing in a multi-node setup, the executor nodes may have insufficient resources. Check the following configuration carefully: it specifies the memory, CPU, and other resource conditions under which a node will accept tasks, and how executor nodes are chosen when no specific node is designated. If you do not want to limit resources, modify or comment out the relevant settings.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class="brush: bash; title: ; notranslate" title=""&gt;azkaban.use.multiple.executors=true
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1
&lt;/pre&gt;
&lt;ol start="7"&gt;
&lt;li&gt;How to specify a node to execute a job:&lt;br&gt;
Set &amp;ldquo;useExecutor&amp;rdquo; = EXECUTOR_ID in the flow params.&lt;br&gt;
For details, please refer to:&lt;br&gt;
&lt;a class="link" href="https://www.jianshu.com/p/ffb7bbc1988f" target="_blank" rel="noopener"
&gt;https://www.jianshu.com/p/ffb7bbc1988f&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;</description></item></channel></rss>