Azkaban is an open-source workflow scheduler for running batch tasks (such as data warehouse jobs), and it can serve as a replacement for crontab on Linux. Official website: https://azkaban.github.io/
Azkaban mainly consists of three components:
- MySQL: Azkaban uses MySQL to store projects and execution information.
- Azkaban Web Server: Azkaban uses Jetty as the web server, serving as the controller and providing the web interface.
- Azkaban Executor Server: The Azkaban executor server executes submitted workflows.
This article uses Azkaban 3.43. If you need help building the tar packages from the GitHub source code, feel free to leave a comment, and I can share the installation packages that I have already tested.
The installation package files mainly include:
- azkaban-db-3.43.0.tar.gz
- azkaban-solo-server-3.43.0.tar.gz
- azkaban-exec-server-3.43.0.tar.gz
- azkaban-web-server-3.43.0.tar.gz
- azkaban-hadoop-security-plugin-3.43.0.tar.gz
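Among these, solo is the standalone version. I set up a deployment with multiple executor nodes and a single web server node.
Note: This article assumes Azkaban is installed under /azkaban. The steps below refer to /azkaban/azkaban-web and /azkaban/azkaban-exec, while the paths inside the property files use /azkaban/azkaban-web-server and /azkaban/azkaban-exec-server; make sure the names you use match the directories that actually exist on your machine.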
Database
First, install MySQL and create a database named azkaban. Then select that database in the MySQL client and run:
source /azkaban/azkaban-db/create-all-sql-0.1.0-SNAPSHOT.sql
Configuration Files
0. Configure the keystore. Generate it in /azkaban/azkaban-web/conf/ (the location must match the keystore path in the configuration file).
keytool -keystore keystore -alias jetty -genkey -keyalg RSA
Enter keystore password: azkaban
Re-enter new password: azkaban
What is your first and last name? [Unknown]: (press Enter to skip)
What is the name of your organizational unit? [Unknown]: (press Enter to skip)
What is the name of your organization? [Unknown]: (press Enter to skip)
What is the name of your City or Locality? [Unknown]: (press Enter to skip)
What is the name of your State or Province? [Unknown]: (press Enter to skip)
What is the two-letter country code for this unit? [Unknown]: CN
Is CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=CN correct? [no]: Y
Enter key password for <jetty> (RETURN if same as keystore password): (press Enter)
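When the installation is scripted, the same answers can be passed to keytool as options instead of being typed at the prompts. The following is a minimal sketch, assuming the answers from the session above (password azkaban, country code CN, everything else left as Unknown). The function name build_keytool_args is made up for this illustration; -storepass, -keypass, and -dname are standard keytool options.

```python
import shlex


def build_keytool_args(keystore_path, password, country="CN"):
    """Return a keytool command line that answers every prompt up front."""
    # Distinguished name matching the interactive answers:
    # everything is left as Unknown except the country code.
    dname = f"CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C={country}"
    return [
        "keytool", "-genkey",
        "-keystore", keystore_path,
        "-alias", "jetty",
        "-keyalg", "RSA",
        "-storepass", password,  # keystore password
        "-keypass", password,    # key password, same as the keystore password
        "-dname", dname,
    ]


# Print a shell-quoted command that can be reviewed before running it.
# The keystore path is the location used in this article.
print(shlex.join(build_keytool_args("/azkaban/azkaban-web/conf/keystore", "azkaban")))
```

Printing the command rather than executing it keeps the sketch harmless to run. Note that passwords given on the command line are visible in the process list, so the interactive form may be preferable on shared machines.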
1. /azkaban/azkaban-web/conf/azkaban.properties
# Azkaban Personalization Settings
azkaban.name=Allin
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=/azkaban/azkaban-web-server/web/
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=/azkaban/azkaban-web-server/conf/azkaban-users.xml
# Loader for projects
executor.global.properties=/azkaban/azkaban-web-server/conf/global.properties
azkaban.project.dir=/azkaban/azkaban-web-server/projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.maxThreads=25
jetty.port=8443
jetty.keystore=//azkaban/azkaban-web-server/conf/keystore
jetty.password=yourpassword
jetty.keypassword=yourpassword
jetty.truststore=//azkaban/azkaban-web-server/conf/keystore
jetty.trustpassword=yourpassword
# Azkaban Executor settings
executor.port=12321
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban plugin settings
azkaban.jobtype.plugin.dir=/azkaban/azkaban-web-server/plugins/jobtypes
database.type=mysql
mysql.port=3306
mysql.host=127.0.0.1
mysql.database=azkaban
mysql.user=root
mysql.password=yourmysqlpassword
mysql.numconnections=100
2. /azkaban/azkaban-exec/conf/azkaban.properties
# Azkaban Personalization Settings
default.timezone.id=Asia/Shanghai
# Loader for projects
executor.global.properties=/azkaban/azkaban-exec-server/conf/global.properties
azkaban.project.dir=/azkaban/azkaban-exec-server/projects
azkaban.jobtype.plugin.dir=/azkaban/azkaban-exec-server/plugins/jobtypes
database.type=mysql
mysql.port=3306
mysql.host=127.0.0.1
mysql.database=azkaban
mysql.user=root
mysql.password=yourpassword
mysql.numconnections=100
# Azkaban Executor settings
executor.maxThreads=50
executor.port=12321
executor.flow.threads=30
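Because several properties in these two files point at files and directories, a wrong path is an easy mistake to make. The sketch below is one possible way to catch it before starting the servers: it reads a simple key=value properties file and reports configured paths that do not exist. The helper names load_properties and missing_paths are hypothetical, the key list is taken from the listings above, and the parser assumes plain key=value lines without continuations.

```python
import os


def load_properties(path):
    """Parse a simple key=value .properties file into a dict."""
    props = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            key, sep, value = line.partition("=")
            if sep:
                props[key.strip()] = value.strip()
    return props


# Keys from the listings above whose values are filesystem paths.
PATH_KEYS = (
    "web.resource.dir",
    "user.manager.xml.file",
    "executor.global.properties",
    "azkaban.project.dir",
    "jetty.keystore",
    "jetty.truststore",
    "azkaban.jobtype.plugin.dir",
)


def missing_paths(props, keys=PATH_KEYS):
    """Return {key: path} for configured paths that do not exist on disk."""
    return {
        key: props[key]
        for key in keys
        if key in props and not os.path.exists(props[key])
    }
```

For example, missing_paths(load_properties("/azkaban/azkaban-web/conf/azkaban.properties")) returns an empty dict when every configured path exists. Some entries, such as the projects directory, may legitimately be missing before the first start, so treat the result as a hint rather than an error.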
3. /azkaban/azkaban-web/conf/log4j.properties and /azkaban/azkaban-exec/conf/log4j.properties
log4j.rootLogger=INFO,C
log4j.appender.C=org.apache.log4j.ConsoleAppender
log4j.appender.C.Target=System.err
log4j.appender.C.layout=org.apache.log4j.PatternLayout
log4j.appender.C.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
4. For multiple executor nodes, add the following configuration to /azkaban/azkaban-web/conf/azkaban.properties, and insert the IP address and port number of each executor node into the database.
azkaban.use.multiple.executors=true
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1
Insert executor node IP addresses and port numbers:
insert into executors(host,port) values("your ip1",12321);
insert into executors(host,port) values("your ip2",12321);
5. Configure the username and password in /azkaban/azkaban-web/conf/azkaban-users.xml
The file is self-explanatory once you open it.
6. Create the log directories
mkdir /azkaban/azkaban-web/logs
mkdir /azkaban/azkaban-exec/logs
7. Start the services
/azkaban/azkaban-exec/bin/start-exec.sh
/azkaban/azkaban-web/bin/start-web.sh
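Since the web server checks the executor server when it starts, a script that launches both may want to wait until the executor port accepts connections before starting the web server. The following is a minimal sketch of such a wait, assuming the executor listens on the executor.port value from the configuration above (12321). The function name wait_for_port and the timeout values are illustrative choices, not part of Azkaban.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A launcher could call wait_for_port("127.0.0.1", 12321) after running start-exec.sh and only run start-web.sh when it returns True. An open port only shows that the process is listening, so it is still worth checking the executor log on the first run.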
Common Pitfalls
1. The keystore location must match the file path set in the configuration file.
2. Azkaban 3 and above supports multiple executor nodes.
3. The startup scripts in step 7 (Start the services) run in the background without printing to the console. During initial testing, it is recommended to use:
/azkaban/azkaban-exec/bin/azkaban-exec-start.sh
/azkaban/azkaban-web/bin/azkaban-web-start.sh
This way you can see whether errors occur and what causes them.
4. If the error messages mention the logs directory, it was probably not created correctly or its path is wrong.
5. Because the web server checks the executor server at startup, it is recommended to start the executor server first.
6. If tasks are not executing in a multi-node setup, the executor nodes may not have enough free resources. Check the configuration below carefully: the filters determine under what memory, CPU, and other resource conditions an executor is allowed to run tasks, and the comparators determine how an executor node is chosen when no specific node is designated. If you do not want to limit resources, modify or comment out the relevant settings.
azkaban.use.multiple.executors=true
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1
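To make the effect of azkaban.executorselector.filters more concrete, here is a simplified model of the idea. It is not Azkaban's actual code. It assumes that each filter is a yes/no test on the statistics an executor reports and that an executor must pass every listed filter before it can be given a flow. The thresholds (6 GB of free memory, 95% CPU usage) and the field names are assumptions for illustration; check the filter implementation in your Azkaban version for the real values.

```python
# Illustrative thresholds: assumptions, not values read from your Azkaban build.
MIN_FREE_MEMORY_MB = 6 * 1024
MAX_CPU_USAGE_PERCENT = 95.0

# Each filter name maps to a predicate over one executor's statistics.
# The dictionary keys used for the statistics are hypothetical.
FILTERS = {
    "StaticRemainingFlowSize": lambda s: s["remaining_flow_capacity"] > 0,
    "MinimumFreeMemory": lambda s: s["free_memory_mb"] > MIN_FREE_MEMORY_MB,
    "CpuStatus": lambda s: s["cpu_usage"] < MAX_CPU_USAGE_PERCENT,
}


def eligible_executors(stats_by_host, filter_setting):
    """Return hosts that pass every filter named in the comma-separated setting."""
    names = [n.strip() for n in filter_setting.split(",") if n.strip()]
    return [
        host
        for host, stats in stats_by_host.items()
        if all(FILTERS[name](stats) for name in names)
    ]


example_stats = {
    "node1": {"remaining_flow_capacity": 30, "free_memory_mb": 2048, "cpu_usage": 10.0},
    "node2": {"remaining_flow_capacity": 30, "free_memory_mb": 8192, "cpu_usage": 10.0},
}
# With all three filters, the low-memory node is excluded.
print(eligible_executors(example_stats, "StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus"))  # ['node2']
# Dropping MinimumFreeMemory makes both nodes eligible.
print(eligible_executors(example_stats, "StaticRemainingFlowSize,CpuStatus"))  # ['node1', 'node2']
```

Under this model, an executor on a small machine never receives tasks while MinimumFreeMemory is listed, which is the kind of situation where removing a filter from the setting helps.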
7. How to specify a node to execute a job:
Set useExecutor=EXECUTOR_ID in the flow parameters.
For details, please refer to:
https://www.jianshu.com/p/ffb7bbc1988f
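If flows are triggered through Azkaban's AJAX API rather than the web interface, the useExecutor flow parameter can be sent along with the executeFlow request. The sketch below only builds the request URL. It assumes the API's flowOverride[name]=value convention for flow parameters and that EXECUTOR_ID corresponds to the id column of the executors table. The host, port, session ID, project, and flow names are placeholders, and build_execute_flow_url is a made-up helper name.

```python
from urllib.parse import urlencode


def build_execute_flow_url(base_url, session_id, project, flow, executor_id):
    """Build an executeFlow request URL that pins the run to one executor."""
    params = {
        "ajax": "executeFlow",
        "session.id": session_id,  # obtained from a prior login request
        "project": project,
        "flow": flow,
        # Flow parameters are passed as flowOverride[<name>]=<value>.
        "flowOverride[useExecutor]": str(executor_id),
    }
    return f"{base_url.rstrip('/')}/executor?{urlencode(params)}"


# Placeholder values; replace them with your own server, session, and names.
print(build_execute_flow_url("http://localhost:8443", "SESSION_ID", "my_project", "my_flow", 1))
```

Azkaban appears to honor this parameter only for users with admin permission, so it is worth testing with an admin account first.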