How to Install Apache Hadoop on Debian 12

Linux Commands · Edge插件网 · 2023-09-17

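Big data is the backbone of modern data-driven businesses, and Hadoop has become a widely used solution for processing and analyzing massive datasets. This guide walks through installing and configuring Hadoop on a Debian 12 system.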

Installing Apache Hadoop on Debian 12 (Bookworm)

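Step 1. Before installing any software, make sure the package index is up to date by running the following apt command in a terminal:

sudo apt update

This command refreshes the repository metadata so that you install the latest available package versions.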

Step 2. Install the Java Development Kit (JDK).

Hadoop depends on Java, so make sure a JDK is installed. Hadoop 3.3.x officially supports Java 8 and Java 11:

sudo apt install openjdk-11-jdk

Debian 12's default repositories ship OpenJDK 17, so if the openjdk-11-jdk package cannot be found, you will need an additional package source that provides Java 11.

Verify the Java version with the following command:

java --version
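Hadoop's scripts need to know where the JDK lives, so it can help to note the JDK directory at this point. The sketch below assumes the usual layout in which the java launcher resolves to <JDK directory>/bin/java; the function name java_home_from_path is illustrative, not part of any tool.

```shell
# Derive a JDK directory from the full path of a java launcher.
# Assumption: the launcher lives at <JDK directory>/bin/java.
java_home_from_path() {
    # Strip the trailing /bin/java component.
    printf '%s\n' "${1%/bin/java}"
}

# Resolve the launcher found on PATH (following symlinks) if Java is installed.
if command -v java >/dev/null 2>&1; then
    java_home_from_path "$(readlink -f "$(command -v java)")"
fi
```

The printed directory is the value to use for JAVA_HOME later on.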

Step 3. Prepare the Hadoop environment

Before diving into the Hadoop installation, it is a good idea to create a dedicated user for Hadoop:

sudo adduser hadoopuser

Grant the new user sudo privileges and add it to the users group:

sudo usermod -aG sudo hadoopuser
sudo usermod -aG users hadoopuser

Step 4. Install Hadoop on Debian 12.

Visit the official Apache Hadoop website and download the release that suits your needs. This guide uses the Hadoop 3.3.6 binary release:

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

Verify the SHA-512 checksum to make sure the download is not corrupted:

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz.sha512
sha512sum -c hadoop-3.3.6.tar.gz.sha512
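To see how a checksum check behaves on both a good and a corrupted file, the following sketch runs sha512sum -c against a throwaway file. The helper name verify_sha512 and the sample file names are hypothetical; nothing here touches the Hadoop archive.

```shell
# Verify a file against a checksum file stored next to it.
# Returns 0 when the checksum matches, non-zero otherwise.
verify_sha512() {
    # $1 = directory holding both files, $2 = checksum file name
    ( cd "$1" && sha512sum -c "$2" >/dev/null 2>&1 )
}

# Demonstration on a throwaway file (hypothetical names).
demo_dir=$(mktemp -d)
printf 'sample data\n' > "$demo_dir/sample.tar.gz"
( cd "$demo_dir" && sha512sum sample.tar.gz > sample.tar.gz.sha512 )

if verify_sha512 "$demo_dir" sample.tar.gz.sha512; then
    echo "checksum OK"
fi

# Simulate a corrupted download: the same check now fails.
printf 'corrupted\n' >> "$demo_dir/sample.tar.gz"
if ! verify_sha512 "$demo_dir" sample.tar.gz.sha512; then
    echo "checksum mismatch"
fi

rm -rf "$demo_dir"
```

A mismatch on the real archive means the file should be downloaded again before it is extracted.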

Next, create a directory for Hadoop, extract the downloaded archive into it, and hand ownership to the Hadoop user so that it can write logs and data there:

sudo mkdir /opt/hadoop
sudo tar -xzvf hadoop-3.3.6.tar.gz -C /opt/hadoop --strip-components=1
sudo chown -R hadoopuser:hadoopuser /opt/hadoop
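Later steps call hdfs, hadoop, start-dfs.sh, and start-yarn.sh without full paths, which only works if Hadoop's bin and sbin directories are on the PATH. The sketch below prints the export lines for an install under /opt/hadoop. The helper name hadoop_env_lines is hypothetical, and the JDK path is an assumed typical Debian location that may differ on your system.

```shell
# Print the environment settings for a Hadoop install rooted at $1
# with the JDK located at $2. Both values are assumptions to adjust.
hadoop_env_lines() {
    cat <<EOF
export JAVA_HOME=$2
export HADOOP_HOME=$1
export PATH=\$PATH:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin
EOF
}

# Preview the lines for this guide's layout (assumed JDK path).
hadoop_env_lines /opt/hadoop /usr/lib/jvm/java-11-openjdk-amd64
```

Appending the output to the Hadoop user's ~/.bashrc and opening a new shell makes the commands available. The JAVA_HOME line can also be placed in /opt/hadoop/etc/hadoop/hadoop-env.sh, which Hadoop's own scripts read.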

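Step 5. Configure Hadoop.

Hadoop will not run correctly without proper configuration. Let's go through the necessary settings.

A. Understand the core Hadoop configuration files

Hadoop has several XML configuration files, but we will focus on four: core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml.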

B. Edit core-site.xml

Open the core-site.xml configuration file:

sudo nano /opt/hadoop/etc/hadoop/core-site.xml

Add the following property inside the <configuration> tag:

<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
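Each property goes inside the file's single <configuration> element. As a minimal sketch of what the complete file could look like, the following writes a core-site.xml into a scratch directory for review; the helper name write_core_site is hypothetical, and the file contains only the one property from this guide.

```shell
# Write a minimal core-site.xml into the directory given as $1.
# The fs.defaultFS value is the one used in this guide.
write_core_site() {
    cat > "$1/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}

# Write to a scratch directory first so the result can be reviewed
# before it replaces the real file.
scratch=$(mktemp -d)
write_core_site "$scratch"
cat "$scratch/core-site.xml"
```

The other three files follow the same shape, each with its own property inside <configuration>.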

C. Edit hdfs-site.xml

Open the hdfs-site.xml configuration file:

sudo nano /opt/hadoop/etc/hadoop/hdfs-site.xml

Add the following property, which sets the replication factor to 1 for a single-node setup:

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

D. Configure yarn-site.xml

Open the yarn-site.xml configuration file:

sudo nano /opt/hadoop/etc/hadoop/yarn-site.xml

Add the following property:

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

E. Configure mapred-site.xml

Open the mapred-site.xml configuration file:

sudo nano /opt/hadoop/etc/hadoop/mapred-site.xml

Add the following property:

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
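After editing the four files, a quick read-back can catch typos before any service is started. The sketch below is a rough text match rather than a real XML parser: the helper get_property is hypothetical and assumes each <name> and <value> element sits on a single line, as in the snippets in this step.

```shell
# Print the value of property $2 from the Hadoop XML file $1.
# Rough text match, not an XML parser: assumes single-line <name>/<value>.
get_property() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1 }
        found && match($0, /<value>[^<]*<\/value>/) {
            # Drop the 7-character <value> and 8-character </value> tags.
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}

# Report the four settings made in this step, for the files that exist.
conf=/opt/hadoop/etc/hadoop
for pair in core-site.xml:fs.defaultFS hdfs-site.xml:dfs.replication \
            yarn-site.xml:yarn.nodemanager.aux-services \
            mapred-site.xml:mapreduce.framework.name; do
    file=${pair%%:*}
    key=${pair#*:}
    if [ -f "$conf/$file" ]; then
        echo "$key = $(get_property "$conf/$file" "$key")"
    fi
done
```

An empty value after the equals sign suggests the property is missing or was placed outside the expected layout.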

Step 6. Set up SSH authentication.

Hadoop's start scripts use SSH to launch daemons on each node, including the local one. Let's set up SSH keys.

Switch to the Hadoop user and generate an SSH key pair with an empty passphrase:

sudo su - hadoopuser
ssh-keygen -t rsa -P ""

Append the public key to the authorized_keys file:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Test the SSH connection to localhost (and to any other nodes in the cluster):

ssh localhost
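If the login still asks for a password, overly permissive file modes are a common cause, since the SSH server by default rejects keys stored in files that other users can write to. The sketch below tightens the modes; the helper name tighten_ssh_perms is hypothetical, and it assumes an SSH server is already installed and running on the machine.

```shell
# Restrict permissions on an SSH directory ($1) and its authorized_keys file.
tighten_ssh_perms() {
    # Only the owner may enter or list the directory.
    chmod 700 "$1"
    if [ -f "$1/authorized_keys" ]; then
        # Only the owner may read or modify the key list.
        chmod 600 "$1/authorized_keys"
    fi
}

# Usage, as the Hadoop user:
#   tighten_ssh_perms "$HOME/.ssh"
```

After running it, ssh localhost should log in without a password prompt once the host key has been accepted.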

Step 7. Format the Hadoop Distributed File System (HDFS).

Before starting the Hadoop services for the first time, HDFS must be formatted.

Initialize the NameNode:

hdfs namenode -format

Once the HDFS daemons are running (see Step 8), create the necessary directories in HDFS:

hdfs dfs -mkdir -p /user/hadoopuser
hdfs dfs -chown hadoopuser:hadoopuser /user/hadoopuser

You can then verify the HDFS status by browsing the NameNode web interface at http://localhost:9870.

Step 8. Start the Hadoop services.

It is time to start the Hadoop services. Start the HDFS NameNode and DataNode:

start-dfs.sh

Start the YARN ResourceManager and NodeManager:

start-yarn.sh

To confirm that everything is running, check the status of the Hadoop cluster in the ResourceManager web interface at http://localhost:8088.
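The daemons can also be checked from the command line with jps, the process lister shipped with the JDK. The sketch below reads jps output and reports which expected daemons are absent. The helper name missing_daemons is hypothetical, and the daemon list assumes the single-node setup described in this guide.

```shell
# Read `jps` output on stdin and print each expected daemon that is absent.
# The list reflects a single-node setup as described in this guide.
missing_daemons() {
    listing=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <class name>", so compare the second field exactly.
        if ! printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
            echo "$daemon"
        fi
    done
}

# Check the live system when jps is available.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

No output means all five daemons were found; any name printed is a daemon whose log is worth inspecting.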

Step 9. Run a simple Hadoop job.

Now let's test the Hadoop setup by running a simple MapReduce job.

A. Prepare the input data

Create an input directory in HDFS and upload a sample text file:

hdfs dfs -mkdir -p /input
hdfs dfs -put /path/to/your/inputfile.txt /input

B. Run the MapReduce job

Run the word count example. The /output directory must not already exist in HDFS:

hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /input /output

C. Monitor the job's progress

Monitor the job's progress in the ResourceManager web interface.
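When the job finishes, its results are written under /output in HDFS, typically in files named like part-r-00000, which can be read with hdfs dfs -cat. For a small input, a local pipeline offers a rough cross-check. This sketch assumes the example job splits on whitespace and emits tab-separated word and count pairs sorted by word; the helper name local_wordcount is hypothetical.

```shell
# Count whitespace-separated words read from stdin and print "word<TAB>count",
# sorted by word in byte order.
local_wordcount() {
    tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Small demonstration on inline text.
printf 'hello hadoop\nhello hdfs\n' | local_wordcount
```

Running it on the same file that was uploaded to /input should give counts comparable to the job's output.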

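Step 10. Troubleshoot common issues

Hadoop is powerful, but it can be challenging to operate. Here are some common issues and ways to approach them.

A. Diagnosing Hadoop startup issues

  • Check the logs in /opt/hadoop/logs for error messages.
  • Make sure all configuration files were edited correctly.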
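Scanning the logs by hand gets tedious, so a small filter can surface the most recent problems. The sketch below assumes log lines carry a level such as ERROR or FATAL surrounded by spaces, which matches the usual Hadoop log layout but may differ if logging has been customized. The helper name recent_log_errors is hypothetical.

```shell
# Print the most recent ERROR or FATAL lines from the *.log files in $1.
# $2 optionally limits how many lines are shown (default 20).
recent_log_errors() {
    for log in "$1"/*.log; do
        # Skip the unexpanded pattern when no log files exist.
        [ -f "$log" ] || continue
        # -H prefixes each match with its file name.
        grep -H -E ' (ERROR|FATAL) ' "$log" || true
    done | tail -n "${2:-20}"
}

# Scan this guide's log directory when it exists.
if [ -d /opt/hadoop/logs ]; then
    recent_log_errors /opt/hadoop/logs
fi
```

Each printed line starts with the log file name, which indicates the daemon that reported the problem.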

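B. Debugging HDFS issues

  • Verify the HDFS status by browsing the NameNode web interface.
  • Check the data directories for disk space and permission problems.

C. Handling resource allocation issues

  • Adjust the resource allocation settings in yarn-site.xml.
  • Monitor resource usage in the ResourceManager web interface.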

Thank you for using this tutorial to install Apache Hadoop on Debian 12 Bookworm. For additional help or useful information, we recommend the official Hadoop website.


Edge插件网. All rights reserved. Content is original unless otherwise noted and is licensed under BY-NC-SA.