RHadoop Series: rhdfs

R · jiajun · 09-26

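The RHadoop project consists of five R packages: rhdfs, rhbase, plyrmr, rmr2, and ravro. Of these, rhdfs provides basic connectivity between R and the Hadoop Distributed File System (HDFS). With rhdfs, R developers can browse, read, write, and modify files stored in HDFS directly from R. rhdfs only needs to be installed on the node that runs the R client. This article covers installing rhdfs and a few basic operations.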

I. Environment

1. Ubuntu 16.04 desktop edition
2. Hadoop version: hadoop 2.7.2
3. rhdfs version: rhdfs_1.0.8

II. Installation

1. Install the dependency packages

sudo R
install.packages(c("rJava","Rcpp","RJSONIO","digest","functional","reshape2","stringr","plyr","caTools","stringi","magrittr","bitops"),lib="/usr/local/lib/R/site-library")
q()


2. Add the environment variables

sudo nano /etc/profile
export LD_LIBRARY_PATH=/soft/hadoop/lib/native
export HADOOP_CMD=/soft/hadoop/bin/hadoop

Note: after setting the environment variables, it is best to reboot the system; otherwise RStudio may fail to find HADOOP_CMD and LD_LIBRARY_PATH.
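
One way to confirm that both variables are visible before starting R is a small shell check like the sketch below. The function name `check_rhdfs_env` is made up for this illustration; it only inspects the two variables exported above and assumes nothing else about the installation.

```shell
#!/bin/sh
# Report whether the two variables exported in /etc/profile are visible here.
# check_rhdfs_env is a hypothetical helper name, not part of rhdfs or Hadoop.
check_rhdfs_env() {
    rc=0
    if [ -z "${HADOOP_CMD:-}" ]; then
        echo "HADOOP_CMD is not set"
        rc=1
    elif [ ! -x "$HADOOP_CMD" ]; then
        echo "HADOOP_CMD is set but not executable: $HADOOP_CMD"
        rc=1
    else
        echo "HADOOP_CMD ok: $HADOOP_CMD"
    fi
    if [ -z "${LD_LIBRARY_PATH:-}" ]; then
        echo "LD_LIBRARY_PATH is not set"
        rc=1
    else
        echo "LD_LIBRARY_PATH ok: $LD_LIBRARY_PATH"
    fi
    return "$rc"
}

# Print the report; on failure, suggest reloading the profile.
check_rhdfs_env || echo "Run 'source /etc/profile' or log in again, then retry."
```

If the check passes in a terminal but RStudio still cannot find the variables, RStudio was probably started from a session that predates the change, which is why a reboot helps.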

3. Install rhdfs
Download: https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads
Install:

su root
source /etc/profile
R CMD INSTALL -l /usr/local/lib/R/site-library rhdfs_1.0.8.tar.gz
exit

4. Test
Start Hadoop:

start-dfs.sh
start-yarn.sh
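
Before moving on to R, it can help to confirm that the daemons actually came up. The sketch below filters the output of the JDK's `jps` tool; the helper name `missing_daemons` is hypothetical, and the daemon list assumes a single-node setup where HDFS and YARN run on the same host.

```shell
#!/bin/sh
# Read `jps` output on stdin and print every expected daemon that is missing.
# The daemon list assumes a single-node setup where HDFS and YARN share a host.
missing_daemons() {
    listing=$(cat)
    for daemon in NameNode DataNode ResourceManager NodeManager; do
        # -w keeps "SecondaryNameNode" from counting as "NameNode".
        printf '%s\n' "$listing" | grep -qw "$daemon" || echo "$daemon"
    done
}

# Only query the JVM process list when the JDK's jps tool is on the PATH.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

Empty output means all four daemons were found; any name printed is a daemon worth looking up in the Hadoop logs before calling hdfs.init().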

Start R and initialize HDFS:

R
library("rhdfs")
Loading required package: rJava

HADOOP_CMD=/soft/hadoop/bin/hadoop
Be sure to run hdfs.init()

hdfs.init()


List a directory

hdfs.ls("/")
  permission  owner      group size          modtime   file
1 -rw-r--r-- hadoop supergroup   19 2018-08-10 23:33 /a.txt
2 drwxr-xr-x hadoop supergroup    0 2018-08-11 14:33 /hello
3 drwxr-xr-x hadoop supergroup    0 2018-08-11 12:51   /out


Create a directory

hdfs.mkdir("/test")
[1] TRUE
hdfs.ls("/")
  permission  owner      group size          modtime   file
1 -rw-r--r-- hadoop supergroup   19 2018-08-10 23:33 /a.txt
2 drwxr-xr-x hadoop supergroup    0 2018-08-11 14:33 /hello
3 drwxr-xr-x hadoop supergroup    0 2018-08-11 12:51   /out
4 drwxr-xr-x hadoop supergroup    0 2018-08-12 07:39  /test


Upload a file

write.csv(iris,"iris.csv")
hdfs.put("iris.csv","/test/iris")
[1] TRUE

View a file

hdfs.cat("/test/iris")
  [1] "\"\",\"Sepal.Length\",\"Sepal.Width\",\"Petal.Length\",\"Petal.Width\",\"Species\""
  [2] "\"1\",5.1,3.5,1.4,0.2,\"setosa\""                                                  
  [3] "\"2\",4.9,3,1.4,0.2,\"setosa\""                                                    
  [4] "\"3\",4.7,3.2,1.3,0.2,\"setosa\""                                                  
  [5] "\"4\",4.6,3.1,1.5,0.2,\"setosa\""                                                  
  [6] "\"5\",5,3.6,1.4,0.2,\"setosa\""                                                    
  [7] "\"6\",5.4,3.9,1.7,0.4,\"setosa\""                                                  
  [8] "\"7\",4.6,3.4,1.4,0.3,\"setosa\""                                                  
  [9] "\"8\",5,3.4,1.5,0.2,\"setosa\""                                                    
 [10] "\"9\",4.4,2.9,1.4,0.2,\"setosa\""                                                  
 [11] "\"10\",4.9,3.1,1.5,0.1,\"setosa\""  

Delete a file

hdfs.rm("/test/iris")
18/08/12 07:53:03 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted hdfs://hadoop:9000/test/iris
[1] TRUE

Delete a directory

hdfs.rmr("/test")
18/08/12 07:53:46 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted hdfs://hadoop:9000/test
[1] TRUE
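
As a cross-check that does not depend on R, the same sequence of steps can be repeated with the `hadoop fs` command line. This is only a sketch: the function name `hdfs_cli_walkthrough` is made up here, it assumes HADOOP_CMD points at the hadoop launcher as configured earlier, and it expects iris.csv to exist in the current directory.

```shell
#!/bin/sh
# Repeat the steps of the R session through the hadoop CLI found via HADOOP_CMD.
# The paths (/test, /test/iris, iris.csv) mirror the ones used in the session.
# hdfs_cli_walkthrough is a hypothetical helper name for this illustration.
hdfs_cli_walkthrough() {
    "$HADOOP_CMD" fs -ls / &&
    "$HADOOP_CMD" fs -mkdir /test &&
    "$HADOOP_CMD" fs -put iris.csv /test/iris &&
    # Show only the header and the first ten rows, as in the R output.
    "$HADOOP_CMD" fs -cat /test/iris | head -n 11 &&
    "$HADOOP_CMD" fs -rm /test/iris &&
    "$HADOOP_CMD" fs -rm -r /test
}
```

Calling `hdfs_cli_walkthrough` stops at the first failing step. Like the R session, it creates and then removes /test, so it should not be run against a cluster where that path holds real data.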


Appendix: all rhdfs functions

hdfs.cat             hdfs.file            hdfs.read
hdfs.chmod           hdfs.file.info       hdfs.read.text.file
hdfs.chown           hdfs.flush           hdfs.rename
hdfs.close           hdfs.get             hdfs.rm
hdfs.copy            hdfs.init            hdfs.rmr
hdfs.cp              hdfs.line.reader     hdfs.seek
hdfs.defaults        hdfs.ls              hdfs.stat
hdfs.del             hdfs.mkdir           hdfs.tell
hdfs.delete          hdfs.move            hdfs.write
hdfs.dircreate       hdfs.mv              
hdfs.exists          hdfs.put
