Set replication in Hadoop
Published: 2019-06-28

I was trying to load a file into HDFS using the Hadoop API as an experiment.

Since this is only an experiment, I want to set the replication factor to the minimum. I first tried FileSystem.setReplication():

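    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration config = new Configuration();
    config.set("fs.defaultFS", "hdfs://192.168.248.166:8020");
    FileSystem dfs2 = FileSystem.get(config);

    Path src2 = new Path("C:\\Users\\abc\\Desktop\\testfile.txt");
    Path dst2 = new Path(dfs2.getWorkingDirectory() + "/tempdir");

    // The file is first written with the client's default replication factor...
    dfs2.copyFromLocalFile(src2, dst2);
    // ...and only afterwards is its replication factor changed to 1.
    dfs2.setReplication(dst2, (short) 1);  /** setting replication **/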

The file's replication factor was reported as 1, but its blocks were still available on 3 datanodes.

Then I tried it with Configuration.set():

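    Configuration config = new Configuration();
    config.set("fs.defaultFS", "hdfs://192.168.248.166:8020");
    // Set the replication factor before the FileSystem is created and the file is written.
    config.set("dfs.replication", "1");  /** setting replication **/
    FileSystem dfs2 = FileSystem.get(config);

    Path src2 = new Path("C:\\Users\\abc\\Desktop\\testfile.txt");
    Path dst2 = new Path(dfs2.getWorkingDirectory() + "/tempdir");

    // The copy call is missing from the original snippet; it is assumed to be the same as above.
    dfs2.copyFromLocalFile(src2, dst2);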

This gave the desired outcome: 1 replica, available on 1 datanode.

Why are there two APIs for the same thing? What is the difference between them?

The difference is that FileSystem's setReplication() sets the replication factor of a file that already exists on HDFS. In your case, you first copy the local file testfile.txt to HDFS using the default replication factor (3), and then change the replication factor of that file to 1. After this call, it takes a while until the over-replicated blocks are deleted.

On the other hand, when you set the replication with config.set("dfs.replication", "1") before copying the local file, its blocks are written with a single replica from the start.

In other words, I believe (but I might be wrong) that both approaches have the same final result; with the first one you just have to wait a little until the extra replicas are removed.
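The "wait a little" part of the first approach can be sketched as a small polling loop. This is only an illustration and not part of the Hadoop API: the class ReplicationWait, the method waitForReplication, and the fake replica source in main are hypothetical names made up for this example. The loop asks a supplier how many datanodes currently hold each block and stops once no block exceeds the target.

```java
import java.util.function.Supplier;

/** Hypothetical helper: waits until over-replicated blocks have been cleaned up. */
class ReplicationWait {

    /**
     * Polls replicaCounts (one entry per block: how many datanodes hold it)
     * until no block exceeds target, or until maxAttempts polls have been made.
     * Returns true if the target was reached.
     */
    static boolean waitForReplication(Supplier<int[]> replicaCounts, int target,
                                      int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            boolean converged = true;
            for (int count : replicaCounts.get()) {
                if (count > target) {
                    converged = false;
                    break;
                }
            }
            if (converged) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Fake source standing in for HDFS: one block whose replica count
        // drops 3 -> 2 -> 1 on successive polls.
        int[] polls = {0};
        Supplier<int[]> fake = () -> new int[] { Math.max(1, 3 - polls[0]++) };
        boolean ok = waitForReplication(fake, 1, 5, 10L);
        System.out.println("converged=" + ok + " after " + polls[0] + " polls");
    }
}
```

Against a real cluster, the supplier could be built from dfs2.getFileBlockLocations(status, 0, status.getLen()), taking getHosts().length for each returned BlockLocation. Both calls can throw IOException, which the supplier would have to handle, and this assumes the block locations reported by the NameNode reflect the cleanup of the extra replicas.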

 

Reposted from: http://nfchl.baihongyu.com/
