Hadoop DistributedCache caching files without absolute path?


I am in the process of migrating to YARN, and it seems the behavior of the DistributedCache has changed.

Previously, I would add files to the cache as follows:

    // CACHE_ROOT is the HDFS directory that holds the files to cache
    for (String file : args) {
        Path path = new Path(CACHE_ROOT, file);
        URI uri = new URI(path.toUri().toString());
        DistributedCache.addCacheFile(uri, conf);
    }

The path would typically look like

/some/path/to/my/file.txt 

which pre-exists on HDFS and would end up in the DistributedCache as

/$distro_cache/some/path/to/my/file.txt 

I could symlink it into the current working directory and look it up with DistributedCache.getLocalCacheFiles().

With YARN, it seems the file instead ends up in the cache as:

/$distro_cache/file.txt 

i.e., the 'path' part of the file URI got dropped and only the filename remains.

How does this work with different absolute paths that end in the same filename? Consider the following case:

    DistributedCache.addCacheFile(new URI("some/path/to/file.txt"), conf);
    DistributedCache.addCacheFile(new URI("some/other/path/to/file.txt"), conf);

Arguably, I could use fragments:

    DistributedCache.addCacheFile(new URI("some/path/to/file.txt#file1"), conf);
    DistributedCache.addCacheFile(new URI("some/other/path/to/file.txt#file2"), conf);

But this seems unnecessarily hard to manage. Imagine a scenario where these are command-line arguments: you somehow need to notice that the 2 filenames, although they have different absolute paths, would clash in the DistributedCache, and therefore need to re-map these filenames to fragments and propagate that mapping to the rest of the program?

Is there an easier way to manage this?
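One way to keep the fragment approach manageable is to derive each fragment mechanically from the path, so nothing has to be re-mapped by hand. The sketch below is only an illustration: `CacheFragments`, `fragmentFor` and `withUniqueFragments` are hypothetical names, not part of Hadoop, and it assumes the paths contain no characters that would need URI escaping.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: derives a link name (URI fragment)
// from each cache path so that equal filenames in different directories do not clash.
class CacheFragments {

    // Turns "some/path/to/file.txt" into "some_path_to_file.txt".
    static String fragmentFor(String path) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        return trimmed.replace('/', '_');
    }

    // Returns one URI per input path, each carrying a "#fragment" link name.
    // Throws if two paths would still map to the same link name.
    static List<URI> withUniqueFragments(List<String> paths) {
        Set<String> seen = new HashSet<>();
        List<URI> uris = new ArrayList<>();
        for (String path : paths) {
            String fragment = fragmentFor(path);
            if (!seen.add(fragment)) {
                throw new IllegalArgumentException("Link name clash for: " + path);
            }
            uris.add(URI.create(path + "#" + fragment));
        }
        return uris;
    }

    public static void main(String[] args) {
        List<URI> uris = withUniqueFragments(
                Arrays.asList("some/path/to/file.txt", "some/other/path/to/file.txt"));
        for (URI uri : uris) {
            // Each URI could then be handed to DistributedCache.addCacheFile(uri, conf).
            System.out.println(uri.getPath() + " -> " + uri.getFragment());
        }
    }
}
```

Because the fragment is a pure function of the path, the rest of the program can recompute the link name from the original command-line argument instead of carrying a separate mapping around.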

Try adding the files to the Job.

It's most likely a matter of how you're configuring the job and then accessing the files in the mapper.

When you're setting up the job, you're going to do something like

    job.addCacheFile(new Path("cache/file1.txt").toUri());
    job.addCacheFile(new Path("cache/file2.txt").toUri());

Then, in the mapper code, the URIs are going to be stored in an array which can be accessed like so.

    URI file1Uri = context.getCacheFiles()[0];
    URI file2Uri = context.getCacheFiles()[1];
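If relying on array positions feels fragile, the entries can be looked up by link name instead. This is a minimal sketch, assuming the local link name is the URI fragment when one is given and the last path segment otherwise, which is how the YARN behavior described in the question appears to work. `CacheLinkNames`, `linkName` and `findByLinkName` are hypothetical helpers, not Hadoop APIs, and the array in `main` stands in for what `context.getCacheFiles()` would return.

```java
import java.net.URI;

// Hypothetical helper, not part of Hadoop.
class CacheLinkNames {

    // Assumption: the link name is the fragment if present,
    // otherwise the last segment of the (hierarchical) URI path.
    static String linkName(URI uri) {
        if (uri.getFragment() != null) {
            return uri.getFragment();
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Returns the first cache URI with the given link name, or null if none matches.
    static URI findByLinkName(URI[] cacheFiles, String name) {
        for (URI uri : cacheFiles) {
            if (linkName(uri).equals(name)) {
                return uri;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for context.getCacheFiles() inside a mapper.
        URI[] cacheFiles = {
            URI.create("cache/file1.txt#first"),
            URI.create("cache/file2.txt")
        };
        System.out.println(findByLinkName(cacheFiles, "first"));
        System.out.println(findByLinkName(cacheFiles, "file2.txt"));
    }
}
```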

Hope this helps you.

