Hadoop - Flume folder routing based on HTTP header


Using curl and Flume, I want to post CSV files to different locations on the local machine or HDFS, based on the value of an HTTP header. For example, for the HTTP header network-element: ggsn, the files should be stored on the local machine in a folder named ggsn.
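As a rough illustration of the goal, here is a minimal Python sketch (not Flume code) that writes an event body into a subfolder named after its network-element header and creates that folder if it is missing. The function name route_event and its parameters are hypothetical; in the Flume setup this job belongs to the sink.

```python
import os
import tempfile

# Hypothetical routing header, taken from the example in the question.
ROUTING_HEADER = "network-element"


def route_event(base_dir: str, headers: dict, body: bytes, filename: str) -> str:
    """Write body to base_dir/<header value>/filename and return the path.

    Raises KeyError if the routing header is missing, and ValueError if its
    value could escape base_dir (e.g. "../x"), instead of guessing.
    """
    value = headers[ROUTING_HEADER]
    if not value or value in (".", "..") or "/" in value or os.sep in value:
        raise ValueError(f"unsafe folder name: {value!r}")
    folder = os.path.join(base_dir, value)
    os.makedirs(folder, exist_ok=True)  # e.g. create base_dir/ggsn on demand
    path = os.path.join(folder, filename)
    with open(path, "wb") as fh:
        fh.write(body)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        written = route_event(base, {"network-element": "ggsn"}, b"a,b\n1,2\n", "sample.csv")
        print(os.path.relpath(written, base))  # ggsn/sample.csv on POSIX
```

Creating the folder on demand is a design choice of this sketch; a sink that expects the folder to already exist would instead fail for every new header value.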

I have the following Flume configuration:

  • an HTTP source
  • a memory channel
  • an HDFS sink that routes events to files in different locations depending on the HTTP header
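The three components above form a source, channel, sink pipeline. The toy Python model below is an assumption for illustration, not Flume's implementation: the source puts events into a bounded queue standing in for the memory channel, and the sink drains them in batches. The two limits mirror capacity = 10000 and transactionCapacity = 1000 from the configuration in this question; real Flume channel transactions also commit or roll back, which this sketch omits.

```python
import queue
from typing import Dict, List, Tuple

# An event is modeled as (event headers, body), loosely like a Flume event.
Event = Tuple[Dict[str, str], bytes]

CHANNEL_CAPACITY = 10000     # mirrors sa.channels.memorychannel1.capacity
TRANSACTION_CAPACITY = 1000  # mirrors sa.channels.memorychannel1.transactionCapacity


def source_put(channel: "queue.Queue[Event]", headers: Dict[str, str], body: bytes) -> None:
    """Source side: hand one event to the channel; raises queue.Full when full."""
    channel.put_nowait((headers, body))


def sink_take_batch(channel: "queue.Queue[Event]") -> List[Event]:
    """Sink side: take up to TRANSACTION_CAPACITY events in one batch."""
    batch: List[Event] = []
    while len(batch) < TRANSACTION_CAPACITY:
        try:
            batch.append(channel.get_nowait())
        except queue.Empty:
            break
    return batch


if __name__ == "__main__":
    ch: "queue.Queue[Event]" = queue.Queue(maxsize=CHANNEL_CAPACITY)
    source_put(ch, {"network-element": "ggsn"}, b"a,b\n")
    print(sink_take_batch(ch))
```

The point of the model is that whatever headers the source attaches are the only information the sink has when it decides where to write.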

I post the CSV files using curl:

find /path/files -type f -exec curl -X POST http://localhost:9043 -H "Content-Type: text/xml" -H "network-element: ggsn" --data-binary "@{}" -v \;
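For comparison, here is a rough Python equivalent of the find/curl loop using only the standard library. The helper names (build_request, iter_files, post_all) and the default URL are assumptions that mirror the command above; the sketch skips symlinks to approximate find -type f.

```python
import os
import urllib.request
from typing import Iterator

# Assumed endpoint, copied from the curl command in the question.
FLUME_URL = "http://localhost:9043"


def build_request(url: str, body: bytes, network_element: str) -> urllib.request.Request:
    """Build the same POST that curl -X POST -H ... --data-binary @file sends."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "text/xml", "network-element": network_element},
    )


def iter_files(root: str) -> Iterator[str]:
    """Yield regular, non-symlink files under root (roughly find root -type f)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                yield full


def post_all(root: str, url: str = FLUME_URL, network_element: str = "ggsn") -> None:
    """POST every file under root to the Flume HTTP source, one request per file."""
    for path in iter_files(root):
        with open(path, "rb") as fh:
            req = build_request(url, fh.read(), network_element)
        # urlopen raises urllib.error.HTTPError on a non-2xx response.
        with urllib.request.urlopen(req) as resp:
            print(path, resp.status)
```

Calling post_all("/path/files") would send each file to the HTTP source in turn. Note that urllib normalizes header names (network-element is sent as Network-element); HTTP header names are case-insensitive, so this should not matter to the server.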

curl generates these logs:

* About to connect() to localhost port 9043 (#0)
*   Trying ::1... Connection refused
*   Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 9043 (#0)
> POST / HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: localhost:9043
> Accept: */*
> Content-Type: text/xml
> network-element: ggsn
> Content-Length: 972660
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Transfer-Encoding: chunked
< Server: Jetty(6.1.26)
<
* Connection #0 to host localhost left intact
* Closing connection #0

The Flume logs show the following:

2015-03-16 19:41:14,887 DEBUG org.apache.flume.sink.solr.morphline.BlobHandler: requestHeaders: {Expect=100-continue, Host=localhost:9043, Content-Length=972660, network-element=ggsn, User-Agent=curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2, Content-Type=text/xml, Accept=*/*}
2015-03-16 19:41:14,891 DEBUG org.apache.flume.sink.solr.morphline.BlobHandler: blobEvent: [Event headers = {Content-Type=text/xml}, body.length = 972660 ]
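The two log lines show that the request headers contain network-element=ggsn, but the resulting event headers contain only the content type. The Python sketch below models that observed mapping. As an assumption based on the BlobHandler javadoc (which describes the event as carrying the request parameters together with the uploaded blob), it also copies query-string parameters into the event headers. The function name blob_event_headers is hypothetical and this is not Flume's source.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlsplit


def blob_event_headers(url: str, request_headers: Dict[str, str]) -> Dict[str, str]:
    """Model which values end up as event headers, per the log above.

    Assumption: only Content-Type and the request (query) parameters are
    copied; other HTTP headers such as network-element are dropped.
    """
    event_headers: Dict[str, str] = {}
    # HTTP header names are case-insensitive, so match Content-Type loosely.
    for name, value in request_headers.items():
        if name.lower() == "content-type":
            event_headers["Content-Type"] = value
    # Query-string parameters become event headers (assumed behavior).
    for name, value in parse_qsl(urlsplit(url).query):
        event_headers[name] = value
    return event_headers


if __name__ == "__main__":
    sent = {"Content-Type": "text/xml", "network-element": "ggsn"}
    print(blob_event_headers("http://localhost:9043/", sent))
    print(blob_event_headers("http://localhost:9043/?network-element=ggsn", sent))
```

Under that assumption, posting to http://localhost:9043/?network-element=ggsn would carry the value into the event headers; treat this as something to verify against the Flume version in use.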

I use this Flume configuration:

sa.sources  = httpsource1
sa.channels = memorychannel1
sa.sinks    = localsink1

sa.sources.httpsource1.type     = http
sa.sources.httpsource1.handler  = org.apache.flume.sink.solr.morphline.BlobHandler
sa.sources.httpsource1.port     = 9043
sa.sources.httpsource1.channels = memorychannel1

sa.channels.memorychannel1.type                = memory
sa.channels.memorychannel1.capacity            = 10000
sa.channels.memorychannel1.transactionCapacity = 1000

sa.sinks.localsink1.type              = file_roll
sa.sinks.localsink1.channel           = memorychannel1
sa.sinks.localsink1.sink.directory    = /path/%{network-element}
sa.sinks.localsink1.sink.rollInterval = 36000
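The sink directory uses a %{network-element} placeholder. The sketch below shows the substitution that placeholder implies: each %{name} is replaced with the event header of that name. It is a hypothetical helper (resolve_path), not Flume's own escape logic. As far as I know, the %{header} escapes are documented for the HDFS sink's hdfs.path, while the file_roll sink's sink.directory is used as a literal path, which would fit the missing-directory symptom in this question; that is worth checking in the Flume user guide.

```python
import re
from typing import Dict

# Matches %{name} placeholders such as %{network-element}.
_PLACEHOLDER = re.compile(r"%\{([^}]+)\}")


def resolve_path(template: str, event_headers: Dict[str, str]) -> str:
    """Replace every %{name} in template with event_headers[name].

    Raises KeyError for a missing header instead of guessing a fallback.
    """
    return _PLACEHOLDER.sub(lambda m: event_headers[m.group(1)], template)


if __name__ == "__main__":
    print(resolve_path("/path/%{network-element}", {"network-element": "ggsn"}))  # /path/ggsn
```

With the event headers from the Flume log (only the content type), the lookup for network-element fails, so even a sink that did substitute headers would have no ggsn value to use.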

For some reason, the files cannot be placed under the path /path/%{network-element}. It looks like the path does not exist, even though I have manually created the ggsn folder and set permissions on it.

