hadoop - Flume folder routing based on HTTP header
Using curl and Flume, I want to post CSV files to the local machine or HDFS at different locations based on the value of an HTTP header. For example, with the HTTP header (network-element: ggsn), the files should be stored on the local machine in a folder named ggsn.
I have the following Flume configuration:
- an HTTP source
- a memory channel
- an HDFS sink that routes event files to different locations depending on the HTTP header
I post the CSV files using curl:
find /path/files -type f -exec curl -X POST http://localhost:9043 -H "Content-Type: text/xml" -H "network-element: ggsn" --data-binary "@{}" -v \;
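For reference, the same upload loop can be sketched in Python with only the standard library. This is a rough equivalent of the find/curl command above, not part of the Flume setup; the function names `build_request` and `post_all` are hypothetical, and the URL and header values are copied from the curl command.

```python
import os
import urllib.request


def build_request(file_path, url="http://localhost:9043", element="ggsn"):
    """Build a POST request whose body is the raw content of file_path."""
    with open(file_path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "text/xml", "network-element": element},
    )


def post_all(root, **kwargs):
    """Walk root recursively and post every regular file, like find -exec."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            req = build_request(os.path.join(dirpath, name), **kwargs)
            with urllib.request.urlopen(req) as resp:
                print(name, resp.status)
```

Calling `post_all("/path/files")` would send one request per file. Note that `urllib` normalizes the capitalization of header names, which should not matter because HTTP header names are case-insensitive.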
These logs are generated:
* About to connect() to localhost port 9043 (#0)
*   Trying ::1... Connection refused
*   Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 9043 (#0)
> POST / HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: localhost:9043
> Accept: */*
> Content-Type: text/xml
> network-element: ggsn
> Content-Length: 972660
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Transfer-Encoding: chunked
< Server: Jetty(6.1.26)
<
* Connection #0 to host localhost left intact
* Closing connection #0
The Flume logs show the following:
2015-03-16 19:41:14,887 DEBUG org.apache.flume.sink.solr.morphline.BlobHandler: requestHeaders: {expect=100-continue, host=localhost:9043, content-length=972660, network-element=ggsn, user-agent=curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2, content-type=text/xml, accept=*/*}
2015-03-16 19:41:14,891 DEBUG org.apache.flume.sink.solr.morphline.BlobHandler: blobEvent: [Event headers = {content-type=text/xml}, body.length = 972660 ]
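Reading the two log entries side by side, the request headers include network-element=ggsn, while the event headers contain only content-type. A small sketch makes the difference explicit; the header values are copied from the logs above (user-agent omitted for brevity), and `missing_headers` is a hypothetical helper, not a Flume API.

```python
def missing_headers(request_headers, event_headers):
    """Return the request header names that are absent from the event headers."""
    return sorted(set(request_headers) - set(event_headers))


# Values taken from the "requestheaders" log entry.
request_headers = {
    "expect": "100-continue",
    "host": "localhost:9043",
    "content-length": "972660",
    "network-element": "ggsn",
    "content-type": "text/xml",
    "accept": "*/*",
}

# Values taken from the "blobevent" log entry.
event_headers = {"content-type": "text/xml"}

print(missing_headers(request_headers, event_headers))
```

The printed list includes network-element, which suggests that this header is not attached to the event that reaches the sink, at least as far as these two log lines show.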
I use this Flume configuration:
sa.sources = httpsource1
sa.channels = memorychannel1
sa.sinks = localsink1

sa.sources.httpsource1.type = http
sa.sources.httpsource1.handler = org.apache.flume.sink.solr.morphline.BlobHandler
sa.sources.httpsource1.port = 9043
sa.sources.httpsource1.channels = memorychannel1

sa.channels.memorychannel1.type = memory
sa.channels.memorychannel1.capacity = 10000
sa.channels.memorychannel1.transactionCapacity = 1000

sa.sinks.localsink1.type = file_roll
sa.sinks.localsink1.channel = memorychannel1
sa.sinks.localsink1.sink.directory = /path/%{network-element}
sa.sinks.localsink1.sink.rollInterval = 36000
For some reason, the files cannot be placed under the path /path/%{network-element}. It looks like the path does not exist, even though I have manually created the ggsn folder and set permissions on it.
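To make the expected behavior concrete, here is a minimal sketch of the header-based substitution that the `%{network-element}` placeholder is meant to perform. It is an illustration only, not Flume's actual implementation; `resolve_path` is a hypothetical function, and replacing an unknown header with an empty string is an assumption of this sketch.

```python
import re

# Matches placeholders of the form %{header-name}.
PLACEHOLDER = re.compile(r"%\{([^}]+)\}")


def resolve_path(template, headers):
    """Replace each %{name} with headers[name]; unknown names become ''."""
    return PLACEHOLDER.sub(lambda m: headers.get(m.group(1), ""), template)


# Expected case: the event carries the network-element header.
print(resolve_path("/path/%{network-element}", {"network-element": "ggsn"}))

# Case matching the event headers shown in the Flume log.
print(resolve_path("/path/%{network-element}", {"content-type": "text/xml"}))
```

With the header present, the template resolves to /path/ggsn. With only content-type in the event headers, there is no value for the placeholder to resolve to, so the ggsn folder is never the target regardless of its permissions.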