Connecting to Cassandra with Spark


First, I bought the new O'Reilly Spark book and tried its Cassandra setup instructions. I've also found other Stack Overflow posts and various guides on the web, but none of them work as-is. Below is as far as I get.

This test uses a handful of records of dummy test data. I'm running the recent Cassandra 2.0.7 VirtualBox VM provided by plasetcassandra.org, which is linked from the main Cassandra project page.

I downloaded the Spark 1.2.1 source, got the latest Cassandra connector code from GitHub, and built both against Scala 2.11. I have JDK 1.8.0_40 and Scala 2.11.6 set up on Mac OS 10.10.2.

I run the Spark shell with the Cassandra connector loaded:

bin/spark-shell --driver-class-path ../spark-cassandra-connector/spark-cassandra-connector/target/scala-2.11/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar

Then I do what should be a simple row count type test on a test table of four records:

import com.datastax.spark.connector._
sc.stop
val conf = new org.apache.spark.SparkConf(true).set("spark.cassandra.connection.host", "192.168.56.101")
val sc = new org.apache.spark.SparkContext(conf)
val table = sc.cassandraTable("mykeyspace", "playlists")
table.count

I then get the following error. The confusing part is that it's getting errors trying to find Cassandra at 127.0.0.1, even though it also recognizes the host I configured, 192.168.56.101.

15/03/16 15:56:54 INFO Cluster: New Cassandra host /192.168.56.101:9042 added
15/03/16 15:56:54 INFO CassandraConnector: Connected to Cassandra cluster: Cluster on a Stick
15/03/16 15:56:54 ERROR ServerSideTokenRangeSplitter: Failure while fetching splits from Cassandra
java.io.IOException: Failed to open thrift connection to Cassandra at 127.0.0.1:9160
<snip>
java.io.IOException: Failed to fetch splits of TokenRange(0,0,Set(CassandraNode(/127.0.0.1,/127.0.0.1)),None) from all endpoints: CassandraNode(/127.0.0.1,/127.0.0.1)

BTW, I can use the configuration file at conf/spark-defaults.conf to do the above without having to close/recreate the Spark context or pass in the --driver-class-path argument. I hit the same error either way, though; the steps above just seemed easier to communicate in a post.
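For reference, the spark-defaults.conf equivalent looks something like the lines below. The jar path is just my local build output, and as far as I can tell spark.driver.extraClassPath is the property behind --driver-class-path:

# conf/spark-defaults.conf
# Jar path is an example from my build; adjust to your own output.
spark.driver.extraClassPath ../spark-cassandra-connector/spark-cassandra-connector/target/scala-2.11/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar
spark.cassandra.connection.host 192.168.56.101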

Any ideas?

Check the rpc_address config in the cassandra.yaml file on the Cassandra node. Most likely the Spark connector is using that value from the system.local/system.peers tables, and it may be set to 127.0.0.1 in your cassandra.yaml.
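You can confirm what the connector sees directly from cqlsh on the node; these are the tables it reads the addresses from:

-- Address this node advertises for client connections
SELECT rpc_address FROM system.local;
-- Addresses advertised for the other nodes in the cluster
SELECT peer, rpc_address FROM system.peers;

If those come back as 127.0.0.1, set rpc_address in cassandra.yaml to an address the Spark driver can reach (192.168.56.101 in your setup) and restart the node.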

The Spark connector uses thrift to get token range splits from Cassandra. I'm betting this gets replaced eventually, since C* 2.1.4 has a new table called system.size_estimates (CASSANDRA-7688). It looks like it's getting the host metadata to find the nearest host, and then making the splits query using thrift on port 9160.
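For what it's worth, once you're on 2.1.4+ you can inspect that table yourself from cqlsh; Cassandra populates it periodically, so it can be empty on a fresh node (keyspace name below is the one from your question):

-- Per-table size estimates by token range, available on 2.1.4+
SELECT * FROM system.size_estimates WHERE keyspace_name = 'mykeyspace';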

