Connecting to Cassandra with Spark
First, I bought the new O'Reilly Spark book and tried its Cassandra setup instructions. I've also found other Stack Overflow posts and various guides on the web, but none of them work as-is. Below is as far as I get.
This test uses just a handful of records of dummy test data. I'm running a recent Cassandra 2.0.7 VirtualBox VM provided by plasetcassandra.org, linked from the main Cassandra project page.
I downloaded the Spark 1.2.1 source and got the latest Cassandra connector code from GitHub, and built both against Scala 2.11. I have JDK 1.8.0_40 and Scala 2.11.6 set up on Mac OS 10.10.2.
I run the Spark shell with the Cassandra connector loaded:
bin/spark-shell --driver-class-path ../spark-cassandra-connector/spark-cassandra-connector/target/scala-2.11/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar
Then I run what should be a simple row-count test against a test table of 4 records:
import com.datastax.spark.connector._
sc.stop
val conf = new org.apache.spark.SparkConf(true).set("spark.cassandra.connection.host", "192.168.56.101")
val sc = new org.apache.spark.SparkContext(conf)
val table = sc.cassandraTable("mykeyspace", "playlists")
table.count
I get the following error. What's confusing is that it's hitting errors trying to find Cassandra at 127.0.0.1, even though it recognizes the host configured as 192.168.56.101.
15/03/16 15:56:54 INFO Cluster: New Cassandra host /192.168.56.101:9042 added
15/03/16 15:56:54 INFO CassandraConnector: Connected to Cassandra cluster: Cluster on a Stick
15/03/16 15:56:54 ERROR ServerSideTokenRangeSplitter: Failure while fetching splits from Cassandra
java.io.IOException: Failed to open thrift connection to Cassandra at 127.0.0.1:9160
<snip>
java.io.IOException: failed to fetch splits of TokenRange(0,0,Set(CassandraNode(/127.0.0.1,/127.0.0.1)),None) from all endpoints: CassandraNode(/127.0.0.1,/127.0.0.1)
BTW, I can also use a configuration file at conf/spark-defaults.conf to do the above without having to close/recreate the Spark context or pass in the --driver-class-path argument. I hit the same error either way, but the steps above seemed easier to communicate in a post.
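For reference, a sketch of what that spark-defaults.conf approach could look like. The property names (spark.driver.extraClassPath, spark.cassandra.connection.host) are standard Spark/connector settings; the jar path and host address are the ones from this setup:

```
# conf/spark-defaults.conf
# Loads the connector jar and points at the Cassandra VM when the shell
# starts, so there is no need to stop/recreate the SparkContext by hand.
spark.driver.extraClassPath      ../spark-cassandra-connector/spark-cassandra-connector/target/scala-2.11/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar
spark.cassandra.connection.host  192.168.56.101
```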
Any ideas?
Check the rpc_address config in the cassandra.yaml file on your Cassandra node. It's likely the Spark connector is using that value from the system.local/system.peers tables, and it may be set to 127.0.0.1 in your cassandra.yaml.
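As a sketch, these are the cassandra.yaml settings worth checking on the VM; the addresses below assume the 192.168.56.101 host from the question:

```
# cassandra.yaml on the Cassandra node
# If rpc_address is 127.0.0.1, that is the address the node advertises
# in its system tables, and the connector will be handed back localhost
# when it tries to fetch token range splits.
rpc_address: 192.168.56.101     # address advertised for thrift clients
listen_address: 192.168.56.101  # inter-node / native transport address
rpc_port: 9160                  # thrift port the splitter connects to
```

You can cross-check what a node advertises with cqlsh, e.g. `SELECT peer, rpc_address FROM system.peers;` on any other node in the cluster (on a single-node VM, system.peers will be empty, so inspecting cassandra.yaml directly is simpler).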
The Spark connector uses thrift to get token range splits from Cassandra. I'm betting this gets replaced, as C* 2.1.4 has a new table called system.size_estimates (CASSANDRA-7688). It looks like it's getting the host metadata to find the nearest host, then making the query for splits using thrift on port 9160.
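On a 2.1.4+ node, that new table can be inspected directly in cqlsh. A sketch, using the keyspace name from the question (the column names here are the ones defined by CASSANDRA-7688):

```
-- Per-token-range partition estimates that a connector can use
-- instead of fetching splits over the thrift port.
SELECT keyspace_name, table_name, range_start, range_end, partitions_count
FROM system.size_estimates
WHERE keyspace_name = 'mykeyspace';
```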