release 2.1.1-8, add support for static columns

strapdata · Apr 17, 2016 · 2f51f4c
1 parent 3817897
commit 2f51f4c
Showing 28 changed files with 1,380 additions and 1,183 deletions.
diff --git a/CHANGES.txt b/CHANGES.txt
@@ -68,6 +68,13 @@
   * Disable cassandra timestamp update when Elasticsearch _timestamp is disabled (improve insert performance, regression from 2.1.1-4)
   * Fix a bug on flush and refresh operations causing a performance issue with kibana.
 
-2.1.1-7 - 2016-004-02
+2.1.1-7 - 2016-04-02
   * Fix a ClassCastException when indexing a document with single partition key other than a string.
   * Add support for mapping update of nested object (update cassandra UDT).
+
+2.1.1-8 - 2016-04-18
+  * Add mapping attributes cql_partition_key:boolean and cql_primary_key_order:integer to build a cassandra table with a composite primary key.
+  * Add support for static columns with the mapping attribute cql_static_column:boolean. This gives the ability to index static columns in Elasticsearch.
+  * Fix various mapping issues.
+
+
diff --git a/README.md b/README.md
@@ -12,12 +12,13 @@ alt="Elassandra demo" width="240" height="180" border="10" /></a>
 
 ## News
 
-* **2016-03-18 Release 2.1.1-6 Add support for SQL plugin (from [NLPchina](https://github.com/NLPchina/elasticsearch-sql)) and JDBC driver (from [Anchormen](https://github.com/Anchormen/sql4es))**.
-* **2016-02-16 Release 2.1.1-2 Remove build dependency to elasticsearch parent project**.
-* **2016-02-01 Release 2.1.1-1 Add support for parent-child relationship**.
-* **2016-01-28 Release 2.1.1 based on Elasticsearch 2.1.1 and cassandra 2.2.4**
-* **2015-12-20 Release 0.5 Re-index you data from cassandra 2.2.4 with zero downtime**.
-* **2015-11-15 Release 0.4 New elassandra tarball ready-to-run**.
+* **2016-04-17 Release 2.1.1-8 New feature, index cassandra static columns**
+* **2016-03-18 Release 2.1.1-6 Add support for SQL plugin (from [NLPchina](https://github.com/NLPchina/elasticsearch-sql)) and JDBC driver (from [Anchormen](https://github.com/Anchormen/sql4es)).**
+* **2016-02-16 Release 2.1.1-2 Remove build dependency to elasticsearch parent project.**
+* **2016-02-01 Release 2.1.1-1 Add support for parent-child relationship.**
+* **2016-01-28 Release 2.1.1 based on Elasticsearch 2.1.1 and cassandra 2.2.4.**
+* **2015-12-20 Release 0.5 Re-index your data from cassandra 2.2.4 with zero downtime.**
+* **2015-11-15 Release 0.4 New elassandra tarball ready-to-run.**
 
 ## Benefits of Elassandra
 
@@ -586,7 +587,6 @@ localhost/127.0.0.1
 
 # Elasticsearch document mapping
 
-
 Here is the mapping from Elasticsearch field basic types to CQL3 types :
 
 Elasticsearch Types | CQL Types | Comment
@@ -610,7 +610,10 @@ Parameter | Values | Description
 cql_collection | **list**, set or singleton | Control how the field of type X is mapped to a column list<X>, set<X> or X. Default is **list** because Elasticsearch fields are multivalued.
 cql_struct | **udt** or map | Control how an object or nested field is mapped to a User Defined Type or to a cassandra map<text,?>. Default is **udt**.
 cql_partial_update | **true** or false | Elasticsearch indexes full documents. For partial CQL updates, this controls which fields should be read to index a full document from a row. Default is **true**, meaning that updates involve reading all missing fields.
+cql_primary_key_order | **integer** | Position of the field in the primary key of the underlying cassandra table. Default is **-1**, meaning that the field is not part of the cassandra primary key.
+cql_partition_key | true or **false** | When cql_primary_key_order >= 0, specifies whether the field is part of the cassandra partition key. Default is **false**, meaning that the field is not part of the cassandra partition key.
 
+For more information about cassandra collection types and compound primary keys, see https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html and https://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_compound_keys_c.html.
 
 ## Elasticsearch mapping from an existing cassandra table.
 
@@ -638,6 +641,104 @@ When mapping an existing cassandra table to an Elasticsearch index.type, primary
 * Single primary key is converted to a string.
 * Compound primary key is converted to a JSON array stored as string in the  `_id` field.
 
+## Indexing cassandra static columns
+
+In a table that uses clustering columns, a [static column](http://docs.datastax.com/en/cql/3.1/cql/cql_reference/refStaticCol.html) is shared by all the rows with the same partition key. A slight modification of the cassandra code adds support for secondary indexes on static columns, which makes it possible to search on static column values (CQL search on static columns remains unsupported). Each time a static column is modified, a document containing the partition key and only the static columns is indexed in Elasticsearch. Static columns are not indexed with every [wide row](http://www.planetcassandra.org/blog/wide-rows-in-cassandra-cql/) because any update to a static column would require reindexing all the wide rows of the partition. However, you can request fields backed by a static column on any get/search request.
+
+The following example demonstrates how to use static columns to store meta information about a timeseries.
+
+``` 
+curl -XPUT "http://localhost:9200/test" -d '{
+     "mappings" : {
+          "timeseries" : {
+            "properties" : {
+              "t" : {
+                "type" : "date",
+                "format" : "strict_date_optional_time||epoch_millis",
+                "cql_primary_key_order" : 1,
+                "cql_collection" : "singleton"
+              },
+              "meta" : {
+                "type" : "nested",
+                "cql_struct" : "map",
+                "cql_static_column" : true,
+                "cql_collection" : "singleton",
+                "include_in_parent" : true,
+                "properties" : {
+                  "region" : {
+                    "type" : "string"
+                  }
+                }
+              },
+              "v" : {
+                "type" : "double",
+                "cql_collection" : "singleton"
+              },
+              "m" : {
+                "type" : "string",
+                "cql_partition_key" : true,
+                "cql_primary_key_order" : 0,
+                "cql_collection" : "singleton"
+              }
+            }
+          }
+     }
+}'
+
+cqlsh <<EOF
+INSERT INTO test.timeseries (m, t, v) VALUES ('server1-cpu', '2016-04-10 13:30', 10);
+INSERT INTO test.timeseries (m, t, v) VALUES ('server1-cpu', '2016-04-10 13:31', 20);
+INSERT INTO test.timeseries (m, t, v) VALUES ('server1-cpu', '2016-04-10 13:32', 15);
+INSERT INTO test.timeseries (m, meta) VALUES ('server1-cpu', { 'region':'west' } );
+SELECT * FROM test.timeseries;
+EOF
+
+ m           | t                           | meta               | v
+-------------+-----------------------------+--------------------+----
+ server1-cpu | 2016-04-10 11:30:00.000000z | {'region': 'west'} | 10
+ server1-cpu | 2016-04-10 11:31:00.000000z | {'region': 'west'} | 20
+ server1-cpu | 2016-04-10 11:32:00.000000z | {'region': 'west'} | 15
+``` 
+
+Search only for wide rows where v=10 and fetch the meta.region field.
+``` 
+curl -XGET "http://$NODE:9200/test/timeseries/_search?pretty=true&q=v:10&fields=m,t,v,meta.region"
+...
+"hits" : [ {
+      "_index" : "test",
+      "_type" : "timeseries",
+      "_id" : "[\"server1-cpu\",1460287800000]",
+      "_score" : 1.9162908,
+      "_routing" : "server1-cpu",
+      "fields" : {
+        "meta.region" : [ "west" ],
+        "t" : [ "2016-04-10T11:30:00.000Z" ],
+        "m" : [ "server1-cpu" ],
+        "v" : [ 10.0 ]
+      }
+    } ]
+``` 
+
+Search for rows where meta.region=west. This returns only the partition key and the static columns.
+``` 
+curl -XGET "http://$NODE:9200/test/timeseries/_search?pretty=true&q=meta.region:west&fields=m,t,v,meta.region"
+....
+"hits" : {
+    "total" : 1,
+    "max_score" : 1.5108256,
+    "hits" : [ {
+      "_index" : "test",
+      "_type" : "timeseries",
+      "_id" : "server1-cpu",
+      "_score" : 1.5108256,
+      "_routing" : "server1-cpu",
+      "fields" : {
+        "m" : [ "server1-cpu" ],
+        "meta.region" : [ "west" ]
+      }
+    } ]
+``` 
+
 ## Mapping-change-with-zero-downtime
 
 You can map several Elasticsearch indices with different mappings to the same cassandra keyspace. By default, an index is mapped to a keyspace with the same name, but you can specify a target keyspace.
@@ -720,19 +821,57 @@ _id  | message              | user
 
 Since version 0.3, nested documents can be mapped to a [User Defined Type](https://docs.datastax.com/en/cql/3.1/cql/cql_using/cqlUseUDT.html) or to a CQL [map](http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_map_t.html#toc_pane). In the following example, the cassandra map is automatically mapped with `cql_partial_update:true`, so a partial CQL update causes a read of the whole map to re-index the document in the Elasticsearch index.
 
+Create an index (a keyspace in your elassandra-aware datacenter).
+```
+curl -XPUT "http://localhost:9200/twitter"
+```
+Create a cassandra table with a map column.
 ```
-cqlsh>CREATE KEYSPACE IF NOT EXISTS twitter WITH replication={ 'class':'NetworkTopologyStrategy', 'DC1':'1' };
 cqlsh>CREATE TABLE twitter.user ( 
 name text,
 attrs map<text,text>,
 primary key (name)
 );
 cqlsh>INSERT INTO twitter.user (name,attrs) VALUES ('bob',{'email':'[email protected]','firstname':'bob'});
 ```
-
+Create the type mapping from the cassandra table and display the resulting mapping.
 ```
-curl -XPUT "http://localhost:9200/twitter/" -d '{ "settings" : { "number_of_shards" : 1, "number_of_replicas" : 0 } }'
 curl -XPUT "http://localhost:9200/twitter/_mapping/user" -d '{ "user" : { "columns_regexp" : ".*" }}'
+{"acknowledged":true}
+
+curl -XGET 'http://localhost:9200/twitter/_mapping/user?pretty=true'
+{
+  "twitter" : {
+    "mappings" : {
+      "user" : {
+        "properties" : {
+          "attrs" : {
+            "type" : "nested",
+            "cql_struct" : "map",
+            "cql_collection" : "singleton",
+            "properties" : {
+              "email" : {
+                "type" : "string"
+              },
+              "firstname" : {
+                "type" : "string"
+              }
+            }
+          },
+          "name" : {
+            "type" : "string",
+            "cql_collection" : "singleton",
+            "cql_partition_key" : true,
+            "cql_primary_key_order" : 0
+          }
+        }
+      }
+    }
+  }
+}
+```
+Get the *bob* entry.
+```
 curl -XGET "http://localhost:9200/twitter/user/bob?pretty=true"
 {
   "_index" : "twitter",
@@ -745,7 +884,6 @@ curl -XGET "http://localhost:9200/twitter/user/bob?pretty=true"
 ```
 
 Now insert a new entry in the attrs map column and search for a nested field `attrs.city:paris`.
-
 ```
 cqlsh>UPDATE twitter.user SET attrs = attrs + { 'city':'paris' } WHERE name = 'bob';
 ```
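
Note on the compound `_id` format used in the README examples above: the first search response shows a wide row whose `_id` is a JSON array holding the partition key and the clustering timestamp, while the static-column document uses the bare partition key as `_id`. The sketch below only reproduces that string format in plain shell. `compound_id` is a hypothetical helper, not part of Elassandra, and it assumes a text partition key with no characters that need JSON escaping plus one clustering column expressed in epoch milliseconds.

```sh
#!/bin/sh
# Hypothetical helper: build the _id string of a wide row from a text
# partition key and a clustering timestamp in epoch milliseconds.
# Assumption: the key contains no characters that require JSON escaping.
compound_id() {
  # $1: partition key (text), $2: clustering column (epoch millis)
  printf '["%s",%s]\n' "$1" "$2"
}

# 2016-04-10T11:30:00Z expressed in epoch milliseconds, as in the README sample.
compound_id server1-cpu 1460287800000
```

To fetch such a document by id over HTTP, the brackets and quotes in the result would still need URL-encoding before being placed in the request path.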

diff --git a/bin/plugin b/bin/plugin
@@ -1,5 +1,5 @@
 #!/bin/sh
-ES_HOME=$CASSANDRA_HOME
+
 
 CDPATH=""
 SCRIPT="$0"
@@ -23,29 +23,91 @@ ES_HOME=`dirname "$SCRIPT"`/..
 ES_HOME=`cd "$ES_HOME"; pwd`
 
 
+# Sets the default values for elasticsearch variables used in this script
+if [ -z "$CONF_DIR" ]; then
+  CONF_DIR="$ES_HOME/config"
+fi
+
+# The default env file is defined at building/packaging time.
+# For a tar.gz package, the value is "".
+ES_ENV_FILE=""
+
+# If an include is specified with the ES_INCLUDE environment variable, use it
+if [ -n "$ES_INCLUDE" ]; then
+    ES_ENV_FILE="$ES_INCLUDE"
+fi
+
+# Source the environment file
+if [ -n "$ES_ENV_FILE" ]; then
+
+  # If the ES_ENV_FILE is not found, try to resolve the path
+  # against the ES_HOME directory
+  if [ ! -f "$ES_ENV_FILE" ]; then
+      ES_ENV_FILE="$ES_HOME/$ES_ENV_FILE"
+  fi
+
+  . "$ES_ENV_FILE"
+  if [ $? -ne 0 ]; then
+      echo "Unable to source environment file: $ES_ENV_FILE" >&2
+      exit 1
+  fi
+fi
+
+# don't let JAVA_TOOL_OPTIONS slip in (e.g. crazy agents in ubuntu)
+# works around https://bugs.launchpad.net/ubuntu/+source/jayatana/+bug/1441487
+if [ "x$JAVA_TOOL_OPTIONS" != "x" ]; then
+    echo "Warning: Ignoring JAVA_TOOL_OPTIONS=$JAVA_TOOL_OPTIONS"
+    unset JAVA_TOOL_OPTIONS
+fi
+
+# CONF_FILE setting was removed
+if [ ! -z "$CONF_FILE" ]; then
+    echo "CONF_FILE setting is no longer supported. elasticsearch.yml must be placed in the config directory and cannot be renamed."
+    exit 1
+fi
+
 if [ -x "$JAVA_HOME/bin/java" ]; then
     JAVA=$JAVA_HOME/bin/java
 else
     JAVA=`which java`
 fi
 
+if [ ! -x "$JAVA" ]; then
+    echo "Could not find any executable java binary. Please install java in your PATH or set JAVA_HOME"
+    exit 1
+fi
+
 # real getopt cannot be used because we need to hand options over to the PluginManager
 while [ $# -gt 0 ]; do
   case $1 in
     -D*=*)
-      properties="$properties $1"
+      properties="$properties \"$1\""
       ;;
     -D*)
       var=$1
       shift
-      properties="$properties $var=$1"
+      properties="$properties \"$var\"=\"$1\""
       ;;
     *)
-      args="$args $1"
+      args="$args \"$1\""
   esac
   shift
 done
 
+# check if properties already has a config file or config dir
+if [ -e "$CONF_DIR" ]; then
+  case "$properties" in
+    *-Des.default.path.conf=*|*-Des.path.conf=*)
+    ;;
+    *)
+      properties="$properties -Des.default.path.conf=\"$CONF_DIR\""
+    ;;
+  esac
+fi
 
-exec "$JAVA" $JAVA_OPTS $ES_JAVA_OPTS -Xmx64m -Xms16m -Delasticsearch -Des.path.home="$ES_HOME" $properties -cp "$ES_HOME/lib/*" org.elasticsearch.plugins.PluginManagerCliParser $args
+# full hostname passed through cut for portability on systems that do not support hostname -s
+# export on separate line for shells that do not support combining definition and export
+HOSTNAME=`hostname | cut -d. -f1`
+export HOSTNAME
 
+eval "$JAVA" -client -Delasticsearch -Des.path.home="\"$ES_HOME\"" $properties -cp "\"$ES_HOME/lib/*\"" org.elasticsearch.plugins.PluginManagerCliParser $args
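
Note on the quoting change in `bin/plugin`: the argument loop now wraps each value in escaped double quotes and the final command runs through `eval`, so arguments containing spaces survive as single words. The following is a minimal sketch of that pattern in isolation. `count_args` is a hypothetical stand-in for the java command and the plugin path is made up for illustration.

```sh
#!/bin/sh
# Hypothetical stand-in for the real command: report how many arguments arrive.
count_args() { echo "$#"; }

plain=""
quoted=""
for a in install "file:///tmp/my plugin.zip"; do
  plain="$plain $a"          # old style: value appended as-is
  quoted="$quoted \"$a\""    # new style: value wrapped in escaped quotes
done

# Unquoted expansion word-splits the path that contains a space.
plain_count=$(count_args $plain)
# eval re-parses the embedded quotes, so the path stays one argument.
quoted_count=$(eval count_args $quoted)

echo "plain=$plain_count quoted=$quoted_count"
```

One limitation of this pattern: because `eval` re-parses the string, a value that itself contains double quotes, `$`, or backticks would be interpreted by the shell rather than passed through literally.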
diff --git a/dependency-reduced-pom.xml b/dependency-reduced-pom.xml
@@ -4,7 +4,7 @@
   <groupId>org.elassandra</groupId>
   <artifactId>elassandra</artifactId>
   <name>Elassandra</name>
-  <version>2.1.1-7</version>
+  <version>2.1.1-8</version>
   <description>Elassandra - ElasticSearch for Cassandra</description>
   <licenses>
     <license>
@@ -236,6 +236,8 @@
                     <exclude>org/apache/cassandra/service/CassandraDaemon*.class</exclude>
                     <exclude>org/apache/cassandra/service/StorageService$*.class</exclude>
                     <exclude>org/apache/cassandra/service/StorageService.class</exclude>
+                    <exclude>org/apache/cassandra/cql3/statements/CreateIndexStatement*.class</exclude>
+                    <exclude>org/apache/cassandra/db/index/SecondaryIndexManager*.class</exclude>
                   </excludes>
                 </filter>
               </filters>

diff --git a/pom.xml b/pom.xml
@@ -13,7 +13,7 @@
 
     <groupId>org.elassandra</groupId>
     <artifactId>elassandra</artifactId>
-    <version>2.1.1-7</version>
+    <version>2.1.1-8</version>
     <name>Elassandra</name>
     <description>Elassandra - ElasticSearch for Cassandra</description>
 
@@ -699,6 +699,8 @@
                                         <exclude>org/apache/cassandra/service/CassandraDaemon*.class</exclude>
                                         <exclude>org/apache/cassandra/service/StorageService$*.class</exclude>
                                         <exclude>org/apache/cassandra/service/StorageService.class</exclude>
+                                        <exclude>org/apache/cassandra/cql3/statements/CreateIndexStatement*.class</exclude>
+                                        <exclude>org/apache/cassandra/db/index/SecondaryIndexManager*.class</exclude>
                                    </excludes>
                                 </filter>
                            </filters>