release 2.1.1-8, add support for static columns
vroyer committed Apr 17, 2016
1 parent 3817897 commit 2f51f4c
Showing 28 changed files with 1,380 additions and 1,183 deletions.
9 changes: 8 additions & 1 deletion CHANGES.txt
@@ -68,6 +68,13 @@
* Disable cassandra timestamp update when Elasticsearch _timestamp is disabled (improve insert performance, regression from 2.1.1-4)
* Fix a bug on flush and refresh operations causing a performance issue with kibana.

2.1.1-7 - 2016-004-02
2.1.1-7 - 2016-04-02
* Fix a ClassCastException when indexing a document with single partition key other than a string.
* Add support for mapping update of nested object (update cassandra UDT).

2.1.1-8 - 2016-04-18
* Add mapping attributes cql_partition_key:boolean and cql_primary_key_order:integer to build cassandra table with composite primary key.
* Add support for static columns with the mapping attribute cql_static_column:boolean. This gives the ability to index static columns in Elasticsearch.
* Fix various mapping issues.


160 changes: 149 additions & 11 deletions README.md
@@ -12,12 +12,13 @@ alt="Elassandra demo" width="240" height="180" border="10" /></a>

## News

* **2016-03-18 Release 2.1.1-6 Add support for SQL plugin (from [NLPchina](https://github.com/NLPchina/elasticsearch-sql)) and JDBC driver (from [Anchormen](https://github.com/Anchormen/sql4es))**.
* **2016-02-16 Release 2.1.1-2 Remove build dependency to elasticsearch parent project**.
* **2016-02-01 Release 2.1.1-1 Add support for parent-child relationship**.
* **2016-01-28 Release 2.1.1 based on Elasticsearch 2.1.1 and cassandra 2.2.4**
* **2015-12-20 Release 0.5 Re-index your data from cassandra 2.2.4 with zero downtime**.
* **2015-11-15 Release 0.4 New elassandra tarball ready-to-run**.
* **2016-04-17 Release 2.1.1-8 New feature, index cassandra static columns**
* **2016-03-18 Release 2.1.1-6 Add support for SQL plugin (from [NLPchina](https://github.com/NLPchina/elasticsearch-sql)) and JDBC driver (from [Anchormen](https://github.com/Anchormen/sql4es)).**
* **2016-02-16 Release 2.1.1-2 Remove build dependency to elasticsearch parent project.**
* **2016-02-01 Release 2.1.1-1 Add support for parent-child relationship.**
* **2016-01-28 Release 2.1.1 based on Elasticsearch 2.1.1 and cassandra 2.2.4.**
* **2015-12-20 Release 0.5 Re-index your data from cassandra 2.2.4 with zero downtime.**
* **2015-11-15 Release 0.4 New elassandra tarball ready-to-run.**

## Benefits of Elassandra

@@ -586,7 +587,6 @@ localhost/127.0.0.1

# Elasticsearch document mapping


Here is the mapping from Elasticsearch basic field types to CQL3 types:

Elasticsearch Types | CQL Types | Comment
@@ -610,7 +610,10 @@ Parameter | Values | Description
cql_collection | **list**, set or singleton | Control how the field of type X is mapped to a column list<X>, set<X> or X. Default is **list** because Elasticsearch fields are multivalued.
cql_struct | **udt** or map | Control how an object or nested field is mapped to a User Defined Type or to a cassandra map<text,?>. Default is **udt**.
cql_partial_update | **true** or false | Elasticsearch indexes full documents. For partial CQL updates, this controls which fields should be read from the row to index a full document. Default is **true**, meaning that updates involve reading all missing fields.
cql_primary_key_order | **integer** | Field position in the primary key of the underlying cassandra table. Default is **-1**, meaning that the field is not part of the cassandra primary key.
cql_partition_key | true or **false** | When cql_primary_key_order >= 0, specifies whether the field is part of the cassandra partition key. Default is **false**, meaning that the field is not part of the cassandra partition key.

For more information about cassandra collection types and compound primary keys, see https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html and https://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_compound_keys_c.html.
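
As an illustration, the following Python sketch shows how `cql_primary_key_order` and `cql_partition_key` could translate into a CQL `PRIMARY KEY` clause. It is a simplified model of the documented attribute semantics, not Elassandra's actual implementation; the helper name `primary_key_clause` is hypothetical, and raising an error when no partition key is flagged is an assumption of the sketch.

```python
def primary_key_clause(properties):
    """Build a CQL PRIMARY KEY clause from Elasticsearch field mappings.

    Hypothetical helper: it only models the documented meaning of
    cql_primary_key_order and cql_partition_key.
    """
    # Keep only fields that belong to the primary key (order >= 0),
    # sorted by their declared position.
    keyed = sorted(
        ((name, attrs) for name, attrs in properties.items()
         if attrs.get("cql_primary_key_order", -1) >= 0),
        key=lambda item: item[1]["cql_primary_key_order"],
    )
    partition = [n for n, a in keyed if a.get("cql_partition_key", False)]
    clustering = [n for n, a in keyed if not a.get("cql_partition_key", False)]
    if not partition:
        # Assumption of this sketch: at least one key field is flagged.
        raise ValueError("no field is flagged with cql_partition_key")
    return "PRIMARY KEY ((%s)%s)" % (
        ", ".join(partition),
        "".join(", " + c for c in clustering),
    )


mapping = {
    "m": {"type": "string", "cql_partition_key": True, "cql_primary_key_order": 0},
    "t": {"type": "date", "cql_primary_key_order": 1},
    "v": {"type": "double"},
}
print(primary_key_clause(mapping))  # PRIMARY KEY ((m), t)
```

Fields left at the default order of -1 (here `v`) stay regular columns and do not appear in the clause.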

## Elasticsearch mapping from an existing cassandra table.

@@ -638,6 +641,104 @@ When mapping an existing cassandra table to an Elasticsearch index.type, primary
* Single primary key is converted to a string.
* Compound primary key is converted to a JSON array stored as string in the `_id` field.
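
A minimal Python sketch of this `_id` convention follows. The helper name `build_id` is hypothetical (it is not part of Elassandra), and both the use of `str()` for a single key and the whitespace-free JSON serialization are assumptions about the exact formatting.

```python
import json


def build_id(primary_key_values):
    """Return the Elasticsearch _id for a row's primary key values.

    Hypothetical helper illustrating the convention described above.
    """
    if len(primary_key_values) == 1:
        # Single primary key: the value is converted to a string.
        return str(primary_key_values[0])
    # Compound primary key: a JSON array stored as a string.
    # Compact separators are an assumption about the exact formatting.
    return json.dumps(list(primary_key_values), separators=(",", ":"))


print(build_id(["bob"]))
print(build_id(["server1-cpu", 1460287800000]))
```

With these illustrative values, the first call yields a plain string and the second a string holding a two-element JSON array.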

## Indexing cassandra static columns

In a table that uses clustering columns, a [static column](http://docs.datastax.com/en/cql/3.1/cql/cql_reference/refStaticCol.html) is shared by all the rows with the same partition key. A slight modification of the cassandra code adds support for secondary indexes on static columns, which makes it possible to search on static column values (CQL search on static columns remains unsupported). Each time a static column is modified, a document containing the partition key and only the static columns is indexed in Elasticsearch. Static columns are not indexed with every [wide row](http://www.planetcassandra.org/blog/wide-rows-in-cassandra-cql/) because any update to a static column would require re-indexing all wide rows. However, you can request fields backed by a static column on any get/search request.

The following example demonstrates how to use static columns to store meta information of timeseries.

```
curl -XPUT "http://localhost:9200/test" -d '{
  "mappings" : {
    "timeseries" : {
      "properties" : {
        "t" : {
          "type" : "date",
          "format" : "strict_date_optional_time||epoch_millis",
          "cql_primary_key_order" : 1,
          "cql_collection" : "singleton"
        },
        "meta" : {
          "type" : "nested",
          "cql_struct" : "map",
          "cql_static_column" : true,
          "cql_collection" : "singleton",
          "include_in_parent" : true,
          "properties" : {
            "region" : {
              "type" : "string"
            }
          }
        },
        "v" : {
          "type" : "double",
          "cql_collection" : "singleton"
        },
        "m" : {
          "type" : "string",
          "cql_partition_key" : true,
          "cql_primary_key_order" : 0,
          "cql_collection" : "singleton"
        }
      }
    }
  }
}'
cqlsh <<EOF
INSERT INTO test.timeseries (m, t, v) VALUES ('server1-cpu', '2016-04-10 13:30', 10);
INSERT INTO test.timeseries (m, t, v) VALUES ('server1-cpu', '2016-04-10 13:31', 20);
INSERT INTO test.timeseries (m, t, v) VALUES ('server1-cpu', '2016-04-10 13:32', 15);
INSERT INTO test.timeseries (m, meta) VALUES ('server1-cpu', { 'region':'west' } );
SELECT * FROM test.timeseries;
EOF
m | t | meta | v
-------------+-----------------------------+--------------------+----
server1-cpu | 2016-04-10 11:30:00.000000z | {'region': 'west'} | 10
server1-cpu | 2016-04-10 11:31:00.000000z | {'region': 'west'} | 20
server1-cpu | 2016-04-10 11:32:00.000000z | {'region': 'west'} | 15
```

Search for wide rows only where v=10 and fetch the meta.region field.
```
curl -XGET "http://$NODE:9200/test/timeseries/_search?pretty=true&q=v:10&fields=m,t,v,meta.region"
...
"hits" : [ {
"_index" : "test",
"_type" : "timeseries",
"_id" : "[\"server1-cpu\",1460287800000]",
"_score" : 1.9162908,
"_routing" : "server1-cpu",
"fields" : {
"meta.region" : [ "west" ],
"t" : [ "2016-04-10T11:30:00.000Z" ],
"m" : [ "server1-cpu" ],
"v" : [ 10.0 ]
}
} ]
```

A search for rows where meta.region=west returns only the partition key and the static columns.
```
curl -XGET "http://$NODE:9200/test/timeseries/_search?pretty=true&q=meta.region:west&fields=m,t,v,meta.region"
....
"hits" : {
"total" : 1,
"max_score" : 1.5108256,
"hits" : [ {
"_index" : "test",
"_type" : "timeseries",
"_id" : "server1-cpu",
"_score" : 1.5108256,
"_routing" : "server1-cpu",
"fields" : {
"m" : [ "server1-cpu" ],
"meta.region" : [ "west" ]
}
} ]
```
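
The two result sets above follow from which documents get indexed. The Python sketch below is a simplified in-memory model, not Elassandra code: `documents_for_partition` is a hypothetical name, and the `_id` formatting is an assumption. It produces one document per wide row, plus one document per partition that holds only the partition key and the static columns.

```python
import json


def documents_for_partition(partition_key, static_columns, rows):
    """Model the documents indexed for one Cassandra partition.

    Hypothetical helper: `rows` is a list of (clustering_key, regular_columns).
    """
    docs = []
    # One document per wide row, without the static columns.
    for clustering_key, columns in rows:
        doc_id = json.dumps([partition_key, clustering_key], separators=(",", ":"))
        docs.append({"_id": doc_id, "m": partition_key, "t": clustering_key, **columns})
    # One extra document carrying only the partition key and static columns.
    docs.append({"_id": partition_key, "m": partition_key, **static_columns})
    return docs


docs = documents_for_partition(
    "server1-cpu",
    {"meta.region": "west"},
    [(1460287800000, {"v": 10.0}),
     (1460287860000, {"v": 20.0}),
     (1460287920000, {"v": 15.0})],
)

# Searching on a static column matches only the per-partition document.
print([d["_id"] for d in docs if d.get("meta.region") == "west"])
# Searching on a regular column matches only wide-row documents.
print([d["_id"] for d in docs if d.get("v") == 10.0])
```

The model covers indexing only. Fetching `meta.region` on a wide-row hit, as in the first search, is not modeled here.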

## Mapping-change-with-zero-downtime

You can map several Elasticsearch indices with different mappings to the same cassandra keyspace. By default, an index is mapped to a keyspace with the same name, but you can specify a target keyspace.
@@ -720,19 +821,57 @@ _id | message | user

Since version 0.3, a nested document can be mapped to a [User Defined Type](https://docs.datastax.com/en/cql/3.1/cql/cql_using/cqlUseUDT.html) or to a CQL [map](http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_map_t.html#toc_pane). In the following example, the cassandra map is automatically mapped with `cql_partial_update:true`, so a partial CQL update causes a read of the whole map to re-index the document in the Elasticsearch index.

Create an index (a keyspace in your elassandra-aware datacenter)
```
curl -XPUT "http://localhost:9200/twitter"
```
Create a cassandra table with a map column.
```
cqlsh>CREATE KEYSPACE IF NOT EXISTS twitter WITH replication={ 'class':'NetworkTopologyStrategy', 'DC1':'1' };
cqlsh>CREATE TABLE twitter.user (
name text,
attrs map<text,text>,
primary key (name)
);
cqlsh>INSERT INTO twitter.user (name,attrs) VALUES ('bob',{'email':'[email protected]','firstname':'bob'});
```

Create the type mapping from the cassandra table and search for the *bob* entry.
```
curl -XPUT "http://localhost:9200/twitter/" -d '{ "settings" : { "number_of_shards" : 1, "number_of_replicas" : 0 } }'
curl -XPUT "http://localhost:9200/twitter/_mapping/user" -d '{ "user" : { "columns_regexp" : ".*" }}'
{"acknowledged":true}
curl -XGET 'http://localhost:9200/twitter/_mapping/user?pretty=true'
{
"twitter" : {
"mappings" : {
"user" : {
"properties" : {
"attrs" : {
"type" : "nested",
"cql_struct" : "map",
"cql_collection" : "singleton",
"properties" : {
"email" : {
"type" : "string"
},
"firstname" : {
"type" : "string"
}
}
},
"name" : {
"type" : "string",
"cql_collection" : "singleton",
"cql_partition_key" : true,
"cql_primary_key_order" : 0
}
}
}
}
}
}
```
Get the *bob* entry.
```
curl -XGET "http://localhost:9200/twitter/user/bob?pretty=true"
{
"_index" : "twitter",
@@ -745,7 +884,6 @@ curl -XGET "http://localhost:9200/twitter/user/bob?pretty=true"
```

Now insert a new entry in the attrs map column and search for a nested field `attrs.city:paris`.

```
cqlsh>UPDATE twitter.user SET attrs = attrs + { 'city':'paris' } WHERE name = 'bob';
```
72 changes: 67 additions & 5 deletions bin/plugin
@@ -1,5 +1,5 @@
#!/bin/sh
ES_HOME=$CASSANDRA_HOME


CDPATH=""
SCRIPT="$0"
@@ -23,29 +23,91 @@ ES_HOME=`dirname "$SCRIPT"`/..
ES_HOME=`cd "$ES_HOME"; pwd`


# Sets the default values for elasticsearch variables used in this script
if [ -z "$CONF_DIR" ]; then
CONF_DIR="$ES_HOME/config"
fi

# The default env file is defined at building/packaging time.
# For a tar.gz package, the value is "".
ES_ENV_FILE=""

# If an include is specified with the ES_INCLUDE environment variable, use it
if [ -n "$ES_INCLUDE" ]; then
ES_ENV_FILE="$ES_INCLUDE"
fi

# Source the environment file
if [ -n "$ES_ENV_FILE" ]; then

# If the ES_ENV_FILE is not found, try to resolve the path
# against the ES_HOME directory
if [ ! -f "$ES_ENV_FILE" ]; then
ES_ENV_FILE="$ES_HOME/$ES_ENV_FILE"
fi

. "$ES_ENV_FILE"
if [ $? -ne 0 ]; then
echo "Unable to source environment file: $ES_ENV_FILE" >&2
exit 1
fi
fi

# don't let JAVA_TOOL_OPTIONS slip in (e.g. crazy agents in ubuntu)
# works around https://bugs.launchpad.net/ubuntu/+source/jayatana/+bug/1441487
if [ "x$JAVA_TOOL_OPTIONS" != "x" ]; then
echo "Warning: Ignoring JAVA_TOOL_OPTIONS=$JAVA_TOOL_OPTIONS"
unset JAVA_TOOL_OPTIONS
fi

# CONF_FILE setting was removed
if [ ! -z "$CONF_FILE" ]; then
echo "CONF_FILE setting is no longer supported. elasticsearch.yml must be placed in the config directory and cannot be renamed."
exit 1
fi

if [ -x "$JAVA_HOME/bin/java" ]; then
JAVA=$JAVA_HOME/bin/java
else
JAVA=`which java`
fi

if [ ! -x "$JAVA" ]; then
echo "Could not find any executable java binary. Please install java in your PATH or set JAVA_HOME"
exit 1
fi

# real getopt cannot be used because we need to hand options over to the PluginManager
while [ $# -gt 0 ]; do
case $1 in
-D*=*)
properties="$properties $1"
properties="$properties \"$1\""
;;
-D*)
var=$1
shift
properties="$properties $var=$1"
properties="$properties \"$var\"=\"$1\""
;;
*)
args="$args $1"
args="$args \"$1\""
esac
shift
done

# check if properties already has a config file or config dir
if [ -e "$CONF_DIR" ]; then
case "$properties" in
*-Des.default.path.conf=*|*-Des.path.conf=*)
;;
*)
properties="$properties -Des.default.path.conf=\"$CONF_DIR\""
;;
esac
fi

exec "$JAVA" $JAVA_OPTS $ES_JAVA_OPTS -Xmx64m -Xms16m -Delasticsearch -Des.path.home="$ES_HOME" $properties -cp "$ES_HOME/lib/*" org.elasticsearch.plugins.PluginManagerCliParser $args
# full hostname passed through cut for portability on systems that do not support hostname -s
# export on separate line for shells that do not support combining definition and export
HOSTNAME=`hostname | cut -d. -f1`
export HOSTNAME

eval "$JAVA" -client -Delasticsearch -Des.path.home="\"$ES_HOME\"" $properties -cp "\"$ES_HOME/lib/*\"" org.elasticsearch.plugins.PluginManagerCliParser $args
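
The switch from `exec` with unquoted `$properties` and `$args` to `eval` with re-quoted values matters when an argument contains whitespace. The Python sketch below uses `shlex.split`, which applies POSIX-like word-splitting rules, to illustrate the effect. The path is made up, and the sketch approximates what `/bin/sh` does rather than reproducing it.

```python
import shlex

arg = "-Des.path.conf=/opt/my conf"  # made-up value containing a space

# Old behaviour: the value is appended unquoted, then word-split.
old_properties = " " + arg
print(shlex.split(old_properties))

# New behaviour: the value is wrapped in double quotes before being
# re-parsed (the script uses eval for that step).
new_properties = ' "' + arg + '"'
print(shlex.split(new_properties))
```

The unquoted form splits into two words, while the quoted form stays a single argument. Values that themselves contain double quotes or `$` would still be reinterpreted by `eval`.
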
4 changes: 3 additions & 1 deletion dependency-reduced-pom.xml
@@ -4,7 +4,7 @@
<groupId>org.elassandra</groupId>
<artifactId>elassandra</artifactId>
<name>Elassandra</name>
<version>2.1.1-7</version>
<version>2.1.1-8</version>
<description>Elassandra - ElasticSearch for Cassandra</description>
<licenses>
<license>
@@ -236,6 +236,8 @@
<exclude>org/apache/cassandra/service/CassandraDaemon*.class</exclude>
<exclude>org/apache/cassandra/service/StorageService$*.class</exclude>
<exclude>org/apache/cassandra/service/StorageService.class</exclude>
<exclude>org/apache/cassandra/cql3/statements/CreateIndexStatement*.class</exclude>
<exclude>org/apache/cassandra/db/index/SecondaryIndexManager*.class</exclude>
</excludes>
</filter>
</filters>
4 changes: 3 additions & 1 deletion pom.xml
@@ -13,7 +13,7 @@

<groupId>org.elassandra</groupId>
<artifactId>elassandra</artifactId>
<version>2.1.1-7</version>
<version>2.1.1-8</version>
<name>Elassandra</name>
<description>Elassandra - ElasticSearch for Cassandra</description>

@@ -699,6 +699,8 @@
<exclude>org/apache/cassandra/service/CassandraDaemon*.class</exclude>
<exclude>org/apache/cassandra/service/StorageService$*.class</exclude>
<exclude>org/apache/cassandra/service/StorageService.class</exclude>
<exclude>org/apache/cassandra/cql3/statements/CreateIndexStatement*.class</exclude>
<exclude>org/apache/cassandra/db/index/SecondaryIndexManager*.class</exclude>
</excludes>
</filter>
</filters>