release 2.1.1-8, add support for static columns
Showing 28 changed files with 1,380 additions and 1,183 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,12 +12,13 @@ alt="Elassandra demo" width="240" height="180" border="10" /></a> | |
|
||
## News | ||
|
||
* **2016-03-18 Release 2.1.1-6 Add support for SQL plugin (from [NLPchina](https://github.com/NLPchina/elasticsearch-sql)) and JDBC driver (from [Anchormen](https://github.com/Anchormen/sql4es))**. | ||
* **2016-02-16 Release 2.1.1-2 Remove build dependency to elasticsearch parent project**. | ||
* **2016-02-01 Release 2.1.1-1 Add support for parent-child relationship**. | ||
* **2016-01-28 Release 2.1.1 based on Elasticsearch 2.1.1 and cassandra 2.2.4** | ||
* **2015-12-20 Release 0.5 Re-index your data from cassandra 2.2.4 with zero downtime**. | ||
* **2015-11-15 Release 0.4 New elassandra tarball ready-to-run**. | ||
* **2016-04-17 Release 2.1.1-8 New feature, index cassandra static columns** | ||
* **2016-03-18 Release 2.1.1-6 Add support for SQL plugin (from [NLPchina](https://github.com/NLPchina/elasticsearch-sql)) and JDBC driver (from [Anchormen](https://github.com/Anchormen/sql4es)).** | ||
* **2016-02-16 Release 2.1.1-2 Remove build dependency to elasticsearch parent project.** | ||
* **2016-02-01 Release 2.1.1-1 Add support for parent-child relationship.** | ||
* **2016-01-28 Release 2.1.1 based on Elasticsearch 2.1.1 and cassandra 2.2.4.** | ||
* **2015-12-20 Release 0.5 Re-index your data from cassandra 2.2.4 with zero downtime.** | ||
* **2015-11-15 Release 0.4 New elassandra tarball ready-to-run.** | ||
|
||
## Benefits of Elassandra | ||
|
||
|
@@ -586,7 +587,6 @@ localhost/127.0.0.1 | |
|
||
# Elasticsearch document mapping | ||
|
||
|
||
Here is the mapping from Elasticsearch basic field types to CQL3 types: | ||
|
||
Elasticsearch Types | CQL Types | Comment | ||
|
@@ -610,7 +610,10 @@ Parameter | Values | Description | |
cql_collection | **list**, set or singleton | Controls how a field of type X is mapped to a column of type list<X>, set<X> or X. Default is **list** because Elasticsearch fields are multivalued. | ||
cql_struct | **udt** or map | Controls how an object or nested field is mapped to a User Defined Type or to a cassandra map<text,?>. Default is **udt**. | ||
cql_partial_update | **true** or false | Elasticsearch indexes full documents. For partial CQL updates, this controls which fields should be read to index a full document from a row. Default is **true**, meaning that updates involve reading all missing fields. | ||
cql_primary_key_order | **integer** | Position of the field in the primary key of the underlying cassandra table. Default is **-1**, meaning that the field is not part of the cassandra primary key. | ||
cql_partition_key | true or **false** | When cql_primary_key_order >= 0, specifies whether the field is part of the cassandra partition key. Default is **false**, meaning that the field is not part of the cassandra partition key. | ||
|
||
For more information about cassandra collection types and compound primary keys, see https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html and https://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_compound_keys_c.html. | ||
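
As a rough illustration of how `cql_primary_key_order` and `cql_partition_key` combine, the following Python sketch derives a CQL `PRIMARY KEY` clause from a mapping's `properties`. The helper name `primary_key_clause` and the sample mapping are hypothetical, and this is not Elassandra's actual implementation; it only restates the two parameters described in the table.

```python
def primary_key_clause(properties):
    """Build a CQL PRIMARY KEY clause from Elasticsearch field mappings (illustrative only)."""
    # Keep only fields that belong to the primary key (order >= 0; the default is -1).
    keyed = [
        (spec["cql_primary_key_order"], name, spec.get("cql_partition_key", False))
        for name, spec in properties.items()
        if spec.get("cql_primary_key_order", -1) >= 0
    ]
    keyed.sort()  # order by position in the primary key
    partition = [name for _, name, is_partition in keyed if is_partition]
    clustering = [name for _, name, is_partition in keyed if not is_partition]
    columns = ["(" + ", ".join(partition) + ")"] + clustering
    return "PRIMARY KEY (" + ", ".join(columns) + ")"


# Hypothetical mapping: "m" is the partition key, "t" a clustering column.
properties = {
    "t": {"type": "date", "cql_primary_key_order": 1},
    "v": {"type": "double"},
    "m": {"type": "string", "cql_primary_key_order": 0, "cql_partition_key": True},
}
print(primary_key_clause(properties))  # PRIMARY KEY ((m), t)
```

Fields without `cql_primary_key_order` (such as `v` here) are ordinary columns and do not appear in the clause.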
|
||
## Elasticsearch mapping from an existing cassandra table. | ||
|
||
|
@@ -638,6 +641,104 @@ When mapping an existing cassandra table to an Elasticsearch index.type, primary | |
* Single primary key is converted to a string. | ||
* Compound primary key is converted to a JSON array stored as string in the `_id` field. | ||
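
The two rules above can be sketched in Python. This is only a model of the conversion, not Elassandra's code: the function name `document_id` is hypothetical, and both the compact JSON form (no spaces between elements) and the epoch-millisecond representation of timestamps are assumptions chosen to match the `_id` values shown in this README's search results.

```python
import json
from datetime import datetime, timezone


def document_id(primary_key_values):
    """Return the Elasticsearch _id for a list of primary key values (illustrative only)."""
    if len(primary_key_values) == 1:
        # Single primary key: converted to a string.
        return str(primary_key_values[0])
    # Compound primary key: JSON array stored as a string.
    return json.dumps(primary_key_values, separators=(",", ":"))


# A timestamp column is represented here as epoch milliseconds (assumption).
t = datetime(2016, 4, 10, 11, 30, tzinfo=timezone.utc)
millis = int(t.timestamp() * 1000)

print(document_id(["bob"]))                  # bob
print(document_id(["server1-cpu", millis]))  # ["server1-cpu",1460287800000]
```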
|
||
## Indexing cassandra static columns | ||
|
||
In a table that uses clustering columns, a [static column](http://docs.datastax.com/en/cql/3.1/cql/cql_reference/refStaticCol.html) is shared by all the rows with the same partition key. A slight modification of the cassandra code adds support for secondary indexes on static columns, which makes it possible to search on static column values (CQL search on static columns remains unsupported). Each time a static column is modified, a document containing the partition key and only the static columns is indexed in Elasticsearch. Static columns are not indexed with every [wide row](http://www.planetcassandra.org/blog/wide-rows-in-cassandra-cql/) because any update on a static column would require reindexing all the wide rows of the partition. However, you can request fields backed by a static column on any get/search request. | ||
|
||
The following example demonstrates how to use static columns to store meta information about a timeseries. | ||
|
||
```
# Create the index; "meta" is a nested field stored as a static map column.
curl -XPUT "http://localhost:9200/test" -d '{
  "mappings" : {
    "timeseries" : {
      "properties" : {
        "t" : {
          "type" : "date",
          "format" : "strict_date_optional_time||epoch_millis",
          "cql_primary_key_order" : 1,
          "cql_collection" : "singleton"
        },
        "meta" : {
          "type" : "nested",
          "cql_struct" : "map",
          "cql_static_column" : true,
          "cql_collection" : "singleton",
          "include_in_parent" : true,
          "properties" : {
            "region" : {
              "type" : "string"
            }
          }
        },
        "v" : {
          "type" : "double",
          "cql_collection" : "singleton"
        },
        "m" : {
          "type" : "string",
          "cql_partition_key" : true,
          "cql_primary_key_order" : 0,
          "cql_collection" : "singleton"
        }
      }
    }
  }
}'
# Insert three wide rows, then set the static column for the partition.
cqlsh <<EOF
INSERT INTO test.timeseries (m, t, v) VALUES ('server1-cpu', '2016-04-10 13:30', 10);
INSERT INTO test.timeseries (m, t, v) VALUES ('server1-cpu', '2016-04-10 13:31', 20);
INSERT INTO test.timeseries (m, t, v) VALUES ('server1-cpu', '2016-04-10 13:32', 15);
INSERT INTO test.timeseries (m, meta) VALUES ('server1-cpu', { 'region':'west' } );
SELECT * FROM test.timeseries;
EOF
 m           | t                           | meta               | v
-------------+-----------------------------+--------------------+----
 server1-cpu | 2016-04-10 11:30:00.000000z | {'region': 'west'} | 10
 server1-cpu | 2016-04-10 11:31:00.000000z | {'region': 'west'} | 20
 server1-cpu | 2016-04-10 11:32:00.000000z | {'region': 'west'} | 15
```
|
||
Search for wide rows only where v=10 and fetch the meta.region field. | ||
```
curl -XGET "http://$NODE:9200/test/timeseries/_search?pretty=true&q=v:10&fields=m,t,v,meta.region"
...
    "hits" : [ {
      "_index" : "test",
      "_type" : "timeseries",
      "_id" : "[\"server1-cpu\",1460287800000]",
      "_score" : 1.9162908,
      "_routing" : "server1-cpu",
      "fields" : {
        "meta.region" : [ "west" ],
        "t" : [ "2016-04-10T11:30:00.000Z" ],
        "m" : [ "server1-cpu" ],
        "v" : [ 10.0 ]
      }
    } ]
```
|
||
Searching for rows where meta.region=west returns only the partition key and the static columns. | ||
```
curl -XGET "http://$NODE:9200/test/timeseries/_search?pretty=true&q=meta.region:west&fields=m,t,v,meta.region"
....
  "hits" : {
    "total" : 1,
    "max_score" : 1.5108256,
    "hits" : [ {
      "_index" : "test",
      "_type" : "timeseries",
      "_id" : "server1-cpu",
      "_score" : 1.5108256,
      "_routing" : "server1-cpu",
      "fields" : {
        "m" : [ "server1-cpu" ],
        "meta.region" : [ "west" ]
      }
    } ]
```
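
To make the indexing behaviour described in this section concrete, here is a toy Python model of it. It is a sketch under stated assumptions, not Elassandra's implementation: `documents_for_partition` and its arguments are hypothetical names, the partition key and clustering column are called `m` and `t` as in the example table, and documents are plain dicts.

```python
import json


def documents_for_partition(m, static_columns, wide_rows):
    """Model the documents indexed for one partition (illustrative only)."""
    docs = {}
    if static_columns:
        # One document per partition holds the partition key and the static columns only.
        docs[str(m)] = {"m": m, **static_columns}
    for t, columns in wide_rows:
        # One document per wide row; static columns are not copied into it.
        doc_id = json.dumps([m, t], separators=(",", ":"))
        docs[doc_id] = {"m": m, "t": t, **columns}
    return docs


docs = documents_for_partition(
    "server1-cpu",
    {"meta": {"region": "west"}},
    [(1460287800000, {"v": 10.0}), (1460287860000, {"v": 20.0}), (1460287920000, {"v": 15.0})],
)
# Searching on the static column matches only the partition-level document.
hits = [doc_id for doc_id, doc in docs.items() if doc.get("meta", {}).get("region") == "west"]
print(len(docs), hits)  # 4 ['server1-cpu']
```

In this model, changing `meta` rewrites a single document rather than every wide-row document of the partition, which is the reason given above for keeping static columns out of the wide-row documents.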
|
||
## Mapping-change-with-zero-downtime | ||
|
||
You can map several Elasticsearch indices with different mappings to the same cassandra keyspace. By default, an index is mapped to a keyspace with the same name, but you can specify a target keyspace. | ||
|
@@ -720,19 +821,57 @@ _id | message | user | |
|
||
Since version 0.3, nested documents can be mapped to a [User Defined Type](https://docs.datastax.com/en/cql/3.1/cql/cql_using/cqlUseUDT.html) or to a CQL [map](http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_map_t.html#toc_pane). In the following example, the cassandra map is automatically mapped with `cql_partial_update:true`, so a partial CQL update causes a read of the whole map to re-index a document in the elasticsearch index. | ||
|
||
Create an index (a keyspace in your elassandra-aware datacenter) | ||
```
curl -XPUT "http://localhost:9200/twitter"
```
Create a cassandra table with a map column. | ||
```
cqlsh>CREATE KEYSPACE IF NOT EXISTS twitter WITH replication={ 'class':'NetworkTopologyStrategy', 'DC1':'1' };
cqlsh>CREATE TABLE twitter.user (
   name text,
   attrs map<text,text>,
   primary key (name)
);
cqlsh>INSERT INTO twitter.user (name,attrs) VALUES ('bob',{'email':'[email protected]','firstname':'bob'});
```
|
||
Create the type mapping from the cassandra table and search for the *bob* entry. | ||
```
curl -XPUT "http://localhost:9200/twitter/" -d '{ "settings" : { "number_of_shards" : 1, "number_of_replicas" : 0 } }'
curl -XPUT "http://localhost:9200/twitter/_mapping/user" -d '{ "user" : { "columns_regexp" : ".*" }}'
{"acknowledged":true}
curl -XGET 'http://localhost:9200/twitter/_mapping/user?pretty=true'
{
  "twitter" : {
    "mappings" : {
      "user" : {
        "properties" : {
          "attrs" : {
            "type" : "nested",
            "cql_struct" : "map",
            "cql_collection" : "singleton",
            "properties" : {
              "email" : {
                "type" : "string"
              },
              "firstname" : {
                "type" : "string"
              }
            }
          },
          "name" : {
            "type" : "string",
            "cql_collection" : "singleton",
            "cql_partition_key" : true,
            "cql_primary_key_order" : 0
          }
        }
      }
    }
  }
}
```
Get the *bob* entry. | ||
``` | ||
curl -XGET "http://localhost:9200/twitter/user/bob?pretty=true" | ||
{ | ||
"_index" : "twitter", | ||
|
@@ -745,7 +884,6 @@ curl -XGET "http://localhost:9200/twitter/user/bob?pretty=true" | |
``` | ||
|
||
Now insert a new entry in the attrs map column and search for a nested field `attrs.city:paris`. | ||
|
||
```
cqlsh>UPDATE twitter.user SET attrs = attrs + { 'city':'paris' } WHERE name = 'bob';
```
|