Merge branch 'django_updates'
Mraoul committed Jun 15, 2017
2 parents e051257 + 3999702 commit bdb9506
Showing 13 changed files with 195 additions and 156 deletions.
104 changes: 66 additions & 38 deletions README.md
@@ -22,27 +22,29 @@ For more information on the PHP implementation please see the [readme](../master
keep reading...


ElasticSearch
==============
ElasticSearch Support
=====================

<b>Important pyDat 3.0 ElasticSearch Notes</b>:

<b>The ElasticSearch backend code is still under testing, please consider the following before using ES as a backend:</b>
Note that this (overdue) release is the only one planned for 3.0, as work is under way on pyDat 4.0.
pyDat 4.0 will remove support for MongoDB and require a minimum of ElasticSearch 5.2, but
should be easier to work with and considerably faster due to significant improvements in
ElasticSearch 5.x. It will also, more than likely, require a full re-ingestion of source
data.

- Some things might be broken
- I.e., some error handling might be non-existent
- There might be random debug output printed out
- The search language might not be complete
- The data template used with ElasticSearch might change
- Which means you might have to re-ingest all of your data at some point!
This release supports only ElasticSearch 2.x !!


<b>PreReqs to run with ElasticSearch</b>:

- ElasticSearch installed somewhere
- python elasticsearch library (pip install elasticsearch)
- python elasticsearch library (pip install "elasticsearch>=2.0.0,<3.0.0")
- python lex yacc library (pip install ply)
- below specified prereqs too

<b>ElasticSearch Scripting</b>

ElasticSearch ships with dynamic Groovy scripting disabled because of potential sandbox breakout issues with the Groovy container. Unfortunately, certain things in ElasticSearch can only be done via this scripting language. Because the default installation of ES offers no work-around, the pyDat settings file has a setting called ES_SCRIPTING_ENABLED, which is set to False by default. When set to True, the pyDat advanced search capability exposes an extra feature called 'Unique Domains': when a search would return multiple results for a given domain (e.g., because multiple versions of a domain match), only the latest entry is returned instead of all entries. Before setting this option to True, you must install a script server-side on every ES node: copy the file called \_score.groovy from the es_scripts directory to the scripts directory located in the elasticsearch configuration directory. On package-based installs of ES on RedHat/CentOS or Ubuntu this should be /etc/elasticsearch/scripts. If the scripts directory does not exist, please create it. Note that you have to restart each node for it to pick up the script.
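
Before flipping ES_SCRIPTING_ENABLED to True, it may help to confirm that the script is actually in place on a node. The following is a minimal sketch, not part of pyDat: the helper name `score_script_installed` is hypothetical, and the default path assumes a package-based install as described above.

```python
import os

# Assumption: scripts directory used by package-based ES installs (see text above).
DEFAULT_SCRIPTS_DIR = "/etc/elasticsearch/scripts"
SCRIPT_NAME = "_score.groovy"


def score_script_installed(scripts_dir=DEFAULT_SCRIPTS_DIR):
    """Return True if _score.groovy exists as a regular file in scripts_dir.

    Hypothetical helper for illustration; it only checks the local filesystem
    and says nothing about whether the node has been restarted since.
    """
    return os.path.isfile(os.path.join(scripts_dir, SCRIPT_NAME))
```

Running this check on every ES node, and restarting each node afterwards, would match the manual steps described above; only then does it make sense to set ES_SCRIPTING_ENABLED = True in the pyDat settings file.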

<b> ElasticSearch Plugins</b>
@@ -76,48 +78,74 @@ all data is ingested properly. Anyone setting up their database, should read the
script before running it to ensure they've tweaked it for their setup. The following is the output from
elasticsearch_populate -h

<pre>
Usage: elasticsearch_populate.py [options]
Version 3.0 introduces ElasticSearch 2.x as a backend for whois data

Options:
<pre>
usage: elasticsearch_populate.py [-h] (-f FILE | -d DIRECTORY) [-e EXTENSION]
[-r] [-v] [--vverbose] [-s]
[-x EXCLUDE | -n INCLUDE] [-o COMMENT]
[-u [ES_URI [ES_URI ...]]] [-p INDEX_PREFIX]
[-i IDENTIFIER] [-B BULK_SIZE]
[--optimize-import] [-t THREADS]
[--bulk-serializers BULK_SERIALIZERS]
[--bulk-threads BULK_THREADS]
[--enable-delta-indexes]

optional arguments:
-h, --help show this help message and exit
-f FILE, --file=FILE Input CSV file
-d DIRECTORY, --directory=DIRECTORY
Directory to recursively search for CSV files -
prioritized over 'file'
-e EXTENSION, --extension=EXTENSION
-f FILE, --file FILE Input CSV file
-d DIRECTORY, --directory DIRECTORY
Directory to recursively search for CSV files --
mutually exclusive to '-f' option
-e EXTENSION, --extension EXTENSION
When scanning for CSV files only parse files with
given extension (default: 'csv')
-i IDENTIFIER, --identifier=IDENTIFIER
Numerical identifier to use in update to signify
version (e.g., '8' or '20140120')
-t THREADS, --threads=THREADS
Number of workers, defaults to 2. Note that each
worker will increase the load on your ES cluster
-B BULK_SIZE, --bulk-size=BULK_SIZE
Size of Bulk Insert Requests
-r, --redo Attempt to re-import a failed import or import more
data, uses stored metadata from previous import (-o,
-n, and -x not required and will be ignored!!)
-v, --verbose Be verbose
--vverbose Be very verbose (Prints status of every domain parsed,
very noisy)
-s, --stats Print out Stats after running
-x EXCLUDE, --exclude=EXCLUDE
-x EXCLUDE, --exclude EXCLUDE
Comma separated list of keys to exclude if updating
entry
-n INCLUDE, --include=INCLUDE
-n INCLUDE, --include INCLUDE
Comma separated list of keys to include if updating
entry (mutually exclusive to -x)
-o COMMENT, --comment=COMMENT
-o COMMENT, --comment COMMENT
Comment to store with metadata
-r, --redo Attempt to re-import a failed import or import more
data, uses stored metadata from previous import (-o
and -x not required and will be ignored!!)
-u ES_URI, --es-uri=ES_URI
Location of ElasticSearch Server (e.g.,
foo.server.com:9200)
-p INDEX_PREFIX, --index-prefix=INDEX_PREFIX
-u [ES_URI [ES_URI ...]], --es-uri [ES_URI [ES_URI ...]]
Location(s) of ElasticSearch Server (e.g.,
foo.server.com:9200) Can take multiple endpoints
-p INDEX_PREFIX, --index-prefix INDEX_PREFIX
Index prefix to use in ElasticSearch (default: whois)
--bulk-threads=BULK_THREADS
How many threads to use for making bulk requests to ES
-i IDENTIFIER, --identifier IDENTIFIER
Numerical identifier to use in update to signify
version (e.g., '8' or '20140120')
-B BULK_SIZE, --bulk-size BULK_SIZE
Size of Bulk Elasticsearch Requests
--optimize-import If enabled, will change ES index settings to speed up
bulk imports, but if the cluster has a failure, data
might be lost permanently!
-t THREADS, --threads THREADS
Number of workers, defaults to 2. Note that each
worker will increase the load on your ES cluster since
it will try to lookup whatever record it is working on
in ES
--bulk-serializers BULK_SERIALIZERS
How many threads to spawn to combine messages from
workers. Only increase this if you are running a
lot of workers and one cpu is unable to keep up with
the load
--bulk-threads BULK_THREADS
How many threads to spawn to send bulk ES messages.
The larger your cluster, the more you can increase
this
--enable-delta-indexes
If enabled, will put changed entries in a separate
index. These indexes can be safely deleted if space is
an issue, also provides some other improvements
</pre>
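
The new usage line shows two mutually exclusive pairs, `(-f FILE | -d DIRECTORY)` and `[-x EXCLUDE | -n INCLUDE]`, plus a `-u` option that accepts several endpoints. The sketch below shows how that shape can be produced with `argparse`; it is an illustration only, not the source of elasticsearch_populate.py, it covers just a handful of the options, and the paths and host names in the example call are made up.

```python
import argparse


def build_parser():
    """Build a reduced parser mirroring part of the usage text above (illustrative)."""
    parser = argparse.ArgumentParser(prog="elasticsearch_populate.py")

    # Exactly one input source must be given: -f or -d.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", "--file", help="Input CSV file")
    source.add_argument("-d", "--directory",
                        help="Directory to recursively search for CSV files")

    # At most one of -x / -n may be given.
    keys = parser.add_mutually_exclusive_group()
    keys.add_argument("-x", "--exclude",
                      help="Comma separated list of keys to exclude")
    keys.add_argument("-n", "--include",
                      help="Comma separated list of keys to include")

    # nargs="*" lets -u take zero or more endpoints.
    parser.add_argument("-u", "--es-uri", nargs="*",
                        help="Location(s) of ElasticSearch server")
    parser.add_argument("-t", "--threads", type=int, default=2,
                        help="Number of workers, defaults to 2")
    return parser


# Hypothetical invocation: directory import against two endpoints with 4 workers.
args = build_parser().parse_args(
    ["-d", "/data/whois", "-u", "es1:9200", "es2:9200", "-t", "4"]
)
```

With this setup, passing both `-f` and `-d` makes `argparse` exit with a usage error, which is the behavior the "mutually exclusive" notes in the help text describe.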


2 changes: 1 addition & 1 deletion docker/apache.config
@@ -197,7 +197,7 @@ WSGIScriptAlias "/" "/opt/WhoDat/pydat/pydat/wsgi.py" process-group=pydat applic

# Static content - CSS, Javascript, images, etc.
Alias /static/ /opt/WhoDat/pydat/pydat/static/
<Directory /opt/WhoDat/pydat/pydat/static>
<Directory /opt/WhoDat/pydat/extras/www/static>
Order allow,deny
Allow from all
</Directory>
4 changes: 2 additions & 2 deletions docker/requirements.txt
@@ -2,6 +2,6 @@ pymongo
requests
unicodecsv
markdown
django
elasticsearch
django<=1.11.12
elasticsearch>=2.0.0,<3.0.0
ply
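
The pin `elasticsearch>=2.0.0,<3.0.0` restricts the client library to the 2.x series. As a rough illustration of what that range admits, here is a simplified sketch; it is not pip's specifier logic, the function names are hypothetical, and it only handles plain dotted numeric versions.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '2.4.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies_pin(version, minimum="2.0.0", below="3.0.0"):
    """Return True if minimum <= version < below (simplified range check)."""
    return parse_version(minimum) <= parse_version(version) < parse_version(below)
```

Under these assumptions a 2.x client such as 2.4.1 passes, while 1.x and 5.x clients are rejected, consistent with the README note that this release supports only ElasticSearch 2.x.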
2 changes: 1 addition & 1 deletion pydat/pydat/ajax.py
@@ -3,7 +3,7 @@

from django.conf import settings
from django.template import RequestContext
from django.core.urlresolvers import reverse
from django.urls import reverse
from django.shortcuts import render_to_response, HttpResponse
import urllib

57 changes: 31 additions & 26 deletions pydat/pydat/settings.py
@@ -10,8 +10,6 @@

DEBUG = False

TEMPLATE_DEBUG = DEBUG

SITE_ROOT = os.path.dirname(os.path.realpath(__file__))

HANDLER = 'mongo'
@@ -135,56 +133,63 @@
STATIC_URL = '/static/'

# Additional locations of static files
STATICFILES_DIRS = (
STATICFILES_DIRS = [
# Put strings here, like "/home/html/static" or "C:/www/django/static".
# Always use forward slashes, even on Windows.
# Don't forget to use absolute paths, not relative paths.
os.path.join(SITE_ROOT, 'static'),
)
]

# List of finder classes that know how to find static files in
# various locations.
STATICFILES_FINDERS = (
STATICFILES_FINDERS = [
'django.contrib.staticfiles.finders.FileSystemFinder',
'django.contrib.staticfiles.finders.AppDirectoriesFinder',
# 'django.contrib.staticfiles.finders.DefaultStorageFinder',
)
]

# Make this unique, and don't share it with anybody.
SECRET_KEY = 'o=skwv+igf2%#6n&p!nd##w(a*wqugkcq4-2=wugz0(715*!l#'

# List of callables that know how to import templates from various sources.
TEMPLATE_LOADERS = (
'django.template.loaders.filesystem.Loader',
'django.template.loaders.app_directories.Loader',
# 'django.template.loaders.eggs.Loader',
)

TEST_RUNNER = 'django.test.runner.DiscoverRunner'

MIDDLEWARE_CLASSES = (
MIDDLEWARE = [
'django.middleware.common.CommonMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
#'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
# Uncomment the next line for simple clickjacking protection:
# 'django.middleware.clickjacking.XFrameOptionsMiddleware',
)
]

ROOT_URLCONF = 'pydat.urls'

# Python dotted path to the WSGI application used by Django's runserver.
WSGI_APPLICATION = 'pydat.wsgi.application'

TEMPLATE_DIRS = (
# Put strings here, like "/home/html/django_templates" or "C:/www/django/templates".
# Always use forward slashes, even on Windows.
# Don't forget to use absolute paths, not relative paths.
os.path.join(SITE_ROOT, 'templates'),
)

INSTALLED_APPS = (
_TEMPLATE_DIRS_ =[os.path.join(SITE_ROOT, 'templates')]
TEMPLATES = [
{
"BACKEND": "django.template.backends.django.DjangoTemplates",
"DIRS": _TEMPLATE_DIRS_,
"OPTIONS":{
"context_processors":[
'django.contrib.auth.context_processors.auth',
'django.template.context_processors.debug',
'django.template.context_processors.i18n',
'django.template.context_processors.media',
'django.template.context_processors.static',
'django.template.context_processors.tz',
'django.contrib.messages.context_processors.messages',
'django.template.context_processors.csrf'
],
'debug': DEBUG,
},

},
]

INSTALLED_APPS = [
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
@@ -196,7 +201,7 @@
# Uncomment the next line to enable admin documentation:
# 'django.contrib.admindocs',
'pydat',
)
]

# A sample logging configuration. The only tangible logging
# performed by this configuration is to send an email to
15 changes: 8 additions & 7 deletions pydat/pydat/templates/base.html
@@ -1,23 +1,24 @@
{% load static %}
<!DOCTYPE HTML>
<html>
<head>
<title>pyDat: {% block title %}WHOIS exploration{% endblock %}</title>
<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/jquery-ui-1.10.4.css">
<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/jquery.dataTables.css">
<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/pydat.css">
<link rel="stylesheet" type="text/css" href="{% static '/css/jquery-ui-1.10.4.css' %}">
<link rel="stylesheet" type="text/css" href="{% static '/css/jquery.dataTables.css' %}">
<link rel="stylesheet" type="text/css" href="{% static 'css/pydat.css' %}">
{% block css %}
{% endblock %}
<script type="text/javascript" src="{{STATIC_URL}}/js/jquery-1.11.0.min.js"></script>
<script type="text/javascript" src="{{STATIC_URL}}/js/jquery-ui-1.10.4.js"></script>
<script type="text/javascript" src="{{STATIC_URL}}/js/jquery.dataTables.js"></script>
<script type="text/javascript" src="{% static '/js/jquery-1.11.0.min.js' %}"></script>
<script type="text/javascript" src="{% static '/js/jquery-ui-1.10.4.js' %}"></script>
<script type="text/javascript" src="{% static '/js/jquery.dataTables.js' %}"></script>
<script type="text/javascript">
var resolve_url = "{% url 'ajax_resolve' %}";
var csrf_token = '{{ csrf_token }}';
var latest_version = '{{ latest_version }}';
</script>
{% block js_constants %}
{% endblock %}
<script type="text/javascript" src="{{STATIC_URL}}/js/pydat.js"></script>
<script type="text/javascript" src="{% static '/js/pydat.js' %}"></script>
{% block js %}
{% endblock %}
</head>
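
Note that the new template lines mix paths with a leading slash (`'/css/jquery-ui-1.10.4.css'`) and without one (`'css/pydat.css'`). Whether Django's `static` tag treats the two the same depends on the Django version and storage backend, which this commit does not show. The sketch below uses only the standard library's `urljoin` as a stand-in, to illustrate why the leading slash can matter when a path is joined onto STATIC_URL; it is not a reproduction of Django's implementation.

```python
from urllib.parse import urljoin

STATIC_URL = "/static/"  # value taken from settings.py in this commit

# Old template style: plain concatenation of {{STATIC_URL}} and "/css/...".
old_style = STATIC_URL + "/css/pydat.css"

# Joining a relative path keeps the /static/ prefix.
relative = urljoin(STATIC_URL, "css/pydat.css")

# Joining an absolute path (leading slash) replaces the prefix entirely.
leading_slash = urljoin(STATIC_URL, "/css/pydat.css")

print(old_style)      # /static//css/pydat.css
print(relative)       # /static/css/pydat.css
print(leading_slash)  # /css/pydat.css
```

If the deployed Django version joins paths in this way, the entries written with a leading slash would resolve outside `/static/`; writing all paths without the leading slash, as in `'css/pydat.css'`, avoids the ambiguity.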
5 changes: 3 additions & 2 deletions pydat/pydat/templates/domain_results.html
@@ -1,4 +1,5 @@
{% extends 'base.html' %}
{% load static %}

{% block title %}Domain Search{% endblock %}

@@ -24,9 +25,9 @@

{% block js %}
{% if legacy_search %}
<script type="text/javascript" src="{{STATIC_URL}}/js/domain.js"></script>
<script type="text/javascript" src="{% static '/js/domain.js' %}"></script>
{% else %}
<script type="text/javascript" src="{{STATIC_URL}}/js/domain_advanced.js"></script>
<script type="text/javascript" src="{% static '/js/domain_advanced.js' %}"></script>
{% endif %}
{%endblock %}

15 changes: 8 additions & 7 deletions pydat/pydat/templates/nosearchbase.html
@@ -1,23 +1,24 @@
{% load static %}
<!DOCTYPE HTML>
<html>
<head>
<title>pyDat: {% block title %}WHOIS exploration{% endblock %}</title>
<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/jquery-ui-1.10.4.css">
<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/jquery.dataTables.css">
<link rel="stylesheet" type="text/css" href="{{STATIC_URL}}/css/pydat.css">
<link rel="stylesheet" type="text/css" href="{% static '/css/jquery-ui-1.10.4.css' %}">
<link rel="stylesheet" type="text/css" href="{% static '/css/jquery.dataTables.css' %}">
<link rel="stylesheet" type="text/css" href="{% static '/css/pydat.css' %}">
{% block css %}
{% endblock %}
<script type="text/javascript" src="{{STATIC_URL}}/js/jquery-1.11.0.min.js"></script>
<script type="text/javascript" src="{{STATIC_URL}}/js/jquery-ui-1.10.4.js"></script>
<script type="text/javascript" src="{{STATIC_URL}}/js/jquery.dataTables.js"></script>
<script type="text/javascript" src="{% static '/js/jquery-1.11.0.min.js' %}"></script>
<script type="text/javascript" src="{% static '/js/jquery-ui-1.10.4.js' %}"></script>
<script type="text/javascript" src="{% static '/js/jquery.dataTables.js' %}"></script>
<script type="text/javascript">
var resolve_url = "{% url 'ajax_resolve' %}";
var csrf_token = '{{ csrf_token }}';
var latest_version = '{{ latest_version }}';
</script>
{% block js_constants %}
{% endblock %}
<script type="text/javascript" src="{{STATIC_URL}}/js/pydat.js"></script>
<script type="text/javascript" src="{% static '/js/pydat.js' %}"></script>
{% block js %}
{% endblock %}
</head>
3 changes: 2 additions & 1 deletion pydat/pydat/templates/pdns_results.html
@@ -1,4 +1,5 @@
{% extends 'base.html' %}
{% load static %}

{% block title %}pDNS{% endblock %}

@@ -10,7 +11,7 @@
{% endblock %}

{% block js %}
<script type="text/javascript" src="{{STATIC_URL}}/js/pdns.js"></script>
<script type="text/javascript" src="{% static '/js/pdns.js' %}"></script>
{% endblock %}

{% block searchBar %}
3 changes: 2 additions & 1 deletion pydat/pydat/templates/rpdns_results.html
@@ -1,4 +1,5 @@
{% extends 'base.html' %}
{% load static %}

{% block title %}pDNS{% endblock %}

@@ -10,7 +11,7 @@
{% endblock %}

{% block js %}
<script type="text/javascript" src="{{STATIC_URL}}/js/pdns.js"></script>
<script type="text/javascript" src="{% static '/js/pdns.js' %}"></script>
{% endblock %}

{% block searchBar %}