piatrashkakanstantinass edited this page Jun 9, 2021 · 23 revisions

PyWhat has its own API. It returns a Python dictionary like the following (shown here as JSON):

{
    "File Signatures": null,
    "Language": null,
    "Regexes": [
        {
            "Matched": "https://google.com/",
            "Regex Pattern": {
                "Name": "Uniform Resource Locator (URL)",
                "Regex": "(https?:\\/\\/(?:www\\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\\.[^\\s]{2,}|www\\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\\.[^\\s]{2,}|https?:\\/\\/(?:www\\.|(?!www))[a-zA-Z0-9]+\\.[^\\s]{2,}|www\\.[a-zA-Z0-9]+\\.[^\\s]{2,})",
                "Description": "A Uniform Resource Location (URL) pointing to a web address.",
                "Rarity": 1,
                "Tags": [
                    "Identifiers"
                ]
            }
        }
    ]
}
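The result is an ordinary Python dictionary, so pulling out the interesting fields takes only a loop. The sketch below works on a hand-copied literal with the same shape as the sample above (the long regex is replaced by a placeholder), so it runs without PyWhat installed. The summarize() helper is hypothetical, and it assumes "Regexes" may be missing, None or empty when nothing matches.

```python
from typing import List, Tuple

# Hand-copied literal with the same shape as the sample output above.
# The long URL regex is replaced by a placeholder string.
result = {
    "File Signatures": None,
    "Language": None,
    "Regexes": [
        {
            "Matched": "https://google.com/",
            "Regex Pattern": {
                "Name": "Uniform Resource Locator (URL)",
                "Regex": "<URL regex omitted>",
                "Description": "A Uniform Resource Location (URL) pointing to a web address.",
                "Rarity": 1,
                "Tags": ["Identifiers"],
            },
        }
    ],
}


def summarize(result: dict) -> List[Tuple[str, str, float]]:
    """Return (matched text, pattern name, rarity) for every regex match."""
    rows = []
    # Assumption: "Regexes" can be missing, None or empty when nothing matched.
    for match in result.get("Regexes") or []:
        pattern = match["Regex Pattern"]
        rows.append((match["Matched"], pattern["Name"], pattern["Rarity"]))
    return rows


for matched, name, rarity in summarize(result):
    print(f"{matched}: {name} (rarity {rarity})")
```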

To use this API, run this code:

from pywhat import Identifier

text = "https://google.com/"  # the string (or file path) to analyse
id = Identifier()
id.identify(text)

Identifier.identify() parameters

All parameters to identify() are keyword-only except the text itself.

id.identify(text,
            only_text=False,  # If True, treat text as a plain string and never read it as a file path
            dist=None,        # Distribution to use (see below for more on Distributions)
            key=None,         # Key used for sorting; defaults to Keys.NONE (see below for more on sorting)
            reverse=False,    # If True, the output is sorted in descending order
)
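In Python, a bare * in a function signature is what makes every parameter after it keyword-only. The function below is a hypothetical stand-in that copies the parameter names listed above (it is not PyWhat's implementation); it shows that the options must be passed by name.

```python
def identify(text, *, only_text=False, dist=None, key=None, reverse=False):
    """Hypothetical stand-in: echo the arguments to show how they were bound."""
    return {"text": text, "only_text": only_text, "dist": dist, "key": key, "reverse": reverse}


# Keyword arguments work as expected.
print(identify("https://google.com/", only_text=True, reverse=True))

# Passing an option positionally fails, because everything after `*` is keyword-only.
try:
    identify("https://google.com/", True)
except TypeError as err:
    print("TypeError:", err)
```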

Filters & Distributions

To control which regexes are used (and therefore which results are shown), we can use distributions. A distribution is simply the regex list with a filter applied to it.

A nice use case is WannaCry. Using distributions, you can get only the domains from a malware sample (no crypto addresses) and use them to automatically buy those domains where possible, potentially stopping the malware if it has a built-in kill switch!

We start by importing the necessary libraries:

from pywhat import pywhat_tags, Distribution

Now we can make a filter:

filter1 = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}

Filters support only these keys:

  • MinRarity. Rarity measures how unlikely a match is to be a false positive: a rarity of 1 means it can't be a false positive, while a rarity of 0.1 means it is very likely to be one. MinRarity is the absolute minimum rarity you want to see; raise it to avoid false positives!

  • MaxRarity. The absolute maximum rarity you want to see.

  • Tags. Every regex is tagged, and only regexes with the given tags are used. To use only AWS-specific regexes, use AWS as the tag. To see all tags, run what --tags 😄

  • ExcludeTags. The tags you do not want to see; regexes carrying them are skipped.
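To make the four keys concrete, here is a small model of how a filter might select regexes. It is a sketch under stated assumptions, not PyWhat's code: the toy database REGEXES and the apply_filter() helper are hypothetical, a regex is assumed to need at least one of the Tags (when Tags is given) and none of the ExcludeTags, and the rarity defaults of 0 and 1 are assumptions too.

```python
from typing import Dict, List

# Made-up toy entries: names, rarities and tags are illustrative only,
# not taken from PyWhat's regex database.
REGEXES = [
    {"Name": "url", "Rarity": 1.0, "Tags": ["Identifiers", "Networking"]},
    {"Name": "ip_address", "Rarity": 0.5, "Tags": ["Networking"]},
    {"Name": "video_link", "Rarity": 0.6, "Tags": ["Media"]},
    {"Name": "phone_number", "Rarity": 0.2, "Tags": ["Identifiers"]},
    {"Name": "api_key", "Rarity": 0.7, "Tags": ["AWS"]},
]


def apply_filter(regexes: List[Dict], flt: Dict) -> List[Dict]:
    """Keep the regexes that pass every key of a filter dict (assumed semantics)."""
    min_rarity = flt.get("MinRarity", 0)  # assumed default
    max_rarity = flt.get("MaxRarity", 1)  # assumed default
    wanted = set(flt["Tags"]) if "Tags" in flt else None  # None means "any tag"
    excluded = set(flt.get("ExcludeTags", []))
    kept = []
    for regex in regexes:
        tags = set(regex["Tags"])
        if not min_rarity <= regex["Rarity"] <= max_rarity:
            continue  # outside the rarity window
        if wanted is not None and not tags & wanted:
            continue  # carries none of the requested tags
        if tags & excluded:
            continue  # carries an excluded tag
        kept.append(regex)
    return kept


filter1 = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}
filter2 = {"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]}
print([r["Name"] for r in apply_filter(REGEXES, filter1)])  # ['ip_address']
print([r["Name"] for r in apply_filter(REGEXES, filter2)])  # ['ip_address', 'api_key']
```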

Let's make another filter:

from pywhat import pywhat_tags, Distribution

filter1 = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}
filter2 = {"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]}

Logical Operators

Distributions support logical operators! Want only the regexes allowed by both filter1 and filter2?

from pywhat import pywhat_tags, Distribution, Identifier

filter1 = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}
filter2 = {"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]}

dist = Distribution(filter1) & Distribution(filter2)

r = Identifier(dist=dist)
r.identify(text)

Or:

from pywhat import pywhat_tags, Distribution, Identifier

filter1 = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}
filter2 = {"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]}

dist = Distribution(filter1)
dist &= Distribution(filter2)

r = Identifier(dist=dist)
r.identify(text)
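One way to read Distribution(filter1) & Distribution(filter2) is as a tightening of every filter key. The sketch below models that reading on plain filter dicts; the normalize() and combine_and() helpers, the ALL_TAGS set and the default values are all hypothetical, and PyWhat's own combination rules may differ in detail.

```python
from typing import Dict

# Hypothetical tag universe for this sketch; PyWhat's real tag list may differ.
ALL_TAGS = frozenset({"AWS", "Identifiers", "Media", "Networking"})


def normalize(flt: Dict) -> Dict:
    """Fill in the defaults assumed here: all tags, no exclusions, rarity 0 to 1."""
    return {
        "Tags": frozenset(flt.get("Tags", ALL_TAGS)),
        "ExcludeTags": frozenset(flt.get("ExcludeTags", ())),
        "MinRarity": flt.get("MinRarity", 0),
        "MaxRarity": flt.get("MaxRarity", 1),
    }


def combine_and(a: Dict, b: Dict) -> Dict:
    """Tighten every key: shared tags, all exclusions, the narrower rarity window."""
    a, b = normalize(a), normalize(b)
    return {
        "Tags": a["Tags"] & b["Tags"],
        "ExcludeTags": a["ExcludeTags"] | b["ExcludeTags"],
        "MinRarity": max(a["MinRarity"], b["MinRarity"]),
        "MaxRarity": min(a["MaxRarity"], b["MaxRarity"]),
    }


filter1 = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}
filter2 = {"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]}
print(combine_and(filter1, filter2))
```

With the two filters from this page, the model keeps only the Networking tag, excludes both Identifiers and Media, and narrows the rarity window to 0.4 through 0.8.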

We also support logical OR! Get all the regexes in either distribution:

from pywhat import pywhat_tags, Distribution, Identifier

filter1 = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}
filter2 = {"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]}
filter3 = {"ExcludeTags": ["AWS"]}

dist = Distribution(filter1) | Distribution(filter2)
dist |= Distribution(filter3)

r = Identifier(dist=dist)
r.identify(text)
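The OR case can be modelled by loosening every key instead of tightening it. As with any model here, ALL_TAGS, normalize() and combine_or() are hypothetical helpers and the defaults are assumptions, not PyWhat's documented behaviour.

```python
from typing import Dict

# Hypothetical tag universe for this sketch; PyWhat's real tag list may differ.
ALL_TAGS = frozenset({"AWS", "Identifiers", "Media", "Networking"})


def normalize(flt: Dict) -> Dict:
    """Fill in the defaults assumed here: all tags, no exclusions, rarity 0 to 1."""
    return {
        "Tags": frozenset(flt.get("Tags", ALL_TAGS)),
        "ExcludeTags": frozenset(flt.get("ExcludeTags", ())),
        "MinRarity": flt.get("MinRarity", 0),
        "MaxRarity": flt.get("MaxRarity", 1),
    }


def combine_or(a: Dict, b: Dict) -> Dict:
    """Loosen every key: tags from either, only shared exclusions, the wider rarity window."""
    a, b = normalize(a), normalize(b)
    return {
        "Tags": a["Tags"] | b["Tags"],
        "ExcludeTags": a["ExcludeTags"] & b["ExcludeTags"],
        "MinRarity": min(a["MinRarity"], b["MinRarity"]),
        "MaxRarity": max(a["MaxRarity"], b["MaxRarity"]),
    }


filter1 = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}
filter2 = {"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]}
filter3 = {"ExcludeTags": ["AWS"]}

combined = combine_or(filter1, filter2)
combined = combine_or(combined, filter3)  # mirrors dist |= Distribution(filter3)
print(combined)
```

In this model, filter3 sets no tags and no rarity bounds, so OR-ing it in widens the result to every tag and the full 0 to 1 rarity range; the exclusions cancel out because no tag is excluded by all three filters.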

Using Distributions and Identifier

There are two ways to use distributions with an Identifier.

You can assign one to the Identifier object, so it applies to every identify() call:

r = Identifier(dist=dist)
r.identify(text)

Or you can pass one to a single identify() call, where it takes precedence over the object's distribution:

no_networking_tags = Distribution({"ExcludeTags": ["Networking"]})
r.identify(text, dist=no_networking_tags)
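The precedence rule behind these two options can be sketched as follows. SketchIdentifier is a hypothetical class that stores a distribution name as a plain string; it only illustrates the assumed fallback (a per-call dist wins, otherwise the object's dist is used) and is not PyWhat's Identifier.

```python
from typing import Optional


class SketchIdentifier:
    """Hypothetical stand-in showing per-object vs per-call settings."""

    def __init__(self, dist: Optional[str] = None):
        self.dist = dist  # default used by every identify() call

    def identify(self, text: str, *, dist: Optional[str] = None) -> str:
        # A per-call argument wins; otherwise fall back to the object's default.
        active = dist if dist is not None else self.dist
        return f"searching {text!r} with {active}"


r = SketchIdentifier(dist="filter1 & filter2")
print(r.identify("https://google.com/"))
print(r.identify("https://google.com/", dist="no_networking_tags"))
```

The per-call argument does not modify the object, so later calls without dist go back to the object's default.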

Sorting

PyWhat supports sorting. You can get sorted output like this:

from pywhat import Identifier, Keys

r = Identifier()
r.identify(text, key=Keys.RARITY)  # returns matches sorted by rarity in ascending order
r2 = Identifier(key=Keys.MATCHED, reverse=True)
r2.identify(text)  # returns matches sorted by matched string in reverse alphabetical order

Available keys

Keys.NAME # Sort by the name of the regex pattern
Keys.RARITY # Sort by rarity
Keys.MATCHED # Sort by the matched string
Keys.NONE # No sorting is done (the default)
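Sorting by these keys amounts to choosing a key function for Python's sorted(). The sketch below mirrors the four documented keys with a hypothetical SortKey enum and sort_matches() helper, working on match dicts shaped like the sample output at the top of this page but trimmed to the fields used. It is an illustration, not PyWhat's implementation; in this sketch SortKey.NONE returns the matches in their original order and ignores reverse.

```python
from enum import Enum
from typing import Callable, Dict, List


class SortKey(Enum):
    """Hypothetical mirror of the documented keys."""
    NAME = "name"
    RARITY = "rarity"
    MATCHED = "matched"
    NONE = "none"


KEY_FUNCS: Dict[SortKey, Callable[[Dict], object]] = {
    SortKey.NAME: lambda m: m["Regex Pattern"]["Name"],
    SortKey.RARITY: lambda m: m["Regex Pattern"]["Rarity"],
    SortKey.MATCHED: lambda m: m["Matched"],
}


def sort_matches(matches: List[Dict], key: SortKey = SortKey.NONE,
                 reverse: bool = False) -> List[Dict]:
    """Return a sorted copy of the matches; SortKey.NONE keeps the original order."""
    if key is SortKey.NONE:
        return list(matches)
    return sorted(matches, key=KEY_FUNCS[key], reverse=reverse)


# Made-up matches, trimmed to the fields the key functions read.
matches = [
    {"Matched": "https://google.com/", "Regex Pattern": {"Name": "url", "Rarity": 1.0}},
    {"Matched": "192.168.0.1", "Regex Pattern": {"Name": "ip_address", "Rarity": 0.5}},
]
print([m["Matched"] for m in sort_matches(matches, SortKey.RARITY)])
print([m["Matched"] for m in sort_matches(matches, SortKey.MATCHED, reverse=True)])
```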