We would like to use the information about website behaviour learnt from the crawl data to build out a system to help users stay safe online. The original idea was to design a scalar "risk" or "badness" metric that would alert the user to the potential riskiness of each page they visit.
However, we immediately run into the question of how to define "badness". While we can claim that some sites are intuitively less trustworthy than others, and some website behaviours are undesirable (eg. respawning cookies deleted by the user), it is difficult to define an overall notion of badness that applies to all sites, as it is highly subjective and nuanced.
Some reasonable candidates for a definition of badness are trustworthiness, privacy loss, or security risk. However, these are often difficult to quantify without imposing value judgements on both users and website owners. In particular, it is not realistic to attempt to quantify privacy loss in an absolute way, since different users are comfortable sharing different degrees of personal information, often in exchange for the utility or convenience they gain from some content or a service. Furthermore, a single user may have different thresholds of privacy risk for different content providers, depending on how much they trust them. Security risk is more objective, but is difficult to measure using the crawl data.
Some examples to consider:
Facebook may be considered "risky" from a privacy point of view, since it solicits vast amounts of personal information from users and uses it to target ads. According to recent reports, it also appears to have been sharing user information with third parties without users' knowledge. However, flagging Facebook as a "bad" site is not necessarily practical, since it is relied on by millions of people worldwide who may be unlikely to act on this information by quitting Facebook. Furthermore, an outcome like this, which goes against many users' personal privacy tradeoffs, could result in them distrusting our risk metric instead.
Many news organization websites depend on advertising revenue, and loading their sites triggers requests to third-party content such as ad servers and known tracking services. However, this does not necessarily tell us anything about the trustworthiness of the news organization itself.
Session replay can be considered privacy-invasive, since it records users' interactions with sites in detail, possibly collecting sensitive information entered into text fields, such as credit card numbers. While this may be considered more risky if the session replay service provider is a third party, the site owner may have legitimate reasons for using session replay, and may host the service themselves.
Cryptojacking consumes system resources on the user's device by mining cryptocurrency in the background while they are browsing a site. While this may be considered an annoyance, it is not necessarily a privacy risk if no personal information is collected.
Fingerprinting is often used as a way to uniquely identify users of a site, and is generally considered undesirable from a privacy point of view. However, the fact that a site uses fingerprinting is not necessarily "bad" per se, and it may have legitimate uses. It can become a privacy risk when it is used by the site owner or a third party to associate different units of personal information in a way that the user does not want. However, this is not something we can generally measure from the crawl data we collected.
With this in mind, we propose an approach to assessing website riskiness that draws inspiration from the "Nutrition Facts" that are common in food labelling in many countries. We would want the metric to meet the following conditions:
It should be objective or fact-based, and should not depend on value judgements of either the site owner or third parties.
It should be measurable on any site.
It should be easy for the user to trust. One way to do this would be to make it reasonably intuitive.
It should serve as a relative measure that users can easily compare between sites or against a personal baseline, determining simply that one site scores higher than another. Thus, it should probably avoid familiar scales that induce absolute assessments of individual sites, such as letter grades or percentages.
Ideally, it should be accessible at different levels of detail, so that interested users can drill down into multiple measurement dimensions, but there is an overall summary number that can be used to quickly compare two sites.
As an initial version of such a metric, we propose to count things that happen behind the scenes when a webpage is loaded. While this needs some refinement, we generally consider this to mean "what happens in the browser when a page is loaded, outside of rendering visible content". This would include things like requests to third parties, background scripts, calls to JavaScript APIs that are not directly related to the page content, etc., which are generally opaque to most users of the modern Web.
By design, this does not directly report a measure of risk or badness. However, it relies on the assumption that, when undesirable behaviour occurs, it occurs behind the scenes. Therefore, this metric would in fact provide information on such behaviours to the user, since they would be covered by the counts. Moreover, pages with higher counts can be considered more risky than those with lower counts, since they provide more opportunities for undesirable behaviour to occur, ie. a larger potential attack surface.
To implement this metric, we propose the following:
Make a list of concrete behaviours that provide a decent overview of what happens behind the scenes in the browser, such as: a third-party request was made, a cookie was set, local storage was accessed, etc.
When a page loads, count occurrences of each of these behaviours.
Compute an aggregate score for the page by combining the counts in some way, eg. summing them or taking a weighted average (a sketch of this counting and aggregation follows the list).
Provide UI in the browser to display the summary score and allow the user to view the individual counts, possibly organized in a hierarchical way (eg. third-party requests may be split into known ad servers, known trackers, etc.).
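To make the counting and aggregation steps concrete, here is a minimal Python sketch. The behaviour categories, the event format, and the uniform weights are all assumptions chosen for illustration, not values derived from the crawl data.

```python
from collections import Counter

# Both the behaviour categories and the uniform weights below are illustrative
# assumptions; a real version would derive them from the crawl instrumentation.
WEIGHTS = {
    "third_party_request": 1.0,
    "cookie_set": 1.0,
    "local_storage_access": 1.0,
    "fingerprinting_api_call": 1.0,
}

def count_behaviours(events):
    """Count occurrences of each behind-the-scenes behaviour seen on a page load.

    `events` is assumed to be an iterable of (category, detail) pairs emitted by
    whatever records the page load (a crawl log, a browser extension, etc.).
    """
    observed = Counter(category for category, _detail in events)
    return {category: observed.get(category, 0) for category in WEIGHTS}

def aggregate_score(counts, weights=WEIGHTS):
    """Combine the per-behaviour counts into one summary number.

    A weighted sum is only one possible combination rule; the proposal leaves
    the exact aggregation open.
    """
    return sum(weights[category] * n for category, n in counts.items())

# Toy page load: three third-party requests and one cookie.
events = [
    ("third_party_request", "ads.example.com"),
    ("third_party_request", "tracker.example.net"),
    ("third_party_request", "cdn.example.org"),
    ("cookie_set", "example.com"),
]
counts = count_behaviours(events)
print(counts)                   # per-behaviour breakdown for the drill-down view
print(aggregate_score(counts))  # 4.0 with the uniform weights above
```

In the browser UI, the per-behaviour counts would feed the drill-down view, while the aggregate score would be the headline number users compare between sites.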
Another potential direction we could take is to define "badness" as not respecting choices or preferences the user has expressed. In this approach, we could use a similar implementation that counts occurrences of behaviours. This seems like a more universal way to assign value judgements to website behaviours. However, the tricky part will be determining the preferences expressed by the user and detecting violations. This requires more thought, but some initial examples include tracking (eg. sending cookies to known trackers) when a user has Do Not Track set, and respawning cookies that have been deleted by the user.
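As a rough sketch of what detecting such violations might look like, the snippet below flags the two examples mentioned: cookies sent to a known tracker while Do Not Track is enabled, and cookies that respawn after the user deleted them. The record format, the tracker list, and the function names are hypothetical, not part of any existing implementation.

```python
# Hypothetical record format and tracker list, for illustration only; real
# detection would need instrumentation of the live browsing session.
KNOWN_TRACKERS = {"tracker.example.net"}

def violates_do_not_track(request, dnt_enabled):
    """Flag a request that sends cookies to a known tracker while DNT is set."""
    return (
        dnt_enabled
        and request["host"] in KNOWN_TRACKERS
        and bool(request.get("cookies"))
    )

def respawned_cookies(deleted_by_user, cookies_after_reload):
    """Flag cookies that reappear after the user explicitly deleted them."""
    return deleted_by_user & cookies_after_reload

# Toy examples.
request = {"host": "tracker.example.net", "cookies": {"uid": "123"}}
print(violates_do_not_track(request, dnt_enabled=True))  # True

print(respawned_cookies({"uid"}, {"uid", "session"}))    # {'uid'}
```

Each detected violation could then be counted alongside the other behind-the-scenes behaviours, so the two directions share the same counting machinery.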