Authentication on Measurement Upload #158

Open
nobodyinperson opened this issue Apr 3, 2018 · 14 comments

@nobodyinperson

When uploading new measurement data via the API HTTP POST method (according to the docs at https://docs.opensensemap.org/#api-Measurements-postNewMeasurement) there is no authentication that authorizes the uploader to do so. At least for my custom test box that I created via the openSenseMap web interface this results in anyone being able to upload arbitrary measurement data to my box via a simple

curl \
    -H "Content-Type: application/json" \
    -d '{"value":"3333"}' \
    https://api.opensensemap.org/boxes/$SENSEBOX_ID/$SENSOR_ID

where $SENSEBOX_ID and $SENSOR_ID are publicly available, e.g. via the openSenseMap web interface.

While of course the whole openSenseMap project is based on nice people collaborating and feeding an awesome measurement network together, "trust is good, control is better", I would say. :-)

Expected Behavior

The new Measurement HTTP POST API method should require a means of authentication.

Current Behavior

The new Measurement HTTP POST API method does not require any means of authentication.

Possible solution

I understand that introducing authentication to the upload mechanism increases the complexity of the sensor software. A solution could be to add a per-senseBox option that forces all uploads to that box to be authenticated, e.g. with the standard header-based authentication using the JSON Web Token. Maybe even via a separate API method.
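For illustration, an opt-in authenticated upload could carry the token in a standard `Authorization: Bearer` header. The sketch below only builds the request without sending it; the helper name `build_authenticated_upload`, the placeholder IDs and token, and the assumption that the measurement endpoint would honour this header are all hypothetical, since the endpoint described above currently ignores authentication.

```python
import json
import urllib.request


def build_authenticated_upload(box_id: str, sensor_id: str, value: str,
                               jwt: str) -> urllib.request.Request:
    """Build (but do not send) a measurement POST carrying a bearer token.

    Hypothetical: assumes the server would check the Authorization header
    for boxes that opted in to authenticated uploads.
    """
    url = f"https://api.opensensemap.org/boxes/{box_id}/{sensor_id}"
    body = json.dumps({"value": value}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {jwt}",
        },
    )


# Placeholder IDs and token; urllib.request.urlopen(req) would actually send it.
req = build_authenticated_upload("SENSEBOX_ID", "SENSOR_ID", "3333", "JWT_PLACEHOLDER")
```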

Your Environment

  • Operating system and version: Xubuntu 16.04
@noerw
Member

noerw commented Apr 3, 2018

AFAIK, the reason this has not been added yet is that it would break compatibility with all already deployed boxes.
I personally like the idea of a flag to circumvent this.
It would be set by default for new boxes, and could be set for existing boxes by users (or would be set automatically when updating the sketch).
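The flag idea can be sketched as a per-box setting that is on for new boxes and off for legacy ones, so deployed stations keep working unchanged. The `Box` class, its `requires_auth` field, and the `is_authenticated` callback are hypothetical names for illustration, not openSenseMap's actual data model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Box:
    box_id: str
    # Hypothetical per-box flag: True means uploads must be authenticated.
    requires_auth: bool = False  # legacy default: existing boxes are unaffected


def new_box(box_id: str) -> Box:
    # Newly registered boxes get the flag set by default.
    return Box(box_id, requires_auth=True)


def accept_upload(box: Box, is_authenticated: Callable[[], bool]) -> bool:
    """Decide whether a measurement upload for `box` is accepted."""
    if not box.requires_auth:
        return True  # old behaviour: accept without checking credentials
    return is_authenticated()


legacy = Box("legacy-box")
fresh = new_box("new-box")
print(accept_upload(legacy, lambda: False))  # True: legacy box, no check
print(accept_upload(fresh, lambda: False))   # False: auth required but missing
```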

@nobodyinperson
Author

I just realized that the Arduinos use http://ingress.opensensemap.org, which is unencrypted HTTP, for uploads (because plain Arduinos are too weak for SSL, I assume). Unencrypted authentication is rather a joke, but it is still a bigger obstacle to malicious data upload than having none.

@nobodyinperson
Author

Still, the Arduino sketch would need to include the sign-in/token refreshing process, which adds a little boilerplate. Should be manageable, though.

@ubergesundheit
Member

Hi @nobodyinperson,

thanks for opening this issue! Your observations are correct and there is no authentication or authorization present when uploading new values.

As you've already guessed, introducing a means of authorization and authentication for uploading stations means really big changes to the current architecture on both server and stations. In fact, we are actively looking for a good solution for making sure measurements come from the right source.

Before we look at possible solutions, here are some requirements the solution should meet:

  • Should withstand easy guessing of authentication token
  • Should be simple to implement on the microcontrollers side
  • No plain text transport of the authentication means => TLS/HTTPS

Some solutions that come to mind:

A simple shared secret between device and server (static API token or credential pair for HTTP Basic authentication) which is sent with each request. Upon registering a new station, the server generates a simple API token which is given to the station.
Pro: Really easy to implement
Cons: Without transport encryption, this secret is easily readable by anyone looking at the TCP traffic between station and server. Our current stations only support HTTP, so this solution is easily broken.
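A minimal sketch of the shared-secret variant, assuming the server stores the issued token per station: generate a random token at registration and compare it in constant time on each upload. `issue_token` and `check_token` are hypothetical helpers built on Python's standard `secrets` and `hmac` modules.

```python
import hmac
import secrets


def issue_token() -> str:
    # 32 random bytes (256 bits) from the OS CSPRNG, URL-safe base64 encoded,
    # which rules out guessing the token by brute force.
    return secrets.token_urlsafe(32)


def check_token(presented: str, stored: str) -> bool:
    # Constant-time comparison, so response timing does not reveal
    # how many leading characters of a guess were correct.
    return hmac.compare_digest(presented.encode("utf-8"), stored.encode("utf-8"))


token = issue_token()                   # handed to the station once, at registration
print(check_token(token, token))        # True
print(check_token(token + "x", token))  # False
```

This addresses the "easy guessing" requirement, but, as the con above says, over plain HTTP the token itself is visible to anyone reading the traffic.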

Hashing of the payload. Upon registering a station, a secret key is generated and given to the station. Each set of measurements is hashed using a cryptographic hash function with the secret key. The hash is sent along with the measurements and is used to make sure the station is authorized to send measurements.
Some limitations come to mind: since sensor IDs, the actual measurements, and the formatting and encoding of requests are public, using a long secret is a must to prevent brute-force guessing of the secret. A longer secret means higher computational cost. A solution for this could be to use a time-based one-time password as the secret key, but then the internal clocks of the server and stations must be synchronized for it to work. This requires the device and server to use network time (already present on the server).
Pro: If implemented right, should be reasonably secure, even without transport encryption since the secret never leaves the device.
Cons: Hard to get right. High computational cost.
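Hashing a payload together with a secret key is what a message authentication code does, and the standard construction for it is HMAC (RFC 2104). The sketch below swaps in HMAC-SHA256 from Python's standard library as one possible instance of the idea; the 32-byte key size and the helper names are assumptions, not a decided design.

```python
import hashlib
import hmac
import secrets


def make_station_key() -> bytes:
    # Generated once at registration and stored on both station and server.
    return secrets.token_bytes(32)


def sign_payload(payload: bytes, key: bytes) -> str:
    # Hex HMAC-SHA256 tag over the exact request body; only the tag is sent.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_payload(payload: bytes, tag: str, key: bytes) -> bool:
    expected = sign_payload(payload, key).encode("ascii")
    return hmac.compare_digest(expected, tag.encode("utf-8"))


key = make_station_key()
body = b'{"value":"3333"}'
tag = sign_payload(body, key)
print(verify_payload(body, tag, key))                 # True
print(verify_payload(b'{"value":"9999"}', tag, key))  # False
```

On the cost concern, with HMAC specifically: the key is padded to the hash's block size (64 bytes for SHA-256), so a 32-byte key costs no more to use than a short one; the work grows with the payload length rather than the key length.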

So adding opt-in authentication should be the right way forward, but I think the decision on which method to implement shouldn't be rushed. The chosen solution should work for everyone and should be easy to implement for everyone. We recently announced our new board, which should pack enough computational power to make something like this possible.

Maybe you have another good idea or some thoughts?

The endpoint ingress.opensensemap.org also supports https.

@nobodyinperson
Author

Thanks for your detailed reply, @ubergesundheit !

This PSK (pre shared key) method you are suggesting sounds very promising for this purpose. I think what you described there is commonly known as salting.

  • No plain text transport of the authentication means => TLS/HTTPS

I think this requirement can be neglected for the hashing approach.

Cons: Hard to get right. High computational cost.

There seems to be at least a simple MD5 hashing library for the Arduino (https://github.com/tzikis/ArduinoMD5/). MD5-hashing strings of around 1k bytes seems to be fast enough on an Arduino Uno. SHA1 might be a little more computationally expensive but should still work. I didn't find a SHA1 library for the Arduino, though...

Maybe it isn't necessary to use the whole payload for hashing, say only the first 1k bytes. Together with a 40-byte SHA or base64-encoded PSK, it seems reasonably safe to me. This is, after all, just a measure to prevent spam or unauthorized data upload, not traffic encryption.
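The trade-off of hashing only a prefix can be made concrete. This sketch appends the salt to the first 1024 bytes of the payload and hashes the result with MD5 via Python's `hashlib`; the function name, the example salt, and the exact cutoff are illustrative. It shows that bytes beyond the cutoff are not covered by the hash, so whoever captures one valid request could alter its tail without invalidating the hash.

```python
import hashlib

PREFIX_LEN = 1024  # illustrative cutoff: only the first 1 KiB is authenticated


def salted_prefix_md5(payload: bytes, salt: bytes,
                      prefix_len: int = PREFIX_LEN) -> str:
    # MD5 over the payload prefix with the pre-shared salt appended.
    return hashlib.md5(payload[:prefix_len] + salt).hexdigest()


salt = b"example-psk"  # hypothetical pre-shared key
original = b"x" * PREFIX_LEN + b'{"value":"1"}'
tampered = b"x" * PREFIX_LEN + b'{"value":"9"}'
# Same hash, because the changed bytes lie past the cutoff.
print(salted_prefix_md5(original, salt) == salted_prefix_md5(tampered, salt))  # True
```

Hashing the whole body closes this gap.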

@poempelfox

To solve the main problem of "everybody can send fake data for other peoples sensors" and leaving aside transport security which I'd consider far less of a problem:

Couldn't you simply introduce new, additional "public" IDs for senseboxes and sensors? The current ones would then be renamed to "private" IDs. The API would accept either the public or private IDs for all read-only calls, but only the private IDs for everything that changes things, e.g. posting new data. The public part of the website would simply switch to using the public IDs, no longer exposing the private IDs.
That approach would certainly require server-side changes to the API and web interface, but no changes at all for the clients sending data.

You could perhaps just use something like the SHA512-sum of the private ID as the public ID, so everyone knowing the private IDs could generate the matching public ID themselves without talking to your database.
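The ID split could look like this, assuming IDs are strings and the public ID is the hex SHA-512 of the private one, as suggested. `public_id`, `may_read`, and `may_write` are hypothetical helper names, not openSenseMap API functions.

```python
import hashlib
import hmac


def public_id(private_id: str) -> str:
    # One-way derivation: anyone with the private ID can compute this,
    # but the public ID does not reveal the private one.
    return hashlib.sha512(private_id.encode("utf-8")).hexdigest()


def may_read(presented: str, private_id: str) -> bool:
    # Read-only calls accept either the private ID or its public counterpart.
    return presented in (private_id, public_id(private_id))


def may_write(presented: str, private_id: str) -> bool:
    # Writes (e.g. posting measurements) require the private ID itself.
    return hmac.compare_digest(presented.encode("utf-8"), private_id.encode("utf-8"))


secret_id = "hypothetical-private-box-id"
shown_on_map = public_id(secret_id)
print(may_read(shown_on_map, secret_id))   # True
print(may_write(shown_on_map, secret_id))  # False
```

The private ID still travels with every upload, so on plain HTTP it can be captured in transit; that is the objection raised in the reply below.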

@nobodyinperson
Author

Hi @poempelfox,

You are effectively suggesting to drop the salting mechanism and just send some kind of PSK (your private key) along with the data (and the sensor ID - your public key), right?

If we plan to implement an authentication method to prevent people from uploading bogus data to other people's sensors, that method should be solid enough. The Arduinos cannot encrypt their HTTP traffic, so anybody on the same network can easily capture anything they send, including the "private" key. The PSK hash salting approach prevents exactly that, even though the traffic is unencrypted.

@poempelfox

poempelfox commented May 7, 2018 via email

@nobodyinperson
Author

In another (currently private) project on GitLab.com, I have tested the salting mechanism to prevent malicious uploads. It works very well. The Arduino MD5 library works perfectly fine for hashing the salted HTTP payload. The server can check that hash against its own salts and only accepts the measurement if the hashes match.

@mpfeil
Member

mpfeil commented Sep 17, 2018

@nobodyinperson thanks for the feedback. Could you provide a sample for us?

@nobodyinperson
Author

When the project is ready, I will make it public so you can access it too.

@nobodyinperson
Author

nobodyinperson commented Sep 25, 2018

Here is our project python3-co2logserver: https://gitlab.com/tue-umphy/co2mofetten/python3-co2logserver

Here is an excerpt from the README:

Authentication

If you want to control who is allowed to upload data to the server, you
may use the PSK (pre-shared-key) salting mechanism built into the
server.

Set CO2LOGSERVER_UPLOAD_REQUIRES_AUTH=True and specify one or more PSK salt
strings, e.g. CO2LOGSERVER_CHECKSUM_SALTS = ["my-super-secret-psk"].

By default, the server then only accepts requests that include at least one
header field Content-HASHALGORITHM-Salted containing the hexadecimal hash of
the sent payload with the salt appended, calculated with HASHALGORITHM (e.g.
MD5, SHA1, SHA256, etc.).
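On the server, this check amounts to recomputing the hash for each configured salt and comparing it with the header value. The following is a sketch under stated assumptions (headers as a plain mapping, salts as UTF-8 strings, hex digests compared case-insensitively, unsupported algorithms ignored), not co2logserver's actual code; `upload_is_authorized` is a hypothetical name.

```python
import hashlib
import hmac
import re
from typing import Mapping, Sequence

# Header names such as "Content-MD5-Salted" or "Content-SHA256-Salted".
SALTED_HEADER = re.compile(r"^Content-([A-Za-z0-9]+)-Salted$", re.IGNORECASE)


def upload_is_authorized(headers: Mapping[str, str], payload: bytes,
                         salts: Sequence[str]) -> bool:
    """True if any Content-<ALG>-Salted header equals ALG(payload + salt)."""
    for name, value in headers.items():
        match = SALTED_HEADER.match(name)
        if match is None:
            continue
        algorithm = match.group(1).lower()  # "MD5" -> "md5", "SHA256" -> "sha256"
        presented = value.strip().lower().encode("utf-8")
        for salt in salts:
            try:
                hasher = hashlib.new(algorithm, payload + salt.encode("utf-8"))
            except ValueError:
                break  # algorithm not supported here: ignore this header
            expected = hasher.hexdigest().encode("ascii")
            if hmac.compare_digest(expected, presented):
                return True
    return False
```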

For example, if you want to upload the JSON data
{"time_utc":[43,23],"co2":[1223,2351]} and your salt string is
my-super-secret-psk, your header field Content-MD5-Salted would be
b71e91feb2be18ccca019914a1da5b1d which is the MD5-sum of
{"time_utc":[43,23],"co2":[1223,2351]}my-super-secret-psk.
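The worked example can be reproduced with Python's `hashlib`. The printed digest should equal the value quoted above, provided the payload bytes are exactly identical; any change in whitespace or key order yields a different hash, which is why the request sends the very bytes that were hashed. The upload URL is a hypothetical placeholder, as the excerpt does not name an endpoint.

```python
import hashlib
import urllib.request

payload = b'{"time_utc":[43,23],"co2":[1223,2351]}'
salt = b"my-super-secret-psk"

# Hash the exact bytes that will be sent, with the salt appended.
content_md5_salted = hashlib.md5(payload + salt).hexdigest()
print(content_md5_salted)

# Hypothetical endpoint; urllib.request.urlopen(request) would send it.
request = urllib.request.Request(
    "http://logserver.example/upload",
    data=payload,  # the same bytes that were hashed
    method="POST",
    headers={
        "Content-Type": "application/json",
        "Content-MD5-Salted": content_md5_salted,
    },
)
```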

This is a simple yet effective way of preventing spam uploads.

Security Note

Note, however, that communication to the server is still unencrypted (only
HTTP, not HTTPS). The reason for this is that embedded devices like Arduinos
do not have the capabilities for encrypted web traffic. Thus, the sent data
including the checksums can theoretically be intercepted and reused to
reupload the exact same dataset.
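One way to narrow this replay gap, building on the time-synchronization idea raised earlier in the thread, is to hash a timestamp together with the payload and reject stale or repeated requests. This is an extension, not part of co2logserver as described; it assumes the device has a roughly correct clock (e.g. via NTP) and sends its timestamp alongside the payload, and `accept`, `salted_md5`, and `MAX_SKEW_S` are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional, Set

MAX_SKEW_S = 300  # hypothetical tolerance for clock drift between device and server

# In-memory only; a real server would persist entries and expire old ones.
_seen_digests: Set[str] = set()


def salted_md5(payload: bytes, timestamp: int, salt: bytes) -> str:
    # Timestamp first, then ":"; the timestamp never contains ":", so the
    # boundary between timestamp and payload is unambiguous.
    return hashlib.md5(str(timestamp).encode("ascii") + b":" + payload + salt).hexdigest()


def accept(payload: bytes, timestamp: int, digest: str, salt: bytes,
           now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_S:
        return False  # too old or too far in the future
    expected = salted_md5(payload, timestamp, salt)
    if not hmac.compare_digest(expected.encode("ascii"), digest.encode("utf-8")):
        return False  # wrong salt, or payload/timestamp altered
    if digest in _seen_digests:
        return False  # byte-identical request already accepted: replay
    _seen_digests.add(digest)
    return True
```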

The interesting code parts live here.

On the Arduino side, the Arduino MD5 library is used to hash the payload.

It works perfectly fine.

mpfeil added v7 and removed v6 labels Nov 20, 2018
@nobodyinperson
Author

I just made the LogserverClient Arduino library public. You can find the function for calculating the salted MD5 hash here. It uses my fork of the ArduinoMD5 library, which avoids heap fragmentation by not using dynamic memory allocation. It works perfectly fine for basic, unencrypted authentication based on a pre-shared key.

@noerw
Copy link
Member

noerw commented Feb 27, 2019

@nobodyinperson this looks great!
Do I understand correctly that the hash salted with the secret is sent in the Content-MD5-Salted HTTP header? (never mind, I just read your previous comment again)
