Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try materialized views to speed time series I/O #7

Open
aufdenkampe opened this issue Sep 7, 2017 · 2 comments
Open

Try materialized views to speed time series I/O #7

aufdenkampe opened this issue Sep 7, 2017 · 2 comments

Comments

@aufdenkampe
Copy link
Member

aufdenkampe commented Sep 7, 2017

Roelof Versteeg, @roelofversteeg, gave the ODM2 team a presentation on Aug. 29 that described how he massively improved performance for fetching time series data by creating materialized views in ODM2 (using MySQL on 3-y old hardware). These are the notes from that meeting:

ODM2 Performance Optimization

  • Time series
    • 5 or 15 min intervals, via CampbellSci LoggerNet
    • Lots of sensors per station
  • Enters DB in very simple flatfile structure, similar to LoggerNet files
  • Subsequent mapping to get proper metadata
    • Into ODM2?
    • Continuously runs Python script
  • Create materialized view of auto-QAQCed data (via Python scripts)
    • Optimized in MySQL
    • Regular Views didn’t perform
    • Jeff: how do you create/manage these in MySQL?
      • Continuously runs Python script
    • Database is bigger, but all running well on 3-y old hardware
@aufdenkampe
Copy link
Member Author

@horsburgh, @sreeder, @emiliom, @lsetiawan, @miguelcleon

I found this recent (Feb 2017) article, entitled "Is PostgreSQL good enough?"
http://renesd.blogspot.com/2017/02/is-postgresql-good-enough.html
It compares PostgreSQL to many different speciality databases (including InfluxDB) and the conclusion is that you can do 99% of what you need with features that exist in recent versions of PostreSQL. The article describes those features.

A good ways down the article is a section on Time Series, with a lot of very promising looking performance tips, including:

  • Materialized views
  • Array functions and binary types
  • Block Range Index (BRIN)
  • Dataframe structures for fast I/O via SQLalchemy and Pandas
  • many others

The article links to many more resources and tutorials.
It describes using PostgreSQL to "sustain a load of ~6K datapoints per second across 6K series on a 2010 laptop."

I think there is a lot of promise here.

@aufdenkampe
Copy link
Member Author

aufdenkampe commented Feb 6, 2019

@miguelcleon, check out the article in my #7 (comment) above for a lot of great ideas on how to improve time series performance without the need for InfluxDB or even the TimescaleDB extension to PostgreSQL (#8 ).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant