Multi-threaded sample request, products API #717
Most likely your bottleneck is going to be the rate limits of the Shopify REST API. Depending on your plan it's anywhere between 2 and 40 requests/second, so at 250 products per request the lower bound you can achieve is somewhere between 1 and 20 minutes for ~600k products. I don't think that's too bad. This should hopefully be a one-time thing, and subsequent syncs should only need to fetch products that changed since the last run (e.g. filtering on updated_at).

If you're not hitting rate limits at any point, that is a good sign that you can speed things up. You can go multi-threaded by partitioning your product queries. An easy example is that each thread queries consecutive, non-overlapping intervals (e.g. of created_at dates).

And one last strategy: if your bottleneck is in the processing of your results (e.g. writing to another data store), you can use a multi-threaded approach for your processing while consuming products from the Shopify PageIterator in a single thread. This requires separating the fetching loop from the workers that process the results.

Hope this helps.
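For that last strategy, a minimal sketch could look like the following. It assumes the library's PaginatedIterator (check the exact import path against your version) and a hypothetical process_product() standing in for whatever you do with each product; the shop URL and token are placeholders. Fetching stays in one thread, and processing fans out to workers through a queue.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

import shopify
from shopify.collection import PaginatedIterator

SHOP_URL = "<YOUR_SHOP_URL>"  # placeholders, fill in for your shop
API_VERSION = "2023-10"
TOKEN = "<YOUR_TOKEN>"
N_WORKERS = 5


def process_product(product):
    """Hypothetical processing step, e.g. writing to your own data store."""
    ...


def processing_worker(q: queue.Queue):
    # Drain the queue until the fetching thread sends the None sentinel.
    while True:
        product = q.get()
        if product is None:
            break
        process_product(product)


def fetch_and_process():
    q = queue.Queue(maxsize=1000)  # bounded, so fetching never runs far ahead
    with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        workers = [pool.submit(processing_worker, q) for _ in range(N_WORKERS)]
        # Single-threaded fetch: page through all products and feed the workers.
        with shopify.Session.temp(SHOP_URL, API_VERSION, TOKEN):
            for page in PaginatedIterator(shopify.Product.find(limit=250)):
                for product in page:
                    q.put(product)
        for _ in range(N_WORKERS):
            q.put(None)  # one sentinel per worker so they all shut down
        for w in workers:
            w.result()  # propagate exceptions, if any
```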
Thank you Alex - we're on an enterprise plan, so there is still more gas in the tank on the API limits. For us, it's the gathering of the products, not our own processing, that takes a long time. I like the idea of partitioning based on different queries. I was stuck in my BigCommerce mindset, which lets me easily split the pages of the same initial query across different threads: thread 1 gets stack 1 (pages 1-100), thread 2 gets stack 2 (pages 101-200), and so on.
Ah, I get it. That kind of thing cannot be done here. Here is a simple multi-threaded approach to fetching products:

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, datetime, timedelta
from threading import Lock
from typing import Iterator

import shopify
from shopify import Product

SHOP_URL = "<YOUR_SHOP_URL>"
API_VERSION = "2023-10"
TOKEN = "<YOUR_TOKEN>"

N_WORKERS = 10
INTERVAL_DAY_SIZE = 10
FIRST_PRODUCT_DATE = datetime.today().date() - timedelta(
    days=100
)  # The date of the first product created


class DateGetter:
    """Hands out consecutive, non-overlapping date intervals to the workers."""

    def __init__(self):
        self.current_date = FIRST_PRODUCT_DATE
        self.lock = Lock()

    def get_interval(self) -> tuple[date, date]:
        """Get the next interval of dates to query products for."""
        with self.lock:
            from_ = self.current_date
            to_ = self.current_date + timedelta(days=INTERVAL_DAY_SIZE)
            self.current_date = to_ + timedelta(days=1)
            return from_, to_


def get_products(created_at_min: date, created_at_max: date) -> Iterator[Product]:
    """Get products created between the given dates."""
    with shopify.Session.temp(SHOP_URL, API_VERSION, TOKEN):
        for product in Product.find(
            created_at_min=created_at_min,
            created_at_max=created_at_max,
            no_iter_next=False,
            limit=250,
        ):
            yield product


def worker(date_getter: DateGetter):
    while True:
        from_, to_ = date_getter.get_interval()
        if from_ > datetime.today().date():
            break
        for product in get_products(from_, to_):
            pass  # do something with the products


def main():
    date_getter = DateGetter()
    results = []
    pool = ThreadPoolExecutor(max_workers=N_WORKERS)
    for _ in range(N_WORKERS):
        results.append(pool.submit(worker, date_getter))
    for result in results:  # propagate exceptions, if any
        result.result()


if __name__ == "__main__":
    main()
```

Note: I don't remember whether the `no_iter_next`/pagination arguments are exactly right here, so double-check them against the library version you're using. Hope it's helpful to you!
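One more sketch in case it's useful: a way to fill in the `pass  # do something with the products` placeholder without any shared state is to have each worker build and return its own list, and then collect the per-thread results from the futures in main(). This reuses DateGetter, get_products, and N_WORKERS from the snippet above; the id/title fields are just stand-ins for whatever "key product metrics" you actually need.

```python
def worker(date_getter: DateGetter) -> list[dict]:
    rows = []
    while True:
        from_, to_ = date_getter.get_interval()
        if from_ > datetime.today().date():
            break
        for product in get_products(from_, to_):
            # Keep only the fields you care about; each worker owns its own
            # list, so no locking is needed around the append.
            rows.append({"id": product.id, "title": product.title})
    return rows


def main() -> list[dict]:
    date_getter = DateGetter()
    with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        futures = [pool.submit(worker, date_getter) for _ in range(N_WORKERS)]
        # Flatten the per-thread lists; .result() also propagates exceptions.
        return [row for future in futures for row in future.result()]
```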
This issue is stale because it has been open for 60 days with no activity. It will be closed if no further action occurs in 14 days.
We are closing this issue because it has been inactive for a few months. If you still encounter this issue with the latest stable version, please reopen it using the issue template. You can also contribute directly by submitting a pull request; see the CONTRIBUTING.md file for guidelines. Thank you!
Can anyone provide a multi-threaded sample to get products?
We have 612k total products (including variants), and fetching 250 at a time is very slow. I tried reducing the field payload, and that seems to have its own challenges. What I would like to do is kick off, say, 5 threads that each have low/high limits and simultaneously download key product metrics.