multi threaded sample request, products API #717

Closed
focusede opened this issue May 22, 2024 · 5 comments

focusede commented May 22, 2024

Can anyone provide a multi-threaded sample for fetching products?

We have 612k total products (including variants), and fetching 250 at a time is very slow. I tried reducing the field payload, but that has its own challenges. What I would like to do is kick off, say, 5 threads that each have low/high limits and simultaneously download key product metrics.

alex5207 commented Jun 12, 2024

Most likely your bottleneck is going to be the rate limits of the Shopify REST API. Depending on your plan, that's anywhere between 2 and 40 requests/second, so the lower bound you can achieve is somewhere between 1 and 20 minutes for ~600k products.
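The 1 to 20 minute estimate can be reproduced with a quick calculation. This is only a rough sketch: it assumes every one of the 612k items costs one slot in a 250-item page and that the rate limit is the only bottleneck. The function name `min_fetch_minutes` is made up for illustration.

```python
import math


def min_fetch_minutes(total_items: int, page_size: int, requests_per_second: float) -> float:
    """Lower bound on fetch time in minutes, ignoring latency and processing."""
    pages = math.ceil(total_items / page_size)  # one request per page
    return pages / requests_per_second / 60


pages = math.ceil(612_000 / 250)            # number of requests needed
slow = min_fetch_minutes(612_000, 250, 2)   # at 2 requests/second
fast = min_fetch_minutes(612_000, 250, 40)  # at 40 requests/second
print(pages, round(slow, 2), round(fast, 2))  # 2448 20.4 1.02
```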

I don't think it's that bad. This should hopefully be a one-time thing, and subsequent syncs can then use the updated_at_min parameter, which should return far fewer products.
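A minimal sketch of how the parameters for such an incremental sync could be built. The helper `incremental_params` is hypothetical, and where the last sync time gets stored is left to the caller; only `updated_at_min` and `limit` come from the discussion above.

```python
from datetime import datetime, timezone


def incremental_params(last_sync: datetime, limit: int = 250) -> dict:
    """Build query parameters that select only products updated since last_sync."""
    if last_sync.tzinfo is None:
        # A naive timestamp is ambiguous, so refuse it rather than guess a zone.
        raise ValueError("last_sync must be timezone-aware")
    return {
        "updated_at_min": last_sync.astimezone(timezone.utc).isoformat(),
        "limit": limit,
    }


params = incremental_params(datetime(2024, 6, 1, tzinfo=timezone.utc))
# With the library, these would presumably be passed as Product.find(**params).
print(params)
```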

If you're not hitting rate limits at any point, that is a good sign that you can speed things up. You can go multi-threaded by partitioning your product queries. An easy example is to have each thread use consecutive, non-overlapping intervals of e.g. created_at_min and created_at_max. You can do this because the library is thread-safe, but its effect is obviously limited by your ability to partition the products well on a parameter that the API supports. High/low on the ids themselves won't work since they're global.
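One way to produce such consecutive, non-overlapping intervals up front is sketched below. `partition_dates` is a hypothetical helper, and the half-open convention (start included, end excluded) is an assumption that may need adjusting to how the API treats its bounds.

```python
from datetime import date, timedelta


def partition_dates(start: date, end: date, days: int) -> list[tuple[date, date]]:
    """Split [start, end) into consecutive intervals of at most `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    intervals = []
    lo = start
    while lo < end:
        hi = min(lo + timedelta(days=days), end)  # clamp the last interval
        intervals.append((lo, hi))
        lo = hi  # the next interval starts exactly where this one ends
    return intervals


parts = partition_dates(date(2024, 1, 1), date(2024, 1, 26), 10)
for lo, hi in parts:
    print(lo, hi)
```

Each tuple can then be handed to one thread as its created_at_min and created_at_max pair.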

And one last strategy: if your bottleneck is in the processing of your results (e.g. writing to another data store), you can use a multi-threaded approach for the processing while still consuming products from the Shopify PageIterator in a single thread. This requires overwriting the threadlocal.
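A rough sketch of that last strategy, with a stand-in generator in place of the Shopify iterator. Everything here (`process_concurrently`, `fake_products`, the dict-shaped products) is hypothetical, and because the workers never call the Shopify library, the sketch does not touch the threadlocal question at all.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_concurrently(
    items: Iterable[T], handle: Callable[[T], R], n_workers: int = 4
) -> list[R]:
    """Consume `items` in the calling thread and run `handle` on a worker pool."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # The comprehension pulls from `items` in this thread only.
        futures = [pool.submit(handle, item) for item in items]
        # Results come back in submission order; exceptions propagate here.
        return [future.result() for future in futures]


def fake_products() -> Iterator[dict]:
    """Stand-in for the real product iterator."""
    for i in range(5):
        yield {"id": i, "title": f"product-{i}"}


titles = process_concurrently(fake_products(), lambda p: p["title"].upper())
print(titles)
```

For hundreds of thousands of products, queuing every item at once may use a lot of memory, so a bounded queue or batching would probably be a better fit.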

Hope this helps.

@focusede (Author)

Thank you Alex - we're enterprise, so there is still more gas in the tank for the API limits. For me, it's the gathering of the products, not our own processing, that takes a long time. I like the idea of partitioning based upon two different queries. I was stuck in my BigCommerce mindset, which lets me easily split the pages of the same initial query across different threads: thread 1 gets stack 1, which is pages 1-100, thread 2 gets stack 2, pages 101-200, etc.

alex5207 commented Jun 12, 2024

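Ah, I get it. That kind of page-range splitting cannot be done here, because Shopify's REST pagination is cursor-based: the link to each page comes from the previous response.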

Here is a simple multi-threaded approach to fetching products:

from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from threading import Lock
from typing import Iterator

import shopify
from shopify import Product


SHOP_URL = "<YOUR_SHOP_URL>"
API_VERSION = "2023-10"
TOKEN = "<YOUR_TOKEN>"
N_WORKERS = 10
INTERVAL_DAY_SIZE = 10
# The date the first product was created (placeholder: 100 days ago)
FIRST_PRODUCT_DATE = date.today() - timedelta(days=100)


class DateGetter:
    """Hands out consecutive date intervals to the worker threads."""

    def __init__(self):
        self.current_date = FIRST_PRODUCT_DATE
        self.lock = Lock()

    def get_interval(self) -> tuple[date, date]:
        """Get the next interval of dates to query products for."""
        with self.lock:  # only one thread may advance the cursor at a time
            from_ = self.current_date
            to_ = from_ + timedelta(days=INTERVAL_DAY_SIZE)
            # Start the next interval where this one ends so no day is skipped.
            # If both bounds are inclusive, boundary products may appear twice.
            self.current_date = to_

            return from_, to_


def get_products(created_at_min: date, created_at_max: date) -> Iterator[Product]:
    """Get products created between the given dates."""
    # The session is activated for the current thread only.
    with shopify.Session.temp(SHOP_URL, API_VERSION, TOKEN):
        for product in Product.find(
            created_at_min=created_at_min,
            created_at_max=created_at_max,
            no_iter_next=False,  # keep iterating into the following pages
            limit=250,
        ):
            yield product


def worker(date_getter: DateGetter):
    """Fetch intervals until the start of the next one lies in the future."""
    while True:
        from_, to_ = date_getter.get_interval()
        if from_ > date.today():
            break
        for product in get_products(from_, to_):
            pass  # do something with the products


def main():
    date_getter = DateGetter()

    # The context manager shuts the pool down once all workers are done.
    with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        results = [pool.submit(worker, date_getter) for _ in range(N_WORKERS)]

        for result in results:  # propagate exceptions, if any
            result.result()


if __name__ == "__main__":
    main()

Note: I don't remember whether created_at_min and created_at_max are actually inclusive so you might want to check that and adjust as needed.
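If the bounds do turn out to be inclusive, adjacent intervals can return the same product twice. A minimal sketch of filtering duplicates by id follows; `dedupe_by_id` is a hypothetical helper and the products are plain dicts here purely for illustration.

```python
from typing import Iterable, Iterator


def dedupe_by_id(products: Iterable[dict]) -> Iterator[dict]:
    """Yield each product once, keeping the first occurrence of every id."""
    seen = set()
    for product in products:
        if product["id"] in seen:
            continue  # already yielded from a neighbouring interval
        seen.add(product["id"])
        yield product


first_interval = [{"id": 1}, {"id": 2}]
second_interval = [{"id": 2}, {"id": 3}]  # id 2 sits on the shared boundary
unique = list(dedupe_by_id(first_interval + second_interval))
print([p["id"] for p in unique])  # [1, 2, 3]
```

When several threads feed the same filter, access to the `seen` set would need to be guarded, for example with a lock.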

Hope it's helpful to you!

This issue is stale because it has been open for 60 days with no activity. It will be closed if no further action occurs in 14 days.

@github-actions github-actions bot added the Stale label Aug 12, 2024

We are closing this issue because it has been inactive for a few months.
This probably means that it is not reproducible or it has been fixed in a newer version.
If it’s an enhancement and hasn’t been taken on since it was submitted, then it seems other issues have taken priority.

If you still encounter this issue with the latest stable version, please reopen using the issue template. You can also contribute directly by submitting a pull request– see the CONTRIBUTING.md file for guidelines

Thank you!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 27, 2024