Let context be merged into import data #27

benjaminwil · 2024-03-24T20:06:42Z

Hi, hope you all are well.

In some cases, the given import data will not be enough to create valid Active Record objects. Developers may need to seed the import data to allievate errors related to required belongs_to associations, and so on. This commit provides an interface for this.

In some cases, the given import data will not be enough to create valid Active Record objects. Developers may need to seed the import data to allievate errors related to required `belongs_to` associations, and so on. This commit provides an interface for this.

stevepolitodesign

Thank you for opening this. I'm interested in exploring this more, but in the meantime, I was wondering if the existing interface could support relationships?

csv_string = CSV.generate do |csv|
  csv << %w[title1 content1 1]
  csv << %w[title2 content2 1]
end

ArtVandelay::Import.new(:posts, rollback: true)
  .csv(
    csv_string,
    headers: [:title, :content, :user_id],
  )
end

benjaminwil · 2024-03-30T22:15:03Z

You're right that the existing interface can support relationships by having a CSV column like user_id to make an association. I will try to better explain why I think this would still be valuable given that fact. I will also explain my intended use case, where having a CSV column like user_id is not possible.

First, here is why I think it would be valuable even if having a column with an ID (like user_id) is a possibility:

A record's ID may not be the same across all environments. This means that you would need to prepare multiple CSV files to test imports across development, test, staging, and production environments.
An end user may not have access to the IDs they need to prepare the data, meaning there may be some back-and-forth between the user and the developer. Plus, the end user must trust the developer to provide the remaining import data without any real ability to check and confirm their work before the import.
If the developer made a self-serve imports UI for the end user to import their own data using Art Vandelay, there is no way for the imports to be scoped to the user's account. (Imagine an application with multiple tenants, or users who manage many accounts.) This leads into my intended use case...

I intend to use Art Vandelay as the import/export backend to a self-serve import/export UI. It allows the end user to import and export the contents of their account(s). The end user does not know their own user_id or account_ids. In order to make this import/export UI safe, I need to merge context like current_user and Account.find_by(name: params[:selected_account_name], administrators: [current_user]).

Maybe a self-serve imports/exports UI is beyond the scope what you envision Art Vandelay being used for. I would love to discuss that further. But even if that is not in your vision of what Art Vandelay should be used for, I think points 1. and 2. still make this a valuable feature.

With regard to point 2., I can imagine a developer setting up a script to be reused for multiple accounts in an application with many tenants:

ACCOUNT_NAME=art_vandelay_industries CSV=AVI.csv rake run_import

ACCOUNT_NAME=j_peterman_catalog CSV=JPC.csv rake run_import

I hope this clarifies why I think this feature is valuable. I'm of course happy to discuss it further.

stevepolitodesign · 2024-04-19T12:09:19Z

Thank you for the additional context!

A record's ID may not be the same across all environments. This means that you would need to prepare multiple CSV files to test imports across development, test, staging, and production environments.

This is a valid point, but I imagine in this case, you could use a different unique identifier, if available. However, I recognize that not all records have a unique identifier other than the id.

An end user may not have access to the IDs they need to prepare the data, meaning there may be some back-and-forth between the user and the developer. Plus, the end user must trust the developer to provide the remaining import data without any real ability to check and confirm their work before the import.

In this case, wouldn't an end user still need to know something unique about the record, though? In the example provided in the tests, we create a record, but I'd imagine in practice we'd need to find a user. Most likely by their email, but an end user may not always have access to that.

If the developer made a self-serve imports UI for the end user to import their own data using Art Vandelay, there is no way for the imports to be scoped to the user's account. (Imagine an application with multiple tenants, or users who manage many accounts.) This leads into my intended use case...

In this case, I think there's an opportunity to parse the CSV before passing it to ArtVandelay::Import. I say this because with your intended use case, you still need to query for a user.

csv_string = CSV.generate do |csv|
  csv << ["email", "password"]
  csv << ["george@vandelay_industries.com", "bosco"]
  csv << ["kel_varnsen@vandelay_industries.com", nil]
end

csv_data = CSV.parse(csv_string, headers: true)

csv_data.each do |row|
  row["user_id"] = current_user.id
end

result = ArtVandelay::Import.new(:users).csv(csv_string)

Admittedly, the API you're introducing is more direct, but I worry that it might be a premature feature.

benjaminwil · 2024-04-19T18:10:55Z

In this case, I think there's an opportunity to parse the CSV before passing it to ArtVandelay::Import.

I think this makes sense until the CSV has millions of rows. I expect to be importing CSVs with millions of rows.

I am trying to reduce the amount of boilerplate code I have to manage (per new import type) to safely import large amounts of data (in batches).

I don't know exactly how memory management works when you need to parse and transform a large CSV, so maybe I am wrong, but I think this leads to the developer having to manage a lot of the transformations using the CSV interface.

This pull request is actually just a baby step in the direction I ended up going in my own implementation, where context allows you to specify a proc as a value so you can transform the import data in place.

Here is how I am initializing an import in my application with changes to the gem in my fork, for example:

    ArtVandelay::Import
      .new(Bookmark, filtered_attributes: FILTERED_ATTRIBUTES)
      .csv(
        contents,
        attributes: ATTRIBUTES_MAPPING,
        context: {
          account: account,
          private: ->(value) { value == "yes" }, # A boolean attribute.
          read_at: ->(value) { value == "no" ? Time.now : nil }, # A datetime attribute that is a boolean in the CSV.
          tags: ->(value) { find_or_create_tags_from(value) } # A has_many relationship that is a list of comma-separated tags in the CSV.
        }
      )

I hope this demonstrates the benefits that context could provide. If context is a premature or unwanted feature for the importer, I completely understand and will close this pull request.

stevepolitodesign · 2024-04-22T19:35:13Z

Thank you for sharing that detailed example. I like the interface you've come up with, and can see its value.

For now, though, I think I want to sit on this for a bit before pursing it. My main concern being that I'm not sure if there's a wider need for thus feature (yet).

benjaminwil requested a review from stevepolitodesign as a code owner March 24, 2024 20:06

benjaminwil force-pushed the import-context branch from b32e050 to bfb3d44 Compare March 24, 2024 20:09

benjaminwil force-pushed the import-context branch from bfb3d44 to 5bbd1b8 Compare March 24, 2024 20:10

stevepolitodesign reviewed Mar 29, 2024

View reviewed changes

stevepolitodesign mentioned this pull request Jun 18, 2024

How might we export calculated values? #32

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Let context be merged into import data #27

Let context be merged into import data #27

benjaminwil commented Mar 24, 2024

stevepolitodesign left a comment

benjaminwil commented Mar 30, 2024

stevepolitodesign commented Apr 19, 2024 •

edited

Loading

benjaminwil commented Apr 19, 2024 •

edited

Loading

stevepolitodesign commented Apr 22, 2024

Let context be merged into import data #27

Are you sure you want to change the base?

Let context be merged into import data #27

Conversation

benjaminwil commented Mar 24, 2024

stevepolitodesign left a comment

Choose a reason for hiding this comment

benjaminwil commented Mar 30, 2024

stevepolitodesign commented Apr 19, 2024 • edited Loading

benjaminwil commented Apr 19, 2024 • edited Loading

stevepolitodesign commented Apr 22, 2024

stevepolitodesign commented Apr 19, 2024 •

edited

Loading

benjaminwil commented Apr 19, 2024 •

edited

Loading