Archipelago Summer 2022 Roadmap: 1.0.0 #190

Open
DiegoPino opened this issue Jul 13, 2022 · 0 comments
Labels
documentation Improvements or additions to documentation Drupal9 Drupal9 is the new Drupal8 which was the new Drupal7 wich was the... help wanted Extra attention is needed Release Duties We are all duty here, heavy duty tigresses and bears Community work and Archipelago Travel Working Group's 💜 Imagined, curated and loved by the Working Group
Archipelago Summer 2022 Roadmap

See also #5, #35, #79, #80, #103 and #172 for the complete history.
This is our working enumeration of concrete tasks until July of 2022, per Component and Service, for public evaluation (new ideas, requests, critiques and comments welcome).

Checked tasks are ready, unchecked are in progress or planned. Priority is not given by this order.

Please feel free to comment, request more info or ask for clarification. Feature requests are also highly appreciated and taken into account (always, please!).

Strawberryfield

  • Field Property exposure to Drupal strategies
    • JSON KEY Provider (flattener)
    • JSON KEY Provider
    • JSON Flatten Keys as a field Property (any)
    • JSON Flatten Keys as a field Property (only with values)
    • JSONPATH/JMESPATH
    • JMESPATH supports string, ISO8601 and EDTF date casting with ranges (NEW)
    • Entity Reference Casting Provider (Using UUID loading and configurable entity type) using JSON based hints to expose any semantic relationship to Search API.
    • JSON stored Service Endpoints with extended logic (e.g. HOCR), a.k.a. Strawberry Flavor Data Source.
    • Multi Map/join: many properties to a single one, e.g. all keys (Authorities) referring to creators, contributors, etc. unified as Agents keys. This leads to Fractal Ontologies and our Buckets approach.
  • File downloads and streaming
    • Ranged Request Streamer with back-to-front S3 management and buffer/memory management, for any exposed Binary Endpoint. Streaming of local files is also fixed. This one cost some sleep!
  • Strawberry Flavor Data structures can now hold NLP data and metadata
  • Strawberry Flavor Data structures and indexed Documents in Solr have cleanup on deletes and caching management
  • SMART (very) Breadcrumb generation with strategy selection (Longest Path, common repeating Path)
  • Special Queue for Garbage (temporary files) cleanup. With expiration time.
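
The "JSON Flatten Keys" items above can be pictured with a minimal Python sketch. This is not Archipelago's implementation (the module itself is Drupal/PHP). The function name `flatten_keys` and the definition of an "empty" value are assumptions made only for illustration: every key found at any depth is collected into one flat list, optionally skipping keys whose value is empty.

```python
from typing import Any


def flatten_keys(data: Any, only_with_values: bool = False) -> list[str]:
    """Collect every distinct key found at any depth of a JSON-like structure."""
    found: set[str] = set()

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                # Assumption: "empty" means None, "", [] or {}.
                if not only_with_values or value not in (None, "", [], {}):
                    found.add(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(data)
    return sorted(found)


doc = {"label": "A", "creator": [{"name": "X", "role": ""}], "note": None}
print(flatten_keys(doc))
print(flatten_keys(doc, only_with_values=True))
```

The second call drops `role` and `note` because their values are empty, which mirrors the "only with values" variant in the list.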

JSON representation and enrichment

  • Better File management (Better than Drupal)
    • File referencing via UUID instead of via Entity ID
    • Handle temporary files when moving from TEMP storage to PERMANENT
    • Increment file usage count on new versions
    • Decrement file usage count on version removal
    • Change file usage on Delete, EDIT on existing active content and versions
    • Add Webform based UI management (reorder, replace, delete) for files
    • File based Post processing
      • TECHMD (EXIF, MEDIAINFO, PDFINFO, IDENTIFY)
      • Pronom Service/Preservation
  • New JSON Service Architecture reference
  • On Node save, deposit the whole, self-sustaining Strawberry JSON blob in S3/Minio/FileSystem
  • Keep track of Service and action on Ingest/edit using Activity Streams
  • Add more agent information on our activity streams for provenance and tracking. AMI now also adds Set IDs
  • Add more, and better, Event Driven Subscribers.
  • Hook-able and override-able storage Pattern for files.
  • Selective size of TECHMD generation based on amount of files present on a single ADO
  • Automatic Cache clear of parent entities of Child entity persistence
  • Automatic deletion of Strawberry Flavor (OCR, etc) on ADO removal

Webforms integration

  • Webform Driven UI Ingest with custom handler and widget
  • Handler allows direct CRUD without any node attached and also prepopulation of data using an existing node UUID @alliomeria we need docs here too 🥰 🥰
  • Create a set of Demo Webforms that cover base of our GLAM source data needs
  • Full Autosaving during Creation. Sessions are kept alive for a week. Users can skip Steps and jump back and forth; Validation still happens, but at the end. Log out, come back, continue.
  • Allow Webform Field Widget selection be driven by RDF type and permissions.
  • Webform Widgets can start Open/Rendered or closed via settings and have "cancel edit" hiding to avoid users leaving the edit realm.
  • New Solr Aware Entity Select Views (with code to handle Solr to Entity), which allow
    • Complex autocomplete elements (like "get me all Digital Objects of Type Book with a green Cover the user can see")
    • New fine-grained Entity (node to node) references
  • CSV to JSON importer element
  • XML to JSON importer element
  • Strawberry transplanter. Any JSON into filled Webform Elements (display) using a twig template.
  • Special Date element ISO8601, with Ranges, Single Dates and free form representation.
  • EDTF support for Special Date element ISO8601
  • Create new, better, LoD Webform elements
    • WIKIDATA
    • LoC (with support for any Suggest endpoint)
    • LoC with support MADS RDF Types
    • WIKIDATA Agents with LD Roles
    • WIKIDATA using custom SPARQL
    • Viaf
    • EUROPEANA
    • SNAC/Orgs/Names/Family Names
    • MeSH (PubMed)
    • Multi Source, Multi Agent Element. Agents/Corporate can use now multiple Authority Controls.
    • Getty with exact and fuzzy search (updated to be better!)
    • Nominatim Geo reconciliation. Normal and Reverse.
    • Panorama Tour Building App (like 1200 lines of code, gosh!)
    • Image and EXIF extraction on upload for UI/facing previews.
    • GBIF entity/taxonomy autocomplete
  • Create Stub (temporary) WIKIDATA entities if query shows desired WIKIDATA entity does not exist upstream.
    • "publish" to wikibase functionality
    • Replace repo wide stub uri with official one once pushed.
    • Keep track, on the stub, of who is referencing it (bidirectional reference?)
  • Move Strawberryfield harvest Webform handler's logic to Event Subscribers. Stronger capabilities now.
    • Deal with as:images
    • Deal with as:documents, as:video, as:sound, as:dataset elements
    • Deal with as:models
  • Allow anonymous submits to be converted into proper Nodes by Admin (Self deposit, crowd sourced metadata) WOHO! This also allows self standing endpoints and custom mappings.
  • Make Webform API Interaction work with States (JS) by removing one Form wrapper.
  • Make Webform API Interaction more versatile for our use. Use as schema validator. WIP. AMI.
  • Add JS to avoid main node CRUD to submit/validate embedded Webform as widget
  • Better handling of MultiStep Forms with direct links to others and final/before submit validation

Media Displays Entities

  • Display settings, new tab that shows only the active View Mode for an ADO
  • Admin/contextual block that shows how ADO to Type was chosen by the system (admin hint)
  • Add expected mime/type output to Media displays. Allows to tag media displays as JSON, XML, CSV, JSON-LD or HTML only.
    • React to mime type to allow JSON or XML output to be downloaded too.
    • Native/self rendering and Content-Type tagging with caching.
    • Automatic extraction from the template of required/used variables (context). Not front facing yet, but for sure useful for building a Pick-and-choose (or Data color picker) to aid in Twig Template building
  • Webforms are injected as Context. So a Webform Element Title can be used to match its value.
  • AMI set id and URLs are injected as Context during batch ingest
  • Add new Data Views Plugin integration to allow Media Displays to preprocess values on views exposed as API endpoints
  • Version/Revision Media Display Entities (This is config, annotations and Update Hooks)
  • Inline Preview with ADO selection. Means users can see the data, test the data and see the output with Live Updates even without
  • Inline Preview with Validation of destination format
  • Preview more contextual data (e.g Original Data before an AMI update)
  • Per Metadata Display Extra data injection via any strawberry field that is added. @alliomeria we need docs!
  • Provide example Twig templates for
    • MODS
    • DC
    • JSON-LD
    • GEOJSON
    • IIIF Manifest 2.1
    • IIIF Manifest 3.0
    • EAD2002 (With recursive C Element generation from CSV)
    • EAD3 (With recursive C Element generation from CSV)
    • IIIF Manifests for Creative WorkSeries and Children based on Views
    • a Carousel
  • Metadata Display Exposed endpoints (reuse as Standalone API/download/streams)
  • New Twig Extensions:
    • Functions: sbf_entity_ids_by_label(). NEW: ami_lod_reconcile(), clipboard_copy(), sbf_search_api()
    • Filters: markdown_2_html, html_2_markdown, sbf_json_decode. NEW: bibliography, edtf_2_human_date
  • API builder via UI using Endpoints. Any API, OAI, IIIF, etc. Allows a VIEW to be injected to feed data. Arguments are filtered and fully customizable. This uses OpenAPI and argument parsing too

Field Formatters

  • Static IIIF Images
  • OpenSeadragon IIIF Images
    • W3C Web Annotations! Box and Polygon, fully IIIF compliant with CRUD endpoints. Caches until you are ready to save.
    • Face and polygon/edge detection (mid colors, highlights) via OpenCV and Web Worker
    • Add thumbnail navigation
  • IABookreader IIIF Images
  • Panorama via IIIF now with webGL max texture calculator and max Image size/memory preprocessing to avoid breaking Cantaloupe when using 400MP images.
  • Panorama Tours via other Panorama Objects and IIIF, including Hotspots of many types
  • Panorama Tour talks to maps sending NODE that is being presently displayed
  • Metadata up-casters
  • Metadata up-casters with download endpoint (Metadata Display Exposed endpoints)
  • Video (HTML5) with Subtitles (with grouping, multi Video, multi Subtitle)
  • Audio (HTML5) with Subtitles (with grouping, multi Audio, multi Subtitle)
  • PDF with multi file selection (custom, derived from the base PDF.js library). Not fancy, but Mozilla asks people NOT to use their fancy one directly and we agreed.
  • Web annotations (IIIF) with JMESPATH fine grained selector of which Files to attach
  • Complex nested structures (Whole graphs)
  • 3D! (Three + JSM) with Full Material Support and UV Textures
  • 3D UV Mapping using IIIF Sources and Scene/Light settings
  • 3D Point Clouds from JSON or URLS
  • Mirador 3.1 (With Resource comparison and multi sourced IIIF manifests, using full release now)
  • Mirador 3 (second JS) with HOCR/Text Highlights using https://github.com/dbmdz/mirador-textoverlay
  • Expose View Mode to JSON Type value mapping that triggers automatic View Mode Selection
  • Webrecorder.io native player (WARC replay) with WACZ capabilities version 1.3.2
  • CiteProc (Citation) Formatter with citation mode selection and JS injection
  • Lazy Image Loading via CSS class. JS driven, only loads (when used) Images when visible by the user (+100 px to give them some time to load while users navigate)
  • All formatters can handle Embargoes based on Time and IP address/ranges with caching. Includes alternative Source for Media when embargoed. Embargo info is passed to Templates too as an argument. Embargos are self un-caching to trigger regeneration of NODE displays.
  • All formatters can use a JMESPATH fine grained selector of which Files to attach
  • New Views Submodule for Maps that provides JS and Theming (a style plugin) using leaflet. Allows grouping and facets
  • Optimized JS loading on every page to avoid large/heavy pages

API Ingest, Migration and backup

  • Strawberryfield Normalizer: expands JSON string as a JSON when exporting
  • Strawberryfield denormalizer: string-ify JSON when importing
  • Wrap JSONAPI in a set of Drush scripts (Strawberry Seeds) to
    • Allow file and node ingest from a single command line invocation
    • Create virtual field Entity "bucket" to allow Media to be ingested into those as links and routed to internal Strawberryfield elements (utility methods for ingest)
  • AMI (Archipelago Multi Import)
    • API Source (Other repos, ContentDM, generic Solr)
    • API Source (ISLANDORA Solr)
    • Google Spreadsheets (same as IMI)
    • Complete Drush 9 integration
    • AMI Set Entities
    • AMI Sets Entity processing via Batch or Enqueuing (for Hydroponics)
    • Separate processing for remote/single files allowing longer processing
    • AMI Sets Delete Ingested ADOs by this Set via batch (to clear and reingest)
    • LoD Reconciliation with complete per Label Processing and multiple Endpoint calls. Can be edited/refined and reused in a Metadata Display. Better and stronger.
    • LoD can be provided/replaced via a Spreadsheet and will update the internal cached version
    • AMI Update action now can "replace, append, full update" with "keep files safe" addition
    • Reusable, canned public facing AMI ingest strategies. Users can only add the source data, all the rest is pre-setup.
    • S3 Sources for AMI
    • Local file (server) Sources for AMI
    • Remote HTTP sources for AMI
    • ZIP (in the works)
    • Folder as a source (in the works)
    • Vouchers
  • Filesystem drop-and-forget ingest. You save a JSON file into S3, Archipelago creates entities and relationships.
  • Use JSON API to allow seamless moving of dependent assets between repositories and also for backups
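
The Strawberryfield Normalizer/denormalizer pair listed above can be sketched in a few lines of Python. This is an illustration only, not the module's code: the field name held in `FIELD` and the function names are hypothetical. The stored field holds JSON as a string; exporting expands it into a nested structure, and importing turns it back into a string.

```python
import json
from typing import Any

FIELD = "field_descriptive_metadata"  # hypothetical field name, for illustration


def normalize(entity: dict[str, Any], field: str = FIELD) -> dict[str, Any]:
    """Export: expand the stored JSON string into real JSON."""
    out = dict(entity)
    out[field] = json.loads(entity[field])
    return out


def denormalize(payload: dict[str, Any], field: str = FIELD) -> dict[str, Any]:
    """Import: string-ify the JSON structure before it is stored."""
    out = dict(payload)
    out[field] = json.dumps(payload[field], sort_keys=True)
    return out


stored = {"title": "A book", FIELD: '{"label": "A book", "type": "Book"}'}
exported = normalize(stored)
print(exported[FIELD]["type"])
print(denormalize(exported)[FIELD])
```

Expanding on export keeps the API payload readable and lets a client address nested keys directly, while the storage side keeps a single string value.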

Service Architecture (Strawberry Runners)

  • Develop webhook driven notification service for derivatives
  • Custom, user facing Plugins. Build your own derivative workflows (system calls, JSON processing, etc)
  • Document/deploy webhook triggers for minio S3 per mimetype
  • Document/deploy webhook triggers for AWS S3 (via lambda) per mimetype
  • Develop Shell processing using Custom Plugins (Processors) and user configurable for each case (rule system)
  • Allow Processor to be chained! And have multiple outputs.
  • Queue-worker processing
  • Generate JSON reference-able Services (plugins) for complex non descriptive metadata and data
    • HOCR with Language Detection (after), Language selection (via metadata) and better NLP
    • Full text from PDF to HOCR (miniOCR) via custom PDFAlto with language detection and NLP
    • HOCR of single images
    • TECHMD
    • WACZ
    • Web Annotations
    • Tabular datasets
    • Transcripts (similar to Web Annotations, mostly dependent)
    • File Conversions (any that your Shell allows) with reingest
    • Smart checks on existing processed output to avoid double processing.
    • ~~ Build slim Content entity that can be used to index natively that content into Solr via search API ~~ This is now a fully capable Search API Datasource that can hold any output. one (node) to many (files) to even more sequences.
    • Allow Services to self-describe their capabilities. WIP how we expose this to the world. Probably GET will be allowed
    • Two Hydroponics approaches: a single-threaded linear one (default) and a Multi Child one, with a configurable number of spawned children. All using ReactPHP

SEO and API

  • Allow Media displays output to be embedded in HTML head for SEO
  • Test/Develop nested DATA VIEWS integration for OAI-ORE and OAI-PMH (See Format Strawberryfield and API builder)
  • Create (TWIG, metadata displays) and expose as endpoints full set of IIIF API JSON outputs.
  • Add helper methods and twig extensions to allow Metadata displays to access pre existing views (like object listings for a collection) to help build those lists.

ACL / Permissions

  • Integrate custom ACL with JSON Paths into per NODE ACL. This allows permissions to be applied to individual metadata elements/paths.
  • Embargoes with JSON key setup for dates/IPs (Individual and ranges). Includes Cron "release" system (deletes caches) and applies to Formatters, Metadata endpoints too
  • Same but needs better UI for referenced Services and Media
  • Allow Metadata (rule) to trigger ACL permissions, e.g. if embargo_date == bla bla, remove public access
  • Allow for ACL inheritance (from parent, recursive) without hard copies.
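
A rough sketch of how an embargo driven by JSON keys for dates and IPs might be evaluated. The key names `date_embargo` and `ip_embargo`, and the rule that a listed IP or range exempts a client from a still-running date embargo, are assumptions made for this example and are not taken from the Archipelago code.

```python
from datetime import date
from ipaddress import ip_address, ip_network


def is_embargoed(metadata: dict, client_ip: str, today: date) -> bool:
    """Return True when the object should be hidden from this client."""
    until = metadata.get("date_embargo")      # hypothetical key, ISO 8601 date
    allowed = metadata.get("ip_embargo", [])  # hypothetical key, IPs or CIDR ranges
    if until is None or today >= date.fromisoformat(until):
        return False  # no embargo set, or the release date has passed
    address = ip_address(client_ip)
    # Assumption: clients inside an allowed address or range bypass the embargo.
    return not any(address in ip_network(entry, strict=False) for entry in allowed)


md = {"date_embargo": "2030-01-01", "ip_embargo": ["10.0.0.0/8", "192.168.1.5"]}
print(is_embargoed(md, "8.8.8.8", date(2022, 7, 13)))
print(is_embargoed(md, "10.1.2.3", date(2022, 7, 13)))
```

Because the result changes on the release date without any edit to the object, cached renderings need to be cleared at that point, which is what the Cron "release" system in the list is for.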

Deployment and DevOPS

  • Sync Configurations and remove non used ones for minio branch / periodic for each Drupal release
  • Site-build and remove orphan blocks
  • Add more utility views
  • Enable JSONAPI by default on minio branch
  • Create jsonapi user with jsonapi credentials for minio branch
  • Create basic scripts to automate Docker/Bash operations
  • Update AWS deployer to match minio including docs and Cloud Services integration
  • XDEBUG integration 3. 2 PHP 8.0 Containers, Cookie based, routed by NGINX
  • Natural Language processing Service via Docker update with new Language Capabilities and multi architecture
  • Cantaloupe 6.0.0 Pre Release
  • Redis integration for caching in Archipelago Deployment Live
  • Catmandu Docker container for large data mangling
  • Update all Strawberryfield modules script.
  • Drupal 9.4 and bumps on every module
  • Solr 8.11, MYSQL 8.
  • Archipelago Live with optimized folder structure and Production ready AWS EC2 Docker deployment

Batch Operations

  • Bulk Batch Views PURE TEXT plugin to (All this via JSONPATCH so supports any operation)
    • Replace existing JSON values
  • Bulk Batch Views JSONPATH plugin to (All this via JSONPATCH so supports any operation)
    • Replace existing JSON values
    • Add to existing Values
    • Respect data type casted values, (entities, file references)
  • Bulk Batch Views Webform Element based plugin to
    • Replace existing JSON values using a given Webform and a UX driven From/To option
  • Bulk Batch Views MEDIA plugin to
    • Replace Media
    • Add Media
  • Bulk Batch Views ACL plugin to
    • Replace ACL and inheritance
    • Replace ACL individual Control List Elements
    • Add ACL individual Control List Elements
  • Integrate into Solr Results and Strawberryfield Taxonomy Term pages
  • CSV based export with selective type and AMI Set generation for future "Update" operation
  • Full facet (ajax and non-ajax) integration with VBO enabled views. Allows for very fine grained Filters before applying a batch OP.
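
Since the batch plugins above are described as working "via JSONPATCH", here is a minimal, dependency-free Python sketch of the two JSON Patch (RFC 6902) operations the list mentions, `replace` and `add`. It is illustrative only: `apply_patch` is a hypothetical helper, it handles only non-root paths, and it is not how the Archipelago plugins are written.

```python
import copy
from typing import Any


def _resolve_parent(doc: Any, pointer: str) -> tuple[Any, str]:
    """Walk an RFC 6901 JSON Pointer and return (parent container, last token)."""
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]
    parent = doc
    for token in tokens[:-1]:
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    return parent, tokens[-1]


def apply_patch(doc: Any, patch: list[dict]) -> Any:
    """Apply 'replace' and 'add' operations to a copy of doc (non-root paths only)."""
    result = copy.deepcopy(doc)
    for op in patch:
        parent, last = _resolve_parent(result, op["path"])
        if op["op"] == "replace":
            if isinstance(parent, list):
                parent[int(last)] = op["value"]
            elif last in parent:
                parent[last] = op["value"]
            else:
                raise KeyError(f"cannot replace missing key: {last}")
        elif op["op"] == "add":
            if isinstance(parent, list):
                # "-" means "append to the end of the array".
                if last == "-":
                    parent.append(op["value"])
                else:
                    parent.insert(int(last), op["value"])
            else:
                parent[last] = op["value"]
        else:
            raise ValueError(f"unsupported operation: {op['op']}")
    return result


ado = {"label": "Old title", "subjects": ["maps"]}
patch = [
    {"op": "replace", "path": "/label", "value": "New title"},
    {"op": "add", "path": "/subjects/-", "value": "atlases"},
]
print(apply_patch(ado, patch))
```

Expressing "replace existing values" and "add to existing values" as patch operations means the same batch runner can apply either one, and the original document is left untouched until the patched copy is saved.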

Future roadmap

  • Solr Cloud/ Consortial ensemble
  • Native Wikibase/Wikidata publishing

Documentation:

  • https://docs.archipelago.nyc (With Search and tags)
  • Devops and new repository deployers
  • Migration to and from.
  • Backup and restoring
  • Permissions, access and ACLs.
  • Twig Template Primer
  • AMI Ingest, Process
  • Metadata Professionals, JSON schema and schema-less. AS, DR and AP internal ontologies. UPDATED
  • Metadata Professionals, Key concepts of Archipelago
  • Metadata, Ingest and edit workflows.
  • Displays, Formatters and Media Plugins (Twig)
  • LoD Reconciliation for AMI
  • Views Integration (Solr and Blocks)
  • Strawberry Field Exposed Keys and Plugins
    • Property Exposing strategies and configs
  • Media Management
  • Solr and Discovery
  • Extending and Coding
  • SEO
@DiegoPino DiegoPino self-assigned this Jul 13, 2022
@DiegoPino DiegoPino added documentation Improvements or additions to documentation help wanted Extra attention is needed Drupal9 Drupal9 is the new Drupal8 which was the new Drupal7 wich was the... Release Duties We are all duty here, heavy duty tigresses and bears Community work and Archipelago Travel Working Group's 💜 Imagined, curated and loved by the Working Group labels Jul 13, 2022
@DiegoPino DiegoPino added this to the 1.0.0 milestone Jul 13, 2022
@DiegoPino DiegoPino pinned this issue Jul 13, 2022
@alliomeria alliomeria unpinned this issue Jun 17, 2024
@alliomeria alliomeria pinned this issue Jun 17, 2024