Add support for parsing metadata from IO objects #16

Merged 1 commit on Apr 2, 2024
80 changes: 73 additions & 7 deletions README.md
@@ -20,10 +20,23 @@ Alternatively, install as `gem install repomd_parser`.

Parses `repomd.xml` -- the main repository metadata file, which references other metadata files.

`parse` method returns an array of `RepomdParser::Reference`.
`parse` and `parse_file` methods return an array of `RepomdParser::Reference`.

##### Using the `parse` method

```ruby
metadata_files = RepomdParser::RepomdXmlParser.new('repomd.xml').parse
File.open('repomd.xml') do |fh|
  metadata_files = RepomdParser::RepomdXmlParser.new.parse(fh)
  metadata_files.each do |metadata_file|
    printf "type: %10s, location: %s\n", metadata_file.type, metadata_file.location
  end
end
```

##### Using the `parse_file` method

```ruby
metadata_files = RepomdParser::RepomdXmlParser.new.parse_file('repomd.xml')
metadata_files.each do |metadata_file|
  printf "type: %10s, location: %s\n", metadata_file.type, metadata_file.location
end
@@ -33,10 +46,23 @@ end

Parses `primary.xml`, which contains information about RPM packages in the repository.

`parse` method returns an array of `RepomdParser::Reference`.
`parse` and `parse_file` methods return an array of `RepomdParser::Reference`.

##### Using the `parse` method

```ruby
File.open('primary.xml') do |fh|
  rpm_packages = RepomdParser::PrimaryXmlParser.new.parse(fh)
  rpm_packages.each do |rpm|
    printf "arch: %8s, location: %s\n", rpm.arch, rpm.location
  end
end
```

##### Using the `parse_file` method

```ruby
rpm_packages = RepomdParser::PrimaryXmlParser.new('primary.xml').parse
rpm_packages = RepomdParser::PrimaryXmlParser.new.parse_file('primary.xml')
rpm_packages.each do |rpm|
  printf "arch: %8s, location: %s\n", rpm.arch, rpm.location
end
@@ -46,15 +72,54 @@ end

Parses `deltainfo.xml`, which contains information about delta-RPM packages in the repository.

`parse` method returns an array of `RepomdParser::Reference`.
`parse` and `parse_file` methods return an array of `RepomdParser::Reference`.

##### Using the `parse` method

```ruby
File.open('deltainfo.xml') do |fh|
  rpm_packages = RepomdParser::DeltainfoXmlParser.new.parse(fh)
  rpm_packages.each do |rpm|
    printf "arch: %8s, location: %s\n", rpm.arch, rpm.location
  end
end
```

##### Using the `parse_file` method

```ruby
rpm_packages = RepomdParser::DeltainfoXmlParser.new('deltainfo.xml').parse
rpm_packages = RepomdParser::DeltainfoXmlParser.new.parse_file('deltainfo.xml')
rpm_packages.each do |rpm|
  printf "arch: %8s, location: %s\n", rpm.arch, rpm.location
end
```

#### Compressed file support

The gzip and Zstandard compression formats are supported. The `parse_file`
method automatically decompresses files based on the filename, e.g.:

```ruby
rpm_packages = RepomdParser::PrimaryXmlParser.new.parse_file('primary.xml.gz')
rpm_packages.each do |rpm|
  printf "arch: %8s, location: %s\n", rpm.arch, rpm.location
end
```

The `RepomdParser.decompress_io` helper is provided to handle
decompression of IO objects for use with the `parse` method:

```ruby
filename = 'primary.xml.gz'
io = RepomdParser.decompress_io(File.open(filename), filename)

rpm_packages = RepomdParser::PrimaryXmlParser.new.parse(io)
rpm_packages.each do |rpm|
  printf "arch: %8s, location: %s\n", rpm.arch, rpm.location
end
```
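
The wrapping idea behind `decompress_io` can be tried without the gem. A minimal sketch, assuming only Ruby's standard library: the hypothetical `wrap_gzip_io` helper mirrors the `.gz` branch only (it is not the gem's code and has no Zstandard branch), and a `StringIO` stands in for data that never touched the filesystem.

```ruby
require 'stringio'
require 'zlib'

# Hypothetical helper (not part of repomd_parser): wraps the IO in a gzip
# reader when the filename ends in .gz, otherwise returns it unchanged.
def wrap_gzip_io(io_object, filename)
  File.extname(filename) == '.gz' ? Zlib::GzipReader.new(io_object) : io_object
end

xml = '<metadata packages="0"/>'

# In-memory gzip data stands in for a file fetched over the network.
compressed = StringIO.new(Zlib.gzip(xml))

io = wrap_gzip_io(compressed, 'primary.xml.gz')
decompressed = io.read
io.close
```

Since the result is an ordinary readable object, it could be handed to a parser's `parse` method in the same way as the `File` objects in the examples above.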


#### RepomdParser::Reference

Represents a file referenced in the metadata file. Has the following accessors:
@@ -75,7 +140,8 @@ RPM and DRPM files additionally have the following attributes:

## Caveats

* Relies on the file extension to determine if the file is compressed (automatically decompresses `.gz` and `.zst` files)
* File extension is used to determine file compression type (expected
extensions are `.gz` and `.zst` for gzip and Zstandard respectively)
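
Since the check goes through `File.extname`, only the final extension is considered and the comparison is case-sensitive. A small sketch of what that implies, using a hypothetical `compression_for` helper that is not part of the gem:

```ruby
# Hypothetical helper mirroring the extension check; the name and the
# returned symbols are illustrative only.
def compression_for(filename)
  case File.extname(filename)
  when '.gz' then :gzip
  when '.zst' then :zstd
  else :none
  end
end

compression_for('primary.xml.gz')  # :gzip
compression_for('primary.xml.zst') # :zstd
compression_for('primary.xml')     # :none
compression_for('primary.xml.GZ')  # :none, the match is case-sensitive
```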

## Development

13 changes: 10 additions & 3 deletions lib/repomd_parser.rb
@@ -15,13 +15,20 @@
 # with this program; if not, write to the Free Software Foundation, Inc.,
 # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

-module RepomdParser
-end
-
 require 'repomd_parser/version'
 require 'repomd_parser/reference'
 require 'repomd_parser/base_parser'
 require 'repomd_parser/repomd_xml_parser'
 require 'repomd_parser/deltainfo_xml_parser'
 require 'repomd_parser/primary_xml_parser'
 require 'repomd_parser/zstd_reader'
+
+module RepomdParser
+  def self.decompress_io(io_object, filename)
+    case File.extname(filename)
+    when '.gz' then Zlib::GzipReader.new(io_object)
+    when '.zst' then RepomdParser::ZstdReader.new(io_object)
+    else io_object
+    end
+  end
+end
27 changes: 12 additions & 15 deletions lib/repomd_parser/base_parser.rb
@@ -19,27 +19,24 @@
 require 'zlib'

 class RepomdParser::BaseParser < Nokogiri::XML::SAX::Document
-  def initialize(filename)
-    super()
-    @referenced_files = []
-    @filename = filename
+  def parse_file(filename)
+    io = RepomdParser.decompress_io(File.open(filename), filename)
+    parse(io)
+  ensure
+    io.close
   end

-  def parse
-    Nokogiri::XML::SAX::Parser.new(self).parse(file_io_class.open(@filename))
-    @referenced_files
+  def parse(io_object)
+    @referenced_files = []
+    Nokogiri::XML::SAX::Parser.new(self).parse(io_object)
+    ret_val = @referenced_files
+    @referenced_files = nil
+
+    ret_val
   end

   protected

-  def file_io_class
-    case File.extname(@filename)
-    when '.gz' then Zlib::GzipReader
-    when '.zst' then RepomdParser::ZstdReader
-    else File
-    end
-  end
-
   def get_attribute(attributes, name)
     attributes.select { |e| e[0] == name }.first[1]
   end
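
One detail of the `ensure` clause in `parse_file`: if `File.open` raises, `io` is still `nil` when `ensure` runs, so `io.close` would raise `NoMethodError` and hide the original error. A minimal sketch of a guarded variant, using a hypothetical `read_all` method and an illustrative path, with only the standard library (this is not the gem's code):

```ruby
# Hypothetical stand-in for parse_file: opens, reads, and always closes.
def read_all(filename)
  io = File.open(filename)
  io.read
ensure
  # io is nil when File.open raised; &. skips the call in that case.
  io&.close
end

begin
  read_all('no_such_dir_for_example/repomd.xml')
rescue SystemCallError => e
  # The original "file not found" error reaches the caller intact.
  error_class = e.class
end
```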
10 changes: 6 additions & 4 deletions lib/repomd_parser/repomd_xml_parser.rb
@@ -16,13 +16,15 @@
 # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

 class RepomdParser::RepomdXmlParser
-  def initialize(filename)
-    @filename = filename
+  def parse_file(filename)
+    File.open(filename) do |fh|
+      parse(fh)
+    end
   end

-  def parse
+  def parse(io_object)
     files = []
-    xml = Nokogiri::XML(File.open(@filename))
+    xml = Nokogiri::XML(io_object)

     xml.xpath('/xmlns:repomd/xmlns:data').each do |data_node|
       type = data_node.attr('type').to_sym
2 changes: 1 addition & 1 deletion lib/repomd_parser/version.rb
@@ -16,5 +16,5 @@
 # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

 module RepomdParser
-  VERSION = '0.1.6'.freeze
+  VERSION = '1.0.0'.freeze
 end
12 changes: 8 additions & 4 deletions lib/repomd_parser/zstd_reader.rb
@@ -17,15 +17,15 @@

 require 'zstd-ruby'

-class RepomdParser::ZstdReader < File
-  def initialize(*args)
-    super(*args)
+class RepomdParser::ZstdReader
+  def initialize(io_object)
+    @io = io_object
     @stream = Zstd::StreamingDecompress.new
     @buffer = ''
   end

   def read(len = nil, out = nil)
-    @buffer << @stream.decompress(super(len)) while @buffer.size < len && !eof
+    @buffer << @stream.decompress(@io.read(len)) while @buffer.size < len && !@io.eof

     if @buffer.size > len
       out = @buffer[0..len]
@@ -37,4 +37,8 @@ def read(len = nil, out = nil)

     out
   end
+
+  def close
+    @io.close
+  end
 end
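
The move from subclassing `File` to wrapping an arbitrary IO can be illustrated with the standard library alone. A sketch, assuming `Zlib::Inflate` as a stand-in for `Zstd::StreamingDecompress`; the `InflateReader` class and its `CHUNK_SIZE` are hypothetical and not part of the gem:

```ruby
require 'stringio'
require 'zlib'

# Hypothetical reader (not part of repomd_parser): decompresses zlib data
# from any object that responds to read, eof? and close.
class InflateReader
  CHUNK_SIZE = 16 * 1024 # illustrative chunk size

  def initialize(io_object)
    @io = io_object
    @stream = Zlib::Inflate.new
    @buffer = String.new # binary buffer of decompressed bytes
  end

  # Returns up to len decompressed bytes, or everything left when len is nil.
  def read(len = nil)
    until @io.eof? || (len && @buffer.bytesize >= len)
      @buffer << @stream.inflate(@io.read(CHUNK_SIZE))
    end
    return nil if len && len.positive? && @buffer.empty?

    @buffer.slice!(0, len || @buffer.bytesize)
  end

  # Closes both the decompression stream and the wrapped IO.
  def close
    @stream.close
    @io.close
  end
end

data = 'hello world ' * 1_000
reader = InflateReader.new(StringIO.new(Zlib::Deflate.deflate(data)))
head = reader.read(5)
rest = reader.read
reader.close
```

Because the reader only depends on `read`, `eof?` and `close`, the same class works for files, sockets, or in-memory buffers, which is the property the composition-based design is after.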
4 changes: 2 additions & 2 deletions spec/lib/repomd_parser/deltainfo_xml_parser_spec.rb
@@ -2,11 +2,11 @@

 RSpec.describe RepomdParser::DeltainfoXmlParser do
   let(:parsed_files) do
-    described_class.new(
+    described_class.new.parse_file(
       file_fixture(
         'dummy_repo/repodata/a546b430098b8a3fb7d65493a9ce608fafcb32f451d0ce8bf85410191f347cc3-deltainfo.xml.gz'
       )
-    ).parse
+    )
   end

   it 'references drpm files' do
12 changes: 6 additions & 6 deletions spec/lib/repomd_parser/primary_xml_parser_spec.rb
@@ -69,11 +69,11 @@

   context 'XML compressed with Zstandard' do
     let(:parsed_files) do
-      described_class.new(
+      described_class.new.parse_file(
         file_fixture(
           'dummy_repo/repodata/0d499f39e90442b3f052681964debe48ac6e438252cee5fc2dd33002795026f1-primary.xml.zst'
         )
-      ).parse
+      )
     end

     it 'references rpm files' do
@@ -83,11 +83,11 @@

   context 'XML compressed with gzip' do
     let(:parsed_files) do
-      described_class.new(
+      described_class.new.parse_file(
         file_fixture(
           'dummy_repo/repodata/abf421e45af5cd686f050bab3d2a98e0a60d1b5ca3b07c86cb948fc1abfa675e-primary.xml.gz'
         )
-      ).parse
+      )
     end

     it 'references rpm files' do
@@ -97,11 +97,11 @@

   context 'plain XML' do
     let(:parsed_files) do
-      described_class.new(
+      described_class.new.parse_file(
         file_fixture(
           'dummy_repo/repodata/abf421e45af5cd686f050bab3d2a98e0a60d1b5ca3b07c86cb948fc1abfa675e-primary.xml'
         )
-      ).parse
+      )
     end

     it 'references rpm files' do
4 changes: 2 additions & 2 deletions spec/lib/repomd_parser/repomd_xml_parser_spec.rb
@@ -1,7 +1,7 @@
 require 'repomd_parser'

 RSpec.describe RepomdParser::RepomdXmlParser do
-  let(:parsed_files) { described_class.new(file_fixture('dummy_repo/repodata/repomd.xml')).parse }
+  let(:parsed_files) { described_class.new.parse_file(file_fixture('dummy_repo/repodata/repomd.xml')) }

   it 'references repodata files' do
     expect(parsed_files).to eq [
@@ -36,7 +36,7 @@
     ]
   end

-  let(:old_style_parsed_files) { described_class.new(file_fixture('old_style_repo/repodata/repomd.xml')).parse }
+  let(:old_style_parsed_files) { described_class.new.parse_file(file_fixture('old_style_repo/repodata/repomd.xml')) }

   it 'handles repomd.xml without size field' do
     expect(old_style_parsed_files).to eq [