[Libosinfo] RFC: Splitting off database into separate package
Daniel P. Berrange
berrange at redhat.com
Thu Sep 17 12:45:49 UTC 2015
On Fri, Jul 24, 2015 at 04:54:57PM +0100, Daniel P. Berrange wrote:
> On Fri, Jul 24, 2015 at 04:50:34PM +0200, Christophe Fergeau wrote:
> > > - Should we restructure the database ?
> > >
> > > eg, we have a single data/oses/fedora.xml file that contains
> > > the data for every Fedora release. This is already 200kb in
> > > size and will grow forever. If we split up all the files
> > > so there is only ever one entity (os, hypervisor, device, etc)
> > > in each XML file, each file will be smaller in size. This would
> > > also let us potentially do database minimization. eg we could
> > > provide a download that contains /all/ OS, and another download
> > > that contains only non-end-of-life OS.
> >
> > I was about to make the same comment as Zeeshan, GNOME has had issues in
> > the past with data scattered among too many small files, in general this
> > is solved by adding a cache file containing a concatenated version of
> > all the files (possibly pre-parsed to some domain-specific format).
>
> If we can avoid loading the entire database, and only load the subset
> of files we want info on, we'd hopefully not have such problems. I
> could see benefit in having some "index" file perhaps which says
> which entity is defined in which file, as a way to avoid dictating
> a filename/dirname convention.
FYI, I wrote a simple Perl script to process our current XML files
and split them up into one file per entity. This resulted in 438
individual XML files.
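For illustration only, the kind of split described above could be sketched
in Python as follows. This is not the Perl script itself. It assumes each
database file has a single root element whose direct children are the
entities, each carrying an "id" attribute in URI form; the function names
and the output naming scheme (one subdirectory per entity type, filename
derived from the id) are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List
from urllib.parse import urlparse


def entity_filename(entity_id: str) -> str:
    """Derive a filesystem-safe name from an entity id URI (hypothetical scheme)."""
    parsed = urlparse(entity_id)
    parts = [parsed.netloc] + [p for p in parsed.path.split("/") if p]
    return "-".join(parts) + ".xml"


def split_entities(xml_text: str, out_dir: Path) -> List[Path]:
    """Write each child entity of the root element to its own XML file."""
    root = ET.fromstring(xml_text)
    written = []
    for child in root:
        entity_id = child.get("id")
        if entity_id is None:
            continue  # assumption: only elements with an id are entities
        # Re-wrap the entity in a copy of the original root element so that
        # each output file keeps the same top-level structure as the input.
        wrapper = ET.Element(root.tag, root.attrib)
        wrapper.append(child)
        path = out_dir / child.tag / entity_filename(entity_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        ET.ElementTree(wrapper).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)
    return written
```

Calling split_entities() once per existing file, with a shared output
directory, would yield one small XML file per os, hypervisor, device, etc.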
I timed libosinfo speed of loading the database with the current
database structure, and with the split structure. There was no
measurable difference in load time. I repeated using vm.drop_caches=3
to clear the FS cache between timing runs, and still found no difference
in load time. So I think our load time is not dominated by the
number of files we have - most likely the XML parsing & object
allocation is our main time sink.
FWIW, with warm cache it was ~250ms, with cold cache it was 1.9s, though
for the latter figure I don't know how much of that time is from loading
the ELF libraries, vs the database. Either way, the timings did not differ
between the split and unsplit layouts.
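A rough way to repeat this kind of comparison is sketched below. It is a
stand-in, not the measurement used above: it times Python's ElementTree
parsing every XML file under a directory rather than timing libosinfo's own
loader, so absolute numbers will not match. The function name and the
repeats parameter are hypothetical.

```python
import time
import xml.etree.ElementTree as ET
from pathlib import Path


def time_xml_load(db_dir: Path, repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds to parse every *.xml under db_dir."""
    files = sorted(db_dir.rglob("*.xml"))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for f in files:
            ET.parse(f)  # parse and discard; only the elapsed time matters
        best = min(best, time.perf_counter() - start)
    return best
```

Running it once against the unsplit tree and once against the split tree
gives a warm-cache comparison. For a cold-cache figure, the page cache has
to be dropped (as root) before each call and repeats set to 1, since every
pass after the first reads from a warm cache.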
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|