[Libosinfo] [RFC PATCH 0/5] various optimizations
Daniel P. Berrange
berrange at redhat.com
Wed Jun 4 11:24:32 UTC 2014
On Wed, Jun 04, 2014 at 12:44:07PM +0200, Giuseppe Scrivano wrote:
> "Daniel P. Berrange" <berrange at redhat.com> writes:
>
> >> This series was applied to the libosinfo source.
> >>
> >> function                      total time spent in the function
> >>
> >> osinfo_loader_get_device      32.83%
> >> osinfo_db_get_device           2.96%
> >> osinfo_db_add_device           4.50%
> >> osinfo_db_new                 24.97%
> >>
> >> so creating a new GObject is very slow. I thought about creating just a
> >> model object and then cloning it instead of going through the GObject
> >> constructors for every instance. Does that sound like a good idea? (I am
> >> still not sure whether g_type_create_instance will be enough to achieve
> >> this; I haven't looked into the details.)
> >
> > What's the actual wallclock time this demo takes to run for you ?
> >
> > For me it is a fraction of a second, which seems pretty fast and shouldn't
> > be noticeable to the user.
> >
> > $ time ./demo
> >
> > real 0m0.238s
> > user 0m0.218s
> > sys 0m0.014s
>
> It takes approximately the same time for me as well. It is not much time
> in absolute terms, but it is noticeable in the upstream version of
> virt-manager, where the UI feels a bit slower than in older versions
> without libosinfo the first time the data is accessed, and I preferred
> to optimize this in libosinfo rather than work around it in virt-manager.
> Anyway, this series cuts enough wallclock time that the effect is much
> less noticeable, so it shouldn't be a problem for virt-manager users
> either.
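(As an aside on the quoted idea: the snippet below is only an illustration of
the per-entity construction cost being discussed, not code from this series;
osinfo_device_new() is the normal libosinfo constructor, while the id string
and loop count are made up. Note the GLib documentation points GObject-derived
types at g_object_new() rather than g_type_create_instance(), so a
"clone a template instance" scheme would need some other mechanism.)

  /* Illustration only: one g_object_new() (via osinfo_device_new()) per
   * device, which is the construction cost the quoted proposal wants to
   * avoid.  The id and loop count are arbitrary. */
  #include <osinfo/osinfo.h>

  int main(void)
  {
  #if !GLIB_CHECK_VERSION(2, 36, 0)
      g_type_init();   /* only needed with GLib older than 2.36 */
  #endif

      for (int i = 0; i < 10000; i++) {
          OsinfoDevice *dev = osinfo_device_new("http://example.com/dev");
          g_object_unref(dev);
      }
      return 0;
  }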
FYI, before applying this series, oprofile reports:
   489780 100.000 demo
CPU_CLK_UNHALT...|
  samples|      %|
------------------
   173292 35.3816 libc-2.18.so
    94375 19.2689 libxml2.so.2.9.1
    90366 18.4503 libglib-2.0.so.0.4000.0
    50390 10.2883 libgobject-2.0.so.0.4000.0
    29642  6.0521 libosinfo-1.0.so.0.2.10
    24293  4.9600 no-vmlinux
    16796  3.4293 libgio-2.0.so.0.4000.0
    10554  2.1548 libpthread-2.18.so
       37  0.0076 ld-2.18.so
       35  0.0071 [vdso] (tgid:23223 range:0x7fff939fe000-0x7fff939fffff)
After applying it, we get:
   355391 100.000 demo
CPU_CLK_UNHALT...|
  samples|      %|
------------------
   106098 29.8539 libc-2.18.so
    88024 24.7682 libglib-2.0.so.0.4000.0
    51930 14.6121 libgobject-2.0.so.0.4000.0
    29712  8.3604 libosinfo-1.0.so.0.2.10
    26496  7.4555 libxml2.so.2.9.1
    25416  7.1516 no-vmlinux
    17085  4.8074 libgio-2.0.so.0.4000.0
    10546  2.9674 libpthread-2.18.so
       45  0.0127 [vdso] (tgid:31109 range:0x7ffffb3e9000-0x7ffffb3eafff)
       37  0.0104 ld-2.18.so
        1 2.8e-04 demo
        1 2.8e-04 libpng16.so.16.6.0
So we've clearly improved the XML parsing by a large margin.
Looking at the per-symbol breakdown, we now see:
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 90000
samples        %  image name                  symbol name
  25416   7.2170  no-vmlinux                  /no-vmlinux
  24880   7.0648  libc-2.18.so                _int_malloc
  24609   6.9878  libc-2.18.so                malloc
  16643   4.7258  libc-2.18.so                _int_free
  12653   3.5929  libglib-2.0.so.0.4000.0     g_mutex_get_impl
  12036   3.4177  libosinfo-1.0.so.0.2.10     osinfo_entity_get_type
  10886   3.0911  libglib-2.0.so.0.4000.0     g_private_get_impl
   9844   2.7952  libc-2.18.so                vfprintf
   9441   2.6808  libgobject-2.0.so.0.4000.0  g_type_value_table_peek
   6839   1.9420  libglib-2.0.so.0.4000.0     g_hash_table_lookup
   6695   1.9011  libglib-2.0.so.0.4000.0     g_str_hash
   6213   1.7642  libglib-2.0.so.0.4000.0     g_hash_table_insert_internal
   6193   1.7585  libgobject-2.0.so.0.4000.0  g_type_check_instance_is_a
   5657   1.6063  libglib-2.0.so.0.4000.0     g_slice_alloc
   5229   1.4848  libglib-2.0.so.0.4000.0     g_pointer_bit_unlock
   4754   1.3499  libgio-2.0.so.0.4000.0      scan_for_newline
   4685   1.3303  libglib-2.0.so.0.4000.0     g_datalist_id_set_data_full
   4502   1.2784  libgobject-2.0.so.0.4000.0  g_type_check_instance_cast
   3922   1.1137  libglib-2.0.so.0.4000.0     slab_allocator_alloc_chunk
   3848   1.0927  libgio-2.0.so.0.4000.0      g_input_stream_get_type
   3831   1.0878  libpthread-2.18.so          pthread_mutex_lock
   3671   1.0424  libpthread-2.18.so          pthread_mutex_unlock
   3520   0.9995  libosinfo-1.0.so.0.2.10     osinfo_device_get_type
   3016   0.8564  libc-2.18.so                malloc_consolidate
   2773   0.7874  libglib-2.0.so.0.4000.0     g_pointer_bit_lock
   2763   0.7846  libpthread-2.18.so          pthread_getspecific
   2654   0.7536  libc-2.18.so                calloc
   2593   0.7363  libc-2.18.so                _IO_default_xsputn
   2459   0.6982  libxml2.so.2.9.1            xmlXPathCompOpEval.part.59
   2422   0.6877  libgobject-2.0.so.0.4000.0  g_type_class_ref
   2350   0.6673  libc-2.18.so                strlen
   2269   0.6443  libc-2.18.so                __memcpy_sse2_unaligned
   2257   0.6409  libglib-2.0.so.0.4000.0     g_hash_table_insert_node
   2115   0.6006  libc-2.18.so                __GI___strcmp_ssse3
   2091   0.5937  libglib-2.0.so.0.4000.0     g_datalist_id_dup_data
   2029   0.5761  libgobject-2.0.so.0.4000.0  g_type_class_peek_static
   1995   0.5665  libosinfo-1.0.so.0.2.10     osinfo_list_get_type
   1967   0.5585  libosinfo-1.0.so.0.2.10     osinfo_db_get_type
I believe much of the libc malloc overhead still comes from the XML
parsing, but the appearance of g_str_hash does suggest it might be worth
optimizing our hash table usage to use g_intern_string or GQuark, so that
we can do plain pointer comparisons instead of strcmp.
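To make that concrete, here is a minimal sketch (not code from this series)
of a GHashTable keyed on interned strings; the "short-id" key and the table
itself are made-up stand-ins for wherever the entity parameters end up
being stored:

  /* Sketch only: g_intern_string() maps every equal string to one
   * canonical pointer for the lifetime of the process, so the table
   * can use g_direct_hash()/g_direct_equal() (pointer hashing and
   * comparison) instead of g_str_hash()/strcmp() on every access. */
  #include <glib.h>

  int main(void)
  {
      GHashTable *params = g_hash_table_new_full(g_direct_hash, g_direct_equal,
                                                 NULL,     /* interned keys are never freed */
                                                 g_free);  /* values are plain strings */

      const gchar *key = g_intern_static_string("short-id");
      g_hash_table_insert(params, (gpointer)key, g_strdup("fedora20"));

      /* Any equal string interns to the same pointer, so the lookup is a
       * pure pointer comparison inside the table. */
      g_print("short-id = %s\n",
              (const gchar *)g_hash_table_lookup(params, g_intern_string("short-id")));

      g_hash_table_unref(params);
      return 0;
  }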
I'm surprised that osinfo_entity_get_type is so high up there too.
Perhaps we do need to optimize that so we don't call it so frequently
during parsing.
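If the cost really is the cast/check macros, one possible mitigation (again,
only a sketch, not something this series does; process_entities and its list
argument are hypothetical) is to fetch the GType once outside the hot loop
and reuse it:

  /* Sketch only: OSINFO_ENTITY()/OSINFO_IS_ENTITY() expand to a call to
   * osinfo_entity_get_type() on every use, and as an exported symbol it
   * cannot be inlined into the parser's hot path.  Calling it once and
   * reusing the GType avoids the repeated calls. */
  #include <osinfo/osinfo.h>

  void
  process_entities(GList *entities)   /* hypothetical hot loop over parsed entities */
  {
      GType entity_type = osinfo_entity_get_type();  /* one call, not one per element */
      GList *l;

      for (l = entities; l != NULL; l = l->next) {
          if (!G_TYPE_CHECK_INSTANCE_TYPE(l->data, entity_type))
              continue;
          /* ... operate on the entity ... */
      }
  }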
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|