We will here analyze what the BSBM Explore workload does. This is necessary in order to compare benchmark results at different scales. Historically, BSBM had a Query 6 whose share of the metric approached 100% as scale increased. The present mix does not have this query, but different queries still have different relative importance at different scales.
We will then look at statistics from running BSBM at different scales. Finally, we look at CPU profiles.
But first, let us see what BSBM reads in general. The system is in steady state after around 1500 query mixes; after this the working set does not shift much. After several thousand query mixes, we have:
SELECT TOP 10 * FROM sys_d_stat ORDER BY reads DESC;
KEY_TABLE INDEX_NAME TOUCHES READS READ_PCT N_DIRTY N_BUFFERS
================= ============================ ========== ======= ======== ======= =========
DB.DBA.RDF_OBJ RDF_OBJ 114105938 3302150 2 0 3171275
DB.DBA.RDF_QUAD RDF_QUAD 977426773 2041156 0 0 1970712
DB.DBA.RDF_IRI DB_DBA_RDF_IRI_UNQC_RI_ID 8250414 509239 6 15 491631
DB.DBA.RDF_QUAD RDF_QUAD_POGS 3677233812 183860 0 0 175386
DB.DBA.RDF_IRI RDF_IRI 32 99710 302151 5 95353
DB.DBA.RDF_QUAD RDF_QUAD_OP 30597 51593 168 0 48941
DB.DBA.RDF_QUAD RDF_QUAD_SP 265474 47210 17 0 46078
DB.DBA.RDF_PREFIX DB_DBA_RDF_PREFIX_UNQC_RP_ID 6020 212 3 0 212
DB.DBA.RDF_PREFIX RDF_PREFIX 0 167 16700 0 157
The first column is the table, then the index, then the number of times a row was found (touches). The fourth number is the count of disk pages read. The last number is the count of 8K buffer pool pages in use for caching pages of the index in question. Note that the index is clustered, i.e., there is no table data structure separate from the index. Most of the reads are for strings or RDF literals. After this comes the PSOG index for getting a property value given the subject. After this, but much lower, we have lookups of IRI strings given the ID. The index from object value to subject is touched the most, but its page count is small; only a few properties seem to be concerned. The rest is minimal in comparison.
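The READ_PCT column appears to be simply the disk reads as a percentage of touches, truncated to a whole percent. A quick sketch (in Python, using the numbers from the RDF_OBJ row above; the exact rounding used by the server is an assumption):

```python
def read_pct(touches, reads):
    """Disk reads as a percentage of row touches, truncated to a whole percent."""
    if touches == 0:
        return 0
    return reads * 100 // touches

# RDF_OBJ: 3,302,150 reads out of 114,105,938 touches
print(read_pct(114105938, 3302150))  # -> 2, matching the READ_PCT column
```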
Now let us reset the counts and see what the steady state I/O profile is.
SELECT key_stat (key_table, name_part (key_name, 2), 'reset') FROM sys_keys WHERE key_migrate_to IS NULL;
SELECT TOP 10 * FROM sys_d_stat ORDER BY reads DESC;
KEY_TABLE INDEX_NAME TOUCHES READS READ_PCT N_DIRTY N_BUFFERS
================= ============================ ========== ======= ======== ======= =========
DB.DBA.RDF_OBJ RDF_OBJ 30155789 79659 0 0 3191391
DB.DBA.RDF_QUAD RDF_QUAD 259008064 8904 0 0 1948707
DB.DBA.RDF_QUAD RDF_QUAD_SP 68002 7730 11 0 53360
DB.DBA.RDF_IRI RDF_IRI 12 5415 41653 6 98804
DB.DBA.RDF_QUAD RDF_QUAD_POGS 975147136 1597 0 0 173459
DB.DBA.RDF_IRI DB_DBA_RDF_IRI_UNQC_RI_ID 2213525 1286 0 17 485093
DB.DBA.RDF_QUAD RDF_QUAD_OP 7999 904 11 0 48568
DB.DBA.RDF_PREFIX DB_DBA_RDF_PREFIX_UNQC_RP_ID 1494 1 0 0 213
Literal strings dominate. The SP index is used only for situations where the P is not specified, i.e., the DESCRIBE query; based on this, its I/O seems attributable mostly to DESCRIBE. The first RDF_IRI row represents translations from string to IRI id; the second represents translations from IRI id to string. The touch count for the first RDF_IRI is not properly recorded, hence the miss % is out of line. We see SP missing the cache the most, since its use is infrequent in the mix.
We will next look at query processing statistics. For this we introduce a new meter.
The db_activity
SQL function provides a session-by-session cumulative statistic of activity. The fields are:
- rnd - Count of random index lookups. Each first row of a select or insert counts as one, regardless of whether something was found.
- seq - Count of sequential rows. Every move to next row on a cursor counts as 1, regardless of whether conditions match.
- same seg - For column store only; counts how many times the next row in a vectored join using an index falls in the same segment as the previous random access. A segment is the stretch of rows between entries in the sparse top-level index on the column projection.
- same pg - Counts how many times a vectored index join finds the next match on the same page as the previous one.
- same par - Counts how many times the next lookup in a vectored index join falls on a different page than the previous one but still under the same parent.
- disk - Counts how many disk reads were made, including any speculative reads initiated.
- spec disk - Counts speculative disk reads.
- messages - Counts cluster interconnect messages.
- B (KB, MB, GB) - Total length of the cluster interconnect messages.
- fork - Counts how many times a thread was forked (started) for query parallelization.
The numbers are given with 4 significant digits and a scale suffix: G is 10^9 (1,000,000,000); M is 10^6 (1,000,000); K is 10^3 (1,000).
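This rendering can be sketched as follows (a minimal illustration of the stated convention; the server's exact rounding behavior is an assumption):

```python
def fmt_activity(n):
    """Format a counter with 4 significant digits and a K/M/G scale suffix,
    as in db_activity output (assumed rendering)."""
    for suffix, scale in (("G", 10**9), ("M", 10**6), ("K", 10**3)):
        if n >= scale:
            return f"{n / scale:.4g}{suffix}"
    return str(n)

print(fmt_activity(1674000000))  # -> 1.674G
print(fmt_activity(314800000))   # -> 314.8M
print(fmt_activity(76000))       # -> 76K
```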
We run 2000 query mixes with 16 users. The special http account keeps a cumulative record of all activity on web server threads.
SELECT db_activity (2, 'http');
1.674G rnd 3.223G seq 0 same seg 1.286G same pg 314.8M same par 6.186M disk 6.461M spec disk 0B / 0 messages 298.6K fork
We see that random access dominates. The seq number is about twice the rnd number, meaning that the average random lookup retrieves two rows. Getting a row at random obviously takes more time than getting the next row. Since the index used is row-wise, same seg is 0. The same pg count indicates that 77% of the random accesses fall on the same page as the previous random access; most of the remaining random accesses fall under the same parent as the previous one.
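The quoted ratios can be checked directly against the printed counters (a back-of-the-envelope arithmetic sketch, using the values from the db_activity output above):

```python
# Counters from the db_activity output above
rnd, seq = 1.674e9, 3.223e9
same_pg, same_par = 1.286e9, 314.8e6

print(round(seq / rnd, 2))          # -> 1.93: ~2 sequential rows per random lookup
print(round(same_pg / rnd * 100))   # -> 77: % of lookups hitting the same page
print(round(same_par / rnd * 100))  # -> 19: % landing under the same parent
```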
There are more speculative reads than disk reads, which is an artifact of counting some concurrently speculated reads twice. It does indicate that speculative reads dominate; this is because a large part of the run was in the warm-up state with aggressive speculative reading. We reset the counts and run another 2000 mixes.
Now let us look at the same readings after 2000 mixes with 16 users at 100 Mt.
234.3M rnd 420.5M seq 0 same seg 188.8M same pg 29.09M same par 808.9K disk 919.9K spec disk 0B / 0 messages 76K fork
We note that the ratios between the random, sequential, and same page/parent counts are about the same. The sequential number even looks a bit smaller in proportion. The count of random accesses for the 100 Mt run is 14% of the count for the 1000 Mt run. The count of query parallelization threads is also much lower, since it is worthwhile to schedule a new thread only if there are at least a few thousand operations to perform on it. The precise criterion for starting a thread is that, according to the cost model's guess, the thread must have at least 5 ms worth of work.
We note that the 100 Mt throughput is a little over three times that of the 1000 Mt throughput, as reported before. We might justifiably ask why the 100 Mt run is not seven times faster instead, given this much less work.
We note that for one-off random access, it makes no real difference whether the tree has 100 M or 1000 M rows; this translates to roughly 27 vs 30 comparisons, so the depth of the tree is not a factor per se. Besides, vectoring often makes the tree look only one or two levels deep, so the total row count matters even less there.
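The 27 vs 30 figure is just the binary-search depth, log2 of the row count (a rough model; the actual B-tree fanout makes real dives much shallower, but the comparison count per lookup is on this order):

```python
import math

# Comparisons for a binary search over the full key range
print(round(math.log2(100e6)))   # -> 27 comparisons for 100 M rows
print(round(math.log2(1000e6)))  # -> 30 comparisons for 1000 M rows
```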
To elucidate this last question, we look at the CPU profiles. We take an oprofile of 100 Single User mixes at both scales.
For 100 Mt:
61161 10.1723 cmpf_iri64n_iri64n_anyn_gt_lt
31321 5.2093 box_equal
19027 3.1646 sqlo_parse_tree_has_node
15905 2.6453 dk_alloc
15647 2.6024 itc_next_set_neq
12702 2.1126 itc_vec_split_search
12487 2.0768 itc_dive_transit
11450 1.9044 itc_bm_vec_row_check
10646 1.7706 itc_page_rcf_search
9223 1.5340 id_hash_get
9215 1.5326 gen_qsort
8867 1.4748 sqlo_key_part_best
8807 1.4648 itc_param_cmp
8062 1.3409 cmpf_iri64n_iri64n
6820 1.1343 sqlo_in_list
6005 0.9987 dc_iri_id_cmp
5905 0.9821 dk_free_tree
5801 0.9648 box_hash
5509 0.9163 dks_esc_write
5444 0.9054 sql_tree_hash_1
For 1000 Mt:
754331 31.4149 cmpf_iri64n_iri64n_anyn_gt_lt
146165 6.0872 itc_vec_split_search
144795 6.0301 itc_next_set_neq
131671 5.4836 itc_dive_transit
110870 4.6173 itc_page_rcf_search
66780 2.7811 gen_qsort
66434 2.7667 itc_param_cmp
58450 2.4342 itc_bm_vec_row_check
55213 2.2994 dk_alloc
47793 1.9904 cmpf_iri64n_iri64n
44277 1.8440 dc_iri_id_cmp
39489 1.6446 cmpf_int64n
36880 1.5359 dc_append_bytes
36601 1.5243 dv_compare
31286 1.3029 dc_any_value_prefetch
25457 1.0602 itc_next_set
20852 0.8684 box_equal
19895 0.8285 dk_free_tree
19698 0.8203 itc_page_insert_search
19367 0.8066 dc_copy
The top function in both is the compare for an equality of two leading IRIs and a range for the trailing any. This corresponds to the range check in Q5. At the larger scale this is three times more important. At the smaller scale, the share of query optimization is about 6.5 times greater. The top function in this category is box_equal, with 5.2% vs 0.87%. The remaining SQL compiler functions are all in proportion to this, totaling 14.3% of the 100 Mt top-20 profile.
From this sample, it appears that ten times more scale means seven times more database operations. This is not taken into account in the metric. Query compilation is significant at the small end, and no longer significant at 1000 Mt. From these numbers, we could say that Virtuoso is about two times more efficient in terms of database operation throughput at 1000 Mt than at 100 Mt.
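The "about two times more efficient" figure follows from the counters quoted earlier (a back-of-the-envelope check; the ~3x throughput ratio is as reported in the preceding posts):

```python
# Random lookups over 2000 mixes, from the db_activity readings above
ops_1000 = 1.674e9   # at 1000 Mt
ops_100 = 234.3e6    # at 100 Mt
throughput_ratio = 3.0  # 100 Mt completes ~3x more mixes per unit time

ops_per_mix_ratio = ops_1000 / ops_100               # ~7x more work per mix at 1000 Mt
op_rate_ratio = ops_per_mix_ratio / throughput_ratio  # database ops/sec, 1000 Mt vs 100 Mt

print(round(ops_per_mix_ratio, 1))  # -> 7.1
print(round(op_rate_ratio, 1))      # -> 2.4
```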
We may conclude that different BSBM scales measure different things. The TPC workloads are relatively better in that they have a balance between metric components that stay relatively constant across a large range of scales.
This is not necessarily something that should be fixed in the BSBM Explore mix. We must however take these factors better into account in developing the BI mix.
Let us also remember that BSBM Explore is a relational workload. Future posts in this series will outline how we propose to make RDF-friendlier benchmarks.
Benchmarks, Redux Series
- Benchmarks, Redux (part 1): On RDF Benchmarks
- Benchmarks, Redux (part 2): A Benchmarking Story
- Benchmarks, Redux (part 3): Virtuoso 7 vs 6 on BSBM Load and Explore
- Benchmarks, Redux (part 4): Benchmark Tuning Questionnaire
- Benchmarks, Redux (part 5): BSBM and I/O; HDDs and SSDs
- Benchmarks, Redux (part 6): BSBM and I/O, continued
- Benchmarks, Redux (part 7): What Does BSBM Explore Measure? (this post)
- Benchmarks, Redux (part 8): BSBM Explore and Update
- Benchmarks, Redux (part 9): BSBM With Cluster
- Benchmarks, Redux (part 10): LOD2 and the Benchmark Process
- Benchmarks, Redux (part 11): On the Substance of RDF Benchmarks
- Benchmarks, Redux (part 12): Our Own BSBM Results Report
- Benchmarks, Redux (part 13): BSBM BI Modifications
- Benchmarks, Redux (part 14): BSBM BI Mix
- Benchmarks, Redux (part 15): BSBM Test Driver Enhancements