Multiple Index Regression Analysis


I actually had a blog entry started on this topic before IDUG. I knew this was possible, but not exactly how to do it. Then I sat in Scott Hayes’ session – D04 – More Sage Advice: Invaluable DB2 LUW Performance Insights from Around the Globe, and he had all the details I needed to really get rolling on trying this out – saving me valuable time in research.

Indexing is relatively easy with a simple data model, or even a complicated one that doesn’t join a lot of tables, once you get used to it. But once you get to the point where views and dozens of objects are involved – where the time is spent not primarily on reading data from the tables and indexes, but on the joining and filtering – it starts to get difficult.

I’m calling this multiple index regression analysis because we’re looking for which recommended indexes have the most impact on our query performance. The DB2 Design Advisor can recommend many indexes to improve performance, and some of them are more impactful than others. In some analysis I was doing last week, one of the indexes turned out to reduce the query’s estimated cost by only 0.0001 timerons – not worth the overhead and space for that particular index. This does not use actual statistical formulas like those used in Multiple Linear Regression, but I could see a whole-workload approach where that would be interesting.

This is a highly advanced DB2 topic. It assumes the reader understands terms like timeron, grasps the underlying concepts, and can read explain plans and db2advis output.

Scenario

In this example, I’m working with this query:

select * from order_trace  where  c_state = ? and dl_ap_date2 >= ? and  reference not in (select reference from edi_204  where edi_204.reference = reference and edi_204.order_id = order_id  and CUST_s_location_id = ? ) and (status in (? ,? ) OR ORDER_TYPE = ? )  order by dl_ap_date2, dl_ap_time2, act_state, act_city, actual, order_id;

Doesn’t look so bad, right? Except that order_trace is a view that is quite complicated, so my explain plan looks like this:

Access Plan:
-----------
    Total Cost:         35208.6
    Query Degree:       1

                                                                                                                                                                                       Rows 
                                                                                                                                                                                      RETURN
                                                                                                                                                                                      (   1)
                                                                                                                                                                                       Cost 
                                                                                                                                                                                        I/O 
                                                                                                                                                                                        |
                                                                                                                                                                                      2.92646 
                                                                                                                                                                                     >^NLJOIN
                                                                                                                                                                                     (   2)
                                                                                                                                                                                      35208.6 
                                                                                                                                                                                      11324.4 
                                                                                                                                                                          /-------------+--------------\
                                                                                                                                                                      2.92646                             1 
                                                                                                                                                                     >^NLJOIN                          IXSCAN
                                                                                                                                                                     (   3)                            (  66)
                                                                                                                                                                      35062.3                          50.0091 
                                                                                                                                                                      11318.5                             2 
                                                                                                                                                      /-----------------+-----------------\              |
                                                                                                                                                  2.92646                                    1         595580 
                                                                                                                                                 >^NLJOIN                                 IXSCAN   INDEX: DB2     
                                                                                                                                                 (   4)                                   (  65)      ORDER_ID
                                                                                                                                                  34989.1                                 25.0083        Q1
                                                                                                                                                  11315.6                                    1 
                                                                                                                              /---------------------+----------------------\                |
                                                                                                                          2.92646                                             1            13398 
                                                                                                                         >^NLJOIN                                          IXSCAN     INDEX: SYSIBM  
                                                                                                                         (   5)                                            (  64)   SQL991011111520810
                                                                                                                          34915.9                                          25.0074          Q2
                                                                                                                          11312.7                                             1 
                                                                                             /------------------------------+-------------------------------\                |
                                                                                         2.92646                                                               1            2976 
                                                                                         NLJOIN                                                             IXSCAN     INDEX: SYSIBM  
                                                                                         (   6)                                                             (  63)   SQL991011113441210
                                                                                         34842.7                                                            25.0112          Q3
                                                                                         11309.8                                                               1 
                                        /--------------------------------------------------+---------------------------------------------------\              |
                                    2.92646                                                                                                       1          13398 
                                    TBSCAN                                                                                                     NLJOIN   INDEX: DB2     
                                    (   7)                                                                                                     (  32)      DRIVER_ID
                                    33816.6                                                                                                    268.586        Q4
                                    11237.2                                                                                                      19 
                                      |                                                                                             /------------+-------------\
                                    2.92646                                                                                        1                              1 
                                    SORT                                                                                        NLJOIN                         TBSCAN
                                    (   8)                                                                                      (  33)                         (  59)
                                    33816.6                                                                                     243.01                         12.7956 
                                    11237.2                                                                                       17                              1 
                                      |                                                                                /----------+----------\                   |
                                    2.92646                                                                           1                         1                 1 
                                    HSJOIN                                                                         NLJOIN                    TBSCAN            TEMP  
                                    (   9)                                                                         (  34)                    (  56)            (  60)
                                    33816.6                                                                        191.871                   38.3582           12.7911 
                                    11237.2                                                                          13                         3                 1 
                  /-------------------+--------------------\                                                /--------+--------\                |                 |
              139659                                       2.92646                                         1                     1              1                 1 
              ^HSJOIN                                      HSJOIN                                       ^NLJOIN               TBSCAN         TEMP              GRPBY 
              (  10)                                       (  13)                                       (  35)                (  38)         (  57)            (  61)
              9380.69                                      24432.7                                      37.7969               140.072        38.3536           12.7892 
              3516.17                                       7721                                           2                    10              3                 1 
         /------+-------\                        /-----------+-----------\                         /------+------\              |              |                 |
     139659              97515               2685.75                     152.176                  1                 1            1              1             0.364844 
     IXSCAN             IXSCAN               NLJOIN                      ^NLJOIN               IXSCAN            IXSCAN       TEMP           IXSCAN            IXSCAN
     (  11)             (  12)               (  14)                      (  19)                (  36)            (  37)       (  39)         (  58)            (  62)
     5805.14            3567.73              6632.09                     17800.5               25.0075           25.5699      140.067        38.3518           12.7892 
     2657.04            859.125              2970.8                      4750.2                   1                 2           10              3                 1 
       |                  |               /----+-----\                 /---+----\                |                 |            |              |                 |
     139659              97515        1875.29        1.43218       152.176         1            2976             139659          1           440024             19946 
 INDEX: DB2         INDEX: DB2        TBSCAN         IXSCAN        TBSCAN       IXSCAN    INDEX: BFAIRCHILD  INDEX: DB2       NLJOIN     INDEX: DB2        INDEX: DB2     
      WIZ29       IDX201171957580000  (  15)         (  18)        (  20)       (  31)   IDX201162148270000      WIZ1171      (  40)   IDX201161942390000    IDX_STOP_1
       Q23                Q25         209.649        38.3519       15258.7      25.5699          Q10               Q11        140.066          Q8                Q5
                                      17.3389           3          3605.98         2                                            10 
                                        |              |             |            |                                /------------+-------------\
                                      1875.29        139659        152.176      139659                            1                              1 
                                      SORT       INDEX: DB2        SORT     INDEX: DB2                         TBSCAN                         TBSCAN
                                      (  16)   IDX201171959320000  (  21)       WIZ1171                        (  41)                         (  50)
                                      209.649          Q24         15258.7        Q22                          76.1435                        51.1416 
                                      17.3389                      3605.98                                     5.00001                           4 
                                        |                            |                                           |                              |
                                      1875.29                      152.176                                   1.3729e-005                         1 
                                      IXSCAN                       NLJOIN                                      SORT                           TEMP  
                                      (  17)                       (  22)                                      (  42)                         (  51)
                                      209.323                      15258.6                                     76.1433                        51.137 
                                      17.3389                      3605.98                                     5.00001                           4 
                                        |                     /------+-------\                                   |                              |
                                       97515                56               2.71743                         1.3729e-005                         1 
                                  INDEX: DB2              TBSCAN             IXSCAN                            NLJOIN                         TBSCAN
                                IDX201171957580000        (  23)             (  26)                            (  43)                         (  52)
                                        Q26               25.0247            332.82                            76.1429                        51.1352 
                                                             1               35.2989                           5.00001                           4 
                                                            |              /---+----\                    /-------+-------\                      |
                                                            56         803.566       54670              1              1.3729e-005               1 
                                                          SORT         TBSCAN   INDEX: DB2           TBSCAN              FETCH                SORT  
                                                          (  24)       (  27)    IDX_ORDER_01        (  44)              (  48)               (  53)
                                                          25.0246      78.3923        Q21            51.1352             25.0077              51.135 
                                                             1         6.08317                          4                1.00001                 4 
                                                            |            |                             |              /----+-----\              |
                                                            56         803.566                          1       1.3729e-005        519        1.00058 
                                                          IXSCAN       TEMP                          SORT         IXSCAN     TABLE: DB2       FETCH 
                                                          (  25)       (  28)                        (  45)       (  49)          CODE        (  54)
                                                          25.0201      78.1369                       51.135       25.0072          Q15        51.1347 
                                                             1         6.08317                          4            1                           4 
                                                            |            |                             |            |                    /------+------\
                                                            519        803.566                       1.00058        519              1.00058         1.9199e+007 
                                                      INDEX: DB2       UNIQUE                        FETCH    INDEX: DB2             IXSCAN        TABLE: EDI     
                                                        IDX_CODE_01    (  29)                        (  46)     IDX_CODE_01          (  55)   EDI_204_ADDITIONAL_FIELDS
                                                            Q27        78.0063                       51.1347        Q15              38.3527             Q18
                                                                       6.08317                          4                               3 
                                                                         |                      /------+------\                        |
                                                                       803.566              1.00058         1.9199e+007            1.9199e+007 
                                                                       IXSCAN               IXSCAN        TABLE: EDI             INDEX: EDI     
                                                                       (  30)               (  47)   EDI_204_ADDITIONAL_FIELDS   EDI_204_A_F_NDX
                                                                       77.9929              38.3527             Q13                    Q18
                                                                       6.08317                 3 
                                                                         |                    |
                                                                    2.23029e+007          1.9199e+007 
                                                                   INDEX: DB2           INDEX: EDI     
                                                                 IDX201171956050000     EDI_204_A_F_NDX
                                                                         Q28                  Q13

A bit more horrendous, that. Looking at it in detail, there is no obvious gigantic table scan – or even a fetch from a table after an index read – that accounts for a significant share of the cost.

Running db2advis on that, we get:

  19  indexes in current solution
 [34706.0000] timerons  (without recommendations)
 [19035.0000] timerons  (with current solution)
 [45.15%] improvement


--
--
-- LIST OF RECOMMENDED INDEXES
-- ===========================
-- index[1],    0.548MB
   CREATE UNIQUE INDEX "ECROOKS "."IDX1505210120080"
   ON "DB2     "."ORDER" ("TRAILER_SIZE" ASC, "DL_AP_DATE2"
   ASC, "CUSTOMER_RATE" ASC, "NOTIFIED_DATE" ASC, "DATE_ENTERED"
   ASC, "COMMODITY" ASC, "NEEDS_PAPERWORK" ASC, "LINE_OF_BUSINESS"
   ASC, "TOTAL_CHARGE" ASC, "MILES" ASC, "TERMINAL" ASC,
   "SCAC" ASC, "SEAL" ASC, "PO_NUM" ASC, "PU_NUM" ASC,
   "BL_NUM" ASC, "REFERENCE" ASC, "WHO_ENTERED" ASC,
   "ORDER_TYPE" ASC, "TRACTOR_ID" ASC, "DRIVER_ID" ASC,
   "DL_AT_TIME2" ASC, "DL_AT_DATE2" ASC, "DL_AT_TIME1"
   ASC, "DL_AT_DATE1" ASC, "DL_AP_TIME2" ASC, "DL_AP_TIME1"
   ASC, "DL_AP_DATE1" ASC, "PU_AT_TIME2" ASC, "PU_AT_DATE2"
   ASC, "PU_AT_TIME1" ASC, "PU_AT_DATE1" ASC, "PU_AP_TIME2"
   ASC, "PU_AP_DATE2" ASC, "PU_AP_TIME1" ASC, "PU_AP_DATE1"
   ASC, "S_LOCATION_ID" ASC, "CHASSIS" ASC, "CHECK_DIGIT"
   ASC, "TRAILER" ASC, "STATUS" ASC, "ORDER_ID" ASC)
   INCLUDE ("C_LOCATION_ID", "ACT_LOCATION_ID", "BTO_LOCATION_ID")
   ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[2],   20.087MB
   CREATE INDEX "ECROOKS "."IDX1505210116380" ON "DB2     "."EDI_204"
   ("CUST_S_LOCATION_ID" ASC, "ORDER_ID" ASC, "REFERENCE"
   ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[3],    0.048MB
   CREATE UNIQUE INDEX "ECROOKS "."IDX1505210115300"
   ON "DB2     "."VENDOR_UNIT" ("VENDOR_UNIT_ID" ASC)
   INCLUDE ("VENDOR_ID") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[4],    5.778MB
   CREATE UNIQUE INDEX "ECROOKS "."IDX1505210115350"
   ON "DB2     "."LOCATION" ("LOCATION_ID" ASC) INCLUDE
   ("NAME") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[5], 1673.419MB
   CREATE INDEX "ECROOKS "."IDX1505210115360" ON "EDI     "."EDI_204_ADDITIONAL_FIELDS"
   ("REFERENCE" ASC, "FIELD_NAME" ASC, "INSERT_DT" ASC,
   "FIELD_VALUE" ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[6],    0.013MB
   CREATE INDEX "ECROOKS "."IDX1505210116350" ON "DB2     "."CODE"
   ("CODE_TEXT" ASC, "TYPE" ASC, "CODE_ID" ASC, "CODE"
   ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[7],    9.739MB
   CREATE UNIQUE INDEX "ECROOKS "."IDX1505210121040"
   ON "DB2     "."RAIL_ETA" ("ORDER_ID" ASC) INCLUDE
   ("STATE") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;

Some of those indexes I’m not likely to try. index[1], for example, is awfully wide. index[2] is nearly identical to an existing index, the only difference being the order of the last two columns. And 19 total indexes that DB2 wants to use for this one query? Wow, that’s a lot.

Also, I don’t have a full-size test or QA environment – the data size in my lower environments is significantly smaller, so it is hard to use them to see what happens with different indexes. One option here is to mimic production statistics in a lower environment. I’ve done that before, but this time I want to try mimicking the new indexes instead.
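For reference, mimicking statistics is typically done with db2look’s mimic (-m) option and then running the generated statistics updates against the lower environment – a rough sketch only, with the database and file names as placeholders:

db2look -d proddb -m -o mimic_stats.sql
db2 -tvf mimic_stats.sql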

Setup steps

First, my explain tables are currently shared tables. With multiple DBAs potentially working in this environment, I have two real choices for isolating my data – #1 is to create my own set of explain tables and make use of them; #2 is to note the time before I collect data and the time after I collect the explain and db2advis data, and then look only within that period to verify that only the data I want is included. I’m choosing the second option. With only 3 potential people running explains and none of us dedicated to it full time, this seems like a good option – we’re not likely to conflict.
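For reference, if I had gone with option #1, a private set of explain tables can be created under my own schema with the SYSINSTALLOBJECTS procedure – a minimal sketch, assuming a schema of ECROOKS and letting DB2 pick the table space:

db2 "CALL SYSPROC.SYSINSTALLOBJECTS('EXPLAIN', 'C', CAST(NULL AS VARCHAR(128)), 'ECROOKS')"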

Running the Explain and db2advis

I need to capture the explain and db2advis data before I change anything. Note that I happen to be working in a Windows environment from a PowerShell prompt for this work – everything here would be the same on Linux/UNIX.

To capture my initial data, I’ll use:

PS D:\xtivia\queries> db2 "select current timestamp from sysibm.sysdummy1"

1
--------------------------
2015-05-21-10.00.07.793000

  1 record(s) selected.

PS D:\xtivia\queries> db2 set current explain mode explain
DB20000I  The SQL command completed successfully.
PS D:\xtivia\queries> db2 -tvf query13.sql
select * from db2.order_trace  where  c_state = ? and dl_ap_date2 >= ? and  reference not in (select reference from db2.edi_204 edi_204 where edi_204.reference = reference and edi_204.order_id = order
_id  and CUST_s_location_id = ? ) and (status in (? ,? ) OR ORDER_TYPE = ? )  order by dl_ap_date2, dl_ap_time2, act_state, act_city, actual, order_id
SQL0217W  The statement was not executed as only Explain information requests
are being processed.  SQLSTATE=01604

PS D:\xtivia\queries> db2exfmt -d comtrak2 -e db2admin -1 -o query13_exfmt.txt
DB2 Universal Database Version 9.7, 5622-044 (c) Copyright IBM Corp. 1991, 2009
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool

Connecting to the Database.
Connect to Database Successful.
Output is in query13_exfmt.txt.
Executing Connect Reset -- Connect Reset was Successful.
PS D:\xtivia\queries> db2advis -d comtrak2 -i query13.sql -p |tee query13_advis.txt

Using user id as default schema name. Use -n option to specify schema
execution started at timestamp 2015-05-21-10.01.12.315000
found [1] SQL statements from the input file
Recommending indexes...
total disk space needed for initial set [1715.827] MB
total disk space constrained to         [35280.399] MB
Trying variations of the solution set.
Optimization finished.
  19  indexes in current solution
 [38772.0000] timerons  (without recommendations)
 [18742.0000] timerons  (with current solution)
 [51.66%] improvement


--
--
-- LIST OF RECOMMENDED INDEXES
-- ===========================
-- index[1],    0.532MB
   CREATE UNIQUE INDEX "DB2ADMIN"."IDX1505231651490"
   ON "DB2     "."ORDER" ("TRAILER_SIZE" ASC, "DL_AP_DATE2"
   ASC, "CUSTOMER_RATE" ASC, "NOTIFIED_DATE" ASC, "DATE_ENTERED"
   ASC, "COMMODITY" ASC, "NEEDS_PAPERWORK" ASC, "LINE_OF_BUSINESS"
   ASC, "TOTAL_CHARGE" ASC, "MILES" ASC, "TERMINAL" ASC,
   "SCAC" ASC, "SEAL" ASC, "PO_NUM" ASC, "PU_NUM" ASC,
   "BL_NUM" ASC, "REFERENCE" ASC, "WHO_ENTERED" ASC,
   "ORDER_TYPE" ASC, "TRACTOR_ID" ASC, "DRIVER_ID" ASC,
   "DL_AT_TIME2" ASC, "DL_AT_DATE2" ASC, "DL_AT_TIME1"
   ASC, "DL_AT_DATE1" ASC, "DL_AP_TIME2" ASC, "DL_AP_TIME1"
   ASC, "DL_AP_DATE1" ASC, "PU_AT_TIME2" ASC, "PU_AT_DATE2"
   ASC, "PU_AT_TIME1" ASC, "PU_AT_DATE1" ASC, "PU_AP_TIME2"
   ASC, "PU_AP_DATE2" ASC, "PU_AP_TIME1" ASC, "PU_AP_DATE1"
   ASC, "S_LOCATION_ID" ASC, "CHASSIS" ASC, "CHECK_DIGIT"
   ASC, "TRAILER" ASC, "STATUS" ASC, "ORDER_ID" ASC)
   INCLUDE ("C_LOCATION_ID", "ACT_LOCATION_ID", "BTO_LOCATION_ID")
   ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[2],   21.126MB
   CREATE INDEX "DB2ADMIN"."IDX1505231648190" ON "DB2     "."EDI_204"
   ("CUST_S_LOCATION_ID" ASC, "ORDER_ID" ASC, "REFERENCE"
   ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[3],    0.048MB
   CREATE UNIQUE INDEX "DB2ADMIN"."IDX1505231647110"
   ON "DB2     "."VENDOR_UNIT" ("VENDOR_UNIT_ID" ASC)
   INCLUDE ("VENDOR_ID") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[4],    5.782MB
   CREATE UNIQUE INDEX "DB2ADMIN"."IDX1505231647160"
   ON "DB2     "."LOCATION" ("LOCATION_ID" ASC) INCLUDE
   ("NAME") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[5], 1678.521MB
   CREATE INDEX "DB2ADMIN"."IDX1505231647170" ON "EDI     "."EDI_204_ADDITIONAL_FIELDS"
   ("REFERENCE" ASC, "FIELD_NAME" ASC, "INSERT_DT" ASC,
   "FIELD_VALUE" ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[6],    0.013MB
   CREATE INDEX "DB2ADMIN"."IDX1505231648160" ON "DB2     "."CODE"
   ("CODE_TEXT" ASC, "TYPE" ASC, "CODE_ID" ASC, "CODE"
   ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;
-- index[7],    9.806MB
   CREATE UNIQUE INDEX "DB2ADMIN"."IDX1505231652450"
   ON "DB2     "."RAIL_ETA" ("ORDER_ID" ASC) INCLUDE
   ("STATE") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
   COMMIT WORK ;


snip

1449 solutions were evaluated by the advisor
DB2 Workload Performance Advisor tool is finished.
PS D:\xtivia\queries> db2 set current explain mode no
DB20000I  The SQL command completed successfully.
PS D:\xtivia\queries> db2 "select current timestamp from sysibm.sysdummy1"

1
--------------------------
2015-05-21-10.03.06.369000

  1 record(s) selected.

What this gets me is all of the details about this query without any changes, plus the recommended indexes – in this case, 7 new indexes. I don’t really like all of those indexes, but I need to figure out which ones will give me the most impact and analyze their other characteristics. Note the use of the -p option on db2advis. It causes DB2 to retain the explain plan and recommended index information in the explain tables.

We also need to note from this process the starting time and the ending time for the analysis – in this case:
Start: 2015-05-21-10.00.07
End: 2015-05-21-10.03.06
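Since the explain tables are shared, a quick way to verify that only my own data landed in that window is to list the explain instances captured between those two timestamps – a sketch against the standard explain tables:

select explain_time
    , substr(explain_requester,1,12) as requester
    , substr(source_name,1,12) as source_name
from explain_instance
where explain_time between timestamp('2015-05-21-10.00.07') and timestamp('2015-05-21-10.03.06')
order by explain_time
with ur;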

Note Costs

The next step is to gather the costs of this query both without and with the indexes recommended by the DB2 Design Advisor. Note that we have to fill in the timestamps from above to limit the data returned.

select 
    dec(total_cost,20,4) as before_total_cost
    , dec(io_cost,20,4) as before_io_cost
    , dec(CPU_cost,20,4) as before_cpu_cost
    , dec(Comm_cost,20,4) as before_comm_cost
from Explain_Operator
    ,(select min(explain_time) as mintime
        from Explain_Operator 
        where operator_type = 'RETURN' 
          and explain_time between timestamp('2015-05-21-10.00.07') and timestamp('2015-05-21-10.03.06')) as b
where 
    explain_time = b.mintime
    and operator_type = 'RETURN' 
with UR ;
BEFORE_TOTAL_COST      BEFORE_IO_COST         BEFORE_CPU_COST        BEFORE_COMM_COST
---------------------- ---------------------- ---------------------- ----------------------
            38772.4492             11552.6152         705355200.0000                 0.0000

  1 record(s) selected.

And:

select 
    dec(total_cost,20,4) as after_total_cost
    , dec(io_cost,20,4) as after_io_cost
    , dec(CPU_cost,20,4) as after_cpu_cost
    , dec(Comm_cost,20,4) as after_comm_cost
from Explain_Operator
    ,(select max(explain_time) as maxtime
        from Explain_Operator 
        where operator_type = 'RETURN' 
          and explain_time between timestamp('2015-05-21-10.00.07') and timestamp('2015-05-21-10.03.06')) as b
where 
    explain_time = b.maxtime
    and operator_type = 'RETURN' 
with UR ;
AFTER_TOTAL_COST       AFTER_IO_COST          AFTER_CPU_COST         AFTER_COMM_COST
---------------------- ---------------------- ---------------------- ----------------------
            18741.7597              5339.1113         131031752.0000                 0.0000

  1 record(s) selected.
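The two queries can also be combined to compute the improvement in one pass – a sketch only, reusing the same timestamp window (the percentage column is just an example of how I like to see it):

with ts as (
    select min(explain_time) as mintime, max(explain_time) as maxtime
    from explain_operator
    where operator_type = 'RETURN'
      and explain_time between timestamp('2015-05-21-10.00.07') and timestamp('2015-05-21-10.03.06')
)
select dec(pre.total_cost,20,4) as before_total_cost
    , dec(post.total_cost,20,4) as after_total_cost
    , dec(100 * (pre.total_cost - post.total_cost) / pre.total_cost, 5, 2) as pct_improvement
from ts
    join explain_operator pre on pre.explain_time = ts.mintime and pre.operator_type = 'RETURN'
    join explain_operator post on post.explain_time = ts.maxtime and post.operator_type = 'RETURN'
with ur;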

What we have in the explain tables at this point is really two sets of data – the data from the original explain, and the data from an explain as if all of the recommended indexes existed.

Now here’s a query I prefer for looking at the existing and recommended indexes for this query. It combines data on existing indexes from syscat.indexes and syscat.tables with data on the proposed new indexes from advise_index. It’s also one I run over and over again to keep track of which indexes I’m currently trying:

with ts as ( select max(explain_time) as maxtime
        from Explain_Operator 
        where operator_type = 'RETURN' 
          and explain_time between timestamp('2015-05-21-10.00.07') and timestamp('2015-05-21-10.03.06'))
select substr(name,1,25) as indname
    , substr(tbcreator,1,13) as tabschema
    , substr(tbname,1,18) as tabname
    , coalesce(si.fullkeycard, ai.fullkeycard) as fullkeycard
    , st.card
    , coalesce(si.uniquerule, ai.uniquerule) as uniquerule
    , use_index
    , exists
    , substr(coalesce(si.colnames, ai.colnames),1,60) as colnames 
from advise_index ai 
    left outer join syscat.indexes si on ai.tbcreator=si.tabschema and ai.tbname=si.tabname and ai.name=si.indname 
    left outer join syscat.tables st on st.tabschema=ai.tbcreator and st.tabname=ai.tbname 
    join ts on explain_time = ts.maxtime
order by exists, use_index, uniquerule desc, indname with ur;

I’m actually massaging that a bit with CLPPlus to get the output I want, because substr isn’t doing so well at the PowerShell prompt for me at the moment (a rough sketch of the CLPPlus formatting follows the output below):

                                                                              Index                Table                                                                                         
Index Name           Table Schema    Table Name                         Cardinality          Cardinality UNIQUERULE USE_INDEX EXISTS Column Names                                                
-------------------- --------------- ------------------------- -------------------- -------------------- ---------- --------- ------ ------------------------------------------------------------
IDX1505231647110     DB2             VENDOR_UNIT                               2976                 2976 U          Y         N      +VENDOR_UNIT_ID-VENDOR_ID                                   
IDX1505231647160     DB2             LOCATION                                140445               140445 U          Y         N      +LOCATION_ID-NAME                                           
IDX1505231651490     DB2             ORDER                                      642                52497 U          Y         N      +TRAILER_SIZE+DL_AP_DATE2+CUSTOMER_RATE+NOTIFIED_DATE+DATE_E
                                                                                                                                     NTERED+COMMODITY+NEEDS_PAPERWORK+LINE_OF_BUSINESS+TOTAL_CHAR
                                                                                                                                     GE+MILES+TERMINAL+SCAC+SEAL+PO_NUM+PU_NUM+BL_NUM+REFERE...  

IDX1505231652450     DB2             RAIL_ETA                                649793               649793 U          Y         N      +ORDER_ID-STATE                                             
IDX1505231647170     EDI             EDI_204_ADDITIONAL_FIELDS             19897638             19897638 D          Y         N      +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE                 
IDX1505231647170     EDI             EDI_204_ADDITIONAL_FIELDS             19897638             19897638 D          Y         N      +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE                 
IDX1505231648160     DB2             CODE                                         5                  520 D          Y         N      +CODE_TEXT+TYPE+CODE_ID+CODE                                
IDX1505231648190     DB2             EDI_204                                 295885              1014841 D          Y         N      +CUST_S_LOCATION_ID+ORDER_ID+REFERENCE                      
DRIVER_ID            DB2             DRIVER                                   13519                13519 U          Y         Y      +DRIVER_ID+FIRST_NAME+LAST_NAME+PREVIOUS_NUM+DISPATCHER     
IDX201162151160000   DB2             CITY                                     97516                97516 U          Y         Y      +CITY_ID+STATE+CITY+TIME_ZONE                               
WIZ1166              DB2             LOCATION                                140445               140445 U          Y         Y      +LOCATION_ID+ZIP+LONGITUDE+LATITUDE+NAME+CITY_ID            
WIZ1171              DB2             LOCATION                                140445               140445 U          Y         Y      +LOCATION_ID+NAME+NUMBER                                    
SQL991011111520810   DB2             DRIVER                                   13519                13519 P          Y         Y      +DRIVER_ID                                                  
SQL991011113441210   DB2             VENDOR_UNIT                               2976                 2976 P          Y         Y      +VENDOR_UNIT_ID                                             
IDX_CODE_01          DB2             CODE                                       518                  520 D          Y         Y      +TYPE+CODE_TEXT+CODE                                        
IDX_STOP_1           DB2             STOP                                     15883                21568 D          Y         Y      +ORDER_ID                                                   
IDX201161942390000   EDI             EDI_204_HUB_CUSTOMER                    451173               451173 D          Y         Y      +REFERENCE+DESCRIPTION                                      
IDX201171957580000   DB2             CITY                                     97516                97516 D          Y         Y      +STATE+CITY+CITY_ID                                         
IDX201171959320000   DB2             LOCATION                                140445               140445 D          Y         Y      +CITY_ID+DROP+ZIP+LONGITUDE+LATITUDE+NAME+LOCATION_ID       

I pull that output into Excel so I can note details as I work through each index and easily play with the numbers. It looks something like this:
[Screenshot: spreadsheet built from the index-overview query output]

I hide and add some columns, but I run this same query after every change I make, so that I can keep track of what I have and have not done.

Analysis Techniques

At this point, there are two different methods that Scott laid out in his presentation on this topic – Index Addition and Index Subtraction.

Index Addition

The point of index addition is to run an explain of the query with each proposed index individually, to see the impact of that index alone. This works particularly well when a great impact is seen from a few indexes and the combination of indexes does not add much. To use this method, we update ADVISE_INDEX to set USE_INDEX to 'N' for all indexes that do not already exist:

PS > db2 "update advise_index set use_index = 'N' where exists= 'N'"
DB20000I  The SQL command completed successfully.
                                                                              Index                Table                                                                                         
Index Name           Table Schema    Table Name                         Cardinality          Cardinality UNIQUERULE USE_INDEX EXISTS Column Names                                                
-------------------- --------------- ------------------------- -------------------- -------------------- ---------- --------- ------ ------------------------------------------------------------
IDX1505231647110     DB2             VENDOR_UNIT                               2976                 2976 U          N         N      +VENDOR_UNIT_ID-VENDOR_ID                                   
IDX1505231647160     DB2             LOCATION                                140445               140445 U          N         N      +LOCATION_ID-NAME                                           
IDX1505231651490     DB2             ORDER                                      642                52497 U          N         N      +TRAILER_SIZE+DL_AP_DATE2+CUSTOMER_RATE+NOTIFIED_DATE+DATE_E
                                                                                                                                     NTERED+COMMODITY+NEEDS_PAPERWORK+LINE_OF_BUSINESS+TOTAL_CHAR
                                                                                                                                     GE+MILES+TERMINAL+SCAC+SEAL+PO_NUM+PU_NUM+BL_NUM+REFERE...  

IDX1505231652450     DB2             RAIL_ETA                                649793               649793 U          N         N      +ORDER_ID-STATE                                             
IDX1505231647170     EDI             EDI_204_ADDITIONAL_FIELDS             19897638             19897638 D          N         N      +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE                 
IDX1505231647170     EDI             EDI_204_ADDITIONAL_FIELDS             19897638             19897638 D          N         N      +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE                 
IDX1505231648160     DB2             CODE                                         5                  520 D          N         N      +CODE_TEXT+TYPE+CODE_ID+CODE                                
IDX1505231648190     DB2             EDI_204                                 295885              1014841 D          N         N      +CUST_S_LOCATION_ID+ORDER_ID+REFERENCE                      
DRIVER_ID            DB2             DRIVER                                   13519                13519 U          Y         Y      +DRIVER_ID+FIRST_NAME+LAST_NAME+PREVIOUS_NUM+DISPATCHER     
IDX201162151160000   DB2             CITY                                     97516                97516 U          Y         Y      +CITY_ID+STATE+CITY+TIME_ZONE                               
WIZ1166              DB2             LOCATION                                140445               140445 U          Y         Y      +LOCATION_ID+ZIP+LONGITUDE+LATITUDE+NAME+CITY_ID            
WIZ1171              DB2             LOCATION                                140445               140445 U          Y         Y      +LOCATION_ID+NAME+NUMBER                                    
SQL991011111520810   DB2             DRIVER                                   13519                13519 P          Y         Y      +DRIVER_ID                                                  
SQL991011113441210   DB2             VENDOR_UNIT                               2976                 2976 P          Y         Y      +VENDOR_UNIT_ID                                             
IDX_CODE_01          DB2             CODE                                       518                  520 D          Y         Y      +TYPE+CODE_TEXT+CODE                                        
IDX_STOP_1           DB2             STOP                                     15883                21568 D          Y         Y      +ORDER_ID                                                   
IDX201161942390000   EDI             EDI_204_HUB_CUSTOMER                    451173               451173 D          Y         Y      +REFERENCE+DESCRIPTION                                      
IDX201171957580000   DB2             CITY                                     97516                97516 D          Y         Y      +STATE+CITY+CITY_ID                                         
IDX201171959320000   DB2             LOCATION                                140445               140445 D          Y         Y      +CITY_ID+DROP+ZIP+LONGITUDE+LATITUDE+NAME+LOCATION_ID       

Now one by one we set use_index equal to ‘Y’ for each index and run an explain using the ‘evaluate indexes’ option.

PS D:\xtivia\queries> db2 "update advise_index set use_index = 'Y' where name = 'IDX1505231647110'"
DB20000I  The SQL command completed successfully.

                                                                              Index                Table                                                                                         
Index Name           Table Schema    Table Name                         Cardinality          Cardinality UNIQUERULE USE_INDEX EXISTS Column Names                                                
-------------------- --------------- ------------------------- -------------------- -------------------- ---------- --------- ------ ------------------------------------------------------------
IDX1505231647160     DB2             LOCATION                                140445               140445 U          N         N      +LOCATION_ID-NAME                                           
IDX1505231651490     DB2             ORDER                                      642                52497 U          N         N      +TRAILER_SIZE+DL_AP_DATE2+CUSTOMER_RATE+NOTIFIED_DATE+DATE_E
                                                                                                                                     NTERED+COMMODITY+NEEDS_PAPERWORK+LINE_OF_BUSINESS+TOTAL_CHAR
                                                                                                                                     GE+MILES+TERMINAL+SCAC+SEAL+PO_NUM+PU_NUM+BL_NUM+REFERE...  

IDX1505231652450     DB2             RAIL_ETA                                649793               649793 U          N         N      +ORDER_ID-STATE                                             
IDX1505231647170     EDI             EDI_204_ADDITIONAL_FIELDS             19897638             19897638 D          N         N      +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE                 
IDX1505231647170     EDI             EDI_204_ADDITIONAL_FIELDS             19897638             19897638 D          N         N      +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE                 
IDX1505231648160     DB2             CODE                                         5                  520 D          N         N      +CODE_TEXT+TYPE+CODE_ID+CODE                                
IDX1505231648190     DB2             EDI_204                                 295885              1014841 D          N         N      +CUST_S_LOCATION_ID+ORDER_ID+REFERENCE                      
IDX1505231647110     DB2             VENDOR_UNIT                               2976                 2976 U          Y         N      +VENDOR_UNIT_ID-VENDOR_ID                                   
DRIVER_ID            DB2             DRIVER                                   13519                13519 U          Y         Y      +DRIVER_ID+FIRST_NAME+LAST_NAME+PREVIOUS_NUM+DISPATCHER     
IDX201162151160000   DB2             CITY                                     97516                97516 U          Y         Y      +CITY_ID+STATE+CITY+TIME_ZONE                               
WIZ1166              DB2             LOCATION                                140445               140445 U          Y         Y      +LOCATION_ID+ZIP+LONGITUDE+LATITUDE+NAME+CITY_ID            
WIZ1171              DB2             LOCATION                                140445               140445 U          Y         Y      +LOCATION_ID+NAME+NUMBER                                    
SQL991011111520810   DB2             DRIVER                                   13519                13519 P          Y         Y      +DRIVER_ID                                                  
SQL991011113441210   DB2             VENDOR_UNIT                               2976                 2976 P          Y         Y      +VENDOR_UNIT_ID                                             
IDX_CODE_01          DB2             CODE                                       518                  520 D          Y         Y      +TYPE+CODE_TEXT+CODE                                        
IDX_STOP_1           DB2             STOP                                     15883                21568 D          Y         Y      +ORDER_ID                                                   
IDX201161942390000   EDI             EDI_204_HUB_CUSTOMER                    451173               451173 D          Y         Y      +REFERENCE+DESCRIPTION                                      
IDX201171957580000   DB2             CITY                                     97516                97516 D          Y         Y      +STATE+CITY+CITY_ID                                         
IDX201171959320000   DB2             LOCATION                                140445               140445 D          Y         Y      +CITY_ID+DROP+ZIP+LONGITUDE+LATITUDE+NAME+LOCATION_ID

With that set, we set the current explain mode, and explain the query:

PS D:\xtivia\queries> db2 set current explain mode evaluate indexes
DB20000I  The SQL command completed successfully.
PS D:\xtivia\queries> db2 -tvf query13.sql
select * from db2.order_trace  where  c_state = ? and dl_ap_date2 >= ? and  reference not in (select reference from db2.edi_204 edi_204 where edi_204.reference = reference and edi_204.order_id = order
_id  and CUST_s_location_id = ? ) and (status in (? ,? ) OR ORDER_TYPE = ? )  order by dl_ap_date2, dl_ap_time2, act_state, act_city, actual, order_id
SQL0217W  The statement was not executed as only Explain information requests
are being processed.  SQLSTATE=01604

PS D:\xtivia\queries> db2 set current explain mode no
DB20000I  The SQL command completed successfully.

Note that the explain mode used is not yes or explain, but evaluate indexes.

Next, we must query the cost to record in our spreadsheet:

select 
    dec(total_cost,20,4) as this_index_total_cost
    , dec(io_cost,20,4) as this_index_io_cost
    , dec(CPU_cost,20,4) as this_index_cpu_cost
    , dec(Comm_cost,20,4) as this_index_comm_cost
from Explain_Operator
    ,(select max(explain_time) as maxtime
        from Explain_Operator 
        where operator_type = 'RETURN') as b
where 
    explain_time = b.maxtime
    and operator_type = 'RETURN' 
with UR ;

THIS_INDEX_TOTAL_COST  THIS_INDEX_IO_COST     THIS_INDEX_CPU_COST    THIS_INDEX_COMM_COST
---------------------- ---------------------- ---------------------- ----------------------
            38772.4492             11552.6152         705353856.0000                 0.0000

  1 record(s) selected.

If you’re keeping track, that’s ZERO impact on the timeron cost for that particular index. Now we have to move through each index in turn. I’m only showing the second one, so you can see that each time we mark all of the other indexes as not being used and mark only one index as being used:

PS D:\xtivia\queries> db2 "update advise_index set use_index = 'N' where exists= 'N'"
DB20000I  The SQL command completed successfully.
PS D:\xtivia\queries> db2 "update advise_index set use_index = 'Y' where name = 'IDX1505231647160'"
DB20000I  The SQL command completed successfully.

                                                                              Index                Table                                                                                         
Index Name           Table Schema    Table Name                         Cardinality          Cardinality UNIQUERULE USE_INDEX EXISTS Column Names                                                
-------------------- --------------- ------------------------- -------------------- -------------------- ---------- --------- ------ ------------------------------------------------------------
IDX1505231647110     DB2             VENDOR_UNIT                               2976                 2976 U          N         N      +VENDOR_UNIT_ID-VENDOR_ID                                   
IDX1505231651490     DB2             ORDER                                      642                52497 U          N         N      +TRAILER_SIZE+DL_AP_DATE2+CUSTOMER_RATE+NOTIFIED_DATE+DATE_E
                                                                                                                                     NTERED+COMMODITY+NEEDS_PAPERWORK+LINE_OF_BUSINESS+TOTAL_CHAR
                                                                                                                                     GE+MILES+TERMINAL+SCAC+SEAL+PO_NUM+PU_NUM+BL_NUM+REFERE...  

IDX1505231652450     DB2             RAIL_ETA                                649793               649793 U          N         N      +ORDER_ID-STATE                                             
IDX1505231647170     EDI             EDI_204_ADDITIONAL_FIELDS             19897638             19897638 D          N         N      +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE                 
IDX1505231647170     EDI             EDI_204_ADDITIONAL_FIELDS             19897638             19897638 D          N         N      +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE                 
IDX1505231648160     DB2             CODE                                         5                  520 D          N         N      +CODE_TEXT+TYPE+CODE_ID+CODE                                
IDX1505231648190     DB2             EDI_204                                 295885              1014841 D          N         N      +CUST_S_LOCATION_ID+ORDER_ID+REFERENCE                      
IDX1505231647160     DB2             LOCATION                                140445               140445 U          Y         N      +LOCATION_ID-NAME                                           
DRIVER_ID            DB2             DRIVER                                   13519                13519 U          Y         Y      +DRIVER_ID+FIRST_NAME+LAST_NAME+PREVIOUS_NUM+DISPATCHER     
IDX201162151160000   DB2             CITY                                     97516                97516 U          Y         Y      +CITY_ID+STATE+CITY+TIME_ZONE                               
WIZ1166              DB2             LOCATION                                140445               140445 U          Y         Y      +LOCATION_ID+ZIP+LONGITUDE+LATITUDE+NAME+CITY_ID            
WIZ1171              DB2             LOCATION                                140445               140445 U          Y         Y      +LOCATION_ID+NAME+NUMBER                                    
SQL991011111520810   DB2             DRIVER                                   13519                13519 P          Y         Y      +DRIVER_ID                                                  
SQL991011113441210   DB2             VENDOR_UNIT                               2976                 2976 P          Y         Y      +VENDOR_UNIT_ID                                             
IDX_CODE_01          DB2             CODE                                       518                  520 D          Y         Y      +TYPE+CODE_TEXT+CODE                                        
IDX_STOP_1           DB2             STOP                                     15883                21568 D          Y         Y      +ORDER_ID                                                   
IDX201161942390000   EDI             EDI_204_HUB_CUSTOMER                    451173               451173 D          Y         Y      +REFERENCE+DESCRIPTION                                      
IDX201171957580000   DB2             CITY                                     97516                97516 D          Y         Y      +STATE+CITY+CITY_ID                                         
IDX201171959320000   DB2             LOCATION                                140445               140445 D          Y         Y      +CITY_ID+DROP+ZIP+LONGITUDE+LATITUDE+NAME+LOCATION_ID       
PS D:\xtivia\queries> db2 set current explain mode evaluate indexes
DB20000I  The SQL command completed successfully.
PS D:\xtivia\queries> db2 -tvf query13.sql
select * from db2.order_trace  where  c_state = ? and dl_ap_date2 >= ? and  reference not in (select reference from db2.edi_204 edi_204 where edi_204.reference = reference and edi_204.order_id = order
_id  and CUST_s_location_id = ? ) and (status in (? ,? ) OR ORDER_TYPE = ? )  order by dl_ap_date2, dl_ap_time2, act_state, act_city, actual, order_id
SQL0217W  The statement was not executed as only Explain information requests
are being processed.  SQLSTATE=01604

PS D:\xtivia\queries> db2exfmt -d comtrak2 -1 -o query13_ind2_exfmt.txt
DB2 Universal Database Version 9.7, 5622-044 (c) Copyright IBM Corp. 1991, 2009
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool

Connecting to the Database.
Connect to Database Successful.
Base table information incomplete
Output is in query13_ind2_exfmt.txt.
Executing Connect Reset -- Connect Reset was Successful.
PS D:\xtivia\queries> db2 set current explain mode no
DB20000I  The SQL command completed successfully.
PS D:\xtivia\queries> db2 -tvf this_index_cost_query.sql
select dec(total_cost,20,4) as this_index_total_cost , dec(io_cost,20,4) as this_index_io_cost , dec(CPU_cost,20,4) as this_index_cpu_cost , dec(Comm_cost,20,4) as this_index_comm_cost from Explain_Op
erator ,(select max(explain_time) as maxtime from Explain_Operator where operator_type = 'RETURN') as b where explain_time = b.maxtime and operator_type = 'RETURN' with UR

THIS_INDEX_TOTAL_COST  THIS_INDEX_IO_COST     THIS_INDEX_CPU_COST    THIS_INDEX_COMM_COST
---------------------- ---------------------- ---------------------- ----------------------
            38772.4492             11552.6152         705354496.0000                 0.0000

  1 record(s) selected.

Note that I do tend to run a db2exfmt each time. I’m not technically required to, but I find it useful to have the output to go back and look at later. Again, this index appears to have ZERO impact by itself.

When I’ve gone through each one, this is what my spreadsheet looks like:
Screenshot_052115_105225_AM

Note that by far, the biggest single impact is that wide index that I really don’t like the idea of. Only one other of the 7 recommended new indexes shows any impact by itself. This leads me to wonder why DB2 is even recommending some of the others. I’m hoping that it is because a combination of them makes a real difference. Let’s try Index Subtraction to see if that is true.

Index Subtraction

For Index Subtraction, we tell DB2 to give us the numbers as if all of the proposed indexes exist, and we take away only one at a time to see the impact of not having that particular index. The steps are very similar, just with slightly different updates to the USE_INDEX column.

To start with, we want all of the recommended indexes for this query to be marked with USE_INDEX of Y.

PS D:\xtivia\queries> db2 "update advise_index set use_index='Y' where name in ('IDX1505231647110','IDX1505231647160','IDX1505231652450','IDX1505231647170','IDX1505231648160','IDX1505231648190','IDX15
05231651490')"
DB20000I  The SQL command completed successfully.
                                                                              Index                Table                                                                                         
Index Name           Table Schema    Table Name                         Cardinality          Cardinality UNIQUERULE USE_INDEX EXISTS Column Names                                                
-------------------- --------------- ------------------------- -------------------- -------------------- ---------- --------- ------ ------------------------------------------------------------
IDX1505231647110     DB2             VENDOR_UNIT                               2976                 2976 U          Y         N      +VENDOR_UNIT_ID-VENDOR_ID                                   
IDX1505231647160     DB2             LOCATION                                140445               140445 U          Y         N      +LOCATION_ID-NAME                                           
IDX1505231651490     DB2             ORDER                                      642                52497 U          Y         N      +TRAILER_SIZE+DL_AP_DATE2+CUSTOMER_RATE+NOTIFIED_DATE+DATE_E
                                                                                                                                     NTERED+COMMODITY+NEEDS_PAPERWORK+LINE_OF_BUSINESS+TOTAL_CHAR
                                                                                                                                     GE+MILES+TERMINAL+SCAC+SEAL+PO_NUM+PU_NUM+BL_NUM+REFERE...  

IDX1505231652450     DB2             RAIL_ETA                                649793               649793 U          Y         N      +ORDER_ID-STATE                                             
IDX1505231647170     EDI             EDI_204_ADDITIONAL_FIELDS             19897638             19897638 D          Y         N      +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE                 
IDX1505231647170     EDI             EDI_204_ADDITIONAL_FIELDS             19897638             19897638 D          Y         N      +REFERENCE+FIELD_NAME+INSERT_DT+FIELD_VALUE                 
IDX1505231648160     DB2             CODE                                         5                  520 D          Y         N      +CODE_TEXT+TYPE+CODE_ID+CODE                                
IDX1505231648190     DB2             EDI_204                                 295885              1014841 D          Y         N      +CUST_S_LOCATION_ID+ORDER_ID+REFERENCE                      
DRIVER_ID            DB2             DRIVER                                   13519                13519 U          Y         Y      +DRIVER_ID+FIRST_NAME+LAST_NAME+PREVIOUS_NUM+DISPATCHER     
IDX201162151160000   DB2             CITY                                     97516                97516 U          Y         Y      +CITY_ID+STATE+CITY+TIME_ZONE                               
WIZ1166              DB2             LOCATION                                140445               140445 U          Y         Y      +LOCATION_ID+ZIP+LONGITUDE+LATITUDE+NAME+CITY_ID            
WIZ1171              DB2             LOCATION                                140445               140445 U          Y         Y      +LOCATION_ID+NAME+NUMBER                                    
SQL991011111520810   DB2             DRIVER                                   13519                13519 P          Y         Y      +DRIVER_ID                                                  
SQL991011113441210   DB2             VENDOR_UNIT                               2976                 2976 P          Y         Y      +VENDOR_UNIT_ID                                             
IDX_CODE_01          DB2             CODE                                       518                  520 D          Y         Y      +TYPE+CODE_TEXT+CODE                                        
IDX_STOP_1           DB2             STOP                                     15883                21568 D          Y         Y      +ORDER_ID                                                   
IDX201161942390000   EDI             EDI_204_HUB_CUSTOMER                    451173               451173 D          Y         Y      +REFERENCE+DESCRIPTION                                      
IDX201171957580000   DB2             CITY                                     97516                97516 D          Y         Y      +STATE+CITY+CITY_ID                                         
IDX201171959320000   DB2             LOCATION                                140445               140445 D          Y         Y      +CITY_ID+DROP+ZIP+LONGITUDE+LATITUDE+NAME+LOCATION_ID       

Now one at a time, we will mark the indexes to not be used, and run the explain using evaluate indexes.

PS D:\xtivia\queries> db2 "update advise_index set use_index='N' where name = 'IDX1505231647110'"
DB20000I  The SQL command completed successfully.
PS D:\xtivia\queries> db2 set current explain mode evaluate indexes
DB20000I  The SQL command completed successfully.
PS D:\xtivia\queries> db2 -tvf query13.sql
select * from db2.order_trace  where  c_state = ? and dl_ap_date2 >= ? and  reference not in (select reference from db2.edi_204 edi_204 where edi_204.reference = reference and edi_204.order_id = order
_id  and CUST_s_location_id = ? ) and (status in (? ,? ) OR ORDER_TYPE = ? )  order by dl_ap_date2, dl_ap_time2, act_state, act_city, actual, order_id
SQL0217W  The statement was not executed as only Explain information requests
are being processed.  SQLSTATE=01604

PS D:\xtivia\queries> db2exfmt -d comtrak2 -1 -o query13_excl_ind1_exfmt.txt
DB2 Universal Database Version 9.7, 5622-044 (c) Copyright IBM Corp. 1991, 2009
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool

Connecting to the Database.
Connect to Database Successful.
Base table information incomplete
Base table information incomplete
Base table information incomplete
Base table information incomplete
Base table information incomplete
Base table information incomplete
Output is in query13_excl_ind1_exfmt.txt.
Executing Connect Reset -- Connect Reset was Successful.
PS D:\xtivia\queries> db2 set current explain mode no
DB20000I  The SQL command completed successfully.
PS D:\xtivia\queries> db2 -tvf this_index_cost_query.sql
select dec(total_cost,20,4) as this_index_total_cost , dec(io_cost,20,4) as this_index_io_cost , dec(CPU_cost,20,4) as this_index_cpu_cost , dec(Comm_cost,20,4) as this_index_comm_cost from Explain_Op
erator ,(select max(explain_time) as maxtime from Explain_Operator where operator_type = 'RETURN') as b where explain_time = b.maxtime and operator_type = 'RETURN' with UR

THIS_INDEX_TOTAL_COST  THIS_INDEX_IO_COST     THIS_INDEX_CPU_COST    THIS_INDEX_COMM_COST
---------------------- ---------------------- ---------------------- ----------------------
            18741.7597              5339.1113         131033120.0000                 0.0000

  1 record(s) selected.

When I cycle through each index that way, ensuring that in each round I eliminate only one, this is what my spreadsheet looks like:
Screenshot_052115_111216_AM

Results

With this information, I can immediately eliminate indexes: IDX1505231647110, IDX1505231647160, IDX1505231652450, IDX1505231648160, and IDX1505231647170. This leaves me with two indexes to consider. Given the fact that EDI_204 is one of the largest and most active tables in my database, I won’t be adding that index for the small additional impact that it gives for this query. I’m left with just one major index:

   CREATE UNIQUE INDEX "DB2ADMIN"."IDX1505231651490"
   ON "DB2     "."ORDER" ("TRAILER_SIZE" ASC, "DL_AP_DATE2"
   ASC, "CUSTOMER_RATE" ASC, "NOTIFIED_DATE" ASC, "DATE_ENTERED"
   ASC, "COMMODITY" ASC, "NEEDS_PAPERWORK" ASC, "LINE_OF_BUSINESS"
   ASC, "TOTAL_CHARGE" ASC, "MILES" ASC, "TERMINAL" ASC,
   "SCAC" ASC, "SEAL" ASC, "PO_NUM" ASC, "PU_NUM" ASC,
   "BL_NUM" ASC, "REFERENCE" ASC, "WHO_ENTERED" ASC,
   "ORDER_TYPE" ASC, "TRACTOR_ID" ASC, "DRIVER_ID" ASC,
   "DL_AT_TIME2" ASC, "DL_AT_DATE2" ASC, "DL_AT_TIME1"
   ASC, "DL_AT_DATE1" ASC, "DL_AP_TIME2" ASC, "DL_AP_TIME1"
   ASC, "DL_AP_DATE1" ASC, "PU_AT_TIME2" ASC, "PU_AT_DATE2"
   ASC, "PU_AT_TIME1" ASC, "PU_AT_DATE1" ASC, "PU_AP_TIME2"
   ASC, "PU_AP_DATE2" ASC, "PU_AP_TIME1" ASC, "PU_AP_DATE1"
   ASC, "S_LOCATION_ID" ASC, "CHASSIS" ASC, "CHECK_DIGIT"
   ASC, "TRAILER" ASC, "STATUS" ASC, "ORDER_ID" ASC)
   INCLUDE ("C_LOCATION_ID", "ACT_LOCATION_ID", "BTO_LOCATION_ID")
   ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;

Given how wide this index is, I’m really concerned about adding it to a table that is already a bit over-indexed, so I haven’t decided yet if I’m going to add it. But at least I now understand that it’s the only index really worth considering for this query.

I find that for now, I like to use both the index addition and subtraction methods to really see what difference there might be in the results.

A big thanks to Scott Hayes for covering this topic in depth at his IDUG North America presentation this year. This process has become a regular tool in my arsenal.


DB2 Basics: Filesystems for DB2 on Unix and Linux Systems


DB2 doesn’t have any defaults for filesystems because that is an OS-level thing. However, there are a few sanity checks and some recommended separation when installing DB2 on Linux and UNIX operating systems. These are my best practices for installations, in general terms.

Software Location

The default location for installing the DB2 code is /opt/ibm/db2/V10.5 (replacing V10.5 with whatever version). Yes, I typed that from memory without having to look it up. The default location works fine, though I would recommend that /opt be mounted as its own filesystem and not just part of root. There are reasons to put the code in other places, but I wouldn’t unless you have a technical reason to do so. If you’re using db2_install (my favorite installation method), it will prompt you for the install location.

Don’t Create Instances in / or in /home

After software installation, the next location that DB2 uses is the home directory of the instance owner. Before creating the DB2 instance (either using db2setup or using db2icrt), you need to verify the primary group and the home directory of the DB2 instance owner. DO NOT place the home directory of the DB2 instance owner on / or /home. The reason is that those locations may quite easily be filled up by other users on the server, and if the instance home fills up, DB2 becomes unusable. We don’t want a random user transferring a giant file and crashing the database. The home directory of the DB2 instance owner cannot easily be changed after instance creation.
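To get the home directory right from the start, create the instance owner before running db2icrt or db2setup. A minimal sketch for Linux – the group name db2iadm1, the user name db2inst1, and the mount point /db2home are placeholders, not requirements:

# create the instance owner group and user, with the home directory
# on a dedicated filesystem (names and paths are examples only)
groupadd db2iadm1
useradd -m -g db2iadm1 -d /db2home/db2inst1 db2inst1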

Other Recommended Filesystems

For a minimal usage system, I can live with only two filesystems as described above – /opt and some filesystem for the home directory of the DB2 instance owner. But if I’m setting up a production system to the best reasonable design, I also include separate filesystems for at least the following.

Two Data Filesystems

I place the database home directory in a data filesystem. I like to have two just because it makes sense to me to start with two, even if they are both part of the same storage group.

The easiest way to configure a new database to use these filesystems is on the create database command – specify ON /path1, /path2 DBPATH ON /path1. I will also frequently set the DFTDBPATH DBM CFG parameter to the first data path.
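For example, creating a database across two hypothetical data filesystems and pointing the default database path at the first one might look like this (SAMPLE, /db2data1, and /db2data2 are placeholders):

db2 "CREATE DATABASE SAMPLE ON /db2data1, /db2data2 DBPATH ON /db2data1"
db2 "UPDATE DBM CFG USING DFTDBPATH /db2data1"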

Active Transaction Logs

Active transaction logs are critical to database performance and also to database functioning. Active logs should be on the fastest storage you have, and you should train operations and other people who might touch your database server to NEVER touch the active transaction log files. I have seen a production database crashed by a more junior operations person compressing an active transaction log.

To direct active log files to a new path, you’ll have to set the database configuration parameter NEWLOGPATH and deactivate/activate your database for the change to take effect.
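A sketch of that change, assuming a database named SAMPLE and a dedicated /db2logs filesystem:

db2 "UPDATE DB CFG FOR SAMPLE USING NEWLOGPATH /db2logs/SAMPLE"
db2 deactivate db SAMPLE
db2 activate db SAMPLE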

Archive Transaction Logs

If you have TSM or another dedicated location for archiving transaction logs, that is best and safest. If you are archiving to disk, these MUST be in a different directory from active transaction logs, and preferably in a separate filesystem. The logic for a separate filesystem is that this gives you an extra layer of protection – if you’re monitoring for full filesystems, you will catch a logging issue when it fills up your archive log filesystem, and hopefully have time to address it before it also fills your active log filesystem and makes your database unavailable.

To archive transaction log files to a path, you have to set the LOGARCHMETH1 db cfg parameter to: DISK:/path. If you’re setting LOGARCHMETH1 for the first time, you may also be changing from circular logging to archive logging, which requires an offline database backup, so be cautious.
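For example (SAMPLE and /db2archlogs are placeholders; remember the possible offline backup requirement mentioned above):

db2 "UPDATE DB CFG FOR SAMPLE USING LOGARCHMETH1 DISK:/db2archlogs/SAMPLE"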

db2diag/db2dump

The DB2 diagnostic log and other diagnostic files will, by default, be in $INSTHOME/sqllib/db2dump. I like to have them in another filesystem – this ensures that no matter what else fills up or what other filesystem-level problems are encountered, I should still get the error messages from those issues.

The location for this data is changed using the DIAGPATH parameter in the DBM cfg. It can be updated with an UPDATE DBM CFG command, and changes to it take effect immediately.
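For example, assuming a dedicated /db2diag filesystem:

db2 "UPDATE DBM CFG USING DIAGPATH /db2diag"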

Backups

I like to take backups to a different filesystem. If your filesystems are fully separate I/O paths, this can have some performance benefits. But the real reason is because the backup location is the location you’re most likely to fill up, and you don’t want a filesystem filling up because of a backup and causing an availability issue for your database.

Specify the backup filesystem on any BACKUP DATABASE commands you issue, including those from scripts.
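A minimal example, assuming a database named SAMPLE, archive logging enabled (required for online backups), and a dedicated /db2backup filesystem:

db2 "BACKUP DATABASE SAMPLE ONLINE TO /db2backup COMPRESS"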

Scripts

I have seen a script go a bit haywire, capture more data than anyone thought it would, and fill up the filesystem it was writing to. For this reason, I like to keep my administrative scripts and their output on a separate filesystem from everything else. This makes it much harder for a scripting mistake to cause a database availability issue.

Others

Obviously there are other filesystems that particular designs may call for – such as a shared filesystem between two HADR servers when doing loads with COPY YES. But the above apply to just about every DB2 installation.

Separating I/O

This post does not cover storage design in depth. Usually the storage I get is a black box. A client allocates the space locally on the server or on a SAN, and I have a very hard time finding out the storage details. If I actually get to specify anything about where the storage goes or what is separate, my first priority is to separate out my active transaction logs and put them on the fastest storage I have, separate from everything else. If I have more specificity available, I like to separate my two data filesystems from my backup filesystem. After that it’s all icing. In only a very few cases do I see indexes and data separated any more. When I started as a DBA 14 years ago, that was a core tenet of DB2 database storage design.

It is Only Temporary

I cannot tell you how many stupid things I have been asked to do in the name of “It’s only temporary!” or “It’s only a development environment”, only to have those things become permanent fixtures in a production environment. Just today, I installed DB2 on root, with no separate filesystems at all, after making it perfectly clear I thought it was a horrible idea. My exact words were:

If you want me to move forward with the understanding that a significant percentage of our clients come to us just because of this kind of mis-configuration and the stability issues it causes, then I can. But I want you to understand what a bad idea I think it is.

There really is no such thing as “It’s only temporary” – design correctly from the start and the problems that you encounter are much easier to deal with.

Architecting High-Availability and Disaster Recovery Solutions with DB2


NOTE: Please do not attempt to architect a high-availability solution without a DB2 expert involved. There are many questions and answers needed beyond those here to make the best decision and ask all the tough questions.

High Availability vs. Disaster Recovery

High Availability and Disaster Recovery often have different goals and priorities. High Availability can be considered the ability to keep a database available with little to no data loss during common problems. Disaster Recovery usually protects against more major and unusual failures, but may take longer to complete and have slightly more possibility for data loss. Meeting goals for either may have an impact on database performance depending on hardware and network details. I once heard a client discussing Disaster Recovery on a call as “What if we lost Dallas? The whole City?”

Define goals

When embarking on a design process, the first thing to lay out is what your goals are for RTO and RPO. Often you may have different goals for High Availability vs Disaster Recovery, and you may also define different percentages of uptime required for planned outages vs. unplanned outages. Some of these may be defined in an SLA (Service Level Agreement) either internally to your organization or to an external customer. Always design your system(s) for less downtime than you promise the client to allow for unexpected scenarios. When talking through goals, it can be useful to come up with specific scenarios that you want to plan to protect against. Disk failure, major RAID or SAN failure, network failure, server component failure, power failure, natural disaster, human error, etc.

RTO (Recovery Time Objective)

This goal is expressed in units of time. If there is an outage, how soon are you expected to have the database up and running? Usually whether this is achieved via the High Availability solution or the Disaster Recovery solution is determined by the type of failure that caused the initial outage. There may be different Recovery Time Objectives defined for High Availability events versus Disaster Recovery events, or the same objective may be defined for both.

RPO (Recovery Point Objective)

The Recovery Point Objective is also defined as a duration of time. But in this case, it is the maximum age of data recovered in the case of a major failure. This usually applies more strongly to Disaster Recovery plans, as High Availability often accounts for less than a few seconds of data loss, if any. Often RPO will be defined for high availability in hours.

Uptime

One of the most common challenges I see in defining High Availability and Disaster Recovery situations is an executive who has become familiar with the terms four nines or five nines and asks for that level of availability. Each nine breaks down this way:

Nines        % Uptime    Time Per Year Down
One Nine     90%         36.5 days
Two Nines    99%         3.65 days
Three Nines  99.9%       8.76 hours
Four Nines   99.99%      52.56 minutes
Five Nines   99.999%     5.26 minutes

There are several things to keep in mind with these numbers. First is that as you increase the uptime, the cost goes up in at least an exponential manner. I can accomplish one nine quite easily with one database server and some offsite data movement (depending on RPO). If I had to guarantee Five Nines, my knee-jerk implementation would involve 10 servers.

Second, the uptime may be defined either over unplanned down time or over both planned and unplanned down time. Since very few methods allow for an on-line version-level upgrade of DB2, that alone pushes us back to three nines depending on database size, or to some more complicated implementations.

Third, uptime may be defined at the application level, so the database’s share of any down-time may be smaller than you think. If the infrastructure team, the database team, and the application team all plan 50 minutes of down time per year, they’re unlikely to be able to overlap all of that, and unlikely to make a 99.99% uptime goal.

Throughput

There is also the question of how “downtime” is defined. My database can be up and available 99.99% of the time, but if an application pounds it with volumes of bad SQL, that could be considered downtime. The same workload that achieves 99.99% uptime against unplanned downtime on one set of hardware could be unable to even hit three nines on significantly less powerful hardware.

High Availability Options

I don’t pretend to cover all options, but want to offer a few of the more common ones.

HADR

The first solution that comes to mind that meets many implementations’ needs for High Availability is HADR. HADR allows us to have a second (or third or fourth) DB2 server that is rolling forward through transaction logs and can be failed over to in a matter of minutes or seconds. This failover can even be initiated automatically when a problem is detected, using TSAMP (often included with DB2 licensing). Often the HADR standby servers can be licensed at just 100 PVU instead of the full PVU count for the servers. Usually the standby database servers are not connectable, but in some limited situations, read-only traffic can be directed to them if you’re willing to fully license the standby database server. Please verify all licensing details with IBM.

HADR has the advantage that it is often free for use with DB2, and it is easy to set up, monitor, and administer for qualified personnel. It also can be configured to share almost nothing, protecting against a very wide range of failures, including disk/RAID/SAN failures. I’ve seen two different situations where a client lost an entire RAID array at once on a production database server, and HADR was a lifesaver for their data.

For High-Availability implementations with HADR, the two database servers should have high-speed networks between them with 100 KM or less of distance. SYNC and NEARSYNC HADR modes are appropriate for High Availability.
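The syncmode itself is just a database configuration parameter. A minimal sketch, assuming a database named SAMPLE and leaving aside the other HADR parameters (HADR_LOCAL_HOST, HADR_REMOTE_HOST, and so on) that a full setup requires:

db2 "UPDATE DB CFG FOR SAMPLE USING HADR_SYNCMODE NEARSYNC"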

Fixpacks can be applied in a rolling fashion with the only downtime being up to a few minutes for a failover. Full DB2 version upgrades require a full outage.

HADR can also run on most hosting providers with no special configurations other than additional network interfaces for high-throughput systems.

HADR can often be implemented by any DB2 DBA with the assistance of operating system and network personnel with average skill levels.

Shared Disk Outside of DB2

I have seen this solution successfully used. Two servers are configured to share one set of disks using software such as HACMP, Power-HA, RHCS, Veritas, TSAMP, or others. The disk is only mounted on one server at a time, and if a failure is detected, the disk is unmounted from one server and mounted on another server.

The thing I don’t like about this solution is that the disk is a frequent point of failure for database systems, and this implementation does not protect against disk failure. If you go with this option, please make sure your RAID array or SAN is extremely robust: use as many parity disks as you can handle, use disks manufactured at different times, and do very detailed monitoring so that any single disk failure is acted on immediately. It’s also advisable that you couple this with a Disaster Recovery solution.

Many hosting providers will support this kind of solution, but may charge more to do so.

This type of shared-disk solution also requires the support of a talented expert in whatever software you use – A DBA and typical system administrators may not be able to do this without help.

DB2 PureScale

PureScale is DB2’s answer to RAC. It is a shared-disk implementation that allows for multiple active database servers at the same time. A typical configuration would include three database servers and two “CF” servers to control load distribution and locking. It is a very robust solution, mostly appropriate for OLTP and other transaction processing databases. The complexity is an order of magnitude beyond HADR, and you will need a talented DB2 DBA or consultant to work with your internal network and system administration teams. It uses GPFS, and a high-speed interconnect between the servers (RoCE or Infiniband). You can use TCP/IP for test environments.

Many hosting providers cannot easily support this – you have to ensure that yours will. IBM-related hosting providers like Softlayer tend to be easier to talk to about this. Until recent versions, there were hardware restrictions that are progressively being eased as time goes on.

If you have a good relationship with IBM or a contractor who does, additional help may be available from IBM for this.

Unless you’re engaging IBM Lab services, all database servers should be in the same data center unless you’re combining this with HADR.

DB2 Replication (SQL, Q, CDC)

DB2 Replication can also be used to establish a standby database server. With replication, it is best to have one server defined as the master and the others defined as slaves, with the caveat that you can actually have different indexing on your standby if you’re using it for read-only reporting.

Replication is complicated to set up, and if you have too much transactional volume for SQL replication to handle, the licensing costs for Q-rep or CDC can be very significant. Since it is set up on a table-by-table basis, it requires a lot of DBA effort to set up depending on the number of tables you have. It can also be complicated to integrate with HADR. The time to implement replication depends heavily on the number of tables involved.

The big drivers for choosing replication as an option are the need to access more than one server or to use a second server as a reporting server, and also the fact that this is the only method that will allow you to do a nearly-online version-level upgrade. That is a BIG plus, and any solution for 5 nines or for 4 nines including planned downtime would likely include this.

Disaster Recovery Options

Many of the options from above can also be used as Disaster Recovery options with a few tweaks. But if you’re trying to meet both High Availability and Disaster Recovery goals, you’re nearly always going to be looking at 3 or more database servers. Two servers are rarely enough to provide both High Availability and Disaster Recovery.

HADR

When used for Disaster Recovery, HADR should incorporate a distance between the database servers – certainly in different data centers and often in different areas of the country. HADR SYNCMODES that are appropriate for Disaster Recovery include ASYNC and SUPERASYNC.

Shared Disk Outside of DB2

Shared disk clusters with a geographic factor are much harder to implement and often include some sort of disk-level replication between disks in two data centers. This relies on how fast the disk replication is as a major component, and involves some complicated fail-over scenarios for virtual IP addresses.

PureScale

PureScale can be geographically distributed, but the experts say that it is very complicated and should not be done without engaging IBM Lab Services. Even so, the cluster cannot be distributed across more than 100 KM, with a very, very fast network connection.

DB2 Replication (SQL, Q, CDC)

DB2 replication is also a good choice for Disaster Recovery, assuming a very fast network connection.

Solutions Combining High Availability and Disaster Recovery

My favorite combination is probably a three or four server HADR cluster with NEARSYNC between the primary database server and the principal standby, and SUPERASYNC to a tertiary database server in a different data center. This is economical and easily meets 3 nines if properly architected and on robust hardware.

I have also seen shared-disk clusters outside of DB2 used for high availability, with HADR used in ASYNC between two data centers. That seemed to work well enough.

PureScale can now be used with HADR to more easily meet both HA and DR requirements, but remember there will still be downtime for DB2 version level upgrades.

Replication can be used in combination with any of the other options.

Are Database Backups Still Needed?

One of the frequent questions I get with High Availability or Disaster Recovery solutions is “Do I even need database backups?” My response is that you absolutely do. On at least one occasion, I have had a database backup save my client’s data even when HADR was used. One reason is human error, which is one of the most common causes of restores or failovers; most replication/HADR/shared disk solutions will immediately replicate a human error from one server to another. Another reason is all the failure scenarios that you didn’t plan for – it is hard to imagine everything that can go wrong, and a backup with transaction log files goes miles towards being ready for the unexpected. Backups can also be useful for data movement to a test, development, or QA environment.

Summary

There are a number of High Availability and Disaster Recovery solutions. Knowing your minimum requirements and needs is critical to architecting a cost-effective and robust solution.

Analyzing Package Cache Size


Note: updated 7/21 to reflect location of the package cache high water mark in the MON_GET* table functions

I have long been a fan of a smaller package cache size, particularly for transaction processing databases. I have seen STMM choose a very large size for the package cache, and this presents several problems:

  • Memory used for the package cache might be better used elsewhere
  • A large package cache makes statement analysis difficult
  • A large package cache may be masking statement issues – specifically, improper use of parameter markers

Parameter Markers

Parameter markers involve telling DB2 that the same query may be executed many times with slightly different values, and that DB2 should use the same access plan, no matter what the values supplied are. This means that DB2 only has to compile the access plan once, rather than doing the same work repeatedly. However, it also means that DB2 cannot make use of distribution statistics to compute the optimal access plan. That means that parameter markers work best for queries that are executed frequently, and for which the value distribution is likely to be even or at least not drastically skewed.

The use of parameter markers is not a choice that the DBA usually gets to make. It is often a decision made by developers or even vendors. Since it is not an across-the-board best practice to use parameter markers, there are frequently cases where the wrong decisions are made. There are certainly queries and data sets where parameter markers will make things worse.

At the database level, we can use the STMT_CONC database configuration parameter (set to LITERALS) to force the use of common access plans for EVERYTHING – an example of setting it follows the list below. This is not optimal for the following reasons:

  • There are often some places where the value will always be the same, and in those places SQL would benefit more from a static value.
  • The SQL in the package cache will essentially never show static values used, which can be difficult when troubleshooting.
  • With uneven distribution of data, performance of some SQL may suffer.
  • There have been APARs about incorrect data being returned.
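If you do decide to try the statement concentrator, enabling it is a single database configuration change (SAMPLE is a placeholder database name):

db2 "UPDATE DB CFG FOR SAMPLE USING STMT_CONC LITERALS"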

If you have interaction with developers on a deep and meaningful level, proper use of parameter markers is the best choice.

Parameter markers show up as question marks in SQL in the package cache. This statement uses parameter markers:

Select booking_num from SAMPLE.TRAILER_BOOKING where trailer_id = ?

Statement substitutions done by the statement concentrator use :LN, where N is a number representing the position in the statement. This statement shows values affected by the statement concentrator:

select count(*) from event where event_id in ( select event_id from sample.other_table where comm_id=:L0 ) and who_entered != :L1

Sizing the Package Cache

I’ve said that I don’t trust STMM to make the best choices for the package cache. As a result, I recommend setting a static value. How do I come up with the right value?

I often start by setting the PCKCACHESZ database configuration parameter to 8192 or 16384, and tune it upwards until I stop seeing frequent package cache overflows. A package cache overflow will write messages like this to the DB2 diagnostic log:

xxxx-xx-xx-xx.xx.xx.xxxxxx+xxx xxxxxxxxxxxxxx     LEVEL: Event
PID     : xxxxxxx              TID  : xxxxx       PROC : db2sysc
0
INSTANCE: db2             NODE : 000         DB   : SAMPLE
APPHDL  : 0-xxxxx              APPID:
xx.xxx.xxx.xx.xxxxx.xxxxxxxxxxx
AUTHID  : xxxxxxxx

EDUID   : xxxxx                EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, access plan manager, sqlra_cache_mem_please,
probe:100
MESSAGE : ADM4500W  A package cache overflow condition has
occurred. There is
          no error but this indicates that the package cache has
exceeded the
          configured maximum size. If this condition persists,
you should
          perform additional monitoring to determine if you need
to change the
          PCKCACHESZ DB configuration parameter. You could also
set it to
          AUTOMATIC.
REPORT  : APM : Package Cache : info
IMPACT  : Unlikely
DATA #1 : String, 274 bytes
Package Cache Overflow
memory needed             : 753
current used size (OSS)   : 15984666
maximum cache size (APM)  : 15892480
maximum logical size (OSS): 40164894
maximum used size (OSS)   : 48562176
owned size (OSS)          : 26017792
number of overflows       : xxxxx

I address these usually by increasing the package cache by 4096 until they are vastly less frequent. This could still be a considerable size if your application does not make appropriate use of parameter markers.
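The change itself is simple – a sketch, assuming a database named SAMPLE and a new size of 16384 4K pages:

db2 "UPDATE DB CFG FOR SAMPLE USING PCKCACHESZ 16384"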

To look at details of your package cache size, you can look at this section of a database snapshot:

Package cache lookups                      = 16001443673
Package cache inserts                      = 4180445
Package cache overflows                    = 0
Package cache high water mark (Bytes)      = 777720137

Originally, I was frustrated that the package cache high water mark didn’t seem to be in the MON_GET* functions – I’ll need it before the snapshot monitor is discontinued. It turns out it is there. To get the high water mark for the package cache, you can use this query on 9.7 and above (thanks to Paul Bird’s twitter comment for pointing me to this):

select memory_pool_used_hwm
from table (MON_GET_MEMORY_POOL(NULL, CURRENT_SERVER, -2)) as mgmp 
where memory_pool_type='PACKAGE_CACHE' 
with ur

MEMORY_POOL_USED_HWM
--------------------
                 832

You can use that value to see how close to the configured maximum size (PCKCACHESZ) the package cache has actually come. In this particular database, the package cache size is 190000 (4K pages). In bytes that would be 778,240,000. That means in this case that the package cache has nearly reached the maximum at some point. But you can tell from the value of package cache overflows that it has not attempted to overflow the configured size.

The numbers above also allow me to calculate the package cache hit ratio. These numbers are also available in MON_GET_WORKLOAD on 9.7 and above or MON_GET_DATABASE on 10.5. The package cache hit ratio is calculated as:

100*(1-(package cache inserts/package cache lookups))

With the numbers above, that is:

100*(1-(4180445/16001443673))

or 99.97%

You do generally want to make sure your package cache hit ratio is over 90%.
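If you prefer to pull the hit ratio straight from the monitoring table functions rather than a snapshot, something like the following should work on 9.7 and up (a sketch only; it assumes a nonzero number of package cache lookups):

select dec(100 * (1 - (sum(pkg_cache_inserts) / dec(sum(pkg_cache_lookups),31,4))),5,2) as pkg_cache_hit_ratio
from table(MON_GET_WORKLOAD(NULL,-2)) as t
with ur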

In addition to these metrics, you can also look at what percentage of time your database spends on compiling SQL. This can be computed over a specific period of time using MONREPORT.DBSUMMARY. Look for this section:

  Component times
  --------------------------------------------------------------------------------
  -- Detailed breakdown of processing time --

                                      %                 Total
                                      ----------------  --------------------------
  Total processing                    100               10968

  Section execution
    TOTAL_SECTION_PROC_TIME           80                8857
      TOTAL_SECTION_SORT_PROC_TIME    17                1903
  Compile
    TOTAL_COMPILE_PROC_TIME           2                 307
    TOTAL_IMPLICIT_COMPILE_PROC_TIME  0                 0
  Transaction end processing
    TOTAL_COMMIT_PROC_TIME            0                 76
    TOTAL_ROLLBACK_PROC_TIME          0                 0
  Utilities
    TOTAL_RUNSTATS_PROC_TIME          0                 0
    TOTAL_REORGS_PROC_TIME            0                 0
    TOTAL_LOAD_PROC_TIME              0                 0

You generally want to aim for a compile time percentage of 5% or less. Remember that MONREPORT.DBSUMMARY only reports data over the interval that you give it, with a default of 10 seconds, so you want to run this over time and at many different times before making a decision based upon it.
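Calling it for a longer interval is straightforward – for example, a five-minute sample (the database name SAMPLE is a placeholder):

db2 connect to SAMPLE
db2 "CALL MONREPORT.DBSUMMARY(300)"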

Summary

A properly sized package cache is important to database performance. The numbers and details presented here should help you find the appropriate size for your system.

Calculating PVUs for IBM DB2 on Linux


DBAs do not have to calculate PVUs (Processor Value Units) often. Many times there is a system administrator or someone else who will do this for us. Or if you’re buying everything from IBM, then they’re likely to calculate it. You may also easily be able to get the information you need from your hardware vendor and skip right to the last section on converting the hardware details into PVUs.

Note: Please verify any advice here with IBM before relying upon the PVU count you come up with. I could be wrong on some of the details, especially with the vast variety in environments. This is meant for ballpark/estimating only.

What OS?

I’ll describe how to find the information and do the calculations for Linux. There will be differences for UNIX and Windows, though some of the details here may help you if you’re using other operating systems.

The system the examples in this post are on is Red Hat Enterprise Linux Server 6.6

You do not need root for the commands I share in this post – I’ve run everything as the DB2 instance owner. If you don’t have DB2 installed, the system commands here will still function just fine as other users.

What Information is Needed to Calculate PVUs?

The information needed is:

  • Type of processor
  • Total number of processors
  • Number of processors per socket
  • Number of sockets on the server

What Kind of Processors Does This Server Have?

It is not just a matter of counting the number of processors you have. Different processors have different PVU values assigned. To find what kind of processors you have, you can use this:

$ more /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 47
model name      : Intel(R) Xeon(R) CPU E7- 4830  @ 2.13GHz
stepping        : 2
microcode       : 55
cpu MHz         : 1064.000
cache size      : 24576 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdt
scp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmp
erf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pci
d dca sse4_1 sse4_2 x2apic popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi f
lexpriority ept vpid
bogomips        : 4256.40
clflush size    : 64
cache_alignment : 64
address sizes   : 44 bits physical, 48 bits virtual
power management:

That is the output for just one CPU on a multi-core system. The “model name” line is the main part you’re looking for here.

How Many Sockets Does This Server Have?

You’ll also need to know how many sockets you have. It is also in /proc/cpuinfo – the physical_id tells you which socket each processor belongs to. An easy command to strip that number out is:

$ cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
2

Thanks to this excellent blog entry for details on that: http://www.ixbrian.com/blog/?p=64

How Many Processors Does This Server Have?

There are several ways you can calculate the number of processors. The output from the above will actually tell you if you just page through all of it. But my favorite is using db2pd:

$ db2pd -osinfo

Operating System Information:

OSName:   Linux
NodeName: dbserver.example.com
Version:  2
Release:  6
Machine:  x86_64
Distros:  Red Hat Enterprise Linux Server 6.6                                   

CPU Information:
TotalCPU    OnlineCPU   ConfigCPU   Speed(MHz)  HMTDegree  Cores/Socket
32          32          32          1064        2           8

Physical Memory and Swap (Megabytes):
TotalMem    FreeMem     AvailMem    TotalSwap   FreeSwap
129007      115904      n/a         64000       64000

Virtual Memory (Megabytes):
Total       Reserved    Available   Free
193007      n/a         n/a         179904
..

Now some of the numbers here can be slippery, and some can be changed in odd ways. The HMTDegree (two in the output above) tells the server to treat each physical CPU as more than one logical CPU. In many cases this can increase the efficiency of use of the CPUs, and I’ve seen it as high as 4, though I hear it is rare to make it more than two. This is called hyperthreading. What that number tells me to do is to take the number of CPUs reported as OnlineCPU and divide it by the HMTDegree, in this case by 2. This server appears to have 16 CPUs. It is also important to note the Cores/Socket, as that could make a difference when calculating the PVUs for a server.

You can verify this conclusion, if you like, using this command:

$ cat /proc/cpuinfo | egrep "core id|physical id" | tr -d "\n" | sed s/physical/\\nphysical/g | grep -v ^$ | sort | uniq | wc -l
16

Thanks again to http://www.ixbrian.com/blog/?p=64 for that syntax.

Converting Gathered Information into PVU Values

Once you know the type of Processor, the number of processors, the number of processors per socket, and the number of sockets, you’re ready to refer to IBM’s table for calculating PVUs. For Linux, you’ll want “PVU Table Per Core (section 2 of 2 – x86)”. This table will change over time. As of the writing of this article, it looks like this:
Screenshot_072215_072441_PM

Based on the output above, I’m in the top row of Xeon (of 2), and one of the bottom few entries in that row. I don’t know which of the following it is, but they all have the same rules:

E7-4800 to 4899V3
E7-4800 to 4899
E7-4800 to 4899V2
E7-4800 to 4899V3

Also, there seem to be two entries in there for E7-4800 to 4899V3. I’m not sure why.

I know that I have 8 cores per socket from the db2pd output, and that I have 2 sockets. Following the lines for that, I can see that these CPUs are 70 PVUs each.

16 CPUs at 70 PVUs each calculates to 1120 PVUs for this server. In this case, there’s an additional 100 PVUs for each of two HADR standbys, so this client needs a total of 1320 PVUs.

PVU Calculator

IBM also offers the PVU calculator. Once you’ve gathered the information above, you can try it. I find it a bit on the confusing side. For the example above, I couldn’t just select “Intel Xeon”, but had to select “Intel Xeon 1 or 2 socket”. How I’m supposed to know that, I have no idea. Based on the examples in this post, here are the values I filled in and the results I got:
Screenshot_072215_080015_PM

Exactly the same value as I came up with consulting the table.

One note, if you’re considering buying DB2 licensing – having an expert helping you interface with IBM can save you a lot of money. If you need DB2 licenses (or other IBM licensing), I know someone good and ethical who can help. Contact me and I can put you in touch.

Error When Running PowerShell Script with IBM.Data.DB2 Driver


Having just solved this problem for the second time in two weeks, I thought I’d blog about it to make it easier to find the next time.

Problem

This generally occurs when you are executing a PowerShell script using the IBM.Data.DB2 driver on a server for the first time. It can also occur after patching of DB2 or the OS. Also, if you fail over from the HADR primary to the HADR standby, you may not have been executing PowerShell scripts on the standby regularly, so that is also a common time to see it. The error looks like this:

Exception calling "GetFactory" with "1" argument(s): "Unable to find the requested .Net Framework Data Provider.  It ma
y not be installed."
At db2_perf_maint.ps1:84 char:64
+ $factory = [System.Data.Common.DbProviderFactories]::GetFactory <<<< ("IBM.Data.DB2")
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : DotNetMethodException

This error leads to an error like this for every executed SQL statement in your script:

You cannot call a method on a null-valued expression.

Obviously there are other causes for that last one, but if you have a previously working script and you move it to a new server, this is one possible cause.

Solution

The solution is easy, and is described in this technote from IBM.

The gist of it is that all you have to do is open an IBM DB2 Command Window (not a PowerShell prompt) and issue one of the following:

db2nmpsetup -l c:\db2nmpsetup.log

or

db2lswtch -client -promote

Multiple Instances in DB2 for Linux/UNIX and in DB2 for Windows


I have been working with multiple instances on Windows servers lately, and have learned a few things about them, so I thought an article about multiple instances in general would be a good thing.

The Basics

First, if you don’t understand what an instance is, check out my earlier blog entry: DB2 Basics: What is an Instance?

When Multiple Instances Work and When They Don’t

In general, for production systems, the one-server -> one-instance -> one-database approach is best. This allows us to fully isolate the workload and is easiest for troubleshooting the intense workload-related performance problems that may be encountered on production systems. There are exceptions to that, of course. I’ve seen an application that required 8 (EIGHT!) separate, tiny, and not very busy databases. I would certainly not recommend putting each of those on a different server; in that case I even put all eight databases in a single instance. But if you’re talking about databases of greater than 10 GB that are reasonably heavily used, the more isolated, the better.

On non-production database servers, multiple instances are much more common. If you have multiple development/test/QA/staging environments on the same non-production server, you need them to be on separate instances. Why? Well, you need to be able to move a change to an instance-level parameter through the environments one-by-one to test it. Perhaps more importantly, you need to be able to independently upgrade the instances so you can move an upgrade through the environments.

Sometimes I see HADR standbys for multiple different production database servers on the same standby server. I’ve also seen Dev and HADR standbys on the same server. Either of these setups requires not just separate instances, but the ability for the separate instances to run on different versions or fixpacks of the DB2 code – you should be upgrading them independently and likely at different times.

Separate DB2 Base Installation Paths/Copies

We talk not just about multiple instances but multiple base installation paths (Linux/UNIX) or multiple copies (Windows). You can have two instances on the same server running on the same path/copy if they will be upgraded together. But if there is some need to upgrade or patch them separately, they need to run on separate paths/copies.

Multiple Instances on Linux/UNIX

The Instance Owner

On Linux and UNIX systems, there is a DB2 Instance Owner. This ID has exactly the same name as the DB2 instance, and the DB2 Instance’s SQLLIB directory is always in the DB2 Instance Owner’s home directory. One of the pre-instance-creation verification steps is to make sure that this user exists and has an appropriate home directory, so your Instance Home does not end up someplace inappropriate like the root or /home filesystems.

The Base Installation Path

When I started working with DB2, it wasn’t possible to change the installation path. Then they realized that some servers require the ability to have different instances on different fixpacks, so they introduced “alternate” fixpacks, which were a bit of a pain to work with. Finally, they gave us the option to choose our own installation path, which makes it so that we can have as many different versions and fixpacks as we like on the same server. This also means that whenever you install a fixpack you have to specify the base installation path (using the -b option or as prompted). You can also change an instance from one base installation path to another using a db2iupdt command (offline).

Administering the Instance

DB2 instances are administered separately, as you can really only work on one instance at a time. A DB2 instance is administered either as the DB2 instance owner or as an ID in the SYSADM group for an instance. In order to take actions for the instance or the databases under it, you will execute the db2profile from the SQLLIB directory of the Instance Home directory. You cannot use the same ID to take actions on two different instances without switching from one to another.

Multiple Instances on Windows

Multiple instance support feels to me like it has come slower to Windows than it did on UNIX/Linux, but I don’t have the facts to support that, as I have only worked extensively with DB2 on Windows in recent years.

The Instance Owner

There isn’t a userid that is quite as tightly tied to an instance as there is on Linux/UNIX. You still have a name for each instance, and can add additional instances on a server.

The DB2 Copy

The equivalent concept to the base installation path on UNIX/Linux is the DB2 Copy on Windows. You’ll have a name for the DB2 Copy in addition to the instance name. By default, this name is DB2COPY1.

Administering the Instance

Many times on DB2 servers, the local administrators will all have SYSADM rights on the database. If not, every ID in the DB2ADMINS group will have SYSADM. You may use the same ID to administer two DB2 instances at once, but any given command window, PowerShell prompt, or DataStudio window only accesses one instance at a time. In a PowerShell window, you can set the environment variable DB2INSTANCE to switch between instances on the same DB2 Copy, and can set the PATH variable along with DB2INSTANCE to switch between instances on different DB2 copies.
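A quick PowerShell sketch of switching instances within the same DB2 copy (DB2INST2 is a placeholder instance name):

# point this session at another instance in the same DB2 copy
$env:DB2INSTANCE = "DB2INST2"
# confirm which instance the session is now using
db2 get instance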

db2haicu TSAMP Advanced Topics: Objects Created


I have written quite a few articles on TSA, but thus far, they’ve been strongly focused on the basic how-to. I think this is because so many of us start with the how of a technical topic. As my knowledge has advanced, I have developed more advanced presentations on TSAMP, but I need to also bring that content into the blog.

Objects that db2haicu creates

When you run db2haicu, it is creating a whole host of objects and relationships at the TSAMP level. It can take a while to understand what all of these mean. I found this awesome article that defines a lot of the objects. It also goes into greater detail on a number of things that may be useful if you’re trying to increase your TSAMP knowledge.

The following diagram represents the objects that db2haicu creates:
TSAMP1

In this image, the grey rectangles labeled Node1 and Node2 represent the two servers – the primary and the standby. The largest green rounded rectangle is the TSAMP domain that is created. db2haicu asks us for the name, and we make up the name for it. The other objects and elements are created within this domain.

There is then a DB2 instance resource group created for each node and an instance resource within each resource group. These are active and considered up on both the primary and standby servers at once, just like the DB2 instance exists and is active on both servers at once – you can log in and start, stop, and configure it at any time on either server.

An additional resource group is created for the database. Within that database resource group a database resource and a VIP resource (assuming you’re using one) are created. They can only be active on one node at a time and are always offline on the other node.

Differences with Multiple Databases

With the previous diagram in mind, let’s look at what happens if we add in another database on the same DB2 instance:
TSAMP2

A whole additional database resource group, and therefore another VIP, is added. Yes, that’s right – if you have more than one database, you will have more than one VIP. This means that if the two databases both happen to have their primaries on the same server, either VIP will work, but after a failover of only one database, each database will only be accessible at its own VIP. This is the method supported and recommended by IBM. I would like to see instructions on how to instead create a relationship that would force the databases to fail over together, allowing me to use only one VIP, but I can certainly see cases where this configuration is the way to go, and clients that would absolutely need it this way.


Installing a Local Copy of the IBM DB2 10.5 Documentation


I have written about this before for version 9.7. The IBM DB2 Knowledge Center can be a bit unreliable – sometimes the search doesn’t return the obvious results. Sometimes it’s down entirely at the worst time. And for consultants, sometimes we work on networks that will not allow us to get out to the IBM Knowledge Center.

To mitigate this, you can install a local copy. What is downloaded is an Information Center, but the search is so much more reliable than the IBM DB2 Knowledge Center’s that I find myself using a local copy even when the IBM DB2 Knowledge Center is working.

Here’s what the documentation looks like:
[Screenshot: the locally installed DB2 Information Center]

The IBM DB2 Knowledge Center includes information on installing a local copy of the documentation.

You can download the software here: https://www-01.ibm.com/marketing/iwm/iwm/web/reg/download.do?source=swg-dm-db2info&S_PKG=dm&lang=en_US&cp=UTF-8

You’ll have to log in with your IBM ID. If you don’t have an IBM ID, they’re free to create and critical to have. Once you answer all the questions it asks you, you’ll want to click on the right download based on your operating system. I show it for Windows here, because that’s the workstation I currently have it on:
[Screenshot: selecting the Windows download]

After it is downloaded, extract the files from the zip, navigate to the extraction location, and simply click on the ic-wrkstn-start.bat executable. You can even create a link to it on your desktop or in another convenient location.

[Screenshot: ic-wrkstn-start.bat in the extracted files]

The one thing to be aware of with a local copy is that you do not automatically get updates to the documentation as you do if you’re working from the online IBM Knowledge Center. You have to remember to manually update your local copy. You can do that by clicking on this icon:
[Screenshot: update icon in the local Information Center]

Using DB2’s New Native Encryption Feature


With fixpack 5 of DB2 10.5, IBM introduced Native Encryption for data at rest in DB2. This is a fairly significant new feature for introduction in a fixpack. It does require separate licensing – either the Advanced Edition of ESE or WSE or the separate purchase of the Native Encryption feature.

DB2 Native Encryption is transparent data encryption for data at rest. It does not encrypt data that is in flight or in memory. There are no application changes necessary, and it includes functionality for managing encryption keys. You don’t change data encryption keys, but instead can change the key used to access the data encryption keys – the key encryption key.

Planning

DB2 Native Encryption is NOT performance neutral. It is likely to impact performance, and that performance impact is expected to be “less than 100%”. There may be some areas where the impact is more noticeable than others. It largely impacts CPU time. If you implement Native Encryption on a system that already runs at 80% CPU utilization, bad things will likely happen. It is very strongly recommended that you do thorough performance testing before implementing it in production. The system I’m enabling it on is currently extremely over-sized, averaging LESS than 5% CPU utilization. Because of this, I’m not terribly worried about the impact, but I sure would be with a more reasonably sized system.

The client I’m working with now chose to purchase the Native Encryption feature to use with a standard WSE implementation. The program number to get from IBM is:

5725T25             IBM DB2 Encryption Offering

The code for Native Encryption is included in db2 10.5 fixpack 5, so there is nothing separate to install. To get the license file you’ll need, you’ll need to download the following part from passport advantage:

CN30DML    IBM® DB2® Encryption Offering - Quick Start and Activation
10.5.0.5 for Linux®, UNIX and Windows®

If your DB server is not already on 10.5 fixpack 5, you’ll need to upgrade to it before implementing Native Encryption.

Implementation

The steps for implementing Native Encryption are pretty well laid out in the IBM DB2 Knowledge Center page on Native Encryption. EXCEPT if you copy and paste the command for creating the keystore. I did and got this error:

CTGSK3020W Invalid object: –strong

The problem is documented in the comments on this page. No idea why IBM hasn’t fixed the documentation yet. The dash character before two of the options on this command is the wrong character in the info center (an en-dash rather than a plain hyphen), and it’s barely visible as such. In my steps below, I use the correct kind of dash, so you should be able to copy and paste the below.

Here are the steps for encrypting an existing database – you must do a backup and restore to do it at this time. All actions here are done as the DB2 instance owner.

  1. Apply the license file – unzip/untar the downloaded activation file and navigate to db2ef/db2/license, and issue:
    db2licm -a db2ef.lic
  2. Ensure your PATH and library variables are set properly. To do this, I added the following lines to my DB2 instance owner’s .bash_profile (you’d use .profile on AIX):
    PATH=$PATH:$HOME/sqllib/gskit/bin
    export PATH
    LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/sqllib/lib64/gskit
    export LD_LIBRARY_PATH
    LIBPATH=$LIBPATH:$HOME/sqllib/lib64/gskit
    export LIBPATH
    SHLIB_PATH=$SHLIB_PATH:$HOME/sqllib/lib64/gskit
    export SHLIB_PATH
    
  3. Next, issue the command to create your keystore. This is the one with the incorrect dashes in the IBM DB2 Knowledge Center:
    gsk8capicmd_64 -keydb -create -db /db2home/db2inst1/pdesignkeystore.p12 -pw MfsWq9UntZGGhe96 -strong -type pkcs12 -stash;

    There is absolutely no output returned by this command. You’ll likely want to change the location, and the password you feed into this.

  4. Update the dbm cfg with the keystore location:
    $ db2 update dbm cfg using keystore_type pkcs12 keystore_location /db2home/db2inst1/pdesignkeystore.p12
    DB20000I  The UPDATE DATABASE MANAGER CONFIGURATION command completed
    successfully.
    
  5. Backup your database
     db2 backup db sample to /db2backups compress without prompting
  6. Drop your database (man, this is hard to do – I still cringe whenever using a drop command)
    $ db2 drop db sample
    DB20000I  The DROP DATABASE command completed successfully.
    
  7. Restore your database, with the encrypt option
     db2 "restore db sample from /db2backups taken at 20150827182456 encrypt without rolling forward without prompting"

Your database is now encrypted, congratulations!
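
To double-check that the restored database really is encrypted, something like the following should do it. This is a sketch: it assumes the SAMPLE database from the steps above and the ADMIN_GET_ENCRYPTION_INFO table function that was introduced with native encryption.

db2 connect to sample
db2 "SELECT * FROM TABLE(SYSPROC.ADMIN_GET_ENCRYPTION_INFO())"
db2 get db cfg for sample | grep -i encr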

In my case, I’m dealing with a small database, and the restore/encryption time of less than 10 minutes was no different from a recent restore of the same database.

Remember DB2 will now encrypt every backup you take with the same encryption options you’ve set in the dbm cfg. This means that part of what you now need to backup is that keystore that you created. I think you’ll also want to store the keystore password somewhere, as you may need it.

I have so far found that these backups take longer than non-encrypted backups. The backup I took of a database before enabling Native Encryption took 4 minutes. The one afterwards took 11 minutes. You may want to test backup duration as a part of your performance testing process.

Next month, I’ll be implementing Native Encryption for an HADR database, and will blog about it, and the extra wrinkles that adds.

Issues with STMM


I thought I’d share some issues with STMM that I’ve seen on Linux lately. I’ve mostly been a fan of STMM, and I still am for small environments that are largely transaction processing and have only one instance on a server.

Here are the details of this environment. The database is a small analytics environment. It used to be a BCU environment that was 4 data nodes and one coordinator node on 9.5. The database was less than a TB, uncompressed. There were also some single-partition databases for various purposes on the coordinator node. I’ve recently migrated it to BLU – 10.5 on Linux. The users are just starting to make heavier use of the environment, though I largely built and moved some data about 6 months ago. The client does essentially a full re-load of all data once a month.

The new environment is two DB2 instances: one for the largely BLU database, and one for a transaction processing database that replaces most of the smaller databases from the coordinator node. Each instance has only one database. The server has 8 CPUs and about 64 GB of memory, the minimums for a BLU environment.

First Crash

The first crash we saw was both instances going down within 2 seconds of each other. The last message before the crash looked like this:

2015-08-06-17.58.02.253956+000 E548084503E579        LEVEL: Severe
PID     : 20773                TID : 140664939472640 PROC : db2wdog
INSTANCE: db2inst1             NODE : 000
HOSTNAME: dbserver1
EDUID   : 2                    EDUNAME: db2wdog [db2inst1]
FUNCTION: DB2 UDB, base sys utilities, sqleWatchDog, probe:20
MESSAGE : ADM0503C  An unexpected internal processing error has occurred. All
          DB2 processes associated with this instance have been shutdown.
          Diagnostic information has been recorded. Contact IBM Support for
          further assistance.

2015-08-06-17.58.02.574134+000 E548085083E455        LEVEL: Error
PID     : 20773                TID : 140664939472640 PROC : db2wdog
INSTANCE: db2inst1             NODE : 000
HOSTNAME: dbserver1
EDUID   : 2                    EDUNAME: db2wdog [db2inst1]
FUNCTION: DB2 UDB, base sys utilities, sqleWatchDog, probe:8959
DATA #1 : Process ID, 4 bytes
20775
DATA #2 : Hexdump, 8 bytes
0x00007FEF1BBFD1E8 : 0201 0000 0900 0000                        ........

2015-08-06-17.58.02.575748+000 I548085539E420        LEVEL: Info
PID     : 20773                TID : 140664939472640 PROC : db2wdog
INSTANCE: db2inst1             NODE : 000
HOSTNAME: dbserver1
EDUID   : 2                    EDUNAME: db2wdog [db2inst1]
FUNCTION: DB2 UDB, base sys utilities, sqleCleanupResources, probe:5475
DATA #1 : String, 24 bytes
Process Termination Code
DATA #2 : Hex integer, 4 bytes
0x00000102

2015-08-06-17.58.02.580890+000 I548085960E848        LEVEL: Event
PID     : 20773                TID : 140664939472640 PROC : db2wdog
INSTANCE: db2inst1             NODE : 000
HOSTNAME: dbserver1
EDUID   : 2                    EDUNAME: db2wdog [db2inst1]
FUNCTION: DB2 UDB, oper system services, sqlossig, probe:10
MESSAGE : Sending SIGKILL to the following process id
DATA #1 : signed integer, 4 bytes
...

The most frequent cause of this kind of error in my experience tends to be memory pressure at the OS level – the OS saw that too much memory was being used, and instead of crashing itself, it chooses the biggest consumer of memory to kill. On a DB2 database server, this is almost always db2sysc or another DB2 process. I still chose to open a ticket with support, to get confirmation on this and see if there was a known issue.

IBM support pointed me to this technote, confirming my suspicions: http://www-01.ibm.com/support/docview.wss?uid=swg21449871. They also recommended “have a Linux system administrator review the system memory usage and verify that there is available memory, including disk swap space. Most Linux kernels now allow for the tuning of the OOM-killer. It is recommended that a Linux system administrator perform a review and determine the appropriate settings.” I was a bit frustrated with this response as this box runs on a PureApp environment and runs only DB2. The solution is to tune the OOM-killer at the OS level?

While working on the issue I discovered that I had neglected to set INSTANCE_MEMORY/DATABASE_MEMORY to fixed values, as is best practice on a system with more than one DB2 instance when you’re trying to use STMM. So I set them for both instances and databases, allowing the BLU instance to have most of the memory. I went with the idea that this crash was basically my fault for not better limiting the two DB2 instances on the box. Though I wish STMM played more nicely with multiple instances.
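
For reference, the commands involved look something like this; the database names and the page counts (in 4 KB pages) are made-up examples, and the right split depends entirely on your server:

# run under the BLU instance, which gets the lion's share of the 64 GB
db2 update dbm cfg using INSTANCE_MEMORY 12000000
db2 update db cfg for bludb using DATABASE_MEMORY 11000000

# run under the transaction-processing instance, which gets a much smaller slice
db2 update dbm cfg using INSTANCE_MEMORY 3000000
db2 update db cfg for txndb using DATABASE_MEMORY 2500000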

Second Crash

Several weeks later, I had another crash, though this time only of the BLU instance, not of the other instance. It was clearly the same issue. I re-opened the PMR with support, and asked for help identifying what tuning I needed to do to keep these two instances from stepping on each other. IBM support again confirmed that it was a case of the OS killing DB2 due to memory pressure. This time, they recommended setting the Linux kernel parameter vm.swappiness to 0. While I worked on getting approvals for that, I tweeted about it. The DB2 Knowledge Center does recommend it be set to 0. I had it set to the default of 60.

Resolution

Scott Hayes reached out to me on twitter because he had recently seen a similar issue. After a discussion with him about the details, I decided to implement a less drastic setting for vm.swappiness, and to instead abandon the use of STMM. I always set the package cache manually anyway. I had set catalog cache manually. Due to problems with loads, I had already set the utility heap manually. In BLU databases, STMM cannot tune sort memory areas. All of this meant that the only areas STMM was even able to tune in my BLU database were DBHEAP, LOCKLIST, and the buffer pools. I looked at what the current settings were and set these remaining areas to just below what STMM had them at. I have already encountered one minor problem: apparently STMM had been increasing the DBHEAP each night during LOADs, so when they ran LOADs the first night, they failed due to insufficient DBHEAP. That was easy to fix, as the errors in the diagnostic log specified exactly how much DBHEAP was needed, so I manually increased the DBHEAP. I will have to keep a closer eye on performance tuning, but my monitoring already does things like send me an email when buffer pool hit ratios or other KPIs are off, so that’s not much of a stretch for me.
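
The kind of commands involved in taking those last areas away from STMM look roughly like this; the database name and sizes are placeholders, set just under whatever STMM had most recently chosen:

db2 update db cfg for bludb using DBHEAP 60000
db2 update db cfg for bludb using LOCKLIST 163840
db2 "ALTER BUFFERPOOL IBMDEFAULTBP SIZE 1500000"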

DB2 Administrative SQL Cookbook: Listing the Number of Pages in Tablespaces, by Bufferpool


Purpose

This is a bit of a rare use case. The main use I see for it is if a database server has an excess of memory and you want to size your bufferpools so that the entire database fits in-memory. That’s not a common situation. In order to properly size the bufferpools for this edge case, I need to know how many pages my tablespaces have by bufferpool, not by tablespace.

Version

This SQL only works on DB2 9.7 and up. It relies on the MON_GET interfaces, which were introduced in 9.7.

Statement

WITH sum_ts(bufferpoolid, tot_pages) AS (
        SELECT TBSP_CUR_POOL_ID
                , sum(TBSP_TOTAL_PAGES)
        FROM table(mon_get_tablespace('',-2)) AS mgt
        GROUP BY TBSP_CUR_POOL_ID
)
SELECT  substr(sb.bpname,1,18) AS bpname
        , pagesize
        , sum_ts.tot_pages
FROM syscat.bufferpools sb
        INNER JOIN sum_ts
            ON sb.bufferpoolid = sum_ts.bufferpoolid
WITH UR;

Note that if you don’t care to report the page size of each bufferpool, this SQL can be done more easily as:

SELECT  substr(sb.bpname,1,18) AS bpname
        , sum(tbsp_total_pages) AS tot_pages
FROM syscat.bufferpools sb JOIN table(mon_get_tablespace('',-2)) mgt ON sb.bufferpoolid=mgt.TBSP_CUR_POOL_ID
GROUP BY bpname
WITH UR;

Sample Output

BPNAME             PAGESIZE    TOT_PAGES
------------------ ----------- --------------------
IBMDEFAULTBP              4096              5644983
TEMPSYSBP                16384                    2
USERBP16                 16384                24624
BP32K                    32768                 1028

  4 record(s) selected.
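
If the goal really is to fit the whole database in memory, the TOT_PAGES numbers translate fairly directly into bufferpool sizes. A sketch based on the sample output above, with a little headroom added (run it against your own numbers, not these):

db2 "ALTER BUFFERPOOL IBMDEFAULTBP SIZE 5700000"
db2 "ALTER BUFFERPOOL USERBP16 SIZE 25000"
db2 "ALTER BUFFERPOOL BP32K SIZE 1100"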

Speeding up DB2 Backups


It’s a question I hear frequently – How can I speed up backups? I thought I’d share some details on doing so.

Database Layout

A backup can be no faster than the time it takes to back up the largest tablespace. Parallelism is essentially done at the tablespace level, meaning one thread per tablespace. That means that if the majority of the data in a database is in a single tablespace, your backups will be slower than if the data were spread evenly across a number of smaller tablespaces.

Obviously there are a lot of factors that go into tablespace design. But one of the big ones I now take into consideration is backup speed.

WebSphere Commerce is a vended database that by default places all data into one or two tablespaces. This makes things “easy”, but it also means that many unaltered WCS databases could have backups that run much faster than they do. Some vendors will not let you change the tablespace layout, but WCS does, so if you’re looking at a significantly sized WCS implementation, you will likely want to customize your tablespace layout, for this and other reasons. For smaller WCS implementations, it won’t matter as much.

Other vendors have different rules on whether you can customize your tablespace layout at all. If it’s a custom database, you should certainly not just plop everything in USERSPACE1, especially for databases of 50 GB or greater.

If you have a database that is currently focused on a single tablespace, you can move tables online from one tablespace to another using ADMIN_MOVE_TABLE. Due to RI restrictions, you’ll want to be on 10.1, fixpack 2 or higher to make this reasonable. Particularly high-activity tables may be more problematic to get the final lock on for the move.
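
As a sketch, a single online table move looks roughly like this; the schema, table, and tablespace names are invented here, and you’ll want to review the ADMIN_MOVE_TABLE options for your fixpack before leaning on it:

db2 "CALL SYSPROC.ADMIN_MOVE_TABLE('APPSCHEMA','BIG_TABLE','DATA_TS_02','INDEX_TS_02','LOB_TS_02','','','','','MOVE')"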

UTIL_HEAP_SZ

When a backup starts, DB2 allocates half of the remaining utility heap for the backup operation (this is where the buffers are allocated). This means that you not only must have at least twice the amount of memory that a backup needs allocated to the utility heap, you must also be aware of the usage of the utility heap by other utilities – LOAD, REDISTRIBUTE, etc. See my entry on the utility heap for more information.

But how do you determine the memory backup needs? See the section on tuning the number and size of backup buffers below for details both on seeing what values DB2 chooses and some thoughts on calculating these for yourself.

Tuning Backup Parameters

The traditional wisdom says to let DB2 choose the values for parallelism and the number and size of backup buffers. While that’s good advice 95% of the time, it’s still good to have some ideas and tools to identify and deal with the edge cases where you may want to tune them manually.

Parallelism

Seeing What DB2 Chooses

You can parse the DB2 diagnostic log to see what DB2 has chosen for past backups. The value of this data depends on how much data you have in your diag log and how many backups have been taken with your current memory/disk/cpu configuration. A command like this will show you what DB2 has chosen in the past:
10.1 and earlier:

grep "Autonomic BAR - using parallelism .*"  ~/sqllib/db2dump/db2diag.log

10.5:

grep "Autonomic backup/restore - using parallelism ="  ./db2diag.log

The above assumes a default location for the DB2 diagnostic log on a Linux/UNIX system. Here’s an example of the output you might see:

$ grep "Autonomic backup/restore - using parallelism ="  ./db2diag.log
Autonomic backup/restore - using parallelism = 2.
Autonomic backup/restore - using parallelism = 2.
Autonomic backup/restore - using parallelism = 5.
Autonomic backup/restore - using parallelism = 5.
Autonomic backup/restore - using parallelism = 10.
Autonomic backup/restore - using parallelism = 10.
Autonomic backup/restore - using parallelism = 10.
Autonomic backup/restore - using parallelism = 10.

In this case, DB2 is choosing different levels of parallelism at different times, but it settles down to 5 or 10 most of the time, which is reasonable given the size and layout of this database.

Thoughts on Manual Tuning

If you’re going to try manual tuning here, the two main things to consider are the number of tablespaces of a reasonable size (since each tablespace can only be addressed by a single thread), and the number of CPUs on the server. As an example, I was recently dealing with a database that had about 50 tablespaces, with about 40 or so of them having significant amounts of data. The server the backup was being taken on had 64 CPUs, and this database was the only real load on the server. For purposes of this backup, I didn’t really care about leaving much overhead for other things (backup was offline). For that environment, I would choose a parallelism of 40. If it were an online backup, I would likely have chosen a lower number based on the other load I saw on the server.

Number and Size of Buffers

Again, DB2 often makes the best choice for you, and it’s rare you’ll have to try to do anything manually. You’re more likely to slow the backup down than speed it up by giving it manual values instead of letting it choose the optimum.

Seeing What DB2 Chooses

Again, you can parse the DB2 diagnostic log to see what DB2 is choosing. This grep command works well:

grep "Using buffer size = .*, number = .*"  ./db2diag.log

It produces output like this:

$ grep "Using buffer size = .*, number = .*"  ./db2diag.log
Using buffer size = 4097, number = 2.
Using buffer size = 4097, number = 2.
Using buffer size = 3297, number = 10.
Using buffer size = 3297, number = 10.
Using buffer size = 4097, number = 10.
Using buffer size = 4097, number = 10.
Using buffer size = 4097, number = 10.
Using buffer size = 4097, number = 10.

Thoughts on Manual Tuning

If you are manually selecting values, the buffer size should be a multiple of the extent size of the largest tablespace. I like 4096 – nice round number if I don’t have anything else to start with. You generally want two buffers per thread (from the parallelism, above), and maybe an extra two for good measure. So using the system details from above – that database had a really small extent size of 2 for nearly every tablespace. I chose 82 buffers of size 4096.
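
Putting those numbers together, a manually tuned backup command for that environment would look something like this; the database name and target path are examples:

db2 backup db sample to /db2backups with 82 buffers buffer 4096 parallelism 40 compress without prompting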

Backup Location

One way to speed up a backup is to throw hardware at it, of course. If backup speed is important, SSD or an SSD cache may be useful. Though if you’re going for overall performance impact, limited SSD resources may be better spent on active transaction log space. Still, when you can get it, pure SSD sure is fun. I have a 220 GB Windows database that is on pure SSD, with separate SSD drive arrays for data and for backup. It has ample memory and CPU too, and I can back that sucker up in 20 minutes.

For recoverability reasons, your backups should be on separate disk from your data and transaction logs, and it’s even better if they’re externalized immediately using a tool like TSM or some other storage library with a high-speed cache. You have to be careful to ensure that your network to such a location is super fast so it doesn’t become a bottleneck. I’ve seen ridiculously slow backups caused by the fact that the TSM server was in a data center two (western) states away. If you’re backing up to a vendor device like that, talk to the admin for it to find out how many sessions you can create against the device at a time – you can specify that in the db2 backup command. The more, the faster, but some implementations may not be able to support many.
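
For a TSM target, the number of sessions goes right in the backup command. A sketch, with the session count being whatever your TSM administrator says the device can handle:

db2 backup db sample online use tsm open 4 sessions with 8 buffers buffer 4096 parallelism 8 without prompting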

Many larger implementations have the luxury of this sort of thing, but my small and mid-range clients simply back up to disk and then have a process that comes along and externalizes anything on that disk to slower off-site storage.

Backup Memory Usage

For larger databases, and especially those with memory pressure, you can test the use of the DB2 registry parameter DB2_BACKUP_USE_DIO. I’ve heard of some good experiences with it, but the little testing I’ve done with it on a smaller database hasn’t shown much difference. Test it thoroughly before relying on it.

For the write portion of the backup, it disables OS-level filesystem caching. On one hand this makes sense: DB2 is never going to read the data written to the backup image, so why cache it? On the other hand, writes to cache may be much faster than writes directly to disk. If your SAN has a cache, your disk write speed might support use of this.
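
Testing it is as simple as flipping the registry variable on and off between test backups (it may need an instance recycle to take effect, so plan the test accordingly):

db2set DB2_BACKUP_USE_DIO=ON
db2set DB2_BACKUP_USE_DIO=        # clears the variable again if the test shows no benefit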

Another thought on memory usage and backups: a backup will read every page in your database into the bufferpool before writing it out to the backup image. Thus, if you have larger, primed bufferpools, DB2 has less to read in from disk. If you have an over-abundance of memory, give DB2 larger bufferpools so there is less to read in during the backup.

OS-level Parameters

The one OS-level parameter I’ve heard of making a difference is maxreqs on AIX. Check out this article on maxreqs and backups on AIX. Essentially, you want to make sure that maxreqs is at least 256 times the number of threads (as determined by the value of parallelism for the backup).

Summary

Well, there’s my brain-dump on backup tuning. It mostly boils down to:

  • Split data into multiple tablespaces
  • Make sure util_heap_sz is large enough
  • Use fast backup locations
  • Trust the values DB2 picks for backup parameters

DB2 Backups When Using Native Encryption


I’ve recently implemented native encryption for a small database on a server that is somewhat oversized on CPU and memory. One of the things I noticed after encrypting my database was both increased backup duration and increased backup size.

Backup Size

On this particular system, I take compressed DB2 backups to disk, which are later externalized. Immediately after enabling Native Encryption, I noticed that the backup size was much larger; it didn’t look like my database backup was getting compressed at all. After some back and forth with some very helpful IBMers, I learned that I had missed a critical few lines on this page in the DB2 Knowledge Center. This was surprising, as I had spent hours with this page while getting ready to implement Native Encryption. Here is what I missed:

To both compress and encrypt a database backup image, specify the db2compr_encr library (or the libdb2compr_encr library on non-Windows platforms) in place of db2encr (or libdb2encr).

When I then tried to specify libdb2compr_encr as the encryption library in my backup command, I got this:

$ db2 backup db sample online to /db2backups encrypt encrlib 'libdb2compr_encr.so' without prompting
SQL2459N  The BACKUP DATABASE command failed to process an encrypted database
backup or compressed database backup because of configuration or command
errors. Reason code "1".

Looking at the details of that error code, I see:

$ db2 ? SQL2459N


SQL2459N  The BACKUP DATABASE command failed to process an encrypted
      database backup or compressed database backup because of
      configuration or command errors. Reason code "".

Explanation:

Encrypted or compressed database backups require specific configuration
settings. Some of these configuration settings can be specified in the
BACKUP DATABASE command options. This message is returned with a reason
code when configuration settings and BACKUP DATABASE command options are
invalid. Reason code:

1

         The BACKUP DATABASE command specified a compression or
         encryption library or associated options. The database
         configuration parameters ENCRLIB or ENCROPTS were also set.


2

         The BACKUP DATABASE command specified both compression and
         encryption libraries.

User response:

To resolve the issues outlined in the explanation:

1

         The options to specify a compressed or encrypted backup must be
         specified by either the command options or database
         configuration parameters, not both. Run the BACKUP DATABASE
         command without specifying a compression or encryption library
         or associated options. Or, you can clear the database
         configuration parameters ENCRLIB and ENCROPTS and run the
         BACKUP DATABASE command with the original command options as
         specified again.


2

         You can specify that a backup is compressed or encrypted, not
         both. Run the BACKUP DATABASE command specifying only a
         compression or encryption library, not both.


   Related information:
   BACKUP DATABASE command

Odd – you cannot explicitly specify a value for the encryption library if you also have the database parameter ENCRLIB set. Which I do.

So I went to set the ENCRLIB parameter because I always take compressed backups of this database:

$ db2 update db cfg for sample using ENCRLIB /db2home/dbinst1/sqllib/lib/libdb2compr_encr.so
DB20000I  The UPDATE DATABASE CONFIGURATION command completed successfully.

Note that the file suffix varies by platform, and the filename is different on windows. You have to specify the full path, not just the file name.
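
A quick way to confirm what the database will now do by default is to check both related parameters in the db cfg:

db2 get db cfg for sample | grep -i -E "encrlib|encropts"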

And whew, finally a compressed encrypted database backup actually worked.

Now, I find it a bit frustrating that if I take a compressed backup but do not specify the right encryption library, the backup is simply taken uncompressed, with no warning or error message generated.

This led me to another question: What if I take a backup without specifying the COMPRESS keyword, but with ENCRLIB set to libdb2compr_encr. Will I get a compressed backup or an uncompressed backup?

It turns out that the backup I get is compressed. So essentially, with an encrypted database, the COMPRESS keyword on the backup command is meaningless. Whether or not you get a compressed backup depends solely on the setting of ENCRLIB, whether specified in the DB cfg or in the BACKUP command.

Backup Duration

After enabling Native Encryption, I also noticed that my backup duration nearly tripled. A backup that used to take 18 minutes now takes 53 minutes. I’m still working to see if I can tune this down some with some memory and other tuning. The tablespaces in this backup are not ideal for backup parallelism.

Ongoing Support of DB2’s HADR


There are some things to be aware of with ongoing support of a HADR system. I thought I’d group them together to provide a primer of do’s and don’ts for support of HADR.

Monitoring HADR

HADR does occasionally stop all by itself. Also, system events can cause it to not be active. For these reasons, it is critical that you have some sort of monitoring for HADR. I have my monitoring solution treat HADR down as a sev-1 event that pages my team out in the middle of the night to get resolved. My reasoning for this is that HADR is often part of a recovery plan, and it only takes one subsequent event to cause a major lapse in High Availability or Recoverability. I have personally seen a severe RAID array failure (which resulted in the disk being unrecoverable on its own) just 3 hours (in the middle of the night) after HADR went down. In that case, we were luckily able to read from (but not write to) the failed RAID array for about an hour before it died completely, and were able to get all of the transaction logs from that 3-hour lapse copied. But it was a lesson for my team at the time to treat HADR failures as immediate emergencies themselves so we’re covered for any subsequent emergencies. It is really amazing the statistically unlikely failures that can and do occur.

Thus, I monitor HADR using whatever monitoring system I have available. My preferred way to monitor HADR is using db2pd. With 10.1, they changed the db2pd output for the -hadr option to be easier to parse for scripting. And while the MON_GET_HADR table function is nice, it requires a connection to the database, and I also ran across an APAR in 10.5 fixpack 4 where the entire DB2 instance crashed if I ever queried MON_GET_HADR (APAR IT04151), which makes me irrationally afraid of MON_GET_HADR.

To monitor HADR, use syntax like this:
[Screenshot: db2pd -hadr output with HADR_ROLE, HADR_STATE, HADR_CONNECT_STATUS, and HADR_LOG_GAP highlighted]
The primary things you are looking for in that output are that the HADR_ROLE is what you expect (no unexpected failover has occurred), that the HADR_STATE is PEER (assuming a SYNCMODE of ASYNC or higher), and that the HADR_CONNECT_STATUS is CONNECTED. Or as I say in my head when looking at this output: PRIMARY, PEER, CONNECTED. A fourth field worth monitoring is the HADR_LOG_GAP; I generally treat anything over 10% of a single log file as indicating a severe issue.

Obviously, you have to script reading the above to feed into most monitoring infrastructures.
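
In script form, that check boils down to a couple of lines like these (the database name is an example; the field names are the ones discussed above):

db2pd -db sample -hadr | grep -E "HADR_ROLE|HADR_STATE|HADR_CONNECT_STATUS|HADR_LOG_GAP"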

Stopping/Starting Systems using HADR

There are several circumstances under which you need to have your procedures straight for what to do with HADR. If you’re using TSAMP, I recommend that a DBA is always personally involved with any reboot or failover. Without TSAMP, you may be able to train someone junior or a system administrator to take correct action and tell them when to engage a DBA for help.

Standby HADR Server Reboot

If you need to reboot a standby database server (due to system maintenance or maintenance at other non-database levels), you will usually not deactivate HADR. However, you will have to activate HADR after the reboot is complete. HADR will become active when the database is activated, and unless you have added some scripting, the database on a standby server is NOT automatically activated.
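
On the standby, once the server is back up, the commands are as simple as this (SAMPLE is an example database name):

db2start
db2 activate db sample      # activating the standby database brings HADR back up in its standby role
db2pd -db sample -hadr      # confirm PEER and CONNECTED once the primary catches it up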

If TSAMP is being used

If TSAMP is used to automate HADR failover (using db2haicu), you should disable TSAMP prior to the standby database server becoming unavailable. This is fairly simple: issue the db2haicu command with the -disable option prior to any planned outages. After the outage, you would issue the db2haicu command without any options, and then select 1 to enable TSAMP again. Always check the TSAMP states using lssam (or your favorite other way of looking at it) to ensure that all states are blue or green.

Primary HADR Server Reboot

Usually if you need to reboot or patch or otherwise affect the primary node, you will first fail DB2 over to the principal standby node using the TAKEOVER HADR command on the standby node. Then the reboot of the former primary node is treated just like rebooting a standby server, as described above. Afterwards, some companies prefer to run on the standby node for a while, while others prefer to immediately fail back to the original primary.
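
The planned failover itself is a single command run on the standby; for a healthy pair in PEER state, no BY FORCE option is needed (database name is an example):

db2 takeover hadr on db sample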

If the primary HADR server is rebooted without a failover, it is less likely to need DBA involvement when it comes back up, because HADR will automatically come up when the database is activated, and the primary HADR database is nearly always activated either explicitly or implicitly on first connection.

If TSAMP is being used

As with a reboot of the standby server, if the primary server is to be rebooted, no matter whether the database is failed over or not, you should disable TSAMP using the -disable option on the db2haicu command. After the server is back up, you would issue the db2haicu command without any options, and then select 1 to enable TSAMP again. Always check the TSAMP states using lssam (or your favorite other way of looking at it) to ensure that all states are blue or green.

Reboot of Both HADR Servers at Once

It is rare, especially in production, but if you reboot both servers at once, always ensure the standby comes up first. The reason is that when the primary comes back online, on the first connection or activation attempt it will first check whether it can reach the standby server. If it cannot, it will not allow any incoming connections. The primary assumes that there may be a network issue and refuses to allow connections so that a scary condition called split-brain does not occur. You can force the primary to start using the “BY FORCE” keywords on the “START HADR” command on the primary; however, it is possible you will then have to reset HADR with a restore on the standby database server.
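
The command for that forced start, against an example database name, is:

db2 start hadr on db sample as primary by force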

I had a client once with one of the most unstable networks I had ever seen. They were trying to do first-line DB2 support on their own, and became very frustrated when on multiple occasions, they had a network issue that forced them to reboot both live production database servers at the same time. Each time, their primary database did not become available until the standby database server came up. Their conclusion was that it was a flawed HA solution because the database could not be up unless both servers were up, but they simply did not understand the order of bringing things up or the commands to use to bypass that order.

Applying DB2 Fixpack

DB2 fixpacks can be applied to HADR servers in a rolling fashion. I have performed, and trained others to perform, fixpack applications with zero observable downtime going back to DB2 version 8. The failover will necessarily reset transactions already in progress, so this depends on how robust your application is about restarting work. The general order of events is this:

  1. All pre-fixpack prep work is performed on the primary and standby servers
  2. DB2 is deactivated on the standby server
  3. The fixpack code is installed on the standby server
  4. The DB2 instance is updated on the standby server
  5. DB2 is restarted on the standby server
  6. HADR is restarted and the databases are brought back in sync
  7. The database is failed over from the primary server to the standby server
  8. Post-fixpack database actions such as binds and db2updvXX are performed on the database
  9. HADR is stopped
  10. The Fixpack is installed and applied on the Primary database server
  11. HADR is started and the databases are brought back in sync
  12. If desired, the database may be failed back to the primary
  13. Post-fixpack instance actions are performed

This work is still performed at off-peak times to minimize the impact of the failovers.

Maintaining and Verifying Settings that Are Not Logged

HADR copies for you most things that are logged. It does not copy changes to database configuration, database manager configuration, the DB2 registry, or changes made by STMM to bufferpools. Of course, DB2 cannot copy changes to OS-level parameters, filesystem sizes, and that sort of thing. It is important to be diligent in performing these types of changes on all HADR servers. To ensure this, it is important to manually compare all configurations between the two servers from time to time to ensure nothing has been missed.
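
One low-tech way to do that comparison is to dump the settings on each server and diff the files; the paths and database name below are just examples:

db2 get dbm cfg > /tmp/dbmcfg.$(hostname).txt
db2 get db cfg for sample > /tmp/dbcfg.sample.$(hostname).txt
db2set -all > /tmp/db2set.$(hostname).txt
# copy one server's files to the other, then:
diff /tmp/dbcfg.sample.primaryhost.txt /tmp/dbcfg.sample.standbyhost.txt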

Keeping Copies of Maintenance Scripts and Crontab

In the case of a failover due to a failure of the primary, you’ll want to make sure that any maintenance, monitoring, or data pruning scripts that you use on the primary are copied to the standby. You can automate this with rsync or manually copy them, but in any case, you want to ensure that everything is on the standby that is on the primary so you can easily pick up operations on the standby without access to the primary.

Health Check on HADR

Periodically, you should perform a health check on the HADR pair using the HADR calculator. This can point out areas where a busy network or other factors might cause HADR to impact database performance on your primary database. See my series on the HADR Tools for details on how to do this.


Looking Forward to IBM Insight 2015


IBM Insight starts in less than two weeks! You can still register and attend if you haven’t planned to already!

General Tips

Website

If you haven’t registered already, check out the IBM Insight website for information. I’ll be linking to specific pages throughout this post.

Attire

This is the third year I’m going to IBM Insight (well the first year I went, it was still called Information On Demand) in Las Vegas. It comes at a time when things are just starting to get chilly in Denver. We usually get our first snow of the season sometime in October, though it melts fast that time of year. Some part of me always thinks, “Hey, I’m going to Vegas, I should be wearing shorts and t-shirts”, but it’s always so cold indoors that I am more likely to need a sweater.

Shoes

I also choose my footwear very carefully. This sounds like a very girly thing for this tomboy to point out, but footwear is so critical to my enjoyment when I know I’ll have no trouble meeting my 10,000 step goal each day. This year I’m staying over in the Luxor because it’s cheaper. Join me in staying fit with a matchup.io challenge! Even if you’re not going to the conference, you can join in with us and be with us in spirit. I’ll be wearing my favorite Birks for the most part, and may resort to sneakers at times.

Overall Schedule

Saturday and Sunday

Saturday and Sunday are Business Partner and invite-only events (CAC or whatever they’ve renamed it to). This means that registration is open these days too, in case you’re coming in early. The Certification Center opens on Sunday. The first year I went to Insight, I came in early on Sunday and took three certification tests in a row. They were free for me that year, and man, I was exhausted after that. I would have passed them more easily if I had taken them at the end of the conference, as they were mostly focused on the (then) new features of 10.5, which were covered in detail in sessions during the week.
Sunday night is the Grand opening of the Expo Hall. Free Food! And maybe drinks! Many of the times in the expo hall, I wander from food table to food table looking for people to talk to. Some of the booths can be interesting. I particularly liked the IBM research booth(s) last year. Had fascinating conversations with some folks about things they’re trying with using GPU-like processing to handle operations that might benefit from it like sorting.

Monday

On Monday (and every weekday of the conference), breakfast is from 7 to 8:30 AM. I’m nearly always there, and, herded like cattle into the largest dining room I have ever seen, I end up eating with whoever I find myself near. This is a good time to find me (DM me on twitter) if you want to meet me and sit with me for a bit.
There is an abundance of keynote and general sessions at Insight, and from what I’ve seen the best attended is usually the 8:30 AM Monday session. I think that this is because the people who came in only for the Business Partner summit on Sunday are still around; by the Wednesday celebrity session, they’ve generally left.
The Monday general session topic is “Leave no problem unsolved – transform your industry”. I tend to like host Jake Porway and find the most fun part of these sessions is tweeting and seeing what others are tweeting about. The production quality is a bit over the top. Since there are 13,000 people generally, it takes place in a basketball arena that is just huge. You can’t end up in the wrong place for this one – just follow the herds.
The first set of sessions starts at 10:30.
The “Data Management” keynote is from 1-2 PM on Monday in the Mandalay Ballroom. Data Management is the category they’re lumping DB2 into this year. These sessions can be OK, but it can also be a good time to go catch a seat in the Certification Center.
There are more sessions and Labs on Monday afternoon.

Tuesday

Tuesday starts with Breakfast from 7-8:30 and another general session from 8:30 to 10, and then more sessions and labs starting at 10:30. I cover sessions and Labs in separate sections below, so I’m not going into too much detail here. Lunch is daily from 11:30 to 1 PM (except Thursday), and is generally in the same large room breakfast was in.

Wednesday

Breakfast is again 7-8:30. By this point in the week, I’m often getting burnt out on general sessions, but this is usually the slot for the celebrity speaker. I was not impressed with Serena Williams two years ago. Last year’s one by Kevin Spacey was AMAZING, and prompted me to binge-watch House of Cards on Netflix when I got home. Great series, even my political-scientist father has good things to say about it. Anyway this year’s is by Ron Howard. I’ll likely be there from 8:30 to 10 AM. More sessions and Labs start at 10:30
Wednesday night is concert night. This year it’s Maroon 5. Some years I go and others I don’t – loud noise and flashing lights is not generally my thing. I went last year to see No Doubt, but only stayed halfway even though I liked the music – it was just too loud for me.

Thursday

Breakfast is again 7-8:30. Loving the consistency in that timing. But there’s no general session, so the sessions and labs start at 8:30. Last year some of my favorite sessions were on Thursday, and there’s just a much more relaxed vibe on that day. The lunch is usually box-style, eaten while listening to speakers in a much smaller room than the rest of the week. Things are being broken down and packed up in the afternoon with the last labs ending at 5 PM.

Certifications

Certifications from the IBM Business Analytics, Enterprise Content Management and Information Management portfolios will be available for only $30 at Insight. Other IBM Certification tests will be available at $100. The hours are pretty decent this year:

Sunday, October 26 12:00 p.m. - 6:00 p.m.  LAST seating is at 4:00 p.m.
Monday, October 27  10:00 a.m. - 6:00 p.m.  LAST seating is at 4:00 p.m.
Tuesday, October 28 10:00 a.m. - 6:00 p.m.  LAST seating is at 4:00 p.m.
Wednesday, October 29   10:00 a.m. - 8:00 p.m.  LAST seating is at 6:00 p.m.
Thursday, October 30    8:00 a.m. - 4:00 p.m.   LAST seating is at 4:00 p.m.

The Certification Center this year will be in Surf F, MBCC South, Level 2.

I’m a big fan of certifications overall, and hold a fair number of DB2 Certifications. I get excited when a new test comes out so I can try it. And there’s a new test, just released October 9th – test 2090-615 – DB2 10.5 Fundamentals for LUW. IBM is splitting out the fundamentals test from a shared test for LUW and Z/OS to separate tests for each platform, and this is the first test for 10.5 that is not an upgrade exam. It should lead to a certification as a Database Associate for DB2 10.5 LUW.

I learn a lot from studying the details and the things that I don’t do everyday. If you plan on taking a certification test, make sure you have and can log into a Pearson VUE account at least two days before the conference starts. If you have previous certifications, you want to make sure that this is the same account those certifications are under. You’ll also need to bring an ID and a Visa, Master Card, or American Express. Sign in goes faster if you know the number of the test you want to take from the IBM Certification Site.

Check out the details that IBM has published about certification at IBM Insight 2015.

If you want more information on DB2 certifications, check out my blog entry on DB2 Certification.

Labs

I did a couple of Labs last year and was amazed at how good they really were. A few things to understand about labs (at least how they’ve worked in the past):

  • Scheduled labs involve a presentation followed by time to work through exercises on prepared lab machines. With scheduled labs you get a copy of the handout with all the lab instructions and the instructor may or may not be able to give you copies of the VMs or lab environments. Scheduled labs are usually scheduled to cover 2 or even 3 sessions. The mobile and other scheduling tools will not let you schedule sessions in a time you’ve signed up for a Lab. If you don’t sign up ahead of time, the lab may be full and you may not be able to get a seat.
  • With drop-in labs, you borrow a copy of the handout, and work through the lab exercises on your own. The lab center may be busy and you may have trouble getting a seat at popular times. You DO NOT get to keep the handout if you do it this way, so don’t write all over it. Last year there was a scheduled time on Wednesday where you could wait in line to get a handout or two of your choice, but it was a LOOOONNNNG line, and they ran out of many of the handouts.

Two labs that I hope to try are:

  • Introduction to Data Science
  • Using Data server Manager to optimize DB2 performance for a migration with BLU Acceleration

Sessions

The sessions are my favorite part of the conference after meeting up with old friends. There is more information here than you can get from any other source. There is a wealth of sessions to choose from, so choose wisely. I like the mobile app for setting up my schedule, but you can also look online for session information. Check out db2dean’s blog entry for details on using the sites for building agendas.

To limit it to the DB2 LUW tracks, here are the options I pick:
[Screenshot: the track filters I select to limit the catalog to DB2 LUW sessions]

I like to search sessions for my favorite speakers first, and then fill in with others. Some sessions can be thinly veiled sales sessions, and it’s an art finding the best ones. Here are some of the speakers I prioritize:

  • Melanie Stopfer (four sessions this year, wow!)
  • Steve Rees
  • Matt Huras
  • Berni Schiefer
  • Jessica Rockwood
  • Dale McInnis
  • Adam Storm
  • Michael Kwok
  • Guy Lohman

Here are just a few of my top picks for sessions during the week:

  • 10:30 AM Monday: Ian Bjorhovde and Walid Rjaibi – Safeguarding Your Sensitive Company Data
    Walid is THE DB2 Security expert, and Ian is smarter than I am and a guest blogger here on db2commerce.com
  • 2:30 PM Monday: Melanie Stopfer – Meet the Experts: Power of db2pd: Monitor and Troubleshoot Your World
  • 10:30 AM Tuesday: Michael Kwok and David Kalmuk – Scaling up BLU Acceleration with Consistent Performance in a High Concurrency Environment
  • 1:00 PM Tuesday: Melanie Stopfer – Upgrading to DB2 with the Latest Version of BLU Acceleration
    I have seen Melanie’s upgrade presentations nearly every year, and have even done a fair number of upgrades myself this year, but every time I go, I pick up something new.
  • 1:00 PM Tuesday: Ask the Experts on DB2 LUW Data Warehousing
    I love panels like this. A conflict with Melanie!
  • 4:00 PM Tuesday: Michael Kwok and Jantz Tran – Advances in Analytics Using DB2 with BLU Acceleration on Intel Architecture
  • 1:00 PM Wednesday: DB2 LUW Panel Session on High Performance and Availability for OLTP Systems
  • 2:30 PM Wednesday: Melanie Stopfer – Enhancements and Tips for DB2 LUW and IBM BLU
  • 9:45 AM Thursday: Melanie Stopfer – DB2 Memory Management: Have you Lost Your Memory?
    A complete can’t miss session for me
  • 3:30 PM Thursday: Berni Schiefer and Thomas Kalb – Listening to Your Performance “Canary” in the DB2 Database

Volunteer

There’s also a chance to spend some time volunteering while in Vegas. You can sign up a slot to pack meals for Stop Hunger Now.

Other Information

A couple of other blogs you might be interested in about IBM Insight 2015:
http://ibmdatamanagement.co/2015/10/04/insight-2015-making-the-most-of-the-sessions-on-offer/
https://www.toadworld.com/platforms/ibmdb2/b/weblog/archive/2015/09/14/are-you-heading-to-las-vegas-for-ibm-insight
http://www.db2dean.com/Previous/iodAgenda15.html
Hope to see you there!

DB2 Administrative SQL Cookbook: BLU Compression Ratios


Purpose

This statement calculates the compression ratio for BLU tables. The compression ratio can be used to help identify tables where compression is not optimal and you may need to look into why. Compression is critical to optimal performance on BLU.

Understanding Compression Ratios

Compression ratios across platforms and outside of databases are generally represented as:
Compression Ratio = Uncompressed Size / Compressed Size
When talking about the results, we generally refer to the compression ratio as N X compression. For example, from the results section below, for the last table I would say we’re seeing 10X compression: the statement computes 1/(1 - PCTPAGESSAVED/100), so a table with PCTPAGESSAVED of 90 works out to a ratio of 10.

Source

While I’ve modified this statement, it comes from these two sources, both influenced by David Kalmuk:
http://www.ibm.com/developerworks/data/library/techarticle/dm-1407monitor-bluaccel/index.html
http://www.dbisoftware.com/blog/db2nightshow.php?id=619

Statement

SELECT     substr(tabschema,1,18) as tabschema
    ,substr(tabname,1,33) as tabname
    , card
    , npages
    , decimal(1/(1-float(pctpagessaved)/100),5,2) as compr_ratio
FROM SYSCAT.TABLES
WHERE tableorg='C' 
    and tabschema not like 'SYS%'
    and card > 0
order by compr_ratio
with ur;

Sample Output

TABSCHEMA          TABNAME                           CARD                 NPAGES               COMPR_RATIO
------------------ --------------------------------- -------------------- -------------------- -----------
EBT_ODS            EBTUSG_OUTOF_CNTY                               397483                  315        7.69
EBT_ODS            EBT_OUT_CNTY                                    514329                  455        7.69
SSIRS              AGEN_PERF                                       501321                  262        8.33
EBT_ODS            EBT_ACT_TYPE1                                 26598452                49867        9.09
CIS_ODS            INDV                                           2605558                 5787        9.09
SSIRS              ES_CSLD_VER2                                  18659499                40773        9.09
SSIRS              STAFF                                           317367                  172       10.00
CMIPS_ODS          RELBCO01                                        914192                 1934       10.00

Notes

I like to return the card and npages so I can understand if a table is so small that its geometry may affect the compression ratio. The results are actual results from a production BLU database that I support, but represent the best few tables as far as compression goes.

Improving Performance of DBCLEAN Deletes


While this post is specific to WebSphere Commerce, it covers concepts that may apply to tuning other delete statements as well.

Using Functions in the Best Place

The CLEANCONF table stores the delete statements that are used by DBCLEAN. Most of them use syntax like this for the date component:

(days(CURRENT TIMESTAMP) - days(prevlastsession)) >= ?

Every time I see this, I cringe.

Applying a function (such as days) to a column/field eliminates the possible use of an index on that column. This essentially means that this SQL is forcing a table scan – sometimes of very large tables like USERS, MEMBER, or ORDER.

This SQL can be very easily re-written to not use a function on table data. The following is only marginally different from the above, and it can significantly improve performance:

prevlastsession < current timestamp - ? days

Now technically, this compares to a point in the middle of a day rather than a day boundary, so it is slightly different. To mitigate any differences, just keep one more day.

It is supported to make changes to the base Commerce DBCLEAN jobs, though it is a bit safer to create a custom DBCLEAN job with different SQL. Make sure customizations survive WebSphere Commerce fixpacks, feature packs, and upgrades.

Once the SQL is changed like this, one or more indexes may drastically improve performance on the parent table the delete is running against.

Example

Recently, I had the chance to spend some time tuning a WebSphere Commerce environment that I do not regularly support, and one of the areas they wanted the most help with was tuning DBCLEAN statements. One of the statements we started with was a fairly standard one for deleting guest users:

delete from member where member_id in (
    select users_id 
    from users T1 
    where registertype='G' 
        and (days(CURRENT TIMESTAMP) - days(prevlastsession)) >= ? 
        and not Exists (
            select 1 from orders 
            where orders.member_id=T1.users_id 
            and status != 'Q') 
        and (users_id > 0))

Explaining this statement, I discovered that in this environment, this delete statement was estimated at 7,261,900,000 timerons. These deletes are very expensive because they cascade to hundreds of tables. The explain plan had over 4,000 operators. This is how it reads with the altered date syntax:

delete from member where member_id in (
    select users_id 
    from users T1 
    where registertype='G' 
        and prevlastsession < current timestamp - ? days 
        and not Exists (
            select 1 from orders 
            where orders.member_id=T1.users_id 
                and status != 'Q') 
        and (users_id > 0))

With just that simple change, the estimated timerons went down to 6,990,170,000. That still seems like a lot, but that's an improvement of over 250 MILLION timerons. Even better, the design advisor now tells me I can get very significant improvements by adding indexes.

Indexes

Conventional wisdom is that indexes slow down deletes. That's true in the sense that the more indexes there are, the more physical places the delete has to happen. However when a delete has significant conditions on it like these do, indexes to support the conditions may make sense.

Using the Design Advisor on a Delete

Running the design advisor on the deletes on MEMBER or ORDERS will take a while to run. Running it on the guest users delete above comes up with over 300 recommended NEW indexes. THREE HUNDRED. There is no way you want to add 300 indexes based on the design advisor alone. So how do you navigate all those indexes?

  1. Look at the explain plan - maybe there will be something obvious, though the plan is very hard to look at as it is so large
  2. Make your best guesses based on table cardinality, custom tables, and other likely factors
  3. Use a modified version of Multiple Index Regression Analysis

Example

Here is how I approached it for the guest-user delete above. First, I made some guesses, starting with indexes on custom tables. The indexes on likely-looking custom tables, if added alone, actually made the delete performance significantly worse. I then found a branch of the explain plan that seemed particularly expensive and looked for design advisor recommendations on those tables. Also wrong - again, those indexes actually made the delete performance worse.

I then decided to order the recommended indexes by name and use the same techniques as in Multiple Index Regression Analysis to look at groups of indexes. My query started out at 6.9 billion timerons. If I added all 300 indexes that the advisor recommended, it suggested I could get the cost down to 334 million timerons (a 95% improvement).

I used the EVALUATE INDEXES explain mode to look at the first 160 indexes and see how much they would help. If all 160 were added, I could expect the delete to run at a cost of only 184 million timerons - even less than my end goal, so somewhere in those first 160 indexes were some real gems. For my next run using EVALUATE INDEXES, I tried the first 40 indexes and again found an estimated cost of about 180 million timerons. Trying both the first 10 and the first 20 got me 482 million timerons. Working through the first 10 individually, I found that the very first index when sorted by name would get me down to 582 million timerons on its own. That's roughly a 12X performance improvement from adding just one index, and it was a very reasonable, high-cardinality, non-duplicate index on USERS (+REGISTERTYPE+PREVLASTSESSION+USERS_ID). This is an index that would not have helped if I hadn't changed the date syntax.

Here's a partial look at what my spreadsheet looked like:
[Screenshot: spreadsheet tracking the estimated timerons for each group of indexes evaluated]

I decided to add this one index and have the customer evaluate whether it meets their performance goals. I may go back to find more good indexes, but given the effort involved in finding them, the question is whether the additional improvement will be worth it.
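If you want to try the same evaluation approach, the mechanics look roughly like this. It is a sketch, assuming the design advisor has already populated the ADVISE_INDEX table in your explain schema; the index names are placeholders for the advisor-generated names you want to test as a group.

-- Turn all of the advisor's virtual indexes off, then enable just the group being tested
UPDATE advise_index SET use_index = 'N';
UPDATE advise_index SET use_index = 'Y'
    WHERE name IN ('IDX_CANDIDATE_1', 'IDX_CANDIDATE_2');

-- Explain the delete with the selected virtual indexes treated as if they existed
SET CURRENT EXPLAIN MODE EVALUATE INDEXES;
-- ... issue the delete statement here; it is explained, not executed ...
SET CURRENT EXPLAIN MODE NO;

-- Pull the estimated cost of the most recent plan from the explain tables
SELECT total_cost
    FROM explain_statement
    WHERE explain_level = 'P'
    ORDER BY explain_time DESC
    FETCH FIRST 1 ROW ONLY;

Record the cost for each group in a spreadsheet, and narrow in on the groups that give the biggest drops.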

Custom Tables

One of the most common areas where indexing is needed in WebSphere Commerce databases is on custom tables. Some custom tables can be particularly large, and may have foreign keys defined to tables that are part of DBCLEAN. These foreign keys must be defined with cascading deletes (if not, they will cause DBCLEAN to fail), and there may not be indexes in support of the foreign keys.
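One way to spot such unindexed foreign keys is to compare SYSCAT.REFERENCES against SYSCAT.INDEXES. The following is a rough sketch: it assumes single-column foreign keys and only checks whether some index starts with the foreign key column.

-- Child tables with foreign keys referencing ORDERS where no index starts with the FK column
SELECT r.tabschema, r.tabname, r.constname, k.colname
    FROM syscat.references r
    JOIN syscat.keycoluse k
        ON k.constname = r.constname
        AND k.tabschema = r.tabschema
        AND k.tabname = r.tabname
    WHERE r.reftabname = 'ORDERS'
        AND NOT EXISTS (
            SELECT 1
                FROM syscat.indexes i
                WHERE i.tabschema = r.tabschema
                    AND i.tabname = r.tabname
                    AND i.colnames LIKE '+' || k.colname || '%'
        )
WITH UR;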

Example

One client recently had trouble with a delete from ORDERS: they were unable to delete 10,000 orders in two hours. We used the date improvement described above for an estimated 99% improvement in performance, but that did not solve the issue. What did solve it was adding an index on a single custom table that included an ORDERSID column. There was a foreign key on this column referencing ORDERS_ID in the base Commerce table ORDERS, but there was no index to support the foreign key, and the table had millions of rows. Immediately after adding the index on this single column, deletes completed much faster, with more than 100,000 completing in the same two-hour window that previously could not accommodate 10,000.
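The fix itself was a single-column index along these lines. The table and index names here are hypothetical; the only detail carried over from the actual fix is that it was one index on the custom table's ORDERSID column.

-- Hypothetical table and index names; ORDERSID is the foreign key column described above
CREATE INDEX IX_XCUSTTABLE_ORDERSID
    ON XCUSTTABLE (ORDERSID)
    COLLECT DETAILED STATISTICS;

-- Refresh table statistics so the optimizer sees the new index (schema and table are placeholders)
CALL SYSPROC.ADMIN_CMD('RUNSTATS ON TABLE WSCOMUSR.XCUSTTABLE WITH DISTRIBUTION AND DETAILED INDEXES ALL');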

DB2 Administrative SQL Cookbook: Column Selectivity by Table (BLU)


Purpose

This statement reports how selective queries running against a specific table are at the column level - that is, how many of the table's columns the average query references. It does not look at overall selectivity or row selectivity. It will only work in DB2 10.5. BLU performs best when queries do not reference all of the columns in a table.

Source

While I’ve modified this statement, it started with statements from these two sources, both influenced by David Kalmuk:
http://www.ibm.com/developerworks/data/library/techarticle/dm-1407monitor-bluaccel/index.html
http://www.dbisoftware.com/blog/db2nightshow.php?id=619

Statement

with t1 as (
SELECT
        substr(tabschema,1,18) as tabschema
        ,substr(tabname,1,33) as tabname
        ,(select count(*) from syscat.columns c where c.tabname=mgt.tabname and c.tabschema=mgt.tabschema) as num_cols
        ,section_exec_with_col_references as num_queries
        ,(num_columns_referenced /
        nullif(section_exec_with_col_references, 0))
        as avg_cols_ref_per_query
FROM table(mon_get_table('', '', -2)) AS mgt
where tab_organization='C'
)
SELECT 
    t1.*
    , decimal((float(t1.avg_cols_ref_per_query)/float(num_cols)),5,2) * 100 as PCT_COLS
FROM table(mon_get_table('', '', -2)) AS mgt2
    join t1 on t1.tabschema = mgt2.tabschema and t1.tabname=mgt2.tabname
where mgt2.tab_organization='C'
    and mgt2.tabschema not like 'SYS%'
order by PCT_COLS desc, t1.num_queries desc
with ur;

Sample Output

TABSCHEMA          TABNAME                           NUM_COLS    NUM_QUERIES          AVG_COLS_REF_PER_QUERY PCT_COLS
------------------ --------------------------------- ----------- -------------------- ---------------------- --------
SSIRS              COGNOS_LOGIN_RAW_MS_EXCHANGE                7               343483                      7   100.00
SSIRS              COGNOS_LOGIN_RAW_MS_EXCHANGESMTP            6               343482                      6   100.00
SSIRS              ROW_COUNT_CMIPSSFTP                         5                 1447                      5   100.00
SSIRS              COGLOG_USERINFO                             8                  131                      8   100.00
SSIRS_DMART        DIM_CS_LCTN                                15                10342                      2    13.00
SSIRS_DMART        DIM_CS_PGM_REG                             39                 3636                      5    12.00
SSIRS              COLL_RFRL                                  55                  886                      7    12.00
SSIRS_DMART        DIMDATE                                    17                 2205                      2    11.00
SSIRS_DMART        DIM_CS_PGM                                 38                14916                      4    10.00
SSIRS              ES_CSLD_VER2_1DAY                          65                 3373                      4     6.00

Notes

A value of 100 percent means that, on average, every query on the table references every column in the table. The lower this number, the more likely it is that the power of BLU is being leveraged.
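When a table shows up at or near 100 percent, a reasonable next step is to find the statements that touch it. Here is a rough sketch against the package cache; the table name is simply the one from the sample output above, and a plain text match like this can obviously return false positives.

-- Cached statements that mention a particular table, ordered by executions
SELECT substr(stmt_text, 1, 100) AS stmt_start,
    num_executions,
    total_cpu_time
    FROM table(mon_get_pkg_cache_stmt(NULL, NULL, NULL, -2)) AS p
    WHERE stmt_text LIKE '%COGNOS_LOGIN_RAW_MS_EXCHANGE%'
    ORDER BY num_executions DESC
    FETCH FIRST 20 ROWS ONLY
WITH UR;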

Insight 2015 Brain Dump



I call my post-conference blog posts brain dumps because that is largely what they are – information formatted in a way my brain understands but not necessarily as thoroughly organized and researched as you might be used to seeing from me.

Conference Generalities


Overall Themes

I sure wish that DB2 was more highlighted overall, but I’m sure that’s largely a personal bias. I find Insight to be very buzzword oriented. This year’s buzzwords were clearly Cognitive and Spark.

Cognitive

Big Data is no longer enough. Analytics (which I have had to add to all my spell checkers) is no longer enough. The new thing is cognitive computing – the use of “machine learning” and Watson Analytics on data and natural language. This is certainly a fascinating area, but I probably heard the word cognitive 50 times by 10 am every day. I’m still not sure how well this currently applies to mainstream analytics – actually using Watson must be ridiculously expensive, and I’m fairly sure that Watson itself is not behind some of the things they’re labeling as Watson Analytics. Are they going to try to rename DB2 to WatsonDB next year? Or maybe CognitiveDB?

Spark

There was an interesting focus on using Spark for Hadoop. Both Insight and IDUG EMEA really seem to be focusing on it.

Labs

There were no drop-in labs this year, and I missed them. Many labs were fully booked before the conference started, and there were lines for everyone on standby. I felt they could have done better here. I also didn’t find all the technical topics I wanted – I really wanted a lab on pureScale. Several weeks before the conference, I signed up for just one lab – Introduction to Data Science, but was put on standby at that time. When I got to the lab center, I found at least 20 people ahead of me in the standby line vying for what looked like about 8 open spots. Because of all this, I didn’t end up in a single lab this year.

Meals

The provided meals were fine. I’m a vegetarian, and that goes fairly well at this conference, but I still dislike the way meals are done. The conference attendance includes people with a variety of backgrounds and job descriptions, and all 10,000 (or however many actually attend the meals) are herded into the dining room and directed to random areas. So my chance of running into anyone I know, or who does what I do, is very low. Contrast this to a smaller conference like IDUG, where whoever I sit with will have SOMETHING to do with DB2, and sitting randomly can be an interesting way to meet new people with whom I share some kind of common interest. I wish they would split the dining room by broad specialty so I would be more likely to have something in common with the people I end up sitting with.

Signage

I LOVED the silly signs on the long walks this week. These made me smile, and made the long walks more doable.

App

Event Connect. You’d think that for a conference all about integrating and making sense of data, the data in Event Connect could sync with your calendar. Or that the app could actually work on Android (or on my Android, anyway). This conference is hard to navigate due to its size, and having to pull out my iPad to see where I was supposed to go next, and sometimes having even that not work, was frustrating. I may go the route I saw some people take this year and actually print things out on paper – something I usually make fun of others for – but having no idea where to go next is no fun.

Sessions

Tuesday’s General Session

Before the general session, mine was one of the IBM Champion profiles that was displayed. It was pretty cool seeing my own face 8 feet tall.
[Photo: IBM Champion profile on the big screen]
I was offended by the promotion of a violent, gun-centered video game. At one point there was a gun-sight logo up on the big screen, while the adjacent screen showed a picture of a mother and child with an iPad. It was distasteful. Wasn’t there any non-violent game that could have been promoted instead?

Wednesday’s General Session

I found Fredi Lajvardi really inspiring. I think his was the best speech of the whole conference. I love to hear good teachers speak about how deep their thought process goes for nurturing and inspiring students.
Ron Howard and Brian Grazer were also interesting.

Technical sessions

I love the technical sessions at a conference. My favorite speaker, as always, was Melanie Stopfer. She had a new presentation this conference on memory management, and it was great. Even though I have discussed this topic with Melanie directly, I still learned from the session. There were tons of other great technical sessions, and I have so many things to work on and research that I don’t really know where to start. I have some great things around BLU to work on.

Blog Ideas

I haven’t really reported this before. I don’t tend to lack for ideas anymore, but I always get a ton from the sessions I’m in. Even when I have seen some presentations before or I have a pretty high level of mastery of the topic, I still get ideas on how to blog on the idea or teach about it. Some that I hope to work on that came from this conference are:

  • Encryption concepts – maybe a summary linking to other sources?
  • Native Encryption key rotation
  • SQL to figure out the minimum table size to have at least one extent for each column
  • How to monitor for join and group by spilling (BLU/non-BLU)
  • Taking snapshot-level backup using GPFS
  • Research if you can use views on mon_get table functions to restrict permissions
  • Default workload management, both with and without BLU
  • Tuning sortheap and thresh, with and without BLU
  • What to do on BLU when someone comes to you with “I have a slow query”
  • For that matter, what to do on Row-Organized tables when someone has a slow query, though that requires far less research than the last one
  • Try out DSM query tuning and blog about it
  • Blog on how to find indexes that could go from 4 levels to 3 levels if they were on a larger page size (do I even have any to test that on?)
  • How to query MON_GET_SECTION_OBJECT
  • SYSIBMADM.MON_PKG_CACHE_SUMMARY
  • Review blogs on tuning package cache, and see if I’ve covered it enough
  • R&D’s demo on Spark and the insert speeds they’re getting there

I don’t promise all of those will make it through, but those are the things that stood out to me as the best blog ideas of the conference. I don’t think there was any single session like last year that I felt should have been named “10 things Ember should blog about”, but lots of good ideas sprinkled throughout.

Geeking out with DB2 Friends

This is one of my favorite parts of the week – seeing and hanging out with old friends. The Expo hall is actually really useful for this.
[Photo: Before the General Session on Wednesday]

[Photo: In the Expo Hall]

[Photo: In the Expo Hall with a new Spanish friend]

[Photo: Ian crashing a car that's running on WebSphere]

Social Media

I also very much enjoy the Twitter scene at IBM Insight. I always find interesting new people to follow, and this year, I passed 600 followers on Twitter. I created a Storify of some of my favorite moments:

Women in Technology

I haven’t generally been vocal on issues related to this, but I am obviously a female in a field that is, at least in the USA, very strongly male. I’m not making any judgments about Insight or IBM here, just observations. In sessions, I saw precisely one female speaker (and one in the TAB on Sunday). The representation on the main stage was much better than that, and I loved seeing female engineers, astronomers, and executives represented there.

As far as attendees, I thought it would be interesting to track the number of women vs. men in the sessions I was in. There were a few sessions I missed counting, and people come in and out of sessions – arriving late, leaving early, so I don’t promise all the counts are accurate – they were as of the time they were taken, usually about 10 to 15 minutes into the session. I also judge male or female based on outward appearance, which I figure is fairly likely to match with gender identity. Here are the breakdowns, without mentioning specific sessions:

Session Identifier        Total Attendees  Female Attendees  Percent Female
------------------        ---------------  ----------------  --------------
TAB                       67               11                16.4%
1896A                     40               4                 10.0%
1489A                     15               3                 20.0%
3587A                     56               13                23.2%
3593A                     32               5                 15.6%
3039A                     22               5                 22.7%
1118A                     50               10                20.0%
3916A                     28               8 (!)             28.6%
2033A                     26               5                 19.2%
3028A                     47               4                 8.5%
1179A (female speaker)    79               14                17.7%
1964A                     31               5                 16.1%
1177A (female speaker)    81               9                 11.1%
1581A                     67               12                17.9%

Next Year

I plan to go again next year. I find this conference valuable. Guess where it is next year? Same place it has been for at least the last 9 years!
