16GFC == 16X
If someone walked up to you on the street and said, “Hey, guess what, 16GFC is twice as fast as 8GFC! It’s even 8-fold faster than what we commonly used in 2007” you’d yawn and walk away. In complex (e.g., database) systems there’s more to it than line rate. Much more.
EMC’s press release about 16GFC support effectively means 16-fold improvement over 2007 technology. Allow me to explain (with the tiniest amount of hand-waving).
When I joined the Oracle Exadata development organization in 2007 to focus on performance architecture, the state of the art in enterprise storage for Fibre Channel was 4GFC. However, all too many data centers of that era were slogging along with 2GFC connectivity (more on that in Part II). With HBAs plugged into FSB-challenged systems via PCIe it was uncommon to see a Linux commodity system configured to handle more than about 400 MB/s (e.g., 2 x 2GFC or a single active 4GFC path). I know more was possible, but for database servers that was pretty close to the norm.
We no longer have front-side bus systems holding us back*. Now we have QPI-based systems with large, fast main memory, PCIe 2.0 and lots of slots.
Today I’m happy to see that 16GFC is quickly becoming a reality and I think balance will be easy to achieve with modern ingredients (i.e., switches, HBAs). Even 2U systems can handily process data flow via several paths of 16GFC (1600 MB/s). In fact, I see no reason to shy away from plumbing 4 paths of 16GFC to two-socket Xeon systems for low-end DW/BI. That’s 6400 MB/s…and that is 16X better than where we were even as recently as 2007.
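The 16X claim is just arithmetic on the figures above. A back-of-envelope sketch (per-path and baseline rates are the nominal values from the text, not measurements):

```shell
# Back-of-envelope math from the text above (values are nominal):
paths=4                 # four 16GFC paths to a two-socket Xeon box
per_path_mb=1600        # ~1600 MB/s usable per 16GFC path
baseline_mb=400         # typical 2007 config: 2 x 2GFC or one active 4GFC path

total=$((paths * per_path_mb))
echo "aggregate: ${total} MB/s"                          # 6400 MB/s
echo "vs. 2007 baseline: $((total / baseline_mb))X"      # 16X
```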
Be that as it may, I’m still an Ethernet sort of guy. I’m also still an NFS sort of guy but no offense to Manly Men intended.
A Fresh Perspective
The following are words I’ve waited several years to put into my blog, “Let’s let customers choose.” There, that felt good.
In closing, I’ll remind folks that regardless of how your disks connect to your system, you need to know this:
* I do, of course, know that AMD Opteron-based servers were never bottlenecked by Front Side Bus. I’m trying to make a short blog entry. You can easily google “kevin +opteron” to see related content.
I’ve recently been thinking carefully about generating randomized, representative test data for a test case I’m working on. I will blog the worked test case as well, but in the meantime I thought it worth jotting down a few ideas about generating reasonably representative volumes of test data for Oracle. My problem is [...]
I’ve just bought something over the net using my Barclaycard. As usual, the checkout screen bounces me to a Barclaycard verification screen. As usual it asks me for several letters from my password. As usual my password doesn’t work. As usual I reset the password. Not as usual, the screen then asks me for the 9th letter of my password when I only set an 8-character password. One phone call later my password is reset again. I complete my order. Then I have to reset my password again, but I can’t use the one I set previously as it “has been used too recently”.
If I were a thief I really couldn’t be arsed to go through this when I could just mug a granny in the street. I see now how Barclaycard security works. It bores people out of committing credit card fraud… Sigh…
[Note: this post is about a local Italian event and was originally written in Italian.]
Once again this year Thomas "Tom" Kyte, the "Tom behind asktom.oracle.com" and author of several books, returned to Italy to deliver one of his most popular recurring talks (here are the slides) - the one about the most significant features of the current Oracle release (11gR2 at the moment).
I had never seen Tom on stage even though we have known each other for a while, so I immediately took the Oracle Press Office up on its kind offer to meet him after the conference - a meeting that turned into a long, informal two-hour chat (together with Christian Antognini) that started over lunch and continued in the taxi. Besides the pleasure of talking with Tom (always very approachable and friendly), I came away with a number of facts and impressions, which I'll set out in no particular order.
The conference in general. The trait that best describes Tom Kyte, as anyone who has read one of his books can guess, is that he is a perfect (and very rare) blend of excellent technologist and effective communicator: indeed, the conference left me with a much sharper, clearer understanding of the features presented, even though I already knew them. I now regret not having attended the conferences of past years - it would have been an excellent investment of my time ...
Total Recall (Flashback Data Archive). In my opinion this is the killer feature of 11g (Flashback Data Archive itself appeared in 11gR1, and the 11gR2 improvements around support for many types of DDL on tracked tables make it far more usable): being able to run a query in the past (even years back) simply by adding the "AS OF TIMESTAMP" clause can truly be life-changing, for developers and DBAs alike. Just think of investigating issues reported today but which occurred days ago, restoring data after errors, etc. One use I want to investigate is extracting data from operational systems into DWHs; at first glance it is more efficient (and certainly far more maintainable and generalizable) than the classic techniques. It is also very important to know that the history tables of tracked tables are updated by reading information from the undo segments, hence with no impact on the execution time of DML statements operating on the tracked tables. Total Recall is also one of the cheapest extra-cost options.
Smart Flash Cache. The crucial piece of information is that the solid-state disk ("SSD" or "Flash") of the Flash Cache is used only for clean blocks; dirty ones are written only to the datafiles. Hence only systems whose bottleneck is disk reads can benefit from it.
My impression is that its most interesting use is making fast memory available for the new "In-Memory Parallel Execution" feature (which allows parallel execution to read blocks via the buffer cache rather than only into the process's private memory); it would be interesting to verify this.
Edition-based Redefinition. It is certainly possible to use this feature to install new versions both of data (tables with new columns, etc.) and of software (packages, stored procedures), but while the first case is relatively complex to manage, the second is very simple - there is a chasm in difficulty between the two. So I believe that, except in very special cases (systems with very high availability requirements, run by highly skilled staff, and with big budgets), this feature will find widespread use "only" in the second case. Perhaps it was deliberate that the conference demo focused precisely on the second case ...
Kernel Programmers. This was not part of the conference, but Tom kindly satisfied some of my curiosity about the Oracle kernel. I learned that extremely strict coding conventions are followed, which are a little hard to grasp at first but allow any member of the team to read and modify the kernel code (written in C, of course) with ease. Moreover, to join the team (or rather, one of the teams) working on the kernel, you are either a brilliant graduate from a top university or you come from inside the company and are therefore already known as an excellent professional. And of course (this is well known), before a new feature is introduced, it is described in detail in dedicated documents and analyzed for relevance and feasibility through a precise decision-making process. In short, an extremely rigorous and professional working environment - a kind of environment that is rare these days, but certainly at the heart of Oracle's success.
Conclusion. These were my main takeaways, which obviously reflect my own interests and professional needs. For the future, I plan to attend conferences held by Oracle Italy more often - they have always been useful and of high quality, with an "American" approach: plenty of facts, little chit-chat, and built around the audience's needs.
I have recently upgraded my lab’s reference machine to Oracle Linux 6 and have experimented today with its network failover capabilities. I seemed to remember that network bonding on xen didn’t work, so was curious to test it on new hardware. As always, I am running this on my openSuSE 11.2 lab server, which features these components:
Now for the fun part - I cloned my OL6REF domU, and in about 10 minutes had a new system to experiment with. The necessary new NIC was added quickly before registering the domU with XenStore. All you need to do in this case is to add another interface, as in this example (00:16:1e:1b:1d:1f already existed):
After registering the domU using a call to “virsh define bondingTest.xml” the system starts as usual, except that it has a second NIC, which at this stage is unconfigured. Remember that the Oracle Linux 5 and 6 network configuration is in /etc/sysconfig/network and /etc/sysconfig/network-scripts/.
The first step is to rename the server: change /etc/sysconfig/network to match your new server name. That's easy :)
Now to the bonding driver. RHEL 6 and OL 6 have deprecated /etc/modprobe.conf in favour of /etc/modprobe.d and its configuration files. It's still necessary to tell the kernel to use the bonding driver for my new device, bond0, so I created a new file, /etc/modprobe.d/bonding.conf, with just one line in it:
alias bond0 bonding
That’s it, don’t put any further information about module parameters in the file, this is deprecated. The documentation clearly states “Important: put all bonding module parameters in ifcfg-bondN files”.
Now I had to create the configuration files for eth0, eth1 and bond0. They are created as follows:
/etc/sysconfig/network-scripts/ifcfg-eth0:

DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no

/etc/sysconfig/network-scripts/ifcfg-eth1:

DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no

/etc/sysconfig/network-scripts/ifcfg-bond0:

DEVICE=bond0
IPADDR=192.168.99.126
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
BONDING_OPTS="mode=1 miimon=1000"
Now for the bonding parameters - there are a few of interest. First, I wanted to set the mode to active-backup (active/passive), which is what Oracle recommends, with the rationale that it is simple. Additionally, you have to set either the arp_interval/arp_target parameters or a value for miimon to allow for speedy link failure detection. My BONDING_OPTS for bond0 therefore set mode=1 and miimon=1000 (the kernel log below confirms both settings).
Have a look at the documentation for more detail about the options.
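Before testing, it is worth knowing where to verify the result: on a live system the bonding driver exposes its state in /proc/net/bonding/bond0. Here is a minimal sketch of the check; since I cannot assume a bonded host, a sample of the status text stands in for the real file:

```shell
# On a real system: cat /proc/net/bonding/bond0
# A sample of the interesting fields stands in for the real file here.
status='Bonding Mode: fault-tolerance (active-backup)
MII Status: up
Currently Active Slave: eth0'

# The fields worth checking: the mode, the link state, and which slave is active.
printf '%s\n' "$status" | grep -E 'Bonding Mode|MII Status|Currently Active Slave'
```

After a failover you would expect "Currently Active Slave" to flip to the surviving NIC.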
The test is going to be simple: first I'll bring up the interface bond0 by issuing a "service network restart" command in the domU, followed by an "xm network-detach" command from the xen console. The output of the network restart command is here:
[root@rhel6ref network-scripts]# service network restart
Shutting down loopback interface:   [ OK ]
Bringing up loopback interface:     [ OK ]
Bringing up interface bond0:        [ OK ]
[root@rhel6ref network-scripts]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          inet addr:192.168.99.126  Bcast:192.168.99.255  Mask:255.255.255.0
          inet6 addr: fe80::216:1eff:fe1b:1d1f/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:297 errors:0 dropped:0 overruns:0 frame:0
          TX packets:32 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:9002 (8.7 KiB)  TX bytes:1824 (1.7 KiB)

eth0      Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:214 errors:0 dropped:0 overruns:0 frame:0
          TX packets:22 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6335 (6.1 KiB)  TX bytes:1272 (1.2 KiB)
          Interrupt:18

eth1      Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:83 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2667 (2.6 KiB)  TX bytes:552 (552.0 b)
          Interrupt:17

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
The kernel traces these operations in /var/log/messages:
May 1 07:55:49 rhel6ref kernel: bonding: bond0: Setting MII monitoring interval to 1000.
May 1 07:55:49 rhel6ref kernel: bonding: bond0: setting mode to active-backup (1).
May 1 07:55:49 rhel6ref kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
May 1 07:55:49 rhel6ref kernel: bonding: bond0: Adding slave eth0.
May 1 07:55:49 rhel6ref kernel: bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
May 1 07:55:49 rhel6ref kernel: bonding: bond0: making interface eth0 the new active one.
May 1 07:55:49 rhel6ref kernel: bonding: bond0: first active interface up!
May 1 07:55:49 rhel6ref kernel: bonding: bond0: enslaving eth0 as an active interface with an up link.
May 1 07:55:49 rhel6ref kernel: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
May 1 07:55:49 rhel6ref kernel: bonding: bond0: Adding slave eth1.
May 1 07:55:49 rhel6ref kernel: bonding: bond0: Warning: failed to get speed and duplex from eth1, assumed to be 100Mb/sec and Full.
May 1 07:55:49 rhel6ref kernel: bonding: bond0: enslaving eth1 as a backup interface with an up link.
This shows eth0 as the active device, with eth1 as the passive one. Note that the MAC addresses of all devices are identical, which is expected behaviour. Now let's see what happens to the channel failover when I take a NIC offline. First of all I have to ask xenstore which NICs are present:
# xm network-list bondingTest
Idx BE MAC Addr.          handle state evt-ch tx-/rx-ring-ref BE-path
0   0  00:16:1e:1b:1d:1f 0      4     14     13/768          /local/domain/0/backend/vif/208/0
1   0  00:16:1e:10:11:1f 1      4     15     1280/1281       /local/domain/0/backend/vif/208/1
I would like to take the active link away, which is at index 0. Let’s try:
# xm network-detach bondingTest 0
The domU shows the link failover:
May 1 08:00:46 rhel6ref kernel: bonding: bond0: Warning: the permanent HWaddr of eth0 - 00:16:1e:1b:1d:1f - is still in use by bond0. Set the HWaddr of eth0 to a different address to avoid conflicts.
May 1 08:00:46 rhel6ref kernel: bonding: bond0: releasing active interface eth0
May 1 08:00:46 rhel6ref kernel: bonding: bond0: making interface eth1 the new active one.
May 1 08:00:46 rhel6ref kernel: net eth0: xennet_release_rx_bufs: fix me for copying receiver.
Oops, there seems to be a problem with the xennet driver, but never mind. The important information is in the lines above: the active eth0 device has been released, and eth1 jumped in. Next I think I will have to run a workload against the interface to see if that makes a difference.
And the reverse …
I couldn’t possibly leave the system in the “broken” state, so I decided to add the NIC back. That’s yet another online operation I can do:
# xm network-attach bondingTest type='bridge' mac='00:16:1e:1b:1d:1f' bridge=br1 script=/etc/xen/scripts/vif-bridge
Voila, job done. Checking the output of ifconfig I can see the interface is back:
# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          inet addr:192.168.99.126  Bcast:192.168.99.255  Mask:255.255.255.0
          inet6 addr: fe80::216:1eff:fe1b:1d1f/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:39110 errors:0 dropped:0 overruns:0 frame:0
          TX packets:183 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1171005 (1.1 MiB)  TX bytes:32496 (31.7 KiB)

eth0      Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:7 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:412 (412.0 b)  TX bytes:0 (0.0 b)
          Interrupt:18

eth1      Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:39106 errors:0 dropped:0 overruns:0 frame:0
          TX packets:186 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1170749 (1.1 MiB)  TX bytes:33318 (32.5 KiB)
          Interrupt:17

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:46 errors:0 dropped:0 overruns:0 frame:0
          TX packets:46 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2484 (2.4 KiB)  TX bytes:2484 (2.4 KiB)
I can also see that the kernel added the new interface back in.
May 2 05:05:31 rhel6ref kernel: bonding: bond0: Adding slave eth0.
May 2 05:05:31 rhel6ref kernel: bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
May 2 05:05:31 rhel6ref kernel: bonding: bond0: enslaving eth0 as a backup interface with an up link.
All is well that ends well.
Continuing from the previous part of this series I'll cover in this post some further basics about parallel execution control:
- Keep in mind that there are two classes of parallel hints: PARALLEL and PARALLEL_INDEX. One is about the costing of parallel full table / index fast full scans, the other one about costing (driving) parallel index scans, which are only possible with partitioned indexes (PX PARTITION granule vs. PX BLOCK granule)
- The same applies to the opposites, NO_PARALLEL (or NOPARALLEL in older releases) and NO_PARALLEL_INDEX. In particular, it is important to realize that a NO_PARALLEL hint only tells the optimizer not to evaluate a parallel full table scan / index fast full scan; it might still evaluate a parallel index scan if feasible, and therefore still might go parallel. This matters if you are under the impression that the NO_PARALLEL hint guarantees no parallel slaves will be used at execution time - it does not, since a parallel index scan remains possible. To rule out parallel execution you need different means: add a NO_PARALLEL_INDEX hint in addition to NO_PARALLEL, use the OPT_PARAM hint to set the optimizer parameter PARALLEL_EXECUTION_ENABLED to FALSE at statement level, or disable parallel query at session level.
- Furthermore, note that the PARALLEL and PARALLEL_INDEX hints merely tell the optimizer to evaluate the cost of a plan with the given parallel degree. If the optimizer finds a serial plan with a lower cost, it will prefer that serial execution plan despite the hints. This even applies to a session "forced" to use parallel query via ALTER SESSION FORCE PARALLEL QUERY.
- Remember that with DML / CTAS DDL there are actually (at least) two potentially parallel parts involved: the CREATE TABLE / DML part and the query part (except for a single-row INSERT INTO ... VALUES DML). Each part can be performed in parallel or serial independently of the other, so you could end up with:
- Serial DDL/DML + serial query
- Serial DDL/DML + parallel query
- Parallel DDL/DML + serial query
- Parallel DDL/DML + parallel query
Whether some combinations make sense or not is a different question, however from a technical point of view all of them are possible.
You therefore need to check carefully which part you want to execute in parallel and control it accordingly (and remember that parallel DML needs to be explicitly enabled in the session). Check the (actual) execution plan to ensure you get the desired parallel execution.
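To make the combinations above concrete, here is a minimal sketch of the "parallel DML + parallel query" case. Table names and the degree of 4 are hypothetical; as noted, parallel DML must be explicitly enabled in the session, and the actual execution plan is the final word:

```sql
-- Hypothetical example: both the DML part and the query part hinted parallel.
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(t, 4) */ INTO target_tab t
SELECT /*+ PARALLEL(s, 4) */ *
FROM   source_tab s;

-- Check the actual execution plan of the statement just run
-- (queries V$ views, so it is allowed before the COMMIT):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

COMMIT;
```

Dropping the ALTER SESSION while keeping the hints would give you the "serial DML + parallel query" combination instead.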
Another way to look at the latency times from "Envisioning NFS performance" is to examine the different layers of the stack we go through.
In the last post I instrumented latency data collection at the TCP layer on the NFS server, but that is just one point in the whole stack - there are many other layers to analyze. The NFS server is the easiest place to start: at my work the NFS server is always running OpenSolaris, so I have access to DTrace to analyze the latencies.
My goal here is to make sure the NFS server is responding quickly and to detect whether there are any network or client issues. I have access to DTrace on the server, but the clients can be anything, so I have limited access to data from them.
Thus my first step is to instrument data collection on the NFS server to make sure it is responding quickly.
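As a sketch of that first step (this assumes an OpenSolaris server with the nfsv3 DTrace provider; the aggregation label is my own choice), server-side NFSv3 read latency can be quantized directly at the NFS layer:

```shell
# Quantize server-side NFSv3 read latency, in microseconds
# (requires root on an OpenSolaris NFS server with the nfsv3 provider).
dtrace -n '
  nfsv3:::op-read-start { self->ts = timestamp; }
  nfsv3:::op-read-done
  /self->ts/
  {
    @["NFSv3 read latency (us)"] = quantize((timestamp - self->ts) / 1000);
    self->ts = 0;
  }'
```

Comparing this distribution with the TCP-layer numbers from the previous post shows how much latency each layer of the server-side stack contributes.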
Here’s a link to a truly ambitious document on Metalink (if you’re allowed to log on):
Doc ID 421191.1: Complete checklist for manual upgrades of Oracle databases from any version to any version on any platform
(Actually it only starts at v6 – but I don’t think there are many systems still running v5 and earlier).
Yesterday was my sister’s funeral. The Easter holidays and the imminent royal wedding caused some serious delays, so rather than having the funeral within about a week of her death, it ended up being just short of three weeks. Far too long, but it was out of our control so you’ve just got to get on with it.
I was pretty nervous during the day. I had organized most of the stuff and I was just waiting for something to go spectacularly wrong. Fortunately it all went to plan, which was a relief. We are not what I would consider a religious family and I find it more than a little hypocritical that, like many Brits, we roll out religion for births, marriages and deaths, but it seems to be what people want at these times and who am I to argue? We ended up having a Church of England service at a crematorium. The Reverend was a really cool guy (he has a WordPress blog) and did a cracking job. Everyone was really pleased with the way it went.
After the service we had a wake at a pub close to my sister’s house. My family’s response to most things is to talk crap and laugh at ourselves. Not surprisingly, we reverted to type pretty quickly, which was good to see. It turns out my cousin might be going to work at the funeral directors we used, which prompted comments like, “If you had got your arse in gear we could have got a discount on the funeral!” etc. Like I said, we talk crap and laugh at ourselves…
… and so ends another chapter…
Nearly forgot. We asked people to donate money to Cancer Research UK, rather than buy flowers. That went really well.