A question that keeps popping up is “Should we use Kafka or Flume to load data to Hadoop clusters?”
This question implies that Kafka and Flume are interchangeable components. It makes as much sense to me as “Should we use cars or umbrellas?”. Sure, you can hide from the rain in your car and you can use your umbrella when moving from place to place. But in general, these are different tools intended for different use-cases.
Flume’s main use-case is to ingest data into Hadoop. It is tightly integrated with Hadoop’s monitoring system, file system, file formats, and utilities such as Morphlines. A lot of the Flume development effort goes into maintaining compatibility with Hadoop. Sure, Flume’s design of sources, sinks and channels means that it can be used to move data flexibly between other systems too, but its Hadoop integration is the important feature.
Kafka’s main use-case is as a distributed publish-subscribe messaging system. Most of the development effort goes into allowing subscribers to read exactly the messages they are interested in, and into making sure the distributed system is scalable and reliable under many different conditions. It was not written to stream data specifically to Hadoop, and using it to read and write data to Hadoop is significantly more challenging than it is with Flume.
Use Flume if you have non-relational data sources, such as log files, that you want to stream into Hadoop.
Use Kafka if you need a highly reliable and scalable enterprise messaging system to connect multiple systems, one of which is Hadoop.
I’ve been a little quiet on the blog lately, mostly because I’ve been spending my time freaking out about the arrangements for the OTN Yathra 2014 tour of India. I had to apply for a visa, which in itself was not too bad, but I spent quite a long time without a passport, thanks to the courier not bothering to attempt a delivery. All that time I didn’t know if my visa application had been successful, so I wasn’t sure if I would have to submit it again. By the time it arrived, it was so late that booking the flights became interesting… At one point I wrote to Debra Lilley and said words to the effect of, “I think this is just not meant to be!”
Well, tonight (Tuesday evening) I got my flights confirmed for the trip, which starts on Sunday.
This is proving to be quite an expensive trip.
For those that think this Oracle ACE thing is a free ride, you don’t get any of this money back!
I’ve still got stuff to get, like insect repellent. My pasty white skin seems to be very appetising to insects, and I’m no stranger to infections when I do get bitten. Fingers crossed I can avoid the worst of it…
Getting to this point has been a bit of a trial. Thanks to everyone who has helped. Murali put together a great pack to help the speakers plan the trip, and Lillian and Vikki from the Oracle ACE Program have done a great job pushing through my last-minute travel bookings!
Normally I’m doing the thanks when the events are over.
Let’s hope everything from now on goes a little smoother than it has up until now!
So I survived one more year as the conference director for RMOUG Training Days 2014! There was some question as to my survival as we entered the week before the conference, but I can say I’m surprisingly intact, and once I finish this post it’s on to the next challenge!
If you’re unfamiliar with the conference, the Rocky Mountain Oracle User Group’s Training Days has long been incredibly well supported by many of the best Oracle speakers, Oak Table members and ACEs/ACE Directors in the world. I’m very proud to be a member, a board director and the conference director, each for a different number of years depending on the role… This is my second year as conference director and I’ve really enjoyed the challenge.
This year there were a number of additions and enhancements to the conference. These included, but weren’t limited to:
1. Exchanged the first half-day “University Sessions”, which were always offered for an additional fee, for “deep dive sessions”, which are now included in all full registration passes. The added benefit is that they decreased the added electrical and AV costs from the convention center, saving us a lot of money in our budget.
2. Single-day passes were now available for those who were previously unable to take two and a half days off for a conference. These passes were only good for the single, full day they were purchased for, with no access to the deep dive sessions, but the percentage of registrations showed they were something our members were thrilled to take advantage of. The full registration was still the better value, but many had told us it wasn’t the cost of our conference (we offer the best bang for the buck anywhere when it comes to price), it was getting the time away from work to attend.
3. Extended the ACE lunches to both days and reserved a seat for the ACE/ACED each table was dedicated to. Yeah, yeah: it’s no fun to come to your table and not have a chair to sit down in. I had wanted a red banner or napkin placed on the chair with a reserved sign, but hey, we’re getting there with this RMOUG-created opportunity for attendees to speak with their favorite ACE/ACED!
4. OTN and RAC Attack- Laura Ramsey rocked the house with the help of RAC Attack SIG members Bjoern Rost, Bobby Curtis, Maaz Anjum, Javier Ruiz and Leighton Nelson. These guys made sure that the tables were manned at all times and folks had the help they needed to build a great DB12c RAC of their own!
5. First Evening Welcome Reception- We understood clearly that our members were often driving home and weren’t always interested in having drinks, so we added a coffee bar and will continue to enhance this event to give people the best experience. I noted this year that the new setup for the exhibitor area really seemed to keep the networking flowing and everyone had a great time.
6. Speakers- Our speakers are top notch, and we ensure this by using a large review team, close to 50 individuals, whose ratings we use to score abstracts. We do not limit speaker opportunities by who they work for or how many come from the same company, but we do limit how many top presentations get in per track, and we try to give new speakers a chance. If we can’t do that at the regional user group level, how will you ever get a chance to hear these future great technical speakers elsewhere?
So if you’re curious how this works: after last year we realized our days were simply too long, and we could either extend out another day (not an option; remember, we’re having trouble with folks getting even two and a half days off…) or trim down, so I went from 142 sessions to 100, with shortened days and fewer rooms.
I build the conference around the percentage of abstracts submitted to each track, so yes, our conference has the percentage of sessions for each track that you see in the spreadsheet excerpt below. We had to choose 100 presentations from the 343 abstracts submitted.
Type: PRESENTATION, Abstract Count: 343

Track | Abstracts | % of total
DBA Deep Dive | 52 | 15.2
Database Tools of the Trade | 30 | 8.7
This ended up equating to most folks getting one accepted presentation each, with maybe a second if they were considered a top speaker and had the scores to prove it. With the ratings, you always know that speakers like Tom Kyte and Jonathan Lewis are going to have more abstracts in that top 5.0 rating band, so you take their top two and simply accept that you can’t take more of their presentations, or you won’t be able to give others a chance to speak. After the track leads have built their tracks from the number of abstracts allowed by the percentages above, I come back in and “pepper” in the new speakers we want to be sure get a chance. Saying no is never easy, and since we are only able to say yes to fewer than one of every three abstracts, it’s a difficult choice to decide who stays and who goes in the schedule.
7. More Guidebook! This mobile app was a hit last year, so we’ve enhanced some of the features and are still working on all the aspects we would like to see in the application. The social media activity for this conference, especially on Twitter, was incredible! I had so many people emailing me and asking how they could be sure of attending next year, just from the feedback via social media!
8. Introduced RMOUG’s virtual WIT. We are now going virtual with the WIT program, making it the first of our SIGs to meet virtually. We tried an onsite meeting every month or so and it was just too difficult for many to attend. The Lean In and Mightybell sites, along with Google Hangouts, have offered us a great virtual location for our monthly meetings! If you’re interested in signing up, just click here to check it out!
Ideas for Next Year
1. A virtual track or two that people can register for at a reduced fee and attend remotely. This would take some planning; as anyone who works with virtual attendance software knows, it can have some surprising challenges.
2. Electronic evaluations. I hate paper; not sure about you, but I would love to “gamify” it: have the app submit the evaluation anonymously, but enter the attendee into a drawing for every session they evaluate (up to a maximum per day…) towards a raffle for a great prize like an iPad or similar. The amount of work this would eliminate would be fantastic. Now I just need to find the folks who can build this into our conference app for me…
3. More contests via Twitter and Facebook during the conference. Commonly I’m just toast by the time the conference comes around, so as it goes on I’m unable to do these types of things, but I wanted to do a “picture with the convention center bear” contest, a “best conference GIF” award, or a “best conference lip sync”, but never got to it… next year, next year…
I can’t share too much yet, as the conference just ended, but I can tell you that our numbers were up, and attendee and speaker feedback has been fantastic so far! I’m content that we are headed in the right direction to keep RMOUG Training Days the conference that all others model themselves after!
A special thanks goes out to Team YCC and the Training Days Conference Committee for all their hard work and support. I couldn’t do it without my peeps!
At one of the presentations I attended at RMOUG this year, the presenter claimed that if a row kept increasing in size and had to migrate from block to block as a consequence, then each migration would leave a pointer in the previous block, so that an indexed access to the row would start at the original table block and have to follow an ever-growing chain of pointers to reach the data.
This is not correct, and it’s worth making a little fuss about the error since it’s the sort of thing that can easily become an urban legend that results in people rebuilding tables “for performance” when they don’t need to.
Oracle behaves quite intelligently with migrated rows. First, the migrated row has a pointer back to the original location and if the row has to migrate a second time the first place that Oracle checks for space is the original block, so the row might “de-migrate” itself; however, even if it can’t migrate back to the original block, it will still revisit the original block to change the pointer in that block to refer to the block it has moved on to – so the row is never more than one step away from its original location. As a quick demonstration, here’s some code to generate and manipulate some data:
create table t1 (
	id	number(6,0),
	v1	varchar2(1200)
)
pctfree 0
;

prompt ==========================================
prompt The following code fits 74 rows to a block
prompt ==========================================

insert into t1
select
	rownum - 1,
	rpad('x',100)
from
	all_objects
where
	rownum <= 75;

commit;

prompt ======================================
prompt Make the first row migrate and dump it
prompt ======================================

update t1 set v1 = rpad('x',400) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',2)

prompt ===========================================================
prompt Fill the block the long row is now in, force it to migrate,
prompt then dump it again.
prompt ===========================================================

insert into t1
select
	rownum + 75,
	rpad('x',100)
from
	all_objects
where
	rownum <= 75;

commit;

update t1 set v1 = rpad('x',800) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',3)

prompt ========================================================
prompt Fill the block the long row is now in and shrink the row
prompt to see if it returns to its original block. (It won't.)
prompt ========================================================

insert into t1
select
	rownum + 150,
	rpad('x',100)
from
	all_objects
where
	rownum <= 75;

commit;

update t1 set v1 = rpad('x',50) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',3)

prompt ========================================================
prompt Make a lot of space in the first block and force the row
prompt to migrate again to see if it migrates back. (It does.)
prompt ========================================================

delete from t1 where id between 1 and 20;
commit;

update t1 set v1 = rpad('x',1200) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',3)
My test database was using 8KB blocks (hence the 74 rows per block), and 1MB uniform extents with freelist management. The procedure dump_seg() takes a segment name as its first parameter and a number of blocks as the second (then the segment type and starting block as the third and fourth) and dumps the first N data blocks of the segment. To demonstrate what goes on, I’ve extracted the content of the first row (id = 0) after each of the four dumps:
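(The dump_seg() procedure itself isn’t listed in this post; the following is just a minimal sketch of the sort of thing it might do, assuming dictionary access and the freelist-managed segment described above. The parameter names and the reliance on dba_extents are assumptions for illustration, not the actual code.)

create or replace procedure dump_seg(
	i_seg		in varchar2,
	i_blocks	in number	default 1,
	i_type		in varchar2	default 'TABLE',
	i_start		in number	default 1
)
authid current_user
as
	m_file	number;
	m_block	number;
begin
	--
	-- Find the first extent of the segment. With freelist management
	-- block 0 of extent 0 is the segment header, so data block 1 of
	-- the segment is the block after it.
	--
	select	file_id, block_id
	into	m_file, m_block
	from	dba_extents
	where	owner		= user
	and	segment_name	= upper(i_seg)
	and	segment_type	= upper(i_type)
	and	extent_id	= 0;

	--
	-- Dump i_blocks consecutive blocks, starting at data block i_start.
	--
	execute immediate
		'alter system dump datafile ' || m_file		||
		' block min ' || (m_block + i_start)		||
		' block max ' || (m_block + i_start + i_blocks - 1);
end;
/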
After the first update, the column count (cc) is zero and the "next rowid" (nrid) is row 1 of block 0x0140000b:

tab 0, row 0, @0xb3
tl: 9 fb: --H----- lb: 0x2  cc: 0
nrid:  0x0140000b.1

After the second update, the next rowid is row 7 of block 0x0140000c:

tab 0, row 0, @0xb3
tl: 9 fb: --H----- lb: 0x1  cc: 0
nrid:  0x0140000c.7

After the third update (shrinking the row), the row hasn't moved from block 0x0140000c:

tab 0, row 0, @0xb3
tl: 9 fb: --H----- lb: 0x2  cc: 0
nrid:  0x0140000c.7

After the fourth update (making space, and growing the row too much), the row moves back home:

tab 0, row 0, @0x4c1
tl: 1208 fb: --H-FL-- lb: 0x2  cc: 2
col  0: [ 1]  80
col  1: [1200]
 78 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
My calls to dump blocks included the blocks where the row migrated to, so we’ll have a look at the target locations (as given by the original row location’s nrid) in those blocks over time. First we check block 0x0140000b, row 1 after the first two migrations:
tab 0, row 1, @0x1d7f
tl: 414 fb: ----FL-- lb: 0x2  cc: 2
hrid: 0x0140000a.0
col  0: [ 1]  80
col  1: [400]

tab 0, row 1, @0x1d7f
tl: 2 fb: ---DFL-- lb: 0x1
After the first migration (the row arrives here) we have a “head rowid” (hrid) pointer telling us where the row came from. After the second migration, when the row has moved on, we simply have a typical “deleted stub” – two bytes reserving the row directory entry until the commit has been done and cleaned out.
Then we can examine the second target (0x0140000c, row 7) after the second, third and fourth updates:
tab 0, row 7, @0x1966
tl: 814 fb: ----FL-- lb: 0x2  cc: 2
hrid: 0x0140000a.0
col  0: [ 1]  80
col  1: [800]

tab 0, row 7, @0xb1
tl: 62 fb: ----FL-- lb: 0x1  cc: 2
hrid: 0x0140000a.0
col  0: [ 1]  80
col  1: [50]

tab 0, row 7, @0xb1
tl: 2 fb: ---DFL-- lb: 0x2
As you can see, on arrival this location gets the original rowid as its “head rowid” (hrid), and it knows nothing about the intermediate block where the row was briefly in transit. I’ve copied the length byte (in square brackets) of column 1 in the dumps so that you can see that the row stayed put as it shrank. We can then see on the last update that we are left with a deleted stub in this block as the row migrates back to its original location when we try to extend it beyond the free space in this block.
Migrated rows are only ever one step away from home. It’s not nice to have too many of them, but it’s not necessarily a disaster.
Anvil of God (Book One of the Carolingian Chronicles) by J. Boyce Gleason brings to mind “Game of Thrones” with a factual historical setting instead of dragons (although there is a dragon reference in the extensive footnotes and historical record). Gleason creates a complex and delightful (at times violent and blunt, at others subtle) set of characters, assigned with plausibility mostly to actual historical figures. As he weaves his tapestry with fictional details added to the actual historical record, you’ll find yourself rooting for and admiring some, and hoping the hammer falls on others. I doubt you’ll put it down until you take a break to be fresh for the Author’s Notes. You’ll want to be fresh as Gleason takes you through the historical references that justify his character choices. Now don’t read the notes until you finish the rest, but don’t skip them either.
Recently appeared on MOS: “Bug 18219084 : DIFFERENT EXECUTION PLAN ACROSS RAC INSTANCES”
Now, I’m not going to claim that the following applies to this particular case – but it’s perfectly reasonable to expect to see different plans for the same query on RAC, and it’s perfectly possible for the two different plans to have amazingly different performance characteristics; and in this particular case I can see an obvious reason why the two nodes could have different plans.
Here’s the query reported in the bug:
SELECT	/*+ INDEX(C IDX3_GOD_USE_LOT) */
	PATTERN_ID, STB_TIME
FROM	mfgdev.MTR_AUTO_GOD_AGENT_BT C
WHERE	1 = 1
AND	EXISTS (
		SELECT	/*+ INDEX(B IDX_MTR_STB_LOT_EQP) */
			1
		FROM	MFGDEV.MTR_STB_BTH B
		WHERE	B.PATTERN_ID = C.PATTERN_ID
		AND	B.STB_TIME = C.STB_TIME
		AND	B.ACTUAL_START_TIME < SYSDATE
		AND	EXISTS (
				SELECT	/*+ INDEX(D CW_LOTID) */
					1
				FROM	F14DM.DM_CW_WIP_BT D
				WHERE	D.LOT_ID = B.LOT_ID
				AND	D.SS = 'BNKI'
			)
	);
See the reference to “sysdate”? I can show you a system where you had a 15-minute window each day (until the problem was addressed) in which to optimize a particular query if you wanted a good execution plan; if you optimized it at any other time of day you got a bad plan. The problem was sysdate: it acts like a peeked bind variable.
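As a hypothetical sketch of the mechanism (the table and column names below are invented, not from the bug report): at hard parse time the optimizer evaluates sysdate and compares the resulting value with the column statistics to estimate how many rows the predicate will return, just as it would with a peeked bind.

rem
rem	Invented example: the moment of hard parsing determines the plan.
rem	Parsed at 1:00 am the predicate may look like it covers almost no
rem	data (favouring an index); parsed late in the day, after many rows
rem	have arrived since stats were gathered, the estimate (and possibly
rem	the plan) can be quite different.
rem

select	count(*)
from	order_queue
where	queued_date > sysdate - 1/24
;

Each node that hard parses the statement fixes a plan in its own library cache, so two nodes parsing at different times of day can easily end up with two different plans.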
Maybe, on this system, if you optimize the query at 1:00 am you get one plan, and at 2:00 am you get another – and if those two optimizations occur on different nodes you’ve just met the conditions of this bug report.
Here’s another thought to go with this query: apparently it’s caused enough problems in the past that someone’s written a couple of hints into the code. With three tables and two correlated subqueries in the code a total of three index() hints is not enough. If you’re going to hard-code hints into a query then take a look at the outline it generates when it does the right thing, and that will tell you about the 15 or so hints you’ve missed out. (Better still, consider generating an SQL Baseline from the hinted code and attaching it to the unhinted code.)
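If you did want to try the baseline approach, a minimal sketch would look something like the following. All three identifiers are SQL*Plus substitution placeholders you would have to look up yourself: the sql_id and plan_hash_value of the hinted statement from v$sql, and the sql_handle of the unhinted statement from dba_sql_plan_baselines (which assumes a baseline already exists for the unhinted statement, so that it has a handle to attach to).

declare
	m_plans	pls_integer;
begin
	--
	-- Copy the good plan of the hinted statement from the cursor
	-- cache and attach it to the SQL handle of the unhinted statement.
	--
	m_plans := dbms_spm.load_plans_from_cursor_cache(
		sql_id          => '&m_hinted_sql_id',
		plan_hash_value => &m_hinted_plan_hash,
		sql_handle      => '&m_unhinted_sql_handle'
	);
	dbms_output.put_line('Plans loaded: ' || m_plans);
end;
/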
In the last tutorial we added a file called Hello.txt in the root of our new repository. In this tutorial we will take a look at what it takes to update an existing file with new content.
For those who haven’t looked at this in a while: these days, it’s dirt simple to attach a file to your SR directly from the server command line.
curl -T /path/to/attachment.tgz -u "firstname.lastname@example.org" "https://transport.oracle.com/upload/issue/0-0000000000/"
Or to use a proxy server,
curl -T /path/to/attachment.tgz -u "email@example.com" "https://transport.oracle.com/upload/issue/0-0000000000/" -px proxyserver:port -U proxyuser
There is lots of info on MOS (really old people call it Metalink); doc 1547088.2 is a good place to start. There are some other ways to do this too, but really you can skip all that: you just need the single line above!