oracle aide

April 12, 2015

Pyramids and leverage

Filed under: Uncategorized — oracleaide @ 7:30 pm

Leverage is how consulting firms make money.

Tall pyramids. Low leverage.

Flat pyramids. High leverage.

Finders, Minders, and Grinders.



March 31, 2015

What I learned today: Digital Taylorism and Holacracy

Filed under: Uncategorized — oracleaide @ 11:19 pm

There is increasing evidence that “new forms of bureaucratic control and repetitive tasks have been extended to the information sector”, or Digital Taylorism.

Or there is this view – there is a high road and a low road that will be followed:

The high road variant can also be associated with the high-trust, high performance firm. Its main features are: decentralisation, creation of comprehensive tasks, establishment of work groups, promotion of competence development and sharing of knowledge as well as interdepartmental co-operation and integrated product development.

The low road type strives to achieve competitiveness through cost-cutting, which among other things expresses itself in staff reduction or outsourcing. For the internal organisation of work this mode means: organisation of work processes according to value creation aspects, acceleration of the processes through the grouping of individual work tasks and activities into business processes, intensification of work, and a tendency to divide staff into a highly qualified core and a low-qualified periphery that is employed to balance out capacity fluctuations.

The starkest portrait of Work to Come is this one – that the future of work, for many people, will be them strapped to an automated digital workflow, continuously prodded and monitored while doing the tasks that machines cannot yet do well or cheaply enough.

It’s actually not a new idea. As an employee at one of Hsieh’s Las Vegas start-ups pointed out to me during a visit, 18th-century Caribbean pirates created some of the flattest organizational structures in history, long before the theorists Rensis Likert and Stanley Udy touched on such notions in their foundational works in the 1950s and 1960s. The pirates democratically elected captains and other officers (and had the ability to depose them at any time), shared treasure equally, had extensive welfare plans for injured colleagues, and wrote their own constitutions (“ships’ articles”) to permit people from diverse nationalities, races, and religions to collaborate successfully. Their “circles” were vessel teams. Though members came and went, what happened in one crew stayed with that crew. Transparency was limited to those on the vessel at any given time.

This system existed across nearly all the pirate vessels. Holacracy isn’t that common in today’s organizations, but it’s gaining traction, which makes sense. In an age of transparency, people will grab whatever bits of privacy they can find to experiment without retribution. This is an approach that allows organizations to adaptively support them rather than thwart their efforts.

March 5, 2015

Amazon is looking for Data Engineers

Filed under: Uncategorized — oracleaide @ 6:59 pm

Senior Data Engineer – Featured Merchant Algorithm

Amazon’s planet-scale Retail platform has created the largest marketplace in human history. Affording customers unprecedented product selection and merchants access to a global market, our team leverages sophisticated machine learning and big data technologies to allow customers to discover the right product at the right price from the most trusted merchant billions of times every day. We enable over 90% of all purchases on Amazon by choosing which offer wins the Detail Page Buy Box and our service directly impacts the business metrics of every Amazon channel (Retail, FBA and Marketplace) worldwide. If you’re looking for a career-defining opportunity on one of the most visible teams within Amazon, we’d love to hear from you.

Our success depends on our ability to manage and analyze the data that our customers generate. We are looking for an outstanding engineer with a great business sense who has the ability to analyze and understand large amounts of data and help make data-driven key strategic decisions that will drive several customer focused initiatives. Working with our science, engineering and multiple business teams, you will have the opportunity to impact customer experience, design, architecture, and implementation of multiple customer friendly features on Amazon.

In this position, you will be working in one of the world’s largest and most complex data warehouse environments. You should be passionate about working with huge datasets and be someone who loves to bring datasets together to answer business questions. You should have deep expertise in creation and management of datasets and the proven ability to translate the data into meaningful insights. In this role, you will have ownership of end-to-end development of solutions to complex questions, and you’ll play an integral role in strategic decision-making.

The right candidate will possess excellent business and communication skills, be able to work with business owners to develop and define key business questions, and be able to build analyses to answer those questions. You will have regular interactions with, and will present to, senior leaders in Amazon.

Basic Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Mathematics, Statistics, Finance or related technical field.
  • 4+ years of relevant employment experience.
  • Knowledge and direct experience using at least one industry standard business intelligence reporting tool.
  • Experience in gathering requirements and formulating business metrics for reporting.
  • Excellent knowledge of Oracle SQL and Excel.
  • Experience using SQL, ETL and databases in a business environment with large-scale, complex datasets.
  • Strong verbal/written communication & data presentation skills, including an ability to effectively communicate with both business and technical teams.

Preferred Qualifications

  • Previous E-Commerce Experience
  • Experience managing scorecards and metrics dashboards

January 4, 2015


Filed under: Uncategorized — oracleaide @ 6:06 pm

This is the word that comes to mind while reading the venerable Dr. R. Kimball. With his flamboyant style he wouldn’t last a week at my job. It is too bad that OpenAmplify removed its free web app. It would be interesting to run his text through their API and see the scores.

It looks like good tech writing should be conducive to knowledge extraction into RDF and, consequently, to knowledge exploration via SPARQL or OWL. Which raises the question: “Should we write for humans or for machines?” From my observations, if machines understand a piece of text, then humans definitely will too.

Change velocity – for code and data

Filed under: Uncategorized — oracleaide @ 5:53 pm

It is a known practice to refactor code by its change velocity. Ideally, source code should be resilient to change, and volatile logic should go into a configuration layer (config files or, better, “convention over configuration”).

A similar pattern is known in the DW world. Separation of facts from dimensions is just a single use case of consolidating / grouping / separating data by their change velocity.

Slowly changing dimensions are another example. There are at least two classes of dimensions – static and slowly changing.

Are there fast changing dimensions? Do we call them facts?

December 30, 2014

Slowly changing dimension types 0-4

Filed under: Uncategorized — oracleaide @ 1:02 am

The common enumeration of SCDs using types 0-7 encodes a single attribute: where the history is stored.

I think there is an obvious pattern for types 0-4 (and emerging mnemonics):

Type 0: history is stored nowhere.

Type 1: history is stored in the dimension itself, in a single current row (a history of length 1 means no history).

Type 2: history is stored in the dimension itself, in extra rows (history length is 2+)

Type 3: history is stored in the dimension itself, in extra columns (something about 3rd dimension?)

Type 4: history is stored in a separate table.
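The difference between types 1, 2 and 3 is easier to see on a single attribute change. Below is a minimal Python sketch, not a warehouse implementation: the customer dimension, the column names (`city`, `previous_city`, `valid_from`, `valid_to`) and the helper functions are all hypothetical and chosen only for illustration.

```python
from datetime import date

# All table and column names below are hypothetical, for illustration only.

def apply_type1(row, new_city):
    """Type 1: overwrite in place; the single current row is all the 'history'."""
    row["city"] = new_city
    return row

def apply_type2(rows, key, new_city, changed_on):
    """Type 2: close the current row and append a new one (history in extra rows)."""
    for row in rows:
        if row["customer_id"] == key and row["valid_to"] is None:
            row["valid_to"] = changed_on
    rows.append({"customer_id": key, "city": new_city,
                 "valid_from": changed_on, "valid_to": None})
    return rows

def apply_type3(row, new_city):
    """Type 3: keep the previous value in an extra column."""
    row["previous_city"] = row["city"]
    row["city"] = new_city
    return row

t1 = apply_type1({"customer_id": 1, "city": "Boston"}, "Seattle")
t2 = apply_type2(
    [{"customer_id": 1, "city": "Boston",
      "valid_from": date(2014, 1, 1), "valid_to": None}],
    1, "Seattle", date(2014, 12, 30))
t3 = apply_type3({"customer_id": 1, "city": "Boston", "previous_city": None},
                 "Seattle")
```

After the change, the type 1 row has lost "Boston", the type 2 dimension holds two rows (one closed, one current), and the type 3 row carries "Boston" in its extra column.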

December 28, 2014

Analysis of time series in RDBMS

Filed under: Uncategorized — oracleaide @ 6:33 pm

Summary: pivoted time series allow trade-offs between speed, space, convenience, scalability.

Many scientists and analysts (a.k.a. humans) visualize time series horizontally: the time axis runs from left to right and the values run parallel to it. The series array is often sparse: e.g. there is no data point for January 2, but an array element still has to be allocated so that January 1 and January 3 stay two days apart. In RDBMS terms, the time information is stored in columns (even in column names) and the values are stored in rows, which is worse, since columns are static and defined via DDL.


This approach is intuitive and friendly to humans, but not to databases.
Sparse data do waste space. In the world of databases wasted space = wasted time.

Dates as columns are rigid and require developers to hard-code dates in analytic SQL.

An alternative approach is to pivot the model: the time axis goes into a single column, and each value series goes into a column of its own.

Every row corresponds to a single point in time and contains values from all the columns (e.g. forecast at P50, P80).

The data become dense. Since dates are not a part of the model (no fixed columns), there is no need to keep empty rows for missing data. The absence of hard-coded dates makes SQL simple and compact, and makes it easier to run analysis over a moving window of the date range.

January 31, 2014

Auftragstaktik vs Kanbanese

Filed under: Uncategorized — Tags: — oracleaide @ 12:25 am

Shouldn’t German tactics work better than Oriental methodologies? 

At least in case of “culturally American” developers?

In such a case the next logical step should be – implement a Blitzkrieg methodology.


December 14, 2013

Top-N with group by in Hive – without analytic functions or ranking UDFs

Filed under: Uncategorized — Tags: — oracleaide @ 2:31 am

Since Hive 0.11 and its analytic functions are not available for my current project, I have to resort to simple remedies.

Such as a massive union of top-N queries:

select * from (

select * from (select * from ( select * from v_diff where group_id = 1 ) aa order by abs(diff) desc ) bb limit 100
union all
select * from (select * from ( select * from v_diff where group_id = 2 ) aa order by abs(diff) desc ) bb limit 100
union all
select * from (select * from ( select * from v_diff where group_id = 3 ) aa order by abs(diff) desc ) bb limit 100
union all
select * from (select * from ( select * from v_diff where group_id = 4 ) aa order by abs(diff) desc ) bb limit 100
union all
select * from (select * from ( select * from v_diff where group_id = 5 ) aa order by abs(diff) desc ) bb limit 100
union all
select * from (select * from ( select * from v_diff where group_id = 6 ) aa order by abs(diff) desc ) bb limit 100
union all
select * from (select * from ( select * from v_diff where group_id = 7 ) aa order by abs(diff) desc ) bb limit 100

) uu;
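A union like this is repetitive enough to be generated rather than typed. The Python sketch below only builds the query text and has not been run against Hive; `build_top_n_union` is a hypothetical helper, and the branch template simply mirrors the hand-written branches above.

```python
def build_top_n_union(group_ids, n, view="v_diff"):
    """Return one 'union all' query holding the top-n rows of each group."""
    branch = (
        "select * from (select * from ( select * from {view} "
        "where group_id = {gid} ) aa order by abs(diff) desc ) bb limit {n}"
    )
    branches = [branch.format(view=view, gid=int(g), n=int(n))
                for g in group_ids]
    # Joining (rather than appending) never leaves a dangling 'union all'.
    return "select * from (\n" + "\nunion all\n".join(branches) + "\n) uu;"

print(build_top_n_union(range(1, 8), 100))
```

Joining the branches with `union all` as the separator guarantees that the last branch is followed directly by the closing `) uu;`.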


Where v_diff is a simple view comparing two partitions in a single table:

desc v_diff;

group_id string

diff float

Contrary to my expectations, the query took a while to complete.

According to the job tracker, all the individual select statements went into a single queue.

A small change in the hive session settings made a difference. 

set hive.exec.parallel=true;

set hive.exec.parallel.thread.number=16;

With parallel execution turned ON, Hadoop launched 16 queries in parallel and completed the whole select much faster.

Something I would take for granted in an Oracle database with parallel execution. 


November 3, 2013

It is not Big Data, it is Slow Data ©

Filed under: Uncategorized — oracleaide @ 10:27 pm

Just sayin’…

For an average human it is hard to fathom the volume of data he deals with.  

The notion of “a lot of data” changes with Moore’s law and is highly subjective: from a stack of punch cards to a rack of hard drives.

A gigabyte used to mean “a lot of data”.

Not anymore.

What an average human can fathom is his personal perception of how long it takes to process data.

Thus, while working with Hadoop-based technologies, I couldn’t help noticing how long it takes to process small samples compared to relational databases.

That is a small (but annoying) price to pay for the overwhelming speed of processing “a lot of data”.

That is why it is critical to run Pig in local mode when going through tutorials.

pig -x local
