Hadoop adoption typically begins with the desire to create new analytic applications fueled by data that was not previously being captured. While each application is invariably unique to an industry or organization, the types of data involved have much in common.

Common types of Big Data:

    • Social Media Data: Win customers' hearts: With Hadoop, you can mine Twitter, Facebook and other social media conversations for sentiment data about you and your competition, and use it to make targeted, real-time decisions that increase market share.
    • Server Log Data: Fortify security and compliance: Security breaches happen. And when they do, your server logs may be your best line of defense. Hadoop takes server-log analysis to the next level by speeding up and improving security forensics and providing a low-cost platform to show compliance.
    • Web Clickstream Data: Show them the way: How do you move customers on to bigger things, like submitting a form or completing a purchase? Get more granular with customer segmentation. Hadoop makes it easier to analyze, visualize and ultimately change how visitors behave on your website.
    • Machine and Sensor Data: Gain insight from your equipment: Your machines know things. From out in the field to the assembly line floor, machines stream low-cost, always-on data. Hadoop makes it easier for you to store and refine that data and identify meaningful patterns, providing you with the insight to make proactive business decisions.
    • Geolocation Data: Profit from predictive analytics: Where is everyone? Geolocation data is plentiful, and that's part of the challenge. The costs to store and process voluminous amounts of data often outweigh the benefits. Hadoop helps reduce data storage costs while providing value-driven intelligence, from asset tracking to predicting behavior, to enable optimization.

    Industry use cases of Hadoop

    Advertisers use Apache Hadoop for confident Advertising & Promotion

    Consumers have never generated so much data on how they research, discuss and buy products. This new data is valuable for shaping and promoting a brand or product, but it doesn't line up neatly to fit in pre-defined, tabular formats. Apache Hadoop brings this "new" data under analysis by ingesting social media, clickstream, video and transaction data without requiring a pre-defined data schema. This new data can be joined with existing structured data sets for deeper sentiment analysis and targeted promotion.

    Mine Grocery & Drug Store POS Data to Identify High-Value Shoppers

    Interactive query with the Stinger Initiative and Apache Hive running on YARN helps a company rapidly process terabytes of point-of-sale data to keep pace with a market that changes by the day. Manufacturers, retailers, and ad agencies use the combined analysis to position their brands or improve their retail experiences, particularly for high-value customers.

    Financial companies use Hadoop

    Banks, insurance companies and securities firms that store and process huge amounts of data in Apache Hadoop have better insight into both their risks and opportunities. Deeper analysis and insight can improve operational margins and protect against one-time events that might cause catastrophic losses.

    Screen New Account Applications for Risk of Default

    Apache Hadoop can store and analyze multiple data streams and help regional bank managers control new account risk in their branches. They can match banker decisions with the risk information presented at the time of decision. This allows them to control risk by sanctioning individuals, updating policies, and identifying patterns of fraud. Over time, the accumulated data informs algorithms that may detect subtle, high-risk behavior patterns unseen by the bank's risk analysts.

    Monetize Anonymous Banking Data in Secondary Markets

    Retail banks have turned to Apache Hadoop as a common cross-company data lake for data from different lines of business (LOBs): mortgage, consumer banking, personal credit, wholesale and treasury banking. Both internal managers and consumers in the secondary market derive value from the data. A single point of data management allows the bank to operationalize security and privacy measures such as de-identification, masking, encryption, and user authentication.
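
    As a rough illustration of the de-identification and masking measures mentioned above, the sketch below replaces a customer identifier with a keyed hash and hides all but the last digits of an account number. The field names (customer_id, account_number, line_of_business, balance), the key, and the helper functions are hypothetical and chosen only for this example; a real deployment would draw its key from a key management system and follow the bank's own policies.

```python
import hashlib
import hmac

# Hypothetical secret key, for illustration only; never hard-code keys in practice.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(identifier: str) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same input always maps to the same token, so records remain
    joinable, but the original value is not readable from the token.
    """
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_account_number(number: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of an account number."""
    if len(number) <= visible:
        return "*" * len(number)
    return "*" * (len(number) - visible) + number[-visible:]


def de_identify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed or masked."""
    return {
        "customer_token": pseudonymize(record["customer_id"]),
        "account": mask_account_number(record["account_number"]),
        "line_of_business": record["line_of_business"],
        "balance": record["balance"],
    }
```

    The keyed hash keeps records from different lines of business linkable by token, which is one plausible way to share data downstream without exposing the raw identifier.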

    Public Sector Uses Hadoop for Efficient Government & National Defense

    The public sector is charged with protecting citizens, responding to constituents, providing services and maintaining infrastructure. In many instances, the demands of these responsibilities increase while government resources simultaneously shrink under budget pressures. How can government, defense and intelligence agencies and government contractors do more with less? Apache Hadoop is part of the answer. The open source Apache Hadoop framework is philosophically aligned with the transparency we expect from good government.

    The following are some of the ways public sector customers use Hadoop.

    • Understand Public Sentiment about Government Performance

    One federal ministry in a European country wanted to better understand the views of its constituents related to a major initiative to reduce obesity. Direct outreach for feedback might have been effective for a few high-quality interactions with a small number of citizens or school-age children, but those methods lacked both reach and persistence. So the ministry started analyzing social media posts related to its program to reduce obesity. Every day, a team uses Hadoop to analyze tweets, posts and chat sessions and give daily sentiment reports to members of parliament for rapid feedback on which policies work and which flop.
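
    The daily sentiment report described above can be pictured with a very small sketch. It assumes a toy word list and posts already tagged with a day label; the word sets, function names, and scoring rule are all illustrative assumptions, not the ministry's actual method, and a real pipeline would run a proper sentiment model across the cluster.

```python
from collections import defaultdict

# Toy sentiment lexicon, assumed for illustration only.
POSITIVE = {"good", "great", "helpful", "love"}
NEGATIVE = {"bad", "useless", "waste", "hate"}


def score_post(text):
    """Score one post: +1 per positive word, -1 per negative word."""
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def daily_sentiment(posts):
    """Average the post scores per day.

    posts: iterable of (day_label, text) pairs.
    Returns a dict mapping each day label to its mean score.
    """
    totals = defaultdict(lambda: [0, 0])  # day -> [score sum, post count]
    for day, text in posts:
        totals[day][0] += score_post(text)
        totals[day][1] += 1
    return {day: total / count for day, (total, count) in totals.items()}
```

    Grouping by day and averaging is the same map-and-reduce shape that Hadoop applies at scale: score each post independently, then aggregate per key.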

    • Protect Critical Networks from Threats (Both Internal and External)

    Large IT networks generate server logs with data on who accesses the network and the actions that they take. Server log data is typically seen as exhaust data, characterized by a "needle-in-a-haystack" dilemma: almost all server logs have no value, but some logs contain information critical to national defense. The challenge is to identify actual risks amongst the noise, before they lead to loss of classified information. Intruders now plan long-term, strategic campaigns referred to as "Advanced Persistent Threats" (APTs). Both internal actors like Edward Snowden and external attackers in foreign governments conduct sophisticated, multi-year intrusion campaigns. Hadoop's processing power makes it easier to find the "needles" left by these intruders across the different data "haystacks".
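
    To make the "needle-in-a-haystack" search concrete, here is a minimal sketch that counts failed logins per user and flags anyone at or above a threshold. The four-field log format (timestamp, user, action, status), the threshold, and the function names are assumptions made for this example; real server logs vary widely and real detection of persistent threats is far more involved.

```python
from collections import Counter


def failed_logins_by_user(lines):
    """Count failed login attempts per user.

    Assumes each line has four whitespace-separated fields:
    timestamp, user, action, status. Other lines are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed or unrelated lines
        _timestamp, user, action, status = parts
        if action == "LOGIN" and status == "FAIL":
            counts[user] += 1
    return counts


def flag_suspicious(lines, threshold=3):
    """Return a sorted list of users with at least `threshold` failed logins."""
    counts = failed_logins_by_user(lines)
    return sorted(user for user, n in counts.items() if n >= threshold)
```

    Most lines contribute nothing to the result, which mirrors the point above: the value lies in the few records that survive the filter.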

    Healthcare uses Hadoop to save lives while delivering more efficient care

    Difficult challenges and choices face today's healthcare industry. Hospital administrators, technology and pharmaceutical providers, researchers, and clinicians have to make important decisions, often without enough accurate, transparent data.

    Here are some ways that Hadoop makes data less expensive and more available, so that patients have more choices, doctors have more insight, and pharmacy and device manufacturers can deliver more effective, reliable products:

    • Monitor Patient Vitals in Real-Time

    New wireless sensors can capture and transmit patient vitals at much higher frequencies, and these measurements can stream into a Hadoop cluster. Caregivers can use these signals for real-time alerts to respond more promptly to unexpected changes. Over time, this data can feed algorithms that predict the likelihood of an emergency before it could be detected with a bedside visit.
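
    A simple way to picture real-time alerting on streamed vitals is a rolling average checked against limits. The sketch below assumes heart-rate readings arrive one at a time; the window size, the low and high limits, and the function name are illustrative assumptions and not clinical guidance.

```python
from collections import deque


def heart_rate_alerts(readings, window=3, low=50, high=120):
    """Yield (index, rolling_average) when the average leaves [low, high].

    readings: iterable of heart-rate values in beats per minute.
    The average is taken over the last `window` readings, which smooths
    out a single noisy measurement.
    """
    recent = deque(maxlen=window)  # keeps only the newest `window` readings
    for i, bpm in enumerate(readings):
        recent.append(bpm)
        if len(recent) == window:
            average = sum(recent) / window
            if average < low or average > high:
                yield i, average
```

    Because the function is a generator, it can consume an open-ended stream and emit alerts as soon as a window crosses a limit, rather than waiting for a full batch.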

    • Store Medical Research Data Forever

    Medical and scientific researchers at universities live by the "publish or perish" code. Data supporting a given paper used to be appended in an Excel spreadsheet, but many of today's data sets are just too large. Nevertheless, supporting data sets must remain perpetually available alongside their paper. If the data disappears, the paper becomes unsubstantiated.

    Manufacturers use Apache Hadoop to Increase Production, Reduce Costs & Improve Quality

    Manufacturing managers try to do three basic things: increase volume, reduce cost and improve quality. Without the right technology, these goals might seem to conflict. For example, how do we produce a better product while also reducing the cost to produce each unit? Now relatively inexpensive sensors can gather and frequently transmit data along many points in the supply chain and production line. This flow of real-time sensor and machine data allows manufacturers to quickly identify problems as they occur.

    • Control Quality with Real-Time & Historical Assembly Line Data

    When a product is returned with problems, the manufacturer can do forensic tests on the product and combine the forensic data with the original sensor data from when the product was manufactured. This added visibility, across a large number of products, helps the manufacturer improve the process and product to levels not possible in a data-scarce environment.

    • Avoid Stoppages with Proactive Equipment Maintenance

    Machine learning algorithms can compare maintenance events and machine data for each piece of equipment to its history of malfunctions. These algorithms can derive optimal maintenance schedules, based on real-time information and historical data. This can help maximize equipment utilization, minimize P&E expense, and avoid surprise work stoppages.


    We have tried to include many examples of Big Data. If you are aware of more examples, let us know and we will be happy to include them as well.

    To gain more insight into Big Data and Hadoop, please check out our course "Advanced Big Data and Hadoop course".