Friday, 3 April 2015

Many people struggle to get started with Big Data and Hadoop. This blog is meant to help new learners.

BIG DATA


Big Data refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyse.
 “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or does not fit the structures of existing database architectures. To gain value from these data, there must be an alternative way to process it.”
Characteristics of Big Data
Big Data is not just about the size of data but also includes data variety and data velocity. Together, these three attributes form the three Vs of Big Data.



Volume is synonymous with the “big” in the term “Big Data”. Volume is also relative: a smaller organisation may hold mere gigabytes or terabytes of data, as opposed to the petabytes or exabytes of data that big global enterprises have.
Variety means that data comes from a variety of sources and in a variety of types. With the explosion of sensors, smart devices and social networking, data in an enterprise has become complex because it includes not only traditional structured relational data, but also semi-structured and unstructured data.
Velocity refers to the speed at which data is generated and must be processed.
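To put the volume scale in perspective, here is a small Python sketch that expresses a byte count in the units mentioned above. It assumes decimal (SI) units, where each step is a factor of 1,000; binary units (KiB, MiB and so on) use 1,024 instead. The function name `human_readable` is just illustrative.

```python
# Decimal (SI) storage units, each 1,000 times larger than the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest SI unit that keeps the value >= 1."""
    value = float(num_bytes)
    unit_index = 0
    # Keep dividing by 1,000 until the value is small enough or we run out of units.
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    return f"{value:.1f} {UNITS[unit_index]}"


if __name__ == "__main__":
    # A small company, a mid-sized one, and two "big global enterprise" scales.
    for n in (5 * 10**9, 2 * 10**12, 3 * 10**15, 10**18):
        print(n, "bytes =", human_readable(n))
```

Running it shows that an exabyte is a million terabytes, which is why tools built for gigabytes or terabytes stop coping at enterprise scale.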

Structured data: 
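          Conventional data with a fixed format, such as text, numbers, dates and times, which fits neatly into the rows and columns of a relational table.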

Semi-structured data:
        Data that carries some organising tags or markers but no rigid schema, such as XML files, text files and XLS documents.

Unstructured data:
           This type of data consists of formats which cannot easily be indexed into relational tables for analysis or querying. Examples include images, audio and video files.
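The three kinds of data can be contrasted with a minimal sketch using only Python's standard library (`sqlite3` and `xml.etree.ElementTree`). The records, table names and values below are hypothetical toy data chosen for illustration: structured rows go straight into a table, semi-structured XML has to be parsed and its records may have different fields, and unstructured bytes can only be stored as an opaque blob.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Structured: every record has the same fields, so it maps directly to a table
# and can be queried with SQL.
conn.execute("CREATE TABLE sales (id INTEGER, item TEXT, amount REAL, sold_on TEXT)")
conn.execute("INSERT INTO sales VALUES (1, 'pen', 2.50, '2015-04-03')")
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# Semi-structured: tags describe the data, but records need not share the same fields.
xml_doc = """
<customers>
  <customer id="1"><name>Asha</name><email>asha@example.com</email></customer>
  <customer id="2"><name>Ravi</name></customer>
</customers>
"""
root = ET.fromstring(xml_doc)
# findtext returns None when a record lacks the requested tag.
emails = [c.findtext("email") for c in root.findall("customer")]

# Unstructured: raw bytes with no fields to index; only the whole blob can be stored.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])  # first bytes of a PNG header, for illustration
conn.execute("CREATE TABLE media (id INTEGER, content BLOB)")
conn.execute("INSERT INTO media VALUES (?, ?)", (1, image_bytes))

print("total sales:", total)
print("emails:", emails)
```

Note that SQL can sum the structured `amount` column directly, while nothing inside the stored image can be queried without extra processing such as image analysis.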



Why is Big Data important?
                
                   The convergence across business domains has ushered in a new economic system that is redefining relationships among producers, distributors and consumers of goods and services. In an increasingly complex world, business verticals are intertwined, and what happens in one vertical has a direct impact on others. Within an organisation, this complexity makes it difficult for business leaders to rely solely on experience to make decisions; they need good data services to support those decisions. By placing data at the heart of business operations to provide access to new insights, organisations will be able to compete more effectively.


Three things have come together to drive attention to Big Data:

1. The technologies to combine and interrogate Big Data have matured to a point where their deployments are practical.
2. The underlying cost of the infrastructure to power the analysis has fallen dramatically, making it economic to mine the information.
3. The competitive pressure on organisations has increased to the point where most traditional strategies are offering only marginal benefits. Big Data has the potential to provide new forms of competitive advantage for organisations.

Proliferation of the Internet of Things:
                      According to Cisco's Internet Business Solutions Group (IBSG), 50 billion devices will be connected to the Web by 2020. Meanwhile, Gartner reported that more than 65 billion devices were connected to the internet by 2010. By 2020, this number will go up to 230 billion.

Strong open source initiatives:

                      Many of the technologies within the Big Data ecosystem have an open source origin, thanks to participation, innovation and sharing by commercial providers in open source development projects. The Hadoop framework, in conjunction with additional software components such as the open source R language and a range of open source NoSQL (“Not Only SQL”) databases such as Apache Cassandra and Apache HBase, is at the core of many Big Data discussions today.
Increasing investments in Big Data technologies:

                       Information has always been a differentiator in the business world, allowing better business decisions to be made in an increasingly competitive landscape. Previously, market information was largely made available through traditional market research and data specialists. Today, virtually any company with a large dataset can potentially become a serious player in the new information game. The value of Big Data will become more apparent to corporate leadership as companies seek to become more “data-driven” organisations.

                        A Big Data Insight Group survey of senior personnel from a broad range of industry sectors revealed that many organisations are seeing Big Data as an important area for their organisations. Among the respondents, 50% indicated current research into, and sourcing of, Big Data solutions while another 33% acknowledged that they were implementing or had implemented some form of Big Data solutions. This survey indicates that many organisations perceive Big Data as an important development and this interest could translate into future demand for Big Data technologies.


Research and development involving high-performance computing:


                       Research and development that involves data-intensive workloads, such as high-performance computing, life sciences and earth sciences, finds value in Big Data technologies. For example, at CERN, the European particle physics laboratory outside Geneva, physicists studying the results of tests at the Large Hadron Collider have to decide how to store, process and distribute the usable information accruing from the petabytes of data generated annually.


                      Big Data technologies have to be in place to support such R&D efforts, because these efforts must cope with the growth of digital content and need more efficient analysis outputs. Traditional technologies such as symmetric multiprocessing (SMP), which also enables system scalability, can be prohibitively expensive for many granular R&D use cases. Hence, the need for cost-efficient, scalable hardware and software resources to process the related business logic and data volumes is more apparent than ever.

Next: Hadoop...