Many people struggle to get started with Big Data and Hadoop. This blog is meant to help new learners.
BIG DATA
Big Data refers to data sets whose size is beyond the ability of typical database
software tools to capture, store, manage and analyse.
“Big data is data that exceeds the processing
capacity of conventional database systems. The data is too big, moves too fast,
or does not fit the structures of existing database architectures. To gain
value from these data, there must be an alternative way to process it.”
Characteristics of Big Data
Big
Data is not just about the size of data but also includes data variety and data
velocity. Together, these three attributes form the three Vs of Big Data.
Volume is synonymous with the “big” in the term “Big Data”. Volume is relative:
smaller organisations are likely to have mere gigabytes or terabytes of data
storage, as opposed to the petabytes or exabytes of data that big global
enterprises have.
Variety refers to data coming from a variety of sources and in a variety of
types. With the explosion of sensors, smart devices and social networking, data
in an enterprise has become complex because it includes not only traditional
structured relational data, but also semi-structured and unstructured data.
Velocity refers to the speed at which data is generated and delivered.
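To get a feel for how volume and velocity relate, here is a minimal Python sketch of the arithmetic. The ingest rate of 100 MB/s and the helper name days_to_accumulate are assumptions chosen only for illustration; they do not come from any particular system. Units are decimal (1 PB = 1000 TB).

```python
# Decimal (SI) storage units, expressed in bytes.
UNITS = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
}

SECONDS_PER_DAY = 86_400


def days_to_accumulate(target_bytes: int, rate_bytes_per_second: float) -> float:
    """Days needed to collect target_bytes at a constant ingest rate."""
    seconds = target_bytes / rate_bytes_per_second
    return seconds / SECONDS_PER_DAY


# Hypothetical ingest rate (velocity) of 100 MB/s, for illustration only.
rate = 100 * 10**6

for name, size in UNITS.items():
    days = days_to_accumulate(size, rate)
    print(f"1 {name} at 100 MB/s takes {days:,.2f} days")
```

At this assumed rate a terabyte arrives in a few hours, while a petabyte takes months, which is why the same velocity can be trivial for one organisation and a storage problem for another.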
Structured data:
Ordinary data that fits into relational tables, such as text, numbers, dates and times.
Semi-structured data:
Formats such as XML, text files and XLS documents.
Unstructured data:
This type of data consists of formats which cannot easily be indexed
into relational tables for analysis or querying. Examples include images, audio
and video files.
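The difference between the three kinds of data can be sketched in a few lines of Python using only the standard library. The table name, XML tags and sample values below are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: fixed columns and types, so it fits a relational table directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, order_date TEXT)")
conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (1, "Alice", "2013-05-01"))
structured_row = conn.execute(
    "SELECT customer FROM orders WHERE id = 1"
).fetchone()

# Semi-structured: self-describing tags, but no fixed schema is enforced.
xml_doc = "<order id='1'><customer>Alice</customer><note>gift wrap</note></order>"
root = ET.fromstring(xml_doc)
semi_structured_value = root.findtext("customer")

# Unstructured: opaque bytes with no fields to query (here, the first four
# bytes of a PNG file header). Only generic facts such as size are available
# without specialised processing.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])
unstructured_size = len(image_bytes)

print(structured_row[0], semi_structured_value, unstructured_size)
```

The structured row can be queried with SQL, the XML value has to be located by tag name, and the image bytes offer nothing to query at all.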
Why is Big Data important?
The convergence across business domains has
ushered in a new economic system that is redefining relationships among
producers, distributors, and consumers of goods and services. In an
increasingly complex world, business verticals are intertwined and what happens
in one vertical has a direct impact on other verticals. Within an organisation,
this complexity makes it difficult for business leaders to rely solely on
experience to make decisions. They need to rely on good data services for their
decisions. By placing data at the heart of the business operations to provide
access to new insights, organisations will then be able to compete more
effectively.
Three things have come together to drive attention to Big
Data:
1. The technologies to combine and interrogate Big Data have
matured to a point where their deployments are practical.
2. The underlying cost of the infrastructure to power the
analysis has fallen dramatically, making it economic to mine the information.
3. The competitive pressure on organisations has increased to
the point where most traditional strategies are offering only marginal
benefits. Big Data has the potential to provide new forms of competitive
advantage for organisations.
Proliferation of the Internet of Things:
According to Cisco's Internet Business Solutions Group (IBSG), 50
billion devices will be connected to the Web by 2020. Meanwhile, Gartner
reported that more than 65 billion devices were connected to the internet by
2010. By 2020, this number will go up to 230 billion.
Strong open source initiatives:
Many of the technologies within the Big Data ecosystem have an open
source origin, due to participation, innovation and sharing by commercial
providers in open source development projects. The Hadoop framework, in
conjunction with additional software components such as the open source R
language and a range of open source NoSQL (Not Only Structured Query Language)
tools such as Cassandra and Apache HBase, is the core of many Big Data
discussions today.
Increasing investments in Big Data technologies:
Information has always been a differentiator in the business world,
allowing better business decisions to be made in an increasingly competitive
landscape. Previously, market information was largely made available through
traditional market research and data specialists. Today, virtually any company
with large datasets can potentially become a serious player in the new
information game. The value of Big Data will become more apparent to corporate
leadership as companies seek to become more “data-driven” organisations.
A Big Data Insight Group survey of senior personnel from a broad range
of industry sectors revealed that many organisations see Big Data
as an important area. Among the respondents, 50%
indicated current research into, and sourcing of, Big Data solutions while
another 33% acknowledged that they were implementing or had implemented some
form of Big Data solutions. This survey indicates that many organisations perceive
Big Data as an important development and this interest could translate into
future demand for Big Data technologies.
Research and development involving high-performance computing:
Research and development that involves data-intensive workloads, such as
high-performance computing, life sciences and earth sciences, finds value in Big
Data technologies. For example, at CERN, the Swiss research laboratory outside
Geneva, physicists studying the results of tests at the Large Hadron Collider
have to decide how to store, process and distribute the usable information
accruing from the petabytes of data generated annually.
Big Data technologies have to be in place to support such R&D
efforts because they have to support the growth of digital content and enable
more efficient analysis outputs. Traditional technologies such as symmetric
multiprocessing, which also enables system scalability, can be prohibitively
expensive for many granular R&D use case scenarios. Hence, the need for
cost-efficient scalable hardware and software resources to process related
business logic and data volumes becomes more apparent than before.
Next: Hadoop.