Today, I am starting a series of posts on big data systems. Let's get familiar with some basic data terminology first, so that the upcoming posts are easier to follow.
Below are a few simple terms related to big data technology.
Data: Anything that resides in digital (binary) form in a computer system can be called data. It does not matter whether we can get any useful information out of it or not.
Information: Processed, meaningful data is called information. In other words, any data from which we can draw a conclusion and make decisions to drive our business is information.
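To make the difference concrete, here is a small sketch in Python. The sale records and product names are made up for illustration. The raw list is data; the summary we compute from it (which product sold the most) is information we could act on.

```python
from collections import Counter

# Raw data: individual sale records as (product, quantity).
# These values are hypothetical and exist only for this example.
sales = [
    ("laptop", 2),
    ("phone", 5),
    ("laptop", 1),
    ("tablet", 3),
    ("phone", 4),
]

# Processing: add up the units sold per product.
units_sold = Counter()
for product, quantity in sales:
    units_sold[product] += quantity

# Information: the best-selling product, something a business can decide on.
best_seller, best_units = units_sold.most_common(1)[0]
print(best_seller, best_units)
```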
Structured Data: Data that is stored in a structured, uniform way, generally in a relational DBMS or a spreadsheet, is called structured data. It is therefore also known as relational data. Before storing structured data, we first create a data model, that is, a plan for how we want to store the data. Consider an Excel spreadsheet: first we create a template of columns, then we define the type of data each field will contain, along with any restrictions or validation rules we want, so that users can enter only correct and accurate information.
Advantage: We can enter, store, query and analyze structured data with little overhead. It is easy to understand and to base decisions on. Structured data can easily be stored and retrieved using SQL (Structured Query Language).
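Here is a minimal sketch of these ideas using Python's built-in sqlite3 module. The employees table, its columns and the sample rows are assumptions made up for this example. It shows a data model defined up front, a validation rule that rejects bad input, and a simple SQL query.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The data model: column names, types and restrictions are fixed up front.
# The table and its columns are hypothetical, chosen only for illustration.
conn.execute(
    """
    CREATE TABLE employees (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL,
        salary REAL NOT NULL CHECK (salary >= 0)
    )
    """
)

rows = [
    (1, "Asha", "Sales", 52000.0),
    (2, "Ben", "Sales", 48000.0),
    (3, "Chen", "Engineering", 75000.0),
]
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)

# Validation: the CHECK constraint refuses a negative salary.
try:
    conn.execute("INSERT INTO employees VALUES (4, 'Dee', 'Sales', -10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Query: average salary per department, expressed in plain SQL.
avg_by_dept = dict(
    conn.execute("SELECT dept, AVG(salary) FROM employees GROUP BY dept")
)
print(rejected)
print(avg_by_dept)
```

Because every row follows the same model, the query needs no special handling for missing or oddly shaped records.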
Unstructured Data: This is the type of data that cannot be stored in the traditional structured way, i.e. in a row-and-column based structure.
Example: Text and multimedia content are typical unstructured data. Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, web pages and many other kinds of business documents. Many experts say that around 80-90% of the data in any organization is unstructured. The amount of unstructured data in enterprises is also growing significantly, often many times faster than structured databases are growing.
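As a rough illustration, consider an e-mail body held as plain text. The message and the patterns below are made up for this example. There are no columns to query, so we have to search the text itself to pull out anything structured, and the patterns only work if the author happened to write things in the form we expect.

```python
import re

# A hypothetical e-mail body: free text with no predefined fields.
email_text = """Hi team,
The order 4521 from Acme Corp was delayed. Please call 555-0142 before Friday.
Thanks, Priya"""

# Pull out an order number and a phone number with regular expressions.
# These patterns are assumptions about how the text is written.
order_ids = re.findall(r"\border (\d+)\b", email_text)
phone_numbers = re.findall(r"\b\d{3}-\d{4}\b", email_text)

print(order_ids)
print(phone_numbers)
```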
Unstructured Data Management: Organizations use a variety of software tools to help them organize and manage unstructured data.
These can include the following:
Big data tools: Software like Hadoop can process stores of both unstructured and structured data that are extremely large, very complex and changing rapidly.
Business intelligence software: Also known as BI, business intelligence is a broad category of analytics, data mining, dashboards and reporting tools that help companies make sense of their structured and unstructured data for the purpose of making better business decisions.
Data integration tools: These tools combine data from disparate sources so that they can be viewed or analyzed from a single application. They sometimes include the capability to unify structured and unstructured data.
Document management systems: Also called enterprise content management systems, a DMS can track, store and share unstructured data that is saved in the form of document files.
Information management solutions: This type of software tracks structured and unstructured enterprise data throughout its life cycle.
Search and indexing tools: These tools retrieve information from unstructured data files such as documents, Web pages and photos.
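To give a feel for how the search and indexing tools in the list above can work, here is a very small sketch of an inverted index in Python. The file names, the document text and the helper functions build_index and search are all made up for this example. Real search tools are far more sophisticated, but the basic idea of mapping each word to the documents that contain it is the same.

```python
import re
from collections import defaultdict

# Hypothetical documents: file name -> text content.
documents = {
    "memo.txt": "Quarterly sales report for the sales team",
    "notes.txt": "Meeting notes about the Hadoop cluster",
    "page.html": "Hadoop tutorial and sales dashboard",
}


def build_index(docs):
    """Map each lowercase word to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(documents)
print(sorted(search(index, "hadoop")))
print(sorted(search(index, "sales hadoop")))
```

The index is built once, and each search afterwards is only a few set lookups instead of a scan through every file.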
I hope this post gives you a very basic idea of data and its types. Your ideas and feedback are always welcome.