Integrated Big Data-Based Decision Support System
Big Data is a term widely used to designate very large amounts of data that are generated at high speed and in a wide variety of formats. Big Data are collections of datasets too large and complex to process using classical database management tools.
Big Data can be characterized along five important dimensions, namely volume, velocity, variety, veracity and value, known as the 5Vs. Although these characteristics accentuate heterogeneity problems, users usually look for a unified view of the data available from heterogeneous data sources.
Big Data integration is an emerging research area that faces new challenges arising from these characteristics.
– Volume means that the quantities of data are larger than conventional relational database infrastructures can cope with. Data volumes range from gigabytes to terabytes, petabytes and beyond.
– Velocity refers to the speed at which the data are generated. Data are generated at an unprecedented speed and must be dealt with in a timely manner.
– Variety refers to the number of sources and types of both structured and unstructured data.
– Veracity refers to the trustworthiness and accuracy of the data from the sources.
– Finally, Value is what can be discovered by analyzing what is hidden in the data: big data can yield new findings and opportunities that assist in making decisions. Value can be understood as the business advantage and profit that the data can bring to the organization. It depends solely on the data and their source.
Working with big data is demanding because it involves discovering the sources of data, analyzing those sources to gain helpful insights, understanding the value of the data, and assessing the organizational gains the data can bring. To handle these challenges, data integration is key, especially where data come in both structured and unstructured formats and need to be integrated from disparate sources stored in systems managed by different departments. Effective data integration is crucial for analysts and decision makers, as it can provide a broader picture of the problem at hand, avoiding biased results and misleading conclusions. For example, while the analysis of polling data failed to predict the election of Donald Trump in November 2016, data extracted from Facebook correctly predicted the winner (The Economist, 2016).
The main contribution is to promote an integrated view of datasets coming from different sources. Crises are one example of the use of big data. Various crises and events, such as earthquakes, floods, shootings and wildfires, occur in different places worldwide and are isolated from each other. For instance, if an earthquake took place in Chile or Japan, one would not find a query interface that rapidly and accurately provides the latest status of the disaster: the number of victims in each country for a specific period, the economic damage, and the type of crisis (natural or man-made). The Decision Support System (DSS), first responders, decision makers and police need such a query interface to carry out their responses, rather than collecting unstructured data from different sources such as Twitter, news websites, blogs and Facebook (FB).
Therefore, this work suggests creating a summary, or template, for every crisis and event that points to their common characteristics and allows a comparison between them. This is an example of integrating unstructured data. Consequently, a summary "template" view facilitates data access for the casual user and the DSS. Applying this integration draws on different techniques such as Natural Language Processing (NLP), information extraction, schema mapping, record linkage and data linking.
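The template idea can be illustrated with a minimal Python sketch. It is not the system's actual design: the `CrisisReport` schema, the linkage key (crisis type, country, day), and the rule of resolving conflicting figures by taking the maximum are all assumptions made for illustration, and the sample values are invented placeholders rather than real disaster data. The sketch assumes that information extraction has already turned each post or article into a structured record, then links records from different sources and answers the "victims per country for a period" query.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CrisisReport:
    """One record extracted from a single source (hypothetical schema)."""
    source: str                      # e.g. "twitter", "news", "blog"
    crisis_type: str                 # e.g. "earthquake", "flood"
    category: str                    # "natural" or "man-made"
    country: str
    day: date
    victims: Optional[int] = None
    damage_usd: Optional[float] = None


def link_key(report):
    """Naive record-linkage key: same crisis type, country and day."""
    return (report.crisis_type.lower(), report.country.strip().lower(), report.day)


def integrate(reports):
    """Merge reports sharing a linkage key into one template entry each."""
    groups = defaultdict(list)
    for report in reports:
        groups[link_key(report)].append(report)

    entries = []
    for (crisis_type, country, day), group in groups.items():
        victims = [r.victims for r in group if r.victims is not None]
        damages = [r.damage_usd for r in group if r.damage_usd is not None]
        entries.append({
            "crisis_type": crisis_type,
            "category": group[0].category,
            "country": country,
            "day": day,
            # Assumption: conflicting figures are resolved by taking the maximum.
            "victims": max(victims) if victims else None,
            "damage_usd": max(damages) if damages else None,
            "sources": sorted({r.source for r in group}),
        })
    return entries


def victims_by_country(entries, start, end):
    """Total victims per country for crises dated within [start, end]."""
    totals = defaultdict(int)
    for entry in entries:
        if start <= entry["day"] <= end and entry["victims"] is not None:
            totals[entry["country"]] += entry["victims"]
    return dict(totals)


# Invented placeholder records, not real figures.
sample_reports = [
    CrisisReport("twitter", "Earthquake", "natural", "Chile", date(2020, 1, 10), victims=10),
    CrisisReport("news", "earthquake", "natural", " chile", date(2020, 1, 10),
                 victims=12, damage_usd=5e6),
    CrisisReport("blog", "flood", "natural", "Japan", date(2020, 1, 15), victims=3),
]

entries = integrate(sample_reports)
print(victims_by_country(entries, date(2020, 1, 1), date(2020, 1, 31)))
```

In this sketch the two Chile reports collapse into a single template entry that records both contributing sources, which is what lets a casual user or the DSS query one unified view instead of each source separately. A realistic record-linkage step would need fuzzier matching (place-name variants, date ranges, duplicate detection) than the exact-key grouping used here.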