digiGeek.ch
Industry 4.0 & Blockchain

Analytics & Co.
by digiGeek

Data, Big Data, Data Lakes, Data Preparation, Analytics & Visualization

Data
Data is called "the gold or the oil of the economy of the digital age".
Basically, data is a set of values, often collected for scientific research, then studied, analyzed, graphically visualized and reported to discover useful information, suggest conclusions, and support decision-making.
When collected, data is usually "raw" (unprocessed) and has to be "cleaned" and corrected by researchers to remove outliers or obvious errors. Data processing usually goes through several stages, and the "processed data" from one stage is the "raw data" for the next stage.
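The staged processing described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the stage names (drop_missing, drop_outliers), the sample readings and the outlier rule (distance from the median, measured in median absolute deviations) are invented here and are not prescribed by any standard.

```python
from statistics import median

def drop_missing(values):
    """Stage 1: remove obvious errors, here readings that are missing (None)."""
    return [v for v in values if v is not None]

def drop_outliers(values, max_ratio=5.0):
    """Stage 2: drop values lying more than max_ratio median absolute
    deviations away from the median (an assumed, simple outlier rule)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)
    return [v for v in values if abs(v - med) <= max_ratio * mad]

def run_pipeline(raw, stages):
    """Feed the output of each stage into the next one."""
    data = raw
    for stage in stages:
        data = stage(data)  # "processed data" becomes the next stage's "raw data"
    return data

# Hypothetical sensor readings with two gaps and one obvious outlier
raw_readings = [20.1, 19.8, None, 20.4, 250.0, 20.0, None, 19.9]
cleaned = run_pipeline(raw_readings, [drop_missing, drop_outliers])
print(cleaned)
```

Keeping every stage as a small function makes it easy to reorder stages or to inspect the intermediate result after each one.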
Data sets can grow rapidly and can become too large and/or complex to be processed with traditional data processing application software.

Big Data
These days, the quantities of data available are indeed large, but volume is not the most relevant characteristic of big data.
A more important characteristic is the diversity of the data, which is collected from a variety of sources such as mobile devices, sensors, logs, cameras and microphones.
Big data typically arises in advanced analytics of sensor data from the IoT (Internet of Things), user behavior analytics, predictive analytics, complex physics simulations, and biological and environmental research.
Big data challenges include data capturing, transfer, storage, querying, analytics, and others.
Relational database management systems and desktop statistics and visualization packages often struggle to deal with big data. Big data work typically requires massive in-memory processing and parallel software running on a larger set of servers.

Data Analytics, the analysis of data
Data analytics is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.
These days, multiple approaches to data analytics are known.

Advanced Analytics, Requirements
For a deeper dive into analytics, use integrated or "end-to-end" software (with a minimum of interfaces) that updates seamlessly. Keep your data flowing through your whole exploration and stay in the flow yourself. Slice and dice to explore your data and test hypotheses while staying in the analytical flow. The next question should arise directly from your current data exploration.
Segment and group your data and define sets, subsets and cohorts (birth cohort, age cohort, ...) without modifying the underlying data.
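As a small illustration of building cohorts without touching the underlying data, the following Python sketch groups records by birth decade. The customer records, the field names and the helper functions (birth_cohort, group_by) are made up for this example.

```python
from collections import defaultdict

# Hypothetical source data; a tuple, so the collection itself is never changed
CUSTOMERS = (
    {"name": "Ana", "birth_year": 1985, "revenue": 120.0},
    {"name": "Ben", "birth_year": 1992, "revenue": 80.0},
    {"name": "Cleo", "birth_year": 1988, "revenue": 200.0},
    {"name": "Dan", "birth_year": 1999, "revenue": 50.0},
)

def birth_cohort(record, width=10):
    """Label a record with its birth decade, e.g. 1985 -> '1980s'."""
    start = record["birth_year"] // width * width
    return f"{start}s"

def group_by(records, key_func):
    """Build groups that only reference the records; nothing is modified."""
    groups = defaultdict(list)
    for record in records:
        groups[key_func(record)].append(record)
    return dict(groups)

cohorts = group_by(CUSTOMERS, birth_cohort)
revenue_per_cohort = {
    label: sum(r["revenue"] for r in members)
    for label, members in cohorts.items()
}
print(revenue_per_cohort)
```

Because the cohort is computed by a function instead of being stored in the data, a different segmentation (for example 5-year cohorts via width=5) needs no change to the source records.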
Use visual analytics dashboards and input parameters (macros or input controls) to adjust the selected data and see how the adjustments influence the results.
Use mathematical, string or logic functions to create new fields for your analysis, as your data does not always contain all the fields you need.
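A minimal sketch of such derived fields in Python, using one mathematical, one string and one logical function. The order records, the field names and the threshold of 500 are assumptions made only for this example.

```python
# Hypothetical order records as they might arrive from a source system
orders = [
    {"customer": "  ana MEIER ", "quantity": 3, "unit_price": 19.5},
    {"customer": "BEN keller", "quantity": 40, "unit_price": 25.0},
]

def add_fields(order, large_threshold=500):
    """Return a copy of the order with three derived fields added."""
    enriched = dict(order)  # copy, so the source record stays untouched
    # Mathematical function: total amount of the order
    enriched["total"] = order["quantity"] * order["unit_price"]
    # String function: trimmed, consistently capitalized customer name
    enriched["customer_clean"] = order["customer"].strip().title()
    # Logical function: flag orders at or above the assumed threshold
    enriched["is_large"] = enriched["total"] >= large_threshold
    return enriched

enriched_orders = [add_fields(o) for o in orders]
for row in enriched_orders:
    print(row["customer_clean"], row["total"], row["is_large"])
```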
Use time series to find patterns and trends in values such as rates or temperatures, generate meaningful insights and forecast what will happen moving forward.
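One very simple way to turn a time series into a forecast is to fit a straight trend line and extend it. The Python sketch below does this with ordinary least squares. It is a deliberately basic stand-in for real forecasting methods (it ignores seasonality, for example), and the temperature values are invented.

```python
def fit_linear_trend(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return slope, intercept

def forecast(values, steps):
    """Extend the fitted trend line by the given number of future steps."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]

# Hypothetical monthly average temperatures in degrees Celsius
temperatures = [4.0, 5.5, 9.0, 12.5, 16.0, 19.5]
print(forecast(temperatures, steps=2))
```

In practice, one of the statistics platforms would be used for this, but the principle is the same: learn a pattern from the past, then project it forward.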
Use advanced statistics platforms like SPSS, R, Python or SAS to maximise your possibilities.

Data Visualization
Visual communication makes complex data more accessible, understandable and usable. It involves the creation and study of the visual representation of data in a schematic form to analyze and reason about data and evidence and to communicate information clearly and efficiently. While tables are used to look up a specific measurement, charts are used to show patterns or relationships in the data.

Predictive Analytics
Predictive analytics analyzes current and historical facts to make predictions about unknown events or the future. Technically, it refers to a variety of statistical techniques: predictive modeling, machine learning, and data mining.
In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions.
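To make the idea concrete, here is a toy Python sketch that learns a default rate per set of conditions from historical transactions and uses it to guide the decision on a candidate transaction. It is a simple frequency table, not a real machine learning model. The history, the two conditions (customer segment, earlier late payment) and the threshold of 0.4 are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical history: (customer_segment, paid_late_before, defaulted)
HISTORY = [
    ("retail", True, True),
    ("retail", True, False),
    ("retail", False, False),
    ("retail", False, False),
    ("business", True, True),
    ("business", False, False),
]

def fit_risk_table(history):
    """Estimate the default rate for each combination of conditions."""
    counts = defaultdict(lambda: [0, 0])  # conditions -> [defaults, total]
    for segment, paid_late, defaulted in history:
        entry = counts[(segment, paid_late)]
        entry[0] += int(defaulted)
        entry[1] += 1
    return {cond: defaults / total for cond, (defaults, total) in counts.items()}

def assess(risk_table, segment, paid_late, threshold=0.4):
    """Guide the decision for a candidate transaction."""
    risk = risk_table.get((segment, paid_late))
    if risk is None:
        return "review"  # no history for this set of conditions
    return "decline" if risk >= threshold else "accept"

risk_table = fit_risk_table(HISTORY)
print(assess(risk_table, "retail", True))
print(assess(risk_table, "retail", False))
print(assess(risk_table, "public", False))
```

Real predictive models capture relationships among many more factors, but the flow is the same: fit on historical data, then score new cases.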

Data Preparation
Usually, data preparation is a necessary and critical first step before starting data analytics.
Data preparation can include cleansing data, loading data, combining data and delivering data for the next steps.
The act of turning raw data into refined information assets that can be used effectively is called data preparation.

Data Cleansing
Data cleansing activities can be quite intense and complex. Data cleansing has to make sure that the data is:
Valid, complete, consistent, uniform (formats), accurate.
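The five criteria can be turned into automated checks. The following Python sketch shows one possible check per criterion. The record layout, the regular expressions and the trusted reference table (REFERENCE_COUNTRY) are hypothetical and only serve as an illustration.

```python
import re

REQUIRED = ("id", "email", "signup", "last_order", "country")
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
# Hypothetical trusted source used to judge accuracy: customer id -> country
REFERENCE_COUNTRY = {1: "CH", 2: "DE"}

def check_record(rec):
    """Return a list of data quality issues found in one record."""
    issues = []
    # Complete: every required field is present and not empty
    if any(rec.get(field) in (None, "") for field in REQUIRED):
        return ["incomplete"]  # the remaining checks need all fields
    # Valid: the email address has a plausible shape
    if not EMAIL.match(rec["email"]):
        issues.append("invalid email")
    # Uniform: all dates use the same format (YYYY-MM-DD)
    if not all(ISO_DATE.match(rec[f]) for f in ("signup", "last_order")):
        issues.append("non-uniform date format")
    # Consistent: the last order cannot be earlier than the signup
    # (dates in YYYY-MM-DD format sort correctly as plain strings)
    elif rec["last_order"] < rec["signup"]:
        issues.append("inconsistent dates")
    # Accurate: the country matches the trusted reference
    if REFERENCE_COUNTRY.get(rec["id"]) != rec["country"]:
        issues.append("inaccurate country")
    return issues

good = {"id": 1, "email": "ana@example.com", "signup": "2023-01-15",
        "last_order": "2023-03-01", "country": "CH"}
bad = {"id": 2, "email": "ben(at)example.com", "signup": "15.01.2023",
       "last_order": "2023-03-01", "country": "FR"}
print(check_record(good))
print(check_record(bad))
```

Note that accuracy can only be checked against something known to be correct, which is why the sketch needs a reference table.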

Data Management
Data management is the development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets. This definition follows the DAMA-DMBOK, the Data Management Body of Knowledge published by DAMA International.

Data Quality
If your data quality is poor, all of the above is still possible but does not make much sense. Yes, of course, you will deliver results and your results can be nicely displayed, but we can call it sh.. in, sh.. out. (Note: sorry for not using the word 'shit' here.)
So make sure your data is valid, complete, consistent, uniform (formats) and accurate. If good data quality is available, management can make good decisions.

Data Protection
Of course, if your data quality is good, you do not want everyone on this planet to access it.
So make sure your data is well protected and available only to the people who need to access it.
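The principle "only the people who need it" can be expressed as a deny-by-default permission check. The Python sketch below is a minimal illustration. The roles and dataset names are invented, and a real system would rely on the access controls of the database or platform rather than on a hand-written table.

```python
# Hypothetical permission table: role -> datasets that role may read
PERMISSIONS = {
    "analyst": {"sales_aggregates"},
    "data_engineer": {"sales_aggregates", "sales_raw"},
}

def can_read(role, dataset):
    """Deny by default: allow access only if it was granted explicitly."""
    return dataset in PERMISSIONS.get(role, set())

print(can_read("analyst", "sales_aggregates"))  # granted explicitly
print(can_read("analyst", "sales_raw"))         # not granted to this role
print(can_read("guest", "sales_aggregates"))    # unknown role, so denied
```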

Cyber Security and Ethical Hacking
To find out how well your data is actually protected, you really need to hack into your own protected data environment and apply the lessons learned to protect yourself against Black Hat Hackers.
Use Ethical Hacking to try to get access! Use Network Penetration Testing ("Pen Test") to understand how devices interact and to try methods such as cracking WEP/WPA/WPA2 encryption, Man-In-The-Middle attacks, ARP Spoofing/Poisoning, backdooring, sniffing, scanning for open ports, reading/writing/uploading/executing files, exploiting buffer overflows, and extracting passwords, cookies, URLs, emails, images, videos, domains, sub-domains, accounts, social media accounts and friends. Also try bypassing protections, social engineering and fake updates, and see whether you can gain control over computer systems and carry out Post Exploitation.

In case of questions, don't hesitate to contact us at www.digiGeek.ch!

John

Matthias Seiler

CEO & Founder of digiGeek
