Master your Data
When first collected, data is usually "raw" (unprocessed). This raw data then has to be "cleaned" and corrected into clean data by removing outliers and obvious errors.
Data has to be valid, complete, consistent, uniform (in format) and accurate before you proceed.
Note: do NOT (!) proceed with your data if any of these criteria is not fulfilled, as working with poor data quality only wastes time.
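The quality criteria above can be checked programmatically before any analysis starts. Here is a minimal sketch in Python; the record structure and the two rules (a missing or negative revenue, a non-uppercase country code) are made-up examples, not a complete validation framework.

```python
# Hypothetical records: one clean, one non-uniform, one incomplete.
records = [
    {"id": 1, "country": "CH", "revenue": 1200.0},
    {"id": 2, "country": "ch", "revenue": 900.0},   # not uniform: lowercase code
    {"id": 3, "country": "DE", "revenue": None},    # not complete: missing value
]

def quality_issues(rows):
    """Return a list of (id, problem) tuples for rows violating a rule."""
    issues = []
    for row in rows:
        if row["revenue"] is None:
            issues.append((row["id"], "incomplete"))
        elif row["revenue"] < 0:
            issues.append((row["id"], "invalid"))
        if row["country"] != row["country"].upper():
            issues.append((row["id"], "not uniform"))
    return issues

problems = quality_issues(records)
```

If `problems` is non-empty, the data should go back through cleansing before any further processing.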
Stage by stage, the "processed data" from one stage are the "raw data" for the next stage.
The act of preparing raw data into refined information assets that can be used effectively for querying, such as relational databases, is called Data Preparation.
Much of the world's raw data lives in organized data collections, called relational databases.
An effective Data Scientist, indeed a Data Champion, knows exactly how to turn raw data into actionable insights!
Start with the basics of querying databases using SQL (pronounced "ess-que-ell" or "sequel"), the Structured Query Language and the world's most popular database language. If you want to go further, you can also use other SQL dialects or more advanced tools such as SPSS, R, Git, Shell, Shiny, Python and SAS.
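A basic SQL query can be tried out directly from Python, which ships with the SQLite engine. The table name and contents below are made up for illustration:

```python
import sqlite3

# In-memory database with a hypothetical customers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Anna", "Zurich"), (2, "Ben", "Basel"), (3, "Clara", "Zurich")],
)

# The most basic query: select some columns from a table.
rows = conn.execute("SELECT name, city FROM customers").fetchall()
```

The same `SELECT ... FROM ...` statement works, with small dialect differences, on every major relational database.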
Then index and merge, join or update the information of two or more tables into one, just as if the information were available in one single dataset. You can use merge, join, inner join, outer join, self-join, union, union all, semi-join, anti-join, cross join, intersect and except, depending on exactly what you need to get.
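Two of these join types can be sketched as follows; the `customers` and `orders` tables are hypothetical. An inner join keeps only matching rows, while a left (outer) join keeps every row from the left table, filling missing matches with NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders    (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Anna'), (2, 'Ben');
INSERT INTO orders    VALUES (1, 50.0), (1, 20.0);
""")

# INNER JOIN: only customers that actually have orders.
inner = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
""").fetchall()

# LEFT JOIN: every customer, with a NULL amount when there is no order.
left = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
""").fetchall()
```

Here Ben appears only in the left-join result, paired with `None` (SQL NULL), because he has no orders.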
Also combine the information available in several tables into one single table using set operations and nested queries, or work with subqueries if you want to focus only on a specific part of the available data.
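A subquery lets one query feed its result into another. In this made-up example, the inner query computes an average price and the outer query filters on it:

```python
import sqlite3

# Hypothetical products table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (name TEXT, price REAL);
INSERT INTO products VALUES ('pen', 2.0), ('book', 15.0), ('laptop', 900.0);
""")

# Subquery: the inner SELECT computes the average price,
# the outer SELECT keeps only products above that average.
expensive = conn.execute("""
    SELECT name
    FROM products
    WHERE price > (SELECT AVG(price) FROM products)
""").fetchall()
```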
Use conditional statements and loops, automate repetitive steps or tasks, apply vectorized functions and handle errors.
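These building blocks combine naturally when automating a repetitive cleaning step. A minimal sketch with invented input values, using a loop, a conditional and error handling:

```python
# Hypothetical raw input: some entries cannot be parsed as numbers.
raw_values = ["12.5", "7", "n/a", "-3.1"]

cleaned = []
errors = 0
for value in raw_values:
    try:
        number = float(value)   # may raise ValueError for bad input
    except ValueError:
        errors += 1             # count the error, but keep processing
        continue
    if number < 0:
        number = 0.0            # conditional: clamp negative values
    cleaned.append(number)
```

Instead of crashing on the first bad value, the loop records the error and continues, which is usually what you want in a long-running data pipeline.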
A Data Scientist knows how to wrangle and extract data from these relational databases:
Select columns, single or multiple, select distinct values and counts, then summarize columns; filter rows (numeric values or text) where criteria of interest are met, using BETWEEN, IN, basic comparison operators, NULL and IS NULL, LIKE and NOT LIKE; combine multiple criteria, match patterns in text and more.
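Three of these filters in action, on a hypothetical staff table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (name TEXT, dept TEXT, salary REAL);
INSERT INTO staff VALUES
  ('Anna',  'IT', 80000),
  ('Ben',   'HR', 55000),
  ('Clara', 'IT', 95000),
  ('Dan',   NULL, 60000);
""")

# BETWEEN: inclusive numeric range filter.
between = conn.execute(
    "SELECT name FROM staff WHERE salary BETWEEN 50000 AND 80000"
).fetchall()

# LIKE: text pattern matching ('A%' = names starting with A).
like = conn.execute(
    "SELECT name FROM staff WHERE name LIKE 'A%'"
).fetchall()

# IS NULL: rows where a value is missing (note: '= NULL' never matches).
no_dept = conn.execute(
    "SELECT name FROM staff WHERE dept IS NULL"
).fetchall()
```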
Use aggregate functions to summarize data, combine aggregate functions with 'where' to gain useful insights, do arithmetic in SQL, and use aliases to make results more readable.
Sort data on single or multiple columns, ascending or descending, group results using 'group by', and choose relevant selections using 'having'.
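Aggregation, grouping, filtering of groups and sorting can all be combined in one statement. The sales figures below are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES
  ('North', 100), ('North', 250), ('South', 80), ('East', 300), ('East', 60);
""")

# Total per region (SUM + GROUP BY), keep only regions above 200 (HAVING),
# largest first (ORDER BY ... DESC). 'total' is an alias.
totals = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 200
    ORDER BY total DESC
""").fetchall()
```

Note that 'having' filters groups after aggregation, whereas 'where' filters rows before aggregation.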
Import Data, Data Manipulation (Num or Text Functions, Parsing, etc.), Data Cleansing, Programming, Object Oriented Programming, Data Profiling, Probability & Statistics, Correlation, Models, Regression, Exploration, Experimental Sampling, Geo Data, Time Series, Estimation, Hypothesis, Sentiment Analysis on positive and negative language, Natural Language Processing, Data Reporting, Plot Graphs & Data Visualization, Supervised Learning, Machine Learning, Tree-Based Models, Unsupervised Learning, Deep Learning, Applied Finance, Equity Valuation, Financial Trading and Portfolio Analysis, modeling Credit Risk, Value-at-Risk, and more Case Studies.
These days, multiple approaches to data analytics are known.
Data management is the development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets (definition from DAMA, the Data Management Body of Knowledge).
Usually, data preparation is necessary and a critical first step before starting data analytics.
Data cleansing can be one task of data preparation, alongside loading data, combining data and delivering data for the next steps.
Data cleansing activities can be quite intense and complex. Data cleansing has to make sure the data is valid, complete, consistent, uniform (in format) and accurate.
If your data quality is poor, all of the above is still possible but does not make much sense. Yes, you will deliver results, and those results can be nicely displayed, but it is the classic case of "garbage in, garbage out".
So make sure your data is valid, complete, consistent, uniform (in format) and accurate. If good data quality is available, management can make good decisions.
Of course, if your data quality is good, you do not want everyone on the planet to access it.
So make sure your data is well protected and available only to the people who need to access it.
Cyber Security and Ethical Hacking
To find out how well your data is actually protected, you really need to hack into your own protected data environment and use the lessons learned to protect yourself against malicious (black hat) hackers.
Use Ethical Hacking to try to gain access! Use network penetration testing ('pen testing'), understand how devices interact, learn methods to crack WEP/WPA/WPA2 encryption, man-in-the-middle attacks, ARP spoofing/poisoning, backdooring, sniffing open ports, reading/writing/uploading/executing files, exploiting buffer overflows, and extracting passwords, cookies, URLs, emails, images, videos, domains, sub-domains, accounts, social media accounts and friends lists. Bypassing protections, social engineering, faking software updates, gaining control over computer systems and performing post-exploitation.
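One benign building block of pen testing, checking whether a TCP port is open, can be sketched with nothing but the standard library. This is an illustrative helper, not a full scanner, and it should only ever be pointed at systems you are authorised to test:

```python
import socket

def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# Example: port 1 (tcpmux) on localhost is almost certainly closed.
closed = port_open("127.0.0.1", 1)
```

A real scan would loop this check over a port range and report the open ones; specialised tools then fingerprint the services behind them.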
In case of questions, don't hesitate to contact us at www.digiGeek.ch!