Introduction to data-science tools
Data science is an interdisciplinary approach to extracting knowledge from large volumes of structured and unstructured, often noisy, data. It encompasses preparing data for analysis and processing, performing advanced data analysis, and presenting the results to reveal patterns.
Data mining and analysis apply mathematics, statistics, computer science, information science, and domain knowledge to tell stories that clearly convey the meaning of the results to decision-makers and stakeholders at every level of technical understanding. This is the role of a data scientist: someone who writes code and combines it with statistical knowledge to explain how the results can be used to solve business problems.
As a scientific field, data science unifies scientific methods, processes, algorithms, and systems into a set of tools based on statistics, data analysis, and informatics. It is closely related to data mining, machine learning, and big data. The most common tools include:
Linear algorithms
Linear regression
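Linear regression produces numerical predictions by finding the straight line (or, with several input variables, the hyperplane) that best fits a dataset, usually in the least-squares sense. The resulting model is easy to interpret: when the input variables are on comparable scales, its coefficients show which variables drive the results most strongly. However, it can be too simple to capture more complex, for example nonlinear, relationships among the variables.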
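To make the idea of a "best linear fit" concrete, here is a minimal sketch of ordinary least squares in pure Python. It assumes the usual least-squares criterion (minimizing the sum of squared errors) and solves the normal equations directly. The function names and the toy data are hypothetical, chosen only for illustration; libraries such as NumPy or scikit-learn use more numerically stable solvers and are what you would normally reach for.

```python
# Minimal ordinary least-squares (OLS) linear regression in pure Python.
# Assumption: we solve the normal equations (A^T A) b = A^T y directly,
# which is fine for small, well-conditioned toy problems like this one.


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    # Build an augmented matrix [M | v] as floats so M and v are not modified.
    aug = [[float(x) for x in M[i]] + [float(v[i])] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot to reduce rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: features may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate the current column from all rows below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_linear_regression(X, y):
    """Return [intercept, coef_1, ..., coef_k] minimizing the sum of squared errors."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Normal equations: (A^T A) b = A^T y.
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(k)]
    return solve(ata, aty)


def predict(coefs, row):
    """Apply a fitted model: intercept + sum(coef_i * x_i)."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


if __name__ == "__main__":
    # Hypothetical toy data generated exactly from y = 1 + 2*x1 + 0.5*x2 (no noise).
    X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
    y = [1 + 2 * a + 0.5 * b for a, b in X]
    coefs = fit_linear_regression(X, y)
    print([round(c, 6) for c in coefs])  # → [1.0, 2.0, 0.5]
    print(round(predict(coefs, [6, 6]), 6))  # → 16.0
```

Because the toy target was built with no noise, the fit recovers the generating coefficients, and the larger coefficient on `x1` (2 versus 0.5) marks it as the bigger driver of the result. Comparing coefficient magnitudes this way is only meaningful when the inputs are on comparable scales, as they are here.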