Introduction to Data Science
Data science is an interdisciplinary field that uses scientific methods, algorithms, and systems to extract knowledge from data. It combines statistics, computer science, and domain expertise.
Data Science Process
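CRISP-DM (Cross-Industry Standard Process for Data Mining) defines six phases: Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, and Deployment. The process is iterative, not linear: a result in a later phase, such as a poor evaluation, often sends the work back to an earlier one.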
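As a rough illustration, the technical phases above can be mapped onto a few lines of scikit-learn. This is a minimal sketch, not a prescribed CRISP-DM implementation: the synthetic dataset stands in for real business data, and the `TARGET_ACCURACY` threshold is a hypothetical success criterion that would normally come out of the Business Understanding phase.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Business Understanding: hypothetical success criterion (an assumption for this sketch).
TARGET_ACCURACY = 0.80

# Data Understanding: a synthetic dataset stands in for real business data.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# Data Preparation: hold out a test set. Scaling is placed inside the
# pipeline so that it is fitted on the training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Modelling: a simple baseline classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluation: measure performance on data the model has not seen.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.3f}")

# Deployment decision: if the criterion is not met, loop back to an
# earlier phase instead of shipping the model.
ready_to_deploy = accuracy >= TARGET_ACCURACY
print("deploy" if ready_to_deploy else "iterate: revisit data preparation or modelling")
```

The final branch is where the iteration shows up: failing the criterion leads back to an earlier phase rather than to the end of the project.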
Types of Analytics
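Descriptive (what happened), Diagnostic (why it happened), Predictive (what will happen), Prescriptive (what to do about it). Each type builds on the previous one: moving from description towards prescription generally adds value, but it also demands more data and more modelling effort.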
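To make the four types concrete, the sketch below applies each one to the same toy dataset using NumPy. The ad-spend and sales figures are made up for illustration, the straight-line model is a deliberate simplification, and the candidate budgets are hypothetical.

```python
import numpy as np

# Illustrative, made-up figures: monthly ad spend and sales (both in thousands).
ad_spend = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0])
sales = np.array([105.0, 118.0, 131.0, 150.0, 158.0, 185.0])

# Descriptive: what happened? Summarise the past.
mean_sales = sales.mean()

# Diagnostic: why did it happen? Check how strongly sales move with ad spend.
# (Correlation alone does not prove causation.)
correlation = np.corrcoef(ad_spend, sales)[0, 1]

# Predictive: what will happen? Fit sales = slope * ad_spend + intercept.
slope, intercept = np.polyfit(ad_spend, sales, deg=1)
predicted_sales = slope * 30.0 + intercept  # forecast for a spend of 30

# Prescriptive: what should we do? Pick the hypothetical budget with the
# highest predicted sales net of spend. Budgets outside the observed range
# rely on extrapolation, so this is only a toy decision rule.
candidate_budgets = np.array([20.0, 30.0, 40.0])
predicted_net = (slope * candidate_budgets + intercept) - candidate_budgets
best_budget = candidate_budgets[np.argmax(predicted_net)]

print(f"mean sales: {mean_sales:.1f}")
print(f"correlation: {correlation:.3f}")
print(f"predicted sales at spend 30: {predicted_sales:.1f}")
print(f"recommended budget: {best_budget:.0f}")
```

Each step reuses the output of the one before it: the prescriptive choice depends on the fitted line, which is only worth fitting because the diagnostic step showed a strong relationship.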
Tools
Python (pandas, NumPy, scikit-learn), R, SQL, Jupyter notebooks, Tableau/Power BI, Spark, TensorFlow/PyTorch.
Roles
Data Analyst (reports, dashboards), Data Scientist (ML models), Data Engineer (pipelines, infrastructure), ML Engineer (model deployment).
Ethics
Key concerns: privacy, bias, transparency, consent, and accountability. Ethical data science addresses these concerns through the principles of fairness, accountability, and transparency.
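Bias can be checked quantitatively as well as discussed in principle. The sketch below computes a demographic parity gap, which is one of several possible fairness measures and is not specified by these notes. The predictions and group labels are hypothetical.

```python
import numpy as np

# Hypothetical model decisions (1 = approved, 0 = rejected) and a
# hypothetical group label for each person.
predictions = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
group = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

# Approval rate within each group.
rate_a = predictions[group == "a"].mean()
rate_b = predictions[group == "b"].mean()

# Demographic parity gap: absolute difference in approval rates.
# A gap of 0 means both groups are approved at the same rate.
parity_gap = abs(rate_a - rate_b)

print(f"approval rate a: {rate_a:.2f}, b: {rate_b:.2f}, gap: {parity_gap:.2f}")
```

A large gap does not prove that a model is unfair, but it is a signal that the decision process deserves closer review.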
Summary
Data science combines statistics, programming, and domain knowledge to extract actionable insights from data.