Introduction to Python for Data Science

Python is a powerful, high-level, object-oriented programming language with a simple syntax. It has many applications, the major ones being web development, software development and data science. Data science is the field in which meaningful insights are extracted from data to support decision making and planning in businesses. It combines mathematics, statistics, programming, machine learning, artificial intelligence and advanced analytics.


Data Science project life cycle

The data science project life cycle consists of several stages: data collection, data cleaning, exploratory data analysis, model building and model deployment. Python has many libraries that facilitate these stages, which makes it well suited to data science. Examples include pandas for data analysis, wrangling and cleaning; matplotlib and seaborn for data visualization; TensorFlow and scikit-learn for machine learning; Keras and PyTorch for deep learning; and SciPy and NumPy for mathematical computation. Other factors that make Python a preferred language for data science are its simple syntax, which makes it easy to learn, the fact that it is open source, its support for test-driven development, and its compatibility with multiple operating systems such as Windows, Linux and macOS.

Working with Python

To work with Python, first download and install it, then work in your preferred editor or Integrated Development Environment (IDE), for example IDLE, Visual Studio Code, Spyder or Jupyter Notebook.

One can also work in a web-based environment, in which case nothing needs to be installed locally. Examples include Google Colab and JupyterLab when it is used through a hosted service.

Python basics

To learn Python for data science, one first has to learn the Python fundamentals: data types, operators, sequences and other compound data structures, conditional statements, loops, functions and external libraries.

Python's basic data types include:

Strings: These are sequences of characters enclosed in single or double quotes. Written as str in Python. Examples: 'apple', "python", '1', '1+3', "3.65", "" (an empty string), " " (a single space).

Integers: These are whole numbers that can be positive, negative or 0. Written as int in Python. Examples: -3, 2, 0.


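Boolean: These are truth values. Written as bool in Python. The only two Boolean values are True and False. Numbers such as 1, 1.0, 0 and 0.0 are not Booleans themselves (their type is int or float), although 1 and 1.0 compare equal to True, and 0 and 0.0 compare equal to False.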

Floating point numbers: These are numbers written with a decimal point or an exponent, and they can be positive, negative or zero. Written as float in Python. Examples: 3.3, -3.3, 5e10, -5e10, 4., 0.0 and many more.
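The four data types above can be checked with the built-in type() function. The following is a minimal sketch that reuses some of the example values from the list; the variable names examples and type_names are chosen only for illustration.

```python
# A few of the example values from the list above.
examples = ["apple", "3.65", -3, 0, True, 3.3, 5e10, 4.]

# Collect the name of each value's type, e.g. "str" or "float".
type_names = [type(value).__name__ for value in examples]

for value, name in zip(examples, type_names):
    print(repr(value), "is of type", name)

# A number in quotes is a string, not a number.
quoted_number_is_str = type("1") is str

# True compares equal to 1, but the two values have different types.
true_equals_one = (True == 1)
true_is_bool = type(True) is bool
one_is_int = type(1) is int
```

Note that "3.65" is reported as str because of the quotes, while 4. is a float because of the trailing decimal point.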

Python's built-in compound data structures store collections of data in a single variable. They include:

Lists: These are ordered, indexed, changeable collections of data enclosed in square brackets and separated by commas, e.g. list_A = ['apple', 1, True, 3.3]

Sets: These are unordered and unindexed collections of data enclosed in curly brackets and separated by commas, which cannot contain duplicates, e.g. Set_A = {'apple', 0, True, 3.3}

Tuples: These are ordered, unchangeable and indexed collections of data enclosed in parentheses and separated by commas, e.g. Tuple_A = ('apple', 0, True, 3.3)

Dictionaries: These are changeable collections of key-value pairs that keep their insertion order (since Python 3.7) and are accessed by key rather than by position. Each key is separated from its value by a colon, the pairs are separated from one another by commas, and the whole collection is enclosed in curly brackets, e.g. Dict_A = {'Name': 'Jane', 'Age': 24, 'Country': 'Kenya'}
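The properties listed above (changeable or not, duplicates or not, access by index or by key) can be tried out directly. This is a small sketch using the same example values as in the text, with the variable names written in lowercase, which is the usual Python convention.

```python
list_a = ["apple", 1, True, 3.3]
set_a = {"apple", 0, True, 3.3}
tuple_a = ("apple", 0, True, 3.3)
dict_a = {"Name": "Jane", "Age": 24, "Country": "Kenya"}

# Lists are indexed and changeable: replace the item at index 1.
list_a[1] = 2

# Sets cannot hold duplicates: adding an existing item changes nothing.
size_before = len(set_a)
set_a.add("apple")
size_after = len(set_a)

# Tuples are indexed but unchangeable: assignment raises a TypeError.
try:
    tuple_a[0] = "pear"
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Dictionaries are accessed and changed by key.
dict_a["Age"] = 25

print(list_a)
print(size_before, size_after)
print(tuple_a[0], tuple_changed)
print(dict_a["Name"], dict_a["Age"])
```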

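Python's arithmetic operators are addition (+), subtraction (-), multiplication (*), division (/), exponentiation (**) and modulo (%). Python has no ++ or -- operators; a variable is incremented or decremented with the augmented assignments += and -=. For example, 7 % 3 evaluates to 1 and 2 ** 3 evaluates to 8.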
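The arithmetic operators can be sketched as follows. The operands 7 and 3 and the variable names are arbitrary choices for illustration.

```python
a, b = 7, 3

total = a + b        # addition
difference = a - b   # subtraction
product = a * b      # multiplication
quotient = a / b     # division always returns a float
power = a ** b       # exponentiation: 7 to the power of 3
remainder = a % b    # modulo: remainder of 7 divided by 3

# Python has no ++ or --; use augmented assignment instead.
counter = 0
counter += 1         # increment by 1
counter += 1
counter -= 1         # decrement by 1

print(total, difference, product, quotient, power, remainder, counter)
```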

Conditional statements control the flow of the program's execution. In Python they include if, if-else, if-elif and if-elif-else statements. A common example is a conditional that determines whether a number is odd or even.
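One possible way to write the odd-or-even check is with an if-else statement and the modulo operator. The function name parity is a hypothetical name used only for this sketch.

```python
def parity(number):
    """Return "even" if number is divisible by 2, otherwise "odd"."""
    if number % 2 == 0:
        return "even"
    else:
        return "odd"


for n in [4, 7, 0, -3]:
    print(n, "is", parity(n))
```

The check also works for negative integers, because in Python the result of % 2 is always 0 or 1 for whole numbers.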

Loops facilitate iteration, which is repeatedly performing a set of instructions until a certain condition is reached. There are for loops, while loops and nested loops. A typical example is a while loop that prints the multiplication table of a number from 10 down to 1.
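A while loop for the multiplication table could look like the sketch below. The function name multiplication_table and the choice of 7 as the number are assumptions made for illustration; the loop counts the multiplier down from 10 to 1.

```python
def multiplication_table(number):
    """Build the table lines for number, from x 10 down to x 1."""
    lines = []
    multiplier = 10
    # Keep looping until the multiplier drops below 1.
    while multiplier >= 1:
        lines.append(f"{number} x {multiplier} = {number * multiplier}")
        multiplier -= 1  # without this step the loop would never end
    return lines


for line in multiplication_table(7):
    print(line)
```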

Functions are blocks of code written to perform specific tasks. Python has built-in functions such as len() and range(), but you can also define custom functions, for example a function that checks whether a letter is a vowel or a consonant.
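A custom function for the vowel-or-consonant check might be written as follows. This is a sketch that assumes the English vowels a, e, i, o and u; the function name classify_letter and the extra "not a letter" result for invalid input are illustrative choices.

```python
def classify_letter(letter):
    """Classify a single character as a vowel or a consonant."""
    # Reject anything that is not exactly one alphabetic character.
    if len(letter) != 1 or not letter.isalpha():
        return "not a letter"
    # Lowercase the letter so that "A" and "a" are treated the same.
    if letter.lower() in "aeiou":
        return "vowel"
    return "consonant"


for character in ["a", "E", "k", "7"]:
    print(character, "->", classify_letter(character))
```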

Data processing and EDA

Once these basics are learnt, one moves on to the libraries for data processing, cleaning, wrangling and Exploratory Data Analysis (EDA), which include pandas, NumPy, matplotlib, seaborn and many more. Their applications are best understood by taking a hands-on approach and using them in a project.

After this, the next step is learning statistics. Topics include univariate analysis, bivariate analysis, multivariate analysis, sampling, distributions and hypothesis testing. The best approach is to learn each topic and then carry out a hands-on project with suitable Python libraries.
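As a first taste of univariate analysis, the sketch below summarizes a single variable with Python's standard statistics module. The ages list is made-up sample data used only for illustration; a real project would more likely load a dataset and use a library such as pandas.

```python
import statistics

# Made-up sample data for illustration only.
ages = [24, 31, 27, 45, 38, 29, 33]

mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # middle value of the sorted data
spread = statistics.stdev(ages)       # sample standard deviation

print("mean:", round(mean_age, 2))
print("median:", median_age)
print("standard deviation:", round(spread, 2))
```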

