Introduction to Python for Data Science
Python is a powerful, high-level, object-oriented programming language with a simple syntax. It has many applications, the major ones being web development, software development and data science. Data science is the field of extracting meaningful insights from data to support decision making and planning in businesses. It combines mathematics, statistics, programming, machine learning, artificial intelligence and advanced analytics.
Data Science project life cycle
The data science project life cycle consists of several stages: data collection, data cleaning, exploratory data analysis, model building and model deployment. Python has many libraries that facilitate these stages, which makes it well suited to data science. Examples include pandas for data analysis, wrangling and cleaning; Matplotlib and seaborn for data visualization; TensorFlow and scikit-learn for machine learning; Keras and PyTorch for deep learning; and SciPy and NumPy for mathematical computation. Other factors that make Python the preferred language for data science are: its simple syntax makes it easy to learn, it is open source, it supports test-driven development and it runs on multiple operating systems such as Windows, Linux and macOS.
Working with Python
To work with Python, you first download and install it, then work in your preferred Integrated Development Environment (IDE) or editor, for example IDLE, Visual Studio Code, Spyder or Jupyter Notebook.
You can also work in a web-based environment, in which case there is no need to install anything locally. Examples are Google Colab and JupyterLab (when it is used through a hosted service).
Python basics
To learn Python for data science, you first have to learn the Python fundamentals: data types, operators, sequences (compound data structures), conditional statements, loops, functions and external libraries.
Python data types include:
Strings: These are sequences of characters enclosed in single or double quotes. Written as str in Python. Examples: 'apple', "python", '1', '1+3', "3.65", "" (an empty string), " " (a single space).
#To check data type in python
fruit='apple'
print(type(fruit))
Integers: These are whole numbers that can be positive, negative or 0. Written as int in Python. Examples: -3, 2, 0.
#To check data type in python
number = -3
print(type(number))
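Boolean: These are the truth values True and False. Written as bool in Python. Values such as 1, 1.0, 0 and 0.0 are not of type bool themselves (they are int or float), but bool() converts any non-zero number to True and zero to False.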
#To check data type in python
a = True
print(type(a))
Floating point numbers: These are numbers written with a decimal point or an exponent that can be positive, negative or zero. Written as float in Python. Examples: 3.3, -3.3, 5e10, -5e10, 4., 0.0.
#To check data type in python
m = -5e10
print(type(m))
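As a small sketch tying the four basic types together, the snippet below shows how a value's type differs from its truth value. The variable names and sample values are illustrative assumptions, not part of the examples above.

```python
# Illustrative values, one for each basic type
samples = ['apple', -3, True, -5e10]

# type(...).__name__ gives the short type name as a string
type_names = [type(value).__name__ for value in samples]
print(type_names)

# 1 is an int, not a bool, although it compares equal to True
print(type(1) is bool)
print(1 == True)

# bool() converts a value to its truth value:
# zero and the empty string are False, the other values here are True
truth_values = [bool(0), bool(0.0), bool(''), bool(' '), bool(1.0)]
print(truth_values)
```

Note that a string containing a single space is not empty, so it converts to True.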
Python sequences and other compound data structures store collections of data in a single variable. They include:
Lists: These are ordered, indexed, changeable collections of data enclosed in square brackets and separated by commas, e.g. list_A = ['apple', 1, True, 3.3]
#To check type which should be list
list_A=['apple', 1, True, 3.3]
print(type(list_A))
Sets: These are unordered, unindexed collections of data enclosed in curly brackets and separated by commas, which cannot contain duplicates, e.g. Set_A = {'apple', 0, True, 3.3}
#To check type which should be set
Set_A= {'apple', 0, True, 3.3}
print(type(Set_A))
Tuples: These are ordered, unchangeable, indexed collections of data enclosed in parentheses and separated by commas, e.g. Tuple_A = ('apple', 0, True, 3.3)
#To check type which should be tuple
Tuple_A= ('apple', 0, True, 3.3)
print(type(Tuple_A))
Dictionaries: These are collections of key-value pairs that are changeable, keep the order in which pairs were inserted (Python 3.7 and later) and are accessed by key rather than by position. Each key is separated from its value by a colon, the pairs are separated from each other by commas and everything is enclosed in curly brackets, e.g. Dict_A = {'Name': 'Jane', 'Age': 24, 'Country': 'Kenya'}
#To check type which should be dict
Dict_A= {'Name': 'Jane', 'Age': 24, 'Country': 'Kenya'}
print(type(Dict_A))
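The differences between these four structures are easiest to see by trying to index and change them. Below is a minimal sketch; the added values ('mango', 'banana', 'City': 'Nairobi') are made up for illustration.

```python
# Example values in the same style as above
list_A = ['apple', 1, True, 3.3]
tuple_A = ('apple', 0, True, 3.3)
dict_A = {'Name': 'Jane', 'Age': 24, 'Country': 'Kenya'}

# Lists are indexed from 0 and can be changed in place
first_item = list_A[0]
list_A.append('mango')

# Tuples are indexed but cannot be changed: assignment raises TypeError
try:
    tuple_A[0] = 'banana'
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Sets drop duplicates; 1 and True count as the same element
set_B = {'apple', 'apple', 1, True}

# Dictionary values are looked up by key, and new pairs can be added
name = dict_A['Name']
dict_A['City'] = 'Nairobi'

print(first_item, list_A)
print(tuple_changed)
print(len(set_B))
print(name, list(dict_A.keys()))
```

Because True equals 1 in Python, a set that contains both keeps only one of them, which is why set_B ends up with two elements.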
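Python's arithmetic operators are addition (+), subtraction (-), multiplication (*), division (/), floor division (//), exponentiation (**) and modulo (%). Python has no ++ or -- increment and decrement operators; the augmented assignments += and -= are used instead. Below are examples of how some are executed: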
a= 10
b= 5
#addition
print(a+b)
#subtraction
print(a-b)
#multiplication
print(a*b)
#classic division
print(a/b)
#floor division - rounds the result down to the nearest whole number
#result will be 1 instead of 1.25
c=5
d=4
print(c//d)
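The remaining operators can be sketched the same way. The values below are chosen only for illustration; the negative floor division is included because it shows that // rounds down rather than towards zero.

```python
a = 10
b = 3

# exponentiation: 10 to the power of 3
power = a ** b
# modulo: remainder after dividing 10 by 3
remainder = a % b
# floor division rounds down, which matters for negative numbers
floor_positive = 7 // 2
floor_negative = -7 // 2

# augmented assignment takes the place of ++ and --
count = 0
count += 1   # same as count = count + 1
count -= 1   # same as count = count - 1

print(power, remainder, floor_positive, floor_negative, count)
```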
Conditional statements control the flow of a program's execution. In Python they take the forms if, if-else, if-elif and if-elif-else. A typical use is determining whether a number is odd or even by testing its remainder after division by 2.
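A minimal sketch of the odd-or-even check mentioned above, together with an if-elif-else example. The function names odd_or_even and describe_sign are hypothetical helpers introduced here for illustration.

```python
def odd_or_even(number):
    """Return 'even' or 'odd' for a whole number."""
    # A number is even when dividing it by 2 leaves no remainder
    if number % 2 == 0:
        return 'even'
    else:
        return 'odd'


def describe_sign(number):
    """Return 'positive', 'negative' or 'zero' using if-elif-else."""
    if number > 0:
        return 'positive'
    elif number < 0:
        return 'negative'
    else:
        return 'zero'


for value in [7, 10, -3, 0]:
    print(value, odd_or_even(value), describe_sign(value))
```

Wrapping the checks in functions is a design choice made here so they can be reused; the same if-else logic also works directly on a variable.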
Loops facilitate iteration, which is repeatedly performing a set of instructions until a certain condition is reached. There are for loops, while loops and nested loops. Below is an example of a while loop that prints the multiplication table of any number, from 10 down to 1:
#prompt for user to enter a number
number = int(input('Enter a number:'))
#while loop
#count is the loop variable, initialized at 10
count = 10
#repeat while count is greater than or equal to 1
while count >= 1:
    product = count * number
    print(f"{number} * {count} = {product}")
    #decrease count by 1 so the loop eventually stops
    count -= 1
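For comparison, here is a sketch of the same table written with a for loop, followed by a small nested loop. The function name times_table is a hypothetical helper, and the number 5 is used in place of user input so the snippet runs without prompting.

```python
def times_table(number):
    """Return the lines of the multiplication table from 10 down to 1."""
    lines = []
    # range(10, 0, -1) counts 10, 9, ..., 1
    for count in range(10, 0, -1):
        lines.append(f"{number} * {count} = {number * count}")
    return lines


for line in times_table(5):
    print(line)

# Nested loops: the inner loop runs fully for each pass of the outer loop
pairs = []
for row in range(1, 3):
    for column in range(1, 4):
        pairs.append((row, column, row * column))
print(pairs)
```

A for loop suits this task because the number of repetitions is known in advance, while a while loop is the more natural choice when the stopping condition depends on something that changes inside the loop.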
Functions are reusable blocks of code written to perform specific tasks. Python has built-in functions such as len() and range(), and you can also create custom functions with the def keyword. A simple example is a function that checks whether a letter is a vowel or a consonant.
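A minimal sketch of such a vowel-or-consonant function is shown below. The name vowel_or_consonant is hypothetical, and the sketch assumes English letters, with a, e, i, o and u treated as the vowels.

```python
def vowel_or_consonant(letter):
    """Classify a single letter as 'vowel' or 'consonant'."""
    # Reject anything that is not exactly one alphabetic character
    if len(letter) != 1 or not letter.isalpha():
        return 'not a letter'
    # Compare in lowercase so 'A' and 'a' are treated the same
    if letter.lower() in 'aeiou':
        return 'vowel'
    return 'consonant'


for character in ['a', 'B', 'E', '7', 'xy']:
    print(character, vowel_or_consonant(character))
```

Returning a value instead of printing inside the function makes it easier to reuse the result elsewhere, for example in a conditional statement or a loop.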
Data processing and EDA
Once these basics are learnt, the next step is the libraries for data processing, cleaning, wrangling and Exploratory Data Analysis (EDA), which include pandas, NumPy, Matplotlib and seaborn. Their applications are best seen by taking a hands-on approach and using them in a project.
After this, the next step is learning statistics, which includes univariate analysis, bivariate analysis, multivariate analysis, sampling, distributions and hypothesis testing. The best approach is to learn the concepts and then carry out a hands-on project with suitable Python libraries.
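As a first taste of univariate analysis, the sketch below summarizes a single variable with Python's built-in statistics module. Using this module is an assumption made here to keep the example free of extra installs, and the ages list is made-up sample data; a real project would more likely use pandas or NumPy.

```python
import statistics

# Made-up sample of ages, for illustration only
ages = [24, 31, 27, 24, 35, 29]

# Measures of central tendency
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
mode_age = statistics.mode(ages)

# Sample standard deviation, a measure of spread
spread = statistics.stdev(ages)

print(f"mean={mean_age:.2f}, median={median_age}, mode={mode_age}, stdev={spread:.2f}")
```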