Python lets you perform simple data analytics on collections of data using a handful of functions for working with sequences.
In this article, we will cover the following three functions: map, filter and reduce.
1. map
Map allows you to apply an operation to each element of a sequence. You provide two inputs to map: the function to apply and the sequence.
The output is an iterable that yields the result for each element.
For example, assume you have a function that calculates the area of a circle. This function is given below.
    import math

    def calculatearea(radius):
        # calculate area of a circle
        area = math.pi * radius * radius
        return area
Also, let’s say we have a list of radii as:
    radii = [4, 5.2, 9, 10, 3, 8, 6, 7.5]
To calculate the areas of circles with these radii, we can loop through the list of radii and call the calculatearea() function for each radius.
The code to do this is shown below.
    areas = []
    for radius in radii:
        area = calculatearea(radius)
        areas.append(area)

    print(areas)
The method above works fine. However, we can do the same thing with a single line of code by using the map function.
To do that, you simply call the map function, and provide the name of the function followed by the list of radii. This is shown below:
    areas = map(calculatearea, radii)
    print(list(areas))
You will notice that in the second line, we convert the output areas to a list. This is because the map function returns a map object rather than a list.
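A map object is a lazy iterator, so it can only be consumed once. The short sketch below illustrates this. The helper name circle_area and the sample radii are hypothetical and only mirror the function described above.

```python
import math

def circle_area(radius):
    # Hypothetical helper mirroring calculatearea() from the article
    return math.pi * radius * radius

radii = [1, 2, 3]
areas = map(circle_area, radii)   # nothing is computed yet

first_pass = list(areas)    # consumes the map object
second_pass = list(areas)   # the map object is now exhausted

print(len(first_pass))   # 3
print(second_pass)       # []
```

If you need the results more than once, store the list rather than the map object.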
2. filter
You can use the filter function to filter a list. This is like filtering out the data you do not need, so you can select certain data from the list based on some criteria.
Assume you have a list of scores and you would like to filter out the scores below average. Let’s do this the normal way first, and then do it using the filter function. Here we assume that 50 is the average score for the list.
    scores = [45, 70, 94.2, 75, 51, 49, 35.1]

    def isaboveaverage(score):
        # assumed helper: 50 is taken as the average score
        return score > 50

    newscores = []  # holds scores above average
    for score in scores:
        if isaboveaverage(score):
            # if score is above average, add it to newscores
            newscores.append(score)

    print(newscores)
Again we can do this using the filter function. The code is given below:
    newscores = filter(isaboveaverage, scores)
    print(list(newscores))
Also, don’t forget that the output is converted back to a list using the list function. This is because the output of the filter function is a filter object. If you want to see what a filter object looks like, print it first without converting it.
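As a minimal, self-contained sketch of the points above, the snippet below prints the filter object before converting it, and compares the result with an equivalent list comprehension. The helper is_above_average is an assumption for illustration, using 50 as the average as stated earlier.

```python
scores = [45, 70, 94.2, 75, 51, 49, 35.1]

def is_above_average(score):
    # Assumption: 50 is taken as the average score, as in the article
    return score > 50

filtered = filter(is_above_average, scores)
print(filtered)   # shows a filter object, not the scores themselves

above = list(filtered)
# An equivalent list comprehension gives the same result
same = [s for s in scores if is_above_average(s)]

print(above)      # [70, 94.2, 75, 51]
```

Whether you use filter or a list comprehension here is mostly a matter of taste.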
Filtering out missing data
One place where the filter function is useful is removing missing data from a dataset. For example, the list below contains names of students with some missing values.
    students = ["Jadon", "Solace", "", "Treasure", "", "", "Onyx", "Booboo"]
If you print this list, it would include the missing values. This is not desirable. To filter out these missing values, you can use the filter function. In this case, you pass in None as the first parameter, which makes filter drop every item that evaluates to false, such as an empty string.
This is shown below:
    newStudents = filter(None, students)
    print(list(newStudents))
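One thing to keep in mind is that passing None removes every value that evaluates to false, not just empty strings. The sketch below uses a hypothetical list of readings to show that a legitimate 0 would also be dropped, and one possible way to remove only the None entries.

```python
# Hypothetical data: None marks a missing reading, 0 is a valid reading
readings = [3, 0, None, 7, 0, 12]

# filter(None, ...) drops None, but it also drops the zeros
truthy_only = list(filter(None, readings))
print(truthy_only)   # [3, 7, 12]

# To remove only the missing values, test for None explicitly
not_missing = list(filter(lambda r: r is not None, readings))
print(not_missing)   # [3, 0, 7, 0, 12]
```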
I recommend you try all of these yourself to see how they work. You can also change the dataset and observe the results.
3. reduce
You use this function to compute an aggregate of a list of items, such as a sum or a product. The reduce function is a bit different from the previous two, and you probably would not need to use it much, because a plain loop is often easier to read.
Now this is how it works:
It takes a sequence of items and applies a function to each item cumulatively. That means that it first applies the function to the first two items, then it applies the function to this result and the third item, then this result and the fourth item and so on. It continues until it reaches the last item.
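To make the cumulative steps visible, here is a small sketch that uses itertools.accumulate from the standard library, which yields every intermediate result, next to reduce, which returns only the final one. The sample numbers are made up for illustration.

```python
from functools import reduce
from itertools import accumulate
from operator import mul

numbers = [2, 3, 4, 5]   # hypothetical sample data

# accumulate yields each running result: 2, 2*3, (2*3)*4, ((2*3)*4)*5
steps = list(accumulate(numbers, mul))
print(steps)   # [2, 6, 24, 120]

# reduce performs the same steps but keeps only the last value
final = reduce(mul, numbers)
print(final)   # 120
```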
To illustrate, let’s find the product of all the numbers in a list. We first need to import the reduce function from the functools module.
    from functools import reduce

    data = [3, 5, 3, 1.2, 2.6, 3.5, 1]
    product = lambda x, y: x * y  # multiplies two numbers
    result = reduce(product, data)
    print(result)
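For comparison, the same product can be computed with a plain loop. The sketch below also passes an optional third argument to reduce, an initial value, which is what gets returned when the list is empty. The variable names are only for illustration.

```python
from functools import reduce

data = [3, 5, 3, 1.2, 2.6, 3.5, 1]

# The same product written as a plain loop
result_loop = 1
for number in data:
    result_loop *= number

# reduce with an initial value of 1 as the third argument
result_reduce = reduce(lambda x, y: x * y, data, 1)

# With an initial value, an empty list gives back that value;
# without one, reduce raises a TypeError on an empty list
empty_product = reduce(lambda x, y: x * y, [], 1)
print(empty_product)   # 1
```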
Remember, you will understand it better if you do it yourself. Do leave a comment if you have any questions.