6.2. Vectorized Functions#
6.2.1. The Essence of Vectorization#
In mathematics, we often apply an operation to an entire set of numbers at once. In programming, this is called vectorization. Instead of writing a for
loop to handle each element individually, you use functions that operate on the entire array in a single, expressive statement. We’ve already seen this with the dot-operator (.
).
Julia provides a rich library of these vectorized functions. They lead to code that is not only shorter and more readable, but often significantly faster, as Julia can use highly optimized, low-level routines to perform the computation. Let’s explore some of the most common types of array functions, starting with reduction functions.
6.2.1.1. Reduction Functions: Summarizing Your Data#
A reduction function takes an array and “reduces” it to a single value. A perfect example is summing all the elements:
# Create a vector of 5 random integers between 1 and 100.
x = rand(1:100, 5)
# The sum() function is a clean and efficient way to add up all elements.
println("Sum of ", x, " = ", sum(x))
Sum of [97, 100, 44, 14, 100] = 355
Many reduction functions can also operate along a specific dimension of a multi-dimensional array using the dims
keyword argument. This allows you to, for example, compute column-wise or row-wise summaries.
dims=1
: Operates along the first dimension (collapses the rows to produce a result for each column).dims=2
: Operates along the second dimension (collapses the columns to produce a result for each row).
# Let's create a 3x5 matrix to see this in action.
A = rand(-100:100, 3, 5)
3×5 Matrix{Int64}:
-34 81 97 -13 -36
7 -55 50 95 -41
41 -37 73 -66 -99
# Sum down the columns (dims=1), producing a 1x5 row vector of column sums.
sum(A, dims=1)
1×5 Matrix{Int64}:
14 -11 220 16 -176
# Sum across the rows (dims=2), producing a 3x1 column vector of row sums.
sum(A, dims=2)
3×1 Matrix{Int64}:
95
56
-88
Note: Preserving Dimensions Notice that when you reduce a 2D array, the result is still a 2D array (a
1x5
or3x1
matrix), not a 1D vector. Julia does this to maintain type consistency. If you need a true 1D vector as the output, you can easily “flatten” the result using[:]
.
# The result of the column-wise sum is now a 5-element Vector.
sum(A, dims=1)[:]
5-element Vector{Int64}:
14
-11
220
16
-176
Here are a few other essential reduction functions:
x = rand(1:100, 5)
display(x)
# Calculate the product of all elements.
prod(x)
5-element Vector{Int64}:
4
5
61
81
59
5830380
display(A)
# Find the largest element in each column.
display(maximum(A, dims=1))
# Find the smallest element in each column.
display(minimum(A, dims=1))
3×5 Matrix{Int64}:
-34 81 97 -13 -36
7 -55 50 95 -41
41 -37 73 -66 -99
1×5 Matrix{Int64}:
41 81 97 95 -36
1×5 Matrix{Int64}:
-34 -55 50 -66 -99
6.2.1.2. Cumulative Functions: Building on What Came Before#
Cumulative functions are another flavor of vectorization. Instead of reducing the array to a single value, they produce an array of the same size where each element is the result of an operation on all preceding elements. The most common are cumsum
and cumprod
.
x = 1:5
# Cumulative sum: the k-th element is the sum of x[1] through x[k].
display(cumsum(x)) # Expected: [1, 1+2, 1+2+3, ...]
# Cumulative product: the k-th element is the product of x[1] through x[k] (factorial).
display(cumprod(x)) # Expected: [1, 1*2, 1*2*3, ...]
5-element Vector{Int64}:
1
3
6
10
15
5-element Vector{Int64}:
1
2
6
24
120
# Like before, we can apply these along specific dimensions.
A = reshape(1:6, 2, 3)
2×3 reshape(::UnitRange{Int64}, 2, 3) with eltype Int64:
1 3 5
2 4 6
# Perform a cumulative sum down the columns (dims=1).
cumsum(A, dims=1)
2×3 Matrix{Int64}:
1 3 5
3 7 11
# Perform a cumulative product across the rows (dims=2).
cumprod(A, dims=2)
2×3 Matrix{Int64}:
1 3 15
2 8 48
6.2.2. Case Study: Vectorizing a Taylor Polynomial#
Let’s revisit the Taylor series for \(\cos(x)\) that we previously implemented with a for
loop.
The original loop-based code looked like this:
function taylor_cos_loop(x, n)
term = 1.0
y = 1.0
for k = 1:n
term *= -x^2 / ((2k-1) * 2k)
y += term
end
return y
end
We can formulate a much more elegant solution using vectorization. The key insight is that each term in the series is simply the previous term multiplied by a factor of -x^2 / ((2k-1) * 2k)
.
Our vectorized strategy is:
Create an array of these multiplicative
factors
fork
from 1 ton
.Use
cumprod()
on this array. This will generate a new array containing all the terms of the series (except for the initial term of 1).Use
sum()
to add up all the generated terms, and finally add the initial1
.
function taylor_cos(x, n)
# Step 1: Create a vector of the multiplicative factors.
k = 1:n
factors = @. -x^2 / ((2k - 1) * 2k)
# Step 2 & 3: Generate all terms with cumprod and sum them.
# We add the leading term of the series, which is always 1.
y = 1 + sum(cumprod(factors))
end
taylor_cos (generic function with 1 method)
# Test the vectorized function.
println("Taylor approximation: ", taylor_cos(10, 50))
println("Julia's cos(10): ", cos(10))
Taylor approximation: -0.8390715290752269
Julia's cos(10): -0.8390715290764524
This vectorized approach isn’t just for single values. We can use a comprehension to efficiently calculate the Taylor series for many x
values and many degrees n
all at once, creating a 2D array of results perfect for plotting.
# Set up the domain for our plot.
xx = -10:0.01:10
# Calculate the true cosine values.
yy = cos.(xx)
# In one line, calculate approximations for all x values for degrees 0 through 5.
# The result, yy_taylor, is a matrix where columns correspond to different degrees.
yy_taylor = [ taylor_cos(x, n) for x in xx, n in 0:5 ]
# Now, let's plot it.
using Plots
# Create the plot and add the first series (the true function)
# All global plot settings like title and limits are set here.
plot(xx, yy,
linewidth = 2, # Set line width (lw=3 also works)
color = :black, # Set line color
label = "cos(x)", # Label for the legend
title = "Taylor Series Approximations of cos(x)",
xlims = (-10, 10), # Set x-axis limits
ylims = (-2, 2) # Set y-axis limits
)
# Add the Taylor approximation series to the existing plot using plot!()
# Plots.jl automatically plots each column of yy_taylor as a separate line.
plot!(xx, yy_taylor,
alpha = 0.8, # Set line transparency
label = "" # Use empty label to avoid cluttering the legend
)
6.2.3. Querying Arrays: Booleans and Indices#
Another powerful set of array functions allows you to ask questions about your data. These functions typically involve boolean logic (true
/false
) and are essential for filtering and analyzing arrays.
6.2.3.1. Membership and Logical Checks#
These functions check for the presence of elements or conditions across an entire array.
# Let's work with the odd numbers from 1 to 999.
x = 1:2:1000
# The `in` function (or the symbol ∈, typed \in<TAB>) checks for membership.
println("Is 503 in x? ", 503 in x)
println("Is 1000 in x? ", 1000 ∈ x)
Is 503 in x? true
Is 1000 in x? false
# The `all()` function returns `true` only if every element satisfies the condition.
println("Are all elements less than 500? ", all(x .< 500)) # false
println("Are all elements greater than 0? ", all(x .> 0)) # true
Are all elements less than 500? false
Are all elements greater than 0? true
# The `any()` function returns `true` if at least one element satisfies the condition.
println("Is any element equal to 503? ", any(x .== 503)) # true
println("Is any element equal to 1000? ", any(x .== 1000)) # false
Is any element equal to 503? true
Is any element equal to 1000? false
6.2.3.2. Finding and Counting#
Often, it’s not enough to know if a condition is met; you need to know where or how many times.
# `findfirst()` returns the index of the first element that satisfies a condition.
idx = findfirst(x .== 503)
println("The value 503 is at index: ", idx)
println("Check: x[", idx, "] = ", x[idx])
The value 503 is at index: 252
Check: x[252] = 503
# `findall()` returns a vector of indices for all elements that satisfy the condition.
# Note: `@.` is a macro that applies the dot to every operation in the expression.
ind = findall(@. x % 97 == 0)
println("Indices of multiples of 97: ", ind)
println("The corresponding values are: ", x[ind])
Indices of multiples of 97: [49, 146, 243, 340, 437]
The corresponding values are: [97, 291, 485, 679, 873]
# `count()` returns the number of elements that satisfy a condition.
num_multiples = count(@. x % 97 == 0)
println("Found ", num_multiples, " multiples of 97.")
Found 5 multiples of 97.