Maximum Collinear Points

Given a set of points in a two-dimensional plane, determine the maximum number of points that lie along the same straight line.

Example:

Input: points = [[1, 1], [1, 3], [2, 2], [3, 1], [3, 3], [4, 4]]
Output: 4

Constraints:

The input won’t contain duplicate points.

Intuition

Two or more points are collinear if they lie on the same straight line. In other words, the slope between any pair of points among them will be equal:

Let’s remind ourselves how the slope of a line is calculated given two points ( xax_axa, yay_aya) and ( xbx_bxb, yby_byb):

slope=rise/run=(yb−ya)/(xb−xa)slope = rise / run = (y_b-y_a) / (x_b-x_a)slope=rise/run=(yb−ya)/(xb−xa)

Using this formula, we can calculate the slope between all pairs of points from the input, and determine the largest number of pairs that share the same slope. However, this approach is flawed because two pairs of points with the same slope value may not be collinear:

Let’s think of a different way to find the answer. What if we try to find the maximum number of points collinear with a specific point? Let’s call this point a focal point.

We can calculate the slope between the focal point and every other point in the input, using a hash map to count how many points correspond with each slope value. Note that the frequencies of points stored in the hash map do not include the focal point.

This allows us to find the maximum number of points collinear with this focal point:

Here, the slope with the highest frequency is 3, which means the number of points on the line defined by that slope is 3 + 1 = 4, where +1 is used to account for the focal point itself.

By repeating the process for every point, we can determine the maximum number of points that are collinear with each focal point. Our final answer is equal to the largest of these maximums.

Edge case: two points on the same x-axis
When two collinear points share the same x-axis, the denominator of the slope equation will equal 0 (i.e., run=xb−xa=0run = x_b - x_a = 0run=xb−xa=0). This is problematic because dividing by 0 is undefined:

To handle this issue, we can check if our run value is equal to 0. If so, we can just use infinity (float('inf')) to represent the value of this slope.

Avoiding precision issues
A crucial thing to be mindful of is the precision of slopes when storing them as floats or doubles. Consider the following slopes:

$Image represents two calculations of slopes, slope1 and slope2, each defined as the ratio of rise to run. slope1 is calculated as 499999999/500000000, which, due to floating-point precision limitations, results in 0.999999998. slope2 is calculated as 499999998/499999999, also resulting in 0.999999998 after floating-point precision is applied. The diagram visually shows these calculations with rectangular boxes representing the fraction calculations (e.g., 499999999/500000000 and 499999998/499999999) and rounded rectangles representing the final floating-point results (0.999999998). Arrows connect the fraction calculations to their respective floating-point results, labeled 'float precision'. A vertical arrow and the label 'not equal' connect the two fraction calculations, indicating that despite their near-identical results after floating-point conversion, the original fractions are not mathematically equal. Similarly, a vertical arrow and the label 'equal' connect the two floating-point results, highlighting that the final results are considered equal due to the limitations of floating-point precision.$

As we can see, despite the fractions themselves representing different slopes, presenting them as a float or double doesn’t provide enough decimal-point precision to distinguish between them accurately. This can result in incorrectly identifying distinct slopes as the same.

To avoid this, we can represent a slope as a pair of integers, stored as a tuple: (rise, run). For example, the slope with a rise of 1 and a run of 2 can be represented as (1, 2) instead of 1 / 2 = 0.5.

Ensuring consistent representation of slopes
There’s just one more challenge to address: ensuring the representation of a slope is consistent for all equivalent slope fractions:

$Image represents three fractions, ¹⁄₂, ²⁄₄, and ¹³/₂₆, each positioned at the top and slightly separated from each other. Curved grey lines connect each fraction individually to a single point below. This point is labeled 'same value, different fractional representations,' indicating that despite their different numerators and denominators, all three fractions are equivalent and represent the same numerical value (0.5). The arrangement visually demonstrates the concept of equivalent fractions, showcasing how a single value can have multiple fractional representations.$

If we reduce fractions to their simplest forms, we can consistently represent equal fractions that have different initial representations.

$Image represents a diagram illustrating a pattern in fraction simplification. The diagram shows three columns, each representing a separate fraction simplification example. The top of each column displays an initial fraction: ¹⁄₂ in the first column, ²⁄₄ in the second, and ¹³/₂₆ in the third. A downward-pointing grey arrow connects each initial fraction to a simplified equivalent fraction at the bottom of the column: ¹⁄₂ in the first column, ¹⁄₂ in the second, and ¹⁄₂ in the third. The arrangement visually demonstrates that despite starting with different numerical representations, each fraction simplifies to the same result, ¹⁄₂, highlighting a pattern of fraction reduction.$

But how can we do this? We can reduce fractions by dividing both the rise and run by their greatest common divisor (GCD).

Greatest common divisor
The GCD of two numbers is the largest number that divides both of them exactly. By dividing the rise and run by their GCD, we reduce the slope to its simplest form:

This ensures all equal fractions are represented in the same way.

Implementation

Python

JavaScript

Java

SVG Image

from typing import List, Tuple
from collections import defaultdict
    
def maximum_collinear_points(points: List[List[int]]) -> int:
    res = 0
    # Treat each point as a focal point, and determine the maximum number of points
    # that are collinear with each focal point. The largest of these maximums is the
    # answer.
    for i in range(len(points)):
        res = max(res, max_points_from_focal_point(i, points))
    return res
    
def max_points_from_focal_point(focal_point_index: int, points: List[List[int]]) -> int:
    slopes_map = defaultdict(int)
    max_points = 0
    # For the current focal point, calculate the slope between it and every other
    # point. This allows us to group points that share the same slope.
    for j in range(len(points)):
        if j != focal_point_index:
            curr_slope = get_slope(points[focal_point_index], points[j])
            slopes_map[curr_slope] += 1
            # Update the maximum count of collinear points for the current focal
            # point.
            max_points = max(max_points, slopes_map[curr_slope])
    # Add 1 to the maximum count to include the focal point itself.
    return max_points + 1
    
def get_slope(p1: List[int], p2: List[int]) -> Tuple[int, int]:
    rise = p2[1] - p1[1]
    run = p2[0] - p1[0]
    # Handle vertical lines separately to avoid dividing by 0.
    if run == 0:
        return (1, 0)
    # Simplify the slope to its reduced form.
    gcd_val = gcd(rise, run)
    return (rise // gcd_val, run // gcd_val)

While some programming languages, Python included, have their own internal implementation of the GCD function, its implementation is provided below for your information. This implementation is commonly known as the Euclidean algorithm:

Python

JavaScript

Java

SVG Image

# The Euclidean algorithm.
def gcd(a, b):
    while b != 0:
        a, b = b, a % b
    return a

Complexity Analysis

Time complexity: The time complexity of maximum_collinear_points is O(n2log⁡(m))O(n^2\log(m))O(n2log(m)), where nnn denotes the number of points, and mmm denotes the largest value among the coordinates. Here’s why:

The time complexity of the gcd(rise, run) function is O(log⁡(min(rise,run)))O(\log(min(rise, run)))O(log(min(rise,run))), which is approximately equal to O(log⁡(m))O(\log(m))O(log(m)) in the worst case.
The helper function max_points_from_focal_point, calls the gcd function a total of n-1 times: one for each point excluding the focal point, giving a time complexity of O(nlog⁡(m))O(n\log(m))O(nlog(m)).
The max_points_from_focal_point function is called a total of nnn times, resulting in an overall time complexity of O(n2log⁡(m))O(n^2\log(m))O(n2log(m)).

Space complexity: The space complexity is O(n)O(n)O(n) due to the hash map, which in the worst case, stores n−1n-1n−1 key-value pairs: one for each point excluding the focal point.