Refactoring complex loops with transformation pipelines

Complex loops that validate data, apply business rules, and generate statistics in one place become increasingly difficult to maintain. Each new requirement risks breaking existing logic, and testing these interdependent operations becomes challenging.

The problem: complex loops #

Complex loops typically start simple but grow organically as developers add "just one more condition" rather than refactoring. The example below started as basic GPS point processing but expanded with additional steps for elevation tracking, speed calculation, and terrain classification. With each new feature, the code became harder to maintain:

public ActivitySummary ProcessGpxData(List<GpxPoint> gpxPoints)
{
    // State variables
    double totalDistance = 0;
    double totalElevationGain = 0;
    List<GpxPoint> processedPoints = new List<GpxPoint>();
    
    for (int i = 0; i < gpxPoints.Count; i++)
    {
        var point = gpxPoints[i];
        
        if (i > 0)
        {
            var prevPoint = gpxPoints[i-1];
            
            // Calculate distance and speed
            double segmentDistance = CalculateHaversineDistance(prevPoint, point);
            double speed = CalculateSpeed(segmentDistance, point.Timestamp - prevPoint.Timestamp);
                
            // Skip GPS errors
            if (speed > 30) continue;
            
            // Track elevation changes
            double elevationDelta = point.Elevation - prevPoint.Elevation;
            if (elevationDelta > 0.5)
                totalElevationGain += elevationDelta;
                
            // Update accumulated values
            totalDistance += segmentDistance;
            
            // Update point with calculated metrics
            point.SegmentDistance = segmentDistance;
            point.Speed = speed;
            
            // Apply elevation smoothing (simplified)
            point.SmoothedElevation = CalculateSmoothedElevation(i, gpxPoints);
            
            // Classify terrain (multiple concerns mixed together)
            point.TerrainType = DetermineTerrainType(elevationDelta, speed);
        }
        else
        {
            // Initialize first point
            point.SegmentDistance = 0;
            point.Speed = 0;
            point.SmoothedElevation = point.Elevation;
            point.TerrainType = "flat";
        }
        
        processedPoints.Add(point);
    }
    
    return new ActivitySummary 
    {
        ProcessedPoints = processedPoints,
        TotalDistance = totalDistance,
        TotalElevationGain = totalElevationGain
    };
}

These loops:

- mix validation, calculation, classification, and accumulation in a single body
- thread mutable state (running totals, the previous point) through every iteration, so no step can be understood or tested in isolation
- force each new requirement into the same block, where it risks breaking the existing logic

The solution: transformation pipelines #

The solution is to break down complex processing into a transformation pipeline where each step has a single responsibility:

public ActivitySummary ProcessGpxData(List<GpxPoint> gpxPoints)
{
    // Convert raw GPS points to our working structure
    var intermediateData = MapToIntermediateData(gpxPoints);
    
    var withSegmentMetrics = CalculateSegmentMetrics(intermediateData); // distance, speed, elevation delta
    var filteredPoints = FilterGpsGlitches(withSegmentMetrics);         // drop implausible speeds
    var withElevationData = ProcessElevationData(filteredPoints);       // elevation gain/loss per point
    var smoothedData = ApplyDataSmoothing(withElevationData);           // smooth noisy elevation readings
    var enrichedData = ClassifyTerrainTypes(smoothedData);              // flat, uphill, downhill, ...
    var processedPoints = CalculateAccumulatedTotals(enrichedData);     // running totals
    
    // Create final output with statistics
    return GenerateActivitySummary(processedPoints);                    
}

Each step has a single responsibility and can be implemented, tested, and modified independently. The key to making this work is an intermediate data structure that carries accumulated state between processing steps:

private class GpxIntermediateData
{
    // Original data
    public double Latitude { get; set; }
    public double Longitude { get; set; }
    public double Elevation { get; set; }
    public DateTime Timestamp { get; set; }
    
    // Derived/accumulated data
    public double SegmentDistance { get; set; }
    public double Speed { get; set; }
    public double ElevationDelta { get; set; }
    public bool IsGpsGlitch { get; set; }
    public double SmoothedElevation { get; set; }
    public string TerrainType { get; set; }
    public double AccumulatedDistance { get; set; }
    public double AccumulatedElevationGain { get; set; }
    public double AccumulatedElevationLoss { get; set; }
}

Implementation details #

Let's see how some of the pipeline steps would be implemented:

// Step 1: Map input to intermediate structure
private List<GpxIntermediateData> MapToIntermediateData(List<GpxPoint> gpxPoints)
{
    return gpxPoints.Select(p => new GpxIntermediateData
    {
        Latitude = p.Latitude,
        Longitude = p.Longitude,
        Elevation = p.Elevation,
        Timestamp = p.Timestamp
    }).ToList();
}

// Step 2: Calculate metrics between adjacent points
private List<GpxIntermediateData> CalculateSegmentMetrics(List<GpxIntermediateData> points)
{
    for (int i = 1; i < points.Count; i++)
    {
        var current = points[i];
        var previous = points[i - 1];
        
        current.SegmentDistance = CalculateHaversineDistance(previous, current);
        current.Speed = CalculateSpeed(current.SegmentDistance, 
                                     current.Timestamp - previous.Timestamp);
        current.ElevationDelta = current.Elevation - previous.Elevation;
    }
    
    return points;
}

// Step 3: Filter out GPS glitches based on unreasonable speeds
private List<GpxIntermediateData> FilterGpsGlitches(List<GpxIntermediateData> points)
{
    // Mark points with unreasonable speed as glitches
    foreach (var point in points)
    {
        point.IsGpsGlitch = point.Speed > 30; // 30 m/s threshold
    }
    
    // Filter out glitches
    return points.Where(p => !p.IsGpsGlitch).ToList();
}
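
The remaining steps follow the same shape: take the list, derive or accumulate values, and return it for the next stage. As a hedged sketch of the last two stages (the field names follow the intermediate structure above; MapToGpxPoint is a hypothetical helper that projects back to the public GpxPoint type):

// Step 7: Accumulate running totals across the filtered, enriched points
private List<GpxIntermediateData> CalculateAccumulatedTotals(List<GpxIntermediateData> points)
{
    double distance = 0, gain = 0, loss = 0;
    
    foreach (var point in points)
    {
        distance += point.SegmentDistance;
        
        if (point.ElevationDelta > 0.5)
            gain += point.ElevationDelta;   // same 0.5 m noise threshold as the original loop
        else if (point.ElevationDelta < -0.5)
            loss -= point.ElevationDelta;   // delta is negative here, so this adds its magnitude
        
        point.AccumulatedDistance = distance;
        point.AccumulatedElevationGain = gain;
        point.AccumulatedElevationLoss = loss;
    }
    
    return points;
}

// Final step: project the intermediate data into the public summary
private ActivitySummary GenerateActivitySummary(List<GpxIntermediateData> points)
{
    var last = points.LastOrDefault();
    
    return new ActivitySummary
    {
        ProcessedPoints = points.Select(MapToGpxPoint).ToList(), // MapToGpxPoint: assumed reverse mapping
        TotalDistance = last?.AccumulatedDistance ?? 0,
        TotalElevationGain = last?.AccumulatedElevationGain ?? 0
    };
}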

Benefits of the pipeline approach #

Creating intermediate data structures has a small performance cost, but the gain in clarity is worth it in most applications. This same approach works for booking systems, commerce workflows, reporting pipelines, hardware sensor processing, or any other multi-step data processing.
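
The testability benefit is concrete: each step is a plain method over the intermediate structure, so it can be exercised without GPS fixtures or the rest of the pipeline. A minimal sketch, assuming xUnit, a hypothetical GpxProcessor host class, and that the step method and intermediate type are visible to the test assembly (for instance via InternalsVisibleTo):

public class FilterGpsGlitchesTests
{
    // GpxProcessor is a hypothetical name for the class hosting the pipeline steps
    private readonly GpxProcessor processor = new GpxProcessor();
    
    [Fact]
    public void DropsPointsAboveSpeedThreshold()
    {
        var points = new List<GpxIntermediateData>
        {
            new GpxIntermediateData { Speed = 5 },
            new GpxIntermediateData { Speed = 45 },  // implausible: above the 30 m/s threshold
            new GpxIntermediateData { Speed = 12 }
        };
        
        var result = processor.FilterGpsGlitches(points);
        
        Assert.Equal(2, result.Count);
        Assert.All(result, p => Assert.True(p.Speed <= 30));
    }
}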

This approach leverages several established patterns:

- Pipes and filters: each step consumes the previous step's output and produces the next step's input
- ETL (extract, transform, load): map the raw input, enrich it stage by stage, then project the final output
- Functional map/filter/reduce: steps 1, 3, and 7 are essentially map, filter, and reduce, as sketched below

Unlike distributed implementations of these patterns, our approach applies them at the method level for everyday code.
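
As a small illustration of that last point, the filter and reduce vocabulary maps directly onto LINQ operators. A sketch over the intermediate type above (not part of the pipeline itself):

// Filter (step 3) and reduce (the distance part of step 7) in LINQ terms
private static double TotalDistanceViaLinq(IEnumerable<GpxIntermediateData> points) =>
    points.Where(p => !p.IsGpsGlitch)     // filter: keep only plausible points
          .Sum(p => p.SegmentDistance);   // reduce: fold segment distances into one total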

Conclusion #

When you spot a loop doing too much (tracking state, applying business rules, and formatting output simultaneously), consider refactoring it to a data transformation pipeline.

Break problems into subproblems, solve them independently, then merge them in the final solution. This approach improves code quality and makes future maintenance easier.