GoPro Mission 1 PRO: GP3 Image Processing Architecture
Posted by Mark Kirschenbaum on
I've spent the last two weeks researching the GoPro Mission 1 PRO, both internally and in freefall. Instead of a typical review, I first decided to research the new Image Signal Processor (ISP), GoPro's GP3. This deep dive allows me to truly understand what changes in the various imaging modes and why. I suggest, if you haven't already, reading the GP2 research article as this covers the baseline platform the GP3 builds upon.
GoPro's marketing for the Mission 1 centers on one claim: a new dedicated AI chip that delivers "market-leading low-light performance." GoPro's GP2 processor handled noise reduction entirely in fixed-function silicon. So what actually changed? And what does the image processing pipeline look like all the way through?
TL;DR My Observations
Overall, as you increase framerate and resolution you start dropping color quality; 4K at 30fps seems to be the sweet spot for low-light video and 4K at 60fps for action sports. Low-light modes use AI to reduce noise; everything else is standard ISP functionality. Motion Blur controls how fast the automatic exposure reacts, and stabilization now feeds into this block. Sport image tuning is preferable for skydiving or any sport where you are carving around your subjects. Besides the "1-inch" sensor, the ambient light sensor (ALS) is the biggest hardware addition to the GoPro lineup and makes a real difference in image quality. Be sure not to cover up the ALS.
I do not yet own a GoPro Mission 1 Lite, but my inclination is that it is the same hardware as the Mission 1 Pro with high-end modes disabled in software. The firmware for both cameras is encrypted and signed. Upgrading a Mission 1 Lite to a Pro is not straightforward, though it may be feasible. For $100, it's not worth the trouble.
Replacing your still camera with this is not yet practical if you want to choose your shots. There is no continuous shutter mode (holding the shutter button to burst a sequence of frames). If you do not mind culling, you can shoot 2 photos per second at 12MP or one photo every 3 seconds at 50MP. RAW capture is also speed-limited. Hopefully GoPro will address the UI; as it stands it is not ideal for skydivers.
In summary, the image quality in this camera is exceptional and it is a great tool. The GoPro Mission 1 is heavier (46 grams more) and a bit longer (8.3mm) than previous GoPro HERO10 body style cameras. Buy this camera if you shoot bigways or events. Buy a standard GoPro if you are shooting teams, tandems, or doing it for the gram.
What Is GP3?
GP3 is GoPro's name for the custom Socionext Milbeaut Image Signal Processor as configured in the GoPro Mission 1 PRO & Lite. It builds on the GP2, which shares the same Socionext Milbeaut IP: the same ARM Cortex-A53 cluster, the same XM6 DSP, and, more or less, the same fixed-function ISP blocks. GP3 differs in two main ways: it adds a VIPLite NPU (Neural Processing Unit) and is fabricated on a newer TSMC process node. GoPro's press release claims 5nm, versus the 12nm (12FFC) TSMC process used for GP2. A smaller node typically lowers power consumption, which is very apparent when using the GoPro Mission 1.
At the architecture level, three things changed: GoPro/Socionext added a VeriSilicon VIPLite 8-core NPU block, added a new hardware H.265 encoder, and scaled up the image processing blocks to handle 8K video. Each processing block also gained independent power control for fine-tuning power consumption.
The GoPro Mission 1 Pro camera's codename is Sandbar MP (mass production). This research is based upon firmware version H26.01.01.10.00, compiled May 22, 2026.
GoPro HERO13 (GP2) vs GoPro Mission 1 Pro (GP3): What Actually Changed
| Feature | GP2 (HERO13) | GP3 (Mission 1) |
|---|---|---|
| SoC | Socionext Milbeaut TPK (Karine) 12nm | Socionext Milbeaut TPK (Karine) IP, 5nm |
| NPU inference hardware | XM6 NPE sub-block (CEVA NeuPro) | VeriSilicon VIPLite 8 cores, separate |
| NPU vendor IP | CEVA NeuPro (CevaCnnAcc API) | VeriSilicon VIPLite (Vivante lineage) |
| Network format | XM6NeuProS binary | VNET/NBG (VIPLite graph format) |
| Video NR (standard modes) | Milbeaut TDNR (state not confirmed) | Milbeaut TDNR (ON in 4K, OFF in 8K) |
| Neural video noise reduction | None | AI-based RAW denoiser |
| DSP role | Primary inference + preprocessing | Preprocessing/postprocessing for VIPLite only |
| Image sensor | Sony IMX677L, SLVS-EC, 27.5 MP 8:7 (HERO11 to HERO13) | Sony IMX06A, C-PHY, higher resolution (Lite & Pro) |
| Max video resolution | 5.3K | 8K |
| RAM | 4GB | 4GB |
| GPU | GV380S UI display only | GV380S UI display only (unchanged) |
| CPU | Cortex-A53 x4, same config | Cortex-A53 x4, same config |
| ISP power domains | Coarse: VCORE0 / VCORE1 (all ISP blocks together) | Fine-grained: B2B1, B2R1, HDR, LTM1, R2Y1, CNR1, TDNR, IPU (each independently gated) |
| ISP algo nodes | Ae, Awb, Acls, Eis, Gtm, Rtm, Ltm, SceneCl, Fd, Ms, Ob | Ae, Awb, Eis, Gtm, Rtm, Ltm, SceneCl, Fd, Ms, Ob, Stitch, FlareEstim |
| Lens shading correction | Acls: adaptive algorithm, running at 2-15 Hz | UnifiedShading: static CLS+LLS map (65x65) applied inside SRO or B2B container |
| Sensor readout | Single-window | Dual-window: two half-width sensor windows stitched to full frame (dimensions scale with readout mode) |
| Battery | Enduro: 1900 mAh, 14°F (-10°C) | Enduro2: 2150 mAh, -4°F (-20°C) |
The XM6 DSP in GP3 is the same IP as GP2: GoPro did not upgrade it. The NeuPro sub-block inside the XM6 DSP seems to be idle in GP3, never loaded, never clocked. The 4.9 MB NeuPro runtime that existed in the GP2 is gone; in its place is a 69 KB stub that does nothing except format data to hand to the VIPLite. The XM6's entire role has been demoted from "inference engine" to "tensor shape adapter."
The fixed-function ISP blocks (B2B, B2R, HDR, LTM, R2Y, CNR, TDNR) appear identical across both generations: same names, same pipeline order, same log structure. GP3 adds a B2B_MAIN_2P dual-pipe B2B variant that activates in ~240fps super slo-mo scenarios, where the bandwidth of the high frame rate requires parallel Bayer processing. The meaningful changes are in what surrounds them. The Acls (Automatic Color Lens Shading) algorithm from GP2 is gone, replaced by a static lens shading map written once at scenario load. Two new algorithm nodes appear in every GP3 scenario: Stitch, which seams a pair of half-width sensor readout windows into the full resolution frame, and FlareEstim, which estimates and corrects lens flare inline. On the power management side, GP3 exposes each ISP stage as its own individually switchable power domain rather than grouping everything under two coarse voltage rails.
MISSION1 Platform at a Glance
| Block | Details |
|---|---|
| SoC | Socionext Milbeaut TPK ("Karine") |
| CPU | 4x ARM Cortex-A53, AArch64, 804 MHz (max 1200 MHz) |
| CPU split | RTOS cores 0-2 / Linux core 3 |
| NPU | NEW VeriSilicon VIPLite, 8 cores, 500 MHz idle / 720 MHz during inference |
| DSP | XM6, ISP pre/post-processing for NPU path |
| Video codec | Chips&Media WAVE6 x2, H.265 encode/decode IP integrated in SoC |
| GPU | Socionext Takumi GV380S (display composition only) |
| RAM | 4GB total, ~1.56 GB imaging heap, 192 MB Linux, 152 MB RTOS |
| Image sensor | NEW Sony IMX06A, I3C control + MIPI C-PHY |
| Lens | NEW "Ringtail" |
| ALS | NEW AMS TCS3530 ambient light + color sensor |
| IMU | Bosch BMI260 6-axis |
| Compass | Bosch BMM150 magnetometer |
| GPS | u-blox M10 |
| Wireless | NEW Broadcom BCM4446 WiFi 6E / BT5.3 via PCIe |
Optics and Image Sensor
The Mission 1's built-in wide-angle lens is codenamed RINGTAIL, the physical optic GoPro ships on the camera. Optional lenses (Max Lens Mod, Ultra Wide, etc.) mount over this lens and are identified by the hall effect sensor system, which reads a magnet pattern embedded in each lens body.
The image sensor is the Sony IMX06A, connected via:
- I3C bus for sensor register control (faster than I2C, lower pin count)
- MIPI C-PHY for image data (higher lane bandwidth than D-PHY, required for 8K readout)
The IMX06A is a 50.33 megapixel back-illuminated stacked CMOS sensor with 8192 x 6144 active pixels at 1.60 µm pixel pitch and a 16.384 mm diagonal (Type 1/0.98), hence the "1-inch" nomenclature. Each pixel uses a Dual PD (dual photodiode) structure: two photodiodes per pixel capturing the scene at different conversion gains simultaneously, giving the sensor native in-pixel HDR without multiple exposures. This is the hardware basis for Sony's Diagonal Binning QBC HDR mode and the DCG/HDR readout path GoPro uses for standard 4K video. The MIPI C-PHY interface runs at up to 6.0 Gsps per trio (MIPI C-PHY Specification v2.0) across a 2/3 trio configuration, delivering the bandwidth for 30fps full resolution 8192 x 6144 readout. Output is RAW10 or RAW12.
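The megapixel and diagonal figures follow directly from the pixel count and pitch. Here is a quick Python check that uses only the numbers quoted above; the function and variable names are my own.

```python
import math

# Figures quoted in the text above.
ACTIVE_W, ACTIVE_H = 8192, 6144   # active pixels
PIXEL_PITCH_UM = 1.60             # pixel pitch in micrometres

def sensor_dimensions_mm(width_px, height_px, pitch_um):
    """Return (width, height, diagonal) of the active area in millimetres."""
    w_mm = width_px * pitch_um / 1000.0
    h_mm = height_px * pitch_um / 1000.0
    return w_mm, h_mm, math.hypot(w_mm, h_mm)

w_mm, h_mm, diag_mm = sensor_dimensions_mm(ACTIVE_W, ACTIVE_H, PIXEL_PITCH_UM)
megapixels = ACTIVE_W * ACTIVE_H / 1e6

print(f"{megapixels:.2f} MP, {w_mm:.3f} x {h_mm:.3f} mm, diagonal {diag_mm:.3f} mm")
```

The active area works out to roughly 13.1 x 9.8 mm, which is where the 16.384 mm diagonal comes from.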
Sony's four published native drive modes for the IMX06A, mapped to GoPro's IQ bin naming:
| Sony drive mode | Active pixels | Max fps (C-PHY) | Output | GoPro IQ bin |
|---|---|---|---|---|
| Full resolution | 8192 x 6144 | 30 fps | RAW10 | H1V1 (8K video, Star Trail) |
| Binned (2x each axis) | 4096 x 3072 | 120 fps (RAW10) or 60 fps (RAW12) | RAW10/RAW12 | H2V2 (standard 4K) |
| Diagonal Binning QBC HDR | 4096 x 3072 | 60 fps | RAW10 | H2V2 DCG (4K HDR) |
| Low Noise 2 binned | 4096 x 3072 | 60 fps | RAW10 | - |
The GoPro HERO11 to HERO13 used the Sony IMX677L on a standard SLVS-EC interface. The IMX06A is a larger-format part with significantly more pixels and the C-PHY interface to push the additional bandwidth. The RTOS selects from multiple hardware readout modes depending on the use case.
Pixel binning is a hardware operation where the sensor combines adjacent photosites into a single output value before the data leaves the chip. In 2x2 binning, each block of four neighboring pixels is merged into one, cutting both horizontal and vertical resolution in half: the IMX06A's 8192x6144 array becomes 4096x3072. In 4x4 binning, sixteen pixels merge into one, producing 2048x1536. Once combined, the individual pixel values are gone permanently; no amount of processing downstream can recover the discarded spatial detail. The tradeoff is bandwidth: a binned frame contains a fraction of the data of a full resolution frame, which is how the same sensor interface that caps at 30fps full resolution can sustain 120fps in 2x2 mode.
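The bandwidth argument is easy to verify with arithmetic. This sketch (helper names are mine, not from the firmware) computes the binned resolution and shows that 2x2 binning at 120fps moves the same number of pixels per second as full resolution at 30fps.

```python
FULL_W, FULL_H = 8192, 6144   # IMX06A full-resolution array

def binned_resolution(width, height, factor):
    """Resolution after factor x factor binning (factor=2 means 2x2)."""
    return width // factor, height // factor

def pixel_rate(width, height, fps):
    """Pixels per second leaving the sensor for a given mode."""
    return width * height * fps

for factor in (1, 2, 4):
    w, h = binned_resolution(FULL_W, FULL_H, factor)
    fraction = 1 / (factor * factor)
    print(f"{factor}x{factor}: {w} x {h}, {fraction:.4f} of the full-frame data")

full_rate = pixel_rate(FULL_W, FULL_H, 30)
binned_rate = pixel_rate(*binned_resolution(FULL_W, FULL_H, 2), 120)
print(full_rate, binned_rate)
```

Quartering the pixel count while quadrupling the frame rate leaves the pixel rate unchanged, so the interface is no busier at 120fps binned than at 30fps unbinned.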
The Full Image Pipeline
This is the section that has never been publicly documented. Here is the full signal path from sensor to compressed video.
Sony IMX06A (C-PHY)
│
▼
┌────────────────────────────────────────────────────┐
│ SRO2-3: Sensor Readout + Stitch │
│ (dual half-width windows stitched to full frame) │
└────────────────────────────────────────────────────┘
│ │
│ [ night only "VAI" ]
│ ┌──────────────────▼───────────────────┐
│ │ XM6 DSP Pre-process: │
│ │ 12-bit Bayer → 64-ch 8-bit tiles │
│ └──────────────────┬───────────────────┘
│ │
│ ┌──────────────▼───────────────┐
│ │ VIPLite NPU 720 MHz │
│ │ 8 cores, 25 ms/frame │
│ │ 4 gain-select CNN networks │
│ └──────────────┬───────────────┘
│ │
│ ┌──────────────────▼────────────────────┐
│ │ XM6 DSP Post-process: │
│ │ 32-ch 16-bit → 14-bit denoised Bayer │
│ └──────────────────┬────────────────────┘
│◄──────────────────────────┘ (replaces raw SRO output)
│
┌─────▼───────────────────────────────────────────────────┐
│ FIXED-FUNCTION ISP (always active) │
│ │
│ B2B1: Bayer-to-Bayer │
│ (black level, lens shading) │
│ │ │
│ B2R1: Bayer-to-RGB │
│ │ │
│ HDR: HDR tone mapping / DR compression │
│ │ │
│ LTM1: Local Tone Mapping │
│ │ │
│ R2Y1: RGB-to-YUV color space conversion │
│ │ │
│ CNR1: Chroma Noise Reduction (always ON) │
│ │ │
│ TDNR: Temporal/Spatial Denoise - mode gated │
│ │
└─────────────────────────────────────────────────────────┘
│
┌─────▼────────────────────────────────────────┐
│ IPU │
│ (Image Processing Unit: EIS, crop, format) │
└──────────────────────────────────────────────┘
│
┌─────▼──────────────────┐
│ WAVE6 x 2 │
│ H.265 encode │
│ (recording only) │
└────────────────────────┘
Stage 1: SRO and Fixed-Function ISP
Each photosite in the IMX06A captures only one color channel (red, green, or blue) arranged in a "mosaic" pattern called a Bayer array. Each stage in the Bayer pipeline works on the raw color-channel data directly without interpolating; only at Bayer-to-RGB does the image become full-color RGB.
SRO2-3 (Sensor Readout, Power Domain 10) is the first hardware stage after the sensor. It manages the dual-window readout from the IMX06A: the sensor outputs two independent Bayer windows that SRO stitches into a single frame. This is where the Stitch algo node operates. In AI night mode (VAI in GoPro's terminology), the SRO output is intercepted at this point and routed through the DSP/NPU/DSP denoising chain before the ISP sees it (see Stage 2). In all other modes, the SRO output flows directly into Bayer-to-Bayer.
All capture modes (4K30, 8K, slow-motion, Star Trail, and VAI night) run through the fixed-function ISP pipeline. This hardware runs in the Socionext Milbeaut SoC and has been present, albeit upgraded, since GP1. It is not glamorous but it is doing most of the image quality work in the majority of modes.
B2B1 (Bayer-to-Bayer, Power Domain 12) applies corrections that must happen before any color interpretation. Black level correction subtracts the sensor's baseline voltage offset so that true black reads as zero rather than as a small positive value. Without this, shadows are lifted and colors shift. Lens shading compensation corrects for the natural brightness falloff from lens center to corners (every lens passes slightly less light at the periphery), using the static UnifiedShading 65x65 map loaded at scenario start. Defective pixel correction interpolates over known-bad photosites. Output is still a Bayer mosaic.
B2R1 (Bayer-to-RGB, Power Domain 13) performs demosaic: the reconstruction of a full-color RGB value at every pixel. Because each photosite only recorded one color channel, the missing two channels at each location are estimated by interpolating from the surrounding pixels of the correct color. This is the step that converts the raw sensor mosaic into a recognizable image.
HDR (High Dynamic Range, Power Domain 14) performs tone mapping for high dynamic range captures. On the standard H2V2 DCG-HDR readout mode used for 4K video, the sensor simultaneously captures two exposures per pixel using different conversion gains; this block merges the high-gain (shadow detail) and low-gain (highlight detail) data into a single output image. On SDR recordings it is essentially a pass-through.
LTM1 (Local Tone Mapping, Power Domain 15) applies contrast enhancement independently across spatial zones rather than globally across the whole frame. A global tone curve applied uniformly would either blow out highlights or crush shadows depending on the scene's overall brightness; local tone mapping lifts shadows and rolls off highlights in each zone independently. This is what gives GoPro footage its characteristic "punchy" look straight out of camera.
R2Y1 (RGB-to-YUV, Power Domain 16) converts from RGB color space to YUV. YUV separates brightness (the Y, or luma, channel) from color (the U and V, or chroma, channels). Human vision is significantly more sensitive to brightness variation than to color variation, which means color can be represented at lower resolution without visible quality loss. All downstream video compression exploits this by compressing the chroma channels more aggressively than luma. After this stage, all further processing operates in YUV.
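As an illustration of the luma/chroma split, here is a minimal RGB-to-YUV conversion in Python. It assumes the standard BT.709 coefficients; I have not confirmed which matrix the R2Y1 block actually uses, so treat the constants as a stand-in.

```python
# BT.709 luma coefficients. ASSUMPTION: the firmware's actual matrix is not
# documented in this article; these are the common HD video values.
KR, KG, KB = 0.2126, 0.7152, 0.0722

def rgb_to_yuv(r, g, b):
    """Convert normalized RGB (0..1) to Y (0..1) and U, V (-0.5..0.5)."""
    y = KR * r + KG * g + KB * b          # luma: weighted brightness
    u = (b - y) / (2 * (1 - KB))          # blue-difference chroma
    v = (r - y) / (2 * (1 - KR))          # red-difference chroma
    return y, u, v

for name, rgb in [("grey", (0.5, 0.5, 0.5)), ("red", (1.0, 0.0, 0.0))]:
    y, u, v = rgb_to_yuv(*rgb)
    print(f"{name}: Y={y:.4f} U={u:.4f} V={v:.4f}")
```

A neutral grey carries all of its information in Y, with U and V at zero, which is why the chroma channels tolerate heavier compression.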
CNR1 (Chroma Noise Reduction, Power Domain 17) handles independent smoothing of the U and V color channels without affecting luma. Noise in the color channels manifests as random per-pixel color shifts (colored speckling on what should be a uniform surface). CNR suppresses this while preserving luminance detail. Runs in every mode.
TDNR (Temporal Denoiser, Power Domain 18) is the temporal/spatial denoiser, the main video noise reduction block in standard modes. Unlike CNR, which works within a single frame, TDNR compares consecutive frames. Areas of the frame that are stationary between frames (where pixel values should be identical) are averaged over time to suppress random noise; moving areas fall back to spatial-only processing to avoid ghosting artifacts on the subject. This block is mode-dependent:
| Mode | TDNR | Reason |
|---|---|---|
| 4K video (all variants, including VAI) | ON | DDR bandwidth available |
| 8K continuous video | OFF | DDR bandwidth constraint: WAVE6 alone saturates available bus |
| Star Trail 8K (1 fps stills) | ON | Low frame rate gives plenty of bandwidth |
| Preview / viewfinder | OFF | - |
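The static-versus-moving behavior of a temporal denoiser can be sketched in a few lines of Python. This is a generic textbook blend, not the Milbeaut TDNR algorithm: the threshold, the blend weight, and the pass-through for moving pixels (where the real block applies spatial filtering) are all my assumptions.

```python
def temporal_denoise(prev, curr, motion_threshold=12, blend=0.5):
    """Blend each pixel with the previous frame unless it changed too much.

    prev, curr: flat lists of pixel values for two consecutive frames.
    motion_threshold and blend are illustrative values, not firmware settings.
    """
    out = []
    for p, c in zip(prev, curr):
        if abs(c - p) <= motion_threshold:
            # Static area: average over time to suppress random noise.
            out.append(blend * p + (1 - blend) * c)
        else:
            # Moving area: keep the current pixel to avoid ghosting.
            out.append(float(c))
    return out

print(temporal_denoise([100, 100], [104, 200]))
```

The first pixel differs by only 4 and is averaged; the second jumps by 100, is treated as motion, and passes through untouched.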
Stage 2: VAI Neural AI Path
The AI pipeline activates in usecases with the _CNN_RAW_DENOISE_VAI suffix. The firmware ships with three VSNET model files: VAI_VSNET_4K30_AR16_9, VAI_VSNET_4K60_AR16_9, and VAI_VSNET_4K30_AR4_3 (4K30 16:9, 4K60 16:9, 4K30 4:3).
Sensor Configuration in VAI Mode
When VAI activates, the RTOS switches the IMX06A to sensor mode 0 (H1V1), full resolution unbinned readout. This is different from the H2V2 binning used in standard 4K. The CNN needs full resolution spatial detail; binning would pre-average the noise the network is trying to characterize and remove.
The tradeoff: H1V1 supports ISO up to 20000 and gain up to 96x. Standard 4K H2V2 caps at ISO 6400, gain 32x. The network extends usable low-light range by about 1.5 stops beyond what TDNR alone can handle cleanly.
DSP Pre-Processing
4K30 VAI Mode - Night HIGH quality 4K30 16:9
SANDBAR_VIDEO_4K_AR16_9_30_HIGH_CNN_RAW_DENOISE_VAI
The XM6 DSP performs the first transformation: it takes the 12-bit Bayer RAW from the sensor (4 channels: R, Gr, Gb, B) at 2048x1536 and converts it to 64-channel 8-bit feature tiles at 512x770. The network processes the full frame as a 4x2 grid of these tiles, where 512x4 = 2048 (full width) and 770x2 = 1540 (close to the 1536 active rows, with 4 rows of convolution padding). This is confirmed by the network name suffix logged at runtime: net_X_CS_1540_2048. The DSP is acting as a shape adapter, repacking the Bayer sensor output into the tensor format the VIPLite NPU expects.
The full input buffer is ~24 MB (2048x1536 x 4 channels x 2 bytes, stored as 16-bit aligned = 25,165,824 bytes). The NPU input tile buffer is also ~24 MB (512x770 x 64 channels x 1 byte = 25,231,360 bytes). The DSP takes approximately 8 ms to complete this transformation per frame.
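The buffer sizes and tile geometry above can be reproduced with simple arithmetic. The sketch below uses only the dimensions quoted in this section; the helper name is mine.

```python
MIB = 1024 * 1024

def buffer_bytes(width, height, channels, bytes_per_sample):
    """Size in bytes of a dense width x height x channels buffer."""
    return width * height * channels * bytes_per_sample

# 4-channel Bayer input: 12-bit samples stored in 16-bit (2-byte) words.
bayer_in = buffer_bytes(2048, 1536, 4, 2)
# NPU input: 64-channel 8-bit feature tiles.
npu_in = buffer_bytes(512, 770, 64, 1)

print(f"Bayer input: {bayer_in} bytes ({bayer_in / MIB:.2f} MiB)")
print(f"NPU input:   {npu_in} bytes ({npu_in / MIB:.2f} MiB)")

# Tile grid: 4 tiles across, 2 tiles down.
tiled_width = 512 * 4
tiled_height = 770 * 2
padding_rows = tiled_height - 1536
print(tiled_width, tiled_height, padding_rows)
```

The tiled height of 1540 matches the net_X_CS_1540_2048 suffix, with the 4 extra rows accounted for as padding.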
4K60 VAI Mode - Night HIGH quality 4K60 16:9
VAI_SANDBAR_VIDEO_4K_AR16_9_60_CNN_RAW_DENOISE_VAI
As at 4K30, the model is gain-banded: the firmware loads several networks that are identical in structure but tuned for different gain ranges, and the RTOS selects one per frame based on the current sensor gain.
The meaningful difference between the two modes is how much of the sensor each one feeds the network. At 4K30 the denoiser works on a full 2048x1536 four-channel RAW frame. At 4K60 it works on roughly half that, about 1664x936 per channel, and the network itself is lighter (32 feature channels instead of 64). The reason is simply the frame budget. At 30fps the pipeline has twice the per-frame time, so it can afford to read and process more of the sensor. At 60fps it trims the input to keep up.
The practical consequence is that 4K30 is the stronger low-light mode. It pulls more spatial detail off the sensor for the network to separate signal from noise, and it reaches a higher ISO ceiling (ISO 20000 versus 12800 at 4K60). If you are shooting dark scenes and can accept 30fps, 4K30 gives the denoiser more to work with.
VIPLite NPU Inference
The NPU runs one of four sub-networks per frame, chosen by the current sensor gain so the denoiser always matches the noise level of the exposure:
| Network | Gain Range | ISO Range | Noise Level |
|---|---|---|---|
| net_0 | 8 - 16 | 200 - 400 | Low |
| net_1 | 16 - 32 | 400 - 800 | Medium |
| net_2 | 32 - 64 | 800 - 1600 | High |
| net_3 | 64+ | 1600 and up | Max |
The gain brackets are logged directly. The ISO figures are derived from the mode's ISO 200 floor at gain 8 and are approximate, with the top band carrying up to the mode's ceiling (ISO 12800 in the 4K60 capture).
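The per-frame selection amounts to a threshold lookup. Here is a hedged Python sketch: the band edges come from the table, while the function names, the inclusive lower bound at each edge, and the error for gains below 8 are my assumptions, since the logs do not show how the firmware treats those cases.

```python
# Lower gain bound of each band, taken from the table above.
GAIN_BANDS = [(8, "net_0"), (16, "net_1"), (32, "net_2"), (64, "net_3")]

# ISO 200 floor at gain 8, per the text, gives ISO = 25 x gain (approximate).
ISO_PER_GAIN = 200 / 8

def select_network(gain):
    """Return the sub-network name for a sensor gain (hypothetical helper)."""
    if gain < GAIN_BANDS[0][0]:
        # Behaviour below gain 8 is not visible in the logs.
        raise ValueError("gain below the lowest logged band")
    chosen = GAIN_BANDS[0][1]
    for lower_bound, name in GAIN_BANDS:
        if gain >= lower_bound:   # ASSUMPTION: band edges are lower-inclusive
            chosen = name
    return chosen

def approx_iso(gain):
    """Approximate ISO for a gain value, derived from the ISO 200 floor."""
    return gain * ISO_PER_GAIN

for gain in (8, 20, 48, 96):
    print(gain, select_network(gain), approx_iso(gain))
```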
The four sub-networks are gain-banded variants packed into a single VSNET model file, loaded from eMMC when recording starts. The aspect-ratio suffix in the file name (AR16_9, AR4_3) selects which model loads for the mode. The CNN's input geometry is set per mode and does not always match the recording aspect.
The network is non-recurrent. It holds no state between frames and denoises each frame on its own, using only the learned noise model for the current gain bracket. This is the opposite of TDNR, which works by comparing consecutive frames. Both run together in low light: VAI removes spatial noise in the Bayer RAW domain before the ISP, and TDNR cleans residual temporal noise in the processed YUV downstream.
DSP Post-Processing
The NPU outputs 32-channel 16-bit feature tiles (512x770x32x2 = 25,231,360 bytes, confirmed in the DSP payload log). The channel count halves from the 64-channel 8-bit input, with precision doubling to 16-bit to preserve the output fidelity of the denoising operation. The DSP takes these and converts back to a 4-channel 14-bit Bayer image at 2048x1536, confirmed by a prec=14 buffer of 25,165,824 bytes in the session log.
The output is 14-bit Bayer, two bits wider than the 12-bit input. This is not adding information; it is expanding precision to preserve the subtlety of the denoising operation. The denoised 14-bit Bayer replaces the raw SRO output as the ISP input and enters the full ISP pipeline from B2B1 onward, going through B2R1, HDR, LTM, R2Y, CNR, and TDNR exactly as normal.
One detail worth noting: after denoising, the DSP intentionally re-injects a small amount of synthetic noise according to a gain-dependent LUT (gain 8 add 80, gain 16 add 85, gain 32 add 90, gain 64 add 90). The alpha_ratio: 80 field logged per frame at gain=8 confirms this directly. This prevents the over-smooth "plastic" look that pure neural denoising tends to produce; the camera adds back just enough texture to look photorealistic.
Timing Budget
At 30fps each frame has a 33.3 ms budget. The three pipeline stages sum to 47 ms on paper (DSP pre-process ~8 ms, NPU inference ~25 ms, DSP post-process ~14 ms) which appears to exceed the budget. What makes 30fps sustainable is double-buffered execution with overlapping stages. While the NPU is running inference on frame N (25 ms), the DSP is simultaneously pre-processing frame N+1 in a separate memory buffer. The DSP post-process for frame N runs in parallel with the next frame's pre-process. The critical path per frame is therefore DSP-pre + NPU = 8 + 25 = 33 ms, which fits inside the 33.3 ms window.
| Stage | Time | Overlap |
|---|---|---|
| DSP pre-process (Bayer to tiles) | ~8 ms | Runs concurrently with previous frame's NPU inference |
| VIPLite NPU inference | ~25 ms | Dominant stage; defines the critical path |
| DSP post-process (tiles to Bayer) | ~14 ms | Runs concurrently with next frame's DSP pre-process |
| Critical path (pre + NPU) | ~33 ms | Fits within the 33.3 ms frame window |
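The budget arithmetic in the table can be checked in a few lines of Python. The stage times are the approximate figures quoted above; the variable names are mine, and the "critical path" is taken as defined in this article (pre-process plus inference), not derived independently.

```python
# Approximate per-frame stage times from the table above, in milliseconds.
STAGE_MS = {"dsp_pre": 8.0, "npu": 25.0, "dsp_post": 14.0}

def frame_budget_ms(fps):
    """Time available per frame at a given frame rate."""
    return 1000.0 / fps

budget_ms = frame_budget_ms(30)
serial_ms = sum(STAGE_MS.values())                     # no overlap at all
critical_ms = STAGE_MS["dsp_pre"] + STAGE_MS["npu"]    # with overlapped stages

print(f"budget   {budget_ms:.1f} ms")
print(f"serial   {serial_ms:.1f} ms, fits: {serial_ms <= budget_ms}")
print(f"critical {critical_ms:.1f} ms, fits: {critical_ms <= budget_ms}")
```

Run back to back, the stages overshoot the frame window by about 14 ms; with the overlap described above, the critical path fits with roughly 0.3 ms to spare.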
Stage 3: EIS, Stabilization, and Codec
After the ISP (and VAI if active), the Distortion Correction Engine (DCE), a dedicated hardware block inside the IPU, applies Hypersmooth electronic image stabilization. Understanding what DCE actually does requires unpacking two separate problems it solves simultaneously.
The rolling shutter problem. Camera sensors do not capture all rows of a frame at the same instant. The IMX06A reads out rows sequentially from top to bottom; at 4K30 resolution with 3072 active lines at 2.57 µs per line, the full frame takes 7,889 µs (nearly 8 ms) to read out. If the camera moves during that window (which it always does to some degree) each row was captured from a slightly different camera position. On fast lateral motion this produces the characteristic horizontal skew known as the "jello" effect. DCE corrects this by applying a per-row geometric warp using IMU timestamps aligned to the readout timing, bringing every row back to a common reference epoch. The time offset of 3,944 µs (half the readout duration) is the reference against which per-row corrections are computed.
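To make the per-row timing concrete, here is a small Python sketch. The readout duration, line count, and reference offset are the figures quoted above; the small-angle shift formula, the focal length in pixels, and the yaw rate are illustrative assumptions of mine and are not taken from the firmware.

```python
ACTIVE_LINES = 3072
READOUT_US = 7889.0                         # full-frame readout time
LINE_TIME_US = READOUT_US / ACTIVE_LINES    # about 2.57 microseconds per row
REFERENCE_US = 3944.0                       # mid-frame reference offset

def row_time_offset_us(row):
    """Capture time of a row relative to the reference (negative = earlier)."""
    return row * LINE_TIME_US - REFERENCE_US

def row_shift_px(row, yaw_rate_rad_s, focal_px):
    """Approximate horizontal shift of a row for a constant yaw rate.

    Small-angle model: shift = focal length (px) x angle rotated since the
    reference time. focal_px is a HYPOTHETICAL value, not a calibrated one.
    """
    return focal_px * yaw_rate_rad_s * row_time_offset_us(row) * 1e-6

for row in (0, 1536, 3071):
    print(row, round(row_time_offset_us(row), 1),
          round(row_shift_px(row, yaw_rate_rad_s=1.0, focal_px=2000.0), 2))
```

The top and bottom rows sit about 4 ms either side of the reference, so a steady pan shifts them in opposite directions; that opposing shift is the skew the per-row warp removes.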
The stabilization problem. Even with rolling shutter corrected, the raw footage still reflects every hand shake and mount vibration. Basic electronic image stabilization works by using only a central portion of the sensor output and shifting the crop window opposite to camera motion: if the camera tilts up, the crop moves down, keeping the horizon level. GoPro's implementation goes considerably further.
DCE performs a per-frame mesh warp across a grid of control points rather than shifting a rectangular crop window. In the 4K30 session log this grid is 18x17 points for the primary recording stream. This dense displacement grid allows DCE to simultaneously correct rolling shutter skew, apply gyro-based motion compensation, account for the lens's barrel distortion, and handle the aspect ratio conversion from the sensor's 4:3 output to the 16:9 recording format, all in a single hardware pass. Because the warp is computed from a gyro trajectory rather than from a pixel comparison motion estimate, it can correct rotational motion (the most common type of camera shake) accurately without the smearing artifacts that optical-flow-based approaches produce on fast motion.
Three independent streams are warped simultaneously from the same per-frame gyro solution. The values below are from the 4K30 16:9 log session; input dimensions and grid sizes scale with the active recording mode:
| Stream | Input (4K30) | Output (4K30) | Grid (4K30) | Notes |
|---|---|---|---|---|
| Primary (recording) | 4096 x 3072 | 3840 x 2160 | 18 x 17 | Optical center (2048, 1536); scale ~0.87 x 0.85; Quality constraint |
| Secondary (encode) | 1920 x 1440 | 1920 x 1080 | 14 x 13 | Optical center (960, 720) |
| Live preview (LCD) | 1280 x 960 | 848 x 480 | 18 x 17 | Fast constraint mode (lower latency than Quality) |
The live preview path uses a Fast solver constraint rather than the Quality constraint applied to the recorded stream, allowing the viewfinder to update with lower latency at the cost of some smoothness fidelity.
Lookahead buffering. What separates Hypersmooth from basic EIS is not the warp itself but the planning horizon. The stabilization algorithm (gpeis / Hypersmooth::HypersmoothAdapter) maintains a delayed output path that buffers frames ahead of the recording position. By planning a smoothed camera trajectory across many future frames, rather than reacting to each shake as it arrives, the algorithm can anticipate motion and begin compensating before a disturbance reaches the output. Two parallel output paths handle recording and viewfinder independently:
useDelayedPath = 1 (recording)
useLivePath = 1 (viewfinder / preview)
The lookahead depth scales with the stabilization tier:
| Tier | fps | Lookahead frames | Lookahead time |
|---|---|---|---|
| AutoBoost | 29.98 | 125 | ~4.2 s |
| On (4K60) | 59.97 | 65 | ~1.1 s |
| On (4K30) | 59.96 | 35 | ~0.6 s |
| Lite | 29.98 | 15 | ~0.5 s |
| On (Super SloMo ~240fps) | 239.86 | 248 | ~1.0 s |
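The lookahead time column is simply frames divided by frame rate. This Python snippet recomputes it from the logged values in the table; the rows are copied as logged and the helper name is mine.

```python
# (tier, fps, lookahead frames) copied from the table above.
LOOKAHEAD = [
    ("AutoBoost", 29.98, 125),
    ("On (4K60)", 59.97, 65),
    ("On (4K30)", 59.96, 35),
    ("Lite", 29.98, 15),
    ("On (Super SloMo ~240fps)", 239.86, 248),
]

def lookahead_seconds(frames, fps):
    """Seconds of future context covered by a frame buffer."""
    return frames / fps

seconds = {tier: lookahead_seconds(frames, fps) for tier, fps, frames in LOOKAHEAD}
for tier, value in seconds.items():
    print(f"{tier}: {value:.2f} s")
```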
AutoBoost gets the largest window (over four seconds of future context at 4K30) because it activates in the VAI Night path where motion tends to be slow and sustained, benefiting most from long-horizon trajectory planning. Lite mode has no live buffer and the shortest delayed window, trading smoothness quality for lower latency.
IMU and sensor fusion. Stabilization relies on an MEKF (Multiplicative Extended Kalman Filter) running in the sensor fusion module. A Kalman filter is a recursive estimator that maintains a probabilistic model of camera orientation and updates it continuously as new gyro and accelerometer data arrives, optimally weighting noisy sensor readings against a motion prediction model. The Multiplicative variant handles rotational state correctly on its non-Euclidean manifold, avoiding the gimbal lock problems of simpler Euler-angle approaches. The IMU-to-lens transform is calibrated at (0, 0, 0): no angular offset between the gyro and the optical axis in the Mission 1 PRO.
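The "multiplicative" part is easiest to see in the propagation step, where orientation is updated by multiplying quaternions rather than adding angles. The Python sketch below shows only that step under textbook assumptions (body-frame gyro rates, Hamilton convention, w-first ordering); it is not GoPro's filter and omits the covariance and measurement update that make it a Kalman filter.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def delta_quat(omega, dt):
    """Rotation quaternion for gyro rate omega (rad/s) applied over dt seconds."""
    wx, wy, wz = omega
    norm = math.sqrt(wx * wx + wy * wy + wz * wz)
    if norm < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    theta = norm * dt
    k = math.sin(theta / 2) / norm
    return (math.cos(theta / 2), wx * k, wy * k, wz * k)

def propagate(q, omega, dt):
    """Multiplicative update: compose orientation with the small rotation."""
    return quat_mul(q, delta_quat(omega, dt))

# Yaw at 90 degrees per second for one second, integrated in 100 steps.
q = (1.0, 0.0, 0.0, 0.0)
for _ in range(100):
    q = propagate(q, (0.0, 0.0, math.pi / 2), 0.01)
print(q)
```

Because each step is a rotation composed with a rotation, the result stays on the unit sphere and never passes through an Euler-angle singularity.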
SmartAe integration. EIS exports per-frame motion blur magnitude back to the SmartAe autoexposure algorithm via Eis::getMotionBlur. Normally autoexposure has no knowledge of camera motion speed; this coupling means that when the camera is moving rapidly, the AE system receives a direct signal to bias toward shorter exposure times to reduce subject blur, independent of the stabilization state.
Stabilization modes. The gpeis algorithm operates in two named modes (Sport and Standard) selected automatically by the active Image Tuning preset. Standard mode is used for the Underwater, Face, and Balanced presets, and is also forced by the VAI Night AutoBoost path regardless of what preset the user has selected. The difference between modes reflects expected camera motion patterns: Sport is tuned for aggressive, fast direction changes; Standard is tuned for slower, more sustained motion.
The Underwater preset additionally loads a distinct lens distortion profile that accounts for the refractive index of water (n = 1.33 vs air n = 1.0) at the port of a flooded housing. Refraction at the housing port changes the effective focal length and distortion geometry of the lens. Without correction, the warp grid would be applying air-calibrated geometry to an optically different system. The underwater profile bakes the correction for this directly into the distortion model so Hypersmooth remains geometrically accurate when shooting submerged.
Bypass mode (EIS off) still corrects for rolling shutter. When the user disables stabilization, the DCE enters a bypass path but still adds extra pixel margin for rolling shutter correction (Adding 5.90 pix of extra margin for ERS). The smoothing solver is not invoked, but per-row readout geometry compensation still runs. Rolling shutter correction cannot be disabled independently of EIS.
DCE as dedicated IPU hardware. The gpeis library runs on the ARM CPU to compute per-frame warp grid parameters from gyro data; the IPU DCE hardware performs the actual pixel interpolation at full resolution. DCE operates as a dedicated AXI bus master, confirmed in the DDR QoS log. This split keeps the ARM workload to geometry computation while offloading the bandwidth-heavy warp operation to dedicated silicon.
Codec. The warped output goes directly to the WAVE6 H.265 encoder (Power Domain 2 and 3). WAVE6 is a Chips&Media multi-codec IP block integrated into the Socionext Milbeaut SoC, not a separate chip. The IP supports AV1, HEVC, and AVC encoding at the hardware level; GoPro's firmware configures H.265 for both the main recording (MRV) and proxy (LRV) streams.
What Actually Runs on the NPU vs ARM CPU
A common misconception is that GoPro's AI features (scene recognition, subject tracking) run on the NPU. They do not. Only the RAW denoiser uses the NPU.
| Feature | Runs on |
|---|---|
| Low-light RAW denoising (Night VAI) | VIPLite NPU at 720 MHz |
| Scene classification (SceneCl) | ARM CPU (SC_FOREST decision forest, 3-13 Hz) |
| Subject detection (tracking) | ARM NEON (FotoNation LibODPro-TPK-3.0.10.101) |
| Still image denoising (GDNet) | VIPLite NPU (present in firmware, not observed active) |
| Temporal video denoising (TDNR) | Fixed-function Milbeaut ISP silicon |
| Chroma noise reduction (CNR) | Fixed-function Milbeaut ISP silicon |
| Video encode | Chips&Media WAVE6 dedicated silicon |
The NPU does one job in video recording: process the Bayer frame in the RAW domain at night. Everything else is either dedicated fixed-function hardware or ARM CPU code.
Motion Blur and SmartAe Modes
The Motion Blur setting in the Mission 1 PRO UI directly selects the AE (autoexposure) algorithm mode inside SensorController's SmartAe engine. Autoexposure continuously adjusts shutter speed and ISO gain to hit a target brightness; the mode determines how aggressively it prioritizes shorter shutter times to freeze motion versus longer shutter times to produce cinematic blur. This is a purely ARM CPU decision, with no NPU or DSP involvement.
| MOBL | UI name | Internal AE mode | AE mode string | CinematicStrength |
|---|---|---|---|---|
| 0 | Max Blur Reduction | 6 | AutoJitterReductionOff | N/A |
| 1 | Adaptive | 0 | Auto | N/A |
| 2 | Cinematic Low | 3 | Cinematic | Low |
| 3 | Cinematic Medium | 3 | Cinematic | Medium |
| 4 | Cinematic High | 3 | Cinematic | High |
Max Blur Reduction (MOBL=0 / AutoJitterReductionOff) disables SmartAe's jitter reduction, the hysteresis normally applied to shutter decisions to prevent inter-frame flicker. Without it, AE reacts aggressively to minimize exposure time, driving toward the fastest shutter speed the light level allows. ISO climbs as needed. The name "AutoJitterReductionOff" refers to this smoothing layer being removed from the AE control loop, not EIS jitter.
Adaptive (MOBL=1 / Auto) is the standard SmartAe auto mode, the default for all recorded sessions in existing logs (Sport image tuning preset). Shutter and ISO adapt together with normal hysteresis active.
Cinematic variants (MOBL=2-4 / Cinematic) all use the same Cinematic AE mode (aeMode=3), differentiated only by CinematicStrength (Low/Medium/High). The Cinematic mode targets a specific amount of motion blur rather than minimizing it; it biases exposure time toward longer values to achieve the look. Cinematic High aims for roughly 180-degree shutter rule behavior at the set framerate. ISO and shutter range (PSRI=5) are unchanged across all five modes.
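The mapping above is small enough to write down as a lookup, and the 180-degree rule is simple arithmetic: expose for half of each frame interval. The sketch below uses the values from the table; the `AeSelection` container and the `shutter_for_angle` helper are my own illustrative names, not identifiers from the firmware.

```python
from typing import NamedTuple, Optional


class AeSelection(NamedTuple):
    ui_name: str
    ae_mode: int
    ae_mode_string: str
    cinematic_strength: Optional[str]  # None where the table says N/A


# Values copied from the table; the structure itself is hypothetical.
MOBL_TO_AE = {
    0: AeSelection("Max Blur Reduction", 6, "AutoJitterReductionOff", None),
    1: AeSelection("Adaptive", 0, "Auto", None),
    2: AeSelection("Cinematic Low", 3, "Cinematic", "Low"),
    3: AeSelection("Cinematic Medium", 3, "Cinematic", "Medium"),
    4: AeSelection("Cinematic High", 3, "Cinematic", "High"),
}


def shutter_for_angle(fps: float, angle_deg: float = 180.0) -> float:
    """Exposure time in seconds for a shutter angle at a given framerate."""
    if fps <= 0 or not 0 < angle_deg <= 360:
        raise ValueError("fps must be positive and angle within (0, 360]")
    return (angle_deg / 360.0) / fps


# 180 degrees at 30 fps is 1/60 s; at 60 fps it is 1/120 s.
t30 = shutter_for_angle(30)
t60 = shutter_for_angle(60)
```

This only shows the target a 180-degree rule implies. How closely Cinematic High actually tracks it in a given light level is up to the AE loop.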
Ambient Light Sensor and Flicker Detection
The AMS TCS3530 ambient light and color sensor deserves a mention because it feeds directly into the image pipeline. The RTOS runs two ISG (Image Signal Generator) modules continuously:
- isg::als::ambient_light: reads TCS3530 lux and color temperature, outputs a lux gain value used by the IQ (Image Quality) pipeline to adjust tone curves and noise reduction thresholds
- isg::als::flicker_detection: detects 50 Hz or 60 Hz mains frequency from the TCS3530's high-speed sampling, adjusts sensor integration time to avoid banding under fluorescent or LED lighting
This is the same class of functionality you find in flagship smartphones. The camera is continuously measuring the ambient light environment and tuning its processing accordingly.
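For readers wondering how adjusting integration time avoids banding: mains-powered lights flicker at twice the mains frequency (100 Hz on 50 Hz mains, 120 Hz on 60 Hz), so an exposure that spans a whole number of flicker periods collects the same light on every row. The sketch below shows that general anti-banding technique. I have not confirmed that the firmware does it exactly this way, and the function names are mine.

```python
import math


def flicker_period_s(mains_hz: int) -> float:
    """Light intensity period in seconds: twice the mains frequency."""
    if mains_hz not in (50, 60):
        raise ValueError("expected 50 or 60 Hz mains")
    return 1.0 / (2 * mains_hz)


def snap_exposure(requested_s: float, mains_hz: int) -> float:
    """Round an exposure down to a whole number of flicker periods.

    Exposures shorter than one period are returned unchanged, since no
    whole multiple fits; banding cannot be avoided this way at those speeds.
    """
    period = flicker_period_s(mains_hz)
    n = math.floor(requested_s / period + 1e-9)  # small epsilon guards float rounding
    if n < 1:
        return requested_s
    return n * period


snapped_50 = snap_exposure(1 / 60, 50)   # one 10 ms period fits
snapped_60 = snap_exposure(1 / 30, 60)   # already four periods of 1/120 s
too_short = snap_exposure(1 / 1000, 50)  # shorter than one period, unchanged
```

Note the limitation in the last case: in bright scenes or in Max Blur Reduction, where the shutter is well below 1/100 s, snapping cannot help.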
Comment
I did my best to reconstruct the GP3/Mission 1 image processing pipeline. I am only human, so there may be small errors in this writing and in my understanding, but I am fairly confident in what I have written here. I continuously update my articles, so check back occasionally. I have not yet dug deep into the burst slo-mo modes, and I feel an overview table is needed to truly understand the processing for each combination of modes.
Summary
The Mission 1 PRO's image pipeline has five distinct processing stages between sensor and compressed file:
- Sensor readout + SRO: IMX06A via C-PHY; SRO2-3 stitches dual half-width sensor windows to a full resolution frame (window dimensions scale with readout mode); H1V1 full res for 8K and VAI night, H2V2 DCG-HDR for standard 4K, H4V4 for slow-motion and preview
- VAI RAW denoising (night mode only, postSRO): XM6 DSP tiles the 12-bit Bayer data, VIPLite NPU denoises at 720 MHz, XM6 reconstructs 14-bit Bayer; replaces the raw SRO output before the ISP
- Fixed-function ISP: B2B1 > B2R1 > HDR > LTM > R2Y > CNR > TDNR (always active; input is raw SRO output in standard modes or CNN-denoised 14-bit Bayer in VAI mode; TDNR mode-dependent)
- IPU / DCE: per-frame mesh warp for EIS, simultaneous ERS rolling shutter correction, aspect ratio crop, dual delayed/live output paths
- WAVE6 encode: H.265 compression
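As a compact way to read the list above, here is a small sketch of which stages are active and what feeds the ISP, depending on whether VAI night mode is on. The stage names come from the list; the functions and the boolean flag are purely illustrative.

```python
from typing import List


def pipeline_stages(vai_night: bool) -> List[str]:
    """Ordered stages between sensor and file; VAI runs only in night mode."""
    stages = ["Sensor readout + SRO"]
    if vai_night:
        stages.append("VAI RAW denoising")
    stages += ["Fixed-function ISP", "IPU / DCE", "WAVE6 encode"]
    return stages


def isp_input(vai_night: bool) -> str:
    """What the fixed-function ISP receives as its input."""
    return "CNN-denoised 14-bit Bayer" if vai_night else "raw SRO output"


standard = pipeline_stages(vai_night=False)
night = pipeline_stages(vai_night=True)
```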
The NPU is real, it activates, and it does make a measurable difference in night video. But it runs in exactly one mode. The majority of GoPro's AI-branded features (subject tracking, scene recognition) are ARM CPU algorithms that were already present in GP2 and have not changed. The marketing claim of "dedicated AI cores for scene recognition" is misleading: those features do not touch the NPU.
What did genuinely change from GP2: the denoising architecture moved from a CEVA NeuPro inside the XM6 to a standalone VIPLite block with considerably more compute; the sensor moved to a larger-format IMX06A with a dual-window SRO readout path that did not exist in GP2; and the ISP gained fine-grained per-block power domains replacing the two coarse voltage rails of GP2.
This took way too long to research so thank you for reading,
-Trunk