[Paper Review] LEDNet: Joint Low-light Enhancement and Deblurring in the Dark

Dec 13, 2022

LEDNet: Joint Low-light Enhancement and Deblurring in the Dark

paper: https://arxiv.org/pdf/2202.03373.pdf

github: sczhou/LEDNet ([ECCV 2022] LEDNet: Joint Low-light Enhancement and Deblurring in the Dark)

Intro

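When shooting in dark environments, motion blur and low light almost always occur together. Instead of handling them as independent tasks, the paper learns them jointly as a single task to overcome the shortcomings of existing approaches.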

Most LLIE (low-light image enhancement) models focus on exposure correction plus denoising, so the over-smoothing they introduce can cause a loss of information.

In addition, running the two tasks separately and in sequence, as done conventionally (low light → deblurring, or deblurring → low light), either has no effect (c, d) or makes the blur even worse (b).

The two tasks therefore need to be unified and trained end-to-end. The obstacle is that no suitable dataset of paired low-light blurry and normal-light images exists, and such pairs are hard to capture.

To get around this, the authors build a new dataset (LOL-Blur) using a realistic image synthesis method.

To train on this dataset, the paper proposes LEDNet, a new encoder-decoder architecture made of an encoder specialized for low-light enhancement and a decoder specialized for deblurring, linked by adaptive skip connections.

Image synthesis

(I will only cover image generation briefly. Feel free to skip straight to the Network section.)

To generate one data sample, video is captured with a camera at 250 fps → denoised with VBM4D → 7 to 9 frames are taken and used.

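Exposure-Conditioned Zero-DCE (EC-Zero-DCE)
- Applies a reversed curve adjustment to Zero-DCE, so the curve darkens the image instead of brightening it.
- The fixed exposure value in the original exposure loss is replaced with a parameter that sets the target darkness; the remaining losses are used unchanged for training.
- As a result, instead of degrading the light uniformly over the whole image, Zero-DCE adjusts the exposure level pixel-wise and in a spatially varying way, which yields better-quality low-light images.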
Frame Interpolation
- To avoid discontinuous blur, a frame interpolation network raises the frame rate to 2000 fps.
Clipping Reverse for Saturated Region
- If the lightness channel $L$ > 98, the pixel is treated as saturated and a supplementary value $r$ is added to it: $s = s + r$
Frame averaging
- The 2000 fps frames are averaged down to 24 fps.
Add blur, noise
- Blur is created with a Gaussian kernel, and realistic noise is added with CycleISP.
  
  (CycleISP: https://arxiv.org/pdf/2003.07761.pdf)
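The clipping-reverse and frame-averaging steps above can be sketched in a few lines of NumPy. This is only an illustration, not the authors' pipeline: the function names are made up here, the lightness map is assumed to be computed beforehand (for example from a Lab conversion), and the camera response function is ignored.

```python
import numpy as np

def clipping_reverse(frame, lightness, r, threshold=98.0):
    """Add r to every pixel whose lightness exceeds the threshold (s = s + r).

    frame:     (H, W, 3) image
    lightness: (H, W) lightness map on a 0..100 scale (assumed precomputed)
    """
    saturated = lightness > threshold        # boolean mask of saturated pixels
    out = frame.astype(np.float64).copy()
    out[saturated] += r                      # all channels of the masked pixels
    return out

def average_frames(frames, group):
    """Average each run of `group` consecutive frames into one blurry frame."""
    frames = np.asarray(frames, dtype=np.float64)
    n = (len(frames) // group) * group       # drop an incomplete tail, if any
    grouped = frames[:n].reshape(n // group, group, *frames.shape[1:])
    return grouped.mean(axis=1)
```

The group size is left as a parameter because the post only states the source and target frame rates, not how many frames go into each average.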

Left: generated image, right: original HDR image

(Result of passing an HDR image through EC-Zero-DCE. Blur and noise have not been added.)

Network

LEDNet, Image from paper.

Encoder

The encoder is built with a focus on low-light enhancement.

It consists of three scale blocks, and each block contains a residual downsample block, a PPM (Pyramid Pooling Module), and a Curve Non-Linear Unit (Curve NLU).

PPM (Pyramid Pooling Module)

LLIE has a chronic problem with local artifacts, especially when a high-resolution image is given as input.

LEDNet mitigates this by using a PPM block to add global context to the features. The paper applies mean pooling with sizes 1, 2, 3, and 6.

In the ablation study, the variant without PPM not only fails to capture the global prior but also produces local artifacts, which confirms that the PPM block is effective.
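A minimal PyTorch sketch of a pyramid pooling module with the 1, 2, 3, 6 bin sizes mentioned above may help make the idea concrete. It follows the generic PPM design; the channel reduction, the 1x1 convolutions, and the 3x3 fusion layer are assumptions of this sketch and may differ from the LEDNet implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PPM(nn.Module):
    """Pool the feature to several coarse grids, upsample, and fuse with the input."""

    def __init__(self, channels, bins=(1, 2, 3, 6)):
        super().__init__()
        reduced = channels // len(bins)  # assumed per-branch channel count
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),                   # mean pooling to a b x b grid
                nn.Conv2d(channels, reduced, kernel_size=1),
                nn.ReLU(inplace=True),
            )
            for b in bins
        ])
        self.fuse = nn.Conv2d(channels + reduced * len(bins), channels,
                              kernel_size=3, padding=1)

    def forward(self, x):
        h, w = x.shape[2:]
        # Upsample every pooled branch back to the input resolution.
        pooled = [
            F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        return self.fuse(torch.cat([x, *pooled], dim=1))
```

The bin-size-1 branch is a global average over the whole feature map, which is where the global context comes from; the larger bins add progressively more local summaries.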

Curve Non-Linear Unit

As in Zero-DCE, LEDNet uses an iterative high-order curve function.

For the position coordinates $p$ of the features, the values are plugged into the following formula.

Here $A_{n-1}$ denotes the curve parameter of the $n$-th estimated curve.
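The formula itself is not reproduced in this post. Assuming it follows the Zero-DCE curve form with the symbols defined above, and writing $F$ for the input feature (a name introduced here), the iteration would read $C_0(p) = F(p)$ and $C_n(p) = C_{n-1}(p) + A_{n-1}(p)\,C_{n-1}(p)\,(1 - C_{n-1}(p))$ for $n \ge 1$. A small sketch under that assumption, which leaves out how the parameter maps $A$ are predicted:

```python
import torch

def curve_nlu(feat, curve_params):
    """Apply C_n = C_{n-1} + A_{n-1} * C_{n-1} * (1 - C_{n-1}) once per parameter map.

    feat:         feature tensor, assumed to lie in [0, 1]
    curve_params: sequence of tensors A_0, A_1, ... broadcastable to feat
    """
    out = feat
    for a in curve_params:
        out = out + a * out * (1.0 - out)
    return out
```

With this form a zero parameter leaves the feature unchanged, a positive one brightens values in (0, 1), and a negative one darkens them, all computed independently per pixel.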

Decoder

Because the decoder takes the features already enhanced by the LE-Encoder as input, it can concentrate more on deblurring.

FASC (Filter Adaptive Skip Connection)

Deblurring is a spatially varying problem, so dynamic spatial kernels are commonly used.

However, the output feature of each LE-Encoder block is computed pixel-wise by the Curve NLU, so it lacks spatial information.

For this reason, before the skip connection is made, the paper passes each element of the decoder output feature through a FAC (Filter Adaptive Convolutional) layer, which gives the effect of a dynamic convolution filter.

The dynamic filter of the FAC layer is produced by passing the output feature of the $n$-th LE-Encoder block through 3x3 convolutions followed by a 1x1 convolution.

The decoder output is filtered with the generated filter and then added to the encoder feature as a skip connection, which completes the Filter Adaptive Skip Connection.

Unlike an ordinary conv layer, the FAC layer is sample-specific, so it captures spatially variant information more effectively.

Code

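import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed values for illustration; the post does not state them.
ch4 = 128    # channels of the encoder feature at this scale
ks_2d = 5    # spatial size of each predicted FAC kernel

# FAC filter: predicts one ks_2d x ks_2d kernel per channel and per pixel
# (three 3x3 convs, then a 1x1 conv that expands the channel dimension)
kernel = nn.Sequential(
    nn.Conv2d(ch4, ch4, 3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(ch4, ch4, 3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(ch4, ch4, 3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(ch4, ch4 * ks_2d ** 2, 1, stride=1))

# FAC layer: applies the predicted per-pixel kernels to a feature map
class KernelConv2D(nn.Module):
    def __init__(self, ksize=5, act=True):
        super().__init__()
        self.ksize = ksize
        self.act = act

    def forward(self, feat_in, kernel):
        # feat_in: (N, C, H, W), kernel: (N, C * ksize * ksize, H, W)
        channels = feat_in.size(1)
        N, _, H, W = kernel.size()
        pad = (self.ksize - 1) // 2

        # Gather the ksize x ksize neighbourhood of every pixel:
        # (N, C, H, W, ksize, ksize) -> (N, H, W, C, ksize * ksize)
        feat_in = F.pad(feat_in, (pad, pad, pad, pad), mode="replicate")
        feat_in = feat_in.unfold(2, self.ksize, 1).unfold(3, self.ksize, 1)
        feat_in = feat_in.permute(0, 2, 3, 1, 4, 5).contiguous()
        feat_in = feat_in.reshape(N, H, W, channels, -1)

        # Weight each neighbourhood with its own kernel and sum over the window
        kernel = kernel.permute(0, 2, 3, 1).reshape(N, H, W, channels, -1)
        feat_out = torch.sum(feat_in * kernel, -1)
        feat_out = feat_out.permute(0, 3, 1, 2).contiguous()
        if self.act:
            feat_out = F.leaky_relu(feat_out, negative_slope=0.2, inplace=True)
        return feat_out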
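
As a quick way to build intuition for the FAC operation, the sketch below re-implements the same per-pixel filtering with `F.unfold` (without the activation) and checks that a kernel that is 1 at the window centre and 0 elsewhere returns the input unchanged. The helper names `fac` and `identity_kernel` and the random stand-in features are made up for this example and are not part of the LEDNet code.

```python
import torch
import torch.nn.functional as F

def fac(feat, kernel, ksize=5):
    """Per-pixel, per-channel filtering.

    feat:   (N, C, H, W)
    kernel: (N, C * ksize * ksize, H, W), one ksize x ksize kernel per channel and pixel
    """
    n, c, h, w = feat.shape
    pad = (ksize - 1) // 2
    padded = F.pad(feat, (pad, pad, pad, pad), mode="replicate")
    # F.unfold returns (N, C * ksize * ksize, H * W) with the channel index outermost.
    patches = F.unfold(padded, kernel_size=ksize).reshape(n, c, ksize * ksize, h, w)
    weights = kernel.reshape(n, c, ksize * ksize, h, w)
    return (patches * weights).sum(dim=2)

def identity_kernel(n, c, h, w, ksize=5):
    """Kernel that picks the centre of every window, so fac() returns its input."""
    k = torch.zeros(n, c, ksize * ksize, h, w)
    k[:, :, (ksize * ksize) // 2] = 1.0
    return k.reshape(n, c * ksize * ksize, h, w)

dec = torch.randn(1, 4, 8, 8)   # stand-in for a decoder feature
enc = torch.randn(1, 4, 8, 8)   # stand-in for an encoder feature
same = fac(dec, identity_kernel(1, 4, 8, 8))
fused = same + enc              # filtered decoder feature plus encoder skip
```

In the network the kernel comes from the filter predictor instead of being fixed, so every pixel of every sample gets its own filter, which is what makes the layer sample-specific.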