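CNNs and Handwritten Shape Classification - Convolutional Neural Networks, Training, and Serving with FastAPI and Gradio

From convolution to image classification and API serving

A CNN (Convolutional Neural Network) is a deep learning model that extracts features such as edges and shapes while preserving the spatial structure of an image. This post first covers convolution, stride, padding, and pooling, along with weight sharing compared to a DNN, then trains a CNN on handwritten shape data (○, △, ×) and goes over saving and loading the model. Finally, it serves the trained model as a REST API with FastAPI and attaches a web interface with Gradio.

(Reorganized from the posts "CNN", "Classifying Handwritten Shapes", and "Serving Handwritten Shape Classification with FastAPI")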

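1. What is a CNN?

A CNN extracts features from an image with convolution layers, shrinks the feature maps with pooling layers, and predicts the final class with fully connected (FC) layers. It is widely used in image classification, object detection, and video processing.

※ Why CNNs are used for image processing

Small filters extract edges, textures, and similar patterns efficiently, and reusing the same filter at every position reduces the number of learned parameters.
Features are still picked up when the image is slightly shifted or deformed, and patterns can be learned in stages, from simple to complex.
A fully connected DNN keeps a separate weight for every pixel-to-node connection, so it is computationally heavy and does not reflect spatial structure well.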

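2. Input image

A computer handles an image as a matrix of numbers. A grayscale image stores brightness values from 0 to 255, and a color image is represented as one matrix per RGB channel. PyTorch uses the layout batch × channel × height × width, written (N, C, H, W).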
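
As a minimal sketch of the (N, C, H, W) layout, the snippet below builds a hypothetical 4×4 grayscale image as a matrix of 0~255 values and turns it into a batch of one image. The pixel values and the scaling to [0, 1] are illustrative assumptions, not taken from the dataset used later.

```python
import torch

# Hypothetical 4x4 grayscale image with brightness values in 0..255
image = torch.tensor([
    [0, 64, 128, 255],
    [0, 64, 128, 255],
    [0, 64, 128, 255],
    [0, 64, 128, 255],
], dtype=torch.uint8)

# Scale to [0, 1] floats, then add channel and batch dimensions: (H, W) -> (N, C, H, W)
batch = (image.float() / 255.0).unsqueeze(0).unsqueeze(0)

print(batch.shape)  # torch.Size([1, 1, 4, 4])
```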

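3. Convolution Layer

A convolution layer slides a filter (kernel) across the image and, at each position, computes the sum of the element-wise products between the filter and the region it covers, producing a feature map. The filter values are learned, so during training they change to detect patterns such as edges and textures.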

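3-1. Convolution operation

The pixels in the region overlapping the filter are multiplied by the corresponding filter values and summed into a single value. Example: (1×1)+(2×0)+(3×1)+(2×1)+(1×0)+(0×1)+(3×0)+(0×1)+(1×0) = 6.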
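
The arithmetic above can be checked in a few lines. This sketch assumes the nine products are read row by row from a 3×3 image patch and a 3×3 filter (the original example does not show the grids, so that arrangement is an assumption), and compares the manual sum with torch.nn.functional.conv2d.

```python
import torch
import torch.nn.functional as F

# Assumed 3x3 layout of the example values (read row by row)
patch = torch.tensor([[1., 2., 3.],
                      [2., 1., 0.],
                      [3., 0., 1.]])
kernel = torch.tensor([[1., 0., 1.],
                       [1., 0., 1.],
                       [0., 1., 0.]])

# Element-wise product, then sum
manual = (patch * kernel).sum()

# F.conv2d expects (N, C, H, W) input and (out_channels, in_channels, kH, kW) weight
via_conv = F.conv2d(patch.view(1, 1, 3, 3), kernel.view(1, 1, 3, 3))

print(manual.item(), via_conv.item())  # 6.0 6.0
```

Note that conv2d does not flip the kernel, so it matches the "multiply and sum" description directly.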

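3-2. Stride

The stride is the number of cells the filter moves at a time. With stride=1 it moves one cell at a time; with stride=2 it moves two cells at a time, so the output gets smaller and the amount of computation drops. If the stride is too large, fine detail can be lost.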

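3-3. Padding

Padding adds values (usually 0) around the edges of the input to control the output size or to reduce the loss of information at the borders. With stride 1, same padding keeps the output the same height and width as the input.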
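
How stride and padding change the output size can be summarized by the usual formula floor((n + 2p - k) / s) + 1 for one spatial dimension. The helper name conv_output_size below is hypothetical; the sketch simply compares the formula against what nn.Conv2d produces for a 28×28 input and a 3×3 kernel.

```python
import torch
import torch.nn as nn

def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel_size) // stride + 1

x = torch.rand(1, 1, 28, 28)
results = []
for stride, padding in [(1, 0), (1, 1), (2, 0), (2, 1)]:
    conv = nn.Conv2d(1, 1, kernel_size=3, stride=stride, padding=padding)
    expected = conv_output_size(28, 3, stride, padding)
    actual = conv(x).shape[-1]
    results.append((stride, padding, expected, actual))
    print(f"stride={stride}, padding={padding}: formula={expected}, Conv2d={actual}")
```

For a 3×3 kernel at stride 1, padding=1 is what same padding amounts to: the output stays 28×28.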

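3-4. Pooling

Pooling reduces the size of the feature map and keeps only the important information. Max Pooling takes the maximum value in each small region, and Average Pooling takes the mean. This makes the model less sensitive to small position changes and helps reduce computation and overfitting.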
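
A small sketch of both pooling types on a hypothetical 4×4 feature map (the values are made up for illustration). Each 2×2 window is reduced to one number, so the map shrinks to 2×2.

```python
import torch
import torch.nn as nn

# Hypothetical 4x4 feature map, shaped (N, C, H, W)
feature_map = torch.tensor([[1., 3., 2., 0.],
                            [4., 6., 1., 1.],
                            [0., 2., 5., 7.],
                            [2., 0., 3., 1.]]).view(1, 1, 4, 4)

max_out = nn.MaxPool2d(kernel_size=2)(feature_map)
avg_out = nn.AvgPool2d(kernel_size=2)(feature_map)

print(max_out.view(2, 2).tolist())  # [[6.0, 2.0], [2.0, 7.0]]
print(avg_out.view(2, 2).tolist())  # [[3.5, 1.0], [1.0, 4.0]]
```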

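4. Weights and biases in DNNs and CNNs

Aspect	DNN	CNN
Connectivity	Every input is connected to every output (fully connected)	A filter processes only a local region at a time
Weights	Different weights at every position → very many weights	Weight sharing: the same filter is applied across the whole input → few weights
Bias	A separate bias per node	One bias per filter, added to every value of that filter's output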
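
To make the difference in the table concrete, the sketch below counts the parameters of a fully connected layer and a convolution layer for a 28×28 grayscale input. The two layers are not equivalent in what they output; the layer sizes are assumptions chosen only to show how the counts scale. count_params is a hypothetical helper.

```python
import torch.nn as nn

def count_params(module):
    """Total number of learnable values (weights + biases)."""
    return sum(p.numel() for p in module.parameters())

# Fully connected: every one of the 784 pixels has its own weight to each of 32 nodes
fc = nn.Linear(28 * 28, 32)
# Convolution: 32 filters of size 3x3, each reused at every image position
conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)

print(count_params(fc))    # 25120  (784 * 32 weights + 32 biases)
print(count_params(conv))  # 320    (32 * 1 * 3 * 3 weights + 32 biases)
print(conv.bias.shape)     # torch.Size([32]): one bias per filter
```

The convolution count does not depend on the image size, while the fully connected count grows with every added pixel.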

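5. Multi-channel convolution

When the input has several channels, as in a color image (3 RGB channels), the filter also has one slice per channel. Each slice is convolved with its own channel, and the results are summed across channels into a single output value. Moving the filter according to the stride produces a one-channel feature map.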
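
The "convolve per channel, then sum across channels" description can be checked directly. The sketch below uses random values for a hypothetical 5×5 RGB image and a single filter, and compares the built-in multi-channel convolution with the channel-by-channel version.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.rand(1, 3, 5, 5)    # one RGB image, 5x5
weight = torch.rand(1, 3, 3, 3)   # one filter with a 3x3 slice per input channel

# Standard multi-channel convolution: one output channel
full = F.conv2d(image, weight)

# Convolve each channel with its own filter slice, then sum across channels
per_channel = sum(
    F.conv2d(image[:, c:c + 1], weight[:, c:c + 1]) for c in range(3)
)

print(full.shape)  # torch.Size([1, 1, 3, 3])
print(torch.allclose(full, per_channel, atol=1e-5))
```

Using several filters simply stacks several such one-channel maps, which is why the weight shape is (out_channels, in_channels, kH, kW).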

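6. Summary of CNN layer composition

Input layer: passes the image as an (N, C, H, W) tensor
Convolution layer: extracts features with small filters, using many filters
Activation (ReLU): adds nonlinearity
Pooling: reduces the spatial size
Fully connected (FC): after flatten, maps to the final number of classes
Output: per-class probabilities via Softmax or similar

The "CNN 체험하기" (hands-on CNN) page lets you check these layers visually.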

7. A simple CNN model (PyTorch)

Example: 28×28 grayscale input → Conv-ReLU-Pool twice → Flatten → Dropout-FC → 10-class output.

import torch
import torch.nn as nn

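# Input: batch × channel (1: grayscale) × height × width
inputs = torch.rand(1, 1, 28, 28)  # random dummy image; torch.Tensor(...) would leave the values uninitialized
print(inputs.shape)  # torch.Size([1, 1, 28, 28])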

conv1 = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding='same'),
    nn.ReLU()
)
out = conv1(inputs)
print(out.shape)  # (1, 32, 28, 28)

pool1 = nn.MaxPool2d(kernel_size=2)
out = pool1(out)
print(out.shape)  # (1, 32, 14, 14)

conv2 = nn.Sequential(
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding='same'),
    nn.ReLU()
)
out = conv2(out)
print(out.shape)  # (1, 64, 14, 14)

pool2 = nn.MaxPool2d(kernel_size=2)
out = pool2(out)
print(out.shape)  # (1, 64, 7, 7)

flatten = nn.Flatten()
out = flatten(out)
print(out.shape)  # (1, 3136)

fc = nn.Sequential(
    nn.Dropout(0.5),
    nn.Linear(3136, 10)
)
out = fc(out)
print(out.shape)  # (1, 10)

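8. Classifying handwritten shapes (○, △, ×)

Handwritten shape images drawn in a paint program (about 300 in total, for example) are split into train/test folders, loaded with ImageFolder, and classified into 3 classes (cir, tri, x) with a CNN. After unzipping shape.zip, place one folder per class (cir, tri, x) under train/ and test/.

8-1. Data paths and transform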

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

transform = transforms.Compose([
    transforms.Resize((28, 28)),
    transforms.Grayscale(1),
    transforms.ToTensor(),
    transforms.RandomInvert(1),   # invert colors with probability p (p=1 always inverts)
    transforms.Normalize((0.5,), (0.5,))
])

# e.g. Colab '/content/.../shape/train' / Windows 'C:\\Soomin\\Python\\train'
train_path = '/your/path/shape/train'
test_path = '/your/path/shape/test'

trainset = torchvision.datasets.ImageFolder(root=train_path, transform=transform)
testset = torchvision.datasets.ImageFolder(root=test_path, transform=transform)
print(len(trainset), len(testset))
print(trainset.classes, testset.classes)
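
What the last two transforms do numerically can be sketched with plain tensor arithmetic. This is an illustration written without torchvision, assuming pixel values already scaled to [0, 1] by ToTensor(); the three sample values are hypothetical. If the drawings are dark strokes on a light canvas, inversion turns them into light strokes on a dark background.

```python
import torch

# Hypothetical pixel values after ToTensor(): 0.0 = black, 1.0 = white
pixels = torch.tensor([0.0, 0.25, 1.0])

inverted = 1.0 - pixels               # effect of RandomInvert(1) on a float image in [0, 1]
normalized = (inverted - 0.5) / 0.5   # effect of Normalize((0.5,), (0.5,)): maps [0, 1] to [-1, 1]

print(inverted.tolist())    # [1.0, 0.75, 0.0]
print(normalized.tolist())  # [1.0, 0.5, -1.0]
```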

8-2. DataLoader and visualization

class_map = {0: 'cir', 1: 'tri', 2: 'x'}

loader = DataLoader(dataset=trainset, batch_size=64, shuffle=True)
imgs, labels = next(iter(loader))
print(imgs.shape, labels.shape)

fig, axes = plt.subplots(8, 8, figsize=(16, 16))
for ax, img, label in zip(axes.flatten(), imgs, labels):
    ax.imshow(img.reshape(28, 28), cmap='gray')
    ax.set_title(class_map[label.item()])
    ax.axis('off')

8-3. CNN model definition

Conv → ReLU twice → MaxPool → Dropout → again Conv → ReLU twice → MaxPool → Dropout → Flatten → Linear(2744, 3), since 56 × 7 × 7 = 2744.

device = 'cuda' if torch.cuda.is_available() else 'cpu'

class ConvNeuralNetwork(nn.Module):
    def __init__(self):
        super(ConvNeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.classifier = nn.Sequential(
            nn.Conv2d(1, 28, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(28, 28, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),   # (28, 14, 14)
            nn.Dropout(0.25),
            nn.Conv2d(28, 56, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(56, 56, kernel_size=3, padding='same'),  # (56, 14, 14)
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),   # (56, 7, 7)
            nn.Dropout(0.25),
        )
        self.Linear = nn.Linear(56 * 7 * 7, 3)

    def forward(self, x):
        x = self.classifier(x)
        x = self.flatten(x)
        return self.Linear(x)

model = ConvNeuralNetwork().to(device)
print(model)
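
The input size of the final Linear layer does not have to be worked out by hand. As a sketch, the feature-extractor layers from the model above are rebuilt here (Dropout is left out because it does not change shapes) and a dummy 28×28 input is passed through to read off the flattened size.

```python
import torch
import torch.nn as nn

# Same convolution/pooling stack as in ConvNeuralNetwork, without Dropout
features = nn.Sequential(
    nn.Conv2d(1, 28, kernel_size=3, padding='same'), nn.ReLU(),
    nn.Conv2d(28, 28, kernel_size=3, padding='same'), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(28, 56, kernel_size=3, padding='same'), nn.ReLU(),
    nn.Conv2d(56, 56, kernel_size=3, padding='same'), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)

# Dummy batch of one 28x28 grayscale image
with torch.no_grad():
    out = features(torch.zeros(1, 1, 28, 28))

flat_features = out.flatten(start_dim=1).shape[1]
print(out.shape)      # torch.Size([1, 56, 7, 7])
print(flat_features)  # 2744
```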

8-4. Training loop

loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

def train_loop(train_loader, model, loss_fn, optimizer):
    model.train()
    sum_losses = 0
    sum_accs = 0
    for x_batch, y_batch in train_loader:
        x_batch = x_batch.to(device)
        y_batch = y_batch.to(device)
        y_pred = model(x_batch)
        loss = loss_fn(y_pred, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        sum_losses = sum_losses + loss.item()  # .item() keeps a plain float, not the autograd graph
        y_prob = nn.Softmax(dim=1)(y_pred)
        y_pred_index = torch.argmax(y_prob, dim=1)
        acc = (y_batch == y_pred_index).float().sum() / len(y_batch) * 100
        sum_accs = sum_accs + acc.item()

    avg_loss = sum_losses / len(train_loader)
    avg_acc = sum_accs / len(train_loader)
    return avg_loss, avg_acc

epochs = 50
for i in range(epochs):
    print('-' * 50)
    avg_loss, avg_acc = train_loop(loader, model, loss_fn, optimizer)
    print(f'Epoch {i + 1:4d}/{epochs} Loss: {avg_loss:.6f} Accuracy: {avg_acc:.2f}%')
print('Done!')
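
One detail of the loop above: the reported accuracy is the mean of per-batch accuracies, which can differ from the accuracy over all samples when the last batch is smaller. The numbers below are hypothetical and only illustrate the gap.

```python
# Hypothetical per-batch results: (number of correct predictions, batch size)
batches = [(64, 64), (4, 8)]

# Mean of per-batch accuracies, as computed in train_loop
batch_mean = sum(correct / size * 100 for correct, size in batches) / len(batches)

# Accuracy over all samples
total_correct = sum(correct for correct, _ in batches)
total_samples = sum(size for _, size in batches)
sample_acc = total_correct / total_samples * 100

print(f"{batch_mean:.2f}% vs {sample_acc:.2f}%")  # 75.00% vs 94.44%
```

With about 300 images and a batch size of 64 the last batch is likely smaller than the rest, so counting correct predictions and samples across batches would give the exact figure.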

8-5. Testing and visualizing predictions

test_loader = DataLoader(dataset=testset, batch_size=32, shuffle=False)

def test(model, loader):
    model.eval()  # disable dropout for evaluation
    sum_accs = 0
    imgs, preds, trues = [], [], []

    with torch.no_grad():  # gradients are not needed at evaluation time
        for x_batch, y_batch in loader:
            x_batch = x_batch.to(device)
            y_batch = y_batch.to(device)
            y_pred = model(x_batch)
            # argmax over logits gives the same index as argmax over softmax probabilities
            y_pred_index = torch.argmax(y_pred, dim=1)
            preds.append(y_pred_index)
            trues.append(y_batch)
            imgs.append(x_batch)
            acc = (y_batch == y_pred_index).float().sum() / len(y_batch) * 100
            sum_accs += acc.item()

    avg_acc = sum_accs / len(loader)
    return torch.cat(preds), torch.cat(trues), torch.cat(imgs), avg_acc

y_pred_list, y_true_list, img_list, avg_acc = test(model, test_loader)
print(f'Test accuracy: {avg_acc:.2f}%')

fig, axes = plt.subplots(4, 8, figsize=(16, 8))
for ax, img, y_pred, y_true in zip(
    axes.flatten(),
    img_list.cpu(), y_pred_list.cpu(), y_true_list.cpu()
):
    ax.imshow(img.reshape(28, 28), cmap='gray')
    ax.set_title(f'pred: {class_map[y_pred.item()]}, true: {class_map[y_true.item()]}')
    ax.axis('off')
plt.show()

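9. Saving and loading the model

9-1. Saving only the state_dict

Only the weights and biases are saved. The model structure is not saved, so to restore the model you define the same model class and then call load_state_dict().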

torch.save(model.state_dict(), 'model_weights.pth')

# Fresh, untrained model: accuracy is expected to be roughly at chance level
model2 = ConvNeuralNetwork().to(device)
y_pred_list, y_true_list, img_list, avg_acc = test(model2, test_loader)
print(f'Test accuracy: {avg_acc:.2f}%')

# After loading the saved weights, accuracy should match the trained model
model2.load_state_dict(torch.load('model_weights.pth', map_location=device))
y_pred_list, y_true_list, img_list, avg_acc = test(model2, test_loader)
print(f'Test accuracy: {avg_acc:.2f}%')
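
A state_dict is an ordered mapping from parameter names to tensors. The sketch below uses a tiny nn.Linear as a hypothetical stand-in for the CNN and an in-memory buffer instead of a file, to show the save → new instance → load_state_dict round trip.

```python
import io
import torch
import torch.nn as nn

torch.manual_seed(0)
tiny = nn.Linear(4, 3)  # hypothetical stand-in for the trained model

print(list(tiny.state_dict().keys()))  # ['weight', 'bias']

# Save only the parameters (an in-memory buffer stands in for 'model_weights.pth')
buffer = io.BytesIO()
torch.save(tiny.state_dict(), buffer)
buffer.seek(0)

# Restoring requires building the same architecture first
restored = nn.Linear(4, 3)
restored.load_state_dict(torch.load(buffer))

x = torch.rand(2, 4)
same_output = torch.equal(tiny(x), restored(x))
print(same_output)
```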

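9-2. Saving the whole model

This pickles the entire model object, so the weights are stored together with a reference to the model class (not its source code). The class definition therefore still has to be importable when torch.load() is called, and depending on the PyTorch version the weights_only=False option may be required. Because weights_only=False allows arbitrary unpickling, use it only on files you trust.

torch.save(model, 'model.pth')

model3 = torch.load('model.pth', map_location=device, weights_only=False)
y_pred_list, y_true_list, img_list, avg_acc = test(model3, test_loader)
print(f'Test accuracy: {avg_acc:.2f}%')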

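10. Serving deep learning models with FastAPI

Model serving is the process of exposing a trained model through a REST API, gRPC, WebSocket, or similar, so that an application can send an input and receive a prediction. Preprocessing → model inference → postprocessing → response are built as a single pipeline.

10-1. FastAPI server (main.py)

The server defines the model class, loads model_weights.pth, applies the same preprocessing as in training (Resize, Grayscale, ToTensor, RandomInvert, Normalize), and at /classify accepts an uploaded image and returns the predicted label.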

# main.py (FastAPI backend)
from fastapi import FastAPI, UploadFile, File
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
import torch
from torchvision import transforms
from PIL import Image
import io
import torch.nn as nn

class ConvNeuralNetwork(nn.Module):
    def __init__(self):
        super(ConvNeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.classifier = nn.Sequential(
            nn.Conv2d(1, 28, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(28, 28, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Dropout(0.25),
            nn.Conv2d(28, 56, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(56, 56, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Dropout(0.25),
        )
        self.Linear = nn.Linear(56 * 7 * 7, 3)

    def forward(self, x):
        x = self.classifier(x)
        x = self.flatten(x)
        return self.Linear(x)

model = ConvNeuralNetwork()
state_dict = torch.load('./model_weights.pth', map_location=torch.device('cpu'))
model.load_state_dict(state_dict)
model.eval()

CLASSES = ['cir', 'tri', 'x']

def preprocess_image(image_bytes):
    transform = transforms.Compose([
        transforms.Resize((28, 28)),
        transforms.Grayscale(1),
        transforms.ToTensor(),
        transforms.RandomInvert(1),
        transforms.Normalize((0.5,), (0.5,))
    ])
    image = Image.open(io.BytesIO(image_bytes)).convert('L')
    return transform(image).unsqueeze(0)

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.post("/classify")
async def classify_image(file: UploadFile = File(...)):
    try:
        image_bytes = await file.read()
        print(f"Received file: {file.filename}, size: {len(image_bytes)} bytes")

        input_tensor = preprocess_image(image_bytes)
        print(f"input tensor shape: {input_tensor.shape}")

        with torch.no_grad():
            outputs = model(input_tensor)
            print(f"Model outputs: {outputs}")
            _, predicted = torch.max(outputs, 1)
            label = CLASSES[predicted.item()]
            print(f"Predicted label: {label}")

        return JSONResponse(content={"label": label})
    except Exception as e:
        print(f"Error: {e}")
        return JSONResponse(content={"error": str(e)}, status_code=500)

Run the server with uvicorn main:app --reload (for development). A client POSTs an image file to /classify and receives a response of the form {"label": "cir"}.
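
The postprocessing step of the pipeline (logits → response) could also return per-class probabilities. The helper logits_to_response and the "probabilities" field below are hypothetical additions for illustration, not part of the server above; the example logits are made up.

```python
import torch

CLASSES = ['cir', 'tri', 'x']

def logits_to_response(logits, classes=CLASSES):
    """Hypothetical helper: turn (1, num_classes) logits into a JSON-serializable dict."""
    probs = torch.softmax(logits, dim=1)[0]
    index = int(torch.argmax(probs))
    return {
        "label": classes[index],
        "probabilities": {name: round(p.item(), 4) for name, p in zip(classes, probs)},
    }

# Made-up logits standing in for model(input_tensor)
example = logits_to_response(torch.tensor([[2.0, 0.5, -1.0]]))
print(example["label"])  # cir
print(example["probabilities"])
```

Returning probabilities lets a client show how confident the prediction is, at the cost of a slightly larger response.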

10-2. Gradio frontend (app.py)

The app POSTs the image to the local FastAPI server (http://127.0.0.1:8000/classify) and displays the returned label on screen.

# app.py (Gradio frontend)
import gradio as gr
import requests
import io

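def classify_with_backend(image):
    if image is None:  # submitted without an uploaded image
        return "Error"
    url = "http://127.0.0.1:8000/classify"
    # Encode the PIL image as PNG bytes for the multipart upload
    image_bytes = io.BytesIO()
    image.save(image_bytes, format="PNG")
    image_bytes = image_bytes.getvalue()
    response = requests.post(
        url,
        files={"file": ("image.png", image_bytes, "image/png")},
        timeout=10,  # avoid hanging if the backend does not answer
    )
    if response.status_code == 200:
        return response.json().get("label", "Error")
    else:
        return "Error"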

iface = gr.Interface(
    fn=classify_with_backend,
    inputs=gr.Image(type="pil"),
    outputs="text",
    title="Handwritten Shape Classifier",
    description="Upload an image of ○, ×, or △."
)

if __name__ == "__main__":
    iface.launch()

Run it with python app.py. Start the FastAPI server first; then uploading an image in the Gradio app prints the predicted label.

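Wrapping up

A CNN extracts features while preserving spatial structure through convolution (filters, stride, padding) and pooling, then classifies with FC layers. Thanks to weight sharing, it has fewer parameters than a DNN.
On the handwritten shape data, ImageFolder + DataLoader + ConvNeuralNetwork were used for training, testing, and visualizing predictions.
Saving: with state_dict only, the model structure has to be kept in code; saving the whole model takes one call, but loading it still needs the class definition to be importable.
Serving: building a /classify API with FastAPI and attaching an upload-based web UI with Gradio lets you try the model locally right away.

A good next step is to extend the CNN with transfer learning (ResNet, EfficientNet, and so on) or with other datasets.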

