mysterious_model

machine-learning

pytorch

ml security

model inversion

Competition

iCTF24

Challenge Author

can

Date

Jan. 9, 2025

One day, you found a mysterious device with a camera attached. You figured out that this device takes pictures of grayscale (28x28 pixels) hand-written digits and classifies them using a convolutional neural network model into six different classes. Your task is to find out which six digits this mysterious model is supposed to classify (i.e., the digits that correspond to each of the model's output logits).

You're given four files for this challenge:

- `mysterious_model.pth`, the PyTorch model that you need to reverse engineer
- `images.npy`, a collection of 100 images of digits (10 images per digit) that you can test the model with (a 100x28x28 numpy array)
- `labels.npy`, the labels (zero to nine) for the given 100 test images (a 100-dimensional numpy array)
- `architecture.py`, a Python script containing the model's architecture definition (in PyTorch), a utility function (load_model) to load the model into memory for testing, and a utility function (display_image) to display individual images in the provided data

For this challenge, you need to create a Python environment with PyTorch (see https://pytorch.org/get-started/locally to get started) and Matplotlib (https://pypi.org/project/matplotlib/). You can easily solve this challenge only using a CPU (you won't need a GPU).

Let's say the model's six output logits correspond to digits 3,7,1,2,5,9 (e.g., the first logit is for digit 3, the second logit is for digit 7, and so on). The flag will be 'ictf{3,7,1,2,5,9}' (no quotes).

Hints

Just like a human, a model is also more confident on what it has been trained to do.

Solution

We are given a model that classifies hand-written digits, and we need to work out which digit each of its six output logits corresponds to. A logit here is one of the model's raw, unnormalized output scores: the model emits one per class, and the largest one marks the predicted class. We can identify each logit by running the model on the labeled test data and seeing which digits trigger which output. The provided helper functions let us load the model and display an image from the test data.

I used the softmax function to convert the raw scores into probabilities and argmax to pick the index of the most probable output for each image. Softmax preserves ordering, so the argmax of the probabilities is the same as the argmax of the raw logits.

```python
import numpy as np
import torch
import torch.nn.functional as F

# Run from architecture.py (see the command below), so load_model is in scope.
device = torch.device('cpu')  # the challenge does not need a GPU

model = load_model('mysterious_model.pth', device)
images = np.load('images.npy')
labels = np.load('labels.npy')

# Add a channel dimension: (100, 28, 28) -> (100, 1, 28, 28)
images = torch.tensor(images, dtype=torch.float32).unsqueeze(1)
logits = model(images.to(device))
probs = F.softmax(logits, dim=1)
predictions = torch.argmax(probs, dim=1, keepdim=False)
```

After that, I looped through the predictions and, for each predicted logit, added one to its count for the image's true digit. The four digits the model was never trained on still get assigned to one of the six outputs, but the digit a logit was trained for should dominate that logit's count.

```python
from collections import defaultdict

# logit_num_count[logit][digit] = number of images of `digit` predicted as `logit`
logit_num_count = defaultdict(lambda: defaultdict(int))
for i in range(len(predictions)):
    logit = predictions[i].item()
    number = labels[i]
    logit_num_count[logit][number] += 1
```

Now that we have the counts for each logit, we can go through our dictionary and find which digit was counted most often for each logit.

```python
logit_to_num = {}
for logit, counts in logit_num_count.items():
    largest = 0
    logit_num = 0
    # Iterate in digit order; on a tie the smaller digit is kept
    for num, count in sorted(counts.items()):
        if count > largest:
            largest = count
            logit_num = num
    logit_to_num[logit] = logit_num

for logit, num in sorted(logit_to_num.items()):
    print(f"Logit {logit}: {num}")
```

Now all we need to do is run the program:

```
[jacob]~/ctf/ictf/MysteriousModel$ python3 architecture.py
Model loaded form mysterious_model.pth
Logit 0: 2
Logit 1: 8
Logit 2: 4
Logit 3: 9
Logit 4: 1
Logit 5: 6
```

Putting it together, we get the flag:

```
ictf{2,8,4,9,1,6}
```
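The hint suggests a complementary check on the counting approach: rather than counting argmax votes, average the softmax probability each output assigns to each true digit and pick, for every output, the digit with the highest mean confidence. The following is a minimal sketch in plain Python, not the method used above. It assumes the logits and labels are passed as nested lists (for example `logits.detach().cpu().tolist()` and `labels.tolist()`); the function names and the toy data are hypothetical and not part of the challenge files.

```python
import math
from collections import defaultdict


def softmax(row):
    """Numerically stable softmax over one row of raw scores."""
    shift = max(row)
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def mean_confidence_per_digit(logit_rows, labels):
    """Return {digit: [mean softmax probability for each output index]}."""
    sums = {}
    counts = defaultdict(int)
    for row, digit in zip(logit_rows, labels):
        probs = softmax(row)
        if digit not in sums:
            sums[digit] = [0.0] * len(probs)
        sums[digit] = [s + p for s, p in zip(sums[digit], probs)]
        counts[digit] += 1
    return {d: [s / counts[d] for s in sums[d]] for d in sums}


def best_digit_per_output(mean_conf):
    """For each output index, pick the digit with the highest mean probability.

    Digits are scanned in ascending order, so a tie keeps the smaller digit.
    """
    n_outputs = len(next(iter(mean_conf.values())))
    return {
        k: max(sorted(mean_conf), key=lambda d: mean_conf[d][k])
        for k in range(n_outputs)
    }


def format_flag(mapping):
    """Join the digits in output-index order using the challenge's flag format."""
    return "ictf{" + ",".join(str(mapping[k]) for k in sorted(mapping)) + "}"


# Toy data (hypothetical, not from the challenge): 2 outputs, 3 digits.
# Digits 7 and 3 produce confident scores; digit 0 produces ambiguous ones.
toy_logits = [[5.0, 0.0], [4.0, 0.5], [0.0, 6.0], [0.2, 5.0], [1.0, 1.2], [1.1, 0.9]]
toy_labels = [7, 7, 3, 3, 0, 0]
toy_mapping = best_digit_per_output(mean_confidence_per_digit(toy_logits, toy_labels))
print(toy_mapping)               # {0: 7, 1: 3}
print(format_flag(toy_mapping))  # ictf{7,3}
```

If the mapping from mean confidence agrees with the mapping from argmax counts, that is some extra evidence the recovered digits are right; a disagreement would point at a logit worth inspecting by hand with display_image.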