Closer

Closer #20

GPT-5 is out and underwhelming. Our mini six-week experiment has come to an end. There are only five more regular editions of this newsletter before we wrap up with some preliminary conclusions in edition #25.

Mitchel Lensink
Aug 20, 2025
The image shared with ChatGPT as input.

Welcome to Closer,

A weekly experiment in AI interpretation: I share a photo, ask AI to describe it, then generate an image from that description. This is edition #020. Prompts are in grey boxes. Descriptions are in italics.

Now that our unfruitful mini-experiment has come to an end, we can conclude that we shouldn't ask ChatGPT to write the prompts that are then used to describe input images, unless you want underwhelming results.

Also, yes, you read that correctly: I've now definitively decided to hit pause on this newsletter in five weeks. We've gathered sufficient sample data on the state of AI image generation in 2025. Significant advancements have to be made in the field before we pick this back up.

Please stay subscribed if you want; I won't message you in the meantime. The moment something interesting happens with the technology that has an impact on photographers, writers, and artists in general, you'll hear from me again.

A quick word from our SPONSOR, which is... you

Everything I make is self-funded — driven by curiosity, shared to add a little good to the world. Want to support that?

You can:
- purchase a print
- buy a book
- become a member

Thanks so much in advance.

LLM models used

GPT-5 for image interpretation and image generation.

The input

Ivar Blaauw, who has contributed to this newsletter before, shared with me an interesting image-generation approach he came across on social media. If you haven't unsubscribed these past six weeks over the lack of 'hot-take tech' or generally interesting results, perhaps this hyper-technical edition will do it! 😅 (plz don't though)

I recommend you watch the video linked above to get a quick understanding of the concept. If you don't wanna (which I understand), let's first just 'do the thing,' and then I'll explain what is happening here.

✍️
provide me with an advanced JSON context profile
The image shared with ChatGPT as input.

The output

{
  "scene": {
    "location_type": "European town square",
    "environment": {
      "setting": "outdoor",
      "ground": "cobblestone street",
      "vegetation": ["large trees with green foliage", "hanging flower baskets"],
      "lighting": "soft daylight, overcast"
    },
    "architecture": {
      "buildings": "historic European style, white facades",
      "outdoor_seating": true,
      "cafes_restaurants": true
    }
  },
  "subjects": {
    "foreground": {
      "main_couple": {
        "approximate_age": "elderly",
        "clothing_style": "elegant, classic",
        "man": {
          "outfit": "white shirt, brown suspenders, beige trousers",
          "accessory": "dark leather briefcase"
        },
        "woman": {
          "outfit": "white blouse with tie-back, beige trousers, white sneakers",
          "accessory": "small handbag"
        },
        "gesture": "walking together, holding hands"
      }
    },
    "midground": [
      {
        "group": "families, children, stroller",
        "activity": "casual walking, socializing"
      },
      {
        "dog": {
          "breed_type": "small companion dog",
          "action": "walking alongside owner"
        }
      }
    ],
    "background": {
      "people": "seated at outdoor café tables",
      "activity": "dining, talking, socializing"
    }
  },
  "objects": {
    "vehicle": {
      "type": "delivery truck",
      "branding": "pharma logistics",
      "license_plate_visible": true
    },
    "street_elements": [
      "lamp posts with hanging flowers",
      "no parking sign",
      "restaurant signage"
    ],
    "furniture": ["tables", "chairs", "strollers"]
  },
  "atmosphere": {
    "mood": "leisurely, social, relaxed",
    "tone": "warm, nostalgic, communal"
  },
  "possible_metadata": {
    "image_type": "documentary / street photography",
    "time_period": "contemporary",
    "season": "summer (based on light clothing)",
    "location_guess": "Western Europe (likely Netherlands or Belgium)"
  }
}

A piece of JSON code that's generated based on the input image.

What we did here is break the photo down into simple building blocks, almost like a recipe card for the scene. Instead of a flowing story, it lists:

  • the where (a town square),
  • the who (an elderly couple, families, a dog),
  • the what (a truck, flowers, café tables),
  • and the vibe (relaxed, social, summery).

This format is called JSON in the tech world, but you can think of it as a neat way to organize messy reality into clear categories. It’s a way of describing a moment so that both people and computers can make sense of it.
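
To make the recipe-card idea a bit more concrete, here is a minimal sketch in Python that reads a trimmed-down copy of the profile above and pulls out the where, the who, the what, and the vibe. The shortened `profile_json` string and the `summarize` helper are assumptions made purely for illustration; they were not part of the experiment.

```python
import json

# A trimmed-down copy of the profile above (only a few fields kept for brevity).
profile_json = """
{
  "scene": {"location_type": "European town square"},
  "subjects": {
    "foreground": {"main_couple": {"approximate_age": "elderly"}},
    "midground": [
      {"group": "families, children, stroller"},
      {"dog": {"breed_type": "small companion dog"}}
    ]
  },
  "objects": {"vehicle": {"type": "delivery truck"}},
  "atmosphere": {"mood": "leisurely, social, relaxed"}
}
"""


def summarize(profile: dict) -> dict:
    """Hypothetical helper: map the profile onto where / who / what / vibe."""
    couple = profile["subjects"]["foreground"]["main_couple"]
    return {
        "where": profile["scene"]["location_type"],
        "who": couple["approximate_age"] + " couple",
        "what": profile["objects"]["vehicle"]["type"],
        "vibe": profile["atmosphere"]["mood"],
    }


# json.loads turns the text into nested dictionaries and lists.
profile = json.loads(profile_json)
summary = summarize(profile)

for label, value in summary.items():
    print(f"{label}: {value}")
```

Because every detail sits under a named key, a program can look it up directly instead of having to search through a paragraph of prose for it.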

✍️
Generate an image with an aspect ratio of 3:2 based on this description: {the description previously generated}
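
If you wanted to assemble that prompt in code rather than by copy-pasting, it could look roughly like the sketch below. The `build_prompt` helper and the tiny stand-in `profile` are hypothetical, and the sketch stops at building the text; sending it to an image model is left out on purpose.

```python
import json

# Hypothetical stand-in for the full profile generated earlier.
profile = {
    "scene": {"location_type": "European town square"},
    "atmosphere": {"mood": "leisurely, social, relaxed"},
}

# The prompt from this edition, with a named placeholder for the description.
PROMPT_TEMPLATE = (
    "Generate an image with an aspect ratio of 3:2 "
    "based on this description: {description}"
)


def build_prompt(profile: dict) -> str:
    """Serialize the profile to JSON text and drop it into the template."""
    description = json.dumps(profile, indent=2)
    return PROMPT_TEMPLATE.format(description=description)


prompt = build_prompt(profile)
print(prompt)
```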

