Skip to content

v3.2 Arrange encoding information more clearly #4562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 7, 2025

Conversation

handrews
Copy link
Member

@handrews handrews commented May 1, 2025

[NOTE: The empty "Media Type Registry" header is there to avoid build errors- it will be a minor conflict to resolve when PR #4554 is merged] [NOTE 2: No longer needed as the registry part has been removed from this PR.]

TL;DR: Supporting multipart/mixed (streaming or otherwise) will require a different way to map Encoding Objects to data structures, so this splits the mapping definition (controlled by the Media Type Object's encoding field) from the logic of the Encoding Object itself. Previously the mapping details were in the Encoding Object description, despite the Encoding Object only applying to one value at a time.


As dicussed in today's TDC call, we have increasing (and modern) use cases for supporting multipart/mixed (which we previously claimed to support but never did). This refactor makes possible future support easier by moving the array special case, which is governed by the multipart/form-data RFC, out of the Encoding Object (which needs to work with other multipart formats) and placing it with the Media Type Object's encoding field (which is web form-format-specific).

This PR refactors encoding guidance to put the rules for mapping Encoding Objects to valules with the encoding field (which performs the mapping) rather than having most of it in the Encoding Object (which should focus on how to apply a single Encoding Object to a single value).

This notably takes the special handling of arrays as repeated values out of the Encoding Object section (and the default contentType field value table) and moves it to the Media Type Object. The Encoding Object behavior is now consistent for all types, while the mapping done by the encoding field handles the special case.

The only change (as opposed to re-organization and re-wording) in this PR is the addition of a default contentType of application/json for array values, which in the context of the existing behavior is only relevant for array values nested under a top-level array. Past OAS versions were silent on this topic, and presumably it just does not come up much, but it was a gap we should fill.

  • schema changes are included in this pull request
  • schema changes are needed for this pull request but not done yet
  • no schema changes are needed for this pull request

Refactor this to put the rules for mapping Encoding Objects
to valules with the `encoding` field (which performs the mapping)
rather than having most of it in the Encoding Object (which should
focus on how to apply a single Encoding Object to a single value).

This notably takes the special handling of arrays as repeated
values out of the Encoding Object section (and the default
`contentType` field value table) and moves it to the Media Type
Object.  The Encoding Object behavior is now consistent for all
types, while the _mapping_ done by the `encoding` field handles
the special case.

The only change (as opposed to re-organization and re-wording)
in this PR is the addition of a default `contentType` of
`application/json` for array values, which in the context of
the existing behavior is only relevant for array values nested
under a top-level array.  Past OAS versions were silent on this
topic, and presumably it just does not come up much, but it was
a gap we should fill.

As dicussed in today's TDC call, we have increasing (and modern)
use cases for supporting `multipart/mixed` (which we previously
claimed to support but never did).  This refactor makes possible
future support easier by moving the array special case, which
is governed by the `multipart/form-data` RFC, out of the Encoding
Object (which needs to work with other `multipart` formats) and
places it with the `encoding` field (which is web form-format-specific).
@handrews handrews added the media and encoding Issues regarding media type support and how to encode data (outside of query/path params) label May 1, 2025
@handrews handrews added this to the v3.2.0 milestone May 1, 2025
@handrews handrews requested review from a team as code owners May 1, 2025 20:56
@handrews handrews changed the title Arrange encoding information more clearly v3.2 Arrange encoding information more clearly May 1, 2025
@handrews
Copy link
Member Author

handrews commented May 1, 2025

Note that a lot of this could be backported to 3.1.2, although it's not clear to me that doing so would be the ideal move.

Copy link
Contributor

@mikekistler mikekistler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly looks right, but I don't feel confident in my own understanding of the details to "approve". But I did suggest a few minor rewordings and asked for clarification on one curious point.

src/oas.md Outdated
@@ -1615,10 +1617,33 @@ See [Working With Examples](#working-with-examples) for further guidance regardi
| <a name="media-type-schema"></a>schema | [Schema Object](#schema-object) | The schema defining the content of the request, response, parameter, or header. |
| <a name="media-type-example"></a>example | Any | Example of the media type; see [Working With Examples](#working-with-examples). |
| <a name="media-type-examples"></a>examples | Map[ `string`, [Example Object](#example-object) \| [Reference Object](#reference-object)] | Examples of the media type; see [Working With Examples](#working-with-examples). |
| <a name="media-type-encoding"></a>encoding | Map[`string`, [Encoding Object](#encoding-object)] | A map between a property name and its encoding information. The key, being the property name, MUST exist in the schema as a property. The `encoding` field SHALL only apply when the media type is `multipart` or `application/x-www-form-urlencoded`. If no Encoding Object is provided for a property, the behavior is determined by the default values documented for the Encoding Object. |
| <a name="media-type-encoding"></a>encoding | Map[`string`, [Encoding Object](#encoding-object)] | A map between a property name and its encoding information for media types supporting name-value pairs and allowing duplicate names, as defined under [Encoding Usage and Restrictions](#encoding-usage-and-restrictions). |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear why it is necessary for the media type to allow duplicate names. What breaks if the media type does not allow duplicate names?

Copy link
Member Author

@handrews handrews May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikekistler If you try to use an array value with a media type that does not support duplicate names, you will get duplicate names anyway, and that usually isn't good.

The only media types for which the encoding field (and indeed the Encoding Object as a whole) has well-defined behavior at all are application/x-www-form-urlencoded and multipart/form-data, both of which support duplicate names (which is how you are required to specify multiple file uploads). So this does not add any new restriction, it just documents the implicit restriction that already exists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikekistler I just pushed a commit noting the history of this field in implementing web forms, which (to me, at least) explains why the field has such a specific usage pattern. Please let me know if that helps! I agree that otherwise it seems quite random.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just unclear why the language changed from

The encoding field SHALL only apply when the media type is multipart or application/x-www-form-urlencoded.

to

for media types supporting name-value pairs and allowing duplicate names

If encoding only has meaning for application/x-www-form-urlencoded and multipart/form-data, then why not just say that rather than make a generalization that a) may arbitrarily exclude some valid use cases, and b) expands the specification in a way that is not actually well defined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikekistler two reasons


There's a difference between "well-defined behavior" and just "defined behavior."

  1. We have always said this works for multipart, not just multipart/form-data
  2. In 3.0.4 and 3.1.1, to deal with the surprising lack of support for multipart/mixed (despite us having an example in the spec that claimed to work), we noted that the form-data value for Content-Disposition is not restricted to multipart/form-data, so a workaround would be to use it (and its name parameter) for multipart/mixed support (and therefore all multipart support, as the RFCs state that unknown multipart forms are to be treated as multipart/mixed).

So application/x-www-form-urlencoded and multipart/form-data are the only media types where everything that is needed is covered in the relevant RFCs. But we've always claimed general multipart support. So we can't just lock it down to multipart/form-data.


We now have a registry of media types. There may be future media types that can work with this system, therefore it would be a really bad idea to enumerate the media types in the spec. The proper thing to do, as this PR does, is defer that to the registry.


Also, why do you want to restrict this? That would be a change, and I can't figure out a motivation other than that you just don't like the wording. The wording doesn't change how this works, it just makes it more clear how it always worked, and that's what you're ultimately objecting to AFAICT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why do you want to restrict this? That would be a change

What triggered this whole discussion was a change in the language. If you want to argue against change, then I don't see how that is consistent with arguing to broaden the spec in the way the new language does (at least that's how I read it). Put another way, can you explain why removing the restriction is appropriate? (I know this is not really a restriction, since it is a "SHALL" and not "MUST", but whatever the term is for a statement of this type).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikekistler OK, I see there are two distinct things here:

  1. Clarifying the previously muddled langauge around support for multipart (which was never supported the way the old language implied)
  2. Moving the support details out to the registry, which requires generalizing the statement of support in a way that does not, in practice, change anything, but could be used to add support of more media types in the future

The first one is more important to me, so let's limit this PR to that and then I will submit a follow-on PR focusing on how to integrate the media type registry. That will also avoid the conflict with the other PR that adds the Media Type Registry section.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikekistler I will note, however, that the effect of the language I had here is the same, because of this line:

For all media types where no mapping is defined by either this specification or the Media Type Registry, the encoding field SHALL be ignored.

handrews and others added 4 commits May 5, 2025 16:40
Co-authored-by: Mike Kistler <mikekistler@microsoft.com>
Co-authored-by: Mike Kistler <mikekistler@microsoft.com>
The oddities of its media type support derive from its history
as the OAS implementation of web forms.
This removes the more general language allowing for future
expansion with the media type registry (although the general
language still had the same effect of restricting to `multipart`
and `application/x-www-form-urlencoded` in practice).
Copy link
Contributor

@mikekistler mikekistler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 👍

@handrews handrews merged commit 1164f03 into OAI:v3.2-dev May 7, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
media and encoding Issues regarding media type support and how to encode data (outside of query/path params)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants