Created: 2024-03-16

Implementing an Atom Feed

Contents

A Pseudo History Lesson
An Atom Feed Taxidermy
Bare-bones Implementation
Validations
Testing
Sources

Recently I've been reading around on the IndieWeb homepage about all the different open protocols that exist to make websites more interoperable with each other and it made me quite hopeful about the state of the internet that so many people are already working on making it more decentralised and generally, just an interesting place to learn about new things or express yourself.

A Pseudo History Lesson

Long story short, having a large set of heterogeneous websites makes it somewhat difficult to keep track and be notified of new content and so we need a convenient solution to deal with this problem. Luckily, the internet of ye olden days didn't have any big monopolies like today and so a mature (a.k.a. old) solution to that problem already exists: Feeds

And so in the late 90s RSS surfaced as the first popular implementation for web syndication. RSS 1.0 was finally specified

the December of 2000. RSS got its second major version in 2002 and has since received some smaller updates but largely remained the same.

In 2005 Atom 1.0 was released to replace everything RSS 2.0 has done wrong with its own flaws (arguably fewer tho) but never really reached the same popularity as RSS did.

Nowadays most feed readers support both protocols and so it is largely a choice of preference. I went for Atom mainly because:

It looked like a slightly easier format to implement
The official (?) RSS specification page is riddled with ads (so I'm not gonna link it - take that corpos!).

An Atom Feed Taxidermy

Let's have a look what actually constitues an Atom feed by looking at an example.

          <?xml version="1.0" encoding="utf-8"?>
          <feed xmlns="http://www.w3.org/2005/Atom">
        <title>Example Feed</title>
        <link href="http://example.org/"/>
        <updated>2003-12-13T18:30:02Z</updated>
        <author>
          <name>John Doe</name>
        </author>
        <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>

        <entry>
          <title>Atom-Powered Robots Run Amok</title>
          <link href="http://example.org/2003/12/13/atom03"/>
          <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
          <updated>2003-12-13T18:30:02Z</updated>
          <summary>Some text.</summary>
        </entry>
          </feed>

The specification took place in the early 2000s, so you guessed it - we're working with an XML file. Although a bit on the verbose side it is still a solid data exchange format.

Leaving my XML stockholm syndrome aside, we mainly have two entities to deal with - the feed and a list of entries.

The Feed

The feed contains global meta-data to identify what we're looking at (title), where the corresponding website can be found (link), when the last update happened (updated) and some information about the author (author).

The most interesting part is the id field, because it has to be globally unique so our feed shows up as a separate entity in a user's feed reader and doesn't get mingled with New York Times articles for example.

Furthermore, the ID needs to be a valid URI. So if we own a domain name we can use our website's homepage as the ID because every URL is also a valid URI. Otherwise we could do as in the example and use randomly generated ID. In essence it doesn't really matter but it has to be unique and permanent, because if we ever change this ID, feed reader software is free to interpret it as an entirely new feed and will mark all entries as unread.

The Entries

The entry element is pretty self-explanatory as it only contains information about a specific content piece.

One slightly weird restriction is the relationship between summary, content and link - a content is required if no link with an rel=alternate is present and a summary should be provided if no content is present or the whole content isn't inlined. Ah yes, the content element can either contain the whole content of your article, link to it via a src attribute or contain it as base64 encoded string. This is probably the most confusing part to get correct but otherwise it is pretty straight forward.

Again, we have to watch out to choose a suitable value for the id element as all the previously mentioned restrictions also apply here.

With the specification out of the way, let's get coding - violently rubbs hands

Bare-bones Implementation

This implementation has been done in Go but it shouldn't be too different in other languages so it can still act as a starting point even if you want to use something else.

First, we should lay out our basic data structures that represent the XML feed. For this we can use the awesome xml-to-go website, paste in the example feed and already get a good base for our purposes.

If we remove some of the optional fields that we don't strictly care about and move the inline definitions to external definitions (exline? outline?) we end up with something like this:

          type Link struct {
                  Rel  string `xml:"rel,attr"`
                  Href string `xml:"href,attr"`
          }

          type Author struct {
                  Name string `xml:"name"`
          }

          type Content struct {
                  Type string `xml:"type,attr"`
                  Src  string `xml:"src,attr"`
          }

          type Entry struct {
                  Title   string    `xml:"title"`
                  ID      string    `xml:"id"`
                  Updated time.Time `xml:"updated"`
                  Summary string    `xml:"summary"`
                  Link    Link      `xml:"link"`
          }

          type Feed struct {
                  XMLName xml.Name  `xml:"feed"`
                  Xmlns   string    `xml:"xmlns,attr"`
                  Title   string    `xml:"title"`
                  Link    Link      `xml:"link"`
                  Updated time.Time `xml:"updated"`
                  Author  Author    `xml:"author"`
                  ID      string    `xml:"id"`
                  Entries []Entry   `xml:"entry"`
          }

The tags on the struct fields are hints for the encoding/xml package of the Go standard library and mostly only specify how the field should be named in the resulting XML. attr on the other hand tells the marshalling logic to threat this field as an attribute of the parent tag and not as a new child element.

Next we need to actually convert our data structures to XML which can be achieved in Go with just a couple of lines:

          func (f *Feed) Render() (string, error) {
                  data, err := xml.Marshal(f)
                  if err != nil {
                          return "", fmt.Errorf("failed to encode f: %w", err)
                  }

                  return string(data), nil
          }

Technically we could stop now and start using our new code buuut it is not very convinient to use because users have to be very aware of the Atom spec and figure out which fields are required and which are optional. Luckily we can easily improve upon it and make our abstraction leak a bit less.

Validations

In theory we could go to great lengths to make the API of our package completely foolproof to use by making all of our struct's fields private, allowing them to only be populated by bespoke constructor functions and putting all the validations inside of it.

          type Entry struct {
                  title string
                  id string
          }

          func NewEntry(title, id string) (Entry, error) {
                  // Validations
                  ...
          }

This would put us firmly on the parsing side of the parse, don't validate discussion but it also comes with the drawback, that Go's XML encoder only looks at exported fields, which would force us to internally convert to a second set of structs just for the XML conversion to properly work or write custom XML marshalling functions that take unexported fields into account.

Because this piece of code will most probably only be used by this blog and to save my time and the time of you my dear reader, we can choose the objectively slightly worse approach and bolt on validations to our existing design.

Our validations for the Entry struct now look like this:

          func isDefaultValue[T comparable](x T) bool {
                  var defaultValue T
                  return x == defaultValue
          }

          func require[T comparable](x T, name string) error {
                  if isDefaultValue(x) {
                          return fmt.Errorf("field %q is required but was '%v'", name, x)
                  }

                  return nil
          }

          func (e *Entry) Validate() error {
                  errs := []error{}
                  if err := require(e.ID, "ID"); err != nil {
                          errs = append(errs, err)
                  }
                  ...
                  if e.Link.Rel == "" {
                          e.Link.Rel = "alternate"
                  }
                  if err := e.Link.Validate(); err != nil {
                          errs = append(errs, err)
                  }

                  return errors.Join(errs...)
          }

We've introduces two helper functions which, by to power of generics, slightly ease the repetitiveness of the value checks. We can also make use of the amazing errors.Join function to combine all validation errors into one so we immediately see all fields that are faulty and don't have to fix them one by one.

Another thing to note is that we have some special logic for our Link struct. We check if the Rel field is set and otherwise inject a default value for it. A validation function that also modifies the entity it validates - feels borderline illegal but also does exactly what we need without needing a larger change of approach.

We just need to adapt our rendering logic slightly and we're done.

          func (f *Feed) Render() (string, error) {
                  if err := f.Validate(); err != nil {
                          return "", err
                  }

                  data, err := xml.Marshal(f)
                  if err != nil {
                          return "", fmt.Errorf("failed to encode f: %w", err)
                  }

                  return string(data), nil
          }

Testing

One last bit left to do before calling it a day is to verify that our implementation does what we think it does.

We can use W3C's Atom feed validator to check if our output adheres to the specification. It also recommends possible improvements that are not strictly necessary but can improve compatibility with the most common feed readers - nice!

An example output will look like this:

Atom Validator

It also can't hurt to test our validation logic, which is a prime candidate for using Go's table-driven tests. This allows us to then test multiple scenarios with essentially the same test code to keep the overall lines of code count low:

          func TestFeedValidations(t *testing.T) {
                  tests := []struct {
                          name    string
                          feed    Feed
                          wantErr bool
                  }{
                          {"valid feed", feed(func(f *Feed) {}), false},
                          {"no entries", feed(func(f *Feed) { f.Entries = []Entry{} }), false},
                          ...
                          }

                  for _, tt := range tests {
                          t.Run(tt.name, func(t *testing.T) {
                                  _, err := tt.feed.Render()
                                  if tt.wantErr {
                                          AssertError(t, err, "RenderFeed")
                                  } else {
                                          AssertNoError(t, err, "RenderFeed")
                                  }
                          })
                  }
          }

Now all that's left is to wire our new atom package up with the rest of this blog's logic (mostly boring struct conversions so I will spare you this part) and voilà, we have a functioning feed.