BackendTask.Glob

type alias Glob a =
    Glob a

A pattern to match local files and capture parts of the path into a nice Elm data type.

This module helps you get a List of matching file paths from your local file system as a BackendTask. See the BackendTask module documentation for ways you can combine and map BackendTasks.

A common example would be to find all the markdown files of your blog posts. If you have all your blog posts in content/blog/*.md, then you could use that glob pattern in most shells to refer to each of those files.

With the BackendTask.Glob API, you could get all of those files like so:

import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob

blogPostsGlob : BackendTask error (List String)
blogPostsGlob =
    Glob.succeed (\slug -> slug)
        |> Glob.match (Glob.literal "content/blog/")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal ".md")
        |> Glob.toBackendTask

Let's say you have these files locally:

- elm.json
- src/
- content/
  - blog/
    - first-post.md
    - second-post.md

We would end up with a BackendTask like this:

BackendTask.succeed [ "first-post", "second-post" ]

Of course, if you add or remove matching files, the BackendTask will get those new files (unlike BackendTask.succeed). That's why we have Glob!

The elm-pages dev server even picks up added or removed matching files automatically through its hot module reloading.

But why did we get "first-post" instead of a full file path, like "content/blog/first-post.md"? That's the difference between capture and match.

Capture and Match

There are two functions for building up a Glob pattern: capture and match.

capture and match both build up a Glob pattern that will match 0 or more files on your local file system. There will be one argument for every capture in your pipeline, whereas match does not apply any arguments.

import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob

blogPostsGlob : BackendTask error (List String)
blogPostsGlob =
    Glob.succeed (\slug -> slug)
        -- no argument from this, but we will only
        -- match files that begin with `content/blog/`
        |> Glob.match (Glob.literal "content/blog/")
        -- we get the value of the `wildcard`
        -- as the slug argument
        |> Glob.capture Glob.wildcard
        -- no argument from this, but we will only
        -- match files that end with `.md`
        |> Glob.match (Glob.literal ".md")
        |> Glob.toBackendTask

To understand which files will match, you can ignore whether each step uses capture or match and simply read the patterns in order. To understand what Elm data you will get for each matching file, look at which parts are being captured and how each captured value is used in the function you pass to Glob.succeed.

capture : Glob a -> Glob (a -> value) -> Glob value

Adds on to the glob pattern, and captures it in the resulting Elm match value. That means this both changes which files will match, and gives you the sub-match as Elm data for each matching file.

Exactly the same as match except it also captures the matched sub-pattern.

type alias ArchivesArticle =
    { year : Int
    , month : Int
    , day : Int
    , slug : String
    }

archives : BackendTask error (List ArchivesArticle)
archives =
    Glob.succeed ArchivesArticle
        |> Glob.match (Glob.literal "archive/")
        |> Glob.capture Glob.int
        |> Glob.match (Glob.literal "/")
        |> Glob.capture Glob.int
        |> Glob.match (Glob.literal "/")
        |> Glob.capture Glob.int
        |> Glob.match (Glob.literal "/")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal ".md")
        |> Glob.toBackendTask

The file archive/1977/06/10/apple-2-released.md will give us this match:

matches : BackendTask error (List ArchivesArticle)
matches =
    BackendTask.succeed
        [ { year = 1977
          , month = 6
          , day = 10
          , slug = "apple-2-released"
          }
        ]

When possible, it's best to grab data and turn it into structured Elm data when you have it. That way, you don't end up with duplicate validation logic and data normalization, and your code will be more robust.

If you only care about getting the full matched file paths, you can use match. capture is very useful because you can pick apart structured data as you build up your glob pattern. This follows the principle of Parse, Don't Validate.
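
As a small sketch of that match-only case, the pipeline below collects nothing but the full paths. It assumes the captureFilePath helper documented later in this module, and the name blogPostPaths is made up for illustration:

```elm
import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob
import FilePath exposing (FilePath)

-- `blogPostPaths` is a hypothetical name used only for this sketch.
blogPostPaths : BackendTask error (List FilePath)
blogPostPaths =
    Glob.succeed identity
        -- the single argument comes from captureFilePath
        |> Glob.captureFilePath
        -- every pattern below only narrows which files match
        |> Glob.match (Glob.literal "content/blog/")
        |> Glob.match Glob.wildcard
        |> Glob.match (Glob.literal ".md")
        |> Glob.toBackendTask
```

Because every pattern is added with match, none of them contributes an argument to the function passed to Glob.succeed.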

match : Glob a -> Glob value -> Glob value

Adds on to the glob pattern, but does not capture it in the resulting Elm match value. That means this changes which files will match, but does not change the Elm data type you get for each matching file.

Exactly the same as capture except it doesn't capture the matched sub-pattern.

capture is a lot like building up a JSON decoder with a pipeline.

Let's try our blogPostsGlob from before, but change every match to capture.

import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob
import FilePath exposing (FilePath)

blogPostsGlob :
    BackendTask
        error
        (List
            { filePath : FilePath
            , slug : String
            }
        )
blogPostsGlob =
    Glob.succeed
        (\capture1 capture2 capture3 ->
            { filePath = FilePath.fromString (capture1 ++ capture2 ++ capture3)
            , slug = capture2
            }
        )
        |> Glob.capture (Glob.literal "content/blog/")
        |> Glob.capture Glob.wildcard
        |> Glob.capture (Glob.literal ".md")
        |> Glob.toBackendTask

Notice that we now need 3 arguments at the start of our pipeline instead of 1. That's because we apply 1 more argument every time we do a Glob.capture, much like Json.Decode.Pipeline.required, or other pipeline APIs.

Now we actually have the full file path of our files. But having that slug (like first-post) is also very helpful sometimes, so we kept that in our record as well. So we'll now have the equivalent of this BackendTask with the current .md files in our blog folder:

BackendTask.succeed
    [ { filePath = FilePath.fromString "content/blog/first-post.md"
      , slug = "first-post"
      }
    , { filePath = FilePath.fromString "content/blog/second-post.md"
      , slug = "second-post"
      }
    ]

Having the full file path lets us read in files. But concatenating it manually is tedious and error-prone. That's what the captureFilePath helper is for.

Reading matching files

You can use the less powerful but more familiar and terse fromString helpers if you only need to find matching file paths (but don't care about parsing out parts of the paths). This is helpful when you need a reference to matching files and you are only using the file paths to then read or do scripting tasks with those paths.

fromString : String -> BackendTask error (List FilePath)

Runs a glob string directly, with include = FilesAndFolders. Behavior is similar to using glob patterns in a shell.

If you need to capture specific parts of the path, you can use capture and match functions instead. fromString only allows you to capture a list of matching file paths.

The following glob syntax is supported:

  • * matches any number of characters except for /
  • ** matches any number of characters including /

For example, if we have the following files:

- src/
    - Main.elm
    - Ui/
        - Icon.elm
- content/
    - blog/
        - first-post.md
        - second-post.md

import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob
import FilePath exposing (FilePath)

blogPosts : BackendTask error (List FilePath)
blogPosts =
    Glob.fromString "content/blog/*.md"

--> BackendTask.succeed [ FilePath.fromString "content/blog/first-post.md", FilePath.fromString "content/blog/second-post.md" ]

elmFiles : BackendTask error (List FilePath)
elmFiles =
    Glob.fromString "src/**/*.elm"

--> BackendTask.succeed [ FilePath.fromString "src/Main.elm", FilePath.fromString "src/Ui", FilePath.fromString "src/Ui/Icon.elm" ]

fromStringWithOptions : Options -> String -> BackendTask error (List FilePath)

Same as fromString, but with custom Options.
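
A minimal sketch of how this might be used, restricting the results to files. It assumes that the Include constructors and defaultOptions are exposed by this module as shown in the import, and elmFilesOnly is a made-up name:

```elm
import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob exposing (Include(..), defaultOptions)
import FilePath exposing (FilePath)

-- Hypothetical example: same pattern as `elmFiles`, but asking for files only.
elmFilesOnly : BackendTask error (List FilePath)
elmFilesOnly =
    Glob.fromStringWithOptions
        { defaultOptions | include = OnlyFiles }
        "src/**/*.elm"
```

defaultOptions is imported unqualified because Elm's record update syntax needs a plain lowercase name on the left of the `|`.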

captureFilePath : Glob (FilePath -> value) -> Glob value

import BackendTask exposing (BackendTask)
import FilePath exposing (FilePath)
import BackendTask.Glob as Glob

blogPosts :
    BackendTask
        error
        (List
            { filePath : FilePath
            , slug : String
            }
        )
blogPosts =
    Glob.succeed
        (\filePath slug ->
            { filePath = filePath
            , slug = slug
            }
        )
        |> Glob.captureFilePath
        |> Glob.match (Glob.literal "content/blog/")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal ".md")
        |> Glob.toBackendTask

This function does not change which files will or will not match. It just gives you the full matching file path in your Glob pipeline.

Whenever possible, it's a good idea to use this function to make sure you have an accurate file path when you need to read a file.

In many cases you will want to take the matching files from a Glob and then read the body or frontmatter from matching files.

Reading Metadata for each Glob Match

For example, if we had files like this:

---
title: My First Post
---
This is my first post!

Then we could read that title for our blog post list page using our blogPosts BackendTask that we defined above.

import BackendTask exposing (BackendTask)
import BackendTask.File
import FatalError exposing (FatalError)
import Json.Decode as Decode exposing (Decoder)

titles : BackendTask FatalError (List BlogPost)
titles =
    blogPosts
        |> BackendTask.map
            (List.map
                (\blogPost ->
                    BackendTask.File.onlyFrontmatter
                        blogFrontmatterDecoder
                        blogPost.filePath
                )
            )
        |> BackendTask.resolve
        |> BackendTask.allowFatal

type alias BlogPost =
    { title : String }

blogFrontmatterDecoder : Decoder BlogPost
blogFrontmatterDecoder =
    Decode.map BlogPost
        (Decode.field "title" Decode.string)

That will give us

BackendTask.succeed
    [ { title = "My First Post" }
    , { title = "My Second Post" }
    ]
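
Reading the body rather than the frontmatter could follow the same shape. This is only a sketch: it reuses the blogPosts BackendTask defined earlier in this module's docs, and it assumes that BackendTask.File.bodyWithoutFrontmatter accepts the same path argument as onlyFrontmatter does in the example above:

```elm
import BackendTask exposing (BackendTask)
import BackendTask.File
import FatalError exposing (FatalError)

-- `bodies` is a hypothetical name; `blogPosts` is the glob-based
-- BackendTask from the captureFilePath example.
bodies : BackendTask FatalError (List String)
bodies =
    blogPosts
        |> BackendTask.map
            (List.map
                (\blogPost ->
                    -- assumed to take the same path type as onlyFrontmatter
                    BackendTask.File.bodyWithoutFrontmatter blogPost.filePath
                )
            )
        |> BackendTask.resolve
        |> BackendTask.allowFatal
```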

Capturing Patterns

wildcard : Glob String

Matches anything except for a / in a file path. You may be familiar with this syntax from shells like bash where you can run commands like rm client/*.js to remove all .js files in the client directory.

Just like a * glob pattern in bash, this Glob.wildcard function will only match within a path part. If you need to match 0 or more path parts, see recursiveWildcard.

import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob

type alias BlogPost =
    { year : String
    , month : String
    , day : String
    , slug : String
    }

example : BackendTask error (List BlogPost)
example =
    Glob.succeed BlogPost
        |> Glob.match (Glob.literal "blog/")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal "-")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal "-")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal "/")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal ".md")
        |> Glob.toBackendTask

- blog/
  - 2021-05-27/
    - first-post.md

That will match to:

results : BackendTask error (List BlogPost)
results =
    BackendTask.succeed
        [ { year = "2021"
          , month = "05"
          , day = "27"
          , slug = "first-post"
          }
        ]

Note that we can "destructure" the date part of this file path in the format yyyy-mm-dd. The wildcard matches will match within a path part (think between the slashes of a file path). recursiveWildcard can match across path parts.

recursiveWildcard : Glob (List String)

Matches any number of characters, including /, as long as it's the only thing in a path part.

In contrast, wildcard will never match /, so it only matches within a single path part.

This is the elm-pages equivalent of **/*.txt in standard shell syntax:

import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob

example : BackendTask error (List ( List String, String ))
example =
    Glob.succeed Tuple.pair
        |> Glob.match (Glob.literal "articles/")
        |> Glob.capture Glob.recursiveWildcard
        |> Glob.match (Glob.literal "/")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal ".txt")
        |> Glob.toBackendTask

With these files:

- articles/
  - google-io-2021-recap.txt
  - archive/
    - 1977/
      - 06/
        - 10/
          - apple-2-announced.txt

We would get the following matches:

matches : BackendTask error (List ( List String, String ))
matches =
    BackendTask.succeed
        [ ( [ "archive", "1977", "06", "10" ], "apple-2-announced" )
        , ( [], "google-io-2021-recap" )
        ]

Note that the recursive wildcard conveniently gives us a List String, where each String is a path part with no slashes (like archive).

And also note that it matches 0 path parts into an empty list.

If we didn't include the wildcard after the recursiveWildcard, then we would only get a single level of matches because it is followed by a file extension.

example : BackendTask error (List (List String))
example =
    Glob.succeed identity
        |> Glob.match (Glob.literal "articles/")
        |> Glob.capture Glob.recursiveWildcard
        |> Glob.match (Glob.literal ".txt")
        |> Glob.toBackendTask

matches : BackendTask error (List (List String))
matches =
    BackendTask.succeed
        [ [ "google-io-2021-recap" ]
        ]

This is usually not what is intended. For this reason, recursiveWildcard is usually followed by a wildcard.

Capturing Specific Characters

int : Glob Int

Same as digits, but it safely turns the digits String into an Int.

Leading 0's are ignored.

import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob

slides : BackendTask error (List Int)
slides =
    Glob.succeed identity
        |> Glob.match (Glob.literal "slide-")
        |> Glob.capture Glob.int
        |> Glob.match (Glob.literal ".md")
        |> Glob.toBackendTask

With files

- slide-no-match.md
- slide-.md
- slide-1.md
- slide-01.md
- slide-2.md
- slide-03.md
- slide-4.md
- slide-05.md
- slide-06.md
- slide-007.md
- slide-08.md
- slide-09.md
- slide-10.md
- slide-11.md

Yields

matches : BackendTask error (List Int)
matches =
    BackendTask.succeed
        [ 1
        , 1
        , 2
        , 3
        , 4
        , 5
        , 6
        , 7
        , 8
        , 9
        , 10
        , 11
        ]

Note that neither slide-no-match.md nor slide-.md matches, and that both slide-1.md and slide-01.md match and turn into 1.

digits : Glob String

This is similar to wildcard, but it will only match 1 or more digits (i.e. [0-9]+).

See int for a convenience function to get an Int value instead of a String of digits.
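
As a sketch, the slides pipeline from the int example could capture digits instead when the leading zeros matter. The name slideNumbers is made up for illustration:

```elm
import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob

-- Hypothetical variant of `slides` that keeps the digits as a String.
slideNumbers : BackendTask error (List String)
slideNumbers =
    Glob.succeed identity
        |> Glob.match (Glob.literal "slide-")
        |> Glob.capture Glob.digits
        |> Glob.match (Glob.literal ".md")
        |> Glob.toBackendTask
```

Since no Int conversion happens here, a file like slide-01.md would presumably produce the String "01" rather than the Int 1.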

Capturing File Stats

You can access a file's stats including timestamps when the file was created and modified, and file size.

type alias FileStats =
    { fullPath : FilePath
    , sizeInBytes : Int
    , lastContentChange : Posix
    , lastAccess : Posix
    , lastFileChange : Posix
    , createdAt : Posix
    , isDirectory : Bool
    }

The information about a file that you can access when you use captureStats.

captureStats : Glob (FileStats -> value) -> Glob value

import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob
import Time

recentlyChangedRouteModules : BackendTask error (List ( Time.Posix, List String ))
recentlyChangedRouteModules =
    Glob.succeed
        (\fileStats directoryName fileName ->
            ( fileStats.lastContentChange
            , directoryName ++ [ fileName ]
            )
        )
        |> Glob.captureStats
        |> Glob.match (Glob.literal "app/Route/")
        |> Glob.capture Glob.recursiveWildcard
        |> Glob.match (Glob.literal "/")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal ".elm")
        |> Glob.toBackendTask
        |> BackendTask.map
            (\entries ->
                entries
                    |> List.sortBy (\( lastChanged, _ ) -> Time.posixToMillis lastChanged)
                    |> List.reverse
            )

Matching a Specific Number of Files

expectUniqueMatch :
    Glob a
    -> BackendTask
        { fatal : FatalError
        , recoverable : String
        }
        a

Sometimes you want to make sure there is a unique file matching a particular pattern. This is a simple helper that will give you a BackendTask error if there isn't exactly 1 matching file. If there is exactly 1, then you successfully get back that single match.

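For example, maybe a blog post can live either at blog/first-post.md or at blog/first-post/index.md, and you want to make sure that only one of the two exists:

import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob
import FatalError exposing (FatalError)
import FilePath exposing (FilePath)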

findBlogBySlug : String -> BackendTask FatalError FilePath
findBlogBySlug slug =
    Glob.succeed identity
        |> Glob.captureFilePath
        |> Glob.match (Glob.literal "blog/")
        |> Glob.match (Glob.literal slug)
        |> Glob.match
            (Glob.oneOf
                ( ( "", () )
                , [ ( "/index", () ) ]
                )
            )
        |> Glob.match (Glob.literal ".md")
        |> Glob.expectUniqueMatch
        |> BackendTask.allowFatal

If we used findBlogBySlug "first-post" with these files:

- blog/
    - first-post/
        - index.md

This would give us:

results : BackendTask FatalError FilePath
results =
    BackendTask.succeed (FilePath.fromString "blog/first-post/index.md")

If we used findBlogBySlug "first-post" with these files:

- blog/
    - first-post.md
    - first-post/
        - index.md

Then we will get a BackendTask error saying More than one file matched. Keep in mind that BackendTask failures in build-time routes will cause a build failure, giving you the opportunity to fix the problem before users see the issue. That makes this kind of assertion preferable to fallback behavior that could silently cover up issues (for example, ignoring the case where two or more blog post files match).

expectUniqueMatchFromList : List (Glob a) -> BackendTask String a
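
Only the signature is given for this function. Assuming it works like expectUniqueMatch across several patterns (succeeding only when exactly one file matches across the whole list), a call might look like the sketch below. The file names and the name findReadme are made up for illustration:

```elm
import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob
import FilePath exposing (FilePath)

-- Hypothetical example; the behavior is inferred from the name and type only.
findReadme : BackendTask String FilePath
findReadme =
    Glob.expectUniqueMatchFromList
        [ Glob.succeed identity
            |> Glob.captureFilePath
            |> Glob.match (Glob.literal "README.md")
        , Glob.succeed identity
            |> Glob.captureFilePath
            |> Glob.match (Glob.literal "docs/README.md")
        ]
```

Note that, per the signature, the error here is a plain String rather than the record that expectUniqueMatch uses.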

Glob Patterns

literal : String -> Glob String

Match a literal part of a path. Can include /s.

Some common uses include

  • The leading part of a pattern, to say "starts with content/blog/"
  • The ending part of a pattern, to say "ends with .md"
  • In-between wildcards, to say "these dynamic parts are separated by /"

import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob

blogPosts : BackendTask error (List { section : String, slug : String })
blogPosts =
    Glob.succeed
        (\section slug ->
            { section = section, slug = slug }
        )
        |> Glob.match (Glob.literal "content/blog/")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal "/")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal ".md")
        |> Glob.toBackendTask

map : (a -> b) -> Glob a -> Glob b

A Glob can be mapped. This can be useful for transforming a sub-match in-place.

For example, if you wanted to take the slugs for a blog post and make sure they are normalized to be all lowercase, you could use

import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob

blogPostsGlob : BackendTask error (List String)
blogPostsGlob =
    Glob.succeed (\slug -> slug)
        |> Glob.match (Glob.literal "content/blog/")
        |> Glob.capture (Glob.wildcard |> Glob.map String.toLower)
        |> Glob.match (Glob.literal ".md")
        |> Glob.toBackendTask

If you want to validate file formats, you can combine that with some BackendTask helpers to turn a Glob (Result FatalError value) into a BackendTask FatalError (List value).

For example, you could take a date and parse it.

import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob
import FatalError exposing (FatalError)

example : BackendTask FatalError (List ( String, String ))
example =
    Glob.succeed
        (\dateResult slug ->
            dateResult
                |> Result.map (\okDate -> ( okDate, slug ))
        )
        |> Glob.match (Glob.literal "blog/")
        |> Glob.capture (Glob.recursiveWildcard |> Glob.map expectDateFormat)
        |> Glob.match (Glob.literal "/")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal ".md")
        |> Glob.toBackendTask
        |> BackendTask.map (List.map BackendTask.fromResult)
        |> BackendTask.resolve

expectDateFormat : List String -> Result FatalError String
expectDateFormat dateParts =
    case dateParts of
        [ year, month, date ] ->
            Ok (String.join "-" [ year, month, date ])

        _ ->
            Err <| FatalError.fromString "Unexpected date format, expected yyyy/mm/dd folder structure."

succeed : constructor -> Glob constructor

succeed is how you start a pipeline for a Glob. You will need one argument for each capture in your Glob.
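
As a sketch of that rule, the two globs below differ only in how many captures they have, and the function passed to succeed grows accordingly. The names slugOnly and sectionAndSlug are made up for illustration:

```elm
import BackendTask.Glob as Glob exposing (Glob)

-- One capture, so the function takes one argument.
slugOnly : Glob String
slugOnly =
    Glob.succeed (\slug -> slug)
        |> Glob.match (Glob.literal "content/blog/")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal ".md")

-- Two captures, so the function takes two arguments,
-- applied in the order the captures appear.
sectionAndSlug : Glob ( String, String )
sectionAndSlug =
    Glob.succeed (\section slug -> ( section, slug ))
        |> Glob.match (Glob.literal "content/")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal "/")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal ".md")
```

Either value can then be passed to toBackendTask to run it.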

oneOf : ( ( String, a ), List ( String, a ) ) -> Glob a

import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob

type Extension
    = Json
    | Yml

type alias DataFile =
    { name : String
    , extension : Extension
    }

dataFiles : BackendTask error (List DataFile)
dataFiles =
    Glob.succeed DataFile
        |> Glob.match (Glob.literal "my-data/")
        |> Glob.capture Glob.wildcard
        |> Glob.match (Glob.literal ".")
        |> Glob.capture
            (Glob.oneOf
                ( ( "yml", Yml )
                , [ ( "json", Json )
                  ]
                )
            )
        |> Glob.toBackendTask

If we have the following files

- my-data/
    - authors.yml
    - events.json

That gives us

results : BackendTask error (List DataFile)
results =
    BackendTask.succeed
        [ { name = "authors"
          , extension = Yml
          }
        , { name = "events"
          , extension = Json
          }
        ]

You could also match an optional file path segment using oneOf.

rootFilesMd : BackendTask error (List String)
rootFilesMd =
    Glob.succeed (\slug -> slug)
        |> Glob.match (Glob.literal "blog/")
        |> Glob.capture Glob.wildcard
        |> Glob.match
            (Glob.oneOf
                ( ( "", () )
                , [ ( "/index", () ) ]
                )
            )
        |> Glob.match (Glob.literal ".md")
        |> Glob.toBackendTask

With these files:

- blog/
    - first-post.md
    - second-post/
        - index.md

This would give us:

results : BackendTask error (List String)
results =
    BackendTask.succeed
        [ "first-post"
        , "second-post"
        ]

zeroOrMore : List String -> Glob (Maybe String)

atLeastOne : ( ( String, a ), List ( String, a ) ) -> Glob ( a, List a )

Getting Glob Data from a BackendTask

toBackendTask : Glob a -> BackendTask error (List a)

In order to get match data from your glob, turn it into a BackendTask with this function.

With Custom Options

toBackendTaskWithOptions : Options -> Glob a -> BackendTask error (List a)

Same as toBackendTask, but lets you set custom glob options. For example, to list folders instead of files:

import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob exposing (Glob, Include(..), defaultOptions)

matchingFiles : Glob a -> BackendTask error (List a)
matchingFiles glob =
    glob
        |> Glob.toBackendTaskWithOptions { defaultOptions | include = OnlyFolders }

defaultOptions : Options

The default options used in toBackendTask. To use a custom set of options, use toBackendTaskWithOptions.

type alias Options =
    { includeDotFiles : Bool
    , include : Include
    , followSymbolicLinks : Bool
    , caseSensitiveMatch : Bool
    , gitignore : Bool
    , maxDepth : Maybe Int
    }

Custom options you can pass in to run the glob with toBackendTaskWithOptions.

{ includeDotFiles = Bool -- https://github.com/mrmlnc/fast-glob#dot
, include = Include -- return results that are `OnlyFiles`, `OnlyFolders`, or both `FilesAndFolders` (default is `OnlyFiles`)
, followSymbolicLinks = Bool -- https://github.com/mrmlnc/fast-glob#followsymboliclinks
, caseSensitiveMatch = Bool -- https://github.com/mrmlnc/fast-glob#casesensitivematch
, gitignore = Bool -- https://www.npmjs.com/package/globby#gitignore
, maxDepth = Maybe Int -- https://github.com/mrmlnc/fast-glob#deep
}
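
As a sketch, several of these fields can be overridden at once with a record update on defaultOptions. The specific values and the name shallowWithDotFiles are made up for illustration, and the exact effect of each field is described in the linked fast-glob and globby documentation:

```elm
import BackendTask exposing (BackendTask)
import BackendTask.Glob as Glob exposing (Glob, defaultOptions)

-- Hypothetical helper: include dot files and limit the search depth.
shallowWithDotFiles : Glob a -> BackendTask error (List a)
shallowWithDotFiles glob =
    glob
        |> Glob.toBackendTaskWithOptions
            { defaultOptions
                | includeDotFiles = True
                , maxDepth = Just 2
            }
```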