The EPiServer Find CMS integration does not index any files stored in the VPP by default. The integration includes a convention that indexes the files visible in the file manager. It is enabled by setting ShouldIndexVPPConvention on the FileIndexer conventions to a VisibleInFilemanagerVPPIndexingConvention:

FileIndexer.Instance.Conventions.ShouldIndexVPPConvention 
  = new VisibleInFilemanagerVPPIndexingConvention();

However, this convention can be a little aggressive. As soon as an editor adds a file it becomes searchable. No access control mechanisms are overruled, but some editors might expect a file to stay hidden until it is actually used on the site. So how do we achieve that?

Built into the CMS is the ContentSoftLinkRepository, which we can query to find out whether a file (or any IContent, for that matter) is linked from another IContent. Using this, we can create a file indexing convention that checks whether the file is linked from some indexed IContent and, if so, indexes it:

FileIndexer.Instance.Conventions.ForInstancesOf<UnifiedFile>().ShouldIndex(x =>
{
    var contentRepository = 
        ServiceLocation.ServiceLocator.Current.GetInstance<IContentRepository>();
    var contentSoftLinkRepository = 
        ServiceLocation.ServiceLocator.Current.GetInstance<ContentSoftLinkRepository>();
    var softLinks = contentSoftLinkRepository.Load(x.VirtualPath);

    try
    {
        foreach (var softLink in softLinks)
        {
            
            if (softLink.SoftLinkType == ReferenceType.ExternalReference ||
                softLink.SoftLinkType == ReferenceType.ImageReference)
            {
                var content = 
                    contentRepository.Get<IContent>(softLink.OwnerContentLink);
                // don't index the referenced file if the owning content is marked as not indexed
                if (!ContentIndexer.Instance.Conventions.ShouldIndexConvention.ShouldIndex(content).Value)
                {
                    continue;
                }

                // only index if content is published

                var publicationStatus = 
                    content.PublishedInLanguage()[softLink.OwnerLanguage.Name];

                if (publicationStatus != null &&
                    (publicationStatus.StartPublish == null ||
                     publicationStatus.StartPublish < DateTime.Now) &&
                    (publicationStatus.StopPublish == null ||
                     DateTime.Now < publicationStatus.StopPublish))
                {
                    return true;
                }
            }
        }
    }
    catch
    {
        // ooops something went wrong. Better not index this one ;-)

    }

    return false;
});

With this convention, a file is indexed only if it is referenced from an IContent that is both indexed and published. The publication check is needed because, by default, unpublished IContent is also indexed to provide better querying in editor mode.
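
If the date comparison in the convention feels hard to read, it could be pulled out into a small helper. The following is only a sketch: PublishWindow and IsOpen are hypothetical names, not part of EPiServer, and it assumes StartPublish and StopPublish are nullable DateTime values, as the null checks in the convention suggest.

```csharp
using System;

// Hypothetical helper, not part of EPiServer or EPiServer Find.
public static class PublishWindow
{
    // Returns true when 'now' falls inside the publication window.
    // A missing start or stop date is treated as "no limit" on that side,
    // which mirrors the null checks in the convention above.
    public static bool IsOpen(DateTime? startPublish, DateTime? stopPublish, DateTime now)
    {
        bool started = startPublish == null || startPublish < now;
        bool notStopped = stopPublish == null || now < stopPublish;
        return started && notStopped;
    }
}
```

Inside the convention, the condition would then read something like publicationStatus != null && PublishWindow.IsOpen(publicationStatus.StartPublish, publicationStatus.StopPublish, DateTime.Now). Passing the current time in as a parameter also makes the check easy to unit test.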

I hope you may find this useful when making your site awesome with EPiServer Find!