We've spent some time evaluating the USDA search results that we receive. While it's great that you provide a single source for querying the data, the issue, as you are aware, is the truncation of the raw data. In looking at ways to curate the data, we are more interested in automating the cleanup, or in leveraging functionality that hits all the data at once, than in manually updating each ingredient one by one, which would be a daunting task. So we had an idea: what if an interface could be built that would allow us to map truncated keywords in the raw response to their correct full words, and then apply those mappings throughout the database?
For example, using the search query "banana":
Current Raw Response:
- Babyfood,fruit,bananas&pnappl w/tapioca,jr
From here, we can see that several keywords have been truncated, such as pnappl = pineapple, juc = juice, mxd = mixed, crl = cereal, etc. If we apply translations for these keywords, along with proper spacing after commas, the search results might look like this:
- Babyfood, fruit, bananas & pineapple with/tapioca, jr
- Babyfood, juice, orange & apple & banana
- Babyfood, juice, orange & banana
- Babyfood, cereal, mixed, with/bananas, dry
- Babyfood, cereal, mixed, with/applesauce & banana
This proposed format is clearly more legible than the raw response.
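To make the idea concrete, here is a minimal Python sketch of the kind of translation pass we have in mind. It is only an illustration: the function name `expand_description` and the table `ABBREVIATIONS` are hypothetical, the table contains only the four pairs mentioned above, and the handling of "w/" and "&" simply mirrors our example rather than any rule we know the USDA data to follow.

```python
import re

# Hypothetical lookup table: only the four abbreviations named in this post.
# A real table would be curated through the proposed interface.
ABBREVIATIONS = {
    "pnappl": "pineapple",
    "juc": "juice",
    "mxd": "mixed",
    "crl": "cereal",
}


def expand_description(raw, table=ABBREVIATIONS):
    """Return a more legible version of one raw description string."""
    # Exactly one space after every comma.
    text = re.sub(r",\s*", ", ", raw)
    # Exactly one space on each side of "&".
    text = re.sub(r"\s*&\s*", " & ", text)
    # Assumption taken from our example: "w/" stands for "with/".
    text = re.sub(r"\bw/", "with/", text, flags=re.IGNORECASE)

    def swap(match):
        word = match.group(0)
        # Whole-word lookup, case-insensitive; unknown words pass through.
        # Expansions come back in lowercase, as stored in the table.
        return table.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", swap, text).strip()


print(expand_description("Babyfood,fruit,bananas&pnappl w/tapioca,jr"))
# Babyfood, fruit, bananas & pineapple with/tapioca, jr
```

Because the lookup works on whole words, a single table entry such as juc = juice would be applied to every description in the database in one pass, which is the point of the proposal.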
Another consideration is how relevant the response is. We searched for "banana", yet the first set of results was baby food, which is not relevant. One suggestion is to filter for (or rank first) the results in which the query term, in this case "banana", is listed first in the description.
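A rough sketch of that ordering in Python is below. The function name `rank_results` is hypothetical, and the sample descriptions are made up for illustration rather than copied from an actual response. Descriptions that begin with the query term come first, descriptions that merely contain it come next, and everything else goes last.

```python
def rank_results(query, descriptions):
    """Order descriptions so those starting with the query term come first."""
    term = query.strip().lower()

    def priority(description):
        text = description.lower()
        if text.startswith(term):
            return 0  # query term is the leading word
        if term in text:
            return 1  # query term appears later in the description
        return 2      # query term does not appear at all

    # sorted() is stable, so ties keep the order of the original response.
    return sorted(descriptions, key=priority)


# Illustrative sample data only.
sample = [
    "Babyfood, juice, orange & banana",
    "Bananas, raw",
    "Babyfood, cereal, mixed, with/bananas, dry",
]
for line in rank_results("banana", sample):
    print(line)
# Bananas, raw
# Babyfood, juice, orange & banana
# Babyfood, cereal, mixed, with/bananas, dry
```

Sorting rather than dropping the non-leading matches keeps the baby food entries available further down the list for anyone who does want them.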
Look forward to your thoughts...