In the past year, Quilt has shuttled terabytes of data to and from Amazon S3. Globally, S3 houses more than two trillion objects, and handles more than a million requests per second. Love it or hate it, S3 is the world’s data lake.
On the love side, S3 is fast, scales beautifully, and is reasonably priced. At its worst, S3 is opaque and hard to use — a write-only database.
Today we’re announcing an open source project, called T4. T4 gives S3 buckets superpowers, transforming S3 into a team data hub. T4 is for data scientists, data engineers, and data-driven teams.
Teams wish to be data driven, but this wish is trumped by the fact that data are scattered across systems, formats, and organizational silos. As a result, no one has a complete and accurate picture of the latest data.
We see the need for a unified, low-cost data layer that houses all of a team’s canonical data, and covers all four quadrants of the experiment-to-production lifecycle. The experiment-to-production lifecycle is a pattern that we’ve observed as data graduates from individual experiments to production endpoints.
Data science often begins with single-developer, ad hoc experiments (upper left quadrant). These experiments are gradually shared outside of the organization. As and when experiments prove successful, data engineers scale experimental models and data into production equivalents. Version control — for code, containers, and data — forms an essential thread across all four phases: immutable hashes from systems like Git, Docker, and Quilt ensure that workflows are reproducible.
S3 is an excellent candidate for capturing the full data lifecycle. S3 accepts any data format, scales well, and offers granular permissions. But S3 is missing key features to make data findable, accessible, interoperable, and reusable. T4 adds these missing features to S3.
T4 uses S3 versions to track the history of every path in S3. With S3 versions, you can travel back in time, detect changes, and recover from accidental deletions. One level above S3 versions, T4 introduces snapshots, the spiritual successor to Quilt packages. A snapshot is a collection of one or more objects, usually a directory or an entire S3 bucket. Snapshots are immutable: once sealed, they can never change. As a result, data pipelines that use snapshots are reproducible. T4’s snapshot diff() shows what’s changed across two snapshots.
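The following minimal Python sketch makes the snapshot idea concrete: a content-addressed snapshot, plus a diff over two of them. It is a toy model under stated assumptions, not T4's implementation. `take_snapshot` and `diff` are hypothetical names. The "objects" are an in-memory dict of key to bytes rather than real S3 objects. Each object is identified by a SHA-256 hash of its contents.

```python
import hashlib
from types import MappingProxyType


def take_snapshot(objects):
    """Seal a hypothetical snapshot: map each key to the SHA-256 of its bytes.

    `objects` is a dict of key -> bytes standing in for an S3 prefix. The
    result is a read-only mapping, so it cannot be modified once sealed.
    """
    hashes = {key: hashlib.sha256(data).hexdigest() for key, data in objects.items()}
    return MappingProxyType(hashes)


def diff(old, new):
    """Report keys added, removed, and modified between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A key present in both snapshots counts as modified if its hash changed.
    modified = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "modified": modified}


v1 = take_snapshot({"data/a.csv": b"x,y\n1,2\n", "README.md": b"# v1"})
v2 = take_snapshot({"data/a.csv": b"x,y\n1,3\n", "data/b.csv": b"z\n"})
print(diff(v1, v2))
# {'added': ['data/b.csv'], 'removed': ['README.md'], 'modified': ['data/a.csv']}
```

Comparing hashes instead of raw bytes means a diff only needs the two sealed snapshots, not the underlying data.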
S3 requires users to download objects in order to know what’s in them. With T4’s web catalog, you can preview images, markdown files, and Jupyter notebooks — without downloading anything from S3.
Where there’s data, there’s visualization. T4 constructs visual summaries of data in S3, with user-defined combinations of Vega specs, images, Jupyter notebooks, and markdown files.
Elasticsearch is the de facto search solution on AWS. Elasticsearch is fast and precise. But, like so many instruments in the AWS toolbox, Elasticsearch is challenging to set up and tune. T4 automatically configures a private Elasticsearch endpoint for your S3 bucket, and attaches Lambda functions to index files as they land in S3. By default, T4 builds a full-text index for Jupyter notebooks and markdown files.
When notebooks are searchable, it’s easier to find and reuse past results — whether the results were generated by you or by a colleague, moments ago or last year.
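The next sketch shows the kind of work such an indexing Lambda function might do. It is hypothetical and is not T4's actual Lambda code. The function reads an S3 event notification and keeps only notebooks and markdown files. It then extracts their text into documents that could be sent to Elasticsearch. `build_index_documents`, `notebook_text`, and the `fetch` helper are illustrative names. `fetch` stands in for an S3 GetObject call so the example runs without AWS.

```python
import json
from urllib.parse import unquote_plus

# Assumption: mirrors the default described above (notebooks and markdown only).
INDEXED_EXTENSIONS = (".ipynb", ".md")


def notebook_text(raw):
    """Concatenate the source of every cell in a Jupyter notebook (nbformat 4 JSON)."""
    notebook = json.loads(raw)
    parts = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores a cell's source either as one string or as a list of lines.
        parts.append("".join(source) if isinstance(source, list) else source)
    return "\n".join(parts)


def build_index_documents(event, fetch):
    """Turn an S3 event notification into documents ready for a search index.

    `fetch(bucket, key)` is a hypothetical helper that returns the object's
    bytes; in a deployed Lambda function it would wrap an S3 GetObject call.
    """
    docs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.endswith(INDEXED_EXTENSIONS):
            continue  # skip file types that are not full-text indexed
        raw = fetch(bucket, key)
        text = notebook_text(raw) if key.endswith(".ipynb") else raw.decode("utf-8")
        docs.append({"bucket": bucket, "key": key, "content": text})
    return docs


event = {"Records": [
    {"s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "reports/Q1+review.ipynb"}}},
    {"s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "raw/data.csv"}}},
]}
store = {"reports/Q1 review.ipynb": json.dumps({"cells": [{"source": ["# Q1\n", "churn model"]}]}).encode()}
docs = build_index_documents(event, lambda bucket, key: store[key])
print(docs)  # one document for the notebook; the CSV is skipped
```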
Each object that you write to T4 can be annotated with custom metadata: put(obj, meta={"author":"aneesh"}). Faceted search gives your queries greater precision. You can sift through metadata facets with queries like user_meta.author:"aneesh".
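The sketch below shows what a facet query like `user_meta.author:"aneesh"` does conceptually: it walks a dotted field path and tests for an exact value. It assumes, consistent with the query above, that metadata passed via `meta=` ends up under a `user_meta` field. `match_facet` is a hypothetical helper. In T4 the actual matching is delegated to Elasticsearch, whose query syntax is far richer than this single-term form.

```python
import re

# Only the single-term form field.path:"value" is supported (a simplifying assumption).
FACET_QUERY = re.compile(r'^(?P<field>[\w.]+):"(?P<value>[^"]*)"$')


def match_facet(query, doc):
    """Return True if `doc` holds exactly the value the facet query asks for."""
    m = FACET_QUERY.match(query.strip())
    if m is None:
        raise ValueError(f"unsupported query: {query!r}")
    node = doc
    # Walk the dotted path, e.g. "user_meta.author" -> doc["user_meta"]["author"].
    for part in m.group("field").split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return node == m.group("value")


# Hypothetical indexed objects; "c.md" was written without metadata.
objects = [
    {"key": "a.ipynb", "user_meta": {"author": "aneesh"}},
    {"key": "b.ipynb", "user_meta": {"author": "maria"}},
    {"key": "c.md"},
]
hits = [o["key"] for o in objects if match_facet('user_meta.author:"aneesh"', o)]
print(hits)  # ['a.ipynb']
```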
T4 adds syntactic sugar to Amazon’s S3 client, so that it’s easy to read and write Python objects to and from S3. We’ve found it convenient to stash data frames, numpy arrays, and dictionaries in S3 with one T4 command, put().
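The sketch below illustrates the idea behind a one-command put() for Python objects: pick a serialization format from the object's type, then store the bytes with the format name and any metadata. This is not T4's put(), and its real formats may differ. `serialize`, `deserialize`, and `FakeBucket` are hypothetical names, and `FakeBucket` is an in-memory stand-in for an S3 bucket. To stay standard-library only, the sketch uses JSON for dicts and lists and pickle for everything else. A fuller version would add branches for pandas data frames (e.g. Parquet) and NumPy arrays (e.g. `.npy`).

```python
import json
import pickle


def serialize(obj):
    """Return (bytes, format_name) for a Python object.

    Plain dicts and lists become JSON so other tools can read them; anything
    else falls back to pickle, which only Python can read back.
    """
    if isinstance(obj, (dict, list)):
        return json.dumps(obj).encode("utf-8"), "json"
    return pickle.dumps(obj), "pickle"


def deserialize(data, fmt):
    """Invert serialize() given the recorded format name."""
    if fmt == "json":
        return json.loads(data.decode("utf-8"))
    if fmt == "pickle":
        return pickle.loads(data)  # never unpickle data from untrusted sources
    raise ValueError(f"unknown format: {fmt}")


class FakeBucket:
    """In-memory stand-in for an S3 bucket, keyed by object path."""

    def __init__(self):
        self._objects = {}

    def put(self, key, obj, meta=None):
        data, fmt = serialize(obj)
        # Record the format alongside the bytes so get() knows how to decode them.
        self._objects[key] = {"body": data, "format": fmt, "user_meta": meta or {}}

    def get(self, key):
        record = self._objects[key]
        return deserialize(record["body"], record["format"])


bucket = FakeBucket()
bucket.put("params.json", {"lr": 0.01, "layers": [64, 32]}, meta={"author": "aneesh"})
bucket.put("model-state", {1, 2, 3})  # a set is not JSON-serializable, so it is pickled
print(bucket.get("params.json"))  # {'lr': 0.01, 'layers': [64, 32]}
```

Storing the format name next to the bytes is what lets a single get() return the original Python type without the caller specifying it.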
T4 runs on top of your own S3 buckets — giving you total control over permissions. In the near future, we’ll offer a hosted version of T4.
T4 is alpha software. We do not yet recommend T4 for production work. The T4 APIs and features will be in flux until we reach version 1.0.
We’re offering T4 as a glimpse into the world’s data lake, S3. Come and help us to build T4 on GitHub.
Almost two years ago I started to include a Hardware section in my Deep Learning presentations. It was dedicated to a review of the current state and a set of trends for the next 1–5+ years.
Here is a version from April 2016, and here is an update from October 2017. Last year we saw a lot of interesting announcements, I gave some talks with updated slides, and now I am updating it for February/March 2018. I will publish it soon as a separate presentation, and these texts will be companion posts to the slides, with the goal of making it more readable and useful as a reference.
I started to write it as a single post, but it soon became too large, so I decided to split it into a series of bite-sized posts.
I will constantly update the texts to fix errors and include recent news and announcements. See the release notes at the bottom of the current post. Feel free to comment on the posts and/or drop me an email.
Here is a short summary of what the series will be about.
Part 1: Introduction and Executive summary (this post)
Part 2: CPU
Part 3: GPU
Part 4: FPGA
Part 5: ASIC
Part 6: Mobile AI
Part 7: Neuromorphic computing
Part 8: Quantum computing
2018/02/26: “Part 1: Introduction” published.
2018/02/26: “Part 2: CPU” published.
2018/03/14: “Part 3: GPU” published.
One of the best-selling T-shirts for the Indian e-commerce site Myntra is an olive, blue and yellow colorblocked design. It was conceived not by a human but by a computer algorithm — or rather two algorithms.
The first algorithm generated random images that it tried to pass off as clothing. The second had to distinguish between those images and clothes in Myntra’s inventory. Through a long game of one-upmanship, the first algorithm got better at producing images that resembled clothing, and the second got better at determining whether they were like — but not identical to — actual products.
This back and forth, an example of artificial intelligence at work, created designs whose sales are now “growing at 100 percent,” said Ananth Narayanan, the company’s chief executive. “It’s working.”
Clothing design is only the leading edge of the way algorithms are transforming the fashion and retail industries. Companies now routinely use artificial intelligence to decide which clothes to stock and what to recommend to customers.
And fashion, which has long shed blue-collar jobs in the United States, is in turn a leading example of how artificial intelligence is affecting a range of white-collar work as well. That’s especially true of jobs that place a premium on spotting patterns, from picking stocks to diagnosing cancer.
“A much broader set of tasks will be automated or augmented by machines over the coming years,” Erik Brynjolfsson, an economist at the Massachusetts Institute of Technology, and Tom Mitchell, a Carnegie Mellon computer scientist, wrote in the journal Science last year. They argued that most of the jobs affected would become partly automated rather than disappear altogether.
The fashion industry illustrates how machines can intrude even on workers known more for their creativity than for cold empirical judgments. Among those directly affected will be the buyers and merchandise planners who decide which dresses, tops and pants should populate their stores’ inventory.
A key part of a buyer’s job is to anticipate what customers will want using a well-honed sense of where fashion trends are headed. “Based on the fact that you sold 500 pairs of platform shoes last month, maybe you could sell 1,000 next month,” said Kristina Shiroka, who spent several years as a buyer for the Outnet, an online retailer. “But people might be over it by then, so you cut the buy.”
Merchandise planners then use the buyer’s input to figure out what mix of clothing — say, how many sandals, pumps and flats — will help the company reach its sales goals.
In the small but growing precincts of the industry where high-powered algorithms roam free, however, it is the machine — and not the buyer’s gut — that often anticipates what customers will want.
That’s the case at Stitch Fix, an online styling service that sends customers boxes of clothing whose contents they can keep or return, and maintains detailed profiles of customers to personalize their shipments.
Stitch Fix relies heavily on algorithms to guide its buying decisions — in fact, its business probably could not exist without them. Those algorithms project how many clients will be in a given situation, or “state," several months into the future (like expanding their wardrobe after, say, starting a new job), and what volume of clothes people tend to buy in each situation. The algorithms also know which styles people with different profiles tend to favor — say, a petite nurse with children who lives in Texas.
Myntra, the Indian online retailer, arms its buyers with algorithms that calculate the probability that an item will sell well based on how clothes with similar attributes — sleeves, colors, fabric — have sold in the past. (The buyers are free to ignore the projection.)
All of this has clouded the future of buyers and merchandise planners, high-status workers whose annual earnings can exceed $100,000.
At more conventional retailers, a team of buyers and support workers is assigned to each type of clothing (like designer, contemporary or casual) or each apparel category, like dresses or tops. Some retailers have separate teams for knit tops and woven tops. A parallel merchandise-planning group could employ nearly as many people.
Buyers say this specialization helps them intuitively understand trends in styles and colors. “You’re so immersed in it, you almost get a feeling,” said Helena Levin, a longtime buyer at retailers like Charlotte Russe and ModCloth.
Ms. Levin cited mint-green dresses, a top seller earlier this decade. “One day it just died,” she said. “It stopped. ‘O.K., everything mint, get out.’ Right after, it looked old. You could feel it.”
But retailers adept at using algorithms and big data tend to employ fewer buyers and assign each a wider range of categories, partly because they rely less on intuition.
At Le Tote, an online rental and retail service for women’s clothing that does hundreds of millions of dollars in business each year, a six-person team handles buying for all branded apparel — dresses, tops, pants, jackets.
Brett Northart, a co-founder, said the company’s algorithms could identify what to add to its stock based on how many customers placed the items on their digital wish lists, along with factors like online ratings and recent purchases.
Bombfell, a box service similar to Stitch Fix catering only to men, relies on a single employee, Nathan Cates, to buy all of its tops and accessories.
The company has built algorithmic tools and a vast repository of data to help Mr. Cates, who said he could more accurately project demand for clothing than a buyer at a traditional operation.
“We know exactly who our customers are,” he said. “We know exactly where they live, what their jobs are, what their sizing is.”
For now, at least, only a human can do parts of his job. Mr. Cates is obsessive about touching the fabric before purchasing an item and almost always tries it on first.
“If this is a light color, are we going to see your nipples?” he explained. (The verdict on a mint T-shirt he donned at the company’s headquarters in New York? “A little nipply.”)
There are other checks on automation. Negotiations with suppliers typically require a human touch. Even if an algorithm can help buyers make decisions more quickly and accurately, there are limits to the number of supplier relationships they can juggle.
Arti Zeighami, who oversees advanced analytics and artificial intelligence for the H & M group, which uses artificial intelligence to guide supply-chain decisions, said the company was “enhancing and empowering" human buyers and planners, not replacing them. But he conceded it was hard to predict the effect on employment in five to 10 years.
Experts say some of these jobs will be automated away. The Bureau of Labor Statistics expects employment of wholesale and retail buyers to contract by 2 percent over a decade, versus a 7 percent increase for all occupations. Some of this is because of the automation of less sophisticated tasks, like cataloging inventory, and buying for less stylistically demanding retailers (say, auto parts).
There is at least one area of the industry where the machines are creating jobs rather than eliminating them, however. Bombfell, Stitch Fix and many competitors in the box-fashion niche employ a growing army of human stylists who receive recommendations from algorithms about clothes that might work for a customer, but decide for themselves what to send.
“If they’re not overly enthusiastic upfront when I ask how do you feel about it, I’m making a note of it,” said Jade Carmosino, a sales manager and stylist at Trunk Club, a Stitch Fix competitor owned by Nordstrom.
In this, stylists appear to reflect a broader trend in industries where artificial intelligence is automating white-collar jobs: the hiring of more humans to stand between machines and customers.
For example, Chida Khatua, the chief executive of EquBot, which helped create an exchange-traded fund that is actively managed by artificial intelligence, predicted that the asset-management industry would hire more financial advisers even as investing became largely automated.
Follow Noam Scheiber on Twitter: @noamscheiber.