Baidu Big Data Lab
  • Large Scale Machine Learning
    We research advanced machine learning technologies by collaborating directly with Baidu’s core product lines, such as Fengchao and Nuomi, to solve their modeling problems. The new machine learning technologies and solutions developed during this collaboration are integrated into Pulsar, BDL’s large-scale machine learning platform. Pulsar can efficiently handle hundreds of billions of samples and features.
  • Core Search Technologies
    Search and advertising are among Baidu’s most essential business models. Current research and development activities include fundamental hashing techniques, hashing-based large-scale machine learning, hashing-based indexing and fast similarity search, image search, web search, learning to rank, etc.
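
    As a rough illustration of what hashing-based similarity search can look like, the sketch below uses random-hyperplane hashing (a generic locality-sensitive hashing scheme). It is not a description of the lab's own method; all function names, the 16-bit signature length, and the seeds are assumptions made for this example.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions (illustrative)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(sig_a, sig_b):
    """Number of differing bits; a small distance suggests a small angle."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


def build_index(items, hyperplanes):
    """Bucket item ids by signature so near-duplicates share a bucket."""
    index = {}
    for item_id, vector in items.items():
        index.setdefault(signature(vector, hyperplanes), []).append(item_id)
    return index


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=16, dim=3, seed=1)
    items = {"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0], "c": [-1.0, -2.0, -3.0]}
    index = build_index(items, planes)
    # Candidates for a query are the items in the bucket with the same signature.
    print(index.get(signature([1.0, 2.0, 3.0], planes), []))
```

    Vectors pointing in the same direction always receive the same signature, so a lookup only needs to compare the query against one bucket instead of the whole collection. Practical systems typically use several hash tables to trade recall against speed.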


  • Advertising & Recommendation Technologies Application
    The goal is to research advanced machine learning technologies by collaborating directly with Baidu’s core product lines to increase business benefits and improve the user experience.
  • Du Nurse
    Du Nurse combines state-of-the-art deep learning technologies, such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), with Knowledge Graph algorithms to conduct automated, intelligent dialogues that provide medical knowledge. The system may help patients obtain more accurate medical service information and may assist doctors in the diagnostic process.
  • Smarter City
    By leveraging Baidu’s heterogeneous mobile big data, BDL is building a Smarter City application that can:
    1) Monitor human crowds in real time and predict future crowd anomalies;
    2) Analyze user mobility data to enable data-driven urban planning, transportation optimization, etc.;
    3) Quantify the population dynamics of the urban network in China and provide insights into urban development.
    All of the above features are implemented in an API named Crowd Analytics, which will be released soon.
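
    To make the idea of a crowd anomaly concrete, here is a minimal detection baseline: it flags a time step when the crowd count deviates strongly from the mean of a trailing window. It is only a sketch under stated assumptions, not the Crowd Analytics API and not a prediction model; the function name, `window`, `threshold`, and `min_sigma` are all hypothetical.

```python
from statistics import mean, pstdev


def crowd_anomalies(counts, window=6, threshold=3.0, min_sigma=1.0):
    """Return indices whose count is more than `threshold` standard
    deviations away from the mean of the preceding `window` counts.

    `min_sigma` is an assumed floor that avoids division by zero when
    the trailing window is perfectly flat.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = max(pstdev(history), min_sigma)
        if abs(counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical per-interval head counts for one location.
    series = [100, 102, 98, 101, 99, 100, 500, 100]
    print(crowd_anomalies(series))
```

    A trailing window keeps the check usable on live data, since each decision depends only on counts that have already been observed.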
  • Deep Dialogue
    We use a range of machine learning methods, NLP, graph mining, and Bayesian-network-based inference techniques to build a natural language dialogue robot. Beyond holding a human-like, “entertaining” dialogue (as some state-of-the-art chatbots do), it attempts to listen to and understand the user’s latent intention, clarifies it by asking logically connected questions, and then provides “professional-level” information and advice, as an expert would. Examples include the medical chatbot “DuNurse”, which can provide healthcare-related know-how, identify potential medical conditions, and suggest action plans. Other application areas include financial consulting, sales consulting, etc.
  • Spatial-temporal Big Data
    We focus on mining spatial-temporal big data from Baidu, including location data, trajectory data, and search data from Baidu Maps, for both academic research and industrial applications. Our team members come from highly interdisciplinary backgrounds, including computer science, mathematics, econometrics, behavioral biology, and architecture.
  • Industrial Big Data
    More and more companies in traditional industries, such as retail, travel, finance, and health care, are catching up with the big data era. These companies have only just completed so-called informatization, i.e., their data are now being generated and stored, and they are starting to use that data to optimize their business. In BDL, we are exploring innovative ways of applying big data technologies to traditional industries.
  • Knowledge Based Machine Learning
    We are developing theories, algorithms, and systems to help build and evolve structured knowledge bases from unstructured information such as natural language text, speech, and images. With these knowledge bases serving as their brains, we are building intelligent machines to solve difficult tasks in various domains, such as professional question answering, high-quality web search, and automated legal services.
  • Streaming Big Data Architectures
    We are developing innovative big data processing systems to facilitate real-time analytics, ingestion, and export of data streams on next-generation hardware, such as multi-core CPUs, non-volatile random-access memory, and remote direct memory access (RDMA). Currently, we are focused on a distributed in-memory relational database system (OceanDB) that offers streaming analytics with transactions for fast big data pipelines. OceanDB ingests streams, processes transactions, and performs analytics simultaneously at scale in a unified database system, enabling applications to analyze fast-changing business data. OceanDB will be used in medical, financial, and other high-speed applications at Baidu to achieve high-performance scaling.
  • O2O DMP
    A DMP (data management platform) is crucial for customer targeting in the digital advertising industry. Most existing DMPs profile customers based on online web data and/or third-party offline data. By leveraging Baidu’s spatial-temporal big data, including location data, offline transaction data, and query data from Baidu Maps, we are developing models to semantically interpret users’ offline behavior, infer their mobile social networks, and identify users across various types of devices, with the aim of building the most comprehensive O2O (offline-to-online) DMP in the industry.
  • MobiMetrics
    In the mobile internet era, and especially in the forthcoming Internet of Things era, each user can be viewed as a sensor of our society. We believe such mobile big data provides novel tools for measuring China’s economic system from an entirely new perspective, turning spatial-temporal big data into insights for investment decision making; we therefore call it MobiMetrics. The highly interdisciplinary background of our team enables us to achieve this goal by combining expertise from fields including machine learning, complex networks, econometrics, and social science.

    Applications:
      Mapping ghost cities in China via mobile big data
      Baidu Economic Index

  • Smarter Sales Platform
    This platform integrates online user behavior data from Baidu’s products with offline user behavior data from shopping malls. By analyzing Baidu user profiles, search and browsing behavior, and users’ offline purchasing behavior, we provide services such as retail sales promotion and customer relationship management.