The field of edge AI and heterogeneous computing is evolving rapidly, with a focus on improving performance, energy efficiency, and cost-effectiveness. Recent work centers on specialized architectures, such as chiplet-based systems and field-programmable gate arrays (FPGAs), for accelerating machine learning workloads. There is also growing emphasis on optimizing concurrent deep neural network (DNN) training and inference on edge devices, and on characterizing the performance of accelerated edge devices for DNN training.

Noteworthy papers in this area include:

- Automatic Microarchitecture-Aware Custom Instruction Design for RISC-V Processors: a front-to-back tool for ASIP design that automatically analyzes hotspots in RISC-V applications and generates custom-instruction suggestions.
- Decentor-V: extends L-SGD to RISC-V-based MCUs and introduces an 8-bit quantized version of L-SGD for RISC-V, achieving nearly a 4x reduction in memory usage and a 2.2x speedup in training time.
- Chiplet-Based RISC-V SoC with Modular AI Acceleration: a chiplet-based RISC-V SoC architecture that integrates four key innovations and achieves significant performance improvements.
- Pagoda: develops a time roofline and a novel energy roofline model for the Jetson Orin AGX and couples it with an analytical model to analyze DNN inference workloads from first principles.
- Fulcrum: an intelligent time-slicing approach for concurrent DNN training and inference on Jetsons, with an efficient multi-dimensional gradient-descent search and Active Learning techniques to optimize training throughput while staying within latency and power budgets.
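The near-4x memory reduction reported for Decentor-V's 8-bit quantized L-SGD follows from storing each parameter in one byte instead of four. A minimal sketch of symmetric per-tensor int8 quantization illustrates the idea; this is a generic scheme, and the paper's exact method may differ:

```python
# Symmetric per-tensor int8 quantization sketch: map float32 weights
# into [-127, 127] with a single scale, then dequantize for compute.
# Generic scheme for illustration; Decentor-V's method may differ.

def quantize_int8(weights):
    """Return int8 codes and the scale used to recover the floats."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid scale 0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

weights = [0.8, -1.27, 0.05, 0.0, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)

# Each int8 value needs 1 byte vs. 4 bytes for float32: ~4x less memory.
print(q, scale)
```

The training-time speedup then comes from cheaper integer arithmetic and reduced memory traffic on the MCU.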
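The roofline-style analysis underlying Pagoda can be sketched in a few lines: a roofline model bounds a kernel's attainable throughput by the minimum of peak compute and memory bandwidth times arithmetic intensity. The peak numbers below are placeholders, not the Jetson Orin AGX's actual specifications:

```python
# Minimal time-roofline sketch: attainable throughput is bounded by
# min(peak compute, memory bandwidth * arithmetic intensity).
# Peak numbers are illustrative placeholders, not real Orin specs.

PEAK_FLOPS = 10e12   # peak compute, FLOP/s (placeholder)
PEAK_BW = 200e9      # peak memory bandwidth, bytes/s (placeholder)

def attainable_flops(intensity):
    """Roofline bound for a kernel with the given arithmetic
    intensity (FLOPs per byte moved to/from memory)."""
    return min(PEAK_FLOPS, PEAK_BW * intensity)

# A layer's intensity determines whether it is memory- or compute-bound:
ridge = PEAK_FLOPS / PEAK_BW  # intensity where the two bounds meet
for intensity in (1.0, ridge, 500.0):
    bound = attainable_flops(intensity)
    regime = "memory-bound" if intensity < ridge else "compute-bound"
    print(f"intensity {intensity:6.1f} FLOP/B -> {bound/1e12:5.2f} TFLOP/s ({regime})")
```

Pagoda's energy roofline applies the same bounding argument to energy per operation rather than time, letting per-layer inference behavior be predicted from first principles.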
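The core trade-off that Fulcrum's time-slicing navigates can be sketched with a fixed-period scheduler: each period is split between training minibatches and inference requests, and the split fraction is the kind of knob its search would tune. All timings here are illustrative placeholders, and Fulcrum's actual scheduler is more sophisticated:

```python
# Sketch of fixed-period time-slicing between DNN training and
# inference. The train/infer split is the tunable knob; a
# Fulcrum-style search would pick it to maximize training throughput
# under latency and power budgets. All numbers are placeholders.

PERIOD_MS = 100.0
TRAIN_STEP_MS = 8.0  # time per training minibatch (placeholder)
INFER_MS = 2.0       # time per inference request (placeholder)

def throughput(train_fraction):
    """Training steps/s and inference requests/s for a given split."""
    train_ms = PERIOD_MS * train_fraction
    infer_ms = PERIOD_MS - train_ms
    steps = int(train_ms // TRAIN_STEP_MS)
    reqs = int(infer_ms // INFER_MS)
    return steps * 1000.0 / PERIOD_MS, reqs * 1000.0 / PERIOD_MS

# Sweep the split to expose the trade-off a tuner would search over:
for f in (0.2, 0.5, 0.8):
    t, i = throughput(f)
    print(f"train fraction {f:.1f}: {t:.0f} steps/s, {i:.0f} inferences/s")
```

Giving training a larger slice raises training throughput but lowers inference capacity, which is why the split must be searched rather than fixed.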