Publications

 

Multi-Tenant In-memory Key-Value Cache Partitioning Using Efficient Random Sampling-Based LRU Model, Y Wang, J Yang, Z Wang, IEEE Transactions on Cloud Computing, Aug. 2023.

 

FLORIA: A Fast and Featherlight Approach for Predicting Cache Performance, J Xiao, Y Xiang, X Wang, Y Luo, A Pimentel, Z Wang, Proceedings of the 37th International Conference on Supercomputing (ICS’23), 25-36, Best Paper Award, Orlando, FL, June 21-23, 2023

 

vTMM: Tiered Memory Management for Virtual Machines, S Sha, C Li, Y Luo, X Wang, Z Wang, Proceedings of the Eighteenth European Conference on Computer Systems (EuroSys’23), 283-297, Rome, Italy, May 8-12, 2023

 

Graph Neural Networks Based Memory Inefficiency Detection Using Selective Sampling, Z Yang, Z Ye, T Fu, J Luo, X Wei, Y Luo, X Wang, Z Wang, T Zhang, 2022 IEEE 40th International Conference on Computer Design (ICCD), 672-680, Olympic Valley, CA, Oct. 23-26, 2022

 

Tear Up the Bubble Boom: Lessons Learned From a Deep Learning Research and Development Cluster, Li, Y Guo, Y Luo, X Wang, Z Wang, and X Liu, SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, Nov. 13-18, 2022

 

GSpecPal: Speculation-Centric Finite State Machine Parallelization on GPUs, Y Wang, R Watling, J Qiu, Z Wang, 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS’22), Lyon, France, May 30-June 3, 2022

 

Accelerating Address Translation for Virtualization by Leveraging Hardware Mode, S Sha, Y Zhang, Y Luo, X Wang, Z Wang, IEEE Transactions on Computers, Jan. 2022

 

GRAPHSPY: Fused Program Semantic Embedding through Graph Neural Networks for Memory Efficiency, Y Guo, P Li, Y Luo, X Wang, Z Wang, 58th ACM/IEEE Design Automation Conference (DAC), 1045-1050, Dec. 2021

 

Efficient Modeling of Random Sampling-Based LRU, J Yang, Y Wang, Z Wang, 50th International Conference on Parallel Processing (ICPP’21), 1-11, August 2021

 

Penalty- and locality-aware memory allocation in Redis using enhanced AET, C Pan, X Wang, Y Luo, Z Wang, ACM Transactions on Storage (TOS) 17 (2), 1-45, May 2021

 

Swift shadow paging (SSP): no write-protection but following TLB flushing, S Sha, Y Zhang, Y Luo, X Wang, Z Wang, Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’21), April, 2021

 

Dynamically Configuring LRU Replacement Policy in Redis, Y Wang, J Yang, Z Wang, The International Symposium on Memory Systems (MemSys’20), 272-280, Sept. 2020

 

Huge page friendly virtualized memory management, S Sha, JY Hu, YW Luo, XL Wang, Z Wang, Journal of Computer Science and Technology 35 (2), 433-452

 

pRedis: Penalty and Locality Aware Memory Allocation in Redis, Cheng Pan, Yingwei Luo, Xiaolin Wang, and Zhenlin Wang, Proceedings of the ACM Symposium on Cloud Computing (SoCC '19), Nov. 20-22, Santa Cruz, CA.

 

Faster Slab Reassignment in Memcached, Daniel Byrne, Nilufer Onder and Zhenlin Wang, Proceedings of the 2019 International Symposium on Memory Systems (MemSys '19), Washington DC, Sept. 30-Oct. 3, 2019

 

Machine Learning for Fine-Grained Hardware Prefetcher Control, Jason Hiebel, Laura Brown, and Zhenlin Wang, Proceedings of the 48th International Conference on Parallel Processing (ICPP’19), Kyoto, Japan, Aug. 5-8, 2019.

 

EMBA: Efficient Memory Bandwidth Allocation to Improve Performance on Intel Commodity Processor, Yaocheng Xiang, Chencheng Ye, Xiaolin Wang, Yingwei Luo, and Zhenlin Wang, Proceedings of the 48th International Conference on Parallel Processing (ICPP’19), Kyoto, Japan, Aug. 5-8, 2019.

 

Lightweight and Accurate Memory Allocation in Key-Value Cache, Cheng Pan, Lan Zhou, Yingwei Luo, Xiaolin Wang, and Zhenlin Wang, International Journal of Parallel Programming, Springer, Dec. 2018 (also appeared at IFIP NPC '18).

 

Constructing Dynamic Policies for Paging Mode Selection, Sai Sha, Yingwei Luo, Zhenlin Wang, and Xiaolin Wang, Proceedings of the 16th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA '18), Melbourne, Australia, Dec. 10-12, 2018

 

Working Set Size Estimation with Hugepages in Virtualization, Jinyuan Hu, Xiaokuang Bai, Sai Sha, Yingwei Luo, Xiaolin Wang, and Zhenlin Wang, Proceedings of the 16th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA '18), Melbourne, Australia, Dec. 10-12, 2018

 

HUB: hugepage ballooning in kernel-based virtual machines, Jinyuan Hu, Xiaokuang Bai, Yingwei Luo, Xiaolin Wang, and Zhenlin Wang, Proceedings of the 2018 International Symposium on Memory Systems (MemSys '18), Washington DC, Oct. 1-4, 2018

 

PACE: Penalty Aware Cache Modeling with Enhanced AET, Cheng Pan, Xiameng Hu, Lan Zhou, Yingwei Luo, Xiaolin Wang, and Zhenlin Wang, Proceedings of the 2018 ACM Asia-Pacific Workshop on Systems (APSys'18), Jeju Island, South Korea, Aug. 27-28, 2018

 

Constructing Dynamic Policies for Paging Mode Selection, Jason Hiebel, Laura Brown, and Zhenlin Wang, Proceedings of the 47th International Conference on Parallel Processing (ICPP’18), Eugene, OR, Aug. 13-16, 2018.

 

mPart: Miss-Ratio Curve Guided Partitioning in Key-Value Stores, Daniel Byrne, Nilufer Onder and Zhenlin Wang, Proceedings of the 2018 International Symposium on Memory Management (ISMM’18), Philadelphia, PA, June 18, 2018.

 

Get Out of the Valley: Power-Efficient Address Mapping for GPUs, Yuxi Liu, Xia Zhao, Magnus Jahre, Zhenlin Wang, Xiaolin Wang, Yingwei Luo and Lieven Eeckhout. Proceedings of the 45th Annual International Symposium on Computer Architecture (ISCA'18), June 4-6, 2018.

 

Fast MRC Modeling for Storage Cache, Xiameng Hu, Xiaolin Wang, Lan Zhou, Yingwei Luo, Zhenlin Wang, Chen Ding, and Chenchen Ye. ACM Transactions on Storage (TOS'18), 2018.

 

DCAPS: Dynamic Cache Allocation with Partial Sharing, Yaocheng Xiang, Xiaolin Wang, Zihui Huang, Zeyu Wang, Yingwei Luo and Zhenlin Wang. Proceedings of the 13th EuroSys Conference (EuroSys'18), Porto, Portugal, April 23-26, 2018.

 

BACM: Barrier-Aware Cache Management for Irregular Memory-intensive GPGPU Workloads, Yuxi Liu, Xia Zhao, Zhibin Yu, Zhenlin Wang, Xiaolin Wang, Yingwei Luo, Lieven Eeckhout, Proceedings of the 2017 International Conference on Computer Design (ICCD’17), Boston, MA, Nov. 5-8, 2017.

 

Optimized Locality-aware Memory Management for Key-value Cache, Xiameng Hu, Xiaolin Wang, Lan Zhou, Yingwei Luo, Chen Ding, Song Jiang, and Zhenlin Wang, IEEE Transactions on Computers (TC'17), Vol.: 66, Issue: 5, May 1, 2017.

 

Optimal Symbiosis and Fair Scheduling in Shared Cache, Xiameng Hu, Xiaolin Wang, Yechen Li, Yingwei Luo, Chen Ding, and Zhenlin Wang, IEEE Transactions on Parallel and Distributed Systems (TPDS'17), Vol.: 28, Issue: 4, April 1, 2017.

 

Evaluating the impacts of hugepage on virtual machines, Xiaolin Wang, Taowei Luo, Jiangyuan Hu, Zhenlin Wang and Yingwei Luo, Science China Information Sciences 60:012103, January, 2017.

  

Kinetic Modeling of Data Eviction, Xiameng Hu, Xiaolin Wang, Yechen Li, Lan Zhou, Yingwei Luo, Chen Ding, and Zhenlin Wang, Proceedings of the 2016 USENIX Annual Technical Conference (ATC'16), Denver, CO, June 22-24, 2016.

 

Barrier-Aware Warp Scheduling for Throughput Processors, Yuxi Liu, Zhibin Yu, Lieven Eeckhout, Vijay Janapa Reddi, Yingwei Luo, Xiaolin Wang, Zhenlin Wang and Chengzhong Xu, Proceedings of the 2016 International Conference on Supercomputing (ICS'16), Istanbul, Turkey, June 1-3, 2016.

 

Dynamic Memory Balancing for Virtualization, Zhigang Wang, Xiaolin Wang, Fan Hou, Yingwei Luo, and Zhenlin Wang, ACM Transactions on Architecture and Code Optimization, Vol. 13, No. 1, Article 2, April 2016 (TACO'16).

 

LAMA: Optimized Locality-aware Memory Allocation for Key-value Cache, Xiameng Hu, Xiaolin Wang, Yechen Li, Lan Zhou, Yingwei Luo, Chen Ding, Song Jiang, and Zhenlin Wang, Proceedings of the 2015 USENIX Annual Technical Conference (ATC'15), Santa Clara, CA, July 8-10, 2015.

 

Modeling Cross-Architecture Co-Tenancy Performance Interference, Wei Kuang, Laura E. Brown, and Zhenlin Wang, Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'15), Shenzhen, China, May 4-7, 2015.

 

Optimal Footprint Symbiosis in Shared Cache, Xiaolin Wang, Yechen Li, Yingwei Luo, Xiameng Hu, Jacob Brock, Chen Ding, and Zhenlin Wang, Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'15), Shenzhen, China, May 4-7, 2015 

 

Transfer Learning-based Co-run Scheduling for Heterogeneous Datacenters, Wei Kuang, Laura E. Brown, and Zhenlin Wang, Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI'15), PhD Consortium, Austin, Texas, USA, Jan. 25-29, 2015.

 

Selective Switching Mechanism in Virtual Machines via Support Vector Machine and Transfer Learning, Wei Kuang, Laura E. Brown, and Zhenlin Wang, Machine Learning, July, 2014. 

 

Verifying Micro-architecture Simulators using Event Traces, Hui Meen Nyew, Nilufer Onder, Soner Onder, and Zhenlin Wang, The 28th ACM International Conference on Supercomputing (ICS’14), Munich, Germany, June 10-13, 2014.

 

Revisiting Memory Management on Virtualized Environments, Xiaolin Wang, Lingmei Weng, Yingwei Luo, and Zhenlin Wang, ACM Transactions on Architecture and Code Optimization, Volume 10, Issue 4, No. 48, Dec. 2013 (TACO'13, Presented at HiPEAC'14).

 

Towards Eliminating Memory Virtualization Overhead, Xiaolin Wang, Lingmei Weng, Yingwei Luo, and Zhenlin Wang, APPT'13 (See the TACO'13 paper for the extension of this work).

 

A First-Order Logic Based Framework for Verifying Simulations, Hui Meen Nyew, Nilufer Onder, Soner Onder and Zhenlin Wang, AAAI Conference on Artificial Intelligence (AAAI'13), PhD Consortium, June, 2013.

 

Dynamic Cache Partitioning Based on Hot Page Migration, Xiaolin Wang, Xiang Wen, Yechen Li, Zhenlin Wang, Yingwei Luo, and Xiaoming Li, Frontiers of Computer Science, 6(4): 363-372, 2012.

 

Low Cost Working Set Size Tracking, Weiming Zhao, Xinxin Jin, Zhenlin Wang, Xiaolin Wang, Yingwei Luo, and Xiaoming Li, Proceedings of the 2011 USENIX Annual Technical Conference (ATC'11), Portland, OR, June 15-17, 2011 (short paper).

 

Efficient LRU-Based Working Set Size Tracking, Weiming Zhao, Xinxin Jin, Zhenlin Wang, Xiaolin Wang, Yingwei Luo, and Xiaoming Li, Computer Science Technical Report, CS-TR-11-01, March, 2011.

 

Selective Hardware/Software Memory Virtualization, Xiaolin Wang, Jian Rui Zang, Zhenlin Wang, Yingwei Luo, and Xiaoming Li, Proceedings of the 2011 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE'11), Newport Beach, CA, March 9-11, 2011.

 

Predicting Remote Reuse Distance Patterns in UPC Applications, Steven Vormwald, Wei Wang, Steve Carr, Steve Seidel, and Zhenlin Wang, Proceedings of the ACM Fourth Conference on Partitioned Global Address Space Programming Models (PGAS'10), NY, Oct. 12-15, 2010.

 

ScaleUPC: A UPC Compiler for Multi-Core Systems, Weiming Zhao and Zhenlin Wang, Proceedings of the ACM Third Conference on Partitioned Global Address Space Programming Models (PGAS'09), Ashburn, VA, Oct. 5-8, 2009.

 

Dynamic Memory Balancing for Virtual Machines, Weiming Zhao and Zhenlin Wang, Proceedings of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE'09), Washington, DC, March 11-13, 2009.

 

ScaleUPC: A UPC Compiler for Multi-Core Systems, Weiming Zhao and Zhenlin Wang, Michigan Tech University, Technical Report CS-TR 08-02, Sept., 2008.

 

Live and Incremental Whole-System Migration of Virtual Machines Using Block-Bitmap, Yingwei Luo, Binbin Zhang, Xiaolin Wang, Zhenlin Wang, Yifeng Sun, and Haogang Chen, IEEE Cluster 2008, Tsukuba, Japan, Sept.28-Oct.1, 2008.

 

A Transparent Remote Paging Model for Virtual Machines, Haogang Chen, Yingwei Luo, Xiaolin Wang, Binbin Zhang, Yifeng Sun and Zhenlin Wang, International Workshop on Virtualization Technology (IWVT in conjunction with ISCA'08), Beijing, June 2008.

 

Feedback-directed Memory Disambiguation Through Store Distance Analysis, Changpeng Fang, Steve Carr, Soner Onder, and Zhenlin Wang, The 20th ACM International Conference on Supercomputing (ICS'06), Cairns, Australia, June 28-July 1, 2006.

Path-based Reuse Distance Analysis, Changpeng Fang, Steve Carr, Soner Onder, and Zhenlin Wang, Proceedings of the 15th International Conference on Compiler Construction (CC'06), Vienna, Austria, March, 2006.

Instruction Based Memory Distance Analysis and its Application, Changpeng Fang, Steve Carr, Soner Onder, and Zhenlin Wang, Proceedings of International Conference on Parallel Architectures and Compilation Techniques (PACT '05), St. Louis, September 17-21, 2005.

Cooperative Caching with Keep-Me and Evict-Me, Jennifer B. Sartor, Subramaniam Venkiteswaran, Kathryn S. McKinley, and Zhenlin Wang, the 9th Annual Workshop on Interaction between Compilers and Computer Architectures (INTERACT-9), in conjunction with the 11th International Symposium on High-Performance Computer Architecture (HPCA-11), San Francisco, February, 2005

The Garbage Collection Advantage: Improving Program Locality, Xianglong Huang, Stephen M. Blackburn, Kathryn S. McKinley, J. Eliot B. Moss, Zhenlin Wang, and Perry Cheng, Proceedings of the 19th ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA '04), Vancouver, Canada, October, 2004.

Reuse-distance-based Miss-rate Prediction on a Per Instruction Basis, Changpeng Fang, Steve Carr, Soner Onder, and Zhenlin Wang, the third Workshop on Memory System Performance (MSP) in conjunction with PLDI 2004, Washington DC, June 2004.

Combining Cooperative Software/Hardware Prefetching and Cache Replacement, Zhenlin Wang, Kathryn S. McKinley, and Doug Burger, IBM Austin CAS Center for Advanced Studies Conference, Austin, TX, February 2004.

Cooperative Hardware/Software Caching for Next-Generation Memory Systems, PhD Dissertation, University of Massachusetts, Amherst, Sept., 2003.

Guided Region Prefetching: A Cooperative Hardware/Software Approach, Zhenlin Wang, Doug Burger, Steven K. Reinhardt, Kathryn S. McKinley, Charles C. Weems, Proceedings of the Thirtieth International Symposium on Computer Architecture (ISCA '03), San Diego, CA, June 9-11, 2003 (This version contains a couple of non-critical corrections to our published one).

Using the Compiler to Improve Cache Replacement Decisions, Zhenlin Wang, Kathryn S. McKinley, Arnold L. Rosenberg, and Charles C. Weems, Proceedings of International Conference on Parallel Architectures and Compilation Techniques (PACT '02), Charlottesville, Virginia, September 22-25, 2002.

Compiling for the Impulse Memory Controller, Xianglong Huang, Zhenlin Wang, and Kathryn S. McKinley, Proceedings of International Conference on Parallel Architectures and Compilation Techniques (PACT '01), Barcelona, Spain, September 8-12, 2001.

On Memory Behavior of Scalars in Embedded Multimedia Systems, Osman S. Unsal, Zhenlin Wang, Israel Koren, C. Mani Krishna, and Csaba Andras Moritz, Proceedings of WMPI'01, Workshop on Memory Performance Issues, Goteborg, Sweden, June, 2001.

Improving Replacement Decisions in Set-Associative Caches, Zhenlin Wang, Kathryn S. McKinley, and Arnold L. Rosenberg, Proceedings of MASPLAS'01, The Mid-Atlantic Student Workshop on Programming Languages and Systems, IBM Watson Research Center, Hawthorne, NY, April, 2001.

Improving Replacement Decisions in Set-Associative Caches, Zhenlin Wang, Kathryn S. McKinley, and Arnold L. Rosenberg, University of Massachusetts at Amherst, Technical Report TR 01-02, March, 2001.